Knowledge Assisted Metamodeling and Optimization
Method for Large-Scale Engineering Design
by
Di Wu
M.Sc., Beijing Institute of Technology, 2015
B.Sc., Beijing Institute of Technology, 2013
Thesis Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
in the
School of Mechatronic System Engineering
Faculty of Applied Sciences
© Di Wu 2019
SIMON FRASER UNIVERSITY
Summer 2019
Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.
Approval
Name: Di Wu
Degree: Doctor of Philosophy
Title: Knowledge-assisted Metamodeling and Optimization Method for Large-Scale Engineering Design
Examining Committee:
Chair: Mohammad Narimani, Lecturer
G. Gary Wang, Senior Supervisor, Professor
Krishna Vijayaraghavan, Supervisor, Associate Professor
Siamak Arzanpour, Supervisor, Associate Professor
Woo Soo Kim, Internal Examiner, Associate Professor, School of Mechatronic System Engineering
Carolyn Conner Seepersad, External Examiner, Professor, Department of Mechanical Engineering, The University of Texas at Austin
Date Defended/Approved: Aug. 22, 2019
Abstract
Simulation-based design optimization methods commonly treat the simulation as a black-box function. An approximation model of the simulation, called a metamodel, is often built and used in optimization. However, modeling and searching in an unknown design space lead to high computational cost. To further improve the efficiency of optimization, knowledge of the design problem needs to be brought in to assist metamodeling and optimization. This work endeavors to systematically incorporate knowledge for this purpose. After an extensive review, two types of knowledge, sensitivity information and causal relations, are employed in solving large-scale engineering design problems.
Instead of constructing a complete metamodel, a Partial Metamodel-based Optimization
(PMO) method is developed to reduce the number of samples for optimizing large-scale
problems, using Radial Basis Function-High Dimensional Model Representation (RBF-
HDMR) along with a moving cut-center strategy. Sensitivity information is used to
selectively model component functions in a partial metamodel. The cut center of the HDMR model moves to the current optimum at each iteration to pursue the optimum. Numerical tests and an airfoil design case show that the PMO method can lead to better optimal results when samples are scarce.
Causal graphs capture relational knowledge among design variables and outcomes. By
constructing and performing qualitative analysis on a causal graph, variables without
contradiction can be found, whose values can be determined without resorting to
optimization. The design problem can thus be divided into two sub-problems based on the impact of the variables. This dimension reduction and decomposition strategy is applied to a
power converter design and an aircraft concept design problem with significantly
improved efficiency.
Combining the structure of Artificial Neural Networks (ANNs) with causal graphs, a causal-ANN is developed to improve the accuracy of metamodels by incorporating knowledge. The
structure of causal graphs is employed to decompose an ANN into sub-networks.
Additionally, leveraging the structure of causal-ANN and theory of Bayesian Networks,
the attractive variable subspaces can be identified without additional simulation. Finally,
the causal-ANN is applied in a residential energy consumption forecasting problem and
both the modeling accuracy and efficiency are improved.
This work systematically and methodically models and captures knowledge and brings it into metamodeling and optimization. Sensitivities and causal relations have
been incorporated in optimization strategies that have been successfully applied to
various engineering design problems. Further research can be extended to studies on
how to incorporate other types of knowledge to assist metamodeling and optimization.
Keywords: Knowledge, Causal graph, Sensitivity analysis, Dimension reduction,
Metamodeling, Optimization
Dedication
To my loving parents,
for their endless support and sacrifice.
Acknowledgments
I would like to first thank my senior supervisor, Dr. G. Gary Wang, for his supervision and support during these four years. I learned how to be a good student in my previous studies, but I learned how to be an independent researcher from Dr. Wang. I am very grateful for the many opportunities he provided me. It has been my greatest honor to have had Dr. Wang as my senior supervisor.
I would like to thank Dr. Krishna Vijayaraghavan and Dr. Siamak Arzanpour, the
members of my Ph.D. supervisory committee, for providing comments that helped me to
improve this work. I would also like to thank Dr. Woo Soo Kim and Dr. Carolyn Conner
Seepersad for evaluating my work as examiners.
I would like to acknowledge Dr. Eric Coatanea, for the collaborative research
opportunity on the dimension reduction method, and for sharing his knowledge in system
engineering with us. I would also like to acknowledge Mr. Hootan Jarollahi for his support on the residential load forecasting model and for his help during our work together.
Last but not least, I would like to thank my friends at SFU who have made my life enjoyable in a foreign country.
Table of Contents
Approval .......................................................................................................................... ii
Abstract .......................................................................................................................... iii
Dedication ....................................................................................................................... v
Acknowledgments .......................................................................................................... vi
Table of Contents .......................................................................................................... vii
List of Tables ................................................................................................................... x
List of Figures................................................................................................................ xii
List of Acronyms ............................................................................................................ xiii
Chapter 1. Introduction .............................................................................................. 1
1.1. Motivation .............................................................................................................. 1
1.2. Objectives of the research ..................................................................................... 2
1.3. Structure of Dissertation ........................................................................................ 3
Chapter 2. Literature review ...................................................................................... 5
2.1. Concept of knowledge ........................................................................................... 5
2.1.1. Knowledge in Artificial Intelligence ................................................................. 6
2.1.2. Knowledge in product design ......................................................................... 8
2.1.3. Summary remarks ....................................................................................... 11
2.2. Existing Applications of knowledge in design optimization ................................... 12
2.2.1. Symbolic knowledge .................................................................................... 12
2.2.2. Linguistic Knowledge ................................................................................... 14
2.2.3. Virtual knowledge ........................................................................................ 15
2.2.4. Algorithmic knowledge ................................................................................. 15
2.2.5. Summary remarks ....................................................................................... 16
2.3. Potential applications of knowledge ..................................................................... 17
2.3.1. Problem formulation..................................................................................... 18
2.3.2. Dimension reduction .................................................................................... 19
2.3.3. Decomposition ............................................................................................. 21
2.3.4. Metamodeling .............................................................................................. 22
2.3.5. Optimization strategy ................................................................................... 23
2.3.6. Optimization, machine learning, and knowledge .......................................... 24
2.3.7. Summary remarks ....................................................................................... 26
2.4. Review of RBF-HDMR ......................................................................................... 26
2.5. Artificial neural network architecture .................................................................... 29
2.6. Bayesian network and causal graph .................................................................... 30
2.7. Summary ............................................................................................................. 31
Chapter 3. Partial metamodel-based optimization (PMO) method ........................ 32
3.1. Algorithm description ........................................................................................... 32
3.2. Example of PMO ................................................................................................. 36
3.3. Properties of PMO ............................................................................................... 38
3.4. Testing of PMO ................................................................................................... 39
3.5. Trust Region based PMO .................................................................................... 45
3.6. Application to Airfoil Design ................................................................................. 49
3.7. Summary ............................................................................................................. 52
Chapter 4. Dimension reduction method employing causal relations ................. 53
4.1. Dimension reduction method description ............................................................. 53
4.1.1. Overall process ............................................................................................ 53
4.1.2. Qualitative Analysis based on design structure matrix ................................. 57
4.1.3. Weight calculation ....................................................................................... 60
4.1.4. Two-stage optimization process .................................................................. 61
4.1.5. Numerical example ...................................................................................... 63
4.2. Engineering case studies ..................................................................................... 70
4.2.1. Power converter design problem ................................................................. 70
4.2.2. Aircraft concept design problem .................................................................. 73
4.3. Summary ............................................................................................................. 77
Chapter 5. Causal-Artificial Neural Network (Causal-ANN) and its application ... 78
5.1. Causal ANN and application in attractive sub-space identification ....................... 78
5.1.1. Causal artificial neural network .................................................................... 79
5.1.2. Attractive sub-space identification method ................................................... 82
5.2. Case studies ........................................................................................................ 85
Constructing causal-ANN ....................................................................................... 86
Attractive sub-space identification .......................................................................... 88
5.2.2. Aircraft concept design problem .................................................................. 91
5.2.3. Discussion ................................................................................................... 95
Generation of high-level causal graph .................................................................... 95
Fault tolerance studies on causal relations ............................................................. 96
Impact of variable correlations ............................................................................... 98
5.3. Summary ........................................................................................................... 100
Chapter 6. Applying causal-ANN in energy consumption prediction ................. 101
6.1. Residential End-Use Stock and Flow Model ...................................................... 101
6.1.1. Total life cycle cost calculation ................................................................... 103
6.1.2. Logit model ................................................................................................ 104
6.1.3. Stock turnover engine ................................................................................ 104
6.1.4. Example of REUSF model ......................................................................... 106
6.1.5. Logit model training ................................................................................... 107
6.2. Applying causal-ANN in market share prediction ............................................... 108
6.3. Results and discussion ...................................................................................... 110
6.3.1. Case study: dish washer ............................................................................ 110
6.3.2. Full model prediction.................................................................................. 112
6.4. Summary ........................................................................................................... 114
Chapter 7. Conclusions and future work .............................................................. 115
7.1. Conclusions ....................................................................................................... 115
7.2. Future Research ................................................................................................ 117
7.2.1. Knowledge validation, correction, and updating ......................................... 118
7.2.2. Employing different kinds of knowledge ..................................................... 118
7.2.3. Knowledge-assisted optimization strategies .............................................. 118
References ................................................................................................................. 120
Appendix A. Numerical Benchmark Functions .................................................. 134
Appendix B. List of Publications during PhD Studies ...................................... 137
Journals ...................................................................................................................... 137
Conferences ................................................................................................................ 138
List of Tables
Table 2-1: Classification of knowledge representation [43]. ............................................. 9
Table 2-2: Existing applications of knowledge in optimization. ....................................... 12
Table 2-3: Potential applications of knowledge in different stages of optimization. ........ 18
Table 3-1: Optimization results with numerical benchmark problems. ........................... 40
Table 3-2. Optimized results with benchmark functions in different dimensions. ............ 42
Table 3-3: Dimensions selected in PMO on SUR-T1-14 for five independent runs. ....... 43
Table 3-4: TRMPS parameter settings. ......................................................................... 48
Table 3-5: OMID parameter settings. ............................................................................ 48
Table 3-6: Optimization results of using TR-PMO, TRMPS OMID and PMO. ................ 48
Table 3-7: Parameters of NACA0012. ........................................................................... 50
Table 3-8: Optimization results with airfoil design problem. ........................................... 51
Table 4-1: The Taguchi orthogonal array for t=7............................................................ 60
Table 4-2: Matrix [A] for the numerical example. ........................................................... 64
Table 4-3: Matrix [A1] for the numerical example. ......................................................... 64
Table 4-4: Modified matrix [A’] for the numerical example. ............................................ 65
Table 4-5: Modified matrix [A1’] for the numerical example. .......................................... 65
Table 4-6: Matrix [Anoc] for the numerical example. ...................................................... 65
Table 4-7: Matrix [C] for the numerical example. ........................................................... 66
Table 4-8: Element values in the objective column in [A’] and [A1’]. .............................. 66
Table 4-9: Taguchi sampling table of objective function. ............................................... 67
Table 4-10: Weighted matrix [Aw] for numerical example. .............................................. 67
Table 4-11: Optimization results of the original problem and decomposed problem. ..... 69
Table 4-12: Comparison of two thresholds (10% and 20%). .......................................... 69
Table 4-13: Design variables in power converter design. .............................................. 71
Table 4-14: Optimization results for the power converter problem. ................................ 72
Table 4-15: Comparison of optimization results with a fixed number of SA for the power converter problem. ................................................................................. 73
Table 4-16: Design variables in aircraft concept design .................................................. 75
Table 4-17: Optimization results of aircraft concept design............................................ 76
Table 4-18: Comparison of optimization results with a fixed number of SA for the aircraft problem. ................................................................................................. 76
Table 5-1: Design variables in power converter design. ................................................. 85
Table 5-2: Comparison of accuracy among three metamodels. ..................................... 88
Table 5-3: Accuracy of each sub-network. ..................................................................... 88
Table 5-4: Probability distribution P(y ≠ 1|xi, i = 1,2,… ,6) on actual model. ................... 89
Table 5-5: Probability distribution Pprediction(y ≠ 1|xi, i = 1,2,… ,6) on causal-ANN. .. 89
Table 5-6: Probability distribution P(y ≠ 1|xi, i = 1,2,… ,6) with new upper bound. ......... 90
Table 5-7: Probability distribution Pprediction(y ≠ 1|xi, i = 1,2,… ,6) with new upper bound. .................................................................................................... 90
Table 5-8: Interesting interval with the largest likelihood. ............................................... 91
Table 5-9: Design variables in aircraft concept design. .................................................. 92
Table 5-10: Comparison of accuracy value among three metamodels. ......................... 94
Table 5-11: Accuracy of each sub-network. ................................................................... 94
Table 5-12: Probability distribution P(y ≠ 1|xi, i = 1,2,… ,9) on real model. ................... 95
Table 5-13: Probability distribution Pprediction(y ≠ 1|xi, i = 1,2,… ,9) on causal-ANN. 95
Table 5-14: Interesting interval with the largest likelihood. ............................................. 95
Table 5-15: R2 value of objective and intermediate variables for the causal-ANN without
y2. .......................................................................................................... 97
Table 5-16: Comparison of R2 values when missing links in causal graphs. .................. 98
Table 5-17: ANOVA analysis results of [x1,… , x6] to y2 ................................................ 98
Table 5-18: Interesting area detected with independent assumption in power converter design. ................................................................................................... 99
Table 5-19: Interesting area detected with independent assumption in aircraft concept design. ................................................................................................... 99
Table 6-1: The parameters of dish washers in 2010. ................................................... 106
Table 6-2: The inputs of two end-use technologies...................................................... 106
Table 6-3: Comparison in RMSE and time among three approximation models. ......... 111
Table 6-4: Approximation results of causal-ANN and logit model. ................................ 113
List of Figures
Figure 2-1: Knowledge representation methods. ............................................................. 6
Figure 3-1: Flow chart of PMO ...................................................................................... 33
Figure 3-2: Box-plots of optimized values. ..................................................................... 42
Figure 3-3: Convergence plot of PMO in SUR-T1-14 problem. ...................................... 44
Figure 3-4: Flowchart of TR-PMO. ................................................................................. 47
Figure 3-5: Airfoil design problem. ................................................................................. 50
Figure 3-6: Optimization results on the airfoil design problem ....................................... 51
Figure 4-1: Causal graph example. ............................................................................... 54
Figure 4-2: Causal graph of a numerical example. ........................................................ 63
Figure 4-3: Simplified causal graph for the numerical example. ..................................... 68
Figure 4-4: Causal graph of the power converter problem. ............................................ 71
Figure 4-5: Causal graph of the aircraft concept design problem .................................... 75
Figure 5-1: An example of high-level causal graph ........................................................ 80
Figure 5-2: Causal-ANN with a cheap model. ................................................................ 80
Figure 5-3: Two separate sub-networks. ....................................................................... 81
Figure 5-4: Causal-ANN with known intermediate variables and cheap models. ........... 81
Figure 5-5: Variable discretization. ................................................................................ 83
Figure 5-6: Discretization for the variable without fixed bounds. .................................... 83
Figure 5-7: Causal graph of the power converter problem. ............................................ 85
Figure 5-8: Simplified causal graph of the power converter design problem. ................. 86
Figure 5-9: Six sub-networks for the power converter design problem. ......................... 87
Figure 5-10: Causal graph of the aircraft concept design problem. ................................ 92
Figure 5-11: Simplified causal graph for aircraft concept design .................................... 93
Figure 5-12: Sub-networks for aircraft concept design. .................................................. 93
Figure 5-13: Causal graph with one intermediate layer for power converter design ....... 97
Figure 6-1: Flow chart of Residential End-Use Stock and Flow Model. ........................ 103
Figure 6-2: Flow of stocks in the stock turnover engine. .............................................. 105
Figure 6-3: Flow chart of the market share prediction. ................................................. 108
Figure 6-4: High-level causal relations of the market share prediction model. ............. 109
Figure 6-5: Structure of causal-ANN to predict market shares ..................................... 109
Figure 6-6: Market shares comparison for dish washers. ............................................ 112
Figure 6-7: Energy consumption prediction using causal-ANN. .................................... 114
List of Acronyms
AI Artificial Intelligence
AIC Akaike’s Information Criterion
ANN Artificial Neural Network
ANOVA ANalysis Of VAriance
BLISS Bi-Level Integrated System Synthesis
BN Bayesian Network
CAD Computer-Aided Design
CAE Computer-Aided Engineering
CC Capital Cost
CO Collaborative Optimization
CSSO Concurrent SubSpace Optimization
CST Class function/Shape function airfoil transformation representation Tool
DACM Dimensional Analysis Concept Modeling
DAG Directed Acyclic Graph
DSM Design Structure Matrix
GA Genetic Algorithm
GM Graphical Models
GPS General Problem Solver
HDMR High Dimensional Model Representation
KBE Knowledge-Based Engineering
KEE Knowledge Engineering Environment
MAE Mean Absolute Error
MPS Mode Pursuing Sampling
MS Market Shares
NFE Number of Function Evaluations
OMID Optimization on Metamodeling-supported Iterative Decomposition
PCA Principal Component Analysis
PMO Partial Metamodel-based Optimization
RBF Radial Basis Function
RBF-HDMR Radial Basis Function-HDMR
REUSF Residential End-Use Stock and Flow
RMSE Root Mean Square Error
RSM Response Surface Method
SA System Analysis
SU Stock Units
TLCC Total Life Cycle Cost
TRMPS Trust Region-based Mode Pursuing Sampling
TR-PMO Trust Region-PMO
UEC Unit Energy Consumption
VDS Visual Design Steering
Chapter 1. Introduction
1.1. Motivation
High dimensionality, expensive computational cost, and black-box functions (HEB) are three main challenges in simulation-based design optimization [1]. As more design variables become involved in engineering design problems, the dimensionality of the optimization problem increases. Higher-fidelity simulation models improve the accuracy of the analysis but increase the computational cost at the same time. The computational cost of solving a large-scale simulation-based optimization problem thus often becomes unacceptable in practice. Therefore, high-efficiency optimization strategies need to be developed to deal with such large-scale engineering design problems.
Current simulation-based optimization strategies usually treat simulation as a black-box
function. The assumption of black-box functions is derived from the fact that simulations
are used to evaluate design functions, whose mathematical expressions are unknown to
the user. The presence of noise in simulation renders the approximated gradients
untrustworthy, even if the added cost to obtain such gradient information is tolerable.
One main advantage of treating simulation as a black box is that the optimization
method can be generalized for solving any design problem. Different non-gradient based
optimization algorithms [2]–[4] and metamodel-based optimization methods have been
developed to deal with black-box optimization problems [5]–[10]. The metamodel,
meaning “model of a model,” is a simplified mathematical model that approximates the
hidden function in simulation, e.g., a polynomial function or an artificial neural network
(ANN) model. Generally, either in non-gradient based optimization methods or in
metamodel based optimization methods, the key to the optimization algorithms is the
way to generate useful samples (offspring or particles) in a high-dimensional space.
Generation of new samples needs to balance exploration and exploitation. Especially for exploitation, information obtained from previous iterations and existing samples is usually used to help generate better samples. However, a lack of information may lead to low efficiency or even a wrong search direction.
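As a toy illustration of the metamodel idea only (not the RBF-HDMR models used later in this thesis), the sketch below fits a cheap inverse-distance-weighting surrogate to a handful of samples of a hypothetical expensive function; the function, sample plan, and all names are illustrative assumptions:

```python
import math

def expensive_black_box(x):
    """Stand-in for a costly simulation (hypothetical test function)."""
    return math.sin(3 * x) + 0.5 * x

# Evaluate the "simulation" at a small number of sample points.
X = [i / 9 for i in range(10)]          # 10 samples on [0, 1]
Y = [expensive_black_box(x) for x in X]

def metamodel(x, p=2):
    """Inverse-distance-weighted surrogate: a cheap 'model of the model'."""
    num = den = 0.0
    for xi, yi in zip(X, Y):
        d = abs(x - xi)
        if d < 1e-12:
            return yi                   # exact at the sample points
        w = 1.0 / d ** p
        num += w * yi
        den += w
    return num / den

# The surrogate reproduces the samples and approximates between them,
# so an optimizer can query it instead of the expensive simulation.
print(metamodel(X[3]) == Y[3])                          # True
print(abs(metamodel(0.5) - expensive_black_box(0.5)))   # small gap between samples
```

The same pattern underlies metamodel-based optimization generally: expensive evaluations are spent once on samples, and the search then runs against the cheap approximation.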
Another issue with the black-box assumption is that more computational cost is demanded, since the optimization is blind to the design problem at hand. This phenomenon is more severe when the dimensionality of the problem is high. As the dimensionality increases, the volume of the design space grows exponentially. Even thousands of samples are sparse in a 100-dimensional space, and it becomes very difficult to explore and optimize blindly in such a huge space. This problem is known as the "curse of dimensionality" [11]. In a high-dimensional space, information obtained from samples alone is not enough for solving the problem, since the properties of the design space cannot be represented accurately by such sparse samples.
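This sparsity can be checked numerically: holding the sample count fixed, the average distance from each sample to its nearest neighbor grows rapidly with dimension. A small sketch (sample sizes and seed are arbitrary choices for illustration):

```python
import math
import random

def avg_nearest_neighbor_dist(n_samples, dim, seed=1):
    """Average nearest-neighbor distance among uniform samples in [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    total = 0.0
    for i, p in enumerate(pts):
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n_samples

# The same 200 samples that densely cover a 2-D square are extremely
# sparse in 100 dimensions: every point is far from all of its neighbors.
for dim in (2, 10, 100):
    print(dim, round(avg_nearest_neighbor_dist(200, dim), 3))
```

Running this shows the nearest-neighbor gap growing by roughly two orders of magnitude from 2 to 100 dimensions, which is why sample-only information becomes insufficient in large-scale problems.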
In real-world engineering design, practitioners usually have some knowledge of the design problem, such as the variables involved, the input-output relations, or even mathematical functions based on physical laws. Such information is largely ignored by current simulation-based optimization strategies. As mentioned above, information beyond samples is required to break the "curse of dimensionality". If existing knowledge of the engineering problem can be incorporated into modeling and optimization, the number of sample points necessary to capture the behavior of the underlying function and the design space could be reduced. Additionally, by analyzing existing knowledge about an engineering design problem, hidden valuable information can be extracted that helps perform optimization more efficiently. For instance, if one finds that the objective function is monotonic with respect to some design variables, the values of those variables can be determined without optimization and the dimensionality of the problem can be reduced. If one knows that an input-output relationship follows a certain trend, this helps the selection of the most suitable metamodel and reduces the cost of model construction. Therefore, how to systematically incorporate different kinds of knowledge into optimization, rather than through ad hoc and problem-specific treatment, becomes an interesting research topic. This issue is especially relevant for large-scale design problems in order to break the "curse of dimensionality."
1.2. Objectives of the research
The main objective of this thesis is to develop methodologies that employ knowledge to
assist in solving large-scale engineering optimization problems. One of the methods to
break the “curse of dimensionality” is to reduce the dimensionality of the problem. Thus,
the first objective is to develop dimension reduction strategies to solve large-scale
problems. In this thesis, sensitivity information is employed to identify important variables, which are then used to construct a partial metamodel of reduced dimensionality.
To avoid losing key information through omitted variables or errors in the sensitivity analysis, the dimensionality of the partial metamodel grows gradually during the optimization process according to the sensitivity information, meaning more and more design variables are involved in the optimization to reach better optimal solutions.
Another kind of knowledge will be employed to reduce the dimensionality is the causal
relations in engineering problems. The variables without contradiction are identified
before optimization. Such variables are monotonic with respect to the objective function,
which means the optimal value for those variables can be determined without
participating in the optimization. Thus, the number of design variables in the optimization
can be reduced.
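The incremental, sensitivity-driven construction described above can be sketched as follows. This is a minimal illustration on a toy objective; the one-at-a-time finite-difference ranking and the random-search optimizer are simple stand-ins, not the actual partial-metamodel algorithm developed in Chapter 3.

```python
import random

def objective(x):
    # Toy 6-D function: only the first two variables matter strongly.
    return (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 0.5) ** 2 + 0.01 * sum(v * v for v in x[2:])

def sensitivity_ranking(f, x0, h=1e-3):
    """Crude one-at-a-time sensitivity estimate: |f(x0 + h*e_i) - f(x0)| / h."""
    base = f(x0)
    scores = []
    for i in range(len(x0)):
        x = list(x0)
        x[i] += h
        scores.append(abs(f(x) - base) / h)
    return sorted(range(len(x0)), key=lambda i: -scores[i])

def optimize_partial(f, x0, active, iters=2000, seed=0):
    """Random search over the active variables only; the rest stay fixed."""
    rng = random.Random(seed)
    best, best_f = list(x0), f(x0)
    for _ in range(iters):
        cand = list(best)
        for i in active:
            cand[i] += rng.gauss(0.0, 0.1)
        fc = f(cand)
        if fc < best_f:
            best, best_f = cand, fc
    return best, best_f

x0 = [0.0] * 6
ranking = sensitivity_ranking(objective, x0)
# Grow the partial model: optimize over the top variable first, then the top two.
x, fx = list(x0), objective(x0)
for k in (1, 2):
    x, fx = optimize_partial(objective, x, ranking[:k])
```

Each pass reuses the previous optimum as its starting point, so variables added later refine the search rather than restart it.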
Another challenge in dealing with large-scale problems is constructing an accurate
metamodel with scarce samples. Therefore, the second objective of this thesis is to
improve the accuracy of the metamodel by employing knowledge to break the limitation
of the black-box assumption. Knowledge of engineering problems, such as causal
relations of variables in the problem, mathematical equations, and values of the
intermediate variables can be applied in metamodeling to improve the model accuracy.
The attractive design area of the problem can be detected by employing the proposed
metamodel and Bayesian theory.
These methodologies are to be applied to an airfoil design problem, a power converter
design problem, a conceptual aircraft design problem, and an energy consumption
prediction problem sponsored by a local company.
1.3. Structure of Dissertation
To develop the knowledge-assisted metamodeling and optimization strategy, the thesis
can be divided into five parts, including 1) literature review and related theory, 2) the
dimension reduction method based on sensitivity information, 3) the causal relation-
based dimension reduction method, 4) the causal relation-based metamodeling method
and its application, and 5) the applications in the energy consumption prediction model.
The thesis is organized into seven chapters:
Chapter 2 reviews the concept of knowledge and existing applications of knowledge in
optimization. Potential applications of knowledge at different stages of optimization are
discussed and the knowledge employed in this thesis is identified. Next, related
algorithms and theories employed in this thesis are introduced.
Chapter 3 presents a partial-metamodel-based optimization method built on sensitivity
information. The partial metamodel is constructed over the important variables and
updated at each iteration by considering more variables. Then, the proposed
optimization method is described and tested in both numerical problems and an airfoil
design problem.
Chapter 4 develops a causal relation-based dimension reduction method. Causal
relations of a problem are employed to identify variables without contradiction. Details of
the dimension reduction method are introduced with a numerical example. The proposed
method is applied to two engineering optimization problems to test its efficiency.
Chapter 5 proposes the causal-Artificial Neural Network (causal-ANN) method and its
application in detecting attractive design areas. Different types of causal-ANN are
constructed based on the involved knowledge. The attractive design area detection
method is developed based on the causal-ANN and Bayesian theory. Finally, the causal-
ANN is employed to construct the metamodel for two engineering problems and the
attractive areas are also detected by the proposed method.
Chapter 6 applies the causal-ANN in an energy consumption forecasting model to predict
the market shares of different end-use technologies, a project funded by a local
company. In total, 304 market-share prediction models are constructed with only 12
years of historical data available for training each model. The total number of end-use
technologies involved is 1,488. The accuracy and efficiency of the causal-ANN are
compared with those of the original logit model.
Chapter 7 summarizes the work done in this thesis and makes suggestions for future work.
Chapter 2. Literature review
Before proposing knowledge-assisted metamodeling and optimization methods, the
concept of knowledge is reviewed first. Then, existing applications of knowledge are
summarized according to different types of knowledge. Next, potential applications of
knowledge in optimization are discussed. Additionally, related algorithms and theories
employed in proposed methods in later chapters are introduced in this chapter, including
the Radial Basis Function-High Dimensional Model Representation (RBF-HDMR) model,
ANN, and Bayesian network.
2.1. Concept of knowledge
Knowledge is defined as familiarity, awareness or understanding of someone or
something [12]. The word “knowledge” is widely used in the AI field and the definition of
knowledge used in the engineering field also comes from AI.
To obtain knowledge from problems, different AI methods have been applied in
optimization. These applications can be classified into two categories: knowledge from
graphs and documents, and knowledge from data. The expert system, which belongs to the
first category, has been used in design problems for decision making [13], [14].
However, there are few applications using expert systems directly to assist
optimization. For knowledge from data, multiple data mining methods [15], [16] and
classification methods [17]–[19] have been applied in optimization problem formulation
and in optimization strategies for generating new samples.
In this section, the concept of knowledge in AI is reviewed first to give a clear description
of knowledge representation and capture. Then, to define what kinds of knowledge can
be obtained from and applied in the engineering world, the knowledge concept in
product design is also surveyed.
2.1.1. Knowledge in Artificial Intelligence
AI is currently one of the most popular research fields around the world. AI is defined as
the study of intelligent agents: any device that perceives its environment and takes
actions that maximize its chance of success at some goal [20]. In other words, AI is a
set of techniques that help machines deal with different problems in an intelligent
manner. There are two main problems in AI: learning and problem solving [20].
Knowledge is involved in both. In learning, knowledge must be captured and
represented in a form that machines can understand. In problem solving, knowledge
must be reused to solve the problem at hand.
Figure 2-1: Knowledge representation methods (general problem solver, expert system, semantic nets).
Knowledge representation is central to AI research, which focuses on designing
computer representations that capture information about the world to solve complex
tasks [21], [22]. The earliest knowledge representation work focused on the general
problem solver (GPS) [23], which was intended to be a universal problem-solving machine.
Although the development of GPS was not successful, due to its restrictive problem
definition format, GPS was the first attempt to regard knowledge as an input for solving
problems. Following the idea of GPS, expert systems were developed to represent human
knowledge.
Expert systems could match human competence on a specific task [24]–[26]. Two
techniques developed at that time and still used today are the rule-based knowledge
representation [27] and frame-based knowledge representation [28]. Rule-based
systems are widely used in domains such as automatic control [29], [30], decision
support [31], [32], and system diagnosis [26]. The frame-based method is used in
systems geared toward human interaction, for choosing appropriate responses to varied
situations. The frame-based knowledge representation focuses on the structure of
concepts, while the rule-based knowledge base focuses on logical choices. To combine
the properties of the two, one of the best-known integrated frame-and-rule systems, the
Knowledge Engineering Environment (KEE), was developed in 1983 [33]; it contained a
complete rule engine with forward and backward chaining and a complete frame-based
knowledge base with triggers, slots, inheritance, and message passing. The expert
system is a useful knowledge representation tool; by employing it, users can make
reasonable decisions. However, an expert system is defined by expert experience, and
its effectiveness highly depends on the accuracy of its contents. An incorrect or
outdated expert system may thus lead to wrong decisions. Therefore, how to define an
appropriate and evolving expert system remains the main challenge.
Currently, one of the most active areas of knowledge representation research is
semantic nets [34], [35], networks that represent semantic relations between concepts.
Different from neural networks, semantic nets are made up of concepts and the semantic
relations between them. A related concept is ontology [36], [37]. In philosophy,
ontology is the study of the nature of being, becoming, existence, or reality, as well
as the basic categories of being and their relations. In computer science, an ontology
is a formal naming and definition system for the properties and interrelationships of
the entities that fundamentally exist in a particular domain [38]. The main benefit of
ontology is that it can describe not only different concepts in the domain but also the
relationships that hold between them. Another property is that, by employing ontology
mapping [39] and ontology merging [40], similar ontologies can be integrated to include
more information, especially relationships between different concepts. The semantic
net is one way to create ontologies.
In the definition of the general problem solver, knowledge is defined as information
about the real world, and a problem is solved by employing a knowledge representation
method. In AI and computer science, however, knowledge is represented by language or
knowledge graphs, which cannot be directly and automatically used in engineering
design. Compared to language-represented knowledge, input-output relational data are
more applicable to engineering design. Currently popular machine learning methods,
which are based purely on data, help to find interrelations in complex systems. How to
combine linguistic and graphical knowledge with the knowledge embedded in data is the
main question to be addressed in future research.
2.1.2. Knowledge in product design
Although knowledge has been used in product design for a long time, its definition is
borrowed from AI. In product design, knowledge is understood as information that is not
directly available but is obtained from the analysis of data. In other references,
knowledge is also described as the experience, concepts, values, beliefs, and ways of
working that can be shared and communicated [41]. Sunnersjo [42] argued that knowledge
should include not only the rules that the designer should adhere to, but also the
background knowledge that makes the design rules possible to review and understand. In
summary, the definition of knowledge in product design varies, but one consensus is
that knowledge needs to be captured and represented in an appropriate way.
In engineering design, knowledge is often used in the concept design phase to help
designers come up with better designs [43]. Knowledge used in design can be classified
into two categories: formal knowledge and tacit knowledge. Formal knowledge is
embedded in product documents, repositories, product function and structure
descriptions, problem-solving routines, technical and management systems, computer
algorithms, and so on [44]. Knowledge tied to experience, intuition, unarticulated
models, or implicit rules of thumb is regarded as tacit knowledge [45]. Formal
knowledge is easier to capture and represent than tacit knowledge. Tacit knowledge,
generally gained over a long period of learning and experience, is rather difficult to
express. One reason is that there is no common recording method for capturing the
knowledge in people's minds. Another is that such knowledge can only be transferred by
people who are willing and able to articulate it. One main research direction for
knowledge in product design is how to capture and represent tacit knowledge. Both
formal and tacit knowledge should be represented in a form that is easy to
understand [46].
Knowledge representation methods can be classified into five categories [44], as shown
in Table 2-1: pictorial, symbolic, linguistic, virtual, and algorithmic approaches.
Pictorial representation presents knowledge as pictures or graphs, including sketches,
detailed drawings, and photographs. The symbolic method represents knowledge by
drawing a chart or a network; decision tables, flow charts, assembly trees, and
ontologies are all symbolic representation methods. Rule-based and frame-based expert
systems can
be regarded as symbolic knowledge. The linguistic representation uses document files
including customer requirements, design rules, constraints, and so on. CAD models,
CAE simulations, and virtual reality simulations are examples of virtual representation
methods. Finally, the algorithmic methods contain the procedural or methodical
knowledge used in modeling, analysis, and optimization. The information obtained from
AI methods such as data mining methods or machine learning methods can also be
classified into algorithmic knowledge.
Table 2-1: Classification of knowledge representation [43].
Representation approach  Examples
Pictorial                Sketches; detailed drawings; photographs
Symbolic                 Decision tables; flow charts; assembly trees; semantic nets; expert systems
Linguistic               Customer requirements; design rules; constraint analogies
Virtual                  CAD models; CAE simulations; virtual reality simulations
Algorithmic              Mathematical equations; computer algorithms; optimization algorithms; data mining methods; machine learning methods
Different knowledge is used at different stages of product design [36-37]. To start a
design, user requirements are needed in the requirement modeling period; the house of
quality, which belongs to the linguistic category of knowledge representation, is often
used to summarize the necessary requirements. In the functional modeling stage,
decision trees can be used to determine the functions of the product and how to realize
them. Then, linguistic methods such as design principles are used to generate concepts
whose behaviors are modeled based on the functions of the product. Many different
ideas are generated in the concept design period, and a rich, well-structured
knowledge representation system is needed to support this abundance of concepts and
ideas [47]. Ontology is an appropriate method for organizing ideas in this period. Ontology,
which provides a highly structured domain covering processes, objects, and attributes,
has the ability to integrate and migrate valuable unstructured information and
knowledge to provide a complex domain with rich conceptualization [48], [49]. The
semantic net is a tool to capture and represent an ontology as a graph with nodes and
arcs [50]. In the first three stages, i.e., the requirement modeling, functional
modeling, and concept design periods, linguistic and pictorial knowledge play the main
role. The next stage is embodiment design, where symbolic, algorithmic, and pictorial
methods are heavily involved; information on the product architecture and materials,
together with mathematical equations, is applied in this step. Next comes detailed
design, where virtual knowledge, including CAD models, CAE, and virtual reality, is
used to generate 3D models of the design. Then, more accurate simulation models are
generated and optimization is employed to refine the details of the product.
Different kinds of knowledge can thus be utilized for engineering design. The issue is
that traditional knowledge is often represented by documents or graphs; how to use this
knowledge appropriately in formulating an engineering design and optimization problem
is the main task. An engineering simulation model is one attractive type of virtual
knowledge that can help in design. Such a model gives input-output relations, from
which one can dig out more hidden information, such as the monotonic influence of
certain inputs on the output. Besides, approximate models can be constructed based on
simulation models.
To combine rule-based and frame-based expert systems with engineering design,
knowledge-based engineering (KBE) systems were developed [51], [52]. In a KBE system,
rule-based and frame-based knowledge can be captured, represented, and reused with
computer-aided design (CAD) and simulation tools to reduce the time and cost of product
development. References [52] and [53] stated that KBE was likely the best technology at
hand to deal with rule-driven, multidisciplinary, and repetitive geometry manipulation
problems. In [52]–[54], a multi-model generator was created using KBE to develop a
distributed design framework supporting aircraft multidisciplinary design optimization.
A specific family of aircraft was generated automatically through the KBE system [55].
In each model, discipline abstractions are obtained and used as inputs to simulation
tools to evaluate the performance of an aircraft. One disadvantage of the KBE system is
that it can only deal with revisions of existing designs; in other words, before using
KBE to design a product, similar products and their
design details are required. Another shortcoming lies in the expert system used within
KBE. One issue is how to validate the accuracy of the rules and classes in the
knowledge base; another is that modeling the knowledge domain is a burden for
developers. Additionally, the KBE system involves only the expert system in dealing
with design problems, which represents just one type of knowledge applied in design. To
better assist the design process, different kinds of knowledge need to be involved.
Thus, the KBE system needs to be enhanced to include other kinds of knowledge when it
is used for optimization.
2.1.3. Summary remarks
Knowledge has been employed in problem solving and engineering design for decades.
Knowledge is captured from different resources, including documents, human experience,
previous designs, and so on, and it is represented in a structured way for further use.
The knowledge-based system was first developed in the AI field, and the expert system
is one of its most common applications. By employing knowledge, the engineering design
process can be executed with little or no human intervention. However, a design
generated through frames and rules in an expert system is only a feasible design, not
an optimal one; to reach the best design, optimization needs to be performed on the
design obtained from KBE. Another issue with current knowledge bases is their focus on
knowledge represented by language. In engineering, however, knowledge is not only
represented by language but can also be data obtained from engineering analyses. In
addition, knowledge represented in an expert system can be used to help define the
optimization problem and guide the optimization process. Nevertheless, the fundamental
elements of optimization are still data, i.e., numbers. Therefore, how to mine
knowledge from data and how to utilize such knowledge in optimization are two research
directions for knowledge-assisted optimization methodology. Moreover, how to combine
linguistic knowledge, such as design rules and customer requirements, with data is
another area of interest.
2.2. Existing Applications of knowledge in design optimization
Large-scale design optimization problems are difficult to solve. Several techniques can
be used to tackle them, including dimension reduction, decomposition, metamodeling, and
optimization strategies [1]. Although knowledge is not formally incorporated in
optimization methods, some techniques do employ knowledge to deal with large-scale
optimization problems. Table 2-2 summarizes existing optimization methods involving
knowledge. Note that pictorial knowledge, which is usually used at the beginning of
concept design, includes only rough information about the design problem and is
therefore rarely applied in optimization. Of the other four kinds of knowledge,
symbolic and algorithmic knowledge are widely used in solution methods for
high-dimensional optimization problems. The details are reviewed in the following
sections.
Table 2-2: Existing applications of knowledge in optimization.
Knowledge type  Dimension reduction  Decomposition  Metamodeling  Optimization strategy
Pictorial       –                    –              –             –
Symbolic        ◎                    ◎              ◎             –
Linguistic      –                    –              ◎             –
Virtual         –                    –              –             ◎
Algorithmic     ◎                    ◎              ◎             ◎
2.2.1. Symbolic knowledge
Symbolic knowledge is knowledge represented through graphs and symbols, and it is
widely employed in optimization methods. To reduce the dimensionality of an
optimization problem, a causal graph can be employed to identify and remove certain
design variables. A causal graph is an oriented graph showing the causal relations
between variables. By analyzing the causal relationships between the design variables
and the objective, variables that influence the objective monotonically are
identified. The optimal values of these variables can then be determined without
optimization, which means the number of design variables can be reduced. To further
decompose the problem, sensitivity values are applied to simplify the causal graph and
decompose the original problem into several sub-problems with fewer design variables.
This method was applied to an aircraft concept design problem and a power converter
design with significantly improved optimization efficiency [57]. Its shortcoming is
that if no monotonic variables can be found in the problem, or the variable ranges are
not carefully chosen to ensure monotonicity, the method will fail or be ineffective.
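The variable-elimination step this causal-graph analysis enables is simple to state in code. The sketch below is illustrative only; the bounds and the monotonicity map stand in for knowledge that, in the thesis, comes from analyzing the causal graph.

```python
def fix_monotone_variables(bounds, monotone):
    """For a minimization problem, fix each variable with a known monotonic
    influence on the objective at a bound: the lower bound if the objective
    increases with it ('+'), the upper bound if it decreases ('-')."""
    fixed, free = {}, []
    for i, (lo, hi) in enumerate(bounds):
        if monotone.get(i) == '+':
            fixed[i] = lo
        elif monotone.get(i) == '-':
            fixed[i] = hi
        else:
            free.append(i)
    return fixed, free

# Hypothetical 4-variable problem: suppose the causal graph shows the
# objective increases with x0 and decreases with x3.
bounds = [(0, 10), (-5, 5), (0, 1), (2, 8)]
fixed, free = fix_monotone_variables(bounds, {0: '+', 3: '-'})
# fixed == {0: 0, 3: 8}; only x1 and x2 remain as design variables.
```

If no monotonic variables exist, `monotone` is empty and no reduction occurs, which mirrors the failure mode noted above.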
One kind of symbolic knowledge, the design structure matrix (DSM), is usually used to
show the interdependence of disciplines in decomposition strategies. A DSM is a square
matrix with identical row and column listings representing a single set of objects.
Its key advantage is that it gives designers a complete view of the coupling structure
within a system [58]. By analyzing the DSM, decomposition can be performed and a
multidisciplinary design optimization architecture can be constructed. Moreover,
different DSM analysis methods have been developed to simplify optimization problems.
By performing graph partitioning [59]–
[61], clustering analysis [62] and optimization [63] on DSM, complex problems can be
decomposed into sub-problems. Then, different decomposition strategies, including
Concurrent SubSpace Optimization (CSSO) [64], Collaborative Optimization (CO) [65],
and Bi-Level Integrated System Synthesis (BLISS) [66] have been developed according
to the relations represented in the DSM. The main disadvantage of those decomposition
strategies is the large number of function evaluations needed when dealing with high-
dimensional optimization problems. In [67], CO and CSSO were tested with several
numerical benchmarks and the results show that even for low-dimensional problems, CO
and CSSO need thousands of discipline function calls. BLISS was used to solve an
aircraft concept design problem. For different variations of the BLISS method, although
the number of system analyses was reduced to around 10, the total number of discipline
calls was around 400, and BLISS/RS2 in particular required more than 1,000 discipline
calls [66].
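The decomposition role of the DSM can be illustrated with a toy example: treating a binary DSM as the adjacency matrix of a coupling graph and extracting its connected components yields independent sub-problems. This is only a sketch; the graph partitioning and clustering methods cited above can also split components that are weakly, rather than not at all, coupled.

```python
def dsm_subproblems(dsm):
    """Partition a binary design structure matrix into independent
    sub-problems via connected components of the coupling graph."""
    n = len(dsm)
    seen, groups = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                # A coupling in either direction links the disciplines.
                if j not in seen and (dsm[i][j] or dsm[j][i]):
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(comp))
    return groups

# Illustrative 5-discipline DSM: disciplines 0-1 are coupled, as are 2-4.
dsm = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
# dsm_subproblems(dsm) -> [[0, 1], [2, 3, 4]]
```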
In metamodeling methods, symbolic knowledge was used to determine the structure of
the approximation model. In [68], the intermediate variables in a Bayesian network were
used as hidden nodes to construct an artificial neural network (ANN) in a traffic accident
prediction. However, the Bayesian network was only used to represent the input-output
relations between variables; the mathematical relations could not be captured from the
Bayesian network.
In summary, symbolic knowledge usually assists at the beginning stage of optimization.
By employing symbolic knowledge, properties of the problem can be identified to reduce
the difficulty of high-dimensional optimization problems, either by reducing the
dimensionality or by constructing a more accurate metamodel.
2.2.2. Linguistic Knowledge
Linguistic knowledge is information represented by documents. This kind of knowledge is
difficult to incorporate into optimization since optimization methods usually focus on
trends in the data. One way to apply linguistic knowledge in optimization is in
selecting suitable approximation methods according to the properties of the problem.
For example, response surface methods (RSM) of different orders can be chosen according
to the problem, and different metamodels suit different problems. A common conclusion
regarding traditional metamodeling methods is that the Kriging method performs better
for low-dimensional problems while the radial basis function (RBF) outperforms others
for high-dimensional problems [69]. Thus, considering the properties of metamodeling
methods and the features of the problem, a suitable metamodeling method can be selected
for a given problem. However, metamodel selection is still based on expert experience,
which may lead the selection in a wrong direction. Moreover, other conditions also
influence the selection of metamodeling methods and need to be considered when
constructing a proper metamodel.
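To make the RBF option above concrete, the sketch below fits a bare-bones Gaussian RBF interpolator to a handful of samples of a cheap stand-in for an expensive simulation. The shape parameter `c` and the naive linear solver are illustrative choices; practical metamodeling would rely on a library solver and tuned kernel widths.

```python
import math

def rbf_fit(xs, ys, c=1.0):
    """Interpolating Gaussian RBF metamodel in 1-D: solve Phi * w = y."""
    n = len(xs)
    # Augmented system [Phi | y].
    A = [[math.exp(-c * (xs[i] - xs[j]) ** 2) for j in range(n)] + [ys[i]]
         for i in range(n)]
    # Naive Gaussian elimination with partial pivoting (fine for tiny n).
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            fct = A[r][k] / A[k][k]
            for col in range(k, n + 1):
                A[r][col] -= fct * A[k][col]
    w = [0.0] * n
    for k in range(n - 1, -1, -1):
        w[k] = (A[k][n] - sum(A[k][j] * w[j] for j in range(k + 1, n))) / A[k][k]
    return lambda x: sum(w[j] * math.exp(-c * (x - xs[j]) ** 2) for j in range(n))

# Five samples of a stand-in "simulation"; the metamodel reproduces them
# exactly and approximates the function in between.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
model = rbf_fit(xs, ys)
```

Because the model interpolates, it passes through every sample; its quality between samples is what metamodel selection is really about.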
Besides guiding the selection of the metamodeling method, properties of the problem can
be used in selecting operators in optimization methods such as the genetic algorithm
(GA). Reference [70] suggested using domain knowledge in three stages of GA: initial
population generation, genotype encoding, and the genetic operators of crossover and
mutation. In [71], knowledge of trusses was used to guide the initial sampling in the
GA. Hu and Yang [72] used problem-specific knowledge in GA to solve a path planning
problem. Piroozfard et al. [73] employed knowledge-based operators to solve job shop
scheduling problems. In general, a specific property of the problem is
applied to generate custom operators for the problem. However, such ad hoc
approaches cannot be extended to solve other problems.
2.2.3. Virtual knowledge
Virtual knowledge, such as CAD, CAE, and virtual reality models, allows users to get
insight into problems, find key trends and relationships among variables in a problem,
and make decisions by interacting with the data. A Visual Design Steering (VDS) method
[74], [75] was developed as an aid in multidisciplinary design optimization, which can
help a designer to make decisions before, during, or after analysis or optimization via a
visual environment to effectively steer the solution process. Virtual knowledge is
helpful when little is known about the data and the exploration goals are implicit,
since users can directly participate in the exploration process and shift or adjust the
goals as necessary. However, there is a lack of direct translation of such knowledge
into formulations of optimization problems.
2.2.4. Algorithmic knowledge
Algorithmic knowledge is the most popular kind of knowledge used in optimization since
it has the closest relation to data. As mentioned in Section 2.1, equations, simulation
models of the problems, and information obtained from machine learning algorithms can
all be categorized as algorithmic knowledge.
Equations, which exist widely in different optimization problems, can be used at
different stages of the optimization process. Note that the equations may not be
accurate models of the problems, but the mathematical relations they provide can still
help in dealing with optimization problems. Empirical equations of lower fidelity can
be employed in multi-fidelity models to reduce the number of function evaluations of
the expensive simulation models; the co-Kriging method can be employed to generate
metamodels based on multi-fidelity models [76]. In [77], empirical equations were used
to construct a knowledge layer in an ANN to deal with a microwave design problem.
Physical theories, empirical data, and historical data can be treated as white-box
models and involved in constructing a grey-box metamodel [78]. The residual between the
white-box prediction and the simulation data is estimated by a metamodel. The grey-box
method was applied to prediction in two manufacturing problems
and the results show that the metamodel is sufficiently accurate with a small number of
sample points. Equations can make optimization easier, but their accuracy has a large
impact on the optimization results: when the equations are not reliable, the accuracy
of the metamodels will be poor.
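The grey-box idea, a white-box prediction plus a metamodeled residual, can be sketched as follows. The nearest-neighbour residual lookup is the crudest possible stand-in for the residual metamodel of [78], and all functions and values here are hypothetical.

```python
def grey_box(white_box, xs, ys):
    """Grey-box model: white-box prediction plus a residual correction
    learned from the gap between the white box and observed data."""
    residuals = [(x, y - white_box(x)) for x, y in zip(xs, ys)]
    def model(x):
        # Nearest-neighbour residual: the simplest residual "metamodel".
        _, r = min(residuals, key=lambda p: abs(p[0] - x))
        return white_box(x) + r
    return model

# Hypothetical setup: a low-fidelity equation under-predicts the "simulation".
simulate = lambda x: 1.2 * x ** 2 + 0.3   # expensive truth (pretend)
white = lambda x: x ** 2                  # known approximate equation
xs = [0.0, 1.0, 2.0, 3.0]
model = grey_box(white, xs, [simulate(x) for x in xs])
```

At the sample points the grey-box model reproduces the simulation; between them, it inherits the trend of the white box corrected by the nearest observed residual, which is why an unreliable white box degrades the whole metamodel.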
Data from historic designs can also be employed in optimization. Kurek et al. [79]
developed a novel approach for the automatic optimization of reconfigurable design
parameters based on knowledge transfer. Solutions and history data of related previous
designs are treated as a priori knowledge and transferred to the new design and
optimization. Their auto-transfer algorithm, built on Bayesian optimization [80],
determines which design is transferred, when, and how. The efficiency improvement of
the optimization method based on the knowledge transfer algorithm was significant.
Recently, machine learning methods have been increasingly employed to assist
optimization. Screening and mapping methods are employed to reduce the dimensionality
of the problem [81], [82], but information is lost in either screening or mapping, and
the influence of this lost information on the optimization results is difficult to
quantify. If key information is lost through screening or mapping, the optimization
will fail. Classification methods have also been employed in optimization to help with
sampling. A classifier-guided sampling method was developed to generate samples toward
the area with a high probability of yielding preferred performance [17]. Instead of
sampling randomly, samples are generated based on the information obtained from the
classification results; compared with traditional optimization methods such as GA, the
rate of convergence is improved significantly. In many cases, users tend to specify an
excessive number of, and often redundant, constraints. Methods have been developed to
find redundant constraints in mathematical problems [83], and Cutbill and Wang [16]
introduced a novel method based on association analysis to detect redundant black-box
constraints. These are methods for finding redundant constraints through data.
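The classifier-guided sampling loop can be sketched in one dimension. The median split and the 1-nearest-neighbour "classifier" below are toy stand-ins for the classifier used in [17]; all names and parameters are illustrative.

```python
import random

def classifier_guided_samples(f, xs, n_new, bounds, seed=0):
    """Label existing samples good/bad against the median objective value,
    then keep only random candidates whose nearest labelled neighbour is
    'good' (a 1-NN classifier guiding the sampling)."""
    rng = random.Random(seed)
    med = sorted(f(x) for x in xs)[len(xs) // 2]
    labels = [(x, f(x) <= med) for x in xs]
    lo, hi = bounds
    new = []
    while len(new) < n_new:
        c = rng.uniform(lo, hi)
        _, good = min(labels, key=lambda p: abs(p[0] - c))
        if good:
            new.append(c)  # candidate falls in a promising region
    return new

f = lambda x: (x - 2.0) ** 2                      # toy objective, optimum at 2
rng = random.Random(1)
xs = [rng.uniform(0.0, 5.0) for _ in range(20)]   # initial random samples
new = classifier_guided_samples(f, xs, 10, (0.0, 5.0))
```

Candidates classified as "bad" are discarded before any expensive evaluation, which is the source of the convergence speed-up reported for the method.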
2.2.5. Summary remarks
Knowledge has been used in solving optimization problems, although the concept of
knowledge is not widely applied in the optimization field. In current optimization methods,
algorithmic knowledge is still the most used type of knowledge. Symbolic knowledge,
such as causal graphs and the DSM, is also employed in optimization methods. However,
limitations remain: employing specific knowledge may improve the effectiveness and
efficiency of optimization for one problem, yet may not suit other kinds of problems.
The issue with current knowledge-assisted optimization methods is that there is no
systematic way to employ different kinds of knowledge together on one problem. Besides,
by their nature, linguistic and virtual knowledge are difficult to incorporate into
optimization. Therefore, how to combine linguistic/virtual knowledge with data
information is one of the research directions for knowledge-assisted optimization
methods.
2.3. Potential applications of knowledge
The potential applications of knowledge to assist optimization are summarized in this section. The four techniques for dealing with high-dimensional optimization problems, combined with the problem formulation process, can be treated as four stages of the optimization process. At the beginning, the optimization problem is formulated according to the design requirements. Then, dimension reduction methods can be applied to reduce the number of design variables, while decomposition methods can be employed to split the problem into several sub-problems. If the simulation model is expensive, metamodeling can be used to reduce the computational cost. Finally, different optimization algorithms and strategies can be employed to find the optimal solutions. However, each stage has limitations when the problem is treated as a black box. Here, knowledge includes linguistic knowledge, pictorial/symbolic knowledge, and data knowledge. As shown in Table 2-3, possible applications of knowledge to support different aspects of optimization are listed against the challenges of high-dimensional problems. In each sub-section below, the challenges of each stage are introduced first, followed by the potential applications of knowledge.
Table 2-3: Potential applications of knowledge in different stages of optimization.

| Optimization stages | Challenges | Knowledge |
| --- | --- | --- |
| Problem formulation | Constraints definition; determining feasible area | Design rules, custom requirements, decision trees, ontologies |
| Dimension reduction | Determining omittable variables | Flow charts, causal graph, design rules, equations |
| Decomposition | Relations between disciplines; correlations between variables | Flow charts, causal graph, Bayesian graph, equations |
| Metamodeling | Selecting metamodeling method; accuracy of the metamodel | Equations, historical data, historical design, causal graph, Bayesian graph |
| Optimization strategy | How to generate new samples | Equations, Bayesian graph, flow chart |
2.3.1. Problem formulation
An optimization problem has three elements: design variables, objectives, and constraints. The number of design variables, the number of constraints, the strictness of the constraints, and other aspects of the problem definition all influence the efficiency and effectiveness of optimization. One of the challenges in problem formulation is constraint specification. The number of constraints influences optimization efficiency: a large number of constraints increases the computational cost. The strictness of the constraints is another issue. A very strict set of constraints may cause the optimization to fail because a feasible solution is hard to find, while a loose set of constraints may lead to a design that fails in the real world. Another task related to problem formulation is detecting the feasible area. If the feasible area can be determined, it becomes much easier for optimization algorithms to find the optimum. To deal with these two challenges, different kinds of knowledge can be applied, including linguistic and symbolic knowledge. Moreover, expert systems and machine learning methods can also be employed.
Symbolic knowledge can be used in constraint specification to avoid redundant constraints, and the expert system can be a useful tool for problem formulation. KBE systems are widely used in engineering design to represent rules and requirements in a structured way [51], through which a more complete and accurate set of constraints can be defined for different design scenarios [84]. By employing the structured representation of rules and frames, the constraints and the relations between them can be obtained, and the definition of the problem can be generated through the expert system.
For constrained optimization, data-based methods have been developed to find the feasible areas of black-box constrained optimization problems [85]–[87]. The expert system can generate feasible designs under different rules, and it can also be used to detect the feasible area of a design problem [51], [53], [88], [89]. In addition, the expert system can distinguish the constraints that must not be violated from those that can be mildly violated.
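As a minimal illustration of how such rules might be encoded, the sketch below classifies constraints as hard (must not be violated) or soft (may be mildly violated) through simple if-then rules; the rule set, constraint names, and metadata fields are all invented for illustration.

```python
# A minimal rule-based sketch of how an expert system might classify
# constraints as hard (must hold) or soft (may be mildly violated).
# The rules, constraint names, and 'type' field are purely illustrative.

RULES = [
    # (predicate on constraint metadata, classification)
    (lambda c: c["type"] == "safety", "hard"),
    (lambda c: c["type"] == "regulatory", "hard"),
    (lambda c: c["type"] == "preference", "soft"),
]

def classify(constraint, default="soft"):
    # return the label of the first matching rule, else the default
    for predicate, label in RULES:
        if predicate(constraint):
            return label
    return default

constraints = [
    {"name": "max_stress", "type": "safety"},
    {"name": "styling_margin", "type": "preference"},
]
labels = {c["name"]: classify(c) for c in constraints}
# labels == {"max_stress": "hard", "styling_margin": "soft"}
```

A real KBE system would replace the predicates with structured frames and richer rule chaining, but the classification output plays the same role: it tells the optimizer which constraints can be relaxed.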
Ontology is a knowledge representation method that captures not only individual concepts but also the relations between them. Semantic nets are often used to represent an ontology. If one treats the design variables, constraints, and objectives as nodes and generates semantic nets among them, designers may gain a clearer and deeper understanding of the problem, and the formulation of the optimization problem may be more targeted. Similar to expert systems, ontology can help in making judgements. In [90] and [91], ontology was used to represent requirements in engineering design. The relationships between concepts in an ontology can give clearer insight into the design problem at the formulation stage; for example, similar requirements can be detected by analyzing the ontology, and the constraints of the optimization problem can then be defined more appropriately. Additionally, ontology can be used to validate system requirements early [92]. Thus, using ontology to guide problem formulation is a future research direction.
2.3.2. Dimension reduction
The dimensionality of an optimization problem often determines the computational cost, especially for metamodel-based optimization methods. Dimension reduction is a common way to improve optimization efficiency. There are two kinds of methods: screening, which selects the important variables, and mapping, which maps the high-dimensional data to a low-dimensional space. However, determining the omittable variables in screening and determining the dimensionality of the lower-dimensional space in mapping remain two challenges for dimension reduction.
As mentioned in Section 2.2, the dimensionality of the optimization problem can be reduced by analyzing the causal graph of the problem [93]. Some design variables are removed from the variable set due to their monotonic influences on the objective. Such monotonic influences can also be obtained from other knowledge, such as equations or design rules.
In screening methods, sensitivity analysis is performed to determine the importance of
variables. The screening process can also be performed based on rules and frames in
an expert system. In this case, sensitivity analysis can be employed as a validation
method by checking the screening results from the expert system.
In traditional mapping methods, the dimensionality of the mapped low-dimensional space is always a question. Usually, the dimensionality is determined by the user and fixed at an arbitrary small number. Knowledge can be used to find an appropriate dimensionality: by analyzing relations between design variables, the dimensionality of the lower-dimensional space may be defined. In [94], a mapping method named generative topographic mapping was used to solve 30-dimensional airfoil design optimization problems, and different lower-dimensional spaces were tested. The optimized result in the two-dimensional lower space was found to be the best. One reason is that, for the airfoil design problem, the naive 30 NURBS variables may have a more sensible intrinsic dimension of two. If designers could identify through knowledge that the 2-D space is the best, the optimization results might be more accurate. The ontology knowledge base can also be applied to find latent variables by analyzing the relationships among design variables.
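The mapping idea can be made concrete with a simple linear stand-in for generative topographic mapping: principal component analysis via the singular value decomposition. The sketch below, on synthetic data whose true intrinsic dimension is two, shows that mapping to a 2-D space loses nothing when the latent structure really is two-dimensional; all data and names are invented for illustration.

```python
import numpy as np

def linear_map_to_lower_dim(X, k):
    """Map samples X (n x d) to a k-dimensional space via PCA, a simple
    linear stand-in for mappings such as generative topographic mapping.
    Returns the low-dimensional coordinates plus the basis and mean
    needed to map back to the original space."""
    mu = X.mean(axis=0)
    # right singular vectors of the centered data give the principal axes
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    basis = Vt[:k]                  # k x d projection basis
    Z = (X - mu) @ basis.T          # n x k low-dimensional coordinates
    return Z, basis, mu

# synthetic data that truly lives on a 2-D plane inside a 30-D space
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
A = rng.normal(size=(2, 30))
X = latent @ A
Z, basis, mu = linear_map_to_lower_dim(X, k=2)
X_back = Z @ basis + mu
# reconstruction from only 2 dimensions is (near-)exact here
```

When the chosen `k` is smaller than the true intrinsic dimension, the reconstruction error grows, which is one data-driven signal for picking the dimensionality that knowledge could otherwise supply.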
Various dimension reduction methods have been developed in the data mining field. Feature selection reduces dimensionality by removing irrelevant and redundant features [95]. Two categories of feature selection methods, filter methods and wrapper methods, have been developed [96]. In filter methods, variables are ranked according to different criteria, such as correlation criteria [97] and mutual information [98]. Wrapper methods use the prediction performance of different subsets of variables to reduce the dimensionality [99], [100]. Compared with wrapper methods, filter methods are computationally cheap but less accurate.
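A filter method can be sketched in a few lines: rank each variable by the absolute Pearson correlation of its column with the response, which requires no model fitting at all (hence the low cost). The data here is synthetic and for illustration only.

```python
import numpy as np

def filter_rank(X, y):
    """Rank input variables by the absolute Pearson correlation of each
    column of X with the response y (a simple filter criterion)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return np.argsort(-np.abs(corr))    # variable indices, most relevant first

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)   # only the third variable matters
ranking = filter_rank(X, y)
# ranking[0] == 2: the truly relevant variable is ranked first
```

A wrapper method would instead fit a predictive model on each candidate subset and rank subsets by validation accuracy, which is why it costs more but can capture interactions that a per-variable filter misses.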
2.3.3. Decomposition
There are two categories of commonly used decomposition methods: one based on the relations among disciplines and the other based on the correlations among variables. A multidisciplinary design optimization problem is usually decomposed according to the relations among disciplines. A common problem in decomposition is that there is no general method for generating the decomposition framework; in other words, a new framework needs to be constructed for every different problem. The expert system may offer a way to generate the framework for different design problems with little or no human intervention. For variable-based decomposition methods, the main challenge is detecting the correlations among variables at low computational cost.
As mentioned in Section 2.2.2, a problem can be decomposed according to variables rather than disciplines. A causal graph or Bayesian network constructed from the variables and their relationships can help locate the coupling in the problem. In [93], three coupled loops were reduced to one by breaking the discipline-based DSM into a variable-based DSM. Thus, graphic knowledge representation methods are capable of generating more efficient decomposition results. A group of variable-based decomposition methods is rooted in the high dimensional model representation (HDMR) method [80], [81]. High-dimensional problems were decomposed into several sub-problems based on the sensitivity information of different component functions in the HDMR model.

For engineering problems, correlations between different variables can be determined through documented (linguistic) knowledge or through analysis of graphic knowledge, such as an ontology knowledge base. Decomposition based on the HDMR model may then be performed according to the obtained knowledge.
2.3.4. Metamodeling
Metamodels are widely employed to replace expensive simulation models. Different metamodels have different properties, so selecting a suitable metamodeling method is one task. The accuracy of the metamodel is another issue when approximating high-dimensional problems. The basic idea for improving accuracy is to generate more samples in the space when treating the problem as a black box. However, even thousands of samples are sparse in a 100-dimensional space. Moreover, for some metamodeling methods such as RBF, adding more samples may lead to over-shooting. To overcome this problem, information beyond the black-box view of the problem should be considered.
The artificial neural network (ANN) is an effective metamodeling method for nonlinear problems [103]. Increasing the number of nodes and layers can improve the accuracy of the ANN model to a certain extent. However, in a fully connected ANN with many nodes, the number of weights to be estimated is very large, and often thousands of sample points are needed to obtain an accurate set of weights. Thus, how to reduce the number of nodes and links, or in other words, how to determine the structure of the neural network, is one of the issues for ANN approximation. Similar to an ANN, a causal graph or Bayesian network is also a structure of nodes and links. These graphic knowledge representation methods can be used to guide the generation of the ANN structure. Another potential improvement of ANN is to consider the values of intermediate variables. Even in a black-box function model, actual values of some intermediate variables can be obtained from the simulation, but such information is not considered in conventional metamodel construction. After employing a causal graph to determine the structure of an ANN, values of intermediate variables can be used to improve the approximation accuracy. In other words, some hidden layers in the ANN can be brought to the surface as actual values, and the links related to them can be obtained.
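The benefit of exposing intermediate variables can be illustrated with linear least-squares models standing in for the ANN sub-networks: instead of fitting one opaque map from inputs to output, two smaller models are fitted along the causal chain x → z → y. The chain structure, the data, and all names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# true (unknown) model: inputs x -> intermediate variable z -> output y
X = rng.uniform(-1, 1, size=(40, 3))
Z = X @ np.array([[1.0], [2.0], [-1.0]])     # intermediate variable values
y = (3.0 * Z + 1.0).ravel()                  # final response

# because the simulation exposes z, fit two small models (x -> z and
# z -> y) instead of one opaque model x -> y; linear least squares
# stands in here for the causal-ANN sub-networks
A = np.hstack([X, np.ones((len(X), 1))])
w_xz, *_ = np.linalg.lstsq(A, Z.ravel(), rcond=None)
B = np.hstack([Z, np.ones((len(Z), 1))])
w_zy, *_ = np.linalg.lstsq(B, y, rcond=None)

def predict(x_new):
    z_hat = np.append(x_new, 1.0) @ w_xz     # first stage: predict z
    return np.array([z_hat, 1.0]) @ w_zy     # second stage: predict y

x_test = np.array([0.2, -0.5, 0.1])
y_hat = predict(x_test)
# matches the true chained model: z = -0.9, y = 3*(-0.9) + 1 = -1.7
```

Each sub-model has far fewer parameters than a single dense map, which is exactly the parameter-reduction argument made above for causal-graph-guided ANN structures.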
In [10], a partial metamodel was employed to deal with high-dimensional problems. In that case, only selected component functions of the HDMR model are constructed, instead of the complete model, to reduce the number of function evaluations. A component function is selected according to the importance of the design variables via estimated sensitivity information. By using knowledge of the engineering problem, such as causal graphs or empirical equations, the important component functions may be predetermined.
Fuzzy logic knowledge can be used to construct prediction models. In fuzzy expert systems, continuous inputs and outputs are transformed into fuzzy sets, and these are linked together by if-then rules. In prediction, the predicted fuzzy output is converted back to a continuous output. In [104], fuzzy logic knowledge was used to forecast energy demand, and a type-2 fuzzy rule-based expert system model was constructed to estimate stock prices [105].
Previous design knowledge can also be utilized by using existing samples and design results to construct a metamodel for a similar problem. The response values of such a metamodel may differ from the actual model, but the trend of the problem or some interesting design regions may still be found through it. Multi-task regression constructs a regression model for different but related tasks by analyzing data from all the tasks, instead of constructing an individual regression model for each task [106]. Thus, combined with data from previous design problems, a multi-task regression model can be constructed on all related designs. Additionally, as the number of designs increases, the multi-task models can be updated.
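A minimal multi-task sketch: two related design problems share a common linear trend but have different offsets; pooling their data and fitting shared slopes with per-task intercepts recovers both the transferable trend and the task-specific parts. All data here is synthetic and the shared-slope structure is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_task(bias, n=30):
    """Synthetic task: same slopes (2, -1), task-specific bias."""
    X = rng.uniform(0, 1, size=(n, 2))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + bias + 0.01 * rng.normal(size=n)
    return X, y

(X1, y1), (X2, y2) = make_task(0.5), make_task(0.7)

# design matrix: shared slope columns plus one intercept column per task
X = np.vstack([X1, X2])
task = np.repeat([0, 1], [len(y1), len(y2)])
D = np.column_stack([X, (task == 0).astype(float), (task == 1).astype(float)])
w, *_ = np.linalg.lstsq(D, np.concatenate([y1, y2]), rcond=None)
# w[:2] ~ [2, -1] shared trend; w[2:] ~ [0.5, 0.7] task intercepts
```

Adding a new related design only appends an intercept column and its data, which is how the multi-task model can be updated as the number of designs grows.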
2.3.5. Optimization strategy
Most optimization strategies are based on samples or offspring, and how to generate new samples is the main question for metamodel-based optimization strategies. Some strategies generate samples in the area with the highest uncertainty [107], some generate samples uniformly in the desired space [108]–[110], and others generate samples according to a probability distribution calculated from the previous metamodel [6], [7]. These methods are all based on data captured from analysis of the black-box model.
Knowledge can also be employed to guide sampling in the design space. The Bayesian network represents not only the graphic structure of the problem but also the probability distributions of the variables [103]. Bayesian networks can also be used to estimate the probability distribution of the input variables given a certain value of the output; this distribution is called the likelihood. By predicting the likelihood and generating samples that follow its trend, further improvements are expected in metamodel-based optimization. Additionally, if the prior probability distribution in the Bayesian network is known before optimization, the initial sampling and the updating can be performed following that prior.
Equations are another kind of information that can be used in optimization. The optimization results of empirical equations may not be accurate, but equations can help generate new samples in the optimization iterations.
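The likelihood-guided sampling idea can be sketched on a tiny discrete network x → y: given a desired output value, Bayes' rule yields a distribution over the input, and new samples are drawn from it. The network and all probabilities below are invented for illustration.

```python
# Using a Bayesian network "in reverse": given a desired output value,
# compute the input distribution via Bayes' rule and sample from it.
# The two-node network (x -> y) and all probabilities are invented.

import random

p_x = {"low": 0.5, "high": 0.5}                  # prior over input x
p_y_given_x = {                                   # conditional table for y
    "low":  {"good": 0.2, "bad": 0.8},
    "high": {"good": 0.7, "bad": 0.3},
}

def posterior_x(y_obs):
    """P(x | y = y_obs) via Bayes' rule."""
    joint = {x: p_x[x] * p_y_given_x[x][y_obs] for x in p_x}
    z = sum(joint.values())
    return {x: v / z for x, v in joint.items()}

post = posterior_x("good")                        # we want 'good' outputs
# post["high"] = 0.35 / 0.45, so sampling favors 'high' inputs
rng = random.Random(0)
samples = rng.choices(list(post), weights=list(post.values()), k=1000)
```

In a continuous design problem, the same logic would bias new sample points toward input regions whose likelihood of producing the desired response is high.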
Evolutionary and metaheuristic optimization algorithms have been widely used for optimization of inexpensive problems. In these algorithms, new samples at each iteration are generated following evolution theory or other crowd behaviors. The properties of the design problem can be captured and used in the algorithm to guide the generation of new sample points and improve search efficiency. Most current knowledge-based operators in evolutionary algorithms are developed for special cases; therefore, generally applicable methods of employing knowledge to assist in generating offspring should be developed.
2.3.6. Optimization, machine learning, and knowledge
Close ties exist between sample-based optimization and machine learning. One issue in sample-based optimization is determining the next samples with few or no expensive function evaluations. In heuristic optimization algorithms (e.g., GA, PSO), the expensive function is evaluated at all sample points, which increases the computational cost significantly. Instead, metamodel-based optimization methods (e.g., MPS, EGO) employ metamodels to predict the responses and evaluate only the interesting points with the expensive function to improve efficiency. Similar to optimization, machine learning methods also try to learn from data (or samples). The ANN, one of the machine learning models, is widely used for prediction and classification in manufacturing [111]. An ANN essentially plays the same role as a metamodel, and it is in fact a commonly used metamodel in the design optimization community. The ability of ANNs (especially deep ANNs) to deal with high-dimensional spaces and large amounts of data has been noticed [112]; for example, convolutional neural networks (CNNs) can handle pictures with thousands or even millions of pixels as inputs [113]. Instead of estimating the actual responses of samples, judging the performance of samples via classification is another way to guide sampling, which has been used in optimization algorithms employing Bayesian network classifiers [17], [19] or Support Vector Machines (SVMs) [114]. Another application of classification models is found in heuristic optimization algorithms, where classification is used to determine whether the next generation of sample positions improves the search or not. Other ANNs can also assist optimization: autoencoders can be used to reduce dimensionality [115], and recurrent neural networks (RNNs) are usually used to learn from sequential data owing to their circular architecture [116]. An optimization process also has a loop structure, in which the current optimal point and samples determine the next solutions; thus, there is potential to use RNNs to learn the optimization process.
Data mining (DM), also known as knowledge discovery from data (KDD), helps find knowledge in existing data [117]. Regression and classification are also employed in DM to find trends in the data. Another benefit of DM is its pre-processing ability. As mentioned in Section 2.3.2, feature selection methods can be used to reduce dimensionality. Additionally, feature selection can be used to determine redundant constraints from constraint data. If data on intermediate variables can be obtained from simulations, feature selection can also be applied to input-intermediate and intermediate-output pairs to identify the structure of engineering problems.
Both machine learning and data mining methods, however, are also based on samples, similar to sample-based optimization. Wu et al. suggested that domain and application knowledge should be applied in designing big data mining algorithms and systems [118]. In machine learning, deep learning methods have been developed to improve the effectiveness of learning without requiring engineering skills and domain expertise [119], but the amounts of training data and computation required are large. Therefore, knowledge can help both optimization and machine learning. In [120] and [121], fuzzy rules were employed to predict fly ash and the performance of a gasoline engine, and the results were similar to or outperformed ANN predictions. Bayesian networks have attracted researchers' attention because they contain both the structure of the problem (knowledge) and the probability distributions of the variables (data). By combining knowledge and data, Bayesian networks have the potential to guide sampling in optimization.
2.3.7. Summary remarks
To overcome the limitation of assuming black-box functions in MBDO, knowledge is brought in to help solve large-scale optimization problems. Knowledge can be applied at different stages of the optimization process. At the beginning, knowledge is very useful in defining a reasonable and effective optimization problem, both in dimension reduction and in constraint specification. During optimization, knowledge can help in metamodel construction and in guiding the generation of new samples.

In assisting optimization, equations tend to be the most useful information. Graphic knowledge such as causal graphs and Bayesian networks can also be used in problem formulation, metamodel construction, new sample generation, and other parts of the optimization process. Additionally, the ontology knowledge base tends to be useful in the problem formulation stage to determine the constraints and design variables. Another important piece of information not yet considered is the data recorded from previous similar optimizations; in practice, sample points from similar optimizations can potentially be reused for the current problem after modification.
2.4. Review of RBF-HDMR
The general form of HDMR [122] is:

$$f(\mathbf{x}) = f_0 + \sum_{i=1}^{d} f_i(x_i) + \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j) + \sum_{1 \le i < j < k \le d} f_{ijk}(x_i, x_j, x_k) + \cdots + \sum_{1 \le i_1 < \cdots < i_l \le d} f_{i_1 i_2 \ldots i_l}(x_{i_1}, x_{i_2}, \ldots, x_{i_l}) + \cdots + f_{12 \ldots d}(x_1, x_2, \ldots, x_d) \qquad (2\text{-}1)$$
where $f_0$ is a constant representing the zeroth-order effect on $f(\mathbf{x})$; the first-order component function $f_i(x_i)$ gives the effect of variable $x_i$ acting independently on the output $f(\mathbf{x})$, which can be either linear or nonlinear; and the second-order component function $f_{ij}(x_i, x_j)$ describes the correlated contribution of variables $x_i$ and $x_j$ to $f(\mathbf{x})$.
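For a concrete feel of Eq. (2-1), the zeroth-, first-, and second-order cut-HDMR components of a simple two-variable function can be computed directly; the toy function and the cut center are chosen purely for illustration.

```python
# Cut-HDMR components (in the spirit of Eq. 2-1) for the toy function
# f(x1, x2) = x1 + 2*x2 + x1*x2, with cut center x0 = (0, 0).

def f(x1, x2):
    return x1 + 2 * x2 + x1 * x2

x0 = (0.0, 0.0)
f0 = f(*x0)                                   # zeroth-order term

def f1(x1):                                   # first-order effect of x1
    return f(x1, x0[1]) - f0

def f2(x2):                                   # first-order effect of x2
    return f(x0[0], x2) - f0

def f12(x1, x2):                              # second-order (interaction) term
    return f(x1, x2) - f1(x1) - f2(x2) - f0

# the components sum back to the original function exactly
x1, x2 = 1.5, -0.5
total = f0 + f1(x1) + f2(x2) + f12(x1, x2)
# here f1(x1) = x1, f2(x2) = 2*x2, and f12 recovers exactly the x1*x2 term
```

For this function the expansion terminates at second order, mirroring the observation that many engineering problems are dominated by low-order component functions.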
In RBF-HDMR [123], an RBF model consisting of a thin plate spline plus a linear polynomial is employed to approximate the component functions. The RBF model is [123]:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \beta_i \, |\mathbf{x} - \mathbf{x}_i^{s}|^{2} \log|\mathbf{x} - \mathbf{x}_i^{s}| + P(\mathbf{x}), \qquad \sum_{i=1}^{N} \beta_i \, p(\mathbf{x}_i^{s}) = 0, \qquad P(\mathbf{x}) = \mathbf{p}\boldsymbol{\alpha} = [p_1, p_2, \cdots, p_q][\alpha_1, \alpha_2, \cdots, \alpha_q]^{T} \qquad (2\text{-}2)$$
where $\mathbf{x}_i^{s}$ is the $i$-th sampled point of the input variables; $\boldsymbol{\beta} = [\beta_1, \beta_2, \cdots, \beta_N]$ and $\boldsymbol{\alpha}$ are the parameters to be found; $N$ is the number of sample points; $P(\mathbf{x})$ is a polynomial function; and $\mathbf{p}$ is the vector of polynomial basis functions, chosen as $(1, x_1, x_2, \cdots, x_d)$, so $q = d + 1$. The condition $\sum_{i=1}^{N} \beta_i \, p(\mathbf{x}_i^{s}) = 0$ is imposed on $\boldsymbol{\beta}$ to avoid singularity of the distance matrix.
The modeling process is described as follows.
(1) Randomly choose a point $\mathbf{x}_0$ in the design space as the cut center. Evaluate $f(\mathbf{x})$ at $\mathbf{x}_0$ to obtain the zeroth-order component function $f_0$.
(2) To approximate the first-order component function $f_i(x_i)$, first generate samples in the close neighborhood of the upper and lower bounds of $x_i$. Evaluate these two end points and model the component function $f_i(x_i)$ by a one-dimensional RBF for variable $x_i$ using those two points.
(3) Check the linearity of $f_i(x_i)$. If the cut center lies on the line formed by the approximation model $f_i(x_i)$, then consider $f_i(x_i)$ linear and terminate the modeling process for $f_i(x_i)$. Otherwise, rebuild the RBF model $f_i(x_i)$ using the cut center and the two end points. Generate a random point along $x_i$ to test the accuracy of the newly built $f_i(x_i)$. If the relative error between the actual value and the approximated value is larger than a given criterion (e.g., 0.01), the test point and all existing points are used to rebuild $f_i(x_i)$, repeating until sufficient accuracy is obtained.
(4) Check the accuracy of the first-order HDMR model. Form a new point by randomly combining the sampled values of each input variable, then compare the value predicted by the approximation model with the value obtained from the original expensive function. If the two values are sufficiently close, no higher-order components exist in the model and the modeling process terminates; otherwise, go to Step 5.
(5) Combine the values of $x_i$ and $x_j$ ($j \ne i$) from the existing samples with the remaining elements $x_k$ ($k \ne i, j$) of $\mathbf{x}_0$ to create new points in two-dimensional planes. One of the new points is randomly chosen to test the first-order RBF-HDMR model. If the approximation model passes through the new point, $x_i$ and $x_j$ are deemed uncorrelated, and the next pair of input variables is tested. Otherwise, the new point and the previously evaluated points are used to construct the second-order component function $f_{ij}(x_i, x_j)$. This sampling-remodeling process continues iteratively for all two-variable correlations until convergence. Higher-order component functions can be constructed in the same manner as in Step 5.
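The linearity test of Step (3) can be sketched compactly: for each variable, sample the two bounds along the cut through the center, then check whether the cut center lies on the line through those two end samples. The test function and bounds below are invented for illustration, and the sketch deliberately omits the RBF rebuild loop.

```python
# A simplified sketch of the Step (3) linearity check: a first-order
# component is taken as linear when the cut center lies on the line
# through the two end samples. The test function is illustrative.

def first_order_linearity(f, lower, upper, x0, tol=1e-9):
    """Return, per variable, whether f_i(x_i) appears linear along
    the cut through the center point x0."""
    f0 = f(x0)
    flags = []
    for i in range(len(x0)):
        lo, hi = list(x0), list(x0)
        lo[i], hi[i] = lower[i], upper[i]        # perturb only variable i
        f_lo, f_hi = f(lo), f(hi)
        # linearly interpolate the two end samples at the center value
        t = (x0[i] - lower[i]) / (upper[i] - lower[i])
        f_interp = (1 - t) * f_lo + t * f_hi
        flags.append(abs(f_interp - f0) < tol)   # is the center on the line?
    return flags

f = lambda x: 2 * x[0] + x[1] ** 2               # linear in x1, not in x2
flags = first_order_linearity(f, [-1, -1], [1, 1], [0.25, 0.25])
# flags == [True, False]: only the first variable passes the check
```

Variables that fail the check would proceed to the nonlinear RBF rebuilding and accuracy-testing loop described in Step (3).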
The process above builds an RBF-HDMR model adaptively and achieves high model accuracy for high-dimensional problems. The construction process is simple, and RBF-HDMR can significantly reduce the number of expensive function evaluations needed to approximate high-dimensional problems.
Besides the original RBF-HDMR, several modifications exist in the literature. Cai et al. [124] proposed an enhanced RBF-HDMR (ERBF-HDMR) that uses an ensemble-based enhanced RBF model to increase the accuracy of the HDMR. Other types of metamodels have been employed in place of RBF to construct the component functions: Huang et al. [125] and Wang et al. [126] employed Support Vector Regression (SVR) and Moving Least Squares (MLS), respectively, to obtain more accurate metamodels. These modifications all focus on improving the accuracy of RBF-HDMR. Although the original RBF-HDMR is used in this thesis to construct the partial metamodel-based optimization, other variations may be used as well.
2.5. Artificial neural network architecture
ANNs have been widely used in different fields for real-world approximation and prediction [127]–[129], and the feedforward ANN is one of the most popular types. Building a proper ANN, however, is still a nontrivial task due to the difficulty of determining the network architecture, which affects the prediction accuracy [130]. The architecture of an ANN includes the number of hidden layers, the number of hidden nodes, and the connections between nodes. An improper architecture may lead to overfitting, which reduces the accuracy of the metamodel. In general, the number of layers and hidden nodes is determined by experience. An ANN with two hidden layers usually provides more benefit for different types of nonlinear problems than a network with one hidden layer [131]. For the number of hidden nodes, various guidelines have been developed, including "2n+1" [132], "2n" [133], "n/2" [134], and so on, where n is the number of input nodes, but none of them outperforms the others across all kinds of problems. A fully connected layer structure is usually used in ANNs.
The main issue with determining the ANN architecture by experience is that the guidelines may not perform well in every situation. Research has therefore been conducted to develop more intelligent architecture determination methods for different approximation tasks. Akaike's Information Criterion (AIC) was used to determine the number of hidden nodes [134], where statistical properties of the training set were considered in generating the network structure. Another kind of architecture determination method is based on the accuracy of the network: different structures are tested, and the most accurate one is selected. Srivastava et al. [135] used the dropout method to find an appropriate ANN structure and avoid overfitting; nodes were randomly dropped during training to find the most accurate structure. Optimization has also been employed to search for the most accurate structure. A layer-wise structure learning method based on multi-objective optimization was developed to construct a deep neural network [136]; with this method, the network is no longer fully connected, and some connections are deleted based on approximation accuracy. Moreover, some researchers focus on breaking the layer-wise structure of the neural network, allowing links that connect nodes in non-adjacent layers. Genetic evolution methods were employed to find the optimal network topology in [137]. In all of these methods, the architecture of the network is determined purely from data, and finding a more accurate ANN structure usually requires a large amount of computation.
Another kind of structure determination method is based on knowledge. A knowledge-based neural network was developed for microwave design problems [77], where existing knowledge, such as empirical formulations, was used to construct a knowledge layer in the network. In [68], the intermediate variables in Bayesian networks were used as hidden nodes to construct an ANN. However, a Bayesian network can only represent the input-output relations between variables; mathematical relations cannot be captured from it. Therefore, in this thesis, the Bayesian network is employed to guide the modeling of the causal-ANN structure rather than being used directly. Mathematical relations are also incorporated in the Bayesian network and causal-ANN to construct a more accurate metamodel by considering the values of the intermediate variables in the Bayesian network.
2.6. Bayesian network and causal graph
Bayesian networks (BNs), also known as belief networks, belong to the family of probabilistic graphical models (GMs) [138]. These graphical structures can be used to represent knowledge about an uncertain domain. A BN is a directed acyclic graph (DAG), which means there are no cycles in the graph [139]. A more formal definition is: a Bayesian network is an annotated acyclic graph that represents a joint probability distribution over a set of random variables [140]. Hence, a BN has two main components: the variables and the conditional probability distribution of each variable. A BN provides an efficient way to compute posterior probabilities given evidence, by reducing the number of parameters required to characterize the joint probability distribution of the variables [21], [22].
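The parameter reduction can be seen on a chain of three binary variables a → b → c: a general joint distribution needs 2³ − 1 = 7 free parameters, but the chain factorization needs only 1 + 2 + 2 = 5, and posteriors follow directly from Bayes' rule. All probabilities below are invented for illustration.

```python
# Factorization benefit of a BN on a binary chain a -> b -> c:
# P(a, b, c) = P(a) P(b|a) P(c|b) needs 5 parameters instead of 7.
# All conditional probability values are invented.

from itertools import product

p_a = {0: 0.6, 1: 0.4}                                # P(a)
p_b_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # P(b | a)
p_c_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P(c | b)

def joint(a, b, c):
    # chain factorization P(a, b, c) = P(a) P(b|a) P(c|b)
    return p_a[a] * p_b_a[a][b] * p_c_b[b][c]

# the factorized joint is a valid distribution (sums to 1)
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))

# posterior of a given evidence c = 1, via Bayes' rule
num = {a: sum(joint(a, b, 1) for b in [0, 1]) for a in [0, 1]}
z = num[0] + num[1]
post_a = {a: num[a] / z for a in [0, 1]}
# observing c = 1 shifts belief toward a = 1 in this example
```

For larger networks the same factorization keeps inference tractable, which is what makes BNs attractive as carriers of both structure (knowledge) and probability (data).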
The causal graph is a variant of the BN that represents the cause-effect relations embedded in human thinking. Compared with a general BN, the edges in a causal graph carry causal direction, expressing the judgement that certain events or actions lead to particular outcomes. Causal graphs have been used in the decision-making field to represent relationships between different factors. A causal graph is also a useful tool for representing the structure of engineering systems; based on causal graphs, the Dimensional Analysis Concept Modeling (DACM) framework was developed to gather and organize the information associated with an engineering problem during the concept design phase [8], [9].
2.7. Summary
To overcome the challenge of blind search in design optimization, the application of knowledge to assist optimization is discussed in this chapter. The concepts of knowledge in AI and product design are reviewed. In those fields, knowledge is captured, represented, and reused to solve decision-making or design problems. Next, some existing applications of knowledge-assisted optimization are described and categorized. Although the concept of knowledge may not explicitly appear in these methods, the idea of involving knowledge to improve the efficiency of optimization is employed in these works. Finally, multiple potential future applications of knowledge in optimization are discussed. Some related algorithms and theories are also introduced in this chapter.
In this thesis, two kinds of knowledge, sensitivity information and causal relations, are employed to deal with large-scale optimization problems. The next chapter will discuss how sensitivity information is applied in constructing a partial metamodel in optimization to reduce the dimensionality.
Chapter 3. Partial metamodel-based optimization (PMO) method
As introduced in Chapter 2, although RBF-HDMR is an efficient metamodeling method,
the cost of building a complete RBF-HDMR can still be very high for high-dimensional
problems. Also, RBF-HDMR requires structured samples. This is essentially in conflict
with the fact that the optimization process may lead the search anywhere in a design
space. One approach is to build a new RBF-HDMR in a smaller area, such as a trust
region. The cost of doing so is also too high as almost none of the existing points can be
inherited for the new model.
This work is based on the fundamental belief that optimization can be performed on an imperfect or incomplete metamodel. Instead of building a costly complete metamodel, I propose to use partial metamodels in the optimization process, in order to gain efficiency without sacrificing, and possibly even improving, search quality. To reduce the exponentially increasing cost of building an accurate metamodel for high-dimensional problems, partial RBF-HDMR models of selected design variables are constructed at every iteration in the proposed strategy based on sensitivity analysis. After every iteration, the cut center of the RBF-HDMR is moved to the most recent optimum point in order to pursue the optimum. To improve the performance of the PMO method, a trust region based PMO (TR-PMO) is developed.
3.1. Algorithm description
To reduce the number of expensive function calls, a partial RBF-HDMR is built at every iteration according to the importance of variables in the proposed PMO method. The cut center of the RBF-HDMR model is moved after every iteration to the newest optimum point. The flow chart of the PMO method is shown in Figure 3-1. For a better understanding of the procedure, an n-dimensional optimization problem is considered and the details of the proposed method are explained as follows:
Step 1. Construct a first-order RBF-HDMR and use this metamodel for optimization. A random cut center in the design space (i.e., $\boldsymbol{x}_0$) is selected. Then, the first-order RBF-HDMR model (i.e., Eq. (3-1)) is built based on this cut center.

$$f(\boldsymbol{x}) = f_0(\boldsymbol{x}_0) + \sum_{i=1}^{n} f_i(x_i) \qquad (3\text{-}1)$$

Then, the first-order HDMR model is optimized to obtain the optimal point $\boldsymbol{x}_{opt}$. The cut center $\boldsymbol{x}_0^{new}$ is moved to this newly found optimum point.
[Flowchart blocks: Construct first-order RBF-HDMR → Optimize first-order RBF-HDMR → Sensitivity analysis → Roulette → Select one coordinate → Construct partial RBF-HDMR → Optimize → Stopping criteria met? (No: select another coordinate; Yes: Output)]
Figure 3-1: Flow chart of PMO
Step 2. Select one dimension. First, sensitivity analysis is performed on the constructed first-order RBF-HDMR model, and the normalized sensitivity indices of the variables are used to quantify the importance of each variable. Next, the sensitivity indices are sorted in descending order to obtain the sensitivity set $\boldsymbol{S} = [s_1, s_2, \dots, s_n]$, where $s_1$ is the sensitivity index of the most important variable (highest index) and $s_n$ is the sensitivity index of the least important variable (lowest index). Next, the sensitivity set $\boldsymbol{S}$ is used to construct the probability density set $\boldsymbol{G} = [g_1, g_2, \dots, g_n]$, where $g_i = s_1 + s_2 + \dots + s_i$, $i = 1, 2, \dots, n$. Hence, $g_1 = s_1$ and $g_n = 1$. When determining which dimension is selected in the PMO approach, the larger the value of the sensitivity index, the higher the chance of that dimension being selected for optimization. However, in most cases, the probability densities of the dimensions are close to each other. To ensure the most sensitive dimension is likely to be picked, the speed control factor used in the Mode Pursuing Sampling method [6] is also used in PMO to adjust the sampling aggressiveness. With the adjustment, the probability density set $\boldsymbol{G}$ is changed to $\hat{\boldsymbol{G}} = [g_1^{1/r}, g_2^{1/r}, \dots, g_n^{1/r}]$, where $r$ is the speed control factor. To avoid being trapped in the same solution and to balance exploration and exploitation, a roulette wheel selection operator is then used to randomly select one variable, $x_{k_1}$, according to the set $\hat{\boldsymbol{G}}$, where the subscript $k_1$ is the index of the selected variable, $k_1 \in [1, n]$ and $k_1$ is an integer. The index is stored in the selected index set $\boldsymbol{K} = [k_1]$.
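The sensitivity-sorted roulette selection with the speed control factor can be sketched as follows. This is a minimal illustration; the function `roulette_select` and its deterministic `u` argument (a fixed random draw, convenient for testing) are my own conveniences and not part of the thesis code.

```python
import numpy as np

def roulette_select(sens, r=2.0, excluded=(), u=None, rng=None):
    """Roulette wheel selection over sensitivity indices with speed control.

    sens: normalized sensitivity indices; r: speed control factor (larger r
    favors the most sensitive variables more aggressively); excluded: indices
    already picked; u: optional fixed draw in [0, 1) for reproducible tests.
    """
    idx = [i for i in range(len(sens)) if i not in excluded]
    s = np.asarray([sens[i] for i in idx], float)
    order = np.argsort(-s)                               # descending sensitivity
    g = np.cumsum(s[order] / s.sum()) ** (1.0 / r)       # transformed cumulative set
    if u is None:
        u = (rng or np.random.default_rng()).uniform()
    u = u * g[-1]                                        # keep the draw inside [0, g_n]
    return idx[int(order[min(np.searchsorted(g, u), len(g) - 1)])]
```

With $\boldsymbol{S} = [0.590, 0.289, 0.121]$ and $r = 2$, the transformed set is $[0.768, 0.938, 1.000]$, so draws below 0.768 pick the most sensitive variable.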
Step 3. Construct a partial RBF-HDMR of 𝑥𝑘1 and use the partial metamodel for
optimization. Once the variable is selected, a partial RBF-HDMR model with only one
variable is constructed based on the new cut center.
𝑓(𝑥𝑘1) = 𝑓0 + 𝑓𝑘1(𝑥𝑘1) (3-2)
Thus, the partial HDMR model is a one-dimensional function of only $x_{k_1}$, with the rest of the variables taking the corresponding values of $\boldsymbol{x}_0$. Then, the partial HDMR model is optimized to obtain the optimum $x_{k_1}^*$. The cut center is moved to $\boldsymbol{x}_0^{new} = (x_1, \dots, x_{k_1}^*, \dots, x_n)^T$, and the function value $f_0$ at the new cut center $\boldsymbol{x}_0^{new}$, which is the current optimum value, is calculated.
Step 4. Select the $d$-th variable. Assume that before this step, $(d-1)$ variables have been picked ($d \ge 2$), and the selected index set is $\boldsymbol{K} = [k_1, k_2, \dots, k_{d-1}]$. The new cut center is $\boldsymbol{x}_0^{new} = (x_1, \dots, x_{k_1}^*, \dots, x_{k_2}^*, \dots, x_{k_{d-1}}^*, \dots, x_n)^T$, and the function value at $\boldsymbol{x}_0^{new}$ is taken as the new $f_0$. After removing the selected variables, the left-out sensitivity set is expressed as $\boldsymbol{S} = \{s_i\}$, $i \notin \boldsymbol{K}$, and the transformed probability density set can be represented as $\hat{\boldsymbol{G}} = \{g_i^{1/r}\}$, $i \notin \boldsymbol{K}$. The $d$-th variable is then selected through the roulette wheel selection operator from the remaining un-picked variables. The index of that variable, $k_d$, is then added to the index set $\boldsymbol{K}$.
Step 5. Construct a new partial RBF-HDMR model and use the new partial metamodel for optimization. Once the $d$-th variable is selected, the partial RBF-HDMR model can be constructed as follows.

$$f(\boldsymbol{x}) = f_0 + \sum_{i=1}^{d-1} f_{k_i}(x_{k_i}) + f_{k_d}(x_{k_d}) + \sum_{1 \le i < j \le d-1} f_{k_i k_j}(x_{k_i}, x_{k_j}) + \sum_{i=1}^{d-1} f_{k_i k_d}(x_{k_i}, x_{k_d}) \qquad (3\text{-}3)$$
As shown in Eq. (3-3), the samples used to construct the components $f_{k_i}(x_{k_i})$ $(i = 1, 2, \dots, d-1)$ and $f_{k_i k_j}(x_{k_i}, x_{k_j})$ $(1 \le i < j \le d-1)$ are all located in the partial design space $\boldsymbol{x} = (x_{k_1}, x_{k_2}, \dots, x_{k_{d-1}})^T$, $\boldsymbol{x} \in [\boldsymbol{x}_{lb}, \boldsymbol{x}_{ub}]$, where $\boldsymbol{x}_{lb}$ and $\boldsymbol{x}_{ub}$ are the lower and upper bounds of the design space. To reduce the number of function evaluations, the function values of most samples used to construct those components can be predicted by the partial RBF-HDMR model built in the last iteration, which is represented as

$$f(\boldsymbol{x}) = f_0 + \sum_{i=1}^{d-1} f_{k_i}(x_{k_i}) + \sum_{1 \le i < j \le d-1} f_{k_i k_j}(x_{k_i}, x_{k_j}) \qquad (3\text{-}4)$$
Eq. (3-4) is a function of $\boldsymbol{x} = (x_{k_1}, x_{k_2}, \dots, x_{k_{d-1}})^T$. Therefore, the values of the component functions $f_{k_i}(x_{k_i})$ $(i = 1, 2, \dots, d-1)$ and $f_{k_i k_j}(x_{k_i}, x_{k_j})$ $(1 \le i < j \le d-1)$ in Eq. (3-3) can be calculated via Eq. (3-4). Thus, to construct the new partial HDMR model, only the points used to construct the component functions $f_{k_d}(x_{k_d})$ and $f_{k_i k_d}(x_{k_i}, x_{k_d})$ $(i = 1, 2, \dots, d-1)$ need to be evaluated by calling the actual function. Next, the $d$-dimensional partial HDMR is optimized to obtain the optimum solution $\boldsymbol{x}^* = (x_{k_1}^*, x_{k_2}^*, \dots, x_{k_d}^*)^T$. Combined with the other, fixed variables, the new cut center moves to $\boldsymbol{x}_0^{new} = (x_1, \dots, x_{k_1}^*, \dots, x_{k_d}^*, \dots, x_n)^T$. The function value $f_0$ at the new cut center is set to be the current optimum value.
Step 6. Repeat Steps 4 and 5 until the termination criterion is reached. In the PMO method, the maximum number of iterations is chosen as the termination criterion. If the maximum number of iterations is reached, the process stops and outputs the current cut center $\boldsymbol{x}_0^{new}$ as the optimum solution and the function value $f_0$ at that cut center as the optimum value; otherwise, go to Step 4 and repeat the procedure. The maximum number of iterations determines how many variables are selected in performing the PMO method. Selecting more design variables can improve the optimization results. However, selecting more design variables also means that more second-order component functions need to be constructed, which in turn requires many more samples. Thus, the negative influence on efficiency caused by selecting more variables outweighs the positive influence on effectiveness. In practice, four or five selected design variables strike a balance between effectiveness and efficiency.
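Steps 1 through 6 can be condensed into a toy driver. The sketch below is a deliberately simplified, first-order-only stand-in: instead of fitting partial RBF-HDMR models, each selected variable is optimized by a direct 1-D grid search on the true function (so it is far more expensive per call than real PMO), and sensitivity is approximated crudely by the variance of the function along each cut line. The function `pmo_sketch` and all of its internals are my own illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def pmo_sketch(f, bounds, n_select, m=21, r=2.0, seed=0):
    """Simplified PMO loop: sensitivity-weighted roulette picks one variable
    per iteration; that variable is 'optimized' and the cut center moves."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(bounds, float).T
    n = lb.size
    x0 = (lb + ub) / 2.0                          # initial cut center

    # Crude first-order sensitivity: variance of f along each coordinate axis.
    sens = np.empty(n)
    for i in range(n):
        ys = []
        for t in np.linspace(lb[i], ub[i], m):
            x = x0.copy()
            x[i] = t
            ys.append(f(x))
        sens[i] = np.var(ys)
    sens = sens / sens.sum()

    picked = []
    for _ in range(n_select):
        # Roulette selection with speed control factor r over unpicked variables.
        rest = [i for i in range(n) if i not in picked]
        s = sens[rest] / sens[rest].sum()
        order = np.argsort(-s)
        g = np.cumsum(s[order]) ** (1.0 / r)
        u = rng.uniform() * g[-1]
        k = rest[int(order[min(np.searchsorted(g, u), len(g) - 1)])]
        picked.append(k)
        # Stand-in for "optimize the partial model": 1-D grid search along x_k.
        vals = []
        grid = np.linspace(lb[k], ub[k], m)
        for t in grid:
            x = x0.copy()
            x[k] = t
            vals.append(f(x))
        x0[k] = grid[int(np.argmin(vals))]        # move the cut center
    return x0, f(x0)
```

When all n variables are eventually selected, the sketch reduces to sensitivity-ordered coordinate descent, which is the intuition behind PMO's variable-by-variable growth of the partial model.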
3.2. Example of PMO
A 3-dimensional problem [143] shown in Eq. (3-5) is selected as an example to explain
the process of the PMO method step-by-step.
$$f(\boldsymbol{x}) = -\sum_{i=1}^{4} \alpha_i \exp\left[-\sum_{j=1}^{3} A_{ij}(x_j - P_{ij})^2\right], \quad x_{1,2,3} \in [0, 1] \qquad (3\text{-}5)$$

where $\boldsymbol{\alpha} = [1, 1.2, 3, 3.2]^T$,

$$\boldsymbol{A} = \begin{bmatrix} 3.0 & 10 & 30 \\ 0.1 & 10 & 35 \\ 3.0 & 10 & 30 \\ 0.1 & 10 & 35 \end{bmatrix}, \qquad \boldsymbol{P} = 10^{-4}\begin{bmatrix} 3689 & 1170 & 2673 \\ 4699 & 4387 & 7470 \\ 1091 & 8732 & 5547 \\ 381 & 5743 & 8828 \end{bmatrix}.$$

The theoretical optimum point is $\boldsymbol{x}^* = [0.114, 0.556, 0.852]^T$; the optimum value is $-3.86$.
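Eq. (3-5) is the Hartmann 3-D benchmark, which can be coded directly. One caveat: the standard benchmark's last entry of $\boldsymbol{P}$ is 8828; the extracted text of the thesis reads 8808, which is inconsistent with the quoted values $f_0 = -0.628$ at the center and the optimum of $-3.86$, so 8828 is assumed below.

```python
import numpy as np

# Hartmann 3-D test function of Eq. (3-5). P[3, 2] = 0.8828 is assumed
# (standard benchmark value, consistent with f0 = -0.628 at the center
# and the quoted optimum of -3.86; the extracted text shows 8808).
ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[3.0, 10.0, 30.0],
              [0.1, 10.0, 35.0],
              [3.0, 10.0, 30.0],
              [0.1, 10.0, 35.0]])
P = 1e-4 * np.array([[3689, 1170, 2673],
                     [4699, 4387, 7470],
                     [1091, 8732, 5547],
                     [ 381, 5743, 8828]])

def hartmann3(x):
    """Evaluate Eq. (3-5) at a 3-D point x in [0, 1]^3."""
    x = np.asarray(x, float)
    inner = np.sum(A * (x - P) ** 2, axis=1)    # one exponent per summand i
    return -np.sum(ALPHA * np.exp(-inner))
```

Evaluating at the cut center $[0.5, 0.5, 0.5]^T$ reproduces $f_0 \approx -0.628$ as used in the example below.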
The center of the design space, 𝒙0 = [0.5,0.5,0.5]𝑇 is selected as the initial cut center,
and the function value at this cut center is evaluated as 𝑓0 = −0.628. Then, a first-order
RBF-HDMR model is constructed based on this cut center as shown in Eq. (3-6).
𝑓(𝒙) = 𝑓0 + 𝑓1(𝑥1) + 𝑓2(𝑥2) + 𝑓3(𝑥3) (3-6)
The Genetic Algorithm (GA) from MATLAB is employed to optimize the first-order RBF-HDMR, and the optimum point $\boldsymbol{x}^* = [0.132, 0.816, 0.774]^T$ is found with a function value of $-2.143$. The new cut center $\boldsymbol{x}_0^{new}$ moves to this optimum point. Then, sensitivity analysis is performed based on the first-order RBF-HDMR model. The normalized sensitivity indices of the variables are 0.590, 0.289, and 0.121, respectively, for $x_1$, $x_2$, and $x_3$. The sensitivity indices are sorted in descending order to obtain the sensitivity set $\boldsymbol{S} = [0.590, 0.289, 0.121]$. The probability density set is then obtained as $\boldsymbol{G} = [0.590, 0.879, 1.000]$. After applying the speed control factor $r = 2$, the transformed probability density set $\hat{\boldsymbol{G}}$ is $[0.768, 0.938, 1.000]$. Roulette wheel selection is performed to determine which variable is modeled in the first iteration; here $x_1$ is selected. Thus, a partial RBF-HDMR, which only contains the zeroth-order component function (i.e., $f_0$) and the first-order component function of $x_1$ (i.e., $f_1(x_1)$), will be constructed as follows:
𝑓(𝒙) = 𝑓0 + 𝑓1(𝑥1) (3-7)
One-dimensional optimization is performed on the partial RBF-HDMR model, and the optimum point $x_1^* = 0.131$ is found. The cut center moves along $x_1$ to the new point $\boldsymbol{x}_0^{new} = [0.131, 0.816, 0.774]^T$. The function value at the new cut center is $-2.143$.
Next, excluding the first variable, the new sensitivity set is $\boldsymbol{S}_2 = [0.705, 0.295]$ and the transformed probability density set $\hat{\boldsymbol{G}}$ is $[0.840, 1]$. After a roulette wheel selection process, variable $x_2$ is selected as the next variable to model. Therefore, a partial RBF-HDMR (as shown in Eq. (3-8)) with the zeroth-order component function, first-order component functions, and the second-order component of $(x_1, x_2)$ is constructed based on the new cut center.
𝑓(𝒙) = 𝑓0 + 𝑓1(𝑥1) + 𝑓2(𝑥2) + 𝑓1,2(𝑥1, 𝑥2) (3-8)
Figure 3-2 shows the samples used to construct the partial model given in Eq. (3-8). The cut center of the new partial model is moved to the current optimal point (star in Figure 3-2). New samples need to be generated to construct the component functions in Eq. (3-8). The samples on the $x_1$ axis (triangles in Figure 3-2) are evaluated through the previous partial model, i.e., Eq. (3-7). The samples on the $x_2$ axis and in the $x_1$-$x_2$ plane (circles in Figure 3-2) are calculated through the real function. Therefore, the response values of 5+8=13 samples are evaluated from the real function when constructing the partial model in Eq. (3-8).
[Plot legend: star = new cut center; triangles = samples estimated from the previous model; circles = samples evaluated with the real function]
Figure 3-2: Samples used to construct Eq. (3-8).
At this point, the two-dimensional partial HDMR model is optimized and the optimum point $\boldsymbol{x}_{12}^* = [0.134, 0.575]^T$ is obtained. Replacing the first and second values of the cut center generates the optimum point of this iteration, $\boldsymbol{x}_0^{new} = [0.134, 0.575, 0.774]^T$, and the function value at this optimum point is $-3.3654$, which is much closer to the theoretical optimum. Finally, only $x_3$ is left, and the RBF-HDMR with one zeroth-order, three first-order, and three second-order components is built. The optimum point $\boldsymbol{x}^* = [0.132, 0.568, 0.863]^T$ with optimum value $-3.847$ is found. In this example, 3×5=15 points are used to construct the initial first-order RBF-HDMR model; five new points are used to construct the partial HDMR model shown in Eq. (3-7), and 13 points are used to construct the model of Eq. (3-8). Adding the initial cut center and the two optimum points obtained in the two iterations, in total 36 new points are involved to finish the second iteration. On the other hand, 1+3×5+3×8+1=41 sample points are needed to construct and optimize a complete second-order RBF-HDMR model. Using GA to optimize the complete RBF-HDMR with the same initial cut center, the optimum value is $-3.013$. Hence, the PMO method can find a better (smaller) optimum with higher efficiency than optimizing a complete RBF-HDMR model.
3.3. Properties of PMO
There are two key strategies in PMO. The first is that a partial HDMR model is used in optimization. Since not all of the variables are involved in the partial HDMR model, the number of sample points used to construct the HDMR model is much smaller than for building the complete model. For instance, for a 10-dimensional problem, assuming that five points are used to construct each first-order component function and eight points are needed to construct each second-order component function, constructing a full second-order HDMR needs 1+10×5+45×8=411 sample points, where 45 is the number of second-order component functions. On the other hand, assuming a second-order partial HDMR is built with five iterations (i.e., five variables are involved in the partial HDMR, with 10 possible second-order component functions), one only needs to generate 5×5+10×8=105 expensive sample points during the iterations; adding the initial 1+10×5=51 samples for the first-order model at the start of PMO, the total number of sample points is only 156, about one-third of the cost of the complete-model approach.
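The sample-count bookkeeping above generalizes directly. The two helper functions below (hypothetical names, assuming the same per-component sample sizes) reproduce the 411 versus 156 comparison for any dimension n and number of selected variables k.

```python
from math import comb

def full_hdmr_cost(n, m1=5, m2=8):
    """Points for a full second-order HDMR: cut center, n first-order
    components with m1 points each, and C(n, 2) second-order components
    with m2 points each."""
    return 1 + n * m1 + comb(n, 2) * m2

def pmo_cost(n, k, m1=5, m2=8):
    """Expensive points for PMO: the initial first-order model (1 + n*m1)
    plus k selected variables (k*m1 first-order points and up to C(k, 2)
    second-order components with m2 points each)."""
    return (1 + n * m1) + k * m1 + comb(k, 2) * m2
```

For n = 10 and k = 5 this yields 411 and 156 points respectively, matching the text; the gap widens quickly with n because the full model's cost grows with C(n, 2) while PMO's growth is capped by k.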
Another important strategy used in PMO is the moving cut center. In the PMO process, the cut center is moved at every iteration to the current optimum point, and a new partial RBF-HDMR is constructed based on the new cut center. That means PMO does not focus on the global accuracy of the HDMR model but pays more attention to the accuracy around the interesting area (i.e., the area around the current optimum point). With the moving cut center, an HDMR model will be built in a more interesting area at every iteration. Moreover, when a new variable is selected to be added to the partial HDMR, one can use the former partial HDMR model to predict the values at the new samples, rather than invoking the actual expensive function. Although there is a risk in using the former partial HDMR due to the moving cut center and the inaccuracy of the models themselves, such a risk is mitigated by the PMO process, as evidenced by the test results in the next section. It is easy to see that no matter how many iterations PMO takes, the total number of function calls in PMO equals the number of sample points used to construct the final partial HDMR, plus those used for constructing the first-order HDMR model at the beginning.
In addition, sensitivity analysis is employed in the PMO process to help select the most important variables to optimize, rather than selecting variables randomly. The roulette wheel selection process helps to balance the exploration and exploitation phases and to avoid being trapped in a local optimum.
3.4. Testing of PMO
A number of numerical benchmark functions are selected to test the performance of the PMO algorithm. In this test, the proposed PMO algorithm is directly compared to the approach of optimizing a complete RBF-HDMR. An RBF-HDMR model is deemed "complete" if the modeling process is terminated according to the modeling process described in Chapter 2. In other words, "complete" means the construction process of the RBF-HDMR is completed. In RBF-HDMR construction, before constructing the second-order component functions, the accuracy of the first-order RBF-HDMR is checked. If the first-order RBF-HDMR is accurate enough, no second-order component functions are built and the construction process is terminated. Additionally, before constructing each second-order component function, whether the two variables $x_i$ and $x_j$ are correlated is checked. If $x_i$ and $x_j$ are not correlated, the component function $f_{ij}(x_i, x_j)$ will not be built in the RBF-HDMR model. Hence, in the complete RBF-HDMR model, not all of the component functions are constructed in some cases. A full RBF-HDMR, however, indicates that all first-order and second-order component functions have been constructed. A complete RBF-HDMR may have skipped modeling some of the second-order component functions, and thus costs less than a full RBF-HDMR.
The SUR-T1-14 [144], Rosenbrock [145], Trid [145], F16 [143], Griewank [145], Ackley [145], Rastrigin [145], SUR-T1-16 [143], Powell [143], and Perm [145] problems are chosen as the benchmark problems, which are listed in the Appendix. In the test, the maximum numbers of points used to construct a first-order and a second-order component function in both the PMO algorithm and RBF-HDMR are set to six and eight, respectively. The maximum number of iterations of PMO is set to five for the test problems. Additionally, GA from the MATLAB global optimization toolbox is employed as the optimizer in PMO and RBF-HDMR optimization, and the settings of GA are left at their defaults. Each problem is run 30 times independently. The initial cut centers of both methods are randomly chosen in the 30 runs. The average of the found optimum values ($f^*$) and the number of function evaluations (NFE) are recorded to illustrate the effectiveness and efficiency of PMO. The results are summarized in Table 3-1. Note that the NFE values in the table are averages over the 30 runs, hence the decimal values. The box-plots of $f^*$ are shown in Figure 3-3.
Table 3-1: Optimization results with numerical benchmark problems.

Problem      dim  Actual optimum   PMO f*    PMO NFE   Complete RBF-HDMR f*   NFE
SUR-T1-14     10        0           21.38     156.6          74.33           161.3
Rosenbrock    10        0          107.08     153.5         187.61           194.5
Trid          10     -210          151.7      150.4         618.11           161.7
F16           16       25.88        25.93     151.0          26.76           397.2
Griewank      20        0            3.19     167.0           6.10           194.2
Ackley        20        0           10.31     236.6          21.07          1547.1
Rastrigin     20        0          196.85     158.0         234.27           111.2
SUR-T1-16     20        0          837.61     231.0        2060.3            430.1
Powell        20        0          596.96     215.4        7222.8            434.6
Perm          20        0          5.69e51    238.0         2.45e52         1625.0
As shown in Table 3-1, for all ten problems, the proposed method obtained a smaller optimum value than directly optimizing the complete RBF-HDMR model. Figure 3-3 gives the box-plots of the optimum values for each problem. It can be seen that for almost all the problems, the ranges of $f^*$ in the 30 runs of PMO are smaller than the ranges of optimizing a complete RBF-HDMR, except for Rosenbrock and Ackley. This means PMO is more robust in optimization. From the perspective of cost (NFE), PMO clearly costs less than RBF-HDMR except for Rastrigin. This advantage is more distinct for higher-dimensional problems. For twenty-dimensional functions such as Ackley and Perm, due to the structure of the function, the cost to construct a second-order metamodel becomes high because there are more second-order pairs. For PMO, because the maximum number of iterations is fixed at five, the maximum number of second-order component functions to be constructed is 10, which is much smaller than for building a full 20-dimensional HDMR function (i.e., 190). For Rastrigin, since the problem is decomposable, when using the process described in Chapter 2 to construct the RBF-HDMR, the metamodeling process is terminated after construction of all first-order component functions, so the number of sample points used to construct the complete RBF-HDMR is very small. However, due to the lower accuracy of the partial RBF-HDMR, some second-order components are constructed in PMO. Hence, in the case where the problem is decomposable, the advantage of PMO is not revealed.
[Box-plots of $f^*$ for PMO vs. RBF-HDMR on each problem: (a) SUR-T1-14, (b) Rosenbrock, (c) Trid, (d) F16, (e) Griewank, (f) Ackley, (g) Rastrigin, (h) SUR-T1-16, (i) Powell, (j) Perm]
Figure 3-3: Box-plots of optimized values.
Next, three benchmark functions, SUR-T1-14, Griewank, and Ackley, each in three different dimensions (10, 20, and 30), are chosen in order to examine how the performance of PMO changes as the problem dimensionality increases. 30 optimization runs are performed for each problem in each dimension. The results are shown in Table 3-2.
Table 3-2: Optimized results with benchmark functions in different dimensions.

Problem      dim  Actual optimum   PMO f*    PMO NFE   Complete RBF-HDMR f*   NFE
SUR-T1-14     10        0           21.38     156.6          74.33           161.3
              20        0          214.00     171.5        1117.5            434.1
              30        0          863.29     221.7        3971.4            836.3
Griewank      10        0            1.19      93.5           1.28           138.2
              20        0            3.19     167.0           6.10           194.2
              30        0           28.10     204.2          37.03           223.9
Ackley        10        0            5.55     165.9          20.55           407.4
              20        0           10.31     236.6          21.07          1547.1
              30        0           11.24     264.9          21.37          3387.0
As shown in Table 3-2, as the problem dimensionality increases, the advantage of the PMO method over RBF-HDMR becomes larger. For the SUR-T1-14 problem in 10 dimensions, the average $f^*$ of PMO is 21.38, which is 29% of the RBF-HDMR result. When the dimension is increased to 30, the optimum result of PMO drops to 21% of the RBF-HDMR result. The advantage of PMO on higher-dimensional problems becomes even clearer in terms of NFE. As mentioned before, most samples are used to construct the second-order component functions in high-dimensional problems. For the Ackley function, because every variable pair is strongly correlated, the number of second-order component functions becomes very large as the dimension increases. Thus, the number of sample points needed to build a complete RBF-HDMR increases significantly. For SUR-T1-14 and Griewank, the variables have mixed weak and strong correlations. Hence, the PMO savings in terms of NFE are milder for SUR-T1-14 and Griewank than for Ackley as the dimensionality rises.
The 10-dimensional SUR-T1-14 problem is chosen to show which dimensions are selected through the roulette wheel selection method at each iteration. The SUR-T1-14 problem is optimized five times with PMO and the data are listed in Table 3-3. "Index" is the sensitivity value of each dimension and "No. of iterations" indicates at which iteration the variable is selected.
Table 3-3: Dimensions selected in PMO on SUR-T1-14 for five independent runs.

Run 1 (optimum 16.87)
  Variable          x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
  Index             0.116  0.114  0.108  0.107  0.102  0.099  0.091  0.090  0.088  0.083
  Rank              1      2      3      4      5      6      7      8      9      10
  No. of iterations 1      3      5      4      2      -      -      -      -      -

Run 2 (optimum 15.44)
  Index             0.117  0.112  0.107  0.107  0.102  0.096  0.096  0.093  0.086  0.083
  Rank              1      2      3      4      5      6      7      8      9      10
  No. of iterations 2      1      3      4      5      -      -      -      -      -

Run 3 (optimum 14.92)
  Index             0.112  0.114  0.112  0.107  0.100  0.098  0.095  0.091  0.088  0.081
  Rank              2      1      3      4      5      6      7      8      9      10
  No. of iterations 3      1      2      5      4      -      -      -      -      -

Run 4 (optimum 19.43)
  Index             0.112  0.117  0.107  0.107  0.102  0.101  0.095  0.087  0.086  0.082
  Rank              2      1      3      4      5      6      7      8      9      10
  No. of iterations 4      3      2      1      5      -      -      -      -      -

Run 5 (optimum 22.89)
  Index             0.116  0.114  0.108  0.105  0.102  0.103  0.095  0.091  0.088  0.081
  Rank              1      2      3      4      6      5      7      8      9      10
  No. of iterations 3      2      5      1      4      -      -      -      -      -
Figure 3-4: Convergence plot of PMO in SUR-T1-14 problem.
As shown in Table 3-3, dimensions $x_1$ to $x_5$, which have larger sensitivity values, are more likely to be picked in the five runs. The selection, however, does show its stochastic nature. Different selection schemes lead to different optimum solutions with slight variations for the test problem.

Figure 3-4 illustrates the current optimal value obtained from PMO over seven iterations on the SUR-T1-14 problem. As shown in Figure 3-4, from the third to the fifth iteration, the optimization results do not improve. As the number of variables involved in the RBF-HDMR model increases, the accuracy of the HDMR model decreases, which may influence the optimization result.
3.5. Trust Region based PMO
The performance of PMO can be further improved by applying different strategies when optimizing each partial model. A trust region is often used as a strategy to guide the optimization method toward the optimum and to balance the exploration and exploitation phases. In this section, a simple trust region strategy is added when optimizing the partial model at each iteration to generate a higher-performance version of PMO.
The trust region strategy follows the description in reference [146]. The approximation accuracy ratio $r_{a,t}$ at the $t$-th iteration can be calculated via the following equation,

$$r_{a,t} = \frac{f(\boldsymbol{x}_{0,t}) - f(\boldsymbol{x}_t^*)}{\hat{f}(\boldsymbol{x}_{0,t}) - \hat{f}(\boldsymbol{x}_t^*)} \qquad (3\text{-}9)$$

where $\boldsymbol{x}_{0,t}$ is the center of the design space, $\boldsymbol{x}_t^*$ is the optimal point, $f$ denotes the true function, and $\hat{f}(\boldsymbol{x}_{0,t})$ and $\hat{f}(\boldsymbol{x}_t^*)$ are the responses of the approximate model at $\boldsymbol{x}_{0,t}$ and $\boldsymbol{x}_t^*$, respectively. $r_{a,t}$ gives the accuracy of the current metamodel, and its value determines the shrinkage or enlargement of the design space. The new size ($L_{t+1}$) of the trust region is defined as follows.

$$L_{t+1} = \begin{cases} \max(c_0 L_t, L_{min}) & r_{a,t} < 0 \\ \max(c_1 L_t, L_{min}) & 0 \le r_{a,t} < \tau_1 \\ L_t & \tau_1 \le r_{a,t} < \tau_2 \\ \min(c_2 L_t, L_{max}) & r_{a,t} \ge \tau_2 \end{cases} \qquad (3\text{-}10)$$
where $\tau_1$ and $\tau_2$ are two positive constants used to judge the accuracy of the metamodel, with $\tau_1 < \tau_2 < 1$; $c_0$, $c_1$, and $c_2$ are positive constant ratios used to shrink or enlarge the trust region, with $c_0 \le c_1 < 1 < c_2$; $L_t$ is the size of the current trust region; and $L_{min}$ and $L_{max}$ are the minimal and maximal sizes of the trust region. In this thesis, the parameter values are set as $\tau_1 = 0.25$, $\tau_2 = 0.75$, $c_0 = 0.25$, $c_1 = 0.5$, and $c_2 = 2$. $L_{min}$ is defined as $0.01 L_{max}$, and $L_{max}$ is set to the size of the original design space.
For the center of the trust region, if $r_{a,t} < 0$, the objective function value at the current optimum ($\boldsymbol{x}_t^*$) is worse than the value at the center ($\boldsymbol{x}_{0,t}$). Thus, the center does not move, i.e., $\boldsymbol{x}_{0,t+1} = \boldsymbol{x}_{0,t}$. Otherwise, the center moves to the current optimum, i.e., $\boldsymbol{x}_{0,t+1} = \boldsymbol{x}_t^*$.
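The update rule of Eq. (3-10), together with the center-move rule, can be written as a small helper. This is a minimal sketch with a hypothetical function name; the default parameters are the values stated above.

```python
def trust_region_update(ratio, L, L_min, L_max,
                        tau1=0.25, tau2=0.75, c0=0.25, c1=0.5, c2=2.0):
    """Apply Eq. (3-10) to the trust-region size L given the accuracy ratio.

    Returns (new_L, move_center): move_center is False only when ratio < 0,
    i.e., the metamodel's predicted optimum is actually worse than the center.
    """
    if ratio < 0:
        return max(c0 * L, L_min), False   # bad model: shrink hard, keep center
    if ratio < tau1:
        return max(c1 * L, L_min), True    # poor model: shrink, move center
    if ratio < tau2:
        return L, True                     # acceptable model: keep size
    return min(c2 * L, L_max), True        # good model: enlarge
```

Note that enlargement is capped at $L_{max}$ (the original design space) and shrinkage at $L_{min} = 0.01 L_{max}$, so the search can recover from an overly aggressive contraction.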
The flowchart of the trust region based PMO (TR-PMO) algorithm is shown in Figure 3-5; it is similar to the flowchart of PMO in Figure 3-1, with the insertion of the trust region box in the flow. In this algorithm, the trust region strategy is used to find a better solution for each partial metamodel. Assume that at the $d$-th iteration, coordinates $k_1$, $k_2$, ..., and $k_d$ are selected to construct the partial RBF-HDMR model, and the current cut center and optimal solution are $\boldsymbol{x}_0$ and $\boldsymbol{x}^*$, respectively. The steps related to the trust region are introduced as follows.
Step A. After optimizing on a partial RBF-HDMR, check whether the maximal number of trust region iterations has been reached. If so, the trust region loop terminates and the process goes to Step 6, as described in Section 3.1, to select a new coordinate; otherwise, continue to Step B.

Step B. Calculate the approximation accuracy ratio $r_{a,t}$ via Eq. (3-9). The partial HDMR model of the $d$ variables is used to calculate the approximate responses at $\boldsymbol{x}_0$ and $\boldsymbol{x}^*$.
Step C. Determine the new trust region. If $r_{a,t} < 0$, the cut center remains; otherwise, the cut center moves to the current optimal point. Then, Eq. (3-10) is employed to shrink or enlarge the trust region. Note that only the upper and lower bounds of the selected variables are modified; the ranges of the other variables remain unchanged.

Step D. Generate a certain number of random sample points in the trust region (e.g., five). These new samples are used to update the partial model. If the cut center does not move, the sample points used to construct the previous partial model can be inherited. If the cut center moves, only the new samples in the updated trust region are used to build a metamodel for optimization. Then go back to Step A.
[Flowchart: as in Figure 3-1 (construct first-order RBF-HDMR, optimize it, sensitivity analysis, roulette selection of one coordinate, construct partial RBF-HDMR, optimize), with an added trust region loop that repeats the partial-model optimization until the maximal trust region iteration count is reached, before checking the stopping criteria and outputting the result]
Figure 3-5: Flowchart of TR-PMO.
To benchmark the performance of TR-PMO, two effective optimization strategies developed for HEB problems are chosen for comparison, i.e., the Trust Region based Mode Pursuing Sampling method (TRMPS) [7] and Optimization on Metamodeling-supported Iterative Decomposition (OMID) [8]. PMO also participates in the comparison.

The same ten numerical problems used in Section 3.4 are used to perform the comparison, and each problem is repeated 10 times. The numbers of points used to construct the first- and second-order component functions are five and eight, respectively. For PMO, the number of sample points used to construct the first- and second-order components is increased to make a fair comparison with the other methods at similar NFE. Note that PMO cannot terminate at an exact NFE, so its NFE is controlled to be as close as possible to the NFE used in TR-PMO, and the average NFE of PMO is also listed in Table 3-6. Additionally, the number of selected variables is set to four for both PMO and TR-PMO. The trust region settings of TR-PMO were introduced earlier in this section. The parameters of TRMPS and OMID are set as in Refs. [7] and [8], as shown in Table 3-4 and Table 3-5. The maximal number of function evaluations of TRMPS and OMID is set to the average number of function calls used by TR-PMO on each benchmark. The results are shown in Table 3-6.
Table 3-4: TRMPS parameter settings.

R_min   R_max   Stall iterations   k_reduction   R_s,initial   R_B,initial
0.01    1       5                  0.7           0.25          1

Table 3-5: OMID parameter settings.

N_Init     n_comp   n_basis   N_as
10 × dim   2        2         5 × dim
In Table 3-6, the NFE data in the fourth column are the numbers of function evaluations used by TR-PMO, TRMPS, and OMID, while the data in the eighth column are the numbers of function evaluations used by PMO. The NFE values of PMO cannot be used to compare the efficiency of PMO with that of TR-PMO, but they show that TR-PMO and PMO are compared at a similar NFE level. As shown in Table 3-6, at similar NFE, TR-PMO outperforms PMO on all ten benchmark functions. In PMO, the partial HDMR model is a static metamodel at each iteration, while the trust region strategy can actively add points to find better results for each partial-model optimization. The NFE cost of TR-PMO is, however, in general higher than that of PMO.
Table 3-6: Optimization results of using TR-PMO, TRMPS, OMID, and PMO.

Problem      dim  Actual    NFE    TR-PMO f*   TRMPS f*   OMID f*    PMO NFE   PMO f*
                  optimum
SUR-T1-14     10      0     276      19.64      20.11      91.23      279       20.01
Rosenbrock    10      0     157      63.55     273.23    2587.4       150      108.98
Trid          10   -210     165     100.67     331.7     3227.0       154      427.38
F16           16     25.88  276      26.98      25.92      30.97      233       27.08
Griewank      20      0     160       2.11       5.67      39.64      169       15.72
Ackley        20      0     287       9.68      15.83      17.38      280       12.29
Rastrigin     20      0     189     159.9      124.39     214.02      215      186.31
SUR-T1-16     20      0     255     935.87      86.45    2178.6       275     1079.0
Powell        20      0     288     157.05      71.93    2827.1       281      661.00
Perm          20      0     296    1.92e51    2.36e49    1.39e49      298     5.53e52
It can also be found that for the SUR-T1-14, Rosenbrock, Trid, Griewank, and
Ackley functions, TR-PMO obtained better results than TRMPS, while for the other
problems the results of TR-PMO are worse. On the other hand, compared with OMID,
TR-PMO performs better on almost all benchmark problems except the Perm function.
TRMPS and OMID are two effective optimization strategies for high-dimensional
problems but often need many more function calls to reach a good optimal solution.
When the allowed number of function calls is limited to a few hundred, the TR-PMO
method has comparable or better performance than TRMPS and OMID. This is because
for TRMPS and OMID, samples are generated in the entire design space. To
adequately cover a high-dimensional space, a comparatively larger number of samples
is needed for both methods. For TR-PMO, in 10-dimensional problems, selecting four
variables seems to be enough to obtain acceptable results with scarce samples.
However, in 20-dimensional problems, four variables are only 1/5 of the total variables,
which limits the optimization performance of TR-PMO. Also, the range of objective
function values in the SUR-T1-16, Powell, and Perm problems is very large, and a small
change in the design variables causes significant changes in function values. This is
likely the reason that TR-PMO did not perform as well as TRMPS for these cases.
In summary, when the number of samples is limited, the advantages of using a partial
metamodel emerge and TR-PMO shows better or comparable results as other methods.
3.6. Application to Airfoil Design
After testing with benchmark functions, both PMO and TR-PMO are applied to an airfoil
design problem as shown in Figure 3-6. The symbol 𝛼 is the attack angle and 𝑉∞ is the
flow velocity. Class function/shape function airfoil transformation representation tool
(CST) [147], as shown in Eq. (3-11), is used to model the geometry of the airfoil.
𝜉𝑈(𝜓) = 𝜓^0.5 (1 − 𝜓)^1.0 Σ_{𝑖=0}^{5} 𝐴𝑢𝑖 [5! / (𝑖! (5 − 𝑖)!)] 𝜓^𝑖 (1 − 𝜓)^(5−𝑖) + 𝜓∆𝜉𝑈
𝜉𝐿(𝜓) = 𝜓^0.5 (1 − 𝜓)^1.0 Σ_{𝑖=0}^{5} 𝐴𝑙𝑖 [5! / (𝑖! (5 − 𝑖)!)] 𝜓^𝑖 (1 − 𝜓)^(5−𝑖) + 𝜓∆𝜉𝐿
(3-11)
where 𝜉𝑈 and 𝜉𝐿 are the geometry functions of the upper and lower surfaces of the
airfoil, respectively; 𝜓 is the non-dimensional horizontal coordinate; ∆𝜉𝑈 and ∆𝜉𝐿 are the
thickness ratios of the trailing edge of upper and lower surfaces, which can be
represented by the distance between the upper (or lower) surface and the x-axis at
trailing edge; 𝐴𝑢 and 𝐴𝑙 are the coefficients of the shape function. In this example, the
airfoil is a closed curve, so the trailing edge thicknesses of upper and lower surfaces are
zero, i.e., ∆𝜉𝑈 = 0 and ∆𝜉𝐿 = 0. In this parametric function, six upper surface coefficients
and six lower surface coefficients are selected as the design variables. The NACA0012
airfoil is selected as the baseline airfoil in this design problem, and the coefficients of
NACA0012 are shown in Table 3-7. The upper and lower boundaries of the design
variables are 130% and 70% of the baseline.
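The CST parametrization of Eq. (3-11) is straightforward to implement. The following is a minimal Python sketch (not from the thesis) that uses the NACA0012 coefficients of Table 3-7 and assumes a closed trailing edge (∆𝜉𝑈 = ∆𝜉𝐿 = 0), as stated in the text:

```python
from math import comb

# NACA0012 baseline coefficients from Table 3-7.
AU = [0.1703, 0.1602, 0.1436, 0.1664, 0.1105, 0.1794]
AL = [-0.1703, -0.1602, -0.1436, -0.1664, -0.1105, -0.1794]

def cst_surface(psi, coeffs, d_xi=0.0):
    """Surface ordinate at non-dimensional abscissa psi (Eq. (3-11))."""
    class_fn = psi ** 0.5 * (1.0 - psi) ** 1.0        # class function
    shape = sum(                                       # Bernstein shape function
        a * comb(5, i) * psi ** i * (1.0 - psi) ** (5 - i)
        for i, a in enumerate(coeffs)
    )
    return class_fn * shape + psi * d_xi

# With a closed trailing edge the surfaces vanish at both ends.
print(cst_surface(0.0, AU), cst_surface(1.0, AU))  # 0.0 0.0
```

Varying the twelve coefficients within ±30% of these baseline values, as in the design problem, changes the shape function while the class function keeps the round-nose, sharp-trailing-edge airfoil character.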
[Figure: schematic of the airfoil with incoming flow velocity 𝑉∞ at attack angle 𝛼, the x-y coordinate system with origin O, and trailing-edge thicknesses ∆𝜉𝑈 and ∆𝜉𝐿]
Figure 3-6: Airfoil design problem.
Table 3-7: Parameters of NACA0012.
Parameter | 𝐴𝑢0 | 𝐴𝑢1 | 𝐴𝑢2 | 𝐴𝑢3 | 𝐴𝑢4 | 𝐴𝑢5
Initial value | 0.1703 | 0.1602 | 0.1436 | 0.1664 | 0.1105 | 0.1794
Parameter | 𝐴𝑙0 | 𝐴𝑙1 | 𝐴𝑙2 | 𝐴𝑙3 | 𝐴𝑙4 | 𝐴𝑙5
Initial value | -0.1703 | -0.1602 | -0.1436 | -0.1664 | -0.1105 | -0.1794
The objective of the airfoil design problem is to maximize the lift-to-drag ratio (L/D). The
constraint of the problem is that the maximum thickness (𝑡𝑚𝑎𝑥) of the new airfoil is not
less than the baseline value (𝑡𝑚𝑎𝑥𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒). Thus, the optimization model is as shown in Eq.
(3-12).
min −𝐿/𝐷
s.t. 𝑡𝑚𝑎𝑥^𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒 − 𝑡𝑚𝑎𝑥 ≤ 0
     0.7𝑥𝑖^𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒 ≤ 𝑥𝑖 ≤ 1.3𝑥𝑖^𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒, 𝑖 = 1, 2, …, 12
(3-12)
Software XFOIL [148] is employed to calculate the value of L/D. In this test, the Mach
number of the flow is 0.5, and the Reynolds number is 5,000,000. The results obtained
by optimizing the RBF-HDMR model directly are also listed for comparison. Thirty independent
runs are carried out for each method. The settings of all methods are the same as in the
numerical tests. The average optimization results over 30 runs are shown in Table 3-8. It
should be noted that the constraint is considered as a cheap constraint.
Table 3-8: Optimization results with airfoil design problem.
Problem | dim | TR-PMO 𝑓∗ | TR-PMO NFE | PMO 𝑓∗ | PMO NFE | Complete RBF-HDMR 𝑓∗ | Complete RBF-HDMR NFE
Airfoil design | 12 | -117.94 | 295.1 | -106.87 | 175.5 | -51.06 | 584.5
Figure 3-7: Optimization results on the airfoil design problem: (a) 𝑓∗ of airfoil design; (b) NFE of airfoil design.
As shown in Table 3-8, the average NFE of PMO is only 30% of that used to construct a
complete RBF-HDMR model, but its optimum is more than twice as good as the
optimum value obtained from optimizing the RBF-HDMR model. TR-PMO achieves a better
optimum than PMO with more NFEs, which is still 50% of the cost of the RBF-HDMR
approach. Figure 3-7 illustrates the box-plots of 𝑓∗ and NFE of the three methods. It can
be found that PMO and TR-PMO have similar robustness, which is better than
optimizing RBF-HDMR. The variations of NFEs for the three approaches are very
similar.
3.7. Summary
This chapter proposed a Partial Metamodel-based Optimization (PMO) algorithm to deal
with High-dimensional, Expensive, and Black-box (HEB) problems. Instead of building
the complete RBF-HDMR model, a series of partial RBF-HDMR models are constructed
to reduce the number of function evaluations in high-dimensional optimization problems.
To balance the exploration and exploitation phases of the method, a roulette wheel
selection process is employed to select variables to construct the partial HDMR model,
according to the sensitivity index values of all variables. The cut center of the partial
HDMR model at each iteration moves to the newly found optimum point to achieve
higher optimization performance. The HDMR model in previous iterations is used to
predict the function values used in constructing a new partial RBF-HDMR model. The
proposed method is compared with optimizing a complete RBF-HDMR using ten
numerical benchmark functions. PMO obtained better optimum solutions than optimizing
a complete RBF-HDMR, using fewer function calls in almost all the problems. A trust
region strategy is combined with PMO to improve the performance of PMO, and thus the
trust region based PMO method (TR-PMO) is developed. When the sample points are
scarce, TR-PMO method shows comparable or better performance than both TRMPS
and OMID. The proposed approaches are successfully applied to an airfoil design
problem. Note that TR-PMO provides a method to improve the performance of PMO.
Other space reduction-based methods that improve the searching ability for partial
metamodel optimization can also be employed to modify the PMO method. In the next
chapter, causal relations will be employed to help with dimension reduction.
Chapter 4. Dimension reduction method employing causal relations
Many dimension reduction methods have been developed to reduce the dimensionality
of large-scale problems. Those strategies usually consider design problems as black-
box functions. However, practitioners usually have certain knowledge of their problem. In
this chapter, a method leveraging causal graph and qualitative analysis is developed to
reduce the dimensionality of the problem by systematically modeling and incorporating
the knowledge about the design problem into optimization. Causal graph is created to
show the input-output relationships between variables. A qualitative analysis algorithm
using design structure matrix (DSM) is developed to automatically find the variables
whose values can be determined without resorting to optimization. According to the
impact of the variables, the problem is divided into two sub-problems: one optimization
problem with respect to the most important variables, and the other with respect to
variables of lower importance.
4.1. Dimension reduction method description
A causal relationship assisted dimension reduction method is developed in this section.
By building a causal graph, input-output relations between variables in the numerical
model are illustrated. Dimensional analysis combined with qualitative analysis provides
a method to detect sources of contradictions, which supports reducing the dimensionality
before performing optimization. The DSM, constructed according to the causal graph, is
employed to automatically find the variables leading to no contradiction. Calculation of
the impact of variables helps to divide the optimization problem into sub-problems. The
following sub-sections describe the proposed method in detail.
4.1.1. Overall process
The overall process of our proposed dimension reduction method includes constructing
causal graph, performing qualitative analysis, removing variables, calculating the weight
of each link, simplifying causal graph and performing two-stage optimization. The steps
of the method are described as follows.
Step 1. Construct a causal graph based on cause-effect relationship. A causal graph is
an oriented graph showing the causal relations between variables. Figure 4-1 is an
example of a causal graph. In the graph, the nodes represent variables, the arrows give
the input-output relations and labels “+1” and “-1” represent how the input influences the
output. For example, the “+1” on the arrow from A to C means that C increases when A
increases. It should be noted that the input and output in one link should have monotonic
relations. This can be achieved by defining design space carefully. Additionally, the more
elaborate the causal graph is, the simpler the causal relations will be in each link and the
easier it will be to achieve monotonic relations for each link. Also, values of the
variables in an engineering problem are usually larger than zero, which helps to avoid
non-monotonic links to some extent.
Reference [142] gives a process of constructing a causal graph. First, all the
fundamental variables are listed and located in a functional structure that represents the
functional flow of the system. Then, the causal rules are employed to define the causality
for each variable. Finally, all the variables are linked together to form the causal graph.
The causal graph should not miss important links and this requirement can be satisfied if
the designers are familiar with the design problem. Once the causal graph is
constructed, the links in the causal graph can be checked by giving a perturbation on
each design variable. If the causal graph can reflect the changes on each intermediate
variable and objective, the causal graph can be regarded as correct.
[Figure: causal graph with nodes A, B, C, D, E, and F; A influences F via C and via D, B influences F via D and via E, and each arrow is labeled “+1” or “-1”]
Figure 4-1: Causal graph example.
Step 2. Perform a qualitative analysis. The causal graph is used to detect variables with
or without contradictions. By multiplying the labels on the arrows of one route, the
relation between the input and the final output of the route can be detected. For the
example in Figure 4-1, if multiplying “+1” on the arrow from A to C by the “+1” on the
arrow from C to F, the multiplication result is “+1”, which means that F is monotonically
increasing with respect to A. If all the relations between design variables and the
objectives are calculated, contradictions of the variable can be found according to the
multiplications. In Figure 4-1, A influences F via C or D (i.e., -1 via D and +1 via C). A
thus generates a contradictory influence on F when both routes are considered.
On the other hand, F is a monotonically increasing function with respect to B no matter if
it traverses through D or E. Therefore, A has a contradiction and B is a variable without
contradictions. The vector of design variables is represented as 𝒙 . After qualitative
analysis, the design variables can be divided into two parts, variables with contradictions
( 𝒙𝑐 ) and variables without contradictions ( 𝒙𝑢𝑐 ). This qualitative analysis can be
performed by checking the causal graph manually. However, in optimization, all the
steps are desired to be executed automatically. Thus, a DSM based qualitative analysis
method is proposed to fulfill the requirement and will be described in Section 4.1.2 in
more detail.
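The route-sign multiplication of this step can be sketched as a small path enumeration over a signed graph. The signs of the A-to-C and A-to-D edges below follow the text; the remaining edges are assumed to be +1, consistent with the discussion of variable B:

```python
# Signed causal graph of the Figure 4-1 example (edge signs partly assumed).
edges = {
    "A": [("C", +1), ("D", -1)],
    "B": [("D", +1), ("E", +1)],
    "C": [("F", +1)],
    "D": [("F", +1)],
    "E": [("F", +1)],
    "F": [],
}

def path_signs(node, target, sign=1):
    """Sign product of every route from node to target (graph is acyclic)."""
    if node == target:
        return [sign]
    out = []
    for succ, s in edges[node]:
        out.extend(path_signs(succ, target, sign * s))
    return out

def has_contradiction(var, objective="F"):
    """A variable has a contradiction if its routes disagree in sign."""
    return len(set(path_signs(var, objective))) > 1

print(has_contradiction("A"), has_contradiction("B"))  # True False
```

A's two routes give +1 and -1, so it is contradictory, while both of B's routes give +1, matching the discussion above. The DSM-based method of Section 4.1.2 automates the same check without explicit path enumeration.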
Step 3. Determine values of 𝒙𝑢𝑐 and remove them from the causal graph and the set of
design variables. After qualitative analysis, the way design variables in 𝒙𝑢𝑐 influence the
objective can be confirmed. Thus, 𝒙𝑢𝑐 can be regarded as a constant variable set at its
lower (or upper) bounds. Taking Figure 4-1 as an example, B has no contradiction and
thus decrease of B leads to the decrease of the objective F. If a minimum F is desired, B
should be set at its lower bound value. Thus, the optimal value of B is determined before
optimization. Now the design variable set of the optimization problem becomes 𝒙𝑐 only.
Variables without contradictions can thus be removed from the causal graph and from
the optimization variable set.
Step 4. Calculate the weight of each link. The causal graph can be further simplified by
considering the weight of each link. In this step, the Taguchi method is used to calculate
the weight of each link [142]. Section 4.1.3 describes the approach in detail. Before
calculating the weights, the range of every variable, including design variables and state
variables is required. There exist two methods to determine those ranges. First, for
engineering problems, the recommended range of variables can be found from
references and it can be used in the sensitivity analysis. Second, the range can be
determined by sampling a certain number of random points and calculating the
responses of the samples. The maximal and minimal values can be used as the upper
and lower bound, respectively.
Step 5. Simplify the causal graph according to calculated weights. The link whose weight
is lower than a threshold is regarded as a low importance link and is removed from the
causal graph. The threshold can be selected according to the weights obtained from
Step 4. The main principle of threshold selection is that the threshold should be neither
so high that important links are missed, nor so low that the simplification is ineffective. A
higher threshold causes more variables to be regarded as unimportant, which may
increase the number of iterations in the two-stage optimization, and removing more
variables from the important variable set may reduce the accuracy of the optimization.
larger than 15%. In this thesis, 10% is selected based on the different case studies that I
have tested during the development of the approach. It provides a good balance
between number of iterations and accuracy of the optimization. The value can be
adjusted if needed.
After removing those deemed less important links, some variables may not affect the
objective at all. Those removed variables are represented by 𝒙𝑢𝑛. On the other hand,
contradictions of some variables (represented by 𝒙𝑢𝑛𝑐) may disappear due to removal of
the less important links. Then, in a similar way to Step 3, values of such variables can be
determined according to the qualitative analysis. Thus, the design variables 𝒙𝑐 (variables
with contradictions) can be divided into two parts, the kept variables with contradictions
𝒙𝑘𝑒, and less important variables 𝒙𝑟𝑒, which includes both 𝒙𝑢𝑛 and 𝒙𝑢𝑛𝑐.
Step 6. Use a two-stage optimization process to obtain the final optimal solution. The
original optimization problem is divided into two sub-problems: one with respect to 𝒙𝑘𝑒
and the other with respect to 𝒙𝑟𝑒.
Then the two optimization problems are optimized separately. Results of the two
optimization problems are combined together to form the final optimal solution. Details of
the two-stage optimization process are shown in Section 4.1.4.
4.1.2. Qualitative Analysis based on design structure matrix
The qualitative analysis process is designed to find the variables without contradictions
and it has to be executed automatically for optimization. Thus, a novel design structure
matrix (DSM)-based qualitative analysis method is developed in this section.
In the DSM-based qualitative analysis, two matrices, [A] and [A1] are built according to
the causal graph, where [A] shows the input-output relations between each pair of
variables, and [A1] additionally gives the direction (sign) of each link. For both [A] and [A1], the first
rows (columns) refer to design variables, the last row (column) is for the objective, and
the intermediate variables are in between. [A] and [A1] are n-by-n matrices, where n is
the number of entities including design variables, intermediate variables, and the
objective. For convenience and consistency with DACM nomenclature, I refer to these
entities as variables with the understanding that the global objective is located in the last
row (column) for both [A] and [A1]. Matrix [A] uses “1” to represent the links between two
variables. If 𝑖 is the input of 𝑗, then 𝑎𝑖𝑗 = 1; otherwise, 𝑎𝑖𝑗 = 0. In matrix [A1], the numbers
“+1” and “-1” are used to represent the relationship between the input and output. If
variable 𝑗 decreases with 𝑖 increasing, 𝑎𝑖𝑗 = −1; otherwise, 𝑎𝑖𝑗 = +1. I assume that the
optimization problem is a single objective problem. Thus, the last column of [A] and [A1]
shows the objective and its direct inputs. It is also possible to consider a multi-objective
problem. By checking the absolute values of elements in the last column of [A1] and [A],
variables without contradictions can be detected. Details of DSM-based qualitative
analysis method are shown as follows. Section 4.1.5 gives a numeric example for better
explanations.
Assume that the number of design variables is 𝑛𝑉𝑎𝑟 and the number of intermediate
variables is 𝑛𝐼𝑛𝑡, then 𝑛 = 𝑛𝑉𝑎𝑟 + 𝑛𝐼𝑛𝑡 + 1, for a single objective problem.
Step 1. Find coupled variables. In practice, if a “1” appears under the diagonal in DSM,
one can recognize that there is a feedback link. However, these feedback links do not
necessarily represent a loop and those links that do not represent loops should be
moved above the diagonal to simplify the DSM. This can be accomplished via a simple
strategy modifying [A]. The rows of [A] are checked one-by-one. If there is a “1” element
under the diagonal, i.e., 𝑎𝑖𝑗 = 1, 𝑖 > 𝑗, variable j is re-ordered to be before 𝑖. The number
of “1” elements under the diagonal (i.e., 𝑛𝑓) is counted after one movement, which is
compared with the smallest number of “1”s under the diagonal (𝑛𝑓∗ ). If 𝑛𝑓 < 𝑛𝑓∗ ,
𝑛𝑓∗ = 𝑛𝑓 and the sequence of the variables is recorded. If the value of 𝑛𝑓∗ does not
change for a given time (i.e., 5 iterations), the modification will stop and the sequence
with the smallest number of feedbacks is used to reconstruct [A] and [A1] and to obtain
[A’] and [A1’]. The location of the “1” element under the diagonal is used to give the
coupled variables. For example, if 𝑎𝑖𝑗 = 1 and 𝑖 > 𝑗, then variables 𝑖 and 𝑗 are coupled.
The coupled variables are stored in a 2-by-𝑛𝑓∗ matrix 𝐹𝐵 , each column of which is
shown as one pair of coupled variables.
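The reordering loop of Step 1 can be sketched as follows. This is a simplified deterministic variant of the procedure in the text (move the source of a below-diagonal "1" ahead of its sink, keep the move only if the feedback count improves, and stop after a few non-improving passes), shown on a small assumed three-variable DSM:

```python
def feedback_count(A, order):
    """Number of "1" marks below the diagonal for a given variable ordering."""
    n = len(order)
    return sum(A[order[i]][order[j]]
               for i in range(n) for j in range(n) if i > j)

def reorder(A, stall_limit=5):
    n = len(A)
    order = list(range(n))
    best = feedback_count(A, order)
    stall = 0
    while best > 0 and stall < stall_limit:
        improved = False
        for i in range(n):
            for j in range(i):
                if A[order[i]][order[j]]:          # feedback mark at (i, j)
                    cand = order[:i] + order[i + 1:]
                    cand.insert(j, order[i])       # move source before sink
                    nf = feedback_count(A, cand)
                    if nf < best:
                        best, order, improved = nf, cand, True
                        break
            if improved:
                break
        stall = 0 if improved else stall + 1
    return order, best

# Assumed example: edges v0 -> v1 and v2 -> v0; the order [v2, v0, v1]
# places every link above the diagonal (no feedback marks remain).
A = [[0, 1, 0],
     [0, 0, 0],
     [1, 0, 0]]
print(reorder(A))  # ([2, 0, 1], 0)
```

When a genuine loop is present, as in the numerical example of Section 4.1.5, the count cannot reach zero and the surviving below-diagonal "1"s identify the coupled variable pairs stored in 𝐹𝐵.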
Step 2. Calculate the number of links in the longest route. To detect the contradictions of
the 𝑖-th design variable, all the routes from 𝑖-th design variable to the objective should be
identified. The longest route should be determined and the other shorter routes are
checked at the same time. In some cases, the coupling relations make it difficult to find
the longest route because of the presence of the feedback loop. Thus the longest route
is considered in this thesis to be obtained by going through each feedback link once and
only once.
The number of links in the longest route contains two parts: one is the number of links
(𝑛𝑁𝑜𝐶 ) in the longest route without feedbacks and others are the number of links
(𝑛𝐶𝑖, 𝑖 = 1, 2, …, 𝑛𝑓∗) in all loops. Summing 𝑛𝑁𝑜𝐶 and all 𝑛𝐶𝑖 together, the final
number of links (𝑛𝑀𝑎𝑥) can be obtained as follows:

𝑛𝑀𝑎𝑥 = 𝑛𝑁𝑜𝐶 + Σ_{𝑖=1}^{𝑛𝑓∗} 𝑛𝐶𝑖
(4-1)
To count the number of links in the route without feedbacks, the “1” elements under the
diagonal in [A’] are turned to “0” to obtain matrix [Anoc]. Reference [149] used the
number of multiplications to represent the links between two variables. It reported that
after multiplying matrix [A] by itself 𝑘 times, if 𝑎𝑖,𝑛 is non-zero, then variable 𝑖
influences the objective through a route with 𝑘 + 1 links. [Anoc] is multiplied by itself,
and when the objective column contains non-zero elements, the number of multiplication
(𝑚𝑁𝑜𝑐) is recorded. After multiplying 𝑛 − 1 times, the largest 𝑚𝑁𝑜𝑐 gives the number of
links, i.e., 𝑛𝑁𝑜𝐶 = 𝑚𝑁𝑜𝑐 + 1.
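The matrix-power route counting can be sketched directly; the tiny DSM below is an assumed example (two design variables, one intermediate variable, and the objective), not one from the thesis:

```python
def matmul(X, Y):
    """Plain integer matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Assumed acyclic DSM: x1 -> v -> obj and x2 -> obj.
A = [
    [0, 0, 1, 0],  # x1 -> v
    [0, 0, 0, 1],  # x2 -> obj
    [0, 0, 0, 1],  # v  -> obj
    [0, 0, 0, 0],  # obj
]

def route_lengths(A, src, obj):
    """Lengths m such that A^m has a nonzero (src, obj) entry,
    i.e. src influences the objective through a route of m links."""
    lengths, P, n = [], A, len(A)
    for m in range(1, n):
        if P[src][obj]:
            lengths.append(m)
        P = matmul(P, A)
    return lengths

print(route_lengths(A, 0, 3))  # x1 -> v -> obj: one route of 2 links
print(route_lengths(A, 1, 3))  # x2 -> obj: one route of 1 link
```

The largest such length over all design variables gives 𝑛𝑁𝑜𝐶 for the loop-free part of the graph.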
After multiplying a DSM matrix by itself several times, if non-zero elements exist on the
diagonal of the DSM matrix, it means there is at least one loop in the problem, and a
non-zero diagonal element means that this variable goes through the loop once and
back to itself.
Therefore, once a loop has been detected, by counting the times of multiplications
before non-zero elements appear in the diagonal, the number of links in the loop can be
identified [149].
For the 𝑖-th coupling loop, the two coupled variables are 𝐹𝐵1,𝑖 and 𝐹𝐵2,𝑖. The variables
from 𝐹𝐵1,𝑖 to 𝐹𝐵2,𝑖 are used to construct a small DSM with one coupling loop, [𝐶𝑖].
Between 𝐹𝐵1,𝑖 and 𝐹𝐵2,𝑖, there are 𝑛𝐿 = 𝐹𝐵2,𝑖 − 𝐹𝐵1,𝑖 links. The matrix [𝐶𝑖] is multiplied
by itself 𝑛𝐿 + 1 times, and when 𝑐1,1 = 𝑐𝑛𝐿+1,𝑛𝐿+1 = 1, the number of multiplication (𝑚𝐶𝑖)
is recorded. After 𝑛𝐿 + 1 times of multiplication, the largest 𝑚𝐶𝑖 gives the number of links
in the coupling loop, i.e., 𝑛𝐶𝑖 = 𝑚𝐶𝑖 + 1.
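The loop-length count can be sketched in the same matrix-power style; the three-variable loop below corresponds to the D-B-C coupling that appears later in the numerical example (Table 4-7):

```python
def matmul(X, Y):
    """Plain integer matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Small DSM holding one coupling loop: D -> B -> C -> D.
C = [
    [0, 1, 0],  # D -> B
    [0, 0, 1],  # B -> C
    [1, 0, 0],  # C -> D (feedback)
]

def loop_links(C):
    """Number of links in the loop: multiply [C] by itself until the
    diagonal entries of the coupled variables become nonzero."""
    P = C
    for m in range(1, len(C) + 1):
        P = matmul(P, C)          # after m multiplications, P = C^(m+1)
        if P[0][0] and P[-1][-1]:
            return m + 1          # n_C = m_C + 1
    return None

print(loop_links(C))  # the D-B-C loop has 3 links
```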
Step 3. Find the variables without contradictions. After obtaining the number of links
𝑛𝑀𝑎𝑥, matrices [A’] and [A1’] are multiplied by themselves (𝑛𝑀𝑎𝑥 − 1) times to check
the contradictions.
In general, at the 𝑘-th multiplication, if 𝑎𝑖,𝑛^𝑘 (𝑖 = 1, …, 𝑛𝑉𝑎𝑟), the (𝑖, 𝑛) element of the
multiplied [A’], is non-zero, it means that variable 𝑖 has an impact on the objective
through 𝑘 + 1 links, and the absolute values in the objective column, |𝑎𝑖,𝑛^𝑘| and
|𝑎1𝑖,𝑛^𝑘|, are compared. The value |𝑎𝑖,𝑛^𝑘| shows that there are |𝑎𝑖,𝑛^𝑘| routes with
(𝑘 + 1) links from variable 𝑖 to the objective. If |𝑎𝑖,𝑛^𝑘| ≠ |𝑎1𝑖,𝑛^𝑘|, there is at least one
route through which the objective changes in the opposite direction compared with the
other routes; thus variable 𝑖 has contradictions. If |𝑎𝑖,𝑛^𝑘| = |𝑎1𝑖,𝑛^𝑘| for all the
multiplications, the sign of the non-zero 𝑎1𝑖,𝑛^𝑘 in every multiplication is checked. If the
signs of 𝑎1𝑖,𝑛^𝑘 differ, the objective changes in opposite directions through different
routes; therefore, variable 𝑖 has contradictions. Otherwise, if the signs of 𝑎1𝑖,𝑛^𝑘 are the
same for all the multiplications, variable 𝑖 has no contradiction.
After multiplying [A’] and [A1’] by themselves (𝑛𝑀𝑎𝑥 − 1) times, the variables without
contradictions can be picked out. The sign of 𝑎1𝑖,𝑛^𝑘 indicates the relation between
variable 𝑖 and the objective. Assuming that the objective is to be minimized, if the sign of
𝑎1𝑖,𝑛^𝑘 is “+”, variable 𝑖 should be set at its lower bound value; otherwise, the upper
bound value should be selected.
4.1.3. Weight calculation
Since the impact of each design variable on the objective is different, a practitioner often
focuses on important variables. By selecting important variables, the problem
dimensionality can be reduced further. Thus, the weight of each link in the causal graph
is calculated in the proposed method, and the original optimization problem is divided
into two sub-problems according to the weights of the links. Several methods have been
developed to calculate the weights, including analysis of variance (ANOVA) [82] and
principal component analysis (PCA) [81]. The Taguchi method [150], [151], one of the
design of experiment tools, offers a simple and systematic approach to calculate the
impact of each input on the output. In this thesis, a two-level Taguchi approach is
selected to calculate the weight of each link. Assume that an equation is as follows.
𝑦 = 𝑓(𝒙), 𝑥 = {𝑥1, 𝑥2, … , 𝑥𝑡} (4-2)
There are 𝑡 inputs of 𝑦 in Eq. (4-2); 𝑦 can represent for example an intermediate variable
and 𝑥𝑖 represent the variables influencing 𝑦 . First, the sample points are generated
according to the Taguchi orthogonal arrays. In this thesis, it is assumed the boundary of
design variables and the intermediate variables are appropriately selected so that the
output is monotonic or nearly monotonic with respect to each input. Therefore, a two-
level Taguchi design has the capability to capture the impact of each input. The two-level
Taguchi orthogonal array selected is shown in Table 4-1.
Table 4-1: The Taguchi orthogonal array for t=7.
Experiment | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
2 | 1 | 1 | 1 | 2 | 2 | 2 | 2
3 | 1 | 2 | 2 | 1 | 1 | 2 | 2
4 | 1 | 2 | 2 | 2 | 2 | 1 | 1
5 | 2 | 1 | 2 | 1 | 2 | 1 | 2
6 | 2 | 1 | 2 | 2 | 1 | 2 | 1
7 | 2 | 2 | 1 | 1 | 2 | 2 | 1
8 | 2 | 2 | 1 | 2 | 1 | 1 | 2
“1” means that the variable takes the value of the lower bound and “2” the upper bound.
For the equation shown in Eq. (4-2), eight sample points are generated according to
Table 4-1 and the responses are calculated at each sample. The symbol 𝑖 represents
the columns of the table. The effect of 𝑥𝑖 (𝑖 = 1, 2, …, 𝑡) on 𝑦 can be calculated as
follows.
𝐸𝑓𝑓𝑒𝑐𝑡𝑥𝑖−𝑦 = [ Σ_{𝑗=1}^{𝑚} 𝑦𝑗 for 𝑥𝑖 at level 2 (high) ] / (𝑚/2) − [ Σ_{𝑗=1}^{𝑚} 𝑦𝑗 for 𝑥𝑖 at level 1 (low) ] / (𝑚/2)
(4-3)

where 𝑚 is the number of experiments. In this case, 𝑚 = 8. Then, the effect is
normalized by Eq. (4-4) and the normalized effect is the weight of link 𝑥𝑖 to 𝑦:

𝑊𝑒𝑖𝑔ℎ𝑡𝑥𝑖−𝑦 = 𝐸𝑓𝑓𝑒𝑐𝑡𝑥𝑖−𝑦 / Σ_{𝑘=1}^{𝑛} 𝐸𝑓𝑓𝑒𝑐𝑡𝑥𝑘−𝑦
(4-4)
For problems with seven or fewer variables, as shown in Table 4-1, only eight
samples are needed. One sample corresponds to one system analysis, or a complete
simulation of the whole system. With these samples, one can perform the Taguchi
computation of weights for each link, and thus the added cost of function evaluations is
eight. For problems of larger scale, the added expense is determined by the specific
orthogonal array that one chooses. The size of the orthogonal array is dependent on the
number of variables in the problem. Note that although only the influence of each single
input is calculated, the cross effects of the inputs are considered in the Taguchi method
because the sampling array is designed to account for cross effects while employing as
small a number of sample points as possible.
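The effect and weight computation of Eqs. (4-3) and (4-4) can be sketched with the L8 array of Table 4-1. The response function below is an assumed toy example, and the weights are normalized by the sum of absolute effects (a slight variation of Eq. (4-4) that keeps weights non-negative when effects have mixed signs):

```python
# L8 two-level orthogonal array from Table 4-1.
L8 = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 2, 2, 2, 2],
    [1, 2, 2, 1, 1, 2, 2],
    [1, 2, 2, 2, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 2, 1],
    [2, 2, 1, 1, 2, 2, 1],
    [2, 2, 1, 2, 1, 1, 2],
]

def taguchi_weights(f, lb, ub, t):
    """Effect of each of the first t inputs (Eq. (4-3)), normalized
    by the sum of absolute effects (cf. Eq. (4-4))."""
    m = len(L8)
    ys = []
    for row in L8:
        x = [lb[i] if row[i] == 1 else ub[i] for i in range(t)]
        ys.append(f(x))
    effects = []
    for i in range(t):
        hi = sum(y for y, row in zip(ys, L8) if row[i] == 2) / (m / 2)
        lo = sum(y for y, row in zip(ys, L8) if row[i] == 1) / (m / 2)
        effects.append(hi - lo)
    total = sum(abs(e) for e in effects)
    return [abs(e) / total for e in effects]

# Assumed toy response: y = 4*x1 + x2, with x3 having no influence.
w = taguchi_weights(lambda x: 4 * x[0] + x[1], [0, 0, 0], [1, 1, 1], 3)
print([round(v, 2) for v in w])  # [0.8, 0.2, 0.0]
```

Because every column of the array is balanced and mutually orthogonal, the eight runs isolate each input's main effect; here the inert third input correctly receives zero weight.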
Each link represents the influence of a variable on another variable and instead of
calculating the importance of a variable to the final objective, the weight of every single
link in the causal graph is estimated. By removing the links with low importance, the
causal graph can be simplified. Then, another qualitative analysis is performed on the
simplified causal graph to find variables without contradictions as well as variables that
have no links to the objective. Optimal values of these variables can thus be determined
and removed from the set of important optimization variables.
4.1.4. Two-stage optimization process
After the second simplification, design variables are divided into two parts, the important
variables 𝒙𝑘𝑒 and the less important variables 𝒙𝑟𝑒. Then, two optimization problems are
constructed as shown in Eqs. (4-5) and (4-6) and optimized sequentially.
Problem 1:

find 𝒙𝑘𝑒
min 𝑓(𝒙𝑘𝑒, 𝒙𝑟𝑒, 𝒙𝑢𝑐)
s.t. 𝑔(𝒙𝑘𝑒, 𝒙𝑟𝑒, 𝒙𝑢𝑐) ≤ 0
𝒙𝑘𝑒^𝑙𝑏 ≤ 𝒙𝑘𝑒 ≤ 𝒙𝑘𝑒^𝑢𝑏
(4-5)

Problem 2:

find 𝒙𝑟𝑒
min 𝑓(𝒙𝑟𝑒, 𝒙𝑘𝑒, 𝒙𝑢𝑐)
s.t. 𝑔(𝒙𝑟𝑒, 𝒙𝑘𝑒, 𝒙𝑢𝑐) ≤ 0
𝒙𝑟𝑒^𝑙𝑏 ≤ 𝒙𝑟𝑒 ≤ 𝒙𝑟𝑒^𝑢𝑏
(4-6)
In both problems, 𝒙𝑢𝑐 is fixed at the value determined by the qualitative analysis. When
optimizing Problem 1, 𝒙𝑟𝑒, 𝒙𝑢𝑛𝑐 are fixed at the determined values, while the value of 𝒙𝑘𝑒
is fixed at the optimal solution obtained from Problem 1 when optimizing Problem 2. In
the following tests, MATLAB function fmincon(.) is employed to solve the optimization
problem. Other optimization methods can also be used to solve the sub-problems. The
stopping criterion is checked at the end of each problem. If the optimum is not found yet
after optimizing Problem 2, the sequential optimization process is performed again.
For the purpose of comparing the efficiency with other methods, one needs to fix the
quality of the solution. Therefore, if the relative difference between the optimal result
from Problem 1 (or Problem 2), 𝑓1∗ (or 𝑓2∗), and the given optimal result 𝑓∗ is less than
a given tolerance (i.e., 10^−4), the optimization process terminates. The relative
difference is defined as follows:

𝜀 = |𝑓∗ − 𝑓1∗| / 𝑓∗
(4-7)
The sequential optimization method may get stuck in a sub-optimum when dealing with
multimodal problems. In the proposed decomposition method, it should be noted that the
unimportant variables include two categories: the variables without contradictions and
the variables having less impact on the objective. For the variables without
contradictions, the optimal solution can be accurately determined according to the
qualitative analysis results. Separating the rest of the variables into unimportant and
important variables using knowledge also reduces the risk of falling into a sub-optimum.
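The sequential two-stage scheme can be sketched as follows. The quadratic objective, the split into "important" and "less important" variables, and the crude bounded 1-D searches are all assumptions for illustration; the thesis uses MATLAB's fmincon(.) for each sub-problem instead:

```python
def line_min(f, x, i, lb, ub, steps=2000):
    """Crude bounded 1-D minimization over coordinate i (stand-in for
    a proper solver such as fmincon)."""
    best_v, best_y = x[i], f(x)
    for k in range(steps + 1):
        v = lb + (ub - lb) * k / steps
        x[i] = v
        y = f(x)
        if y < best_y:
            best_v, best_y = v, y
    x[i] = best_v
    return best_y

def two_stage(f, x, ke, re, lb, ub, tol=1e-6, max_cycles=50):
    """Optimize the important variables (ke), then the rest (re),
    repeating until the objective stops improving (cf. Eq. (4-7))."""
    prev = f(x)
    for _ in range(max_cycles):
        for i in ke:                 # Problem 1: important variables
            line_min(f, x, i, lb, ub)
        for i in re:                 # Problem 2: less important variables
            line_min(f, x, i, lb, ub)
        cur = f(x)
        if abs(prev - cur) < tol:    # stopping criterion
            break
        prev = cur
    return x, cur

# Assumed test function with minimum at (1, 2); x0 is treated as the
# important variable and x1 as the less important one.
f = lambda x: (x[0] - 1) ** 2 + 0.1 * (x[1] - 2) ** 2
x, fx = two_stage(f, [0.0, 0.0], ke=[0], re=[1], lb=0.0, ub=3.0)
print(round(x[0], 2), round(x[1], 2))  # 1.0 2.0
```

In the real method the fixed variables 𝒙𝑢𝑐 would also be held at the bounds chosen by the qualitative analysis while each sub-problem is solved.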
4.1.5. Numerical example
A simple numerical problem is employed in this section to explain how the proposed
method works. The expression of the problem is as follows.
Find 𝒙 = [𝐴, 𝐸, 𝐻, 𝐼]
min 𝐺 = 10𝐷𝐹^−2 + 100𝐶^2
where 𝐹 = 2𝐶^1.8𝐷^−2𝐸^−2.2𝐻^2.5
      𝐷 = 2𝐼^−1.5 − 𝐶^4
      𝐶 = 0.5𝐸^0.3𝐵^−1.2
      𝐵 = 2𝐴𝐷
s.t. 1 ≤ 𝐴, 𝐸, 𝐻, 𝐼 ≤ 2
(4-8)
Step 1. The causal graph (Figure 4-2) is constructed according to Eq. (4-8). The design
variables are drawn at the left side and the objective is located at the right side. As
shown in the causal graph, a coupling loop involving B, C, and D exists in the problem.
The labels “+1” and “-1” are assigned above the arrows according to each equation.
Taking the equation 𝐶 = 0.5𝐸^0.3𝐵^−1.2 as an example, C increases when E increases or
when B decreases. Thus, a “+1” is located above the arrow from E to C and a “-1” is
added above the arrow from B to C.
[Figure: causal graph of the numerical example; design variables A, E, H, and I on the left, intermediate variables B, C, D, and F in between, and the objective G on the right, with each arrow labeled “+1” or “-1”; B, C, and D form a coupling loop]
Figure 4-2: Causal graph of a numerical example.
Step 2. Qualitative analysis based on design structure matrix is performed to find the
design variables without contradictions. The two matrices [A] and [A1] are constructed
as shown in Table 4-2 and Table 4-3. The first four columns refer to design variables
and the last column G shows the objective. For example, B is the output of A and the
input of C, so the elements (A, B) and (B, C) are “1” in [A]. The labels “+1” and “-1” above
the arrows in the causal graph are used to construct [A1]. The process of DSM-based
qualitative analysis is presented step-by-step.
Table 4-2: Matrix [A] for the numerical example.
A E H I B C F D G
A 0 0 0 0 1 0 0 0 0
E 0 0 0 0 0 1 1 0 0
H 0 0 0 0 0 0 1 0 0
I 0 0 0 0 0 0 0 1 0
B 0 0 0 0 0 1 0 0 0
C 0 0 0 0 0 0 1 1 1
F 0 0 0 0 0 0 0 0 1
D 0 0 0 0 1 0 1 0 1
G 0 0 0 0 0 0 0 0 0
Table 4-3: Matrix [A1] for the numerical example.
A E H I B C F D G
A 0 0 0 0 +1 0 0 0 0
E 0 0 0 0 0 +1 -1 0 0
H 0 0 0 0 0 0 +1 0 0
I 0 0 0 0 0 0 0 +1 0
B 0 0 0 0 0 -1 0 0 0
C 0 0 0 0 0 0 +1 -1 +1
F 0 0 0 0 0 0 0 0 -1
D 0 0 0 0 +1 0 -1 0 +1
G 0 0 0 0 0 0 0 0 0
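The two matrices above can be assembled programmatically from a signed edge list. The variable order (design variables first, objective G last) and the edges below reproduce Tables 4-2 and 4-3:

```python
# Assembling [A] and [A1] for the numerical example from signed edges.
variables = ["A", "E", "H", "I", "B", "C", "F", "D", "G"]
signed_edges = [
    ("A", "B", +1), ("E", "C", +1), ("E", "F", -1), ("H", "F", +1),
    ("I", "D", +1), ("B", "C", -1), ("C", "F", +1), ("C", "D", -1),
    ("C", "G", +1), ("F", "G", -1), ("D", "B", +1), ("D", "F", -1),
    ("D", "G", +1),
]
idx = {v: k for k, v in enumerate(variables)}
n = len(variables)
A = [[0] * n for _ in range(n)]   # adjacency: a_ij = 1 if i feeds j
A1 = [[0] * n for _ in range(n)]  # signed adjacency: +1 / -1

for src, dst, sign in signed_edges:
    A[idx[src]][idx[dst]] = 1
    A1[idx[src]][idx[dst]] = sign

# The "1" below the diagonal (row D, column B) signals the coupling loop.
print(A[idx["D"]][idx["B"]], A1[idx["C"]][idx["D"]])  # 1 -1
```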
Step 2.1. The coupled variables are found in this step. As shown in Figure 4-2, there is
one loop involving B, C, and D. In Table 4-2, non-zero elements exist under the diagonal
(boldfaced). For detecting the loop, the sequence of the columns in matrix [A] is changed.
First, the element (D, B) is detected and to remove the “1” under the diagonal, variable D
is moved to the front of B. Then, the modified [A] (named [A’]) is shown in Table 4-4,
and the number of “1”s under the diagonal in the modified matrix is one, i.e., 𝑛𝑓∗ = 1.
After repeating this step five times, 𝑛𝑓∗ does not change.
Thus, the new sequence of the variables and objective is
𝑆𝑒𝑞 = [𝐴, 𝐸, 𝐻, 𝐼, 𝐷, 𝐵, 𝐶, 𝐹, 𝐺] (4-9)
The modified [A’] and [A1’] are listed in Table 4-4 and Table 4-5. There is one loop (“1” in
boldface) detected through this step and the pair of the coupled variables are D and C.
Table 4-4: Modified matrix [A’] for the numerical example.
A E H I D B C F G
A 0 0 0 0 0 1 0 0 0
E 0 0 0 0 0 0 1 1 0
H 0 0 0 0 0 0 0 1 0
I 0 0 0 0 1 0 0 0 0
D 0 0 0 0 0 1 0 1 1
B 0 0 0 0 0 0 1 0 0
C 0 0 0 0 1 0 0 1 1
F 0 0 0 0 0 0 0 0 1
G 0 0 0 0 0 0 0 0 0
Table 4-5: Modified matrix [A1’] for the numerical example.
A E H I D B C F G
A 0 0 0 0 0 +1 0 0 0
E 0 0 0 0 0 0 +1 -1 0
H 0 0 0 0 0 0 0 +1 0
I 0 0 0 0 +1 0 0 0 0
D 0 0 0 0 0 +1 0 -1 +1
B 0 0 0 0 0 0 -1 0 0
C 0 0 0 0 -1 0 0 +1 +1
F 0 0 0 0 0 0 0 0 -1
G 0 0 0 0 0 0 0 0 0
Step 2.2. The number of links in the longest route is counted in this step. The "1"
element in (C, D) in matrix [A'] is set to "0" to construct the matrix [Anoc] (as
shown in Table 4-6). [Anoc] is multiplied by itself eight times; the objective column
contains non-zero elements only at the first, second, third, and fourth multiplications. Thus 𝑛𝑁𝑜𝐶 = 5.
Table 4-6: Matrix [Anoc] for the numerical example.
A E H I D B C F G
A 0 0 0 0 0 1 0 0 0
E 0 0 0 0 0 0 1 1 0
H 0 0 0 0 0 0 0 1 0
I 0 0 0 0 1 0 0 0 0
D 0 0 0 0 0 1 0 1 1
B 0 0 0 0 0 0 1 0 0
C 0 0 0 0 0 0 0 1 1
F 0 0 0 0 0 0 0 0 1
G 0 0 0 0 0 0 0 0 0
In the example, only one coupling exists, so one matrix [C] is built. As shown in Table
4-4, the "1" showing the feedback appears in the element (C, D). Thus, the variables from D to
C in Table 4-4 (i.e., D, B, and C) are used to construct the matrix [C], which is shown in
Table 4-7. In matrix [C], the two coupled variables, D and C, are located at the first and
third columns, so 𝑛𝐿 = 3 − 1 = 2. [C] is multiplied by itself three times; at the second
multiplication, 𝑐1,1 = 𝑐𝑛𝐿+1,𝑛𝐿+1 = 1, which means 𝑛𝐶 = 3. Thus, the total number of links
(nMax) in the longest route is 𝑛𝑀𝑎𝑥 = 𝑛𝑁𝑜𝐶 + 𝑛𝐶 = 5 + 3 = 8. As one can see from
Figure 4-2, the longest path is A-B-C-D-B-C-D-F-G with eight links.
Table 4-7: Matrix [C] for the numerical example.
D B C
D 0 1 0
B 0 0 1
C 1 0 0
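The route counting of Step 2.2 can be sketched with plain adjacency-matrix powers. The node order and edges below are read off Tables 4-6 and 4-7; variable names in the code are illustrative, and the loops index matrix powers rather than multiplication counts.

```python
import numpy as np

# Node order follows Seq = [A, E, H, I, D, B, C, F, G]; [Anoc] is [A'] with
# the feedback element (C, D) set to zero (Table 4-6).
labels = ["A", "E", "H", "I", "D", "B", "C", "F", "G"]
idx = {v: i for i, v in enumerate(labels)}
edges = [("A", "B"), ("E", "C"), ("E", "F"), ("H", "F"), ("I", "D"),
         ("D", "B"), ("D", "F"), ("D", "G"), ("B", "C"),
         ("C", "F"), ("C", "G"), ("F", "G")]
Anoc = np.zeros((9, 9), dtype=int)
for u, v in edges:
    Anoc[idx[u], idx[v]] = 1

# n_NoC: the G column of Anoc^k is non-zero iff some k-link route reaches G,
# so the longest decoupled route is the largest such k.
P, n_noc = Anoc.copy(), 0
for k in range(1, 10):
    if P[:, idx["G"]].any():
        n_noc = k
    P = P @ Anoc

# n_C: length of the coupling loop D -> B -> C -> D from matrix [C]
# (Table 4-7); the loop closes at the power where the diagonal returns to 1.
Cm = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]])
P, n_c = Cm.copy(), 0
for k in range(1, 4):
    if P[0, 0] == 1:
        n_c = k
        break
    P = P @ Cm

n_max = n_noc + n_c
print(n_noc, n_c, n_max)  # 5 3 8
```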
Step 2.3. The variables without contradictions are detected in this step. Because
𝑛𝑀𝑎𝑥 = 8, [A'] and [A1'] are multiplied by themselves seven times. Table 4-8 lists,
for the four design variables, the values in the objective columns of [A'] and [A1'] after
each multiplication.
Table 4-8: Element values in the objective column in [A’] and [A1’].
Multiplication No.   1       2       3       4       5       6       7
                     G   G1  G   G1  G   G1  G   G1  G   G1  G   G1  G   G1
A                    0   0   1   -1  2   2   1   1   1   -1  2   2   1   1
E                    2   2   2   -2  1   -1  1   1   2   -2  1   -1  1   1
H                    1   -1  0   0   0   0   0   0   0   0   0   0   0   0
I                    1   1   1   1   1   -1  2   2   1   1   1   -1  2   2
In Table 4-8, G is the element value in the objective column in [A’] while G1 gives the
values in the objective column in [A1’] in every multiplication. As shown in Table 4-8, the
absolute values of G and G1 are the same for every variable in every multiplication.
Now, checking the first and second multiplications for variable E, one finds that in the
first multiplication G1 has the sign "+" while in the second multiplication G1's sign is
"-". This means that increasing E increases G through one route with two links, but may
decrease G through another route with three links. Therefore, a contradiction exists in
variable E. The same applies to A and I when checking the second and third
multiplications. For H, on the other hand, the values in (H, G) and (H, G1) are non-zero
only at the first multiplication and their absolute values are the same, which means that
H influences the objective G only through one route with two links. Therefore, H has no
contradiction. Thus, A, E, and I are variables containing contradictions while H is
without contradictions, i.e., 𝒙𝑢𝑐 = 𝐻 and 𝒙𝑐 = [𝐴, 𝐸, 𝐼].
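The contradiction test of Step 2.3 can be reproduced by propagating the signed matrix [A1'] alongside the unsigned [A']; the sign labels below are read off the causal graph (Table 4-5), and the contradiction rule (mixed signs across powers, or cancellation making |G| ≠ |G1|) is the one described in the text.

```python
import numpy as np

# Signed adjacency [A1'] in the reordered sequence [A, E, H, I, D, B, C, F, G].
labels = ["A", "E", "H", "I", "D", "B", "C", "F", "G"]
idx = {v: i for i, v in enumerate(labels)}
signed_edges = {("A", "B"): 1, ("E", "C"): 1, ("E", "F"): -1, ("H", "F"): 1,
                ("I", "D"): 1, ("D", "B"): 1, ("D", "F"): -1, ("D", "G"): 1,
                ("B", "C"): -1, ("C", "D"): -1, ("C", "F"): 1, ("C", "G"): 1,
                ("F", "G"): -1}
A = np.zeros((9, 9), dtype=int)
A1 = np.zeros((9, 9), dtype=int)
for (u, v), s in signed_edges.items():
    A[idx[u], idx[v]] = 1
    A1[idx[u], idx[v]] = s

# Multiply [A'] and [A1'] by themselves n_max - 1 = 7 times and record the
# objective-column entries G and G1 after each multiplication (Table 4-8).
n_max = 8
g_hist = {v: [] for v in ["A", "E", "H", "I"]}
P, P1 = A.copy(), A1.copy()
for _ in range(n_max - 1):
    P, P1 = P @ A, P1 @ A1
    for v in g_hist:
        g_hist[v].append((P[idx[v], idx["G"]], P1[idx[v], idx["G"]]))

# A variable is contradiction-free if its signed routes never cancel
# (|G| == |G1| at every power) and never change sign across powers.
def has_contradiction(hist):
    signs = {np.sign(g1) for g, g1 in hist if g != 0}
    cancels = any(abs(g) != abs(g1) for g, g1 in hist)
    return cancels or len(signs) > 1

x_uc = [v for v in g_hist if not has_contradiction(g_hist[v])]
print(x_uc)  # ['H']
```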
Step 3. After qualitative analysis, one variable (H) is found without contradictions. Since
the sign of element (𝐻, 𝐺1) is “-”, H should be set at the upper bound value because G is
to be minimized. The other three variables (A, E and I) will go through the next steps.
Step 4. The weight of each link is calculated using the Taguchi method. The objective
function 𝐺 = 10𝐷𝐹−2 + 100𝐶2 is taken as an example to show the process. Because D,
F and C are intermediate variables, their ranges are decided by calculating the
responses of 50 random sample points. In this case, the ranges of D, F and C are
[0.7140, 1.9848], [0.0026, 0.8635], and [0.0525, 0.3388], respectively. Next, the Taguchi table is
constructed as shown in Table 4-9 and the response of each sample is calculated.
Table 4-9: Taguchi sampling table of objective function.
Experiment number   C        D        F        G
1                   0.0525   0.714    0.0026   1056213
2                   0.0525   0.714    0.0026   1056213
3                   0.0525   1.9848   0.8635   26.89465
4                   0.0525   1.9848   0.8635   26.89465
5                   0.3388   0.714    0.8635   21.05431
6                   0.3388   0.714    0.8635   21.05431
7                   0.3388   1.9848   0.0026   2936106
8                   0.3388   1.9848   0.0026   2936106
Using Eqs. (4-3) and (4-4), the weights of the three input links (i.e., D to G, F to G, and
C to G) are 24.2%, 51.6%, and 24.2%, respectively. Using the same method to calculate
the weight of all the links and using the weight to replace the “1” element in matrix [A],
the weighted matrix [Aw] is constructed as shown in Table 4-10.
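The weight calculation can be sketched as follows. Eqs. (4-3) and (4-4) are defined earlier in the thesis and are not repeated in this section, so a standard range-analysis weight (the effect range of each factor divided by the sum of all ranges) is assumed here; on the Table 4-9 data it reproduces the reported 24.2% / 51.6% / 24.2% split to within a few tenths of a percent.

```python
import numpy as np

# Factor levels per experiment (0 = low, 1 = high) for C, D, F, and the
# responses G, both taken from Table 4-9.
levels = np.array([
    [0, 0, 0], [0, 0, 0],
    [0, 1, 1], [0, 1, 1],
    [1, 0, 1], [1, 0, 1],
    [1, 1, 0], [1, 1, 0],
])
y = np.array([1056213, 1056213, 26.89465, 26.89465,
              21.05431, 21.05431, 2936106, 2936106])

# Effect of each factor: |mean response at high level - mean at low level|;
# the weight of a link is its effect divided by the sum of all effects.
effects = np.array([abs(y[levels[:, j] == 1].mean() - y[levels[:, j] == 0].mean())
                    for j in range(3)])
weights = effects / effects.sum()
print({k: round(float(w), 3) for k, w in zip("CDF", weights)})
```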
Table 4-10: Weighted matrix [Aw] for numerical example.
A E H I B C F D G
A 0 0 0 0 15.0% 0 0 0 0
E 0 0 0 0 0 3.6% 25.5% 0 0
H 0 0 0 0 0 0 25.6% 0 0
I 0 0 0 0 0 0 0 96.1% 0
B 0 0 0 0 0 96.4% 0 0 0
C 0 0 0 0 0 0 27.3% 3.9% 24.2%
F 0 0 0 0 0 0 0 0 51.6%
D 0 0 0 0 85.0% 0 21.6% 0 24.2%
G 0 0 0 0 0 0 0 0 0
Step 5. The causal graph is simplified according to the weight and the variable sets, 𝒙𝑘𝑒
and 𝒙𝑟𝑒 are detected in this step. In this case, the threshold is selected as 10%.
Comparing the weights of each link with the threshold, the links E -> C and C -> D are
removed and the simplified causal graph is shown in Figure 4-3.
Figure 4-3: Simplified causal graph for the numerical example.
From Figure 4-3, it can be seen that the coupling loop is decoupled because the link
between C and D is cut. In the simplified graph, the variables without contradictions are
detected through qualitative analysis. In this case, variable E is found to be without
contradictions, and the objective G decreases as E decreases. Therefore, the kept
variables to be optimized are 𝒙𝑘𝑒 = [𝐴, 𝐼] and the less important variable is 𝑥𝑟𝑒 = 𝐸.
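The link pruning of Step 5 amounts to thresholding the weighted matrix [Aw]; using the Table 4-10 weights, exactly the two low-weight links E → C (3.6%) and C → D (3.9%) fall below the 10% threshold, which breaks the B-C-D coupling loop.

```python
# Link weights from Table 4-10, expressed as fractions.
weights = {("A", "B"): 0.150, ("E", "C"): 0.036, ("E", "F"): 0.255,
           ("H", "F"): 0.256, ("I", "D"): 0.961, ("B", "C"): 0.964,
           ("C", "F"): 0.273, ("C", "D"): 0.039, ("C", "G"): 0.242,
           ("F", "G"): 0.516, ("D", "B"): 0.850, ("D", "F"): 0.216,
           ("D", "G"): 0.242}
threshold = 0.10

# Links below the threshold are cut from the causal graph.
removed = {link for link, w in weights.items() if w < threshold}
kept = {link: w for link, w in weights.items() if w >= threshold}
print(sorted(removed))  # [('C', 'D'), ('E', 'C')]
```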
Step 6. The two-stage optimization problem is constructed as follows.
Problem 1:

find 𝒙𝑘𝑒 = [𝐴, 𝐼]
min 𝐺 = 𝑓(𝒙𝑘𝑒, 𝐸, 𝐻)
s.t. 1 ≤ 𝒙𝑘𝑒 ≤ 2
where 𝐸 = 1, 𝐻 = 2          (4-10)

Problem 2:

find 𝑥𝑟𝑒 = 𝐸
min 𝑓(𝑥𝑟𝑒, 𝒙𝑘𝑒, 𝐻)
s.t. 1 ≤ 𝑥𝑟𝑒 ≤ 2
where 𝒙𝑘𝑒 = [𝐴∗, 𝐼∗], 𝐻 = 2          (4-11)
The optimal value 𝑓∗ = 7.9735 is set to be the stopping criterion value. Matlab function
fmincon(.) is employed to perform the optimization and the results are shown in Table
4-11. The starting point of the original problem is randomly generated in the design
space. For the two-stage optimization problem, the starting point of the two stages is the
same as that in the original problem. For example, if the starting point in the original
problem is 𝑥0 = [𝐴0, 𝐼0, 𝐸0, 𝐻0] = [1.2,1.3,1.4,1.5], then the starting point for Problem 1 will
be 𝒙𝑘𝑒,0 = [𝐴0, 𝐼0] = [1.2,1.3] and the starting point for Problem 2 will be 𝑥𝑟𝑒,0 = 𝐸0 = 1.4.
Since the optimal value is reached in the first-stage optimization, the second-stage
optimization is not run in this case. 𝑓∗ is the optimal value and SA stands for system
analysis. The optimization is repeated 11 times so that the median is an actually tested value.
The median number of SA is shown in Table 4-11. The optimal value and the optimal
points are the results of the run with the median number of SA.
Table 4-11: Optimization results of the original problem and decomposed problem.
            𝑥∗             𝑓∗      # of SA  Variance of # of SA
Original    [1.365,1,2,2]  7.9735  91       [60,131]
Decomposed  [1.365,1,2,2]  7.9735  41       [38,48]
As shown in Table 4-11, the number of system analyses for the two-stage optimization is
41, including eight system analyses for weight calculation, which is 45% of the number of
analyses used in optimizing the original problem. This is because the four-dimensional
problem is reduced to a two-dimensional problem.
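The two-stage scheme of Eqs. (4-10) and (4-11) can be sketched as follows. The full expression of G(A, I, E, H) is not repeated in this section, so a hypothetical surrogate with dominant variables A and I stands in for it, and a small pattern search stands in for fmincon(.).

```python
import itertools

# Hypothetical surrogate for G: A and I dominate, E and H are minor.
def g(a, i, e, h):
    return (a - 1.365) ** 2 + (i - 1.0) ** 2 + 0.01 * (e - 1.0) ** 2 + 0.01 * (2.0 - h) ** 2

def pattern_search(f, x0, bounds, step=0.25, tol=1e-6):
    """Minimize f over a box by coordinate moves with a shrinking step."""
    x, fx = list(x0), f(list(x0))
    while step > tol:
        improved = False
        for j, d in itertools.product(range(len(x)), (-step, step)):
            y = list(x)
            y[j] = min(max(y[j] + d, bounds[j][0]), bounds[j][1])
            fy = f(y)
            if fy < fx - 1e-12:
                x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx

e_fix, h_fix = 1.0, 2.0     # values fixed by the qualitative analysis (Step 3)
f_target = 0.0              # known optimum, used as the stopping criterion

# Stage 1: optimize only the kept variables x_ke = [A, I].
(a_s, i_s), f1 = pattern_search(lambda x: g(x[0], x[1], e_fix, h_fix),
                                [1.2, 1.3], [(1, 2), (1, 2)])
if f1 <= f_target + 1e-6:   # optimum reached; the second stage is skipped
    x_star, f_star = [a_s, i_s, e_fix, h_fix], f1
else:                       # Stage 2: optimize the remaining variable x_re = E
    (e_s,), f2 = pattern_search(lambda x: g(a_s, i_s, x[0], h_fix),
                                [1.4], [(1, 2)])
    x_star, f_star = [a_s, i_s, e_s, h_fix], f2
print(x_star, f_star)
```

As in the thesis example, stage 1 already reaches the stopping value on this surrogate, so stage 2 is never entered.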
To test the influence of the threshold, the threshold is selected as 15%. Then, link A -> B
is removed from Figure 4-3 as well, which means A has less impact on the final
objective. Thus, the kept variables 𝑥𝑘𝑒 = 𝐼 and the less important variable 𝒙𝑟𝑒 = [𝐴, 𝐸].
Using fmincon(.) function to optimize the decomposed problem, the results are shown in
Table 4-12.
Table 4-12: Comparison of two thresholds (10% and 15%).

Threshold  𝑥∗             𝑓∗      # of SA  Variance of # of SA
10%        [1.365,1,2,2]  7.9735  41       [38,48]
15%        [1.365,1,2,2]  7.9735  55       [41,65]
As shown in Table 4-12, the number of SA when using 10% threshold is smaller than
that with 15% threshold. When using 10% as threshold, after optimizing the important
variable 𝒙𝒌𝒆 the optimum is reached and the optimization process is terminated.
However, when selecting 15% as the threshold, optimizing 𝑥𝑘𝑒 cannot reach the target value
because the important variable A is mistakenly classified as less important. As a result, the
unimportant variables 𝒙𝑟𝑒 need to be optimized as well, which increases the number of SA.
Therefore, missing important variables leads to more function calls. To avoid mistakenly
removing important variables, a smaller threshold is preferred, i.e., 10%.
4.2. Engineering case studies
4.2.1. Power converter design problem
A power converter design problem [152], [153] is used to test the performance of the
proposed dimension reduction methodology. The design problem has six design
variables, as shown in Table 4-13. The upper and lower bounds defined in [154] are
used in this thesis. The objective of the problem is to minimize the weight of the power
converter as shown in Eq. (4-12). The formulation of the problem is defined as follows
and all constant values are taken from [152].
min𝑦1 = 𝑊𝑐 +𝑊𝑤 +𝑊𝑐𝑎𝑝 +𝑊ℎ𝑠 (4-12)
where 𝑊𝑐 = |𝐷𝐼𝑦6(𝑍𝑃1 + 𝑦7)|, 𝑍𝑃1 = 2(1 + 𝐾2)𝑥6, 𝑊𝑤 = |(𝑋𝑀𝐿𝑇)(𝐷𝐶)𝑥2𝑥3|, 𝑋𝑀𝐿𝑇 =
2𝑥1(1 + 𝐾1)𝐹𝐶, 𝑊𝑐𝑎𝑝 = |𝐷𝐾5𝑥5|, and 𝑊ℎ𝑠 = |𝑃𝑂𝐾𝐻(1/𝑦2 − 1)|.
Electrical design state analysis duty cycle:

𝑦3 = 𝐸𝑂 / (2𝑦2𝐸𝐼(𝑋𝑁))          (4-13)

Minimum duty cycle:

𝑦4 = 𝐸𝑂 / (2𝑦2𝐸𝐼𝑀𝐴𝑋(𝑋𝑁))          (4-14)

Inductor resistance:

𝑦5 = 𝑋𝑀𝐿𝑇 𝑥2(𝑅𝑂) / 𝑥3          (4-15)

Core cross-sectional area:

𝑦6 = 𝐾1𝑥1²          (4-16)

Magnetic path length:

𝑦7 = (𝜋/2)𝑥1          (4-17)

Inductor value:

𝑦8 = (𝐸𝑂 + 𝑉𝐷)(1 − 𝑦3) / (𝑦6𝑥2(𝐹𝑅))          (4-18)

Loss design state analysis:

𝑦2 = 𝑃𝑂 / (𝑃𝑄 + 𝑃𝐷 + 𝑃𝑂𝐹 + 𝑃𝑋𝐹𝑅)          (4-19)
Table 4-13: Design variables in power converter design.
Variables Name Description Lower Bound Upper Bound
𝑥1 𝐶𝑤 Core center leg width (m) 0.001 0.1
𝑥2 𝑇𝑢𝑟𝑛𝑠 Inductor turns 1.0 10
𝑥3 𝐴𝑐𝑝 Copper size (m2) 7.29e-8 1.0e-5
𝑥4 𝐿𝑓 𝑃𝐼𝑁𝐷𝑈𝐶⁄ Inductance (H) 1.0e-6 1.0e-5
𝑥5 𝐶𝑓 Capacitance (F) 1.0e-5 0.01
𝑥6 𝑤𝑤 Core window width (m) 0.001 0.01
Figure 4-4: Causal graph of the power converter problem.
The proposed dimension reduction method is employed to solve this six-dimensional
multidisciplinary design optimization problem. It is to be noted that this problem entails
mathematical expressions, which are used to build the causal graph shown in Figure
4-4. In most engineering problems, one does not have equations and thus should use
domain knowledge to construct a causal graph. By employing the qualitative analysis, it can
be found that all variables contain contradictions. To further simplify the causal graph,
the less important links are removed according to the weights and the two-stage
optimization is constructed as shown in Eqs. (4-20) and (4-21).
Problem 1:

find 𝒙𝑘𝑒 = [𝑥1, 𝑥2, 𝑥5]ᵀ
min 𝑦1 = 𝑓(𝒙𝑘𝑒, 𝒙𝑟𝑒)
s.t. 𝒙𝑘𝑒^𝑙𝑏 ≤ 𝒙𝑘𝑒 ≤ 𝒙𝑘𝑒^𝑢𝑏          (4-20)

Problem 2:

find 𝒙𝑟𝑒 = [𝑥3, 𝑥4, 𝑥6]ᵀ
min 𝑦1 = 𝑓(𝒙𝑟𝑒, 𝒙𝑘𝑒)
s.t. 𝒙𝑟𝑒^𝑙𝑏 ≤ 𝒙𝑟𝑒 ≤ 𝒙𝑟𝑒^𝑢𝑏          (4-21)
When optimizing Problem 1 for the first time, the design variables 𝒙𝑟𝑒 are fixed at the
given value determined by the qualitative analysis. According to the previous qualitative
analysis results, the upper bounds of 𝑥3 and 𝑥4 and the lower bound of 𝑥6 should be
selected. In this case, 𝑥3 = 1e-5, 𝑥4 = 1e-5, and 𝑥6 = 0.001.
The MATLAB function fmincon(.) is employed to optimize the two problems. The starting
point is generated randomly in the design space for the original problem. The starting
points for the problems 1 and 2 in the two-stage optimization are the same as the
starting point for the original problem. The original problem with six design variables is
optimized first. The optimal result of the original problem is used as the stopping criterion
for the two-stage optimization. Both optimizations are repeated 11 times and the median
number of SA and the optimal results in that run are shown in Table 4-14.
Table 4-14: Optimization results for the power converter problem.
            𝑥∗                                   𝑓∗      # of SA  Variance of # of SA
Original    [0.003,3.605,1e-5,1e-5,8e-5,0.001]  0.9864  887      [636,1442]
Decomposed  [0.003,3.586,1e-5,1e-5,1e-4,0.001]  0.9866  210      [186,557]
For the two-stage optimization, after optimizing Problems 1 and 2 once, the optimal
value reaches 0.9866. The number of SA in the two-stage optimization is 210 including
eight system analyses in sensitivity analysis, which is only 23% of that used in the
original optimization. The significant reduction in SA is due to the reduction of
dimensionality. In the original problem, six design variables need to be optimized. Although
all the variables contain contradictions at the beginning, three variables with weak
contradictions are selected from the original design variable set in the second
simplification, and the six-dimensional problem is divided into two lower-dimensional
problems with three variables each. The reduction of dimensionality significantly
improves the optimization efficiency. Although the description seems tedious, the
qualitative analysis and dimension reduction are automatically conducted using the
developed algorithm and code.
To illustrate the efficiency of the proposed method, the decomposed problem is
compared with the original problem with the same number of function evaluations. In this
case, the number of function calls is fixed at 250 for both problems. Note that for the
decomposed problem, the maximum number of function calls for Problem 1 is set as
250. If the Problem 1 optimization terminates before 250 function calls, Problem 2
continues to run to reach 250 function evaluations. This test is also repeated 11 times
for both methods and the results are shown in Table 4-15.
Table 4-15: Comparison of optimization results with a fixed number of SA for the power converter problem.
            𝑥∗                                    𝑓∗      Variance of 𝑓∗   # of SA
Original    [0.0028,3.618,9e-6,8e-6,1e-4,0.001]  1.0024  [0.9887,1.0222]  250
Decomposed  [0.0030,3.384,1e-5,1e-5,1e-4,0.001]  0.9865  [0.9864,0.9893]  250
When the number of function evaluations is fixed, optimizing the decomposed problem
obtains better results than optimizing the original problem. For the original problem,
250 function calls are not enough to reach the optimum. For the decomposed problem,
however, Problem 1 usually needs about 200 SAs to find the optimal solution for the
important variables. Then, around 50 function evaluations are used in Problem 2 to
obtain the final optimal results. To summarize, the proposed dimension reduction
method helps achieve better results when the number of function evaluations is fixed.
4.2.2. Aircraft concept design problem
The aircraft concept design problem [66] is used to test the performance of the proposed
method. There are ten design variables (listed in Table 4-16) and three coupled
disciplines (structure, aerodynamics, and propulsion). The objective of the problem is to
maximize the range computed by the Breguet equation. The causal graph is shown in
Figure 4-5. By employing the proposed method, it can be found that variable ℎ has no
contradiction and the upper bound of ℎ is desired. Then, the original problem is divided
into two optimization problems,
Problem 1:
𝑓𝑖𝑛𝑑 𝒙𝑘𝑒 = [𝑀, 𝑇, 𝑆𝑅𝐸𝐹 , 𝑡 𝑐⁄ ,Λ, 𝑥, 𝐶𝑓]𝑇
max𝑅(𝒙𝑘𝑒 , 𝒙𝑟𝑒 , 𝒙𝑢𝑐)
𝑠. 𝑡. 𝑔(𝒙𝑘𝑒 , 𝒙𝑟𝑒 , 𝒙𝑢𝑐) ≤ 0
𝒙𝑘𝑒𝑙𝑏 ≤ 𝒙𝑘𝑒 ≤ 𝒙𝑘𝑒
𝑢𝑏
(4-22)
Problem 2:
𝑓𝑖𝑛𝑑 𝒙𝑟𝑒 = [𝜆, 𝐴𝑅]𝑇
max𝑅(𝒙𝑟𝑒 , 𝒙𝑘𝑒 , 𝒙𝑢𝑐) 𝑠. 𝑡. 𝑔(𝒙𝑟𝑒 , 𝒙𝑘𝑒 , 𝒙𝑢𝑐) ≤ 0 𝒙𝑟𝑒
𝑙𝑏 ≤ 𝒙𝑟𝑒 ≤ 𝒙𝑟𝑒𝑢𝑏
(4-23)
Figure 4-5: Causal graph of the aircraft concept design problem.
When optimizing Problem 1 for the first time, the design variables 𝒙𝑟𝑒 and 𝒙𝑢𝑐 are fixed at
the given values determined by the qualitative analysis. In this case, ℎ = 60000
and 𝐴𝑅 = 2.5. Because 𝜆 has no impact on the objective, 𝜆 is set to its initial
value of 0.25. The details of how the proposed method performs in the aircraft concept
design can be found in [155].
Table 4-16: Design variables in aircraft concept design.

    Variables  Description                    Lower Bound  Upper Bound
1   𝑀          Mach number                    1.4          1.8
2   𝑇          Throttle setting               0.1          1.0
3   𝑆𝑅𝐸𝐹       Wing surface area (ft2)        500          1500
4   𝐴𝑅         Aspect ratio                   2.5          8.5
5   𝑡/𝑐        Thickness/chord ratio          0.01         0.09
6   𝜆          Wing taper ratio               0.1          0.4
7   Λ          Wing sweep (deg)               40           70
8   𝑥          Wingbox x-section area (ft2)   0.9          1.25
9   ℎ          Altitude (ft)                  38000        60000
10  𝐶𝑓         Skin friction coefficient      0.75         1.25
The MATLAB function fmincon(.) is employed to optimize the two problems. The starting
points are selected randomly in the design space. The original problem with ten design
variables is optimized first. The optimal result of the original problem is used as the
stopping criterion in two-stage optimization. Each optimization is run 11 times and the
results are shown in Table 4-17.
Table 4-17: Optimization results of aircraft concept design.
            𝑥∗                                               𝑓∗    # of SA  Variance of # of SA
Original    [1.4,0.265,1500,2.5,0.09,0.1,70,0.9,60000,0.75]  4459  453      [420,724]
Decomposed  [1.4,0.265,1500,2.5,0.09,0.1,70,0.9,60000,0.75]  4459  210      [166,231]
As shown in Table 4-17, after decomposition the two-stage optimization reaches the
same optimal value with 210 function evaluations, which is half the function calls used
in the original optimization. In the original problem, ten design variables need to be
optimized. After employing the causal knowledge to analyze the problem, one finds that
there exists one monotonic variable, so the ten-dimensional problem becomes a
nine-dimensional problem. After simplification, the nine-dimensional problem is divided
into two problems with seven and two variables, respectively. The reduction of
dimensionality significantly improves the efficiency of the optimization.
Then, the two-stage optimization is compared with the original optimization with a fixed
number of SA. In this case, the maximum number of function evaluations is set to be 180
for both optimizations. Each optimization is run 11 times and the median results are
shown in Table 4-18. It can be found that with the fixed number of function evaluations,
the result of the two-stage optimization is much better than the optimal value obtained
from the original problem.
Table 4-18: Comparison of optimization results with a fixed number of SA for the aircraft problem.
            𝑥∗                                               𝑓∗    Variance of 𝑓∗  # of SA
Original    [1.4,0.366,870,2.5,0.09,0.14,70,0.9,50535,0.75]  1885  [951,4413]      180
Decomposed  [1.4,0.263,1500,2.5,0.09,0.1,70,0.9,60000,0.75]  4458  [4458,4459]     180
4.3. Summary
This chapter proposed a dimension reduction method using causal graphs and qualitative
analysis, and the method was applied to two optimization problems to test its
efficiency. Causal graphs are constructed to show the input-output
relationships between variables. To find the variables without contradictions
automatically, a novel design structure matrix (DSM) based qualitative analysis method
is developed. Then, the values of the variables without contradictions can be determined
before optimization and the dimensionality of the problem can be reduced. Taguchi
method is employed to calculate the weight of each relationship and the original problem
is divided into two sub-problems, one with important variables and the other with less
important variables. Thus, the number of variables in each sub-problem is reduced
compared with the original problem. Finally, the two sub-problems are optimized
sequentially to obtain the optimal solution. The proposed method is employed to solve a
power converter design problem and an aircraft concept design problem from the literature,
and the results are compared with those obtained by optimizing the original problem. With
the same optimal value, the efficiency of the proposed method is significantly higher than
that of optimizing the original problem. On the other hand, with the same number
of function calls, the proposed method arrives at a better optimal solution. Nevertheless,
the method reaches its limit if all variables in the optimization problem have
contradictions and no simplifications can be made with the developed approach.
It is to be noted that the only function calls added by the proposed method are
those for the weight calculation; the total number of these calls is limited according to the
problem dimension and the corresponding orthogonal array. The other steps of the proposed
method are only analyses of the causal graph with matrix operations. The associated
cost of those operations is negligible and the operations are automated. Thus, the
dimension reduction method can be performed as a pre-analysis before launching the
optimization.
In addition to assisting dimension reduction, can one use causal relations to build a more
accurate metamodel? The next chapter addresses this question.
Chapter 5. Causal-Artificial Neural Network (Causal-ANN) and its application
To reduce the computational cost in engineering design, expensive high-fidelity
simulation models are approximated by metamodels. Typical metamodeling methods
assume that expensive simulation models are black-box functions. A totally unknown
design space implies that more sample points are needed to gather enough
information to construct an accurate metamodel over the entire design space. To
improve the efficacy of metamodels, knowledge about engineering design problems is
employed to help develop a novel metamodel, named the causal artificial neural network
(causal-ANN). Cause-effect relations intrinsic to the design problem are employed to
decompose an ANN into sub-networks and values of intermediate variables are utilized
to train these sub-networks.
Apart from giving a good prediction, an accurate metamodel can be used in different
applications in engineering design. Considering the structural representation of a causal-
ANN, not only the objective values, but also values of intermediate variables can be
predicted by the causal-ANN. Therefore, combined with the theory of Bayesian
networks [138], the distributions of the variables and objectives can be estimated through the
causal-ANN. By analyzing these distributions, attractive design subspaces, i.e., the
subspaces where the optimal solution is likely to be located, can be identified. In this thesis,
the application of the causal-ANN in identifying attractive design subspaces is also developed.
5.1. Causal ANN and application in attractive sub-space identification
In this section, causal relations are employed to help the neural network construction.
According to the causal graph, the entire network is divided into multiple sub-networks.
Intermediate variables are used together with the design variables and objective to train
each sub-network. The constructed causal-ANN can be used to identify the attractive
sub-spaces where the optimum design may be located. The likelihood of the design variables
can be estimated from the causal-ANN, and the attractive sub-spaces can be selected
through the likelihood distribution. In this section, the process of constructing a causal-ANN
is described and its application in identifying attractive sub-spaces is presented.
Case studies in Section 5.2 give a more detailed description of each step.
5.1.1. Causal artificial neural network
The main challenge of using ANN in engineering design is the large number of sample
points needed to build a reasonably accurate model. Engineers have a certain
understanding of the problem at hand; furthermore, during engineering simulation,
values of some intermediate variables can be obtained from a single simulation along
with the objective value. But those values are typically not employed in constructing
metamodels. In this chapter, causal relations are used to form the structure of the ANN,
and values of the intermediate variables are used in training the ANN. The process of
constructing the causal-ANN is presented as follows.
Step 1. Generate causal relations of the design problem. A high-level causal relation
map (i.e., simplified causal graph) is needed before constructing a causal-ANN. The
simplified causal graph needs inputs, output, and key intermediate variables. Such
intermediate variables can be the coupling variables, the outputs from each discipline, or
variables whose values can be obtained from simulation as by-products. Usually, key
intermediate variables can be selected according to the problem simulation process and
experience of the designers. There are two ways to generate a high-level causal graph.
One simple method is to simplify an existing causal graph. A causal graph of an
engineering problem usually contains all of the variables involved in the problem. By
keeping the key variables and removing others, a causal graph can be simplified to
represent the high-level causal relations. If the causal graph does not exist, knowledge
of the design problem can be used to generate the high-level causal-relations. By
connecting inputs and output with the selected key intermediate variables, a high-level
causal graph can be generated. Case studies in Section 5.2 will show examples.
Step 2. Generate sub-networks according to the causal relations. The high-level causal
graph is divided into multiple sub-graphs, which include only two layers, inputs and
outputs. For example, for the causal graph in Figure 5-1, two sub-graphs can be
generated, [A, B] to C and [C, D] to E.
Figure 5-1: An example of a high-level causal graph.
Step 3. Construct neural networks on the sub-graphs. Apart from causal relations, other
knowledge can also be employed in the causal-ANN. According to the kind of knowledge
applied in the problem, causal-ANNs can be divided into three categories.
The first category is when cheap models, i.e., mathematical models or inexpensive
simulation models, exist as part of the prediction model. Since the network is divided into
several sub-nets in the causal-ANN, some of the sub-nets can be replaced by the
existing cheap models. Taking the problem in Figure 5-1 as an example, if a
computationally expensive model is needed to calculate C while calculating E from [C, D]
is cheap, then a causal-ANN with a cheap model can be constructed as shown
in Figure 5-2. Thus, there will be one ANN to be trained with two inputs. By involving the
cheap model in the causal-ANN, the accuracy of the prediction model can be improved.
Additionally, the reduced number of weights in the causal-ANN can reduce the training
cost. Moreover, the incorporation of cheap models in causal-ANN has negligible
computational overhead for ANN training.
Figure 5-2: Causal-ANN with a cheap model.
The second category of causal-ANN is when values of the intermediate variables can be
obtained as a by-product of the output evaluation, usually from running expensive
simulations. Thus, the causal-ANN can be divided into multiple independent sub-nets.
For the example problem shown in Figure 5-1, if the value of variable C can be obtained
from simulation, then two separate sub-nets can be constructed. If a sub-ANN is
between intermediate variables and the objective (e.g., [C, D] to E in Figure 5-3), the
actual values of the intermediate variable C are used as the inputs of the ANN. In
contrast, if one builds a single system causal-ANN, after the model is constructed and used
for prediction, values of the intermediate variables are only estimated from previous
layers of the network instead of taking their actual values. For this category of causal-ANN,
the complex prediction model can be divided into multiple sub-networks of lower
complexity, which may lead to higher accuracy of each sub-network.
Figure 5-3: Two separate sub-networks.
The last category of causal-ANN is when both the values of the intermediate variables
and cheap models are available in the problem. Thus, the entire causal-ANN can be
divided into multiple sub-networks and some of them can be replaced by cheap
models, as shown in Figure 5-4. The number of sub-networks to be trained is then
reduced.
Figure 5-4: Causal-ANN with known intermediate variables and cheap models.
The purpose of the causal-ANN construction method is to employ knowledge in
building more accurate metamodels. The structure of the ANN is determined
according to the causal relations and the problem is divided into several sub-ANNs. Then,
the sub-ANNs are trained based on values of the intermediate variables. The main
advantage of the causal-ANN is the reduced complexity of each ANN. It is often difficult
to train an ANN to approximate large-scale nonlinear problems. Thus, by dividing the
entire network into several sub-networks, the complexity of each network is reduced and
the accuracy of the entire model can be improved. Furthermore, by generating sample
points from the neural network, Bayesian probability inference can be performed at a
lower computational cost than on the actual simulation.
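The chained sub-model idea for the Figure 5-1 graph can be sketched as follows: two separate sub-models [A, B] → C and [C, D] → E, where the recorded intermediate values of C (a simulation by-product) serve as training inputs for the second sub-model. Linear least-squares models stand in for the sub-networks, and the toy relations C = 2A + B and E = C − 3D are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, D = rng.uniform(0, 1, (3, 50))   # design variables from 50 "simulations"
C = 2 * A + B                          # intermediate variable, logged per run
E = C - 3 * D                          # final output

def fit_linear(X, y):
    """Fit y = X w + b by least squares; return a predictor function."""
    Xb = np.column_stack([X, np.ones(len(y))])       # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xnew: np.column_stack([Xnew, np.ones(len(Xnew))]) @ w

sub1 = fit_linear(np.column_stack([A, B]), C)        # sub-model [A, B] -> C
sub2 = fit_linear(np.column_stack([C, D]), E)        # trained on ACTUAL C values

# Prediction chains the two sub-models: C is first estimated from (A, B).
a, b, d = 0.3, 0.5, 0.2
c_hat = sub1(np.array([[a, b]]))
e_hat = sub2(np.column_stack([c_hat, [d]]))
print(float(e_hat[0]))  # close to (2*0.3 + 0.5) - 3*0.2 = 0.5
```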
5.1.2. Attractive sub-space identification method
The distribution of the obtained objective values is one kind of important information for
guiding sampling and performing optimization. In the Mode-Pursuing Sampling (MPS)
method [6], a large number of cheap samples are generated by evaluating the
metamodel, and the distribution of the objective values is estimated through those cheap
samples and their responses. Then, new sample points are generated following the
distribution of the objective values to balance exploration and exploitation.
Bayesian network is one kind of belief graphic modeling method that gives the joint
distribution of each variable. By constructing a Bayesian network of the engineering
design problem, the distribution of the objective 𝑝(𝑓|𝒙, 𝐷, 𝐺) can be found, where f is the
objective, 𝒙 is the design variables, D is the data and G is the graph structure. After
obtaining the distribution of the objective, the likelihood of the objective 𝑝(𝒙|𝑓, 𝐷, 𝐺) can
be calculated via Bayes' theorem as shown in the following equation:

𝑝(𝒙|𝑓, 𝐷, 𝐺) = 𝑝(𝑓|𝒙, 𝐷, 𝐺) 𝑝(𝒙) / 𝑝(𝑓)          (5-1)
where 𝑝(𝒙) and 𝑝(𝑓) are the distributions of the design variables and the objective,
respectively, which can be estimated by analyzing the sample data. In general, for
a given engineering design problem, the designers or decision makers often have an
expected objective value or range. The likelihood of the objective gives information
about which area (or range) of the design variables has a higher probability of generating
the expected designs. Details of the method are as follows.
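On discretized samples, Eq. (5-1) reduces to counting: the likelihood p(x | f in expected range) is the fraction of samples in each interval of x among those whose objective falls in the range. The 1-D toy model f = (x − 0.7)² below is an assumption standing in for the causal-ANN predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 20000)      # cheap samples of the design variable
f = (x - 0.7) ** 2                # toy stand-in for the metamodel response

n = 10                                            # intervals per variable
x_bin = np.minimum((x * n).astype(int), n - 1)    # interval index 0 .. n-1
good = f < 0.01                                   # expected objective range

# p(x = a | f in range) = N(x = a and f in range) / N(f in range)
likelihood = np.array([(good & (x_bin == a)).sum() for a in range(n)]) / good.sum()
print(likelihood.round(2))  # mass concentrates in the bins around x = 0.7
```

The two intervals around x = 0.7 carry essentially all the likelihood mass, identifying them as the attractive sub-space.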
Step 1. Generate sample points. Sample points are generated following the uniform
distribution (or other variable distributions if known). The causal-ANN model is evaluated
to calculate the responses of the sample points. Note that the responses include the
objectives and also the intermediate variables.
Step 2. Discretize all the variables and the objective. Most BNs deal only with discrete
variables, while the variables in design problems are usually continuous. One way to
deal with this is to discretize the variables and the objective.
At the beginning, all the variables including inputs, intermediate variables, and outputs
are assumed to follow uniform distribution. Then, the range of each variable is divided
into n intervals with certain indices, as shown in Figure 5-5.
[Diagram not reproduced: the range of 𝑥𝑖 from 𝑙𝑏 to 𝑢𝑏 is divided into 𝑛 intervals indexed 1 to 𝑛.]
Figure 5-5: Variable discretization.
𝑙𝑏 and 𝑢𝑏 are the lower and upper bounds of the variable, respectively. If a sample falls
between 𝑚(𝑢𝑏 − 𝑙𝑏)/𝑛 + 𝑙𝑏 and (𝑚 + 1)(𝑢𝑏 − 𝑙𝑏)/𝑛 + 𝑙𝑏, 𝑚 = 0, … , 𝑛 − 1, the index of
the sample is 𝑚 + 1. Note that when a variable does not have fixed lower and upper
bounds, a rough bound can be determined and two additional sections are added, one
below the lower bound and one above the upper bound, as shown in Figure 5-6.
[Diagram not reproduced: two extra intervals with indices 0 and 𝑛 + 1 are added below 𝑙𝑏 and above 𝑢𝑏.]
Figure 5-6: Discretization for a variable without fixed bounds.
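The interval indexing of Step 2, including the two extra sections of Figure 5-6 for variables without fixed bounds, can be sketched as follows. This is an illustrative helper, not code from the thesis:

```python
def discretize(x, lb, ub, n):
    """Map a sample value x to an interval index as in Step 2.

    Indices 1..n cover [lb, ub]; for variables without fixed bounds,
    values below lb map to 0 and values above ub map to n + 1
    (the two extra sections of Figure 5-6).
    """
    if x < lb:
        return 0
    if x > ub:
        return n + 1
    if x == ub:  # put the upper bound itself in the last regular interval
        return n
    m = int(n * (x - lb) / (ub - lb))  # m = 0, ..., n - 1
    return m + 1
```

For a bounded variable the extra indices 0 and n + 1 are simply never produced.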
Step 3. Calculate the joint probability of the objective, 𝑝(𝑓|𝒙, 𝐷, 𝐺). An approximate
inference method is employed to generate the conditional distribution of each variable,
𝑝(𝑥𝑖|𝑃𝑥𝑖, 𝐷, 𝐺), where 𝑥𝑖 is an intermediate variable and 𝑃𝑥𝑖 denotes the parents of 𝑥𝑖.
The conditional distribution can be calculated as

𝑝(𝑥𝑖 = 𝑎|𝑃𝑥𝑖 = 𝑏, 𝐷, 𝐺) = 𝑁𝑥𝑖=𝑎,𝑃𝑥𝑖=𝑏 / 𝑁𝑃𝑥𝑖=𝑏 (5-2)
where 𝑁𝑃𝑥𝑖=𝑏 is the number of samples with 𝑃𝑥𝑖 = 𝑏, and 𝑁𝑥𝑖=𝑎,𝑃𝑥𝑖=𝑏 is the number of
samples with both 𝑥𝑖 = 𝑎 and 𝑃𝑥𝑖 = 𝑏. Because the design variables are generated
following the uniform distribution, the prior probability of each design variable can be
calculated as 𝑝(𝑥 = 𝑎) = 1/𝑛. Then, the joint probability of the objective can be
calculated as follows
𝑝(𝑓 = 𝑎|𝑥, 𝐷, 𝐺) = ∑𝑖1=1…𝑛1 ⋯ ∑𝑖𝑘=1…𝑛𝑘 ∑𝑖𝑥=1…𝑛𝑥 (𝑝(𝑓 = 𝑎|𝑃𝑥𝑖1)𝑝(𝑃𝑥𝑖1|𝑃𝑥𝑖2) ⋯ 𝑝(𝑃𝑥𝑖𝑘|𝑥)𝑝(𝑥))
= ∑𝑖1=1…𝑛1 (𝑝(𝑓 = 𝑎|𝑃𝑥𝑖1) ⋯ ∑𝑖𝑘=1…𝑛𝑘 𝑝(𝑃𝑥𝑖𝑘|𝑥) ∑𝑖𝑥=1…𝑛𝑥 𝑝(𝑥)) (5-3)
where 𝑛𝑘 is the number of discrete intervals of each parent variable (i.e., intermediate
variable), and 𝑛𝑥 is the number of discrete intervals of the design variables. By counting
the data and analyzing the Bayesian network, the joint probability of the objective can
be estimated.
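The counting estimator of Eq. (5-2) amounts to tabulating interval indices. A minimal sketch over discretized samples follows; `conditional_table` and its column-index arguments are illustrative helpers, not code from the thesis:

```python
from collections import Counter

def conditional_table(child_idx, parent_idx, samples):
    """Estimate p(child = a | parent = b) by counting, as in Eq. (5-2).

    `samples` is a sequence of tuples of interval indices (one tuple per
    sample point); child_idx and parent_idx select the two columns.
    """
    joint = Counter((s[child_idx], s[parent_idx]) for s in samples)
    marg = Counter(s[parent_idx] for s in samples)
    return {(a, b): n / marg[b] for (a, b), n in joint.items()}
```

Chaining such tables along the graph, as in Eq. (5-3), then yields the joint probability of the objective.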
Step 4. Estimate the likelihood of the design variables and find the interesting area of
each variable. The likelihood is estimated according to the Bayesian theorem. 𝑝(𝑓) is
estimated through the function, 𝑝(𝑓 = 𝑎) =𝑁𝑓=𝑎
𝑁, where 𝑁 is the number of samples and
𝑁𝑓=𝑎 is the number of samples where the objective value falling in the section 𝑎. Finally,
the likelihood of the design variable is estimated via (5-3). The interval with the largest
likelihood of the design variables is selected as the interesting area.
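Assuming the per-interval probabilities 𝑝(𝑓 = 𝑎|𝑥 = 𝑏) (e.g., from Eq. (5-3)), the prior 𝑝(𝑥 = 𝑏), and 𝑝(𝑓 = 𝑎) have already been estimated, Step 4 reduces to Bayes' theorem plus an argmax. The function name and dictionary layout below are illustrative:

```python
def interesting_interval(p_f_given_x, p_x, p_f):
    """Apply Eq. (5-1) per interval b of one design variable for a fixed
    objective interval a, then return the interval with the largest
    likelihood (the "interesting area") and the full likelihood map.
    p_f_given_x[b] = p(f = a | x = b), p_x[b] = p(x = b), p_f = p(f = a).
    """
    likelihood = {b: p_f_given_x[b] * p_x[b] / p_f for b in p_f_given_x}
    best = max(likelihood, key=likelihood.get)
    return best, likelihood
```

Running this once per design variable and collecting the winning intervals forms the interesting design sub-space.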
Note that when one variable has multiple parents, the correlations among those parents
should be considered. However, estimating the joint distribution with these correlations
requires a huge number of samples to cover all possible combinations of the parents.
One workaround is to assume that the probability distribution given each parent is
independent. For example, if A and B are the parents of C, the distributions 𝑝(𝐶|𝐴) and
𝑝(𝐶|𝐵) are calculated independently. However, ignoring the correlations between
parents may lead to wrong likelihood estimates when those correlations are strong.
Therefore, a method named "Noisy-or" is employed to estimate the probability
distribution. In the Noisy-or method, the joint distribution given multiple parents is
calculated as

𝑃(𝑓 = 𝑎|𝑥1, 𝑥2, … , 𝑥𝑛) = 1 − ∏𝑖=1…𝑛 𝑃(𝑓 ≠ 𝑎|𝑥𝑖) (5-4)

With the Noisy-or method, the joint distribution accounting for correlation can be
estimated from the single-parent distributions, which reduces the number of required
samples significantly.
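Eq. (5-4) reduces to a one-line product over the single-parent estimates; a minimal sketch:

```python
def noisy_or(p_not_a_given_parent):
    """Eq. (5-4): combine single-parent estimates P(f != a | x_i) into
    P(f = a | x_1, ..., x_n) under the Noisy-or assumption."""
    prod = 1.0
    for p in p_not_a_given_parent:
        prod *= p
    return 1.0 - prod
```

Only n single-parent tables are needed instead of one table over all parent combinations, which is the source of the sample savings.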
By comparing the likelihoods of the intervals, the interesting sub-space can be
determined. However, the number of samples needed for likelihood estimation is usually
very large. In this work, I use the causal-ANN to generate the samples, and thus the
computational cost of attractive sub-space identification is negligible.
5.2. Case studies

5.2.1. Power converter design problem

The power converter design problem [152], [153] used in Chapter 4 is employed to test
the performance of the proposed method. The design problem has six design variables,
as shown in Table 5-1. The objective of the problem is to minimize the weight of the
power converter.
Table 5-1: Design variables in power converter design.
Variable   Name          Description                   Lower Bound   Upper Bound
𝑥1         𝐶𝑤            Core center leg width (m)     0.001         0.1
𝑥2         𝑇𝑢𝑟𝑛𝑠         Inductor turns                1.0           10
𝑥3         𝐴𝑐𝑝           Copper size (m2)              7.29e-8       1.0e-5
𝑥4         𝐿𝑓/𝑃𝐼𝑁𝐷𝑈𝐶     Inductance (H)                1.0e-6        1.0e-5
𝑥5         𝐶𝑓            Capacitance (F)               1.0e-5        0.01
𝑥6         𝑤𝑤            Core window width (m)         0.001         0.01
min 𝑦1 = 𝑊𝑐 + 𝑊𝑤 + 𝑊𝑐𝑎𝑝 + 𝑊ℎ𝑠 (5-5)
[Graph not reproduced: the six design variables x1–x6 connect through 21 intermediate variables (e.g., DELI, XMLT, ZP1, CIRMS, XIRMS, XIMIN, XIP, ESR, PQ, PD, POF, y2, y3, y5–y8, Wc, Ww, Wcap, Whs) to the objective y1, with +1/−1 edge polarities.]
Figure 5-7: Causal graph of the power converter problem.
The problem is mainly dominated by the coupling between 𝑦2 and 𝑦3, where 𝑦2 is the
circuit efficiency and 𝑦3 is the duty cycle. It is to be noted that this problem comes with
mathematical expressions, which are used to build the causal graph shown in Figure
5-7. In most engineering problems, equations are not available, and designers should
instead use their knowledge to construct a causal graph. The six variables on the left
side are the design variables and the one on the right side is the objective. There are 21
intermediate variables in the problem, listed between the design variables and the
objective in Figure 5-7. As shown in the figure, 𝑦2 is influenced by 𝑦3 through different
routes and 𝑦2 influences 𝑦3 directly. All the design variables are involved in these loops
through different links and finally influence the objective.
Constructing causal-ANN
The causal graph can be simplified to generate a high-level causal graph. Since the
objective of the problem is to minimize the total mass of the power converter, the
masses of the four components, i.e., 𝑊𝑐, 𝑊𝑤, 𝑊𝑐𝑎𝑝, and 𝑊ℎ𝑠, can be outputs from the
simulation. Additionally, the circuit efficiency 𝑦2, as one of the coupled variables, can
also be an output from the simulation. The simplified causal graph is shown in Figure 5-8.
[Graph not reproduced: the design variables x1–x6 feed y2, Wc, Ww, and Wcap; y2 feeds Whs; and the component masses feed the objective y1.]
Figure 5-8: Simplified causal graph of the power converter design problem.
According to the simplified causal graph, the network is divided into six sub-networks, as
shown in Figure 5-9. Note that the objective is the sum of the component masses. Thus,
the third category of causal-ANN can be constructed, which means constructing ANNs
for the first to the fifth sub-networks and using Eq. (5-5) for the sixth. For the fourth ANN,
the actual values of 𝑦2 are used as its input. Each sub-network has two hidden layers
with four hidden neurons per layer. The activation function is the tangent sigmoid
function.
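The thesis trains each sub-network with the Matlab neural network toolbox. As a language-neutral illustration, the forward pass of such a sub-network (two tanh hidden layers of four neurons, linear output) can be sketched in Python with NumPy; the random weights stand in for trained ones, and the input shape (two inputs, as in a hypothetical (x1, x6) → Wc sub-network) is only an example:

```python
import numpy as np

def subnet_forward(x, weights, biases):
    """Forward pass of one causal-ANN sub-network: hidden layers use the
    tangent sigmoid (tanh), and the output layer is linear."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]

# Two hidden layers of four neurons each, two inputs, one output.
rng = np.random.default_rng(0)
shapes = [(2, 4), (4, 4), (4, 1)]
weights = [rng.normal(size=s) for s in shapes]
biases = [np.zeros(s[1]) for s in shapes]
y = subnet_forward(np.array([[0.05, 0.005]]), weights, biases)
```

In the full causal-ANN, the output of an upstream sub-network (here, e.g., y2) would be fed as an input to the downstream one.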
[Sub-networks not reproduced: (x1, x6) → Wc; (x1, x2, x3) → Ww; (x1, …, x6) → y2; y2 → Whs; x5 → Wcap; (Wc, Ww, Whs, Wcap) → y1.]
Figure 5-9: Six sub-networks for the power converter design problem.
In this case, 100, 200, and 500 sample points are generated by Latin hypercube design
to train the causal-ANN model. The Matlab neural network toolbox is employed to
construct the ANNs. To test the accuracy, another 2,000 samples are generated, and
the 𝑅2 value and mean absolute error (MAE) defined in Eqs. (5-6) and (5-7) are
calculated, where 𝑓𝑖 is the actual output value, 𝑓̂𝑖 is the predicted value, 𝑓̅ is the average
of the actual outputs, and 𝑁 is the number of test samples. Additionally, an RBF model
and an ANN with two hidden layers and four hidden neurons per layer are constructed
on the same training sample sets. Their 𝑅2 values and MAE are compared with those of
the causal-ANN in Table 5-2.

𝑅2 = 1 − ∑𝑖(𝑓𝑖 − 𝑓̂𝑖)2 / ∑𝑖(𝑓𝑖 − 𝑓̅)2 (5-6)

𝑀𝐴𝐸 = (1/𝑁) ∑𝑖=1…𝑁 |𝑓𝑖 − 𝑓̂𝑖| (5-7)
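Eqs. (5-6) and (5-7) can be computed directly from the test predictions; a small NumPy sketch (the function name is illustrative):

```python
import numpy as np

def r2_mae(f_true, f_pred):
    """R^2 (Eq. 5-6) and mean absolute error (Eq. 5-7) over test samples."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    ss_res = np.sum((f_true - f_pred) ** 2)
    ss_tot = np.sum((f_true - f_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.mean(np.abs(f_true - f_pred)))
```

An 𝑅2 close to 1 and a small MAE both indicate an accurate metamodel; 𝑅2 can be negative when the prediction is worse than the mean of the data, which is how the failed causal-ANNs in Table 5-16 show up.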
Table 5-2: Comparison of accuracy among three metamodels.
# of samples Criteria Causal-ANN ANN RBF
100 𝑅2 0.634 0.217 0.372
MAE 33.3 56.1 69.3
200 𝑅2 0.949 0.718 0.410
MAE 10.6 30.2 67.7
500 𝑅2 0.965 0.878 0.484
MAE 8.8 13.1 61.5
As shown in
Table 5-2, the 𝑅2 value of the causal-ANN is the highest among the three metamodeling
methods, and the causal-ANN also has the smallest MAE, which shows that it is the
most accurate metamodel. The lower 𝑅2 values of the ANN and RBF are caused by the
high non-linearity of the problem, especially for the ANN. Additionally, the accuracy of all
three metamodels increases with the number of training samples. Notably, the causal-
ANN with 200 training samples is more accurate than the ANN and RBF with 500
samples. To further illustrate the performance of the causal-ANN, the 𝑅2 value and MAE
of each sub-network are shown in Table 5-3. All sub-networks are accurate. Note that
the third sub-network maps all six design variables to 𝑦2 and thus has the same number
of inputs as the entire design problem; nevertheless, its accuracy is much higher than
that of the whole-model metamodels. By dividing the entire network into sub-networks to
reduce the complexity of each one, the accuracy of each sub-network can be improved.
Table 5-3: Accuracy of each sub-network.
# of samples   Criteria   𝑦2      𝑊𝑐      𝑊𝑤      𝑊ℎ𝑠     𝑊𝑐𝑎𝑝
100            𝑅2         0.770   0.999   0.997   0.634   0.980
               MAE        0.023   1e-4    3.824   33.2    7e-05
200            𝑅2         0.883   0.999   0.988   0.916   0.988
               MAE        0.015   9e-05   8e-05   10.6    6e-05
500            𝑅2         0.988   0.999   0.971   0.919   0.990
               MAE        0.005   8e-05   1e-4    8.84    5e-05
Attractive sub-space identification
After constructing the causal-ANN, the probability distribution of the objective values and
the likelihood of the design variables can be estimated on samples generated from the
causal-ANN. In this test, 200 samples are used to train the causal-ANN. First, the design
variables, the intermediate variables, and the objective are discretized. For this case, the
upper and lower bounds are used to determine the intervals of the design variables,
while for the intermediate variables and the objective, the minima and maxima are used
to determine the interval boundaries. All the variables and the objective are divided into
five intervals based on their own bounds.
The objective of the power converter problem is to minimize the mass, so a smaller
objective value is desired. Therefore, the first interval of the objective, i.e., 𝑦 = 1, is
selected, and the conditional probability 𝑃(𝑦 = 1|𝒙) and likelihood 𝑃(𝒙|𝑦 = 1) are
estimated. Considering the correlations among the six design variables, the Noisy-or
method is employed, and the probability distribution of each design variable,
𝑃(𝑦 ≠ 1|𝑥𝑖), 𝑖 = 1, 2, … , 6, is calculated. To estimate the probability distribution and the
likelihood, 10,000 samples are generated from both the actual model and the causal-
ANN. The probability distributions estimated from the actual model and the prediction
model, 𝑃(𝑦 ≠ 1|𝑥𝑖) and 𝑃𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛(𝑦 ≠ 1|𝑥𝑖), are shown in Table 5-4 and Table 5-5,
where 𝑥𝑖 = 1 means the sample falls in the first interval of 𝑥𝑖.
Table 5-4: Probability distribution 𝑷(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 on the actual model.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0       0.0005  0.038   0.004   0.007   0.007
𝑥𝑖 = 2   0.002   0.002   0       0.0095  0.009   0.006
𝑥𝑖 = 3   0.0055  0.0075  0       0.009   0.012   0.009
𝑥𝑖 = 4   0.0105  0.014   0       0.011   0.006   0.007
𝑥𝑖 = 5   0.02    0.014   0       0.0045  0.004   0.009
Table 5-5: Probability distribution 𝑷𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 on the causal-ANN.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0       0.0005  0.038   0.004   0.007   0.007
𝑥𝑖 = 2   0.002   0.002   0       0.0095  0.009   0.006
𝑥𝑖 = 3   0.0055  0.0075  0       0.009   0.012   0.009
𝑥𝑖 = 4   0.0105  0.014   0       0.011   0.006   0.007
𝑥𝑖 = 5   0.02    0.014   0       0.0045  0.004   0.009
As shown in both tables, the probability distribution estimated from the prediction model
is the same as that calculated from the actual model, which means the causal-ANN can
estimate the distribution accurately. However, in some cases the probability is equal to
zero, for example when 𝑥3 = 2, meaning that whenever the third coordinate of a sample
falls in the second interval, the objective value falls in its first interval. This is caused by
the distribution of the objective values: with the upper bound of the objective set at the
maximum value, over 95% of the objective values fall in the first interval. Such an ill-
defined boundary of the objective may render the likelihood estimation useless, because
the likelihood of some intervals may reach 100% according to Eq. (5-1). Therefore, the
upper bound of the objective should be reduced to avoid zeros in the probability
distribution. In this case, 11 is selected as the upper bound according to the distribution
of the objective values, and the objective is then discretized into six intervals. The first
interval of the objective is still the desired space. The probability distributions estimated
on the actual model and the causal-ANN are shown in Table 5-6 and Table 5-7.
Table 5-6: Probability distribution 𝑷(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 with new upper bound.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0.0505  0.5775  0.7915  0.6455  0.6290  0.6035
𝑥𝑖 = 2   0.2245  0.6075  0.6315  0.6470  0.6345  0.6160
𝑥𝑖 = 3   0.9270  0.6580  0.6215  0.6360  0.6295  0.6570
𝑥𝑖 = 4   1       0.6695  0.5780  0.6425  0.6455  0.6435
𝑥𝑖 = 5   1       0.6895  0.5795  0.6310  0.6635  0.6820
Table 5-7: Probability distribution 𝑷𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 with new upper bound.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0.0495  0.5820  0.7915  0.6445  0.6290  0.6035
𝑥𝑖 = 2   0.2215  0.6060  0.6295  0.6465  0.6355  0.6145
𝑥𝑖 = 3   0.9265  0.6540  0.6200  0.6340  0.6290  0.6530
𝑥𝑖 = 4   1       0.6670  0.5765  0.6420  0.6420  0.6435
𝑥𝑖 = 5   1       0.6885  0.5800  0.6305  0.6620  0.6830
As shown in Table 5-6 and Table 5-7, the probability distributions estimated from the
prediction model are close to those estimated from the actual model. Note that only 200
expensive points are used to construct the causal-ANN, and the probability estimation is
performed on the causal-ANN, whose cost is negligible.
Then, by employing the Noisy-or method and Bayes' theorem, the interval of each
design variable with the largest likelihood can be determined, as shown in Table 5-8. In
the table, the number for each design variable represents its interval. As in the
comparison above, the likelihood is estimated on both the actual and predicted models.
Additionally, the interval in which the optimum is located is also listed in the table. The
interesting interval generated from the prediction model is the same as that from the
actual model, and it matches the interval where the actual optimum is located, except for
𝑥2. This is because the second design variable of the optimum point lies near the
boundary of the first and second intervals, and the likelihood distribution cannot capture
it accurately.
Table 5-8: Interesting interval with the largest likelihood.
                   𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6
Actual model       1    1    5    5    1    1
Predicted model    1    1    5    5    1    1
Optimal solution   1    2    5    5    1    1
5.2.2. Aircraft concept design problem
The aircraft concept design problem [66] is also used to test the performance of the
proposed method. There are nine design variables (listed in Table 5-9) and three
coupled disciplines (structure, aerodynamics, and propulsion). The objective of the
problem is to maximize the range computed by the Breguet equation. The causal graph
is shown in Figure 5-10.
[Graph not reproduced: the nine design variables connect through intermediate variables such as WT, WF, D, SFC, L/D, CL, CD, and ESF to the range R, with +1/−1 edge polarities.]
Figure 5-10: Causal graph of the aircraft concept design problem.
Table 5-9: Design variables in aircraft concept design.
Variable   Symbol   Description                     Lower Bound   Upper Bound
1          𝑀        Mach number                     1.4           1.8
2          𝑇        Throttle setting                0.1           1.0
3          𝑆𝑅𝐸𝐹     Wing surface area (ft2)         500           1500
4          𝐴𝑅       Aspect ratio                    2.5           8.5
5          𝑡/𝑐      Thickness/chord ratio           0.01          0.09
6          𝜆        Wing taper ratio                0.1           0.4
7          Λ        Wing sweep (°)                  40            70
8          𝑥        Wingbox x-section area (ft2)    0.9           1.25
9          𝐶𝑓       Skin friction coefficient       0.75          1.25
To simplify the causal graph, the two coupled variables, the total weight of the aircraft
𝑊𝑇 and the drag 𝐷, are selected as intermediate variables. Also, the weight of the fuel
(𝑊𝐹) from the structural discipline and the specific fuel consumption (𝑆𝐹𝐶) from the
propulsion discipline are selected as the other two intermediate variables. The simplified
causal graph is shown in Figure 5-11.
[Graph not reproduced: the nine design variables feed the intermediate variables WT, SFC, WF, and D, which feed the range R.]
Figure 5-11: Simplified causal graph for aircraft concept design.
[Sub-networks not reproduced: (M, x, λ, AR, Sref, Cf, t/c, Λ, T) → WT; (M, T) → SFC; (M, x, λ, AR, Sref, Cf, t/c, Λ, T) → D; (AR, Sref, t/c) → WF; (WT, SFC, WF, D) → R.]
Figure 5-12: Sub-networks for aircraft concept design.
The simplified causal graph can be divided into five sub-networks, as shown in Figure
5-12. Because the actual values of the intermediate variables can be obtained from one
simulation run and no simple equations exist, this problem belongs to the second
category of causal-ANN. ANNs with two hidden layers and four hidden neurons per layer
are constructed for the five sub-networks. The activation function is the tangent sigmoid
function. Note that for the fifth network, the actual values of 𝑊𝑇, 𝑊𝐹, 𝑆𝐹𝐶, and 𝐷 are
used as its inputs. 100, 200, and 500 training samples are generated from the simulation
model. The Matlab toolbox is employed to train the neural networks. 2,000 testing points
are generated, and the 𝑅2 value and MAE are calculated to illustrate the estimation error
of the causal-ANN. Additionally, the accuracies of an RBF model and an ANN built on
the entire problem are calculated for comparison, as shown in Table 5-10. The causal-
ANN is more accurate than the ANN and RBF when the number of samples is 100 or
200, and comparable with them at 500 samples. In this case, the ANN and RBF are also
accurate, given their high 𝑅2 values. As the number of training samples increases, the
accuracy of the causal-ANN increases. Table 5-11 gives the 𝑅2 value and MAE of each
sub-network. The sub-network between all design variables and 𝑊𝑇 has the lowest
accuracy, which brings down the overall accuracy of the causal-ANN. The reason for
this lower accuracy is that the coupling among the three disciplines is involved in this
sub-network, which increases the complexity of the sub-problem.
Table 5-10: Comparison of accuracy among three metamodels.
# of samples   Criteria   Causal-ANN   ANN     RBF
100            𝑅2         0.905        0.797   0.902
               MAE        74.6         111.5   78.6
200            𝑅2         0.968        0.943   0.940
               MAE        41.7         49.7    52.7
500            𝑅2         0.980        0.990   0.987
               MAE        35.4         23.1    25.7
Table 5-11: Accuracy of each sub-network.
# of samples   Criteria   𝑊𝑇       𝑊𝐹      𝑆𝐹𝐶     𝐷
100            𝑅2         0.743    0.988   0.997   0.956
               MAE        6128.9   252.6   0.015   326.0
200            𝑅2         0.906    0.987   0.983   0.980
               MAE        3217.3   253.3   0.009   103.7
500            𝑅2         0.931    0.993   0.999   0.997
               MAE        1696.8   227.7   0.009   67.4
Once the causal-ANN is constructed with 200 training samples, the likelihood is
estimated based on samples generated from the neural network. To illustrate the
performance of the likelihood estimation on the neural network, 10,000 testing samples
are generated on the actual model and the causal-ANN. The design variables,
intermediate variables, and the objective are discretized into five intervals. As the
objective is to maximize the range, the fifth interval of the objective is desired. To
estimate the likelihood through the Noisy-or method, the probability distribution
𝑃(𝑦 ≠ 5|𝑥𝑖), 𝑖 = 1, 2, … , 9, is estimated on the actual model and the causal-ANN, as
shown in Table 5-12 and Table 5-13. The probability distribution estimated from the
causal-ANN is similar to that from the actual model. Then, the likelihood is calculated via
Bayes' theorem, and the interval with the largest likelihood is listed in Table 5-14,
together with the interval in which the optimal solution is located. The interesting interval
generated from the causal-ANN is the same as that obtained from the actual model, and
it is exactly where the optimal solution is located. Therefore, by employing the causal-
ANN and the likelihood estimation method, interesting design subspaces of the problem
can be detected with few expensive function evaluations.
Table 5-12: Probability distribution 𝑷(𝒚 ≠ 𝟓|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟗 on the actual model.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6      𝑥7      𝑥8      𝑥9
𝑥𝑖 = 1   1       0.9955  0.9950  1       1       0.9965  0.9990  0.9985  0.9975
𝑥𝑖 = 2   1       0.9980  0.9995  1       0.9990  1       0.9990  0.9985  0.9960
𝑥𝑖 = 3   0.9990  1       0.9990  1       0.9995  0.9990  0.9965  0.9980  1
𝑥𝑖 = 4   0.9970  1       1       0.9995  0.9985  0.9995  0.999   0.9985  1
𝑥𝑖 = 5   0.9975  1       1       0.9940  0.9965  0.9985  1       1       1
Table 5-13: Probability distribution 𝑷𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏(𝒚 ≠ 𝟓|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟗 on the causal-ANN.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6      𝑥7      𝑥8      𝑥9
𝑥𝑖 = 1   1       0.9875  0.9830  1       0.9965  0.9905  0.9920  0.9940  0.9900
𝑥𝑖 = 2   1       0.9930  0.9945  1       0.9955  0.9975  0.9950  0.9955  0.9875
𝑥𝑖 = 3   0.9985  0.9965  0.9980  1       0.9985  0.9960  0.9940  0.9940  0.9975
𝑥𝑖 = 4   0.9880  0.9990  0.9990  0.9995  0.9950  0.9950  0.9980  0.9940  1
𝑥𝑖 = 5   0.9885  0.9990  1       0.9755  0.9895  0.9960  0.9960  0.9975  1
Table 5-14: Interesting interval with the largest likelihood.
                   𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6   𝑥7   𝑥8   𝑥9
Actual model       5    1    1    5    5    1    1    1    2
Predicted model    5    1    1    5    5    1    1    1    2
Optimal solution   5    1    1    5    5    1    1    1    2
5.2.3. Discussion
Generation of high-level causal graph
The causal relations are employed as the primary knowledge in the causal-ANN.
However, it is hard to generate an accurate and complete causal graph. In this thesis,
only a high-level causal graph including key intermediate variables is needed to
represent the cause-effect relations in the design problem. As described in Section 5.1,
finding the important intermediate variables is the key step in constructing a high-level
causal graph. One criterion for selecting intermediate variables is whether the variable
value can be calculated or is an output from the simulation; such intermediate variables
can be called by-product variables. In general, the coupling variables, the outputs of
each discipline, and the by-product variables can be selected as key intermediate
variables in the high-level causal graph. Another suggestion is to simplify the structure of
an existing causal graph. Involving many variables in the causal graph may cause
difficulty in constructing the causal-ANN; thus, a causal graph with at most two
intermediate layers is recommended. Additionally, for problems with coupling loops, one
variable in each coupling relation is selected to avoid loops in the causal graph, since
BNs cannot deal with coupling well. Finally, the intermediate variables that have a direct
and prominent impact on the objective are usually selected as key variables.
In this chapter, complete causal graphs exist for the two case study problems. Thus, the
high-level causal relations can be generated by simplifying the causal graphs. For the
power converter problem, since the objective is to minimize the total weight of the
converter, the weight of each component can be selected as a key intermediate
variable. Also, one of the coupling variables, the circuit efficiency (𝑦2), is kept in the
high-level causal graph. For the aircraft design problem, the total weight of the aircraft
(𝑊𝑇), the drag (𝐷), the weight of the fuel (𝑊𝐹) from the structural discipline, and the
specific fuel consumption (𝑆𝐹𝐶) from the propulsion discipline, which directly influence
the final objective, i.e., the range, are picked as the key variables. Additionally, 𝑊𝑇 and
𝐷 are the coupled variables, while 𝑊𝐹 and 𝑆𝐹𝐶 are outputs from the structure and
propulsion disciplines. If a complete causal graph does not exist, high-level knowledge
about the design problem can be utilized to generate the causal relations in the causal-
ANN.
Fault tolerance studies on causal relations
Even though only high-level causal relations are required to construct a causal-ANN,
there might be errors in defining these causal relations, which may influence the
accuracy of the causal-ANN. Thus, the impact of faulty causal relations on the accuracy
of the causal-ANN is discussed in this section.
First, the influence of the number of layers in the causal relations is discussed. Figure
5-8 illustrates a high-level causal graph including two intermediate layers. As shown in
Figure 5-13, one intermediate variable, 𝑦2, is removed from the causal graph to reduce
the number of intermediate layers to one. Compared with the original causal-ANN, the
sub-network [𝑥1, … , 𝑥6] − 𝑦2 − 𝑊ℎ𝑠 is replaced by direct links from the design variables
to 𝑊ℎ𝑠. Thus, the total number of sub-networks to be trained is four. The causal-ANN
with one intermediate layer is trained with 200 samples, and the 𝑅2 values of the
objective and the intermediate variables are calculated on 2,000 testing samples, as
shown in Table 5-15. Note that the same training and test samples as in Section 5.2.1
are used in this test and the following tests. Compared with the causal-ANN with 𝑦2, the
accuracy of the new causal-ANN decreases. Comparing the 𝑅2 values of 𝑊ℎ𝑠 in Table
5-3 and Table 5-15 shows that involving more intermediate variables in the complex
sub-networks can improve the prediction accuracy. On the other hand, compared with
the ANN and RBF models, the accuracy of the new causal-ANN is still better, which
means even a simple high-level causal graph can improve the accuracy of the prediction
model.
[Graph not reproduced: x1–x6 connect directly to Wc, Ww, Whs, and Wcap, which feed y1.]
Figure 5-13: Causal graph with one intermediate layer for power converter design.
Table 5-15: 𝑹𝟐 values of the objective and intermediate variables for the causal-ANN without 𝒚𝟐.
     𝑦1     𝑊𝑐     𝑊𝑤     𝑊ℎ𝑠    𝑊𝑐𝑎𝑝
𝑅2   0.886  1.000  0.969  0.805  1.000
Second, the influence of missing links is studied. Six causal graphs, each with one of the
links from [𝑥1, … , 𝑥6] to 𝑦2 missing, are employed to construct causal-ANNs, and the
accuracies of those causal-ANNs are calculated. The 𝑅2 values of the objective and the
intermediate variables 𝑦2 and 𝑊ℎ𝑠 are listed in Table 5-16. Missing links decrease the
accuracy of the causal-ANN model. If any of the links from [𝑥1, 𝑥2, 𝑥3] to 𝑦2 is removed,
the causal-ANN fails. In a causal-ANN with multiple layers, the accuracy of an upstream
sub-network has a large impact on the next sub-network, and errors accumulate through
the sub-networks. Thus, the low accuracy of 𝑦2 when removing the links from
[𝑥1, 𝑥2, 𝑥3] to 𝑦2 leads to a failed prediction of 𝑦1, indicated by the negative 𝑅2 values.
However, if any of the links from [𝑥4, 𝑥5, 𝑥6] to 𝑦2 is missing, the prediction accuracy
does not decrease much compared with the correct causal graph. Table 5-17 gives the
ANOVA results for [𝑥1, … , 𝑥6] with respect to 𝑦2, which show that [𝑥1, 𝑥2, 𝑥3] are
important variables while [𝑥4, 𝑥5, 𝑥6] are not. Therefore, missing the links of important
variables decreases the prediction accuracy significantly, while missing the links of
unimportant variables influences the accuracy only slightly. Additionally, another causal
graph with all the links from [𝑥4, 𝑥5, 𝑥6] to 𝑦2 removed is used to build a causal-ANN,
and its 𝑅2 values are shown in Table 5-16 as well. The results show that even with three
unimportant links missing from the causal graph, the accuracy of the causal-ANN is still
acceptable. In engineering design, the chance of missing less important variables is
much larger than that of missing important variables, and missing those less important
variables influences the accuracy of the causal-ANN only slightly. On the other hand, if
important variables are missing from the causal graph, the prediction of the causal-ANN
will be poor or unacceptable.
Table 5-16: Comparison of 𝑹𝟐 values when missing links in causal graphs.
Missing link(s) 𝑦1 𝑦2 𝑊ℎ𝑠
None 0.967 0.994 0.934
𝑥1 − 𝑦2 -60.280 0.0913 -122.265
𝑥2 − 𝑦2 -4.120 0.6768 -9.300
𝑥3 − 𝑦2 -62.245 -1.002 -126.222
𝑥4 − 𝑦2 0.922 0.992 0.844
𝑥5 − 𝑦2 0.949 0.992 0.897
𝑥6 − 𝑦2 0.931 0.993 0.861
[𝑥4, 𝑥5, 𝑥6] − 𝑦2 0.909 0.993 0.818
Table 5-17: ANOVA analysis results of [𝒙𝟏, … , 𝒙𝟔] to 𝒚𝟐
𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 𝑥6
Prob>F 0 0 0 0.111 0.165 0.565
Impact of variable correlations
In engineering problems, design variables usually correlate with each other, but
considering the correlations may lead to higher computational cost because a large
number of variable combinations must be covered. To reduce this cost, multiple parents
are usually assumed to be independent of each other in common probability inference.
In that case, each design variable is considered independently, the interval of each
design variable with the largest likelihood is determined separately, and those intervals
are put together to form the interesting design subspace [17]. However, ignoring variable
correlations may introduce extra errors in probability inference. Thus, the impact of
variable correlations is discussed in this section. To illustrate the difference between
considering variable correlations and not, the interval with the largest likelihood is
estimated under the independence assumption on the same 10,000 samples, and the
results are shown in Table 5-18 and Table 5-19 for the power converter and aircraft
design problems, respectively. Comparing Table 5-8 and Table 5-18 for the power
converter design problem shows that ignoring the variable correlations may lead to
completely wrong results. This can be explained as follows: for a highly nonlinear
problem, optimizing along each dimension separately cannot find the optimal solution.
When the design variables are highly correlated, their combined influence may dominate
the variance of the objective value. On the other hand, as shown in Table 5-19 for the
aircraft design problem, only the fifth design variable lands in a different interval
compared with the results considering correlations and the interval containing the
optimal solution in Table 5-14. In this case, the influence of the correlations is weaker
than in the power converter problem, so the interesting interval estimated independently
is near the actual one. Therefore, correlations between design variables should be
considered in probability inference.
Table 5-18: Interesting area detected with independence assumption in power converter design.
                  𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6
Actual model      1    2    5    5    1    1
Predicted model   3    5    2    4    4    2
Table 5-19: Interesting area detected with independence assumption in aircraft concept design.
                  𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6   𝑥7   𝑥8   𝑥9
Actual model      5    1    1    5    5    1    1    1    2
Predicted model   5    1    1    5    4    1    1    1    2
5.3. Summary
To improve metamodel accuracy, knowledge of the engineering design problem is
employed in building the metamodel. The cause-effect relations are combined with an
ANN to develop the causal-ANN model. The entire ANN is divided into several
sub-networks according to the causal graph, and the values of intermediate variables
are used in constructing the sub-networks. The causal-ANN is applied to two
engineering case studies, and the results show that its prediction accuracy exceeds
that of the ANN and RBF models. To further explore the applications of the causal-ANN,
a causal-ANN based attractive space identification method is developed. Likelihood
distributions of the design variables are estimated through Bayesian networks using
samples predicted by the causal-ANN. In both engineering cases, the proposed method
finds the attractive sub-spaces. Since the samples used in the distribution estimation
come from the causal-ANN, no expensive simulation is involved and the cost is low.
Additionally, the impacts of errors in the causal graph and of variable correlations are
discussed based on the test results. For the causal graph, involving intermediate
variables in the complex sub-networks improves the prediction accuracy. Missing less
important links does not affect the accuracy much, but missing important links causes
the prediction to fail. Variable correlations influence the likelihood estimation and
should be considered in attractive design space detection.
This method is next applied to a real-world problem sponsored by a local company,
which is the topic of the next chapter.
Chapter 6. Applying causal-ANN in energy consumption prediction
The Residential End-Use Stock and Flow (REUSF) model is developed to predict the
energy consumption according to the unit energy consumption of 19 end-uses
(appliances) and the market shares of different end-use technologies. This model can be
used by power companies and governments for planning and policy making purposes.
To improve the efficiency and accuracy of the market share prediction model, the
causal-ANN is applied to replace the original logit model. The causal relations of the
market prediction model are used to construct the ANN structure. To reduce the training
difficulty, the simulation model within REUSF, named the stock turnover engine, is
used to replace part of the ANN. The causal-ANN is trained via optimization, minimizing
the error between the predicted market shares and the historical data.
6.1. Residential End-Use Stock and Flow Model
In recent years, there has been growing interest in reducing residential energy
consumption, as the residential sector is the largest consumer of energy compared to
other sectors such as commercial and industrial. The difficulty in developing an energy
model for the residential sector is the uncertainty around consumers' decision-making
processes [156]. Currently, two main methods, top-down and bottom-up, are used to
estimate the historical and future energy consumption of the residential sector [156].
The top-down approach only considers the total energy consumption at an aggregate
level and uses macroeconomic indicators such as housing starts, energy prices, and
weather to estimate historical and future energy growth. Because of its relatively
simple model structure and the ease of access to its inputs, the top-down approach is
widely used in long-term energy forecasting [157]–[160]. However, the top-down
approach is unable to capture short-term behavior changes, and therefore it is
difficult to generate specific policies based on its outcomes. On the other hand, the
bottom-up approach looks at individual end-uses and housing types to develop an
energy forecast [161]. The bottom-up approach can be developed using either the
statistical or engineering methodology. During the calibration period of such models,
historical data such as annual energy consumption, end-use efficiencies, fuel prices,
and life expectancy are used to develop an energy forecast [162]–[165]. The engineering
methodology requires specific inputs such as power ratings and heat transfer rates to
estimate the historical and future load growth [166]–[169]. By considering the energy
consumption of individual end-uses, the bottom-up approach enables researchers to
study the causality of the historical load as well as the impact of different scenarios
around government policies and consumer decisions in the forecast period. However,
both bottom-up methodologies rely heavily on large amounts of survey data and on
technical knowledge of individual end-uses.
The U.S. Energy Information Administration (EIA) uses a hybrid approach in its
National Energy Modeling System (NEMS) to take advantage of both bottom-up and
top-down methodologies [170]. The NEMS model uses individual end-use data, such as
energy consumption and market shares, as well as inputs typically used in a top-down
model, such as temperature and fuel prices, to estimate historical and future energy
growth. In addition, the model's logit function is able to take into account the impact
of the consumer decision-making process. As such, the NEMS model can be used to
evaluate the effects of different policies.
The Residential End-Use Stock and Flow (REUSF) model uses a similar approach to
the NEMS model to provide 30-year projections of energy consumption. Similarly, the
REUSF model uses a logit function to calculate each end-use's market shares, which
are then used to calculate the total energy consumption. Specifically, the REUSF
model considers 13 end-uses, including dish washers, refrigerators, TVs, freezers,
heating modules, cooktops, ovens, clothes washers, clothes dryers, set-top boxes,
water heaters, air conditioners, and lighting. Note that the heating module contains
four sub-modules covering primary and secondary heating with two kinds of energy
sources, electricity and fuel. The lighting module is divided into four parts:
general-purpose screw-in bulbs, general-purpose reflectors, linear fluorescents, and
others. Thus, 19 end-uses in total are involved in the model. Additionally, the model
decomposes end-uses into different end-use technologies (brand models), which are
delineated by efficiency tiers. For example, dish washers include two end-use
technologies, named basic, with higher energy consumption, and Energy Star, with
lower energy consumption. Different end-uses include different numbers of end-use
technologies, and the largest number of end-use technologies is ten. The yearly
energy consumption of each end-use is the sum of the consumption of each end-use
technology, as presented in Eq. (6-1).
E = Σ_{i=1}^{ET} e_i = Σ_{i=1}^{ET} UEC_i × SU_i    (6-1)
where e_i is the energy consumption of the i-th end-use technology, calculated as the
product of its unit energy consumption (UEC) and its stock units (SU). As shown in
Figure 6-1, the SU of each end-use technology is estimated through a logit model and a
stock turnover engine. The logit model estimates customer preferences for a specific
end-use technology according to its total life cycle cost (TLCC) and capital cost
(CC). The stock turnover engine calculates the SU in the current year based on
customer preferences, the saturation of the end-use technology, and the replacement
rate or new-instrument rate. The saturation represents the percentage of dwellings
that have at least one unit of the end-use technology. To specify the saturation, the
province of BC is divided into four regions and the houses are categorized into four
types. The saturation of the end-use technologies varies across region and housing
type combinations, with values based on the residential end-use survey in BC. The
following sub-sections introduce the main parts of the REUSF model.
[Figure: Model Inputs → Total Life Cycle Cost Calculation → Logit Model (market
shares prediction) → Stock Turnover Engine → Stock Units → Energy Consumption
Prediction]
Figure 6-1: Flow chart of Residential End-Use Stock and Flow Model.
6.1.1. Total life cycle cost calculation
In the REUSF model, the TLCC of an end-use technology is regarded as the key factor
influencing customer preferences when buying new stocks, while the CC is considered
when replacing stocks. The TLCC is the cost of the end-use technology over its entire
life, which includes the CC, fuel costs, and other costs. In the CC calculation, the
incentives given by the government for using more efficient technologies are also
considered. The fuel costs represent the total energy costs over the technology's
entire life, calculated as the product of the unit energy consumption, the life
expectancy, and the fuel price. The other costs comprise operation and maintenance
costs.
6.1.2. Logit model
In REUSF, customer preferences are quantified by replacement shares and new shares.
Instead of directly estimating the market shares (MS) of each end-use technology, the
dynamic changes in the market are captured by the replacement and new shares applied
when residents replace or buy new stocks. The logit model predicts the new and
replacement shares according to the TLCC and CC of the end-use technology. The
equations of the logit model are shown in Eq. (6-2), where α and β are coefficients
that must be determined from historical data. The logit model assumes that customer
preferences follow a logit distribution with respect to the TLCC and CC. Since the
values of CC and TLCC are large, they are normalized in the logit model.
ReplaShare_{Y,i} = e^{α−βCC_i} / Σ_{i=1}^{ET} e^{α−βCC_i}
NewShare_{Y,i} = e^{α−βTLCC_i} / Σ_{i=1}^{ET} e^{α−βTLCC_i}    (6-2)
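The per-technology coefficients in Table 6-2 suggest the shares are computed as a normalized exponential over α_i − β_i·c_i. The following is a minimal sketch of Eq. (6-2), assuming the unspecified normalization of CC/TLCC is division by their sum across technologies (the thesis only states that the costs are normalized):

```python
import math

def logit_shares(costs, alpha, beta):
    """Logit shares per Eq. (6-2): share_i ∝ exp(alpha_i - beta_i * c_i),
    where c_i is the cost normalized across technologies (here: by the sum,
    an assumption -- the thesis only says CC/TLCC are normalized)."""
    total = sum(costs)
    normed = [c / total for c in costs]
    weights = [math.exp(a - b * c) for a, b, c in zip(alpha, beta, normed)]
    z = sum(weights)
    return [w / z for w in weights]

# Dish-washer capital costs from Table 6-2 (basic, Energy Star)
repl = logit_shares([908, 929], alpha=[-0.6, -0.6], beta=[0.7, 0.5])
print([round(s, 2) for s in repl])  # roughly [0.48, 0.52]
```

Under this assumed normalization the result is consistent with the 48%/52% replacement shares reported for dish washers in Section 6.1.4.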
6.1.3. Stock turnover engine
The stock turnover engine predicts the number of stocks of each end-use technology in
a specific year from the number of stocks in the previous year and the new and
replacement shares. As shown in Figure 6-2, within a one-year period, the decayed and
end-of-life stocks are replaced by new stocks. In addition, as the number of houses
grows, the number of stocks increases as well. Therefore, the number of stocks for the
i-th end-use technology in the Y-th year can be represented as

SU_{Y,i} = SU_{Y−1,i} + RS_{Y,i} + NS_{Y,i} − DS_{Y,i} − EOL_{Y,i}    (6-3)
[Figure: flow into the stocks from last year (new stocks, replacement stocks) and flow
out (decay stocks, end-of-life stocks)]
Figure 6-2: Flow of stocks in the stock turnover engine.
where SU_Y and SU_{Y−1} are the numbers of stock units in the Y-th and (Y−1)-th years.
DS_Y is the number of decayed stocks and EOL_Y is the number of end-of-life stocks.
DS is calculated via the decay rate, while EOL is calculated from the life expectancy
and the number of stocks in the previous year. For the i-th end-use technology,
DS_{Y,i} and EOL_{Y,i} follow the market shares in the previous year, as shown in Eq.
(6-4), where MS_{Y−1,i} represents the market share of the i-th end-use technology in
the previous year.

DS_{Y,i} = DS_Y × MS_{Y−1,i}
EOL_{Y,i} = EOL_Y × MS_{Y−1,i}    (6-4)

The decayed and end-of-life stocks are replaced by new stocks following the
replacement shares estimated from the logit model, as in Eq. (6-5). Note that the
total number of replacement stocks (RS_Y) equals the sum of DS_Y and EOL_Y for an
end-use, while the number of replacement stocks for the i-th end-use technology
(RS_{Y,i}) will not necessarily equal the sum of DS_{Y,i} and EOL_{Y,i}, since the
market shares of the i-th end-use technology may change.

RS_{Y,i} = ReplaShare_{Y,i} × RS_Y    (6-5)
NS_Y is the number of new stocks in the current year. The number of houses increases
every year, and new stocks are needed for the new houses. NS_Y for an end-use can be
calculated as

NS_Y = NewHouse × Saturation × SPH    (6-6)

where SPH is the number of stocks per house. Note that SPH is the average number of
stocks with respect to the number of houses having at least one unit of the end-use
technology. The number of new stocks for the i-th end-use technology can be
calculated as

NS_{Y,i} = NewShare_{Y,i} × NS_Y    (6-7)
6.1.4. Example of REUSF model
The dish washer model in REUSF is used to illustrate the process of estimating the
energy consumption in the year 2010. The parameters used in the model are shown in
Table 6-1. Two end-use technologies are considered in this model, the basic and
Energy Star models. The performance data and the logit model coefficients of the two
technologies are listed in Table 6-2.
Table 6-1: The parameters of dish washers in 2010.
Parameter                 Value
TNS_2009                  433,064
DS & EOL                  137,248
Number of new houses      10,071
Saturation                85.2%
SPH                       1
Table 6-2: The inputs of the two end-use technologies.
Input                                    Basic    Energy Star
TLCC ($)                                 1,166    1,167
CC ($)                                   908      929
Coefficient α                            -0.6     -0.6
Coefficient β                            0.7      0.5
Market shares, 2009                      38%      62%
Unit energy consumption per year (kWh)   995      212
The total number of stocks in 2009 (TNS_2009) is 433,064, and the numbers of stocks
for the two end-use technologies (SU_{2009,B} and SU_{2009,E}) are 164,564 and
268,500, respectively, according to the market shares in that year. Employing the
logit function in Eq. (6-2), the new shares of the two end-use technologies are
calculated as 47% and 53%, and the replacement shares as 48% and 52%, respectively.
According to the market shares in 2009, the stocks to be replaced (DS_i + EOL_i) in
2010 for the two end-use technologies are 52,154 and 85,093, respectively. The total
number of replaced stocks in 2010 is thus 137,248, which equals the number of decayed
and end-of-life stocks. Following Eq. (6-5), RS_{2010,B} and RS_{2010,E} for the two
end-use technologies are 65,430 and 71,818, respectively. For the new stocks, the
total number of new stocks in 2010 is 8,580, calculated from Eq. (6-6), and
NS_{2010,B} and NS_{2010,E} are 4,033 and 4,547, respectively. Thus, the numbers of
stocks for the two end-use technologies can be calculated as follows.

SU_{2010,B} = 164,564 + 4,033 + 65,430 − 52,154 = 181,873
SU_{2010,E} = 268,500 + 4,547 + 71,818 − 85,093 = 259,772    (6-8)

Therefore, the market shares of the two end-use technologies in 2010 are 41% and
59%, respectively. Finally, the annual energy consumption of dish washers is obtained
as 236 GWh.
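The arithmetic of this worked example can be checked directly by coding Eqs. (6-1) and (6-3) with the intermediate figures reported above (the replacement and new stock splits are taken from the text rather than recomputed, since the exact normalization inside the logit model is not given):

```python
# Figures from Tables 6-1/6-2 and the worked example (dish washers,
# basic "B" and Energy Star "E" technologies)
SU_2009 = {"B": 164564, "E": 268500}   # stocks in 2009 (38% / 62% of 433,064)
replaced = {"B": 52154, "E": 85093}    # DS + EOL per technology, Eq. (6-4)
RS = {"B": 65430, "E": 71818}          # replacement stocks, Eq. (6-5)
NS = {"B": 4033, "E": 4547}            # new stocks, Eq. (6-7)
UEC = {"B": 995, "E": 212}             # kWh per unit per year (Table 6-2)

# Stock update, Eq. (6-3): SU_Y = SU_{Y-1} + RS + NS - DS - EOL
SU_2010 = {t: SU_2009[t] + RS[t] + NS[t] - replaced[t] for t in SU_2009}
total = sum(SU_2010.values())
MS_2010 = {t: SU_2010[t] / total for t in SU_2010}

# Energy consumption, Eq. (6-1): E = sum_i UEC_i * SU_i, reported in GWh
E_GWh = sum(UEC[t] * SU_2010[t] for t in SU_2010) / 1e6
print(SU_2010, {t: round(m, 2) for t, m in MS_2010.items()}, round(E_GWh))
```

Running this reproduces the stock counts of Eq. (6-8), the 41%/59% market shares, and the 236 GWh annual consumption.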
6.1.5. Logit model training
The coefficients α and β in the logit model are estimated from historical data. Note
that only the market shares of end-use technologies can be obtained through surveys,
not the new and replacement shares. Therefore, the stock turnover engine is also
needed to predict the market shares when training the coefficients. In REUSF, 12
years of market shares are obtained from market surveys for all the end-use
technologies, and the training goal is to find a set of coefficients that minimizes
the errors between the predicted shares and the historical data over the 12 years.
Optimization algorithms are employed to train the logit model.
One shortcoming of the logit model is the training efficiency. Due to the large
number of end-uses and the large number of end-use technologies within each end-use,
the number of design variables is extremely large. In the REUSF model, there are 19
end-uses/sub-end-uses, each containing from two to ten end-use technologies; the
total number of variables is 178. Additionally, since four regions and four housing
types are considered in REUSF, the total number of coefficients is 178 × 4 × 4 =
2848. Since the different end-uses, regions, and housing types are independent, the
training problem can be divided into 19 × 4 × 4 = 304 sub-problems. Thus, the number
of design variables in each sub-problem varies between four and 20. Although the
dimensionality of each optimization problem is reduced, the large number of
optimization problems makes the computational cost unacceptable. Hence, a fast
training model needs to be developed to replace the logit model.
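The coefficient and sub-problem counts above can be verified with a few lines:

```python
end_uses = 19
regions, housing_types = 4, 4
coeffs_per_combination = 178        # logit coefficients across all end-uses

total_coeffs = coeffs_per_combination * regions * housing_types
sub_problems = end_uses * regions * housing_types
print(total_coeffs, sub_problems)   # 2848 304

# Each technology contributes two coefficients (alpha, beta), so an end-use
# with 2-10 technologies yields a sub-problem of 4-20 design variables.
min_vars, max_vars = 2 * 2, 2 * 10
print(min_vars, max_vars)           # 4 20
```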
Another issue with the logit model is accuracy. The logit model requires the market
shares to follow a logit distribution, which cannot be guaranteed for every end-use.
If the logit distribution assumption is violated, the accuracy of the prediction
model will be very low.
To improve the training efficiency and accuracy, the proposed causal-ANN is employed
to replace the logit model in predicting the market shares. The causal-ANN is
constructed between TLCC/CC and the market shares, which means the stock turnover
engine becomes one component in the causal-ANN structure. The details of the
causal-ANN construction are described in Section 6.2.
6.2. Applying causal-ANN in market share prediction
To construct the causal-ANN for market share prediction, the flow chart of the market
share prediction model is first shown in Figure 6-3. Note that TLCC, CC, SU, and MS
are all vectors in the figure, and the number of elements in each vector equals the
number of end-use technologies. Thus, if the number of end-use technologies is ET,
the number of inputs of the market share prediction model is 3 × ET + 1 and the
number of outputs is ET. According to the flow chart of the prediction model, the
high-level causal relations can be constructed as in Figure 6-4.
[Figure: TLCC and CC feed the logit model, which outputs NewShares and ReplaShares;
these, together with SU_{Y−1} and TNS, feed the stock turnover engine, which outputs
MS]
Figure 6-3: Flow chart of the market share prediction.
[Figure: causal graph with TLCC → NewShares, CC → ReplaShares, and {NewShares,
ReplaShares, SU_{Y−1}, TNS} → MS]
Figure 6-4: High-level causal relations of the market share prediction model.
As shown in Figure 6-4, the entire high-level causal graph can be divided into three
sub-networks: TLCC to NewShares, CC to ReplaShares, and [NewShares, ReplaShares,
SU_{Y−1}, TNS] to MS. In the market share prediction model, the first two
sub-networks are black-box models originally predicted by logit models, while the
last sub-network can be computed by the given stock turnover engine, which is a cheap
model containing only mathematical functions. On the other hand, the values of the
intermediate variables (NewShares and ReplaShares) are unknown. Therefore, a
causal-ANN of the first category can be constructed as shown in Figure 6-5. The new
and replacement shares are estimated by a neural network from TLCC and CC. Then, the
outputs of the network, together with the other inputs including the SUs and TNS, are
used in the stock turnover engine to calculate the market shares.
[Figure: TLCC and CC feed a neural network that outputs NewShares and ReplaShares;
these, together with SU_{Y−1} and TNS, feed the stock turnover engine, which outputs
MS]
Figure 6-5: Structure of the causal-ANN to predict market shares.
Training the causal-ANN differs from training a traditional ANN because there are no
training data for the outputs of the ANN, i.e., the new and replacement shares in
this case. However, actual data for the final outputs, MS, are provided. Optimization
is employed to search for the weights of the ANN that minimize the error between the
estimated MS and the historical data (HS). The root mean square error (RMSE)
criterion is used to measure the error. Thus, the training model of the causal-ANN
can be presented as Eq. (6-9), where w denotes the weights of the ANN. In this
chapter, a Genetic Algorithm (GA) is employed to optimize the weights of the network
to minimize the RMSE value.
find w
min RMSE = (1/ET) Σ_{i=1}^{ET} sqrt( Σ_{Y=2002}^{2013} (HS_{Y,i} − MS_{Y,i})² / 12 )    (6-9)
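A minimal sketch of this training scheme follows, with synthetic 12-year data standing in for the proprietary REUSF history, a tiny tanh network with softmax outputs in place of the thesis's ANN, and a simple random search standing in for the GA (all sizes and data here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def ann_shares(w, tlcc, cc, ET=2, H=4):
    """Tiny one-hidden-layer net: (TLCC, CC) -> (new, replacement) shares.
    Softmax outputs keep each share vector summing to one."""
    x = np.concatenate([tlcc, cc])                  # 2*ET inputs
    n_in, n_out = 2 * ET, 2 * ET
    W1 = w[:H * n_in].reshape(H, n_in)
    b1 = w[H * n_in:H * n_in + H]
    W2 = w[H * n_in + H:H * n_in + H + n_out * H].reshape(n_out, H)
    b2 = w[H * n_in + H + n_out * H:]
    h = np.tanh(W1 @ x + b1)
    z = (W2 @ h + b2).reshape(2, ET)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    new, repl = e / e.sum(axis=1, keepdims=True)
    return new, repl

def stock_engine(su_prev, new, repl, rs_tot, ns_tot, replaced):
    """Cheap stock-turnover model, Eqs. (6-3)-(6-7), returning market shares."""
    su = su_prev + repl * rs_tot + new * ns_tot - replaced
    return su / su.sum()

def rmse(w, data):
    """Training objective, Eq. (6-9)."""
    err = []
    for su_prev, rs_tot, ns_tot, replaced, tlcc, cc, hs in data:
        new, repl = ann_shares(w, tlcc, cc)
        ms = stock_engine(su_prev, new, repl, rs_tot, ns_tot, replaced)
        err.append((hs - ms) ** 2)
    e = np.array(err)                               # years x ET
    return np.mean(np.sqrt(e.mean(axis=0)))

# Synthetic 12-year history (illustration only -- not the REUSF data)
data = []
su = np.array([160000.0, 270000.0])
for _ in range(12):
    hs = su / su.sum() + rng.normal(0, 0.01, 2)
    data.append((su.copy(), 137000.0, 8500.0, 0.3 * su,
                 np.array([1166.0, 1167.0]) / 2333,
                 np.array([908.0, 929.0]) / 1837, hs / hs.sum()))
    su = su * 1.01

# Random-search stand-in for the GA weight optimization
n_w = 4 * 4 + 4 + 4 * 4 + 4
best = rng.normal(0, 0.5, n_w)
best_f = rmse(best, data)
for _ in range(300):
    cand = best + rng.normal(0, 0.2, n_w)
    f = rmse(cand, data)
    if f < best_f:
        best, best_f = cand, f
print(round(best_f, 3))
```

The key design point is that gradients never need to pass through the stock turnover engine: the weight search treats the ANN-plus-engine composition as a single objective, which is why a population-based optimizer such as a GA is a natural fit.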
6.3. Results and discussion
In this section, the causal-ANN is used to predict the market shares in the REUSF
model. First, one end-use, the dish washer, is selected as an example to illustrate
the accuracy and efficiency of the causal-ANN. Then, the proposed method is employed
to estimate the market shares of all end-uses.
6.3.1. Case study: dish washer
There are two end-use technologies for the dish washer: one named basic and the other
Energy Star, with higher energy efficiency. Therefore, there are seven inputs to the
causal-ANN, specifically four inputs to predict the new and replacement shares and
three used in the stock turnover engine. The proposed method is compared with the
logit model in both RMSE value and computational time. Additionally, a feedforward
ANN with one hidden layer and four hidden nodes is employed to predict the market
shares from the seven inputs by treating the prediction model as a black box. The
Matlab ANN toolbox is used to train this original ANN. Note that all the tests are
run on a computer with a Core i7 @ 3.40 GHz and 16 GB of memory; the tests on all
end-uses are also run in the same computational environment. Table 6-3 lists the
comparison results, and Figure 6-6 compares the market share curves of dish washers
among the historical data and the three predicted shares.
Comparing the causal-ANN with the logit model in Table 6-3, the RMSE value of the
proposed method is 0.0195, much smaller than that of the logit model. Thus, the
accuracy of the causal-ANN is higher. As shown in Figure 6-6, the market shares
obtained from the causal-ANN are almost the same as the historical data, while there
is an obvious gap between the shares predicted by the logit model and the historical
data. This is because the logit model cannot capture the trend of the new and
replacement shares accurately. The logit model requires the shares to follow a logit
distribution, which may be violated for some end-uses, whereas an ANN can fit a wide
range of nonlinear relations. Thus, the accuracy of the ANN can be higher than that
of the logit model when the data do not follow the logit distribution. Additionally,
training the causal-ANN takes only a couple of seconds, far less than training the
logit model. The main difficulty in training the logit model is that when the
exponent is negative, the output varies within a very small range. In the prediction
model, the exponent in the logit model is usually negative, which makes it difficult
for the optimization to converge to the optimum; therefore, the computational cost of
logit model training is usually very large. In the causal-ANN, the optimization of
the weights is much easier, so the causal-ANN is more efficient than the logit model.
Besides, although training the traditional ANN is the fastest, the RMSE values and
the share curves show that the ANN fails to model the problem accurately. This is
because of the lack of training data: there are only 12 years of data for training,
which is so scarce that the ANN cannot be well trained in this case. For the
causal-ANN, the neural network covers only one part of the model, and the other part
employs the cheap model, which reduces the nonlinearity the network must capture.
Additionally, there are four inputs to the neural network of the causal-ANN compared
with seven inputs to the original ANN. The reduced dimensionality helps reduce the
difficulty of training the network.
Table 6-3: Comparison in RMSE and time among the three approximation models.
           Causal-ANN   Logit model   ANN
RMSE       0.0195       0.1154        0.2357
Time (s)   2            27            0.5
[Figure, two panels: (a) Basic, (b) Energy Star]
Figure 6-6: Market share comparison for dish washers.
6.3.2. Full model prediction
The causal-ANN is employed to predict the market shares of all 19 end-uses in four
housing types and four regions, meaning 16 combinations of regions and housing types
are involved. For different combinations, parameters such as the total number of
stocks and the saturation differ. Because the training process is decomposable, the
full prediction model can be divided into 304 sub-models, which can be trained
separately. However, the number of coefficients is still large for some end-uses,
which leads to a high computational cost when training the coefficients using
optimization. Moreover, the total number of training runs is so large that optimizing
the problems sequentially takes a long time. To improve efficiency, the Parallel
Computing Toolbox in Matlab is employed in training the market share prediction
model.
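Because the 304 sub-models are independent, their training parallelizes trivially. Below is a sketch of the same farm-out pattern using Python's concurrent.futures as an analogue of the Matlab toolbox; train_sub_model is a hypothetical stand-in that returns a fake RMSE rather than running the actual GA weight search:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_sub_model(key):
    """Hypothetical stand-in for training one causal-ANN sub-model.
    A real version would run the GA weight search of Eq. (6-9) for the
    given (end-use, region, housing-type) combination."""
    end_use, region, housing = key
    return key, 0.02 + 0.001 * ((end_use + region + housing) % 5)  # fake RMSE

# One job per (end-use, region, housing-type): 19 * 4 * 4 = 304 in total
keys = list(product(range(19), range(4), range(4)))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = dict(pool.map(train_sub_model, keys))
    print(len(results))  # 304
```

Since each job touches only its own data, no synchronization is needed and the speedup is limited mainly by the number of available cores and the uneven job sizes across end-uses.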
In the full prediction model, both the causal-ANN and the logit models are trained
sequentially as well as in parallel for the entire Stock and Flow model. The results
are shown in Table 6-4. The RMSE values shown in the table are the averages over the
19 end-uses in the 16 combinations of housing types and regions. Since the ANN failed
to predict the market shares due to data scarcity in the single-end-use tests, it is
not used for the entire Stock and Flow model. As shown in Table 6-4, the average RMSE
over all prediction models obtained from the causal-ANN is 0.023, which is much lower
than that obtained from the original logit model. Therefore, the accuracy of the
market share prediction is improved significantly. Moreover, the computational cost
of the proposed method is much smaller than that of the logit model: training the
causal-ANN takes under half an hour sequentially and only six minutes in parallel,
compared with nine hours of sequential training and two and a half hours of parallel
training for the logit model. The number of coefficients in the logit model has a
large impact on the computational cost of model training; thus, training an end-use
with a large number of end-use technologies takes much longer than training dish
washers. In contrast, the training cost of the causal-ANN increases little as the
number of end-use technologies grows. One application of the REUSF model is to test
the influence of different policies on residential energy consumption. For each test,
the market share prediction model needs to be retrained since the input data may
change. Therefore, the higher training efficiency achieved with the causal-ANN
enables more policy-testing iterations to find better choices.
Finally, the market share prediction model with the causal-ANN is employed in the
REUSF model to predict the energy consumption from 2014 to 2034, and the results are
shown in Figure 6-7. Two cases are considered in the test: static and natural. In the
static case, the new and replacement shares remain fixed over the forecasting period,
while in the natural case they vary following the causal-ANN prediction. The lower
energy consumption in the natural case indicates that, when the varying new and
replacement shares are considered, more and more low-efficiency end-use technologies
are replaced by high-efficiency ones.
Table 6-4: Approximation results of the causal-ANN and the logit model.
                   Causal-ANN   Logit model
Average RMSE       0.023        0.128
Time (sequential)  25 mins      9 hrs
Time (parallel)    6 mins       2.5 hrs
Figure 6-7: Energy consumption prediction using the causal-ANN.
6.4. Summary
Training a logit model to predict market shares in the REUSF model is time-consuming,
and the trained logit model is not sufficiently accurate. To improve the efficiency
and accuracy of the prediction model, a causal-ANN is proposed and applied in the
REUSF model. The causal relations are employed to construct the structure of the
network. In the REUSF model, the cheap model, the stock turnover engine, can be
employed to reduce the training difficulty, although the values of the intermediate
variables, the new and replacement shares, are unknown. To train the causal-ANN, the
weights of the network are optimized to minimize the RMSE between the predicted
shares and the historical data. The predictions of the causal-ANN are compared with
those of the original logit model, and the results show that the causal-ANN improves
both accuracy and efficiency significantly. In particular, the training time is
reduced to six minutes in parallel, compared with 2.5 hours using the logit model.
Chapter 7. Conclusions and future work
7.1. Conclusions
A knowledge-assisted metamodeling and optimization methodology is discussed and
developed in this thesis. First, based on the concepts of knowledge and existing
applications of knowledge in optimization, different potential applications of
knowledge-assisted optimization are proposed. Next, two types of knowledge are
employed to assist metamodeling and optimization. A PMO method is developed that
employs sensitivity information to improve the efficiency of dealing with large-scale
optimization problems. Causal relations are employed to reduce the dimensionality of
optimization problems by determining the variables without contradiction. Moreover,
the causal-ANN is developed by combining causal relations and ANN structures to
improve the accuracy and efficiency of the metamodel. Combined with Bayesian theory,
attractive design spaces can be identified efficiently through the causal-ANN.
Finally, causal-ANNs are applied in an energy forecasting model, and the accuracy and
efficiency of the prediction are improved significantly.
Specific knowledge, such as algorithmic and symbolic knowledge, has been employed in
optimization to improve its effectiveness and efficiency. However, there is no
systematic way to employ different kinds of knowledge together on one problem.
Through an analysis of the potential applications of knowledge in optimization, it is
found that different knowledge can be applied at different stages of optimization,
from problem formulation to optimization strategies. Equations and graphical
knowledge, such as causal graphs and Bayesian networks, are particularly attractive
for assisting large-scale optimization. In this thesis, two categories of knowledge,
sensitivity information and causal relations, are employed in optimization to assist
in dimension reduction, metamodeling, and the optimization process.
To decrease the dimensionality of the optimization problem, sensitivity information
is employed to develop the Partial Metamodel-based Optimization (PMO) algorithm,
which seeks good optimal solutions with scarce samples. Instead of constructing a
complete RBF-HDMR model, a series of partial RBF-HDMR models is constructed, based on
the fundamental belief that optimization can be performed on an imperfect or
incomplete metamodel. The sensitivity information is used to quantify the importance
of each variable, and the more important variables are more likely to be modeled in
the partial RBF-HDMR. The roulette wheel selection operator is used to select the
variables to model, balancing exploration and exploitation. To pay more attention to
the accuracy around the interesting area, the cut center moves to the current optimal
point at every iteration and a new partial RBF-HDMR is built at the new cut center.
To reduce the number of real function evaluations, most of the points used in
constructing the new partial model can be predicted by the RBF-HDMR from the previous
iteration. Compared with optimization on a complete RBF-HDMR, PMO obtains better
optimal solutions with fewer function evaluations. Moreover, the trust-region-based
PMO (TR-PMO) is developed to further improve performance by focusing on the most
attractive design area. The test results show that TR-PMO performs comparably to or
better than TRMPS and OMID with scarce samples.
Next, causal relations are used to reduce the dimensionality of engineering design
problems. The main idea of the dimension reduction method is to find the variables
without contradiction, defined as the variables having a monotonic effect on the
objectives. To distinguish the variables without contradiction, the causal graph is
applied to the design problem to show the routes from the design variables to the
objective. A DSM-based qualitative analysis method is developed to automatically
identify the variables without contradiction. By transferring the causal graph to a
DSM and multiplying the DSM by itself multiple times, the impact of the design
variables on the objective is analyzed to find the contradictions. The Taguchi method
is used to calculate the weight of each link to simplify the causal graph. According
to the simplified causal graph, the design variables can be divided into two groups:
the important variables and the less important variables. A two-stage optimization
process is used to optimize the two groups of variables sequentially. The proposed
method is used to solve the power converter design and aircraft concept design
problems, and the number of function evaluations after dimension reduction is reduced
significantly. On the other hand, when the number of function evaluations is fixed,
the two-stage optimization process obtains better solutions.
To capture more information from the design problem rather than blindly constructing
metamodel, the causal relations are combined with neural networks to construct the
causal-ANN. The high-level causal graph is used to guide the structure of the neural
networks. Considering other kinds of knowledge involved in the causal-ANN, the causal-
ANN can be classified into three categories, involving cheap models, involving values of
intermediate variables, and involving both. Using cheap models to replace sub-networks
in the causal-ANN can improve the approximation accuracy. If the values of the
intermediate variables can be obtained from simulation, each sub-network can be
trained separately. By dividing the complex networks to multiple sub-networks can
reduce the complexity of each sub-model and improve the prediction accuracy.
Compared to directly constructing a metamodel between the design variables and the
objective, the causal-ANN is more accurate on the two test problems. Apart from giving
accurate predictions, the causal-ANN model is used to detect the attractive design
space. Using the prediction values from the causal-ANN, the likelihood values of the
design variables with respect to the objective are estimated, and the attractive design
space is detected by comparing the likelihood values of different intervals. The test
results show that the interval in which the optimal point lies can be determined by
estimating the likelihood from causal-ANN models.
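The interval-comparison step can be sketched as below. This is a hedged illustration, not the thesis procedure: a placeholder function stands in for the causal-ANN, cheap predictions at random points define a "good" objective level, and the likelihood of each interval of a design variable given a good outcome is estimated by counting; the thresholds, interval count, and model are all assumptions.

```python
import random

random.seed(0)
model = lambda x: (x - 0.7) ** 2             # placeholder for the causal-ANN predictor
xs = [random.random() for _ in range(2000)]  # cheap predictions, not simulations
ys = [model(x) for x in xs]

# Treat the best 10% of predicted objective values as the "good" outcome.
cutoff = sorted(ys)[len(ys) // 10]
good = [x for x, y in zip(xs, ys) if y <= cutoff]

# Estimate P(x in interval | good outcome) for five equal intervals of x
# and keep the most likely one as the attractive design space.
intervals = [(i / 5, (i + 1) / 5) for i in range(5)]
likelihood = [sum(lo <= x < hi for x in good) / len(good)
              for lo, hi in intervals]
best = intervals[likelihood.index(max(likelihood))]
```

With the minimum of the placeholder model at x = 0.7, the comparison singles out the interval (0.6, 0.8), mirroring how the likelihood estimates localize the interval containing the optimum.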
Finally, the causal-ANN model is applied in a residential energy consumption forecasting
model, a project sponsored by a local power company. The logit model in the REUSF
model is replaced by the causal-ANN model to predict the market shares of different
end-use technologies. Besides the causal relations, a cheap model, i.e., the stock
turnover engine, is involved in the causal-ANN, while the values of the intermediate
variables cannot be obtained. Compared to the original logit model, the accuracy of the
predicted market shares is improved by employing the causal-ANN. Moreover, the
training time of the causal-ANN is reduced significantly, from several hours to a few
minutes. Therefore, applying knowledge in metamodeling for energy consumption
forecasting improves both accuracy and efficiency.
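For context, the kind of logit market-share model being replaced can be sketched as a multinomial logit over technology utilities. This is a generic textbook form with illustrative numbers, not the REUSF model's actual utilities or coefficients:

```python
import math

def logit_shares(utilities):
    """Multinomial logit: market share of each end-use technology is
    proportional to exp(utility)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. utilities of three competing heating technologies, derived elsewhere
# from attributes such as capital cost and efficiency (values illustrative).
shares = logit_shares([1.0, 0.5, -0.2])
```

The shares always sum to one and are ordered by utility; the causal-ANN replaces this fixed functional form with a learned mapping structured by the causal graph.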
7.2. Future Research
This thesis offers a novel way to break the "curse of dimensionality" by involving
knowledge in metamodeling and optimization. Beyond the research in this thesis, the
following future work is proposed.
7.2.1. Knowledge validation, correction, and updating
Correct knowledge can assist metamodeling and optimization, but wrong information
could lead to erroneous results. Therefore, knowledge needs to be validated before being
used in optimization, especially knowledge obtained through experience or experiments.
The limitation of the current knowledge acquisition approach is that the knowledge comes
from a single source. To validate the knowledge, different sources can be used; for
instance, experimental data can be used to validate knowledge obtained from experience,
while experience can be used to judge the correctness of data. When errors are observed
in the current knowledge base, they should be corrected through appropriate
methodologies, and different sources of knowledge can be combined to correct the errors.
Another direction is knowledge updating. As the optimization proceeds, new knowledge
can be obtained as more samples are generated. How to update the current knowledge
base and how to apply the newly obtained knowledge in optimization remain open
research questions.
7.2.2. Employing different kinds of knowledge
Involving more kinds of knowledge in the optimization process will further improve the
efficiency and accuracy of optimization. One aspect of this challenge is how to combine
linguistic knowledge with data. Linguistic knowledge can be applied in the problem
formulation stage to formulate a more reasonable problem; however, its usage during the
optimization process itself may be limited. Within the optimization process, more
information can be obtained, and the majority of it is data-related. The task is thus how
to systematically utilize different types of knowledge at different stages of optimization.
During optimization, different kinds of knowledge can be obtained, and employing only a
single kind has limitations in dealing with complex problems. Therefore, a systematic
methodology for organizing different kinds of knowledge is needed to concertedly assist
optimization.
7.2.3. Knowledge-assisted optimization strategies
Besides the sensitivity-information- and causal-graph-assisted optimization strategies
developed in this thesis, other knowledge-assisted optimization strategies can be
proposed. The causal-ANN, combined with Bayesian theory, is applied to detect the
attractive design area. This information can be used to generate new samples in the
most promising design spaces. Combined with the input-output relations represented by
causal graphs, the component selection in partial metamodels can be made more
effective.
120
References
[1] S. Shan and G. G. Wang, “Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions,” Struct. Multidiscip. Optim., vol. 41, no. 2, pp. 219–241, 2010.
[2] J. H. Holland, Adaptation in natural and artificial systems : an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992.
[3] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by Simulated Annealing,” Science (80-. )., vol. 220, no. 4598, pp. 671–680, 1983.
[4] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE international conference on neural networks, 1995, vol. 4, pp. 1942–1948.
[5] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient Global Optimization of Expensive Black-Box Functions,” J. Glob. Optim., vol. 13, no. 4, pp. 455–492, 1998.
[6] L. Wang, S. Shan, and G. G. Wang, “Mode-pursuing sampling method for global optimization on expensive black-box functions,” Eng. Optim., vol. 36, no. 4, pp. 419–438, 2004.
[7] G. H. Cheng, A. Younis, K. Haji Hajikolaei, and G. Gary Wang, “Trust Region Based Mode Pursuing Sampling Method for Global Optimization of High Dimensional Design Problems,” J. Mech. Des., vol. 137, no. 2, p. 21407, Feb. 2015.
[8] K. Haji Hajikolaei, G. H. Cheng, and G. G. Wang, “Optimization on Metamodeling-Supported Iterative Decomposition,” J. Mech. Des., vol. 138, no. 2, p. 21401, Dec. 2015.
[9] R. G. Regis and C. A. Shoemaker, “Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization,” Eng. Optim., vol. 45, no. 5, pp. 529–555, May 2013.
[10] D. Wu, K. H. Hajikolaei, and G. G. Wang, “Employing partial metamodels for optimization with scarce samples,” Struct. Multidiscip. Optim., pp. 1–15, Sep. 2017.
[11] R. Bellman, Dynamic programming. Univ. Pr, 1972.
[12] P. A. Boghossian, Fear of knowledge : against relativism and constructivism. Clarendon Press, 2006.
[13] M. Beynon, D. Cosker, and D. Marshall, “An expert system for multi-criteria decision making using Dempster Shafer theory,” Expert Syst. Appl., vol. 20, no. 4,
121
pp. 357–367, May 2001.
[14] M. B. Islam and G. Governatori, “RuleRS: a rule-based architecture for decision support systems,” Artif. Intell. Law, vol. 26, no. 4, pp. 315–344, Dec. 2018.
[15] P. Kim and Y. Ding, “Optimal Engineering System Design Guided by Data-Mining Methods,” Technometrics, vol. 47, no. 3, pp. 336–348, Aug. 2005.
[16] A. Cutbill and G. G. Wang, “Mining constraint relationships and redundancies with association analysis for optimization problem formulation,” Eng. Optim., vol. 48, no. 1, pp. 115–134, 2016.
[17] P. B. Backlund, D. W. Shahan, and C. C. Seepersad, “Classifier-guided sampling for discrete variable, discontinuous design space exploration: Convergence and computational performance,” Eng. Optim., vol. 47, no. 5, pp. 579–600, May 2015.
[18] P. B. Backlund, C. C. Seepersad, and T. M. Kiehne, “All-Electric Ship Energy System Design Using Classifier-Guided Sampling,” IEEE Trans. Transp. Electrif., vol. 1, no. 1, pp. 77–85, Jun. 2015.
[19] C. Sharpe, C. Morris, B. Goldsberry, C. C. Seepersad, and M. R. Haberman, “Bayesian Network Structure Optimization for Improved Design Space Mapping for Design Exploration With Materials Design Applications,” in Proceeding of ASME 2017 International Design Engineering Technical Conferences August 6-9, 2017, p. V02BT03A004.
[20] S. Russell and P. Norvig, Artificial Intelligence A Modern Approach. New Jersey: Prentice Hall, 2003.
[21] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, vol. 60, no. 4. Prentice Hall, 2003.
[22] D. L. Poole, A. Mackworth, and R. G. Goebel, “Computational Intelligence: A Logical Approach,” Comput. Intell. A Log. Approach, vol. 2, no. 2, pp. 146–149, 1998.
[23] G. W. Ernst and A. Newell, GPS: A case study in generality and problem solving. Academic Pr, 1969.
[24] B. Chandrasekaran, “Generic tasks in knowledge-based reasoning: High-level building blocks for expert system design,” IEEE Expert, vol. 1, no. 3, pp. 23–30, 1986.
[25] B. G. Buchanan, D. Barstow, R. Bechtal, J. Bennett, W. Clancey, C. Kulikowski, T. Mitchell, and D. A. Waterman, “Constructing an expert system,” Build. Expert Syst., vol. 50, pp. 127–167, 1983.
[26] S. Liao, “Expert system methodologies and applications—a decade review from
122
1995 to 2004,” Expert Syst. Appl., vol. 28, no. 1, pp. 93–103, Jan. 2005.
[27] F. Hayes-Roth, D. Waterman, and D. Lenat, Building expert systems. Boston, MA: Addison-Wesley Longman Publishing Co., Inc, 1984.
[28] F. C. Bartlett and C. Burt, “Remembering: A study in experimental and social psychology,” Br. J. Educ. Psychol., vol. 3, no. 2, pp. 187–192, 1933.
[29] J. A. Bernard, “Use of a rule-based system for process control,” IEEE Control Syst. Mag., vol. 8, no. 5, pp. 3–13, Oct. 1988.
[30] K. J. Åström, J. J. Anton, and K.-E. Årzén, “Expert control,” Automatica, vol. 22, no. 3, pp. 277–286, May 1986.
[31] G. DeSanctis and R. B. Gallupe, “A Foundation for the Study of Group Decision Support Systems,” Manage. Sci., vol. 33, no. 5, pp. 589–609, May 1987.
[32] Z. Pawlak, “Rough set approach to knowledge-based decision support,” Eur. J. Oper. Res., vol. 99, no. 1, pp. 48–57, May 1997.
[33] M. H. Richer and M. H., AI tools and techniques. Ablex Pub, 1989.
[34] Y. Li, D. McLean, Z. A. Bandar, J. D. O’Shea, and K. Crockett, “Sentence similarity based on semantic nets and corpus statistics,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 8, pp. 1138–1150, Aug. 2006.
[35] R. Rada, H. Mili, E. Bicknell, and M. Blettner, “Development and application of a metric on semantic nets,” IEEE Trans. Syst. Man. Cybern., vol. 19, no. 1, pp. 17–30, 1989.
[36] S. Mankovskii, M. Gogolla, S. D. Urban, S. W. Dietrich, S. D. Urban, S. W. Dietrich, M.-H. Yang, G. Dobbie, T. W. Ling, T. Halpin, B. Kemme, N. Schweikardt, A. Abelló, O. Romero, R. Jimenez-Peris, R. Stevens, P. Lord, T. Gruber, P. De Leenheer, A. Gal, S. Bechhofer, N. W. Paton, C. Li, A. Buchmann, N. Hardavellas, I. Pandis, B. Liu, M. Shapiro, L. Bellatreche, P. M. D. Gray, W. M. P. Aalst, N. Palmer, N. Palmer, T. Risch, W. Galuba, S. Girdzijauskas, and S. Bechhofer, “OWL: Web Ontology Language,” in Encyclopedia of Database Systems, Boston, MA: Springer US, 2009, pp. 2008–2009.
[37] N. Guarino, “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, 1998, pp. 3–15.
[38] T. R. Gruber, “A Translation Approach to Portable Ontology Specifications,” Appear. Knowl. Acquis., vol. 5, no. 2, pp. 199–220, 1993.
[39] A. Singhal, “Modern information retrieval: A brief overview,” IEEE Data Eng. Bull., vol. 24, no. 4, pp. 35–43, 2001.
123
[40] H. Hong, Y. Yin, and X. Chen, “Ontological modelling of knowledge management for human–machine integrated design of ultra-precision grinding machine,” Enterp. Inf. Syst., vol. 10, no. 9, pp. 970–981, 2016.
[41] P. Sainter, K. Oldham, A. Larkin, A. Murton, and R. Brimble, “Product knowledge management within knowledge-based engineering systems,” in Design Engineering Technical Conference, Baltimore, Setembro, 2000.
[42] S. Sunnersjö, “A taxonomy of engineering knowledge for design automation,” in Proceedings of TMCE 2010 symposium, April 12-16, 2010.
[43] S. K. Chandrasegaran, K. Ramani, R. D. Sriram, I. Horváth, A. Bernard, R. F. Harik, and W. Gao, “The evolution, challenges, and future of knowledge representation in product design systems,” Comput. Des., vol. 45, no. 2, pp. 204–228, 2013.
[44] R. Owen and I. Horváth, “Towards product-related knowledge asset warehousing in enterprises,” in Proceedings of the 4th international symposium on tools and methods of competitive engineering, TMCE, April 22-26, Wuhan, China, 2002, vol. 2002, pp. 155–170.
[45] I. Nonaka, The knowledge-creating company. Harvard Business Press, 2008.
[46] J. F. Sowa, Knowledge representation: logical, philosophical, and computational foundations, vol. 13. MIT Press, 2000.
[47] S. Gorti, A. Gupta, G. Kim, R. Sriram, and A. Wong, “An object-oriented representation for product and design processes,” Comput. Des., vol. 30, no. 7, pp. 489–501, Jun. 1998.
[48] Y. Rezgui, S. Boddy, M. Wetherill, and G. Cooper, “Past, present and future of information and knowledge sharing in the construction industry: Towards semantic service-based e-construction?,” Comput. Des., vol. 43, no. 5, pp. 502–515, 2011.
[49] Z. Li, V. Raskin, and K. Ramani, “Developing Engineering Ontology for Information Retrieval,” J. Comput. Inf. Sci. Eng., vol. 8, no. 1, p. 11003, Mar. 2008.
[50] M. N. Huhns and M. P. Singh, “Ontologies for agents,” IEEE Internet Comput., vol. 1, no. 6, pp. 81–83, 1997.
[51] G. La Rocca, “Knowledge based engineering: Between AI and CAD. Review of a language based technology to support engineering design,” Adv. Eng. Informatics, vol. 26, no. 2, pp. 159–179, Apr. 2012.
[52] G. La Rocca, “Knowledge based engineering techniques to support aircraft design and optimization,” TU Delft, 2011.
124
[53] P. . Lovett, A. Ingram, and C. . Bancroft, “Knowledge-based engineering for SMEs — a methodology,” J. Mater. Process. Technol., vol. 107, no. 1–3, pp. 384–389, Nov. 2000.
[54] G. La Rocca and M. J. L. Van Tooren, “Enabling distributed multi-disciplinary design of complex products: a knowledge based engineering approach,” J. Des. Res., vol. 5, no. 3, p. 333, 2007.
[55] A. H. Van Der Laan and M. J. L. Van Tooren, “Parametric Modeling of Movables for Structural Analysis,” J. Aircr., vol. 42, no. 6, pp. 1605–1613, Nov. 2005.
[56] R. Van Dijk, R. d’Ippolito, G. Tosi, and G. La Rocca, “Multidisciplinary design and optimization of a plastic injection mold using an integrated design and engineering environment,” in NAFEMS World Congress, Boston, 2011.
[57] D. Wu, E. Coatanea, and G. G. Wang, “Employing Knowledge on Causal Relationship to Assist Multidisciplinary Design Optimization,” J. Mech. Des., vol. 141, no. 4, p. 41402, Jan. 2019.
[58] J. R. R. a Martins and A. B. Lambe, “Multidisciplinary Design Optimization: A Survey of Architectures,” AIAA J., vol. 51, no. 9, pp. 2049–2075, 2013.
[59] R. S. Krishnamachari and P. Y. Papalambros, “Optimal Hierarchical Decomposition Synthesis Using Integer Programming,” J. Mech. Des., vol. 119, no. 4, pp. 440–447, Dec. 1997.
[60] N. F. Michelena and P. Y. Papalambros, “A Network Reliability Approach to Optimal Decomposition of Design Problems,” J. Mech. Des., vol. 117, no. 3, pp. 433–440, 1995.
[61] N. F. Michelena and P. Y. Papalambros, “A Hypergraph Framework for Optimal Model-Based Decomposition of Design Problems,” Comput. Optim. Appl., vol. 8, no. 2, pp. 173–196, 1997.
[62] T. C. Wagner and P. Y. Papalambros, “General framework for decomposition analysis in optimal design.,” ASME Des Eng Div Publ De., ASME, New York, NY(USA), vol. 65, pp. 315–325, 1993.
[63] L. Chen, Z. Ding, and S. Li, “A formal two-phase method for decomposition of complex design problems,” J. Mech. Des., vol. 127, no. 2, pp. 184–195, 2005.
[64] J. Sobieszczanski-Sobieski, “Optimization by decomposition: A step from hierarchic to non-hierarchic systems,” NASA Tech. Rep., pp. 51–78, 1988.
[65] D. R. Braun, “Collaborative optimization: an architecture for large-scale distributed design.” Department of Aeronautics and Astronautics, Stanford University, Standford, CA, 1996.
125
[66] J. Sobieszczanski -Sobieski, J. S. Agte, and R. Sandusky, “Bi-Level Integrated System Synthesis,” AIAA J., vol. 38, no. 1, pp. 164–172, 2000.
[67] N. P. Tedford and J. R. R. A. Martins, “Benchmarking multidisciplinary design optimization algorithms,” Optim. Eng., vol. 11, no. 1, pp. 159–183, 2010.
[68] D. Morris, A. Antoniades, and C. C. Took, “On making sense of neural networks in road analysis,” in Proceedings of 2017 International Joint Conference on Neural Networks (IJCNN), May 14-19, 2017, pp. 4416–4421.
[69] R. Jin, W. Chen, and T. W. Simpson, “Comparative studies of metamodelling techniques under multiple modelling criteria,” Struct. Multidiscip. Optim., vol. 23, no. 1, pp. 1–13, Dec. 2001.
[70] D. Beasley, D. R. Bull, and R. R. Martin, “An overview of genetic algorithms: Part 2, research topics,” Univ. Comput., vol. 15, no. 4, pp. 170–181, 1993.
[71] S. J. Louis and F. Zhao, “Incorporating problem specific information in genetic algorithms,” Children, vol. 1, no. P2, p. C2, 1994.
[72] Y. Hu and S. X. Yang, “A knowledge based genetic algorithm for path planning of a mobile robot,” in Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on, 2004, vol. 5, pp. 4350–4355.
[73] H. Piroozfard, K. Y. Wong, and A. Hassan, “A hybrid genetic algorithm with a knowledge-based operator for solving the job shop scheduling problems,” J. Optim., vol. 2016, pp. 1–13, 2016.
[74] E. H. Winer and C. L. Bloebaum, “Development of visual design steering as an aid in large-scale multidisciplinary design optimization . Part I : method development,” Struct. Multidiscip. Optim., vol. 23, no. 6, pp. 412–424, 2002.
[75] E. H. Winer and C. L. Bloebaum, “Development of visual design steering as an aid in large-scale multidisciplinary design optimization. Part II: method validation,” Struct. Multidiscip. Optim., vol. 23, no. 6, pp. 425–435, Jul. 2002.
[76] A. I. J. Forrester, A. Sóbester, and A. J. Keane, “Multi-fidelity optimization via surrogate modelling,” Proc. R. Soc. A Math. Phys. Eng. Sci., vol. 463, no. 2088, pp. 3251–3269, Dec. 2007.
[77] Fang Wang and Qi-Jun Zhang, “Knowledge-based neural models for microwave design,” IEEE Trans. Microw. Theory Tech., vol. 45, no. 12, pp. 2333–2343, 1997.
[78] Z. Yang, D. Eddy, S. Krishnamurty, I. Grosse, P. Denno, Y. Lu, and P. Witherell, “Investigating Grey-Box Modeling for Predictive Analytics in Smart Manufacturing,” in Proceedings of ASME 2017 International Design Engineering Technical Conferences , August 6-9, 2017, p. V02BT03A024.
126
[79] M. Kurek, M. P. Deisenroth, W. Luk, and T. Todman, “Knowledge Transfer in Automatic Optimisation of Reconfigurable Designs,” in Proceedings of 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 1-3, 2016, pp. 84–87.
[80] M. Kurek, T. Becker, T. C. P. Chau, and W. Luk, “Automating Optimization of Reconfigurable Designs,” in Proceedings of 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, May 11-13, 2014, pp. 210–213.
[81] C. Ding, X. He, H. Zha, and H. D. Simon, “Adaptive dimension reduction for clustering high dimensional data,” in Proceedings of 2002 IEEE International Conference on Data Mining, Dec 9-12, 2002, pp. 147–154.
[82] M. D. Morris and T. J. Mitchell, “Exploratory designs for computational experiments,” J. Stat. Plan. Inference, vol. 43, no. 3, pp. 381–402, 1995.
[83] M. H. Karwan, V. Lotfi, J. Telgen, and S. Zionts, Redundancy in mathematical programming: A state-of-the-art survey, vol. 206. Springer Science & Business Media, 2012.
[84] Z.-L. Liu, Z. Zhang, and Y. Chen, “A scenario-based approach for requirements management in engineering design,” Concurr. Eng., vol. 20, no. 2, pp. 99–109, Jun. 2012.
[85] W. Chen and M. Fuge, “Beyond the Known: Detecting Novel Feasible Domains Over an Unbounded Design Space,” J. Mech. Des., vol. 139, no. 11, p. 111405, Oct. 2017.
[86] B. J. Larson and C. A. Mattson, “Design Space Exploration for Quantifying a System Model’s Feasible Domain,” J. Mech. Des., vol. 134, no. 4, p. 41010, Apr. 2012.
[87] T. H. Lee and J. J. Jung, “A sampling technique enhancing accuracy and efficiency of metamodel-based RBDO: Constraint boundary sampling,” Comput. Struct., vol. 86, no. 13–14, pp. 1463–1476, Jul. 2008.
[88] H. Z. Yang, J. F. Chen, N. Ma, and D. Y. Wang, “Implementation of knowledge-based engineering methodology in ship structural design,” Comput. Des., vol. 44, no. 3, pp. 196–202, Mar. 2012.
[89] P. Geyer, “Component-oriented decomposition for multidisciplinary design optimization in building design,” Adv. Eng. Informatics, vol. 23, no. 1, pp. 12–31, 2009.
[90] S. Ahmed, S. Kim, and K. M. Wallace, “A Methodology for Creating Ontologies for Engineering Design,” J. Comput. Inf. Sci. Eng., vol. 7, no. 2, pp. 132–140, Jun. 2007.
127
[91] J. Jinxin Lin, M. S. Fox, and T. Bilgic, “A Requirement Ontology for Engineering Design,” Concurr. Eng., vol. 4, no. 3, pp. 279–291, Sep. 1996.
[92] E. Stachtiari, A. Mavridou, P. Katsaros, S. Bliudze, and J. Sifakis, “Early validation of system requirements and design through correctness-by-construction,” J. Syst. Softw., vol. 145, pp. 52–78, Nov. 2018.
[93] D. Wu, E. Coatanea, and G. G. Wang, “Dimension Reduction and Decomposition Using Causal Graph and Qualitative Analysis for Aircraft Concept Design Optimization,” in Volume 2B: 43rd Design Automation Conference, 2017.
[94] A. Viswanath, A. I. J. Forrester, and A. J. Keane, “Dimension Reduction for Aerodynamic Design Optimization,” AIAA J., vol. 49, no. 6, pp. 1256–1266, 2011.
[95] K. Sutha and J. J. Tamilselvi, “A review of feature selection algorithms for data mining techniques,” Int. J. Comput. Sci. Eng., vol. 7, no. 6, p. 63, 2015.
[96] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, Jan. 2014.
[97] I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Mach. Learn. Res., vol. 3, no. Mar, pp. 1157–1182, 2003.
[98] C. Lazar, J. Taminau, S. Meganck, D. Steenhoff, A. Coletta, C. Molter, V. de Schaetzen, R. Duque, H. Bersini, and A. Nowe, “A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis,” IEEE/ACM Trans. Comput. Biol. Bioinforma., vol. 9, no. 4, pp. 1106–1119, Jul. 2012.
[99] J. Reunanen, “Overfitting in Making Comparisons Between Variable Selection Methods,” J. Mach. Learn. Res., vol. 3, no. Mar, pp. 1371–1382, 2003.
[100] A. Alexandridis, P. Patrinos, H. Sarimveis, and G. Tsekouras, “A two-stage evolutionary algorithm for variable selection in the development of RBF neural network models,” Chemom. Intell. Lab. Syst., vol. 75, no. 2, pp. 149–162, Feb. 2005.
[101] S. Shan and G. G. Wang, “Turning Black-Box Functions Into White Functions,” J. Mech. Des., vol. 133, no. 3, p. 31003, 2011.
[102] K. Haji Hajikolaei, G. H. Cheng, and G. G. Wang, “Optimization on Metamodeling-Supported Iterative Decomposition,” J. Mech. Des., vol. 138, no. 2, p. 21401, Dec. 2015.
[103] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.
[104] A. Ghanbari, S. M. R. Kazemi, F. Mehmanpazir, and M. M. Nakhostin, “A Cooperative Ant Colony Optimization-Genetic Algorithm approach for construction of energy demand forecasting knowledge-based expert systems,” Knowledge-
128
Based Syst., vol. 39, pp. 194–206, Feb. 2013.
[105] M. H. Fazel Zarandi, B. Rezaee, I. B. Turksen, and E. Neshat, “A type-2 fuzzy rule-based expert system model for stock price analysis,” Expert Syst. Appl., vol. 36, no. 1, pp. 139–154, Jan. 2009.
[106] J. Zhang, Z. Ghahramani, and Y. Yang, “Flexible latent variable models for multi-task learning,” Mach. Learn., vol. 73, no. 3, pp. 221–242, Dec. 2008.
[107] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient Global Optimization of Expensive Black-Box Functions,” J. Glob. Optim., vol. 13, no. 4, pp. 455–492, 1998.
[108] G. G. Wang, “Adaptive Response Surface Method Using Inherited Latin Hypercube Design Points,” J. Mech. Des., vol. 125, no. 2, pp. 210–220, 2003.
[109] G. G. Wang, Z. Dong, and P. Attchison, “Adaptive Response Surface Method - A Global Optimization Scheme for Approximation-based Design Problems,” Eng. Optim., vol. 33, no. 6, pp. 707–733, Aug. 2001.
[110] T. Long, D. Wu, X. Guo, G. G. Wang, and L. Liu, “Efficient adaptive response surface method using intelligent space exploration strategy,” Struct. Multidiscip. Optim., vol. 51, no. 6, pp. 1335–1362, Jun. 2015.
[111] T. Wuest, D. Weimer, C. Irgens, and K.-D. Thoben, “Machine learning in manufacturing: advantages, challenges, and applications,” Prod. Manuf. Res., vol. 4, no. 1, pp. 23–45, Jan. 2016.
[112] G. Köksal, İ. Batmaz, and M. C. Testik, “A review of data mining applications for quality improvement in manufacturing industry,” Expert Syst. Appl., vol. 38, no. 10, pp. 13448–13467, Sep. 2011.
[113] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[114] R. Shi, L. Liu, T. Long, and J. Liu, “Sequential Radial Basis Function Using Support Vector Machine for Expensive Design Optimization,” AIAA J., vol. 55, no. 1, pp. 214–227, Jan. 2017.
[115] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, Oct. 1986.
[116] J. Wang, Y. Ma, L. Zhang, and R. X. Gao, “Deep learning for smart manufacturing: Methods and applications,” J. Manuf. Syst., vol. 48, pp. 144–156, Jul. 2018.
[117] O. Maimon and L. Rokach, Eds., Data Mining and Knowledge Discovery Handbook. Boston, MA: Springer US, 2010.
129
[118] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, “Data mining with big data,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 1, pp. 97–107, Jan. 2014.
[119] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[120] İ. B. Topçu and M. Sarıdemir, “Prediction of compressive strength of concrete containing fly ash using artificial neural networks and fuzzy logic,” Comput. Mater. Sci., vol. 41, no. 3, pp. 305–311, Jan. 2008.
[121] S. Tasdemir, I. Saritas, M. Ciniviz, and N. Allahverdi, “Artificial neural network and fuzzy expert system comparison for prediction of performance and emission parameters on a gasoline engine,” Expert Syst. Appl., vol. 38, no. 11, pp. 13912–13923, Oct. 2011.
[122] H. Rabitz, Ö. Alis, and Ö. F. Alış, “General foundations of high-dimensional model representations,” J. Math. Chem., vol. 25, no. 2–3, pp. 197–233, 1999.
[123] S. Shan and G. G. Wang, “Metamodeling for High Dimensional Simulation-Based Design Problems,” J. Mech. Des., vol. 132, no. May 2010, p. 51009, 2010.
[124] X. Cai, H. Qiu, L. Gao, P. Yang, and X. Shao, “An enhanced RBF-HDMR integrated with an adaptive sampling method for approximating high dimensional problems in engineering design,” Struct. Multidiscip. Optim., vol. 53, no. 6, pp. 1209–1229, 2016.
[125] Z. Huang and H. Qiu, “An adaptive SVR-HDMR model for approximating high dimensional problems,” Eng. Comput. Int. J. Comput. Eng. Softw., vol. 32, no. 3, pp. 643–667, 2015.
[126] H. Wang, L. Tang, and G. Y. Li, “Adaptive MLS-HDMR metamodeling techniques for high dimensional problems,” Expert Syst. Appl., vol. 38, no. 11, pp. 14117–14126, 2011.
[127] E. Ebrahimi, M. Monjezi, M. R. Khalesi, and D. J. Armaghani, “Prediction and optimization of back-break and rock fragmentation using an artificial neural network and a bee colony algorithm,” Bull. Eng. Geol. Environ., vol. 75, no. 1, pp. 27–36, Feb. 2016.
[128] H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for short-term load forecasting: a review and evaluation,” IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, 2001.
[129] D. J. Fonseca, D. O. Navaresse, and G. P. Moynihan, “Simulation metamodeling through artificial neural networks,” Eng. Appl. Artif. Intell., vol. 16, no. 3, pp. 177–183, Apr. 2003.
[130] G. Zhang, B. Eddy Patuwo, and M. Y. Hu, “Forecasting with artificial neural
130
networks:: The state of the art,” Int. J. Forecast., vol. 14, no. 1, pp. 35–62, Mar. 1998.
[131] B. Cheng and D. M. Titterington, “Neural networks: A review from a statistical perspective,” Stat. Sci., pp. 2–30, 1994.
[132] R. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Mag., vol. 4, no. 2, pp. 4–22, 1987.
[133] F. S. Wong, “Time series forecasting using backpropagation neural networks,” Neurocomputing, vol. 2, no. 4, pp. 147–159, Jul. 1991.
[134] S. Y. Kang, “An investigation of the use of feedforward neural networks for forecasting.,” Kent State University, 1992.
[135] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[136] J. Liu, M. Gong, Q. Miao, X. Wang, and H. Li, “Structure Learning for Deep Neural Networks Based on Multiobjective Optimization,” IEEE Trans. Neural Networks Learn. Syst., vol. 29, no. 6, pp. 2450–2463, Jun. 2018.
[137] V. Maniezzo, “Genetic evolution of the topology and weight distribution of neural networks,” IEEE Trans. Neural Networks, vol. 5, no. 1, pp. 39–53, 1994.
[138] I. Ben-Gal, F. Ruggeri, F. Faltin, and R. Kenett, “Bayesian networks, encyclopedia of statistics in quality and reliability.” John Wiley and Sons, 2007.
[139] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 2014.
[140] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Mach. Learn., vol. 29, no. 2–3, pp. 131–163, 1997.
[141] P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction, and search. MIT press, 2000.
[142] E. Coatanea, R. Roca, H. Mokhtarian, F. Mokammel, and K. Ikkala, “A Conceptual Modeling and Simulation Framework for System Design,” Comput. Sci. Eng., vol. 18, no. 4, pp. 42–52, Jul. 2016.
[143] E. Adorio and U. Diliman, “MVF–multivariate test functions library in c for unconstrained global optimization,” pp. 1–56, 2005.
[144] K. Schittkowski, “More Test Examples for Nonlinear Programming Codes,” in Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, 1987.
131
[145] X. Duan, G. G. Wang, X. Kang, Q. Niu, G. Naterer, and Q. Peng, “Performance study of mode-pursuing sampling method,” Eng. Optim., vol. 41, no. 1, pp. 1–21, 2009.
[146] N. M. Alexandrov, J. E. Dennis, R. M. Lewis, and V. Torczon, “A trust-region framework for managing the use of approximation models in optimization,” Struct. Optim., vol. 15, no. 1, pp. 16–23, Feb. 1998.
[147] B. Kulfan and J. Bussoletti, “‘Fundamental’ Parameteric Geometry Representations for Aircraft Component Shapes,” 11th AIAA/ISSMO Multidiscip. Anal. Optim. Conf., vol. 1, pp. 547–591, 2006.
[148] “XFOIL.” [Online]. Available: http://web.mit.edu/drela/Public/web/xfoil/.
[149] J. Warfield, “Binary Matrices in System Modeling,” IEEE Trans. Syst. Man. Cybern., vol. SMC-3, no. 5, pp. 441–449, 1973.
[150] M. S. Phadke, “Quality Engineering Using Design of Experiment, Quality Control, Robust Design and Taguchi Method,” Wadsworth, Los Angeles, CA, 1998.
[151] J. . Ghani, I. . Choudhury, and H. . Hassan, “Application of Taguchi method in the optimization of end milling parameters,” J. Mater. Process. Technol., vol. 145, no. 1, pp. 84–92, 2004.
[152] “Test Suite Problem 2.5, POWER CONVERTER,” NASA MultiDisciplinary Optimization Branch, 2018. [Online]. Available: http://www.eng.buffalo.edu/Research/MODEL/mdo.test.orig/class2prob5/descr.html.
[153] D. Wang, G. G. Wang, and G. F. Naterer, “Extended collaboration pursuing method for solving larger multidisciplinary design optimization problems,” AIAA J., vol. 45, no. 6, p. 14, 2007.
[154] D. Wang, G. Wang, and G. Naterer, “Collaboration Pursuing Method for MDO Problems,” AIAA J., vol. 45, no. 5, pp. 1091–1103, 2007.
[155] D. Wu, E. Coatanea, and G. G. Wang, “Dimension Reduction and Decomposition Using Causal Graph and Qualitative Analysis for Aircraft Concept Design Optimization,” in Volume 2B: 43rd Design Automation Conference, 2017, p. V02BT03A035.
[156] L. G. Swan and V. I. Ugursal, “Modeling of end-use energy consumption in the residential sector: A review of modeling techniques,” Renew. Sustain. Energy Rev., vol. 13, no. 8, pp. 1819–1835, Oct. 2009.
[157] E. Hirst, W. Lin, and J. Cope, “Residential energy use model sensitive to demographic, economic, and technological factors,” Q. Rev. Econ. Bus., 1977.
[158] H. K. Ozturk, O. E. Canyurt, A. Hepbasli, and Z. Utlu, “Residential-commercial energy input estimation based on genetic algorithm (GA) approaches: an application of Turkey,” Energy Build., vol. 36, no. 2, pp. 175–183, Feb. 2004.
[159] Q. Zhang, “Residential energy consumption in China and its comparison with Japan, Canada, and USA,” Energy Build., vol. 36, no. 12, pp. 1217–1225, Dec. 2004.
[160] R. Haas and L. Schipper, “Residential energy demand in OECD-countries and the role of irreversible efficiency improvements,” Energy Econ., vol. 20, no. 4, pp. 421–442, Sep. 1998.
[161] M. Kavgic, A. Mavrogianni, D. Mumovic, A. Summerfield, Z. Stevanovic, and M. Djurovic-Petrovic, “A review of bottom-up building stock models for energy consumption in the residential sector,” Build. Environ., vol. 45, no. 7, pp. 1683–1697, Jul. 2010.
[162] R. Ghedamsi, N. Settou, A. Gouareh, A. Khamouli, N. Saifi, B. Recioui, and B. Dokkar, “Modeling and forecasting energy consumption for residential buildings in Algeria using bottom-up approach,” Energy Build., vol. 121, pp. 309–317, Jun. 2016.
[163] L. G. Swan, V. I. Ugursal, and I. Beausoleil-Morrison, “Occupant related household energy consumption in Canada: Estimation using a bottom-up neural-network technique,” Energy Build., vol. 43, no. 2–3, pp. 326–337, Feb. 2011.
[164] J. Yang, H. Rivard, and R. Zmeureanu, “Building energy prediction with adaptive artificial neural networks,” in Ninth International IBPSA Conference, Montréal, Canada, August, 2005, pp. 15–18.
[165] E. Hirst, R. Goeltz, and D. White, “Determination of household energy using ‘fingerprints’ from energy billing data,” Int. J. Energy Res., vol. 10, no. 4, pp. 393–405, Oct. 1986.
[166] Y. Ji and P. Xu, “A bottom-up and procedural calibration method for building energy simulation models based on hourly electricity submetering data,” Energy, vol. 93, pp. 2337–2350, Dec. 2015.
[167] H. Farahbakhsh, V. I. Ugursal, and A. S. Fung, “A residential end-use energy consumption model for Canada,” Int. J. Energy Res., vol. 22, no. 13, pp. 1133–1143, Oct. 1998.
[168] A. Capasso, W. Grattieri, R. Lamedica, and A. Prudenzi, “A bottom-up approach to residential load modeling,” IEEE Trans. Power Syst., vol. 9, no. 2, pp. 957–964, May 1994.
[169] R. Kadian, R. P. Dahiya, and H. P. Garg, “Energy-related emissions and mitigation opportunities from the household sector in Delhi,” Energy Policy, vol. 35, no. 12, pp. 6195–6211, Dec. 2007.
[170] J. T. Wilkerson, D. Cullenward, D. Davidian, and J. P. Weyant, “End use technology choice in the National Energy Modeling System (NEMS): An analysis of the residential and commercial building sectors,” Energy Econ., vol. 40, pp. 773–784, 2013.
Appendix A. Numerical Benchmark Functions
SUR-T1-14 function, \(n = 10, 20, 30\)
\[
f(\mathbf{x}) = (x_1 - 1)^2 + (x_n - 1)^2 + n \sum_{i=1}^{n-1} (n - i)\left(x_i^2 - x_{i+1}\right)^2
\]
\[
-3 \le x_i \le 2, \quad i = 1, 2, \dots, n
\]
(A-1)
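For concreteness, (A-1) can be evaluated with a few lines of Python; the function name and list-based interface below are illustrative choices, not part of the benchmark definition. At \(x_i = 1\) for all \(i\), every term vanishes, so the function should return 0:

```python
def sur_t1_14(x):
    """Evaluate the SUR-T1-14 function (A-1) at a point x of length n."""
    n = len(x)
    # Weighted sum over consecutive pairs; weight (n - i) for i = 1..n-1
    tail = sum((n - (i + 1)) * (x[i] ** 2 - x[i + 1]) ** 2 for i in range(n - 1))
    return (x[0] - 1) ** 2 + (x[n - 1] - 1) ** 2 + n * tail
```

For example, `sur_t1_14([1.0] * 10)` evaluates to 0, consistent with the known optimum at \(x_i = 1\).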
Rosenbrock function, \(n = 10\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n-1} \left( 100\left(x_{i+1} - x_i^2\right)^2 + (x_i - 1)^2 \right)
\]
\[
-5 \le x_i \le 5, \quad i = 1, 2, \dots, n
\]
(A-2)
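A direct Python transcription of (A-2) follows; the name and interface are illustrative. The global minimum is 0 at \(x_i = 1\):

```python
def rosenbrock(x):
    """Evaluate the Rosenbrock function (A-2) at a point x."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))
```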
Trid function, \(n = 10\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n} (x_i - 1)^2 - \sum_{i=2}^{n} x_i x_{i-1}
\]
\[
-n^2 \le x_i \le n^2, \quad i = 1, 2, \dots, n
\]
(A-3)
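A Python sketch of (A-3), with an illustrative name and interface. The Trid function has the known minimum \(f^* = -n(n+4)(n-1)/6\) at \(x_i = i(n+1-i)\), which for \(n = 10\) gives \(f^* = -210\):

```python
def trid(x):
    """Evaluate the Trid function (A-3) at a point x."""
    square_term = sum((xi - 1) ** 2 for xi in x)
    cross_term = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return square_term - cross_term
```

For example, `trid([i * (11 - i) for i in range(1, 11)])` evaluates to \(-210\).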
F16 function, \(n = 16\)
\[
f(\mathbf{x}) = \sum_{i=1}^{16} \sum_{j=1}^{16} a_{ij} \left(x_i^2 + x_i + 1\right)\left(x_j^2 + x_j + 1\right)
\]
\[
-1 \le x_i \le 1, \quad i = 1, 2, \dots, n
\]
(A-4)
where \(a_{ij}\) denotes the entry in row \(i\), column \(j\) of the fixed \(16 \times 16\) binary (0–1) coefficient matrix that defines the F16 benchmark.
Griewank function, \(n = 10, 20, 30\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1
\]
\[
-300 \le x_i \le 300, \quad i = 1, 2, \dots, n
\]
(A-5)
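The Griewank function (A-5) can be transcribed directly into Python; the name and list interface are illustrative. At the origin the sum term is 0 and the product of cosines is 1, so the value is 0:

```python
import math

def griewank(x):
    """Evaluate the Griewank function (A-5) at a point x."""
    square_sum = sum(xi ** 2 for xi in x) / 4000.0
    # Product of cos(x_i / sqrt(i)) with 1-based index i
    cos_prod = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return square_sum - cos_prod + 1.0
```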
Ackley function, \(n = 10, 20, 30\)
\[
f(\mathbf{x}) = 20 + e - 20\, e^{-\frac{1}{5}\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}} - e^{\frac{1}{n}\sum_{i=1}^{n} \cos(2\pi x_i)}
\]
\[
-30 \le x_i \le 30, \quad i = 1, 2, \dots, n
\]
(A-6)
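A Python sketch of (A-6), with illustrative naming. At the origin both exponentials equal \(e^0 = 1\) and \(e^1 = e\), respectively, so the terms cancel and the value is 0:

```python
import math

def ackley(x):
    """Evaluate the Ackley function (A-6) at a point x."""
    n = len(x)
    rms_term = -0.2 * math.sqrt(sum(xi ** 2 for xi in x) / n)  # -1/5 coefficient
    cos_term = sum(math.cos(2 * math.pi * xi) for xi in x) / n
    return 20 + math.e - 20 * math.exp(rms_term) - math.exp(cos_term)
```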
Rastrigin function, \(n = 20\)
\[
f(\mathbf{x}) = 10 \times 20 + \sum_{i=1}^{20} \left(x_i^2 - 10\cos(2\pi x_i)\right)
\]
\[
-5.12 \le x_i \le 5.12, \quad i = 1, 2, \dots, 20
\]
(A-7)
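The Rastrigin function (A-7) in Python, written for a general dimension \(n = \) `len(x)` (an illustrative generalization of the fixed \(n = 20\) above). Its global minimum is 0 at the origin:

```python
import math

def rastrigin(x):
    """Evaluate the Rastrigin function (A-7) at a point x."""
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)
```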
SUR-T1-16, \(n = 20\)
\[
f(\mathbf{x}) = \sum_{i=1}^{5} \left[ (x_i + 10 x_{i+5})^2 + 5(x_{i+10} - x_{i+15})^2 + (x_{i+5} - 2 x_{i+10})^4 + 10(x_i - x_{i+15})^4 \right]
\]
\[
-2 \le x_i \le 5, \quad i = 1, 2, \dots, n
\]
(A-8)
Powell function, \(n = 20\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n/4} \left[ (x_{4i-3} + 10 x_{4i-2})^2 + 5(x_{4i-1} - x_{4i})^2 + (x_{4i-2} - 2 x_{4i-1})^4 + 10(x_{4i-3} - x_{4i})^4 \right]
\]
\[
-4 \le x_j \le 5, \quad j = 1, 2, \dots, n
\]
(A-9)
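The standard Powell function (A-9) groups the variables in blocks of four; a Python sketch (illustrative name and interface) makes the indexing explicit. The global minimum is 0 at the origin:

```python
def powell(x):
    """Evaluate the Powell function (A-9); len(x) must be a multiple of 4."""
    total = 0.0
    for i in range(1, len(x) // 4 + 1):
        # 1-based indices 4i-3, 4i-2, 4i-1, 4i mapped to 0-based list positions
        x1, x2, x3, x4 = x[4 * i - 4], x[4 * i - 3], x[4 * i - 2], x[4 * i - 1]
        total += ((x1 + 10 * x2) ** 2 + 5 * (x3 - x4) ** 2
                  + (x2 - 2 * x3) ** 4 + 10 * (x1 - x4) ** 4)
    return total
```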
Perm function, \(n = 20\)
\[
f(\mathbf{x}) = \sum_{k=1}^{n} \left[ \sum_{i=1}^{n} \left(i^k + \beta\right)\left(\left(x_i / i\right)^k - 1\right) \right]^2
\]
\[
-n \le x_i \le n, \quad i = 1, 2, \dots, n
\]
\[
\beta = 0.5
\]
(A-10)
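A Python sketch of (A-10), with illustrative naming and \(\beta\) exposed as a keyword argument. Since \((x_i/i)^k = 1\) whenever \(x_i = i\), the inner sum vanishes and the global minimum is 0 at \(x_i = i\):

```python
def perm(x, beta=0.5):
    """Evaluate the Perm function (A-10) at a point x."""
    n = len(x)
    total = 0.0
    for k in range(1, n + 1):
        inner = sum((i ** k + beta) * ((x[i - 1] / i) ** k - 1)
                    for i in range(1, n + 1))
        total += inner ** 2
    return total
```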
Appendix B. List of Publications during PhD Studies
Journals
D. Wu, K. H. Hajikolaei, and G. G. Wang, “Employing partial metamodels for
optimization with scarce samples,” Struct. Multidiscip. Optim., pp. 1–15, Sep. 2017.
D. Wu, E. Coatanea, and G. G. Wang, “Employing Knowledge on Causal Relationship to
Assist Multidisciplinary Design Optimization,” J. Mech. Des., vol. 141, no. 4, p. 041402,
Jan. 2019.
E. T. Woldemariam, E. Coatanéa, G. G. Wang, H. G. Lemu, and D. Wu, “Customized
dimensional analysis conceptual modelling framework for design optimization—a case
study on the cross-flow micro turbine model,” Eng. Optim., vol. 51, no. 7, pp. 1168–1184,
Jul. 2019.
D. Wu and G. G. Wang, “Knowledge Assisted Optimization for Large-scale Design
Problems: A Review and Proposition,” Journal of Mechanical Design, Accepted with
revisions, 2019.
D. Wu and G. G. Wang, “Causal Artificial Neural Network and its Applications in
Engineering Design,” submitted to Expert Systems with Applications, 2019.
D. Wu, G. G. Wang, and H. Jarollahi, “Developing Causal-Artificial Neural Network in
Residential Energy Consumption Forecasting,” submitted to Expert Systems with
Applications, 2019.
Conferences
D. Wu, E. Coatanea, and G. G. Wang, “Dimension Reduction and Decomposition Using
Causal Graph and Qualitative Analysis for Aircraft Concept Design Optimization,” IDETC
2017-67601, Cleveland, Ohio, USA, August 6–9, 2017.
D. Wu and G. G. Wang, “Knowledge Assisted Optimization for Large-Scale Problems: A
Review and Proposition,” IDETC 2018-85325, Quebec City, Quebec, Canada, August
26–29, 2018.