Knowledge Assisted Metamodeling and Optimization
Method for Large-Scale Engineering Design
by
Di Wu
M.Sc., Beijing Institute of Technology, 2015
B.Sc., Beijing Institute of Technology, 2013
Thesis Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
in the
School of Mechatronic System Engineering
Faculty of Applied Sciences
© Di Wu 2019
SIMON FRASER UNIVERSITY
Summer 2019
Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.
Approval
Name: Di Wu
Degree: Doctor of Philosophy
Title: Knowledge-assisted Metamodeling and Optimization Method for Large-Scale Engineering Design
Examining Committee:
Chair: Mohammad Narimani, Lecturer
G. Gary Wang, Senior Supervisor, Professor
Krishna Vijayaraghavan, Supervisor, Associate Professor
Siamak Arzanpour, Supervisor, Associate Professor
Woo Soo Kim, Internal Examiner, Associate Professor, School of Mechatronic System Engineering
Carolyn Conner Seepersad, External Examiner, Professor, Department of Mechanical Engineering, The University of Texas at Austin
Date Defended/Approved: Aug. 22, 2019
Abstract
Simulation-based design optimization methods commonly treat the simulation as a black-box function. An approximation model of the simulation, called a metamodel, is often built and used in optimization. However, modeling and searching in an unknown design space lead to high computational cost. To further improve the efficiency of optimization, knowledge of the design problem needs to be brought in to assist metamodeling and optimization. This work endeavors to systematically incorporate knowledge for this purpose. After an extensive review, two types of knowledge, sensitivity information and causal relations, are employed in solving large-scale engineering design problems.
Instead of constructing a complete metamodel, a Partial Metamodel-based Optimization
(PMO) method is developed to reduce the number of samples for optimizing large-scale
problems, using Radial Basis Function-High Dimensional Model Representation (RBF-
HDMR) along with a moving cut-center strategy. Sensitivity information is used to
selectively model component functions in a partial metamodel. The cut center of the HDMR model moves to the current optimum at each iteration to pursue the optimum. Numerical tests and an airfoil design case show that the PMO method can lead to better optimal results when samples are scarce.
Causal graphs capture relational knowledge among design variables and outcomes. By
constructing and performing qualitative analysis on a causal graph, variables without
contradiction can be found, whose values can be determined without resorting to
optimization. The design problem can thus be divided into two sub-problems based on the impact of the variables. This dimension reduction and decomposition strategy is applied to a
power converter design and an aircraft concept design problem with significantly
improved efficiency.
Combining the structure of Artificial Neural Networks (ANNs) with causal graphs, a causal-ANN is developed to improve the accuracy of metamodels by incorporating knowledge. The
structure of causal graphs is employed to decompose an ANN into sub-networks.
Additionally, leveraging the structure of causal-ANN and theory of Bayesian Networks,
the attractive variable subspaces can be identified without additional simulation. Finally,
the causal-ANN is applied in a residential energy consumption forecasting problem and
both the modeling accuracy and efficiency are improved.
This work systematically and methodically models and captures knowledge and brings it into metamodeling and optimization. Sensitivities and causal relations have
been incorporated in optimization strategies that have been successfully applied to
various engineering design problems. Further research can be extended to studies on
how to incorporate other types of knowledge to assist metamodeling and optimization.
Keywords: Knowledge, Causal graph, Sensitivity analysis, Dimension reduction,
Metamodeling, Optimization
Dedication
To my loving parents,
for their endless support and sacrifice.
Acknowledgments
I would like to first thank my senior supervisor, Dr. G. Gary Wang, for his supervision and support during these four years. I learned how to be a good student in my previous studies, but I learned how to be an independent researcher from Dr. Wang. I am very grateful for the many opportunities he provided me. It has been my greatest honor to have had Dr. Wang as my senior supervisor.
I would like to thank Dr. Krishna Vijayaraghavan and Dr. Siamak Arzanpour, the
members of my Ph.D. supervisory committee, for providing comments that helped me to
improve this work. I would also like to thank Dr. Woo Soo Kim and Dr. Carolyn Conner
Seepersad for evaluating my work as examiners.
I would like to acknowledge Dr. Eric Coatanea, for the collaborative research
opportunity on the dimension reduction method, and for sharing his knowledge in system
engineering with us. I would also like to acknowledge Mr. Hootan Jarollahi for his support on the residential load forecasting model and for his help during our work together.
Last but not least, I would like to thank my friends at SFU who have made my life enjoyable in a foreign country.
Table of Contents
Approval .......................................................................................................................... ii
Abstract .......................................................................................................................... iii
Dedication ....................................................................................................................... v
Acknowledgments .......................................................................................................... vi
Table of Contents .......................................................................................................... vii
List of Tables ................................................................................................................... x
List of Figures................................................................................................................ xii
List of Acronyms ............................................................................................................ xiii
Chapter 1. Introduction .............................................................................................. 1
1.1. Motivation .............................................................................................................. 1
1.2. Objectives of the research ..................................................................................... 2
1.3. Structure of Dissertation ........................................................................................ 3
Chapter 2. Literature review ...................................................................................... 5
2.1. Concept of knowledge ........................................................................................... 5
2.1.1. Knowledge in Artificial Intelligence ................................................................. 6
2.1.2. Knowledge in product design ......................................................................... 8
2.1.3. Summary remarks ....................................................................................... 11
2.2. Existing Applications of knowledge in design optimization ................................... 12
2.2.1. Symbolic knowledge .................................................................................... 12
2.2.2. Linguistic Knowledge ................................................................................... 14
2.2.3. Virtual knowledge ........................................................................................ 15
2.2.4. Algorithmic knowledge ................................................................................. 15
2.2.5. Summary remarks ....................................................................................... 16
2.3. Potential applications of knowledge ..................................................................... 17
2.3.1. Problem formulation..................................................................................... 18
2.3.2. Dimension reduction .................................................................................... 19
2.3.3. Decomposition ............................................................................................. 21
2.3.4. Metamodeling .............................................................................................. 22
2.3.5. Optimization strategy ................................................................................... 23
2.3.6. Optimization, machine learning, and knowledge .......................................... 24
2.3.7. Summary remarks ....................................................................................... 26
2.4. Review of RBF-HDMR ......................................................................................... 26
2.5. Artificial neural network architecture .................................................................... 29
2.6. Bayesian network and causal graph .................................................................... 30
2.7. Summary ............................................................................................................. 31
Chapter 3. Partial metamodel-based optimization (PMO) method ........................ 32
3.1. Algorithm description ........................................................................................... 32
3.2. Example of PMO ................................................................................................. 36
3.3. Properties of PMO ............................................................................................... 38
3.4. Testing of PMO ................................................................................................... 39
3.5. Trust Region based PMO .................................................................................... 45
3.6. Application to Airfoil Design ................................................................................. 49
3.7. Summary ............................................................................................................. 52
Chapter 4. Dimension reduction method employing causal relations ................. 53
4.1. Dimension reduction method description ............................................................. 53
4.1.1. Overall process ............................................................................................ 53
4.1.2. Qualitative Analysis based on design structure matrix ................................. 57
4.1.3. Weight calculation ....................................................................................... 60
4.1.4. Two-stage optimization process .................................................................. 61
4.1.5. Numerical example ...................................................................................... 63
4.2. Engineering case studies ..................................................................................... 70
4.2.1. Power converter design problem ................................................................. 70
4.2.2. Aircraft concept design problem .................................................................. 73
4.3. Summary ............................................................................................................. 77
Chapter 5. Causal-Artificial Neural Network (Causal-ANN) and its application ... 78
5.1. Causal ANN and application in attractive sub-space identification ....................... 78
5.1.1. Causal artificial neural network .................................................................... 79
5.1.2. Attractive sub-space identification method ................................................... 82
5.2. Case studies ........................................................................................................ 85
Constructing causal-ANN ....................................................................................... 86
Attractive sub-space identification .......................................................................... 88
5.2.2. Aircraft concept design problem .................................................................. 91
5.2.3. Discussion ................................................................................................... 95
Generation of high-level causal graph .................................................................... 95
Fault tolerance studies on causal relations ............................................................. 96
Impact of variable correlations ............................................................................... 98
5.3. Summary ........................................................................................................... 100
Chapter 6. Applying causal-ANN in energy consumption prediction ................. 101
6.1. Residential End-Use Stock and Flow Model ...................................................... 101
6.1.1. Total life cycle cost calculation ................................................................... 103
6.1.2. Logit model ................................................................................................ 104
6.1.3. Stock turnover engine ................................................................................ 104
6.1.4. Example of REUSF model ......................................................................... 106
6.1.5. Logit model training ................................................................................... 107
6.2. Applying causal-ANN in market share prediction ............................................... 108
6.3. Results and discussion ...................................................................................... 110
6.3.1. Case study: dish washer ............................................................................ 110
6.3.2. Full model prediction.................................................................................. 112
6.4. Summary ........................................................................................................... 114
Chapter 7. Conclusions and future work .............................................................. 115
7.1. Conclusions ....................................................................................................... 115
7.2. Future Research ................................................................................................ 117
7.2.1. Knowledge validation, correction, and updating ......................................... 118
7.2.2. Employing different kinds of knowledge ..................................................... 118
7.2.3. Knowledge-assisted optimization strategies .............................................. 118
References ................................................................................................................. 120
Appendix A. Numerical Benchmark Functions .................................................. 134
Appendix B. List of Publications during PhD Studies ...................................... 137
Journals ...................................................................................................................... 137
Conferences ................................................................................................................ 138
List of Tables
Table 2-1: Classification of knowledge representation [43]. ............................................. 9
Table 2-2: Existing applications of knowledge in optimization. ....................................... 12
Table 2-3: Potential applications of knowledge in different stages of optimization. ........ 18
Table 3-1: Optimization results with numerical benchmark problems. ........................... 40
Table 3-2. Optimized results with benchmark functions in different dimensions. ............ 42
Table 3-3: Dimensions selected in PMO on SUR-T1-14 for five independent runs. ....... 43
Table 3-4: TRMPS parameter settings. ......................................................................... 48
Table 3-5: OMID parameter settings. ............................................................................ 48
Table 3-6: Optimization results of using TR-PMO, TRMPS OMID and PMO. ................ 48
Table 3-7: Parameters of NACA0012. ........................................................................... 50
Table 3-8: Optimization results with airfoil design problem. ........................................... 51
Table 4-1: The Taguchi orthogonal array for t=7............................................................ 60
Table 4-2: Matrix [A] for the numerical example. ........................................................... 64
Table 4-3: Matrix [A1] for the numerical example. ......................................................... 64
Table 4-4: Modified matrix [A’] for the numerical example. ............................................ 65
Table 4-5: Modified matrix [A1’] for the numerical example. .......................................... 65
Table 4-6: Matrix [Anoc] for the numerical example. ...................................................... 65
Table 4-7: Matrix [C] for the numerical example. ........................................................... 66
Table 4-8: Element values in the objective column in [A’] and [A1’]. .............................. 66
Table 4-9: Taguchi sampling table of objective function. ............................................... 67
Table 4-10: Weighted matrix [Aw] for numerical example. .............................................. 67
Table 4-11: Optimization results of the original problem and decomposed problem. ..... 69
Table 4-12: Comparison of two thresholds (10% and 20%). .......................................... 69
Table 4-13: Design variables in power converter design. .............................................. 71
Table 4-14: Optimization results for the power converter problem. ................................ 72
Table 4-15: Comparison of optimization results with a fixed number of SA for the power converter problem. ................................................................................. 73
Table 4-16: Design variables in aircraft concept design .................................................. 75
Table 4-17: Optimization results of aircraft concept design............................................ 76
Table 4-18: Comparison of optimization results with a fixed number of SA for the aircraft problem. ................................................................................................. 76
Table 5-1: Design variables in power converter design. ................................................. 85
Table 5-2: Comparison of accuracy among three metamodels. ..................................... 88
Table 5-3: Accuracy of each sub-network. ..................................................................... 88
Table 5-4: Probability distribution P(y ≠ 1|xi, i = 1,2,… ,6) on actual model. ................... 89
Table 5-5: Probability distribution Pprediction(y ≠ 1|xi, i = 1,2,… ,6) on causal-ANN. .. 89
Table 5-6: Probability distribution P(y ≠ 1|xi, i = 1,2,… ,6) with new upper bound. ......... 90
Table 5-7: Probability distribution Pprediction(y ≠ 1|xi, i = 1,2,… ,6) with new upper bound. .................................................................................................... 90
Table 5-8: Interesting interval with the largest likelihood. ............................................... 91
Table 5-9: Design variables in aircraft concept design. .................................................. 92
Table 5-10: Comparison of accuracy value among three metamodels. ......................... 94
Table 5-11: Accuracy of each sub-network. ................................................................... 94
Table 5-12: Probability distribution P(y ≠ 1|xi, i = 1,2,… ,9) on real model. ................... 95
Table 5-13: Probability distribution Pprediction(y ≠ 1|xi, i = 1,2,… ,9) on causal-ANN. 95
Table 5-14: Interesting interval with the largest likelihood. ............................................. 95
Table 5-15: R2 value of objective and intermediate variables for the causal-ANN without
y2. .......................................................................................................... 97
Table 5-16: Comparison of R2 values when missing links in causal graphs. .................. 98
Table 5-17: ANOVA analysis results of [x1,… , x6] to y2 ................................................ 98
Table 5-18: Interesting area detected with independent assumption in power converter design. ................................................................................................... 99
Table 5-19: Interesting area detected with independent assumption in aircraft concept design. ................................................................................................... 99
Table 6-1: The parameters of dish washers in 2010. ................................................... 106
Table 6-2: The inputs of two end-use technologies...................................................... 106
Table 6-3: Comparison in RMSE and time among three approximation models. ......... 111
Table 6-4: Approximation results of causal-ANN and logit model. ................................ 113
List of Figures
Figure 2-1: Knowledge representation methods. ............................................................. 6
Figure 3-1: Flow chart of PMO ...................................................................................... 33
Figure 3-2: Box-plots of optimized values. ..................................................................... 42
Figure 3-3: Convergence plot of PMO in SUR-T1-14 problem. ...................................... 44
Figure 3-4: Flowchart of TR-PMO. ................................................................................. 47
Figure 3-5: Airfoil design problem. ................................................................................. 50
Figure 3-6: Optimization results on the airfoil design problem ....................................... 51
Figure 4-1: Causal graph example. ............................................................................... 54
Figure 4-2: Causal graph of a numerical example. ........................................................ 63
Figure 4-3: Simplified causal graph for the numerical example. ..................................... 68
Figure 4-4: Causal graph of the power converter problem. ............................................ 71
Figure 4-5: Causal graph of the aircraft concept design problem .................................... 75
Figure 5-1: An example of high-level causal graph ........................................................ 80
Figure 5-2: Causal-ANN with a cheap model. ................................................................ 80
Figure 5-3: Two separate sub-networks. ....................................................................... 81
Figure 5-4: Causal-ANN with known intermediate variables and cheap models. ........... 81
Figure 5-5: Variable discretization. ................................................................................ 83
Figure 5-6: Discretization for the variable without fixed bounds. .................................... 83
Figure 5-7: Causal graph of the power converter problem. ............................................ 85
Figure 5-8: Simplified causal graph of the power converter design problem. ................. 86
Figure 5-9: Six sub-networks for the power converter design problem. ......................... 87
Figure 5-10: Causal graph of the aircraft concept design problem. ................................ 92
Figure 5-11: Simplified causal graph for aircraft concept design .................................... 93
Figure 5-12: Sub-networks for aircraft concept design. .................................................. 93
Figure 5-13: Causal graph with one intermediate layer for power converter design ....... 97
Figure 6-1: Flow chart of Residential End-Use Stock and Flow Model. ........................ 103
Figure 6-2: Flow of stocks in the stock turnover engine. .............................................. 105
Figure 6-3: Flow chart of the market share prediction. ................................................. 108
Figure 6-4: High-level causal relations of the market share prediction model. ............. 109
Figure 6-5: Structure of causal-ANN to predict market shares ..................................... 109
Figure 6-6: Market shares comparison for dish washers. ............................................ 112
Figure 6-7: Energy consumption prediction using causal-ANN. .................................... 114
List of Acronyms
AI Artificial Intelligence
AIC Akaike’s Information Criterion
ANN Artificial Neural Network
ANOVA ANalysis Of VAriance
BLISS Bi-Level Integrated System Synthesis
BN Bayesian Network
CAD Computer-Aided Design
CAE Computer-Aided Engineering
CC Capital Cost
CO Collaborative Optimization
CSSO Concurrent SubSpace Optimization
CST Class function/Shape function airfoil transformation representation Tool
DACM Dimensional Analysis Concept Modeling
DAG Directed Acyclic Graph
DSM Design Structure Matrix
GA Genetic Algorithm
GM Graphical Models
GPS General Problem Solver
HDMR High Dimensional Model Representation
KBE Knowledge-Based Engineering
KEE Knowledge Engineering Environment
MAE Mean Absolute Error
MPS Mode Pursuing Sampling
MS Market Shares
NFE Number of Function Evaluations
OMID Optimization on Metamodeling-supported Iterative Decomposition
PCA Principal Component Analysis
PMO Partial Metamodel-based Optimization
RBF Radial Basis Function
RBF-HDMR Radial Basis Function-HDMR
REUSF Residential End-Use Stock and Flow
RMSE Root Mean Square Error
RSM Response Surface Method
SA System Analysis
SU Stock Units
TLCC Total Life Cycle Cost
TRMPS Trust Region-based Mode Pursuing Sampling
TR-PMO Trust Region-PMO
UEC Unit Energy Consumption
VDS Visual Design Steering
Chapter 1. Introduction
1.1. Motivation
High dimensionality, expensive computational cost, and black-box functions (HEB) are three main challenges in simulation-based design optimization [1]. As more design variables become involved in engineering design problems, the dimensionality of the optimization problem increases. Higher-fidelity simulation models improve the accuracy of the analysis but increase the computational cost at the same time. The computational cost of solving a large-scale simulation-based optimization problem thus often becomes unacceptable in practice. Therefore, high-efficiency optimization strategies need to be developed to deal with such large-scale engineering design problems.
Current simulation-based optimization strategies usually treat simulation as a black-box
function. The assumption of black-box functions is derived from the fact that simulations
are used to evaluate design functions, whose mathematical expressions are unknown to
the user. The presence of noise in simulation renders the approximated gradients
untrustworthy, even if the added cost to obtain such gradient information is tolerable.
One main advantage of treating simulation as a black box is that the optimization
method can be generalized for solving any design problem. Different non-gradient based
optimization algorithms [2]–[4] and metamodel-based optimization methods have been
developed to deal with black-box optimization problems [5]–[10]. The metamodel,
meaning “model of a model,” is a simplified mathematical model that approximates the
hidden function in simulation, e.g., a polynomial function or an artificial neural network
(ANN) model. Generally, either in non-gradient based optimization methods or in
metamodel based optimization methods, the key to the optimization algorithms is the
way to generate useful samples (offspring or particles) in a high-dimensional space.
Generation of new samples needs to balance exploration and exploitation. Especially for exploitation, information obtained from previous iterations and existing samples is usually used to help generate better samples. However, a lack of information may lead to low efficiency or even a wrong search direction.
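As a toy illustration of the metamodel idea only (not the RBF-HDMR models used later in this thesis), the sketch below fits a cheap inverse-distance-weighting surrogate to a handful of samples of a hypothetical expensive function; the function, sample plan, and all names are illustrative assumptions:

```python
import math

def expensive_black_box(x):
    """Stand-in for a costly simulation (hypothetical test function)."""
    return math.sin(3 * x) + 0.5 * x

# Evaluate the "simulation" at a small number of sample points.
X = [i / 9 for i in range(10)]          # 10 samples on [0, 1]
Y = [expensive_black_box(x) for x in X]

def metamodel(x, p=2):
    """Inverse-distance-weighted surrogate: a cheap 'model of the model'."""
    num = den = 0.0
    for xi, yi in zip(X, Y):
        d = abs(x - xi)
        if d < 1e-12:
            return yi                   # exact at the sample points
        w = 1.0 / d ** p
        num += w * yi
        den += w
    return num / den

# The surrogate reproduces the samples and approximates between them,
# so an optimizer can query it instead of the expensive simulation.
print(metamodel(X[3]) == Y[3])                          # True
print(abs(metamodel(0.5) - expensive_black_box(0.5)))   # small gap between samples
```

The same pattern underlies metamodel-based optimization generally: expensive evaluations are spent once on samples, and the search then runs against the cheap approximation.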
Another issue with the black-box assumption is that more computational cost is demanded, since the optimization is blind to the design problem at hand. This phenomenon is more severe when the dimensionality of the problem is high. As the dimensionality increases, the volume of the design space grows exponentially. Even thousands of samples are sparse in a 100-dimensional space, and it becomes very difficult to explore and optimize blindly in such a huge space. This problem is known as the "curse of dimensionality" [11]. In a high-dimensional space, information obtained from samples alone is not enough for solving the problem, since the properties of the design space cannot be represented accurately by such sparse samples.
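This sparsity can be checked numerically: holding the sample count fixed, the average distance from each sample to its nearest neighbor grows rapidly with dimension. A small sketch (sample sizes and seed are arbitrary choices for illustration):

```python
import math
import random

def avg_nearest_neighbor_dist(n_samples, dim, seed=1):
    """Average nearest-neighbor distance among uniform samples in [0, 1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    total = 0.0
    for i, p in enumerate(pts):
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n_samples

# The same 200 samples that densely cover a 2-D square are extremely
# sparse in 100 dimensions: every point is far from all of its neighbors.
for dim in (2, 10, 100):
    print(dim, round(avg_nearest_neighbor_dist(200, dim), 3))
```

Running this shows the nearest-neighbor gap growing by roughly two orders of magnitude from 2 to 100 dimensions, which is why sample-only information becomes insufficient in large-scale problems.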
In real-world engineering design, practitioners usually have some knowledge of the design problem, such as the variables involved, the input-output relations, or even mathematical functions based on physical laws. Such information is largely ignored by current simulation-based optimization strategies. As mentioned above, information beyond samples is required to break the "curse of dimensionality". If existing knowledge of the engineering problem can be incorporated into modeling and optimization, the number of sample points necessary to capture the behavior of the underlying function and the design space could be reduced. Additionally, by analyzing existing knowledge about an engineering design problem, hidden valuable information can be extracted that helps perform optimization more efficiently. For instance, if one finds that the objective function is monotonic with respect to some design variables, the values of those variables can be determined without optimization and the dimensionality of the problem can be reduced. If one knows that an input-output relationship follows a certain trend, this helps the selection of the most suitable metamodel and reduces the cost of model construction. Therefore, how to systematically incorporate different kinds of knowledge into optimization, rather than through ad hoc and problem-specific treatment, becomes an interesting research topic. This issue is especially relevant for large-scale design problems in order to break the "curse of dimensionality."
1.2. Objectives of the research
The main objective of this thesis is to develop methodologies that employ knowledge to
assist in solving large-scale engineering optimization problems. One of the methods to
break the “curse of dimensionality” is to reduce the dimensionality of the problem. Thus,
the first objective is to develop dimension reduction strategies to solve large-scale
problems. In this thesis, sensitivity information is employed to identify important variables, which are then used to construct a partial metamodel of reduced dimensionality.
To avoid losing key information through omitted variables or errors in the sensitivity analysis, the dimensionality of the partial metamodel grows gradually during the optimization process according to the sensitivity information, meaning more and more design variables are involved in the optimization to reach better optimal solutions.
Another kind of knowledge will be employed to reduce the dimensionality is the causal
relations in engineering problems. The variables without contradiction are identified
before optimization. Such variables are monotonic with respect to the objective function,
which means the optimal value for those variables can be determined without
participating in the optimization. Thus, the number of design variables in the optimization
can be reduced.
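The incremental, sensitivity-driven construction described above can be sketched as follows. This is a minimal illustration on a toy objective; the one-at-a-time finite-difference ranking and the random-search optimizer are simple stand-ins, not the actual partial-metamodel algorithm developed in Chapter 3.

```python
import random

def objective(x):
    # Toy 6-D function: only the first two variables matter strongly.
    return (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 0.5) ** 2 + 0.01 * sum(v * v for v in x[2:])

def sensitivity_ranking(f, x0, h=1e-3):
    """Crude one-at-a-time sensitivity estimate: |f(x0 + h*e_i) - f(x0)| / h."""
    base = f(x0)
    scores = []
    for i in range(len(x0)):
        x = list(x0)
        x[i] += h
        scores.append(abs(f(x) - base) / h)
    return sorted(range(len(x0)), key=lambda i: -scores[i])

def optimize_partial(f, x0, active, iters=2000, seed=0):
    """Random search over the active variables only; the rest stay fixed."""
    rng = random.Random(seed)
    best, best_f = list(x0), f(x0)
    for _ in range(iters):
        cand = list(best)
        for i in active:
            cand[i] += rng.gauss(0.0, 0.1)
        fc = f(cand)
        if fc < best_f:
            best, best_f = cand, fc
    return best, best_f

x0 = [0.0] * 6
ranking = sensitivity_ranking(objective, x0)
# Grow the partial model: optimize over the top variable first, then the top two.
x, fx = list(x0), objective(x0)
for k in (1, 2):
    x, fx = optimize_partial(objective, x, ranking[:k])
```

Each pass reuses the previous optimum as its starting point, so variables added later refine the search rather than restart it.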
Another challenge in dealing with large-scale problems is constructing an accurate
metamodel with scarce samples. Therefore, the second objective of this thesis is to
improve the accuracy of the metamodel by employing knowledge to break the limitation
of the black-box assumption. Knowledge of engineering problems, such as causal
relations of variables in the problem, mathematical equations, and values of the
intermediate variables can be applied in metamodeling to improve the model accuracy.
The attractive design area of the problem can be detected by employing the proposed
metamodel and Bayesian theory.
These methodologies are to be applied to an airfoil design problem, a power converter
design problem, a conceptual aircraft design problem, and an energy consumption
prediction problem sponsored by a local company.
1.3. Structure of Dissertation
To develop the knowledge-assisted metamodeling and optimization strategy, the thesis
can be divided into five parts, including 1) literature review and related theory, 2) the
dimension reduction method based on sensitivity information, 3) the causal relation-
based dimension reduction method, 4) the causal relation-based metamodeling method
and its application, and 5) the applications in the energy consumption prediction model.
The thesis is organized into seven chapters:
Chapter 2 reviews the concept of knowledge and existing applications of knowledge in
optimization. Potential applications of knowledge at different stages of optimization are
discussed and the knowledge employed in this thesis is identified. Next, related
algorithms and theories employed in this thesis are introduced.
Chapter 3 presents a partial-metamodel-based optimization method built on sensitivity
information. The partial metamodel is constructed over the important variables and
updated at each iteration by considering more variables. Then, the proposed
optimization method is described and tested in both numerical problems and an airfoil
design problem.
Chapter 4 develops a causal relation-based dimension reduction method. Causal
relations of a problem are employed to identify variables without contradiction. Details of
the dimension reduction method are introduced with a numerical example. The proposed
method is applied to two engineering optimization problems to test its efficiency.
Chapter 5 proposes the causal-Artificial Neural Network (causal-ANN) method and its
application in detecting attractive design areas. Different types of causal-ANN are
constructed based on the involved knowledge. The attractive design area detection
method is developed based on the causal-ANN and Bayesian theory. Finally, the causal-
ANN is employed to construct the metamodel for two engineering problems and the
attractive areas are also detected by the proposed method.
Chapter 6 applies the causal-ANN in an energy consumption forecasting model to predict
the market shares of different end-use technologies, a project funded by a local
company. In total, 304 market-share prediction models are constructed with only 12
years of historical data available for training each model. The total number of end-use
technologies involved is 1,488. The accuracy and efficiency of the causal-ANN are
compared with those of the original logit model.
Chapter 7 summarizes the work done in this thesis and makes suggestions for future work.
Chapter 2. Literature review
Before proposing knowledge-assisted metamodeling and optimization methods, the
concept of knowledge is reviewed first. Then, existing applications of knowledge are
summarized according to different types of knowledge. Next, potential applications of
knowledge in optimization are discussed. Additionally, related algorithms and theories
employed in proposed methods in later chapters are introduced in this chapter, including
the Radial Basis Function-High Dimensional Model Representation (RBF-HDMR) model,
ANN, and Bayesian network.
2.1. Concept of knowledge
Knowledge is defined as familiarity, awareness or understanding of someone or
something [12]. The word “knowledge” is widely used in the AI field and the definition of
knowledge used in the engineering field also comes from AI.
To obtain knowledge from problems, different AI methods have been applied in
optimization. These applications can be classified into two categories: knowledge from
graphs and documents, and knowledge from data. The expert system, which belongs to the
first category, has been used in design problems for decision making [13], [14].
However, there are few applications using expert systems directly to assist
optimization. For knowledge from data, multiple data mining methods [15], [16] and
classification methods [17]–[19] have been applied in optimization problem formulation
and in optimization strategies for generating new samples.
In this section, the concept of knowledge in AI is reviewed first to give a clear description
of knowledge representation and capture. Then, to define what kinds of knowledge can
be obtained from and applied in the engineering world, the knowledge concept in
product design is also surveyed.
2.1.1. Knowledge in Artificial Intelligence
AI is currently one of the most popular research fields around the world. AI is defined as
the study of intelligent agents: any device that perceives its environment and takes
actions that maximize its chance of success at some goal [20]. In other words, AI is a
set of techniques that help machines deal with different problems in an intelligent
manner. There are two main problems in AI: learning and problem solving [20].
Knowledge is involved in both. In learning, knowledge must be captured and
represented in a form that machines can understand. In problem solving, knowledge
must be reused to solve the problem at hand.
Figure 2-1: Knowledge representation methods (general problem solver, expert system, semantic nets).
Knowledge representation is central to AI research, which focuses on designing
computer representations that capture information about the world to solve complex
tasks [21], [22]. The earliest knowledge representation work focused on the general
problem solver (GPS) [23], which was intended to be a universal problem-solving machine.
Although the development of GPS was not successful, due to its restrictive problem
definition format, GPS was the first attempt to regard knowledge as an input for solving
problems. Following the idea of GPS, expert systems were developed to represent human
knowledge.
Expert systems could match human competence on a specific task [24]–[26]. Two
techniques developed at that time and still used today are the rule-based knowledge
representation [27] and frame-based knowledge representation [28]. Rule-based
systems are widely used in domains such as automatic control [29], [30], decision
support [31], [32], and system diagnosis [26]. The frame-based method is used in
systems geared toward human interaction, for choosing appropriate responses to varied
situations. The frame-based knowledge representation focuses on the structure of
concepts, while the rule-based knowledge base focuses on logical choices. To combine
the properties of the two, one of the best-known integrated frame-and-rule systems, the
Knowledge Engineering Environment (KEE), was developed in 1983 [33]; it contained a
complete rule engine with forward and backward chaining and a complete frame-based
knowledge base with triggers, slots, inheritance, and message passing. The expert
system is a useful knowledge representation tool; by employing it, users can make
reasonable decisions. However, an expert system is defined by expert experience, and
its effectiveness highly depends on the accuracy of its contents. An incorrect or
outdated expert system may thus lead to wrong decisions. Therefore, how to define an
appropriate and evolving expert system remains the main challenge.
Currently, one of the most active areas of knowledge representation research is
semantic nets [34], [35], networks that represent semantic relations between concepts.
Different from neural networks, semantic nets are made up of concepts and the semantic
relations between them. A related concept is ontology [36], [37]. In philosophy,
ontology is the study of the nature of being, becoming, existence, or reality, as well
as the basic categories of being and their relations. In computer science, an ontology
is a formal naming and definition system for the properties and interrelationships of
the entities that fundamentally exist in a particular domain [38]. The main benefit of
ontology is that it can describe not only different concepts in the domain but also the
relationships that hold between them. Another property is that, by employing ontology
mapping [39] and ontology merging [40], similar ontologies can be integrated to include
more information, especially relationships between different concepts. The semantic
net is one way to create ontologies.
In the definition of the general problem solver, knowledge is defined as information
about the real world, and a problem is solved by employing a knowledge representation
method. In AI and computer science, however, knowledge is represented by language or
knowledge graphs, which cannot be directly and automatically used in engineering
design. Compared to language-represented knowledge, input-output relational data are
more applicable to engineering design. Currently popular machine learning methods,
which are based purely on data, help to find interrelations in complex systems. How to
combine linguistic and graphical knowledge with the knowledge embedded in data is the
main question to be addressed in future research.
2.1.2. Knowledge in product design
Although knowledge has been used in product design for a long time, its definition is
borrowed from AI. In product design, knowledge is understood as information that is not
directly available but is obtained from the analysis of data. In other references,
knowledge is also described as the experience, concepts, values, beliefs, and ways of
working that can be shared and communicated [41]. Sunnersjo [42] argued that knowledge
should include not only the rules that the designer should adhere to, but also the
background knowledge that makes the design rules possible to review and understand. In
summary, the definition of knowledge in product design varies, but one consensus is
that knowledge needs to be captured and represented in an appropriate way.
In engineering design, knowledge is often used in the concept design phase to help
designers come up with better designs [43]. Knowledge used in design can be classified
into two categories: formal knowledge and tacit knowledge. Formal knowledge is
embedded in product documents, repositories, product function and structure
descriptions, problem-solving routines, technical and management systems, computer
algorithms, and so on [44]. Knowledge tied to experience, intuition, unarticulated
models, or implicit rules of thumb is regarded as tacit knowledge [45]. Formal
knowledge is easier to capture and represent than tacit knowledge. Tacit knowledge,
generally gained over a long period of learning and experience, is rather difficult to
express. One reason is that there is no common recording method for capturing the
knowledge in people's minds. Another is that such knowledge can only be transferred by
people who are willing and able to articulate it. One main research direction for
knowledge in product design is how to capture and represent tacit knowledge. Both
formal and tacit knowledge should be represented in a form that is easy to
understand [46].
Knowledge representation methods can be classified into five categories [44], as shown
in Table 2-1: pictorial, symbolic, linguistic, virtual, and algorithmic approaches.
Pictorial representation presents knowledge as pictures or graphs, including sketches,
detailed drawings, and photographs. The symbolic method represents knowledge by
drawing a chart or a network; decision tables, flow charts, assembly trees, and
ontologies are all symbolic representation methods. Rule-based and frame-based expert
systems can
be regarded as symbolic knowledge. The linguistic representation uses document files
including customer requirements, design rules, constraints, and so on. CAD models,
CAE simulations, and virtual reality simulations are examples of virtual representation
methods. Finally, the algorithmic methods contain the procedural or methodical
knowledge used in modeling, analysis, and optimization. The information obtained from
AI methods such as data mining methods or machine learning methods can also be
classified into algorithmic knowledge.
Table 2-1: Classification of knowledge representation [43].
Representation approach  Examples
Pictorial                Sketches; detailed drawings; photographs
Symbolic                 Decision tables; flow charts; assembly trees; semantic nets; expert systems
Linguistic               Customer requirements; design rules; constraint analogies
Virtual                  CAD models; CAE simulations; virtual reality simulations
Algorithmic              Mathematical equations; computer algorithms; optimization algorithms; data mining methods; machine learning methods
Different knowledge is used at different stages of product design [36-37]. To start a
design, user requirements are needed in the requirement modeling period; the house of
quality, which belongs to the linguistic category of knowledge representation, is often
used to summarize the necessary requirements. In the functional modeling stage,
decision trees can be used to determine the functions of the product and how to realize
them. Then, linguistic methods such as design principles are used to generate concepts
whose behaviors are modeled based on the functions of the product. Many different
ideas are generated in the concept design period, and a rich, well-structured
knowledge representation system is needed to support this abundance of concepts and
ideas [47]. Ontology is an appropriate method for organizing ideas in this period. Ontology,
which provides a highly structured domain covering processes, objects, and attributes,
has the ability to integrate and migrate valuable unstructured information and
knowledge to provide a complex domain with rich conceptualization [48], [49]. The
semantic net is a tool to capture and represent an ontology as a graph with nodes and
arcs [50]. In the first three stages, i.e., the requirement modeling, functional
modeling, and concept design periods, linguistic and pictorial knowledge play the main
role. The next stage is embodiment design, where symbolic, algorithmic, and pictorial
methods are heavily involved; information on the product architecture and materials,
together with mathematical equations, is applied in this step. Next comes detailed
design, where virtual knowledge, including CAD models, CAE, and virtual reality, is
used to generate 3D models of the design. Then, more accurate simulation models are
generated and optimization is employed to refine the details of the product.
Different kinds of knowledge can thus be utilized for engineering design. The issue is
that traditional knowledge is often represented by documents or graphs; how to use this
knowledge appropriately in formulating an engineering design and optimization problem
is the main task. An engineering simulation model is one attractive type of virtual
knowledge that can help in design. Such a model gives input-output relations, from
which one can dig out more hidden information, such as the monotonic influence of
certain inputs on the output. Besides, approximate models can be constructed based on
simulation models.
To combine rule-based and frame-based expert systems with engineering design,
knowledge-based engineering (KBE) systems were developed [51], [52]. In a KBE system,
rule-based and frame-based knowledge can be captured, represented, and reused with
computer-aided design (CAD) and simulation tools to reduce the time and cost of product
development. References [52] and [53] stated that KBE was likely the best technology at
hand to deal with rule-driven, multidisciplinary, and repetitive geometry manipulation
problems. In [52]–[54], a multi-model generator was created using KBE to develop a
distributed design framework supporting aircraft multidisciplinary design optimization.
A specific family of aircraft was generated automatically through the KBE system [55].
In each model, discipline abstractions are obtained and used as inputs to simulation
tools to evaluate the performance of an aircraft. One disadvantage of the KBE system is
that it can only deal with revisions of existing designs; in other words, before using
KBE to design a product, similar products and their
design details are required. Another shortcoming lies in the expert system used within
KBE. One issue is how to validate the accuracy of the rules and classes in the
knowledge base; another is that modeling the knowledge domain is a burden for
developers. Additionally, the KBE system involves only the expert system in dealing
with design problems, which represents just one type of knowledge applied in design. To
better assist the design process, different kinds of knowledge need to be involved.
Thus, the KBE system needs to be enhanced to include other kinds of knowledge when it
is used for optimization.
2.1.3. Summary remarks
Knowledge has been employed in problem solving and engineering design for decades.
Knowledge is captured from different resources, including documents, human experience,
previous designs, and so on, and it is represented in a structured way for further use.
The knowledge-based system was first developed in the AI field, and the expert system
is one of its most common applications. By employing knowledge, the engineering design
process can be executed with little or no human intervention. However, a design
generated through frames and rules in an expert system is only a feasible design, not
an optimal one; to reach the best design, optimization needs to be performed on the
design obtained from KBE. Another issue with current knowledge bases is their focus on
knowledge represented by language. In engineering, however, knowledge is not only
represented by language but can also be data obtained from engineering analyses. In
addition, knowledge represented in an expert system can be used to help define the
optimization problem and guide the optimization process. Nevertheless, the fundamental
elements of optimization are still data, i.e., numbers. Therefore, how to mine
knowledge from data and how to utilize such knowledge in optimization are two research
directions for knowledge-assisted optimization methodology. Moreover, how to combine
linguistic knowledge, such as design rules and customer requirements, with data is
another area of interest.
2.2. Existing Applications of knowledge in design optimization
Large-scale design optimization problems are difficult to solve. Several techniques can
be used to tackle them, including dimension reduction, decomposition, metamodeling, and
optimization strategies [1]. Although knowledge is not formally incorporated in
optimization methods, some techniques do employ knowledge to deal with large-scale
optimization problems. Table 2-2 summarizes existing optimization methods involving
knowledge. Note that pictorial knowledge, which is usually used at the beginning of
concept design, includes only rough information about the design problem and is
therefore rarely applied in optimization. Of the other four kinds of knowledge,
symbolic and algorithmic knowledge are widely used in solution methods for
high-dimensional optimization problems. The details are reviewed in the following
sections.
Table 2-2: Existing applications of knowledge in optimization.
Knowledge type  Dimension reduction  Decomposition  Metamodeling  Optimization strategy
Pictorial       –                    –              –             –
Symbolic        ◎                    ◎              ◎             –
Linguistic      –                    –              ◎             –
Virtual         –                    –              –             ◎
Algorithmic     ◎                    ◎              ◎             ◎
2.2.1. Symbolic knowledge
Symbolic knowledge is knowledge represented through graphs and symbols, and it is
widely employed in optimization methods. To reduce the dimensionality of an
optimization problem, a causal graph can be employed to identify and remove certain
design variables. A causal graph is an oriented graph showing the causal relations
between variables. By analyzing the causal relationships between the design variables
and the objective, variables that influence the objective monotonically are
identified. The optimal values of these variables can then be determined without
optimization, which means the number of design variables can be reduced. To further
decompose the problem, sensitivity values are applied to simplify the causal graph and
decompose the original problem into several sub-problems with fewer design variables.
This method was applied to an aircraft concept design problem and a power converter
design with significantly improved optimization efficiency [57]. Its shortcoming is
that if no monotonic variables can be found in the problem, or the variable ranges are
not carefully chosen to ensure monotonicity, the method will fail or be ineffective.
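The variable-elimination step this causal-graph analysis enables is simple to state in code. The sketch below is illustrative only; the bounds and the monotonicity map stand in for knowledge that, in the thesis, comes from analyzing the causal graph.

```python
def fix_monotone_variables(bounds, monotone):
    """For a minimization problem, fix each variable with a known monotonic
    influence on the objective at a bound: the lower bound if the objective
    increases with it ('+'), the upper bound if it decreases ('-')."""
    fixed, free = {}, []
    for i, (lo, hi) in enumerate(bounds):
        if monotone.get(i) == '+':
            fixed[i] = lo
        elif monotone.get(i) == '-':
            fixed[i] = hi
        else:
            free.append(i)
    return fixed, free

# Hypothetical 4-variable problem: suppose the causal graph shows the
# objective increases with x0 and decreases with x3.
bounds = [(0, 10), (-5, 5), (0, 1), (2, 8)]
fixed, free = fix_monotone_variables(bounds, {0: '+', 3: '-'})
# fixed == {0: 0, 3: 8}; only x1 and x2 remain as design variables.
```

If no monotonic variables exist, `monotone` is empty and no reduction occurs, which mirrors the failure mode noted above.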
One kind of symbolic knowledge, the design structure matrix (DSM), is usually used to
show the interdependence of disciplines in decomposition strategies. A DSM is a square
matrix with identical row and column listings representing a single set of objects.
Its key advantage is that it gives designers a complete view of the coupling structure
within a system [58]. By analyzing the DSM, decomposition can be performed and a
multidisciplinary design optimization architecture can be constructed. Moreover,
different DSM analysis methods have been developed to simplify optimization problems.
By performing graph partitioning [59]–
[61], clustering analysis [62] and optimization [63] on DSM, complex problems can be
decomposed into sub-problems. Then, different decomposition strategies, including
Concurrent SubSpace Optimization (CSSO) [64], Collaborative Optimization (CO) [65],
and Bi-Level Integrated System Synthesis (BLISS) [66] have been developed according
to the relations represented in the DSM. The main disadvantage of those decomposition
strategies is the large number of function evaluations needed when dealing with high-
dimensional optimization problems. In [67], CO and CSSO were tested with several
numerical benchmarks and the results show that even for low-dimensional problems, CO
and CSSO need thousands of discipline function calls. BLISS was used to solve an
aircraft concept design problem. For different variations of the BLISS method, although
the number of system analyses was reduced to around 10, the total number of discipline
calls was around 400, and BLISS/RS2 in particular required more than 1,000 discipline
calls [66].
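The decomposition role of the DSM can be illustrated with a toy example: treating a binary DSM as the adjacency matrix of a coupling graph and extracting its connected components yields independent sub-problems. This is only a sketch; the graph partitioning and clustering methods cited above can also split components that are weakly, rather than not at all, coupled.

```python
def dsm_subproblems(dsm):
    """Partition a binary design structure matrix into independent
    sub-problems via connected components of the coupling graph."""
    n = len(dsm)
    seen, groups = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                # A coupling in either direction links the disciplines.
                if j not in seen and (dsm[i][j] or dsm[j][i]):
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(comp))
    return groups

# Illustrative 5-discipline DSM: disciplines 0-1 are coupled, as are 2-4.
dsm = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
# dsm_subproblems(dsm) -> [[0, 1], [2, 3, 4]]
```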
In metamodeling methods, symbolic knowledge was used to determine the structure of
the approximation model. In [68], the intermediate variables in a Bayesian network were
used as hidden nodes to construct an artificial neural network (ANN) in a traffic accident
prediction. However, the Bayesian network was only used to represent the input-output
relations between variables; the mathematical relations could not be captured from the
Bayesian network.
In summary, symbolic knowledge usually assists at the beginning stage of optimization.
By employing symbolic knowledge, properties of the problem can be identified to reduce
the difficulty of high-dimensional optimization problems, either by reducing the
dimensionality or by constructing a more accurate metamodel.
2.2.2. Linguistic Knowledge
Linguistic knowledge is information represented by documents. This kind of knowledge is
difficult to incorporate into optimization since optimization methods usually focus on
trends in the data. One way to apply linguistic knowledge in optimization is in
selecting suitable approximation methods according to the properties of the problem.
For example, response surface methods (RSM) of different orders can be chosen according
to the problem, and different metamodels suit different problems. A common conclusion
regarding traditional metamodeling methods is that the Kriging method performs better
for low-dimensional problems while the radial basis function (RBF) outperforms others
for high-dimensional problems [69]. Thus, considering the properties of metamodeling
methods and the features of the problem, a suitable metamodeling method can be selected
for a given problem. However, metamodel selection is still based on expert experience,
which may lead the selection in a wrong direction. Moreover, other conditions also
influence the selection of metamodeling methods and need to be considered when
constructing a proper metamodel.
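To make the RBF option above concrete, the sketch below fits a bare-bones Gaussian RBF interpolator to a handful of samples of a cheap stand-in for an expensive simulation. The shape parameter `c` and the naive linear solver are illustrative choices; practical metamodeling would rely on a library solver and tuned kernel widths.

```python
import math

def rbf_fit(xs, ys, c=1.0):
    """Interpolating Gaussian RBF metamodel in 1-D: solve Phi * w = y."""
    n = len(xs)
    # Augmented system [Phi | y].
    A = [[math.exp(-c * (xs[i] - xs[j]) ** 2) for j in range(n)] + [ys[i]]
         for i in range(n)]
    # Naive Gaussian elimination with partial pivoting (fine for tiny n).
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            fct = A[r][k] / A[k][k]
            for col in range(k, n + 1):
                A[r][col] -= fct * A[k][col]
    w = [0.0] * n
    for k in range(n - 1, -1, -1):
        w[k] = (A[k][n] - sum(A[k][j] * w[j] for j in range(k + 1, n))) / A[k][k]
    return lambda x: sum(w[j] * math.exp(-c * (x - xs[j]) ** 2) for j in range(n))

# Five samples of a stand-in "simulation"; the metamodel reproduces them
# exactly and approximates the function in between.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
model = rbf_fit(xs, ys)
```

Because the model interpolates, it passes through every sample; its quality between samples is what metamodel selection is really about.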
Besides guiding the selection of the metamodeling method, properties of the problem can
be used in selecting operators in optimization methods such as the genetic algorithm
(GA). Reference [70] suggested using domain knowledge in three stages of GA: initial
population generation, genotype encoding, and the genetic operators of crossover and
mutation. In [71], knowledge of trusses was used to guide the initial sampling in the
GA. Hu and Yang [72] used problem-specific knowledge in GA to solve a path planning
problem. Piroozfard et al. [73] employed knowledge-based operators to solve job shop
scheduling problems. In general, a specific property of the problem is
applied to generate custom operators for the problem. However, such ad hoc
approaches cannot be extended to solve other problems.
2.2.3. Virtual knowledge
Virtual knowledge, such as CAD, CAE, and virtual reality models, allows users to get
insight into problems, find key trends and relationships among variables in a problem,
and make decisions by interacting with the data. A Visual Design Steering (VDS) method
[74], [75] was developed as an aid in multidisciplinary design optimization, which can
help a designer to make decisions before, during, or after analysis or optimization via a
visual environment to effectively steer the solution process. Virtual knowledge is
helpful when little is known about the data and the exploration goals are implicit,
since users can directly participate in the exploration process and shift or adjust the
goals as necessary. However, there is a lack of direct translation of such knowledge
into formulations of optimization problems.
2.2.4. Algorithmic knowledge
Algorithmic knowledge is the most popular kind of knowledge used in optimization since
it has the closest relation to data. As mentioned in Section 2.1, equations, simulation
models of the problems, and information obtained from machine learning algorithms can
all be categorized as algorithmic knowledge.
Equations, which exist widely in different optimization problems, can be used at
different stages of the optimization process. Note that the equations may not be
accurate models of the problems, but the mathematical relations they provide can still
help in dealing with optimization problems. Empirical equations of lower fidelity can
be employed in multi-fidelity models to reduce the number of function evaluations of
the expensive simulation models; the co-Kriging method can be employed to generate
metamodels based on multi-fidelity models [76]. In [77], empirical equations were used
to construct a knowledge layer in an ANN to deal with a microwave design problem.
Physical theories, empirical data, and historical data can be treated as white-box
models and involved in constructing a grey-box metamodel [78]. The residual between the
white-box prediction and the simulation data is estimated by a metamodel. The grey-box
method was applied to prediction in two manufacturing problems
and the results show that the metamodel is sufficiently accurate with a small number of
sample points. Equations can make optimization easier, but their accuracy has a large
impact on the optimization results: when the equations are not reliable, the accuracy
of the metamodels will be poor.
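The grey-box idea, a white-box prediction plus a metamodeled residual, can be sketched as follows. The nearest-neighbour residual lookup is the crudest possible stand-in for the residual metamodel of [78], and all functions and values here are hypothetical.

```python
def grey_box(white_box, xs, ys):
    """Grey-box model: white-box prediction plus a residual correction
    learned from the gap between the white box and observed data."""
    residuals = [(x, y - white_box(x)) for x, y in zip(xs, ys)]
    def model(x):
        # Nearest-neighbour residual: the simplest residual "metamodel".
        _, r = min(residuals, key=lambda p: abs(p[0] - x))
        return white_box(x) + r
    return model

# Hypothetical setup: a low-fidelity equation under-predicts the "simulation".
simulate = lambda x: 1.2 * x ** 2 + 0.3   # expensive truth (pretend)
white = lambda x: x ** 2                  # known approximate equation
xs = [0.0, 1.0, 2.0, 3.0]
model = grey_box(white, xs, [simulate(x) for x in xs])
```

At the sample points the grey-box model reproduces the simulation; between them, it inherits the trend of the white box corrected by the nearest observed residual, which is why an unreliable white box degrades the whole metamodel.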
Data from historic designs can also be employed in optimization. Kurek et al. [79]
developed a novel approach for the automatic optimization of reconfigurable design
parameters based on knowledge transfer. Solutions and history data of related previous
designs are treated as a priori knowledge and transferred to the new design and
optimization. Their auto-transfer algorithm, built on Bayesian optimization [80],
determines which design is transferred, when, and how. The efficiency improvement of
the optimization method based on the knowledge transfer algorithm was significant.
Recently, machine learning methods have been increasingly employed to assist
optimization. Screening and mapping methods are employed to reduce the dimensionality
of the problem [81], [82], but information is lost in either screening or mapping, and
the influence of this lost information on the optimization results is difficult to
quantify. If key information is lost through screening or mapping, the optimization
will fail. Classification methods have also been employed in optimization to help with
sampling. A classifier-guided sampling method was developed to generate samples toward
the area with a high probability of yielding preferred performance [17]. Instead of
sampling randomly, samples are generated based on the information obtained from the
classification results; compared with traditional optimization methods such as GA, the
rate of convergence is improved significantly. In many cases, users tend to specify an
excessive number of, and often redundant, constraints. Methods have been developed to
find redundant constraints in mathematical problems [83], and Cutbill and Wang [16]
introduced a novel method based on association analysis to detect redundant black-box
constraints. These are methods for finding redundant constraints through data.
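The classifier-guided sampling loop can be sketched in one dimension. The median split and the 1-nearest-neighbour "classifier" below are toy stand-ins for the classifier used in [17]; all names and parameters are illustrative.

```python
import random

def classifier_guided_samples(f, xs, n_new, bounds, seed=0):
    """Label existing samples good/bad against the median objective value,
    then keep only random candidates whose nearest labelled neighbour is
    'good' (a 1-NN classifier guiding the sampling)."""
    rng = random.Random(seed)
    med = sorted(f(x) for x in xs)[len(xs) // 2]
    labels = [(x, f(x) <= med) for x in xs]
    lo, hi = bounds
    new = []
    while len(new) < n_new:
        c = rng.uniform(lo, hi)
        _, good = min(labels, key=lambda p: abs(p[0] - c))
        if good:
            new.append(c)  # candidate falls in a promising region
    return new

f = lambda x: (x - 2.0) ** 2                      # toy objective, optimum at 2
rng = random.Random(1)
xs = [rng.uniform(0.0, 5.0) for _ in range(20)]   # initial random samples
new = classifier_guided_samples(f, xs, 10, (0.0, 5.0))
```

Candidates classified as "bad" are discarded before any expensive evaluation, which is the source of the convergence speed-up reported for the method.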
2.2.5. Summary remarks
Knowledge has been used in solving optimization problems, although the concept of
knowledge is not widely applied in the optimization field. In current optimization methods,
algorithmic knowledge is still the most used type of knowledge. Symbolic knowledge,
such as causal graphs and the DSM, is also employed in optimization methods. However,
limitations remain: employing specific knowledge may improve the effectiveness and
efficiency of optimization for one problem, yet may not suit other kinds of problems.
The issue with current knowledge-assisted optimization methods is that there is no
systematic way to employ different kinds of knowledge together on one problem. Besides,
by their nature, linguistic and virtual knowledge are difficult to incorporate into
optimization. Therefore, how to combine linguistic/virtual knowledge with data
information is one of the research directions for knowledge-assisted optimization
methods.
2.3. Potential applications of knowledge
The potential applications of knowledge to assist optimization are summarized in this section. The four techniques for dealing with high-dimensional optimization problems, combined with the problem formulation process, can be treated as four stages of the optimization process. At the beginning, the optimization problem is formulated according to the design requirements. Then, dimension reduction methods can be applied to reduce the number of design variables, while decomposition methods can be employed to split the problem into several sub-problems. If the simulation model is expensive, metamodeling can be used to reduce the computational cost. Finally, different optimization algorithms and strategies can be employed to find the optimal solutions. However, each stage has limitations when the problem is treated as a black box. Here, knowledge includes linguistic knowledge, pictorial/symbolic knowledge, and data knowledge. As shown in Table 2-3, possible applications of knowledge to support different aspects of optimization are listed against the challenges of high-dimensional problems. In each sub-section below, the challenges of each stage are introduced first, followed by the potential applications of knowledge.
Table 2-3: Potential applications of knowledge in different stages of optimization.

| Optimization stages | Challenges | Knowledge |
| --- | --- | --- |
| Problem formulation | Constraints definition; determining feasible area | Design rules, custom requirements, decision trees, ontologies |
| Dimension reduction | Determining omittable variables | Flow charts, causal graph, design rules, equations |
| Decomposition | Relations between disciplines; correlations between variables | Flow charts, causal graph, Bayesian graph, equations |
| Metamodeling | Selecting metamodeling method; accuracy of the metamodel | Equations, historical data, historical design, causal graph, Bayesian graph |
| Optimization strategy | How to generate new samples | Equations, Bayesian graph, flow chart |
2.3.1. Problem formulation
An optimization problem has three elements: design variables, objectives, and constraints. The number of design variables, the number of constraints, the strictness of the constraints, and other aspects of the problem definition all influence the efficiency and effectiveness of optimization. One of the challenges in problem formulation is constraint specification. The number of constraints influences optimization efficiency: a large number of constraints increases the computational cost. The strictness of the constraints is another issue. A very strict set of constraints may cause the optimization to fail because a feasible solution is hard to find, while a loose set of constraints may lead to a design that fails in the real world. Another task related to problem formulation is detecting the feasible area. If the feasible area can be determined, it becomes much easier for optimization algorithms to find the optimum. To deal with these two challenges, different kinds of knowledge can be applied, including linguistic and symbolic knowledge. Moreover, expert systems and machine learning methods can also be employed.
Symbolic knowledge can be used in constraint specification to avoid redundant constraints, and the expert system can be a useful tool for problem formulation. KBE systems are widely used in engineering design to represent rules and requirements in a structured way [51], through which a more complete and accurate set of constraints can be defined for different design scenarios [84]. By employing the structured representation of rules and frames, the constraints and the relations between them can be obtained, and the definition of the problem can be generated through the expert system.
For constrained optimization, data-based methods have been developed to find the feasible areas of black-box constrained optimization problems [85]–[87]. The expert system can generate feasible designs under different rules, and it can also be used to detect the feasible area of a design problem [51], [53], [88], [89]. In addition, the expert system can distinguish the constraints that must not be violated from those that can be mildly violated.
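As a minimal illustration of how such rules might be encoded, the sketch below classifies constraints as hard (must not be violated) or soft (may be mildly violated) through simple if-then rules; the rule set, constraint names, and metadata fields are all invented for illustration.

```python
# A minimal rule-based sketch of how an expert system might classify
# constraints as hard (must hold) or soft (may be mildly violated).
# The rules, constraint names, and 'type' field are purely illustrative.

RULES = [
    # (predicate on constraint metadata, classification)
    (lambda c: c["type"] == "safety", "hard"),
    (lambda c: c["type"] == "regulatory", "hard"),
    (lambda c: c["type"] == "preference", "soft"),
]

def classify(constraint, default="soft"):
    # return the label of the first matching rule, else the default
    for predicate, label in RULES:
        if predicate(constraint):
            return label
    return default

constraints = [
    {"name": "max_stress", "type": "safety"},
    {"name": "styling_margin", "type": "preference"},
]
labels = {c["name"]: classify(c) for c in constraints}
# labels == {"max_stress": "hard", "styling_margin": "soft"}
```

A real KBE system would replace the predicates with structured frames and richer rule chaining, but the classification output plays the same role: it tells the optimizer which constraints can be relaxed.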
Ontology is a knowledge representation method that captures not only individual concepts but also the relations between them. Semantic nets are often used to represent an ontology. If one treats the design variables, constraints, and objectives as nodes and generates semantic nets among them, designers may gain a clearer and deeper understanding of the problem, and the formulation of the optimization problem may be more targeted. Similar to expert systems, ontology can help in making judgements. In [90] and [91], ontology was used to represent requirements in engineering design. The relationships between concepts in an ontology can give clearer insight into the design problem at the formulation stage; for example, similar requirements can be detected by analyzing the ontology, and the constraints of the optimization problem can then be defined more appropriately. Additionally, ontology can be used to validate system requirements early [92]. Thus, using ontology to guide problem formulation is a future research direction.
2.3.2. Dimension reduction
The dimensionality of an optimization problem often determines the computational cost, especially for metamodel-based optimization methods. Dimension reduction is a common way to improve optimization efficiency. There are two kinds of methods: screening, which selects the important variables, and mapping, which maps the high-dimensional data to a low-dimensional space. However, determining the omittable variables in screening and determining the dimensionality of the lower-dimensional space in mapping remain two challenges for dimension reduction.
As mentioned in Section 2.2, the dimensionality of the optimization problem can be reduced by analyzing the causal graph of the problem [93]. Some design variables are removed from the variable set due to their monotonic influences on the objective. Such monotonic influences can also be obtained from other knowledge, such as equations or design rules.
In screening methods, sensitivity analysis is performed to determine the importance of
variables. The screening process can also be performed based on rules and frames in
an expert system. In this case, sensitivity analysis can be employed as a validation
method by checking the screening results from the expert system.
In traditional mapping methods, the dimensionality of the mapped low-dimensional space is always a question. Usually, the dimensionality is determined by the user and fixed at an arbitrary small number. Knowledge can be used to find an appropriate dimensionality: by analyzing relations between design variables, the dimensionality of the lower-dimensional space may be defined. In [94], a mapping method named generative topographic mapping was used to solve 30-dimensional airfoil design optimization problems, and different lower-dimensional spaces were tested. The optimized result in the two-dimensional lower space was found to be the best. One reason is that, for the airfoil design problem, the naive 30 NURBS variables may have a more sensible intrinsic dimension of two. If designers could identify through knowledge that the 2-D space is the best, the optimization results might be more accurate. The ontology knowledge base can also be applied to find latent variables by analyzing the relationships among design variables.
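The mapping idea can be made concrete with a simple linear stand-in for generative topographic mapping: principal component analysis via the singular value decomposition. The sketch below, on synthetic data whose true intrinsic dimension is two, shows that mapping to a 2-D space loses nothing when the latent structure really is two-dimensional; all data and names are invented for illustration.

```python
import numpy as np

def linear_map_to_lower_dim(X, k):
    """Map samples X (n x d) to a k-dimensional space via PCA, a simple
    linear stand-in for mappings such as generative topographic mapping.
    Returns the low-dimensional coordinates plus the basis and mean
    needed to map back to the original space."""
    mu = X.mean(axis=0)
    # right singular vectors of the centered data give the principal axes
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    basis = Vt[:k]                  # k x d projection basis
    Z = (X - mu) @ basis.T          # n x k low-dimensional coordinates
    return Z, basis, mu

# synthetic data that truly lives on a 2-D plane inside a 30-D space
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
A = rng.normal(size=(2, 30))
X = latent @ A
Z, basis, mu = linear_map_to_lower_dim(X, k=2)
X_back = Z @ basis + mu
# reconstruction from only 2 dimensions is (near-)exact here
```

When the chosen `k` is smaller than the true intrinsic dimension, the reconstruction error grows, which is one data-driven signal for picking the dimensionality that knowledge could otherwise supply.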
Various dimension reduction methods have been developed in the data mining field. Feature selection reduces dimensionality by removing irrelevant and redundant features [95]. Two categories of feature selection methods, filter methods and wrapper methods, have been developed [96]. In filter methods, variables are ranked according to different criteria, such as correlation criteria [97] and mutual information [98]. Wrapper methods use the prediction performance of different subsets of variables to reduce the dimensionality [99], [100]. Compared with wrapper methods, filter methods are computationally cheap but less accurate.
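A filter method can be sketched in a few lines: rank each variable by the absolute Pearson correlation of its column with the response, which requires no model fitting at all (hence the low cost). The data here is synthetic and for illustration only.

```python
import numpy as np

def filter_rank(X, y):
    """Rank input variables by the absolute Pearson correlation of each
    column of X with the response y (a simple filter criterion)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return np.argsort(-np.abs(corr))    # variable indices, most relevant first

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)   # only the third variable matters
ranking = filter_rank(X, y)
# ranking[0] == 2: the truly relevant variable is ranked first
```

A wrapper method would instead fit a predictive model on each candidate subset and rank subsets by validation accuracy, which is why it costs more but can capture interactions that a per-variable filter misses.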
2.3.3. Decomposition
There are two categories of commonly used decomposition methods: one based on the relations among disciplines and the other based on the correlations among variables. A multidisciplinary design optimization problem is usually decomposed according to the relations among disciplines. A common problem in decomposition is that there is no general method for generating the decomposition framework; in other words, a new framework needs to be constructed for every different problem. The expert system may offer a way to generate the framework for different design problems with little or no human intervention. For variable-based decomposition methods, the main challenge is detecting the correlations among variables at low computational cost.
As mentioned in Section 2.2.2, a problem can be decomposed according to variables rather than disciplines. A causal graph or Bayesian network constructed from the variables and their relationships can help locate the coupling in the problem. In [93], three coupled loops were reduced to one by breaking the discipline-based DSM into a variable-based DSM. Thus, graphic knowledge representation methods are capable of generating more efficient decomposition results. A group of variable-based decomposition methods is rooted in the high dimensional model representation (HDMR) method [80], [81]. High-dimensional problems were decomposed into several sub-problems based on the sensitivity information of different component functions in the HDMR model.

For engineering problems, correlations between different variables can be determined through documented (linguistic) knowledge or through analysis of graphic knowledge, such as an ontology knowledge base. Decomposition based on the HDMR model may then be performed according to the obtained knowledge.
2.3.4. Metamodeling
Metamodels are widely employed to replace expensive simulation models. Different metamodels have different properties, so selecting a suitable metamodeling method is one task. The accuracy of the metamodel is another issue when approximating high-dimensional problems. The basic idea for improving accuracy is to generate more samples in the space when treating the problem as a black box. However, even thousands of samples are sparse in a 100-dimensional space. Moreover, for some metamodeling methods such as RBF, adding more samples may lead to over-shooting. To overcome this problem, information beyond the black-box view of the problem should be considered.
The artificial neural network (ANN) is an effective metamodeling method for nonlinear problems [103]. Increasing the number of nodes and layers can improve the accuracy of the ANN model to a certain extent. However, in a fully connected ANN with many nodes, the number of weights to be estimated is very large, and often thousands of sample points are needed to obtain an accurate set of weights. Thus, how to reduce the number of nodes and links, or in other words, how to determine the structure of the neural network, is one of the issues for ANN approximation. Similar to an ANN, a causal graph or Bayesian network is also a structure of nodes and links. These graphic knowledge representation methods can be used to guide the generation of the ANN structure. Another potential improvement of ANN is to consider the values of intermediate variables. Even in a black-box function model, actual values of some intermediate variables can be obtained from the simulation, but such information is not considered in conventional metamodel construction. After employing a causal graph to determine the structure of an ANN, values of intermediate variables can be used to improve the approximation accuracy. In other words, some hidden layers in the ANN can be brought to the surface as actual values, and the links related to them can be obtained.
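The benefit of exposing intermediate variables can be illustrated with linear least-squares models standing in for the ANN sub-networks: instead of fitting one opaque map from inputs to output, two smaller models are fitted along the causal chain x → z → y. The chain structure, the data, and all names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# true (unknown) model: inputs x -> intermediate variable z -> output y
X = rng.uniform(-1, 1, size=(40, 3))
Z = X @ np.array([[1.0], [2.0], [-1.0]])     # intermediate variable values
y = (3.0 * Z + 1.0).ravel()                  # final response

# because the simulation exposes z, fit two small models (x -> z and
# z -> y) instead of one opaque model x -> y; linear least squares
# stands in here for the causal-ANN sub-networks
A = np.hstack([X, np.ones((len(X), 1))])
w_xz, *_ = np.linalg.lstsq(A, Z.ravel(), rcond=None)
B = np.hstack([Z, np.ones((len(Z), 1))])
w_zy, *_ = np.linalg.lstsq(B, y, rcond=None)

def predict(x_new):
    z_hat = np.append(x_new, 1.0) @ w_xz     # first stage: predict z
    return np.array([z_hat, 1.0]) @ w_zy     # second stage: predict y

x_test = np.array([0.2, -0.5, 0.1])
y_hat = predict(x_test)
# matches the true chained model: z = -0.9, y = 3*(-0.9) + 1 = -1.7
```

Each sub-model has far fewer parameters than a single dense map, which is exactly the parameter-reduction argument made above for causal-graph-guided ANN structures.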
In [10], a partial metamodel was employed to deal with high-dimensional problems. In that case, only selected component functions of the HDMR model are constructed, instead of the complete model, to reduce the number of function evaluations. A component function is selected according to the importance of the design variables via estimated sensitivity information. By using knowledge of the engineering problem, such as causal graphs or empirical equations, the important component functions may be predetermined.
Fuzzy logic knowledge can be used to construct prediction models. In fuzzy expert systems, continuous inputs and outputs are transformed into fuzzy sets, and these are linked together by if-then rules. In prediction, the predicted fuzzy output is converted back to a continuous output. In [104], fuzzy logic knowledge was used to forecast energy demand, and a type-2 fuzzy rule-based expert system model was constructed to estimate stock prices [105].
Previous design knowledge can also be utilized by using existing samples and design results to construct a metamodel for a similar problem. The response values of such a metamodel may differ from the actual model, but the trend of the problem or some interesting design regions may still be found through it. Multi-task regression constructs a regression model for different but related tasks by analyzing data from all the tasks, instead of constructing an individual regression model for each task [106]. Thus, combined with data from previous design problems, a multi-task regression model can be constructed on all related designs. Additionally, as the number of designs increases, the multi-task models can be updated.
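A minimal multi-task sketch: two related design problems share a common linear trend but have different offsets; pooling their data and fitting shared slopes with per-task intercepts recovers both the transferable trend and the task-specific parts. All data here is synthetic and the shared-slope structure is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_task(bias, n=30):
    """Synthetic task: same slopes (2, -1), task-specific bias."""
    X = rng.uniform(0, 1, size=(n, 2))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + bias + 0.01 * rng.normal(size=n)
    return X, y

(X1, y1), (X2, y2) = make_task(0.5), make_task(0.7)

# design matrix: shared slope columns plus one intercept column per task
X = np.vstack([X1, X2])
task = np.repeat([0, 1], [len(y1), len(y2)])
D = np.column_stack([X, (task == 0).astype(float), (task == 1).astype(float)])
w, *_ = np.linalg.lstsq(D, np.concatenate([y1, y2]), rcond=None)
# w[:2] ~ [2, -1] shared trend; w[2:] ~ [0.5, 0.7] task intercepts
```

Adding a new related design only appends an intercept column and its data, which is how the multi-task model can be updated as the number of designs grows.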
2.3.5. Optimization strategy
Most optimization strategies are based on samples or offspring, and how to generate new samples is the main question for metamodel-based optimization strategies. Some strategies generate samples in the area with the highest uncertainty [107], some generate samples uniformly in the desired space [108]–[110], and others generate samples according to a probability distribution calculated from the previous metamodel [6], [7]. These methods are all based on data captured from analysis of the black-box model.
Knowledge can also be employed to guide sampling in the design space. The Bayesian network represents not only the graphic structure of the problem but also the probability distributions of the variables [103]. Bayesian networks can also be used to estimate the probability distribution of the input variables given a certain value of the output; this distribution is called the likelihood. By predicting the likelihood and generating samples that follow its trend, further improvements are expected in metamodel-based optimization. Additionally, if the prior probability distribution in the Bayesian network is known before optimization, the initial sampling and the updating can be performed following that prior.
Equations are another kind of information that can be used in optimization. The optimization results of empirical equations may not be accurate, but equations can help generate new samples in the optimization iterations.
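The likelihood-guided sampling idea can be sketched on a tiny discrete network x → y: given a desired output value, Bayes' rule yields a distribution over the input, and new samples are drawn from it. The network and all probabilities below are invented for illustration.

```python
# Using a Bayesian network "in reverse": given a desired output value,
# compute the input distribution via Bayes' rule and sample from it.
# The two-node network (x -> y) and all probabilities are invented.

import random

p_x = {"low": 0.5, "high": 0.5}                  # prior over input x
p_y_given_x = {                                   # conditional table for y
    "low":  {"good": 0.2, "bad": 0.8},
    "high": {"good": 0.7, "bad": 0.3},
}

def posterior_x(y_obs):
    """P(x | y = y_obs) via Bayes' rule."""
    joint = {x: p_x[x] * p_y_given_x[x][y_obs] for x in p_x}
    z = sum(joint.values())
    return {x: v / z for x, v in joint.items()}

post = posterior_x("good")                        # we want 'good' outputs
# post["high"] = 0.35 / 0.45, so sampling favors 'high' inputs
rng = random.Random(0)
samples = rng.choices(list(post), weights=list(post.values()), k=1000)
```

In a continuous design problem, the same logic would bias new sample points toward input regions whose likelihood of producing the desired response is high.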
Evolutionary and metaheuristic optimization algorithms have been widely used for optimization of inexpensive problems. In these algorithms, new samples at each iteration are generated following evolution theory or other crowd behaviors. The properties of the design problem can be captured and used in the algorithm to guide the generation of new sample points and improve search efficiency. Most current knowledge-based operators in evolutionary algorithms are developed for special cases; therefore, generally applicable methods of employing knowledge to assist in generating offspring should be developed.
2.3.6. Optimization, machine learning, and knowledge
Close ties exist between sample-based optimization and machine learning. One issue in sample-based optimization is determining the next samples with few or no expensive function evaluations. In heuristic optimization algorithms (e.g., GA, PSO), the expensive function is evaluated at all sample points, which increases the computational cost significantly. Instead, metamodel-based optimization methods (e.g., MPS, EGO) employ metamodels to predict the responses and evaluate only the interesting points with the expensive function to improve efficiency. Similar to optimization, machine learning methods also try to learn from data (or samples). The ANN, one of the machine learning models, is widely used for prediction and classification in manufacturing [111]. An ANN essentially plays the same role as a metamodel, and it is in fact a commonly used metamodel in the design optimization community. The ability of ANNs (especially deep ANNs) to deal with high-dimensional spaces and large amounts of data has been noticed [112]; for example, convolutional neural networks (CNNs) can handle pictures with thousands or even millions of pixels as inputs [113]. Instead of estimating the actual responses of samples, judging the performance of samples via classification is another way to guide sampling, which has been used in optimization algorithms employing Bayesian network classifiers [17], [19] or Support Vector Machines (SVMs) [114]. Another application of classification models is found in heuristic optimization algorithms, where classification is used to determine whether the next generation of sample positions improves the search or not. Other ANNs can also assist optimization: autoencoders can be used to reduce dimensionality [115], and recurrent neural networks (RNNs) are usually used to learn from sequential data owing to their circular architecture [116]. An optimization process also has a loop structure, in which the current optimal point and samples determine the next solutions; thus, there is potential to use RNNs to learn the optimization process.
Data mining (DM), also known as knowledge discovery from data (KDD), helps find knowledge in existing data [117]. Regression and classification are also employed in DM to find trends in the data. Another benefit of DM is its pre-processing ability. As mentioned in Section 2.3.2, feature selection methods can be used to reduce dimensionality. Additionally, feature selection can be used to determine redundant constraints from constraint data. If data on intermediate variables can be obtained from simulations, feature selection can also be applied to input-intermediate and intermediate-output pairs to identify the structure of engineering problems.
Both machine learning and data mining methods, however, are also based on samples, similar to sample-based optimization. Wu et al. suggested that domain and application knowledge should be applied in designing big data mining algorithms and systems [118]. In machine learning, deep learning methods have been developed to improve the effectiveness of learning without requiring engineering skills and domain expertise [119], but the amounts of training data and computation required are large. Therefore, knowledge can help both optimization and machine learning. In [120] and [121], fuzzy rules were employed to predict fly ash and the performance of a gasoline engine, and the results were similar to or outperformed ANN predictions. Bayesian networks have attracted researchers' attention because they contain both the structure of the problem (knowledge) and the probability distributions of the variables (data). By combining knowledge and data, Bayesian networks have the potential to guide sampling in optimization.
2.3.7. Summary remarks
To overcome the limitation of assuming black-box functions in MBDO, knowledge is brought in to help solve large-scale optimization problems. Knowledge can be applied at different stages of the optimization process. At the beginning, knowledge is very useful in defining a reasonable and effective optimization problem, both in dimension reduction and in constraint specification. During optimization, knowledge can help in metamodel construction and in guiding the generation of new samples.

In assisting optimization, equations tend to be the most useful information. Graphic knowledge such as causal graphs and Bayesian networks can also be used in problem formulation, metamodel construction, new sample generation, and other parts of the optimization process. Additionally, the ontology knowledge base tends to be useful in the problem formulation stage to determine the constraints and design variables. Another important piece of information not yet considered is the data recorded from previous similar optimizations; in practice, sample points from similar optimizations can potentially be reused for the current problem after modification.
2.4. Review of RBF-HDMR
The general form of HDMR [122] is:

$$f(\mathbf{x}) = f_0 + \sum_{i=1}^{d} f_i(x_i) + \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j) + \sum_{1 \le i < j < k \le d} f_{ijk}(x_i, x_j, x_k) + \cdots + \sum_{1 \le i_1 < \cdots < i_l \le d} f_{i_1 i_2 \ldots i_l}(x_{i_1}, x_{i_2}, \ldots, x_{i_l}) + \cdots + f_{12 \ldots d}(x_1, x_2, \ldots, x_d) \qquad (2\text{-}1)$$
where $f_0$ is a constant representing the zeroth-order effect on $f(\mathbf{x})$; the first-order component function $f_i(x_i)$ gives the effect of variable $x_i$ acting independently on the output $f(\mathbf{x})$, which can be either linear or nonlinear; and the second-order component function $f_{ij}(x_i, x_j)$ describes the correlated contribution of variables $x_i$ and $x_j$ to $f(\mathbf{x})$.
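For a concrete feel of Eq. (2-1), the zeroth-, first-, and second-order cut-HDMR components of a simple two-variable function can be computed directly; the toy function and the cut center are chosen purely for illustration.

```python
# Cut-HDMR components (in the spirit of Eq. 2-1) for the toy function
# f(x1, x2) = x1 + 2*x2 + x1*x2, with cut center x0 = (0, 0).

def f(x1, x2):
    return x1 + 2 * x2 + x1 * x2

x0 = (0.0, 0.0)
f0 = f(*x0)                                   # zeroth-order term

def f1(x1):                                   # first-order effect of x1
    return f(x1, x0[1]) - f0

def f2(x2):                                   # first-order effect of x2
    return f(x0[0], x2) - f0

def f12(x1, x2):                              # second-order (interaction) term
    return f(x1, x2) - f1(x1) - f2(x2) - f0

# the components sum back to the original function exactly
x1, x2 = 1.5, -0.5
total = f0 + f1(x1) + f2(x2) + f12(x1, x2)
# here f1(x1) = x1, f2(x2) = 2*x2, and f12 recovers exactly the x1*x2 term
```

For this function the expansion terminates at second order, mirroring the observation that many engineering problems are dominated by low-order component functions.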
In RBF-HDMR [123], an RBF model consisting of a thin plate spline plus a linear polynomial is employed to approximate the component functions. The RBF model is [123]:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \beta_i \, |\mathbf{x} - \mathbf{x}_i^{s}|^{2} \log|\mathbf{x} - \mathbf{x}_i^{s}| + P(\mathbf{x}), \qquad \sum_{i=1}^{N} \beta_i \, p(\mathbf{x}_i^{s}) = 0, \qquad P(\mathbf{x}) = \mathbf{p}\boldsymbol{\alpha} = [p_1, p_2, \cdots, p_q][\alpha_1, \alpha_2, \cdots, \alpha_q]^{T} \qquad (2\text{-}2)$$
where $\mathbf{x}_i^{s}$ is the $i$-th sampled point of the input variables; $\boldsymbol{\beta} = [\beta_1, \beta_2, \cdots, \beta_N]$ and $\boldsymbol{\alpha}$ are the parameters to be found; $N$ is the number of sample points; $P(\mathbf{x})$ is a polynomial function; and $\mathbf{p}$ is the vector of polynomial basis functions, chosen as $(1, x_1, x_2, \cdots, x_d)$, so $q = d + 1$. The condition $\sum_{i=1}^{N} \beta_i \, p(\mathbf{x}_i^{s}) = 0$ is imposed on $\boldsymbol{\beta}$ to avoid singularity of the distance matrix.
The modeling process is described as follows.
(1) Randomly choose a point $\mathbf{x}_0$ in the design space as the cut center. Evaluate $f(\mathbf{x})$ at $\mathbf{x}_0$ to obtain the zeroth-order component function $f_0$.
(2) To approximate the first-order component function $f_i(x_i)$, first generate samples in the close neighborhood of the upper and lower bounds of $x_i$. Evaluate these two end points and model the component function $f_i(x_i)$ by a one-dimensional RBF for variable $x_i$ using those two points.
(3) Check the linearity of $f_i(x_i)$. If the cut center lies on the line formed by the approximation model $f_i(x_i)$, then consider $f_i(x_i)$ linear and terminate the modeling process for $f_i(x_i)$. Otherwise, rebuild the RBF model $f_i(x_i)$ using the cut center and the two end points. Generate a random point along $x_i$ to test the accuracy of the newly built $f_i(x_i)$. If the relative error between the actual value and the approximated value is larger than a given criterion (e.g., 0.01), the test point and all existing points are used to rebuild $f_i(x_i)$, repeating until sufficient accuracy is obtained.
(4) Check the accuracy of the first-order HDMR model. Form a new point by randomly combining the sampled values of each input variable, then compare the value predicted by the approximation model with the value obtained from the original expensive function. If the two values are sufficiently close, no higher-order components exist in the model and the modeling process terminates; otherwise, go to Step 5.
(5) Combine the values of $x_i$ and $x_j$ ($j \ne i$) from the existing samples with the remaining elements $x_k$ ($k \ne i, j$) of $\mathbf{x}_0$ to create new points in two-dimensional planes. One of the new points is randomly chosen to test the first-order RBF-HDMR model. If the approximation model passes through the new point, $x_i$ and $x_j$ are deemed uncorrelated, and the next pair of input variables is tested. Otherwise, the new point and the previously evaluated points are used to construct the second-order component function $f_{ij}(x_i, x_j)$. This sampling-remodeling process continues iteratively for all two-variable correlations until convergence. Higher-order component functions can be constructed in the same manner as in Step 5.
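The linearity test of Step (3) can be sketched compactly: for each variable, sample the two bounds along the cut through the center, then check whether the cut center lies on the line through those two end samples. The test function and bounds below are invented for illustration, and the sketch deliberately omits the RBF rebuild loop.

```python
# A simplified sketch of the Step (3) linearity check: a first-order
# component is taken as linear when the cut center lies on the line
# through the two end samples. The test function is illustrative.

def first_order_linearity(f, lower, upper, x0, tol=1e-9):
    """Return, per variable, whether f_i(x_i) appears linear along
    the cut through the center point x0."""
    f0 = f(x0)
    flags = []
    for i in range(len(x0)):
        lo, hi = list(x0), list(x0)
        lo[i], hi[i] = lower[i], upper[i]        # perturb only variable i
        f_lo, f_hi = f(lo), f(hi)
        # linearly interpolate the two end samples at the center value
        t = (x0[i] - lower[i]) / (upper[i] - lower[i])
        f_interp = (1 - t) * f_lo + t * f_hi
        flags.append(abs(f_interp - f0) < tol)   # is the center on the line?
    return flags

f = lambda x: 2 * x[0] + x[1] ** 2               # linear in x1, not in x2
flags = first_order_linearity(f, [-1, -1], [1, 1], [0.25, 0.25])
# flags == [True, False]: only the first variable passes the check
```

Variables that fail the check would proceed to the nonlinear RBF rebuilding and accuracy-testing loop described in Step (3).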
The process above builds an RBF-HDMR model adaptively and achieves high model accuracy for high-dimensional problems. The construction process is simple, and RBF-HDMR can significantly reduce the number of expensive function evaluations needed to approximate high-dimensional problems.
Besides the original RBF-HDMR, several modifications exist in the literature. Cai et al. [124] proposed an enhanced RBF-HDMR (ERBF-HDMR) that uses an ensemble-based enhanced RBF model to increase the accuracy of the HDMR. Other types of metamodels have been employed in place of RBF to construct the component functions: Huang et al. [125] and Wang et al. [126] employed Support Vector Regression (SVR) and Moving Least Squares (MLS), respectively, to obtain more accurate metamodels. These modifications all focus on improving the accuracy of RBF-HDMR. Although the original RBF-HDMR is used in this thesis to construct the partial metamodel-based optimization, other variations may be used as well.
2.5. Artificial neural network architecture
ANNs have been widely used in different fields for real-world approximation and prediction [127]–[129], and the feedforward ANN is one of the most popular types. Building a proper ANN, however, is still a nontrivial task due to the difficulty of determining the network architecture, which affects the prediction accuracy [130]. The architecture of an ANN includes the number of hidden layers, the number of hidden nodes, and the connections between nodes. An improper architecture may lead to overfitting, which reduces the accuracy of the metamodel. In general, the number of layers and hidden nodes is determined by experience. An ANN with two hidden layers usually provides more benefit for different types of nonlinear problems than a network with one hidden layer [131]. For the number of hidden nodes, various guidelines have been developed, including "2n+1" [132], "2n" [133], "n/2" [134], and so on, where n is the number of input nodes, but none of them outperforms the others across all kinds of problems. A fully connected layer structure is usually used in ANNs.
The main issue with determining the ANN architecture by experience is that the guidelines may not perform well in every situation. Research has therefore been conducted to develop more intelligent architecture determination methods for different approximation tasks. Akaike's Information Criterion (AIC) was used to determine the number of hidden nodes [134], where statistical properties of the training set were considered in generating the network structure. Another kind of architecture determination method is based on the accuracy of the network: different structures are tested, and the most accurate one is selected. Srivastava et al. [135] used the dropout method to find an appropriate ANN structure and avoid overfitting; nodes were randomly dropped during training to find the most accurate structure. Optimization has also been employed to search for the most accurate structure. A layer-wise structure learning method based on multi-objective optimization was developed to construct a deep neural network [136]; with this method, the network is no longer fully connected, and some connections are deleted based on approximation accuracy. Moreover, some researchers focus on breaking the layer-wise structure of the neural network, allowing links that connect nodes in non-adjacent layers. Genetic evolution methods were employed to find the optimal network topology in [137]. In all of these methods, the architecture of the network is determined purely from data, and finding a more accurate ANN structure usually requires a large amount of computation.
Another kind of structure determination method is based on knowledge. A knowledge-based neural network was developed for microwave design problems [77], where existing knowledge, such as empirical formulations, was used to construct a knowledge layer in the network. In [68], the intermediate variables in Bayesian networks were used as hidden nodes to construct an ANN. However, a Bayesian network can only represent the input-output relations between variables; mathematical relations cannot be captured from it. Therefore, in this thesis, the Bayesian network is employed to guide the modeling of the causal-ANN structure rather than being used directly. Mathematical relations are also incorporated in the Bayesian network and causal-ANN to construct a more accurate metamodel by considering the values of the intermediate variables in the Bayesian network.
2.6. Bayesian network and causal graph
Bayesian networks (BNs), also known as belief networks, belong to the family of probabilistic graphical models (GMs) [138]. These graphical structures can be used to represent knowledge about an uncertain domain. A BN is a directed acyclic graph (DAG), which means there are no cycles in the graph [139]. A more formal definition is: a Bayesian network is an annotated acyclic graph that represents a joint probability distribution over a set of random variables [140]. Hence, a BN has two main components: the variables and the conditional probability distribution of each variable. A BN provides an efficient way to compute posterior probabilities given evidence, by reducing the number of parameters required to characterize the joint probability distribution of the variables [21], [22].
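The parameter reduction can be seen on a chain of three binary variables a → b → c: a general joint distribution needs 2³ − 1 = 7 free parameters, but the chain factorization needs only 1 + 2 + 2 = 5, and posteriors follow directly from Bayes' rule. All probabilities below are invented for illustration.

```python
# Factorization benefit of a BN on a binary chain a -> b -> c:
# P(a, b, c) = P(a) P(b|a) P(c|b) needs 5 parameters instead of 7.
# All conditional probability values are invented.

from itertools import product

p_a = {0: 0.6, 1: 0.4}                                # P(a)
p_b_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # P(b | a)
p_c_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P(c | b)

def joint(a, b, c):
    # chain factorization P(a, b, c) = P(a) P(b|a) P(c|b)
    return p_a[a] * p_b_a[a][b] * p_c_b[b][c]

# the factorized joint is a valid distribution (sums to 1)
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))

# posterior of a given evidence c = 1, via Bayes' rule
num = {a: sum(joint(a, b, 1) for b in [0, 1]) for a in [0, 1]}
z = num[0] + num[1]
post_a = {a: num[a] / z for a in [0, 1]}
# observing c = 1 shifts belief toward a = 1 in this example
```

For larger networks the same factorization keeps inference tractable, which is what makes BNs attractive as carriers of both structure (knowledge) and probability (data).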
The causal graph is a variant of the BN that represents the cause-effect relations embedded in human thinking. Compared with a general BN, the edges in a causal graph carry causal direction, expressing the judgement that certain events or actions lead to particular outcomes. Causal graphs have been used in the decision-making field to represent relationships between different factors. A causal graph is also a useful tool for representing the structure of engineering systems; based on causal graphs, the Dimensional Analysis Concept Modeling (DACM) framework was developed to gather and organize the information associated with an engineering problem during the concept design phase [8], [9].
2.7. Summary
To overcome the challenge of blind search in design optimization, the application of knowledge to assist optimization is discussed in this chapter. The concepts of knowledge in AI and product design are reviewed. In those fields, knowledge is captured, represented, and reused to solve decision-making or design problems. Next, some existing applications of knowledge-assisted optimization are described and categorized. Although the concept of knowledge may not explicitly appear in these methods, the idea of involving knowledge to improve the efficiency of optimization is employed in these works. Finally, multiple potential future applications of knowledge in optimization are discussed. Some related algorithms and theories are also introduced in this chapter.
In this thesis, two kinds of knowledge, sensitivity information and causal relations, are employed to deal with large-scale optimization problems. The next chapter will discuss how sensitivity information is applied in constructing a partial metamodel in optimization to reduce the dimensionality.
Chapter 3. Partial metamodel-based optimization (PMO) method
As introduced in Chapter 2, although RBF-HDMR is an efficient metamodeling method,
the cost of building a complete RBF-HDMR can still be very high for high-dimensional
problems. Also, RBF-HDMR requires structured samples. This is essentially in conflict
with the fact that the optimization process may lead the search anywhere in a design
space. One approach is to build a new RBF-HDMR in a smaller area, such as a trust
region. The cost of doing so is also too high as almost none of the existing points can be
inherited for the new model.
This work is based on the fundamental belief that optimization can be performed on an imperfect or incomplete metamodel. Instead of building a costly complete metamodel, I propose to use partial metamodels in the optimization process, in order to gain efficiency without sacrificing, and possibly even improving, search quality. To reduce the exponentially increasing cost of building an accurate metamodel for high-dimensional problems, partial RBF-HDMR models of selected design variables are constructed at every iteration in the proposed strategy based on sensitivity analysis. After every iteration, the cut center of the RBF-HDMR is moved to the most recent optimum point in order to pursue the optimum. To improve the performance of the PMO method, a trust region based PMO (TR-PMO) is developed.
3.1. Algorithm description
To reduce the number of expensive function calls, a partial RBF-HDMR is built at every iteration according to the importance of variables in the proposed PMO method. The cut center of the RBF-HDMR model is moved after every iteration to the newest optimum point. The flow chart of the PMO method is shown in Figure 3-1. For a better understanding of the procedure, an n-dimensional optimization problem is considered and the details of the proposed method are explained as follows:
Step 1. Construct a first-order RBF-HDMR and use this metamodel for optimization. A random cut center in the design space (i.e., $\boldsymbol{x}_0$) is selected. Then, the first-order RBF-HDMR model (i.e., Eq. (3-1)) is built based on this cut center.

$$f(\boldsymbol{x}) = f_0(\boldsymbol{x}_0) + \sum_{i=1}^{n} f_i(x_i) \qquad (3\text{-}1)$$

Then, the first-order HDMR model is optimized to obtain the optimal point $\boldsymbol{x}_{opt}$. The cut center $\boldsymbol{x}_0^{new}$ is moved to this newly found optimum point.
[Flowchart blocks: Construct first-order RBF-HDMR → Optimize first-order RBF-HDMR → Sensitivity analysis → Roulette → Select one coordinate → Construct partial RBF-HDMR → Optimize → Stopping criteria met? (No: select another coordinate; Yes: Output)]
Figure 3-1: Flow chart of PMO
Step 2. Select one dimension. First, sensitivity analysis is performed on the constructed first-order RBF-HDMR model, and the normalized sensitivity indices of the variables are used to quantify the importance of each variable. Next, the sensitivity indices are sorted in descending order to obtain the sensitivity set $\boldsymbol{S} = [s_1, s_2, \dots, s_n]$, where $s_1$ is the sensitivity index of the most important variable (highest index) and $s_n$ is the sensitivity index of the least important variable (lowest index). Next, the sensitivity set $\boldsymbol{S}$ is used to construct the probability density set $\boldsymbol{G} = [g_1, g_2, \dots, g_n]$, where $g_i = s_1 + s_2 + \dots + s_i$, $i = 1, 2, \dots, n$. Hence, $g_1 = s_1$ and $g_n = 1$. When determining which dimension is selected in the PMO approach, the larger the value of the sensitivity index, the higher the chance of that dimension being selected for optimization. However, in most cases, the probability densities of the dimensions are close to each other. To ensure the most sensitive dimension is likely to be picked, the speed control factor used in the Mode Pursuing Sampling method [6] is also used in PMO to adjust the sampling aggressiveness. With the adjustment, the probability density set $\boldsymbol{G}$ is changed to $\hat{\boldsymbol{G}} = [g_1^{1/r}, g_2^{1/r}, \dots, g_n^{1/r}]$, where $r$ is the speed control factor. To avoid being trapped in the same solution and to balance exploration and exploitation, a roulette wheel selection operator is then used to randomly select one variable, $x_{k_1}$, according to the set $\hat{\boldsymbol{G}}$, where the subscript $k_1$ is the index of the selected variable, $k_1 \in [1, n]$ and $k_1$ is an integer. The index is stored in the selected index set $\boldsymbol{K} = [k_1]$.
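The sensitivity-sorted roulette selection with the speed control factor can be sketched as follows. This is a minimal illustration; the function `roulette_select` and its deterministic `u` argument (a fixed random draw, convenient for testing) are my own conveniences and not part of the thesis code.

```python
import numpy as np

def roulette_select(sens, r=2.0, excluded=(), u=None, rng=None):
    """Roulette wheel selection over sensitivity indices with speed control.

    sens: normalized sensitivity indices; r: speed control factor (larger r
    favors the most sensitive variables more aggressively); excluded: indices
    already picked; u: optional fixed draw in [0, 1) for reproducible tests.
    """
    idx = [i for i in range(len(sens)) if i not in excluded]
    s = np.asarray([sens[i] for i in idx], float)
    order = np.argsort(-s)                               # descending sensitivity
    g = np.cumsum(s[order] / s.sum()) ** (1.0 / r)       # transformed cumulative set
    if u is None:
        u = (rng or np.random.default_rng()).uniform()
    u = u * g[-1]                                        # keep the draw inside [0, g_n]
    return idx[int(order[min(np.searchsorted(g, u), len(g) - 1)])]
```

With $\boldsymbol{S} = [0.590, 0.289, 0.121]$ and $r = 2$, the transformed set is $[0.768, 0.938, 1.000]$, so draws below 0.768 pick the most sensitive variable.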
Step 3. Construct a partial RBF-HDMR of 𝑥𝑘1 and use the partial metamodel for
optimization. Once the variable is selected, a partial RBF-HDMR model with only one
variable is constructed based on the new cut center.
𝑓(𝑥𝑘1) = 𝑓0 + 𝑓𝑘1(𝑥𝑘1) (3-2)
Thus, the partial HDMR model is a one-dimensional function of only $x_{k_1}$, with the rest of the variables taking the corresponding values of $\boldsymbol{x}_0$. Then, the partial HDMR model is optimized to obtain the optimum $x_{k_1}^*$. The cut center is moved to $\boldsymbol{x}_0^{new} = (x_1, \dots, x_{k_1}^*, \dots, x_n)^T$, and the function value $f_0$ at the new cut center $\boldsymbol{x}_0^{new}$, which is the current optimum value, is calculated.
Step 4. Select the $d$-th variable. Assume that before this step, $(d-1)$ variables have been picked ($d \ge 2$), and the selected index set is $\boldsymbol{K} = [k_1, k_2, \dots, k_{d-1}]$. The new cut center is $\boldsymbol{x}_0^{new} = (x_1, \dots, x_{k_1}^*, \dots, x_{k_2}^*, \dots, x_{k_{d-1}}^*, \dots, x_n)^T$, and the function value at $\boldsymbol{x}_0^{new}$ is taken as the new $f_0$. After removing the selected variables, the left-out sensitivity set is expressed as $\boldsymbol{S} = \{s_i\}$, $i \notin \boldsymbol{K}$, and the transformed probability density set can be represented as $\hat{\boldsymbol{G}} = \{g_i^{1/r}\}$, $i \notin \boldsymbol{K}$. The $d$-th variable is then selected through the roulette wheel selection operator from the remaining un-picked variables. The index of that variable, $k_d$, is then added to the index set $\boldsymbol{K}$.
Step 5. Construct a new partial RBF-HDMR model and use the new partial metamodel for optimization. Once the $d$-th variable is selected, the partial RBF-HDMR model can be constructed as follows.

$$f(\boldsymbol{x}) = f_0 + \sum_{i=1}^{d-1} f_{k_i}(x_{k_i}) + f_{k_d}(x_{k_d}) + \sum_{1 \le i < j \le d-1} f_{k_i k_j}(x_{k_i}, x_{k_j}) + \sum_{i=1}^{d-1} f_{k_i k_d}(x_{k_i}, x_{k_d}) \qquad (3\text{-}3)$$
As shown in Eq. (3-3), the samples used to construct the components $f_{k_i}(x_{k_i})$ $(i = 1, 2, \dots, d-1)$ and $f_{k_i k_j}(x_{k_i}, x_{k_j})$ $(1 \le i < j \le d-1)$ are all located in the partial design space $\boldsymbol{x} = (x_{k_1}, x_{k_2}, \dots, x_{k_{d-1}})^T$, $\boldsymbol{x} \in [\boldsymbol{x}_{lb}, \boldsymbol{x}_{ub}]$, where $\boldsymbol{x}_{lb}$ and $\boldsymbol{x}_{ub}$ are the lower and upper bounds of the design space. To reduce the number of function evaluations, the function values of most samples used to construct those components can be predicted by the partial RBF-HDMR model built in the last iteration, which is represented as

$$f(\boldsymbol{x}) = f_0 + \sum_{i=1}^{d-1} f_{k_i}(x_{k_i}) + \sum_{1 \le i < j \le d-1} f_{k_i k_j}(x_{k_i}, x_{k_j}) \qquad (3\text{-}4)$$
Eq. (3-4) is a function of $\boldsymbol{x} = (x_{k_1}, x_{k_2}, \dots, x_{k_{d-1}})^T$. Therefore, the values of the component functions $f_{k_i}(x_{k_i})$ $(i = 1, 2, \dots, d-1)$ and $f_{k_i k_j}(x_{k_i}, x_{k_j})$ $(1 \le i < j \le d-1)$ in Eq. (3-3) can be calculated via Eq. (3-4). Thus, to construct the new partial HDMR model, only the points used to construct the component functions $f_{k_d}(x_{k_d})$ and $f_{k_i k_d}(x_{k_i}, x_{k_d})$ $(i = 1, 2, \dots, d-1)$ need to be evaluated by calling the actual function. Next, the $d$-dimensional partial HDMR is optimized to obtain the optimum solution $\boldsymbol{x}^* = (x_{k_1}^*, x_{k_2}^*, \dots, x_{k_d}^*)^T$. Combined with the other, fixed variables, the new cut center moves to $\boldsymbol{x}_0^{new} = (x_1, \dots, x_{k_1}^*, \dots, x_{k_d}^*, \dots, x_n)^T$. The function value $f_0$ at the new cut center is set to be the current optimum value.
Step 6. Repeat Steps 4 and 5 until the termination criterion is reached. In the PMO method, the maximum number of iterations is chosen as the termination criterion. If the maximum number of iterations is reached, the process stops and outputs the current cut center $\boldsymbol{x}_0^{new}$ as the optimum solution and the function value $f_0$ at that cut center as the optimum value; otherwise, go to Step 4 and repeat the procedure. The maximum number of iterations determines how many variables are selected in performing the PMO method. Selecting more design variables can improve the optimization results. However, selecting more design variables also means that more second-order component functions need to be constructed, which in turn requires many more samples. Thus, the negative influence on efficiency caused by selecting more variables outweighs the positive influence on effectiveness. In practice, four or five selected design variables strike a balance between effectiveness and efficiency.
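Steps 1 through 6 can be condensed into a toy driver. The sketch below is a deliberately simplified, first-order-only stand-in: instead of fitting partial RBF-HDMR models, each selected variable is optimized by a direct 1-D grid search on the true function (so it is far more expensive per call than real PMO), and sensitivity is approximated crudely by the variance of the function along each cut line. The function `pmo_sketch` and all of its internals are my own illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def pmo_sketch(f, bounds, n_select, m=21, r=2.0, seed=0):
    """Simplified PMO loop: sensitivity-weighted roulette picks one variable
    per iteration; that variable is 'optimized' and the cut center moves."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(bounds, float).T
    n = lb.size
    x0 = (lb + ub) / 2.0                          # initial cut center

    # Crude first-order sensitivity: variance of f along each coordinate axis.
    sens = np.empty(n)
    for i in range(n):
        ys = []
        for t in np.linspace(lb[i], ub[i], m):
            x = x0.copy()
            x[i] = t
            ys.append(f(x))
        sens[i] = np.var(ys)
    sens = sens / sens.sum()

    picked = []
    for _ in range(n_select):
        # Roulette selection with speed control factor r over unpicked variables.
        rest = [i for i in range(n) if i not in picked]
        s = sens[rest] / sens[rest].sum()
        order = np.argsort(-s)
        g = np.cumsum(s[order]) ** (1.0 / r)
        u = rng.uniform() * g[-1]
        k = rest[int(order[min(np.searchsorted(g, u), len(g) - 1)])]
        picked.append(k)
        # Stand-in for "optimize the partial model": 1-D grid search along x_k.
        vals = []
        grid = np.linspace(lb[k], ub[k], m)
        for t in grid:
            x = x0.copy()
            x[k] = t
            vals.append(f(x))
        x0[k] = grid[int(np.argmin(vals))]        # move the cut center
    return x0, f(x0)
```

When all n variables are eventually selected, the sketch reduces to sensitivity-ordered coordinate descent, which is the intuition behind PMO's variable-by-variable growth of the partial model.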
3.2. Example of PMO
A 3-dimensional problem [143] shown in Eq. (3-5) is selected as an example to explain
the process of the PMO method step-by-step.
$$f(\boldsymbol{x}) = -\sum_{i=1}^{4} \alpha_i \exp\left[-\sum_{j=1}^{3} A_{ij}(x_j - P_{ij})^2\right], \quad x_{1,2,3} \in [0, 1] \qquad (3\text{-}5)$$

where $\boldsymbol{\alpha} = [1, 1.2, 3, 3.2]^T$,

$$\boldsymbol{A} = \begin{bmatrix} 3.0 & 10 & 30 \\ 0.1 & 10 & 35 \\ 3.0 & 10 & 30 \\ 0.1 & 10 & 35 \end{bmatrix}, \qquad \boldsymbol{P} = 10^{-4}\begin{bmatrix} 3689 & 1170 & 2673 \\ 4699 & 4387 & 7470 \\ 1091 & 8732 & 5547 \\ 381 & 5743 & 8828 \end{bmatrix}.$$

The theoretical optimum point is $\boldsymbol{x}^* = [0.114, 0.556, 0.852]^T$; the optimum value is $-3.86$.
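Eq. (3-5) is the Hartmann 3-D benchmark, which can be coded directly. One caveat: the standard benchmark's last entry of $\boldsymbol{P}$ is 8828; the extracted text of the thesis reads 8808, which is inconsistent with the quoted values $f_0 = -0.628$ at the center and the optimum of $-3.86$, so 8828 is assumed below.

```python
import numpy as np

# Hartmann 3-D test function of Eq. (3-5). P[3, 2] = 0.8828 is assumed
# (standard benchmark value, consistent with f0 = -0.628 at the center
# and the quoted optimum of -3.86; the extracted text shows 8808).
ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[3.0, 10.0, 30.0],
              [0.1, 10.0, 35.0],
              [3.0, 10.0, 30.0],
              [0.1, 10.0, 35.0]])
P = 1e-4 * np.array([[3689, 1170, 2673],
                     [4699, 4387, 7470],
                     [1091, 8732, 5547],
                     [ 381, 5743, 8828]])

def hartmann3(x):
    """Evaluate Eq. (3-5) at a 3-D point x in [0, 1]^3."""
    x = np.asarray(x, float)
    inner = np.sum(A * (x - P) ** 2, axis=1)    # one exponent per summand i
    return -np.sum(ALPHA * np.exp(-inner))
```

Evaluating at the cut center $[0.5, 0.5, 0.5]^T$ reproduces $f_0 \approx -0.628$ as used in the example below.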
The center of the design space, 𝒙0 = [0.5,0.5,0.5]𝑇 is selected as the initial cut center,
and the function value at this cut center is evaluated as 𝑓0 = −0.628. Then, a first-order
RBF-HDMR model is constructed based on this cut center as shown in Eq. (3-6).
𝑓(𝒙) = 𝑓0 + 𝑓1(𝑥1) + 𝑓2(𝑥2) + 𝑓3(𝑥3) (3-6)
The Genetic Algorithm (GA) from MATLAB is employed to optimize the first-order RBF-HDMR, and the optimum point $\boldsymbol{x}^* = [0.132, 0.816, 0.774]^T$ is found with a function value of $-2.143$. The new cut center $\boldsymbol{x}_0^{new}$ moves to this optimum point. Then, sensitivity analysis is performed based on the first-order RBF-HDMR model. The normalized sensitivity indices of the variables are 0.590, 0.289, and 0.121, respectively, for $x_1$, $x_2$, and $x_3$. The sensitivity indices are sorted in descending order to obtain the sensitivity set $\boldsymbol{S} = [0.590, 0.289, 0.121]$. The probability density set is then obtained as $\boldsymbol{G} = [0.590, 0.879, 1.000]$. After applying the speed control factor $r = 2$, the transformed probability density set $\hat{\boldsymbol{G}}$ is $[0.768, 0.938, 1.000]$. Roulette wheel selection is performed to determine which variable is modeled in the first iteration; here $x_1$ is selected. Thus, a partial RBF-HDMR, which only contains the zeroth-order component function (i.e., $f_0$) and the first-order component function of $x_1$ (i.e., $f_1(x_1)$), will be constructed as follows:
𝑓(𝒙) = 𝑓0 + 𝑓1(𝑥1) (3-7)
One-dimensional optimization is performed on the partial RBF-HDMR model, and the optimum point $x_1^* = 0.131$ is found. The cut center moves along $x_1$ to the new point $\boldsymbol{x}_0^{new} = [0.131, 0.816, 0.774]^T$. The function value at the new cut center is $-2.143$.
Next, excluding the first variable, the new sensitivity set is $\boldsymbol{S}_2 = [0.705, 0.295]$ and the transformed probability density set $\hat{\boldsymbol{G}}$ is $[0.840, 1]$. After a roulette wheel selection process, variable $x_2$ is selected as the next variable to model. Therefore, a partial RBF-HDMR (as shown in Eq. (3-8)) with the zeroth-order component function, first-order component functions, and the second-order component of $(x_1, x_2)$ is constructed based on the new cut center.
𝑓(𝒙) = 𝑓0 + 𝑓1(𝑥1) + 𝑓2(𝑥2) + 𝑓1,2(𝑥1, 𝑥2) (3-8)
Figure 3-2 shows the samples used to construct the partial model given in Eq. (3-8). The cut center of the new partial model is moved to the current optimal point (star in Figure 3-2). New samples need to be generated to construct the component functions in Eq. (3-8). The samples on the $x_1$ axis (triangles in Figure 3-2) are evaluated through the previous partial model, i.e., Eq. (3-7). The samples on the $x_2$ axis and in the $x_1$-$x_2$ plane (circles in Figure 3-2) are calculated through the real function. Therefore, the response values of 5+8=13 samples are evaluated from the real function when constructing the partial model in Eq. (3-8).
[Plot legend: star = new cut center; triangles = samples estimated from the previous model; circles = samples evaluated with the real function]
Figure 3-2: Samples used to construct Eq. (3-8).
At this point, the two-dimensional partial HDMR model is optimized and the optimum point $\boldsymbol{x}_{12}^* = [0.134, 0.575]^T$ is obtained. Replacing the first and second values of the cut center generates the optimum point of this iteration, $\boldsymbol{x}_0^{new} = [0.134, 0.575, 0.774]^T$, and the function value at this optimum point is $-3.3654$, which is much closer to the theoretical optimum. Finally, only $x_3$ is left, and the RBF-HDMR with one zeroth-order, three first-order, and three second-order components is built. The optimum point $\boldsymbol{x}^* = [0.132, 0.568, 0.863]^T$ with optimum value $-3.847$ is found. In this example, 3×5=15 points are used to construct the initial first-order RBF-HDMR model; five new points are used to construct the partial HDMR model shown in Eq. (3-7), and 13 points are used to construct the model of Eq. (3-8). Adding the initial cut center and the two optimum points obtained in the two iterations, in total 36 new points are involved to finish the second iteration. On the other hand, 1+3×5+3×8+1=41 sample points are needed to construct and optimize a complete second-order RBF-HDMR model. Using GA to optimize the complete RBF-HDMR with the same initial cut center, the optimum value is $-3.013$. Hence, the PMO method can find a better (smaller) optimum with higher efficiency than optimizing a complete RBF-HDMR model.
3.3. Properties of PMO
There are two key strategies in PMO. The first is that a partial HDMR model is used in optimization. Since not all of the variables are involved in the partial HDMR model, the number of sample points used to construct the HDMR model is much smaller than for building the complete model. For instance, for a 10-dimensional problem, assuming that five points are used to construct each first-order component function and eight points are needed to construct each second-order component function, constructing a full second-order HDMR needs 1+10×5+45×8=411 sample points, where 45 is the number of second-order component functions. On the other hand, assuming a second-order partial HDMR is built with five iterations (i.e., five variables are involved in the partial HDMR, with 10 possible second-order component functions), one only needs to generate 5×5+10×8=105 expensive sample points during the iterations; adding the initial 1+10×5=51 samples for the first-order model at the start of PMO, the total number of sample points is only 156, about one-third of the cost of the complete-model approach.
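The sample-count bookkeeping above generalizes directly. The two helper functions below (hypothetical names, assuming the same per-component sample sizes) reproduce the 411 versus 156 comparison for any dimension n and number of selected variables k.

```python
from math import comb

def full_hdmr_cost(n, m1=5, m2=8):
    """Points for a full second-order HDMR: cut center, n first-order
    components with m1 points each, and C(n, 2) second-order components
    with m2 points each."""
    return 1 + n * m1 + comb(n, 2) * m2

def pmo_cost(n, k, m1=5, m2=8):
    """Expensive points for PMO: the initial first-order model (1 + n*m1)
    plus k selected variables (k*m1 first-order points and up to C(k, 2)
    second-order components with m2 points each)."""
    return (1 + n * m1) + k * m1 + comb(k, 2) * m2
```

For n = 10 and k = 5 this yields 411 and 156 points respectively, matching the text; the gap widens quickly with n because the full model's cost grows with C(n, 2) while PMO's growth is capped by k.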
Another important strategy used in PMO is the moving cut center. In the PMO process, the cut center is moved at every iteration to the current optimum point, and a new partial RBF-HDMR is constructed based on the new cut center. That means PMO does not focus on the global accuracy of the HDMR model but pays more attention to the accuracy around the interesting area (i.e., the area around the current optimum point). With the moving cut center, an HDMR model will be built in a more interesting area at every iteration. Moreover, when a new variable is selected to be added to the partial HDMR, one can use the former partial HDMR model to predict the values at the new samples, rather than invoking the actual expensive function. Although there is a risk in using the former partial HDMR due to the moving cut center and the inaccuracy of the models themselves, such a risk is mitigated by the PMO process, as evidenced by the test results in the next section. It is easy to see that no matter how many iterations PMO takes, the total number of function calls in PMO equals the number of sample points used to construct the final partial HDMR, plus those used for constructing the first-order HDMR model at the beginning.
In addition, sensitivity analysis is employed in the PMO process to help select the most important variables to optimize, rather than selecting variables randomly. The roulette wheel selection process helps to balance the exploration and exploitation phases and to avoid being trapped in a local optimum.
3.4. Testing of PMO
A number of numerical benchmark functions are selected to test the performance of the PMO algorithm. In this test, the proposed PMO algorithm is directly compared to the approach of optimizing a complete RBF-HDMR. An RBF-HDMR model is deemed "complete" if the modeling process is terminated according to the modeling process described in Chapter 2. In other words, "complete" means the construction process of the RBF-HDMR is completed. In RBF-HDMR construction, before constructing the second-order component functions, the accuracy of the first-order RBF-HDMR is checked. If the first-order RBF-HDMR is accurate enough, no second-order component functions are built and the construction process is terminated. Additionally, before constructing each second-order component function, whether the two variables $x_i$ and $x_j$ are correlated is checked. If $x_i$ and $x_j$ are not correlated, the component function $f_{ij}(x_i, x_j)$ will not be built in the RBF-HDMR model. Hence, in the complete RBF-HDMR model, not all of the component functions are constructed in some cases. A full RBF-HDMR, however, indicates that all first-order and second-order component functions have been constructed. A complete RBF-HDMR may have skipped modeling some of the second-order component functions, and thus costs less than a full RBF-HDMR.
The SUR-T1-14 [144], Rosenbrock [145], Trid [145], F16 [143], Griewank [145], Ackley [145], Rastrigin [145], SUR-T1-16 [143], Powell [143], and Perm [145] problems are chosen as the benchmark problems, which are listed in the Appendix. In the test, the maximum numbers of points used to construct a first-order and a second-order component function in both the PMO algorithm and RBF-HDMR are set to six and eight, respectively. The maximum number of iterations of PMO is set to five for the test problems. Additionally, GA from the MATLAB global optimization toolbox is employed as the optimizer in PMO and RBF-HDMR optimization, and the settings of GA are left at their defaults. Each problem is run 30 times independently. The initial cut centers of both methods are randomly chosen in the 30 runs. The average of the found optimum values ($f^*$) and the number of function evaluations (NFE) are recorded to illustrate the effectiveness and efficiency of PMO. The results are summarized in Table 3-1. Note that the NFE values in the table are averages over the 30 runs, hence the decimal values. The box-plots of $f^*$ are shown in Figure 3-3.
Table 3-1: Optimization results with numerical benchmark problems.

Problem      dim  Actual optimum   PMO f*    PMO NFE   Complete RBF-HDMR f*   NFE
SUR-T1-14     10        0           21.38     156.6          74.33           161.3
Rosenbrock    10        0          107.08     153.5         187.61           194.5
Trid          10     -210          151.7      150.4         618.11           161.7
F16           16       25.88        25.93     151.0          26.76           397.2
Griewank      20        0            3.19     167.0           6.10           194.2
Ackley        20        0           10.31     236.6          21.07          1547.1
Rastrigin     20        0          196.85     158.0         234.27           111.2
SUR-T1-16     20        0          837.61     231.0        2060.3            430.1
Powell        20        0          596.96     215.4        7222.8            434.6
Perm          20        0          5.69e51    238.0         2.45e52         1625.0
As shown in Table 3-1, for all ten problems, the proposed method obtained a smaller optimum value than directly optimizing the complete RBF-HDMR model. Figure 3-3 gives the box-plots of the optimum values for each problem. It can be seen that for almost all the problems, the ranges of $f^*$ in the 30 runs of PMO are smaller than the ranges of optimizing a complete RBF-HDMR, except for Rosenbrock and Ackley. This means PMO is more robust in optimization. From the perspective of cost (NFE), PMO clearly costs less than RBF-HDMR except for Rastrigin. This advantage is more distinct for higher-dimensional problems. For twenty-dimensional functions such as Ackley and Perm, due to the structure of the function, the cost to construct a second-order metamodel becomes high because there are more second-order pairs. For PMO, because the maximum number of iterations is fixed at five, the maximum number of second-order component functions to be constructed is 10, which is much smaller than for building a full 20-dimensional HDMR function (i.e., 190). For Rastrigin, since the problem is decomposable, when using the process described in Chapter 2 to construct the RBF-HDMR, the metamodeling process is terminated after construction of all first-order component functions, so the number of sample points used to construct the complete RBF-HDMR is very small. However, due to the lower accuracy of the partial RBF-HDMR, some second-order components are constructed in PMO. Hence, in the case where the problem is decomposable, the advantage of PMO is not revealed.
[Box-plots of $f^*$ for PMO vs. RBF-HDMR on each problem: (a) SUR-T1-14, (b) Rosenbrock, (c) Trid, (d) F16, (e) Griewank, (f) Ackley, (g) Rastrigin, (h) SUR-T1-16, (i) Powell, (j) Perm]
Figure 3-3: Box-plots of optimized values.
Next, three benchmark functions, SUR-T1-14, Griewank, and Ackley, each in three different dimensions (10, 20, and 30), are chosen in order to examine how the performance of PMO changes as the problem dimensionality increases. 30 optimization runs are performed for each problem in each dimension. The results are shown in Table 3-2.
Table 3-2: Optimized results with benchmark functions in different dimensions.

Problem      dim  Actual optimum   PMO f*    PMO NFE   Complete RBF-HDMR f*   NFE
SUR-T1-14     10        0           21.38     156.6          74.33           161.3
              20        0          214.00     171.5        1117.5            434.1
              30        0          863.29     221.7        3971.4            836.3
Griewank      10        0            1.19      93.5           1.28           138.2
              20        0            3.19     167.0           6.10           194.2
              30        0           28.10     204.2          37.03           223.9
Ackley        10        0            5.55     165.9          20.55           407.4
              20        0           10.31     236.6          21.07          1547.1
              30        0           11.24     264.9          21.37          3387.0
As shown in Table 3-2, as the problem dimensionality increases, the advantage of the PMO method over RBF-HDMR becomes larger. For the SUR-T1-14 problem in 10 dimensions, the average $f^*$ of PMO is 21.38, which is 29% of the RBF-HDMR result. When the dimension is increased to 30, the optimum result of PMO drops to 21% of the RBF-HDMR result. The advantage of PMO on higher-dimensional problems becomes even clearer in terms of NFE. As mentioned before, most samples are used to construct the second-order component functions in high-dimensional problems. For the Ackley function, because every variable pair is strongly correlated, the number of second-order component functions becomes very large as the dimension increases. Thus, the number of sample points needed to build a complete RBF-HDMR increases significantly. For SUR-T1-14 and Griewank, the variables have mixed weak and strong correlations. Hence, the PMO savings in terms of NFE are milder for SUR-T1-14 and Griewank than for Ackley as the dimensionality rises.
The 10-dimensional SUR-T1-14 problem is chosen to show which dimensions are selected through the roulette wheel selection method at each iteration. The SUR-T1-14 problem is optimized five times with PMO and the data are listed in Table 3-3. "Index" is the sensitivity value of each dimension and "No. of iterations" indicates at which iteration the variable is selected.
Table 3-3: Dimensions selected in PMO on SUR-T1-14 for five independent runs.

Run 1 (optimum 16.87)
  Variable          x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
  Index             0.116  0.114  0.108  0.107  0.102  0.099  0.091  0.090  0.088  0.083
  Rank              1      2      3      4      5      6      7      8      9      10
  No. of iterations 1      3      5      4      2      -      -      -      -      -

Run 2 (optimum 15.44)
  Index             0.117  0.112  0.107  0.107  0.102  0.096  0.096  0.093  0.086  0.083
  Rank              1      2      3      4      5      6      7      8      9      10
  No. of iterations 2      1      3      4      5      -      -      -      -      -

Run 3 (optimum 14.92)
  Index             0.112  0.114  0.112  0.107  0.100  0.098  0.095  0.091  0.088  0.081
  Rank              2      1      3      4      5      6      7      8      9      10
  No. of iterations 3      1      2      5      4      -      -      -      -      -

Run 4 (optimum 19.43)
  Index             0.112  0.117  0.107  0.107  0.102  0.101  0.095  0.087  0.086  0.082
  Rank              2      1      3      4      5      6      7      8      9      10
  No. of iterations 4      3      2      1      5      -      -      -      -      -

Run 5 (optimum 22.89)
  Index             0.116  0.114  0.108  0.105  0.102  0.103  0.095  0.091  0.088  0.081
  Rank              1      2      3      4      6      5      7      8      9      10
  No. of iterations 3      2      5      1      4      -      -      -      -      -
Figure 3-4: Convergence plot of PMO in SUR-T1-14 problem.
As shown in Table 3-3, dimensions $x_1$ to $x_5$, which have larger sensitivity values, are more likely to be picked in the five runs. The selection, however, does show its stochastic nature. Different selection schemes lead to different optimum solutions with slight variations for the test problem.

Figure 3-4 illustrates the current optimal value obtained from PMO over seven iterations on the SUR-T1-14 problem. As shown in Figure 3-4, from the third to the fifth iteration, the optimization results do not improve. As the number of variables involved in the RBF-HDMR model increases, the accuracy of the HDMR model decreases, which may influence the optimization result.
3.5. Trust Region based PMO
The performance of PMO can be further improved by applying different strategies when optimizing each partial model. A trust region is often used as a strategy to guide the optimization method toward the optimum and to balance the exploration and exploitation phases. In this section, a simple trust region strategy is added when optimizing the partial model at each iteration to generate a higher-performance version of PMO.
The trust region strategy follows the description in reference [146]. The approximation accuracy ratio $r_{a,t}$ at the $t$-th iteration can be calculated via the following equation,

$$r_{a,t} = \frac{f(\boldsymbol{x}_{0,t}) - f(\boldsymbol{x}_t^*)}{\hat{f}(\boldsymbol{x}_{0,t}) - \hat{f}(\boldsymbol{x}_t^*)} \qquad (3\text{-}9)$$

where $\boldsymbol{x}_{0,t}$ is the center of the design space, $\boldsymbol{x}_t^*$ is the optimal point, $f$ denotes the true function, and $\hat{f}(\boldsymbol{x}_{0,t})$ and $\hat{f}(\boldsymbol{x}_t^*)$ are the responses of the approximate model at $\boldsymbol{x}_{0,t}$ and $\boldsymbol{x}_t^*$, respectively. $r_{a,t}$ gives the accuracy of the current metamodel, and its value determines the shrinkage or enlargement of the design space. The new size ($L_{t+1}$) of the trust region is defined as follows.

$$L_{t+1} = \begin{cases} \max(c_0 L_t, L_{min}) & r_{a,t} < 0 \\ \max(c_1 L_t, L_{min}) & 0 \le r_{a,t} < \tau_1 \\ L_t & \tau_1 \le r_{a,t} < \tau_2 \\ \min(c_2 L_t, L_{max}) & r_{a,t} \ge \tau_2 \end{cases} \qquad (3\text{-}10)$$
where $\tau_1$ and $\tau_2$ are two positive constants used to judge the accuracy of the metamodel, with $\tau_1 < \tau_2 < 1$; $c_0$, $c_1$, and $c_2$ are positive constant ratios used to shrink or enlarge the trust region, with $c_0 \le c_1 < 1 < c_2$; $L_t$ is the size of the current trust region; and $L_{min}$ and $L_{max}$ are the minimal and maximal sizes of the trust region. In this thesis, the parameter values are set as $\tau_1 = 0.25$, $\tau_2 = 0.75$, $c_0 = 0.25$, $c_1 = 0.5$, and $c_2 = 2$. $L_{min}$ is defined as $0.01 L_{max}$, and $L_{max}$ is set to the size of the original design space.
For the center of the trust region, if $r_{a,t} < 0$, the objective function value at the current optimum ($\boldsymbol{x}_t^*$) is worse than the value at the center ($\boldsymbol{x}_{0,t}$). Thus, the center does not move, i.e., $\boldsymbol{x}_{0,t+1} = \boldsymbol{x}_{0,t}$. Otherwise, the center moves to the current optimum, i.e., $\boldsymbol{x}_{0,t+1} = \boldsymbol{x}_t^*$.
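The update rule of Eq. (3-10), together with the center-move rule, can be written as a small helper. This is a minimal sketch with a hypothetical function name; the default parameters are the values stated above.

```python
def trust_region_update(ratio, L, L_min, L_max,
                        tau1=0.25, tau2=0.75, c0=0.25, c1=0.5, c2=2.0):
    """Apply Eq. (3-10) to the trust-region size L given the accuracy ratio.

    Returns (new_L, move_center): move_center is False only when ratio < 0,
    i.e., the metamodel's predicted optimum is actually worse than the center.
    """
    if ratio < 0:
        return max(c0 * L, L_min), False   # bad model: shrink hard, keep center
    if ratio < tau1:
        return max(c1 * L, L_min), True    # poor model: shrink, move center
    if ratio < tau2:
        return L, True                     # acceptable model: keep size
    return min(c2 * L, L_max), True        # good model: enlarge
```

Note that enlargement is capped at $L_{max}$ (the original design space) and shrinkage at $L_{min} = 0.01 L_{max}$, so the search can recover from an overly aggressive contraction.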
The flowchart of the trust region based PMO (TR-PMO) algorithm is shown in Figure 3-5; it is similar to the flowchart of PMO in Figure 3-1, with the insertion of the trust region box in the flow. In this algorithm, the trust region strategy is used to find a better solution for each partial metamodel. Assume that at the $d$-th iteration, coordinates $k_1$, $k_2$, ..., and $k_d$ are selected to construct the partial RBF-HDMR model, and the current cut center and optimal solution are $\boldsymbol{x}_0$ and $\boldsymbol{x}^*$, respectively. The steps related to the trust region are introduced as follows.
Step A. After optimizing on a partial RBF-HDMR, check whether the maximal number of trust region iterations has been reached. If so, the trust region loop terminates and the process goes to Step 6, as described in Section 3.1, to select a new coordinate; otherwise, continue to Step B.

Step B. Calculate the approximation accuracy ratio $r_{a,t}$ via Eq. (3-9). The partial HDMR model of the $d$ variables is used to calculate the approximate responses at $\boldsymbol{x}_0$ and $\boldsymbol{x}^*$.
Step C. Determine the new trust region. If $r_{a,t} < 0$, the cut center remains; otherwise, the cut center moves to the current optimal point. Then, Eq. (3-10) is employed to shrink or enlarge the trust region. Note that only the upper and lower bounds of the selected variables are modified; the ranges of the other variables remain unchanged.

Step D. Generate a certain number of random sample points in the trust region (e.g., five). These new samples are used to update the partial model. If the cut center does not move, the sample points used to construct the previous partial model can be inherited. If the cut center moves, only the new samples in the updated trust region are used to build a metamodel for optimization. Then go back to Step A.
[Flowchart: as in Figure 3-1 (construct first-order RBF-HDMR, optimize it, sensitivity analysis, roulette selection of one coordinate, construct partial RBF-HDMR, optimize), with an added trust region loop that repeats the partial-model optimization until the maximal trust region iteration count is reached, before checking the stopping criteria and outputting the result]
Figure 3-5: Flowchart of TR-PMO.
To benchmark the performance of TR-PMO, two effective optimization strategies developed for HEB problems are chosen for comparison, i.e., the Trust Region based Mode Pursuing Sampling method (TRMPS) [7] and Optimization on Metamodeling-supported Iterative Decomposition (OMID) [8]. PMO also participates in the comparison.

The same ten numerical problems used in Section 3.4 are used to perform the comparison, and each problem is repeated 10 times. The numbers of points used to construct the first- and second-order component functions are five and eight, respectively. For PMO, the number of sample points used to construct the first- and second-order components is increased to make a fair comparison with the other methods at similar NFE. Note that PMO cannot terminate at an exact NFE, so its NFE is controlled to be as close as possible to the NFE used in TR-PMO, and the average NFE of PMO is also listed in Table 3-6. Additionally, the number of selected variables is set to four for both PMO and TR-PMO. The trust region settings of TR-PMO were introduced earlier in this section. The parameters of TRMPS and OMID are set as in Refs. [7] and [8], as shown in Table 3-4 and Table 3-5. The maximal number of function evaluations of TRMPS and OMID is set to the average number of function calls used by TR-PMO on each benchmark. The results are shown in Table 3-6.
Table 3-4: TRMPS parameter settings.

R_min   R_max   Stall iterations   k_reduction   R_s,initial   R_B,initial
0.01    1       5                  0.7           0.25          1

Table 3-5: OMID parameter settings.

N_Init     n_comp   n_basis   N_as
10 × dim   2        2         5 × dim
In Table 3-6, the NFE data in the fourth column are the numbers of function evaluations used by TR-PMO, TRMPS, and OMID, while the data in the eighth column are the numbers of function evaluations used by PMO. The NFE values of PMO cannot be used to compare the efficiency of PMO with that of TR-PMO, but they show that TR-PMO and PMO are compared at a similar NFE level. As shown in Table 3-6, at similar NFE, TR-PMO outperforms PMO on all ten benchmark functions. In PMO, the partial HDMR model is a static metamodel at each iteration, while the trust region strategy can actively add points to find better results for each partial-model optimization. The NFE cost of TR-PMO is, however, in general higher than that of PMO.
Table 3-6: Optimization results of using TR-PMO, TRMPS, OMID, and PMO.

Problem      dim  Actual    NFE    TR-PMO f*   TRMPS f*   OMID f*    PMO NFE   PMO f*
                  optimum
SUR-T1-14     10      0     276      19.64      20.11      91.23      279       20.01
Rosenbrock    10      0     157      63.55     273.23    2587.4       150      108.98
Trid          10   -210     165     100.67     331.7     3227.0       154      427.38
F16           16     25.88  276      26.98      25.92      30.97      233       27.08
Griewank      20      0     160       2.11       5.67      39.64      169       15.72
Ackley        20      0     287       9.68      15.83      17.38      280       12.29
Rastrigin     20      0     189     159.9      124.39     214.02      215      186.31
SUR-T1-16     20      0     255     935.87      86.45    2178.6       275     1079.0
Powell        20      0     288     157.05      71.93    2827.1       281      661.00
Perm          20      0     296    1.92e51    2.36e49    1.39e49      298     5.53e52
It can also be found that for the SUR-T1-14, Rosenbrock, Trid, Griewank, and
Ackley functions, TR-PMO obtained better results than TRMPS, while for the other
problems the results of TR-PMO are worse. On the other hand, compared with OMID,
TR-PMO performs better on almost all benchmark problems except the Perm function.
TRMPS and OMID are two effective optimization strategies for high-dimensional
problems but often need many more function calls to reach a good optimal solution.
When the allowed number of function calls is limited to a few hundred, the TR-PMO
method has comparable or better performance than TRMPS and OMID. This is because
for TRMPS and OMID, samples are generated in the entire design space. To
adequately cover a high-dimensional space, a comparatively larger number of samples
is needed for both methods. For TR-PMO, in 10-dimensional problems, selecting four
variables seems to be enough to obtain acceptable results with scarce samples.
However, in 20-dimensional problems, four variables are only 1/5 of the total variables,
which limits the optimization performance of TR-PMO. Also, the range of objective
function values in the SUR-T1-16, Powell, and Perm problems is very large, and a small
change in the design variables causes significant changes in function values. This is
likely the reason that TR-PMO did not perform as well as TRMPS for these cases.
In summary, when the number of samples is limited, the advantages of using a partial
metamodel emerge and TR-PMO shows better or comparable results as other methods.
3.6. Application to Airfoil Design
After testing with benchmark functions, both PMO and TR-PMO are applied to an airfoil
design problem as shown in Figure 3-6. The symbol 𝛼 is the attack angle and 𝑉∞ is the
flow velocity. Class function/shape function airfoil transformation representation tool
(CST) [147], as shown in Eq. (3-11), is used to model the geometry of the airfoil.
𝜉𝑈(𝜓) = 𝜓^0.5 (1 − 𝜓)^1.0 Σ_{𝑖=0}^{5} 𝐴𝑢𝑖 [5! / (𝑖! (5 − 𝑖)!)] 𝜓^𝑖 (1 − 𝜓)^(5−𝑖) + 𝜓∆𝜉𝑈
𝜉𝐿(𝜓) = 𝜓^0.5 (1 − 𝜓)^1.0 Σ_{𝑖=0}^{5} 𝐴𝑙𝑖 [5! / (𝑖! (5 − 𝑖)!)] 𝜓^𝑖 (1 − 𝜓)^(5−𝑖) + 𝜓∆𝜉𝐿
(3-11)
where 𝜉𝑈 and 𝜉𝐿 are the geometry functions of the upper and lower surfaces of the
airfoil, respectively; 𝜓 is the non-dimensional horizontal coordinate; ∆𝜉𝑈 and ∆𝜉𝐿 are the
thickness ratios of the trailing edge of upper and lower surfaces, which can be
represented by the distance between the upper (or lower) surface and the x-axis at
trailing edge; 𝐴𝑢 and 𝐴𝑙 are the coefficients of the shape function. In this example, the
airfoil is a closed curve, so the trailing edge thicknesses of upper and lower surfaces are
zero, i.e., ∆𝜉𝑈 = 0 and ∆𝜉𝐿 = 0. In this parametric function, six upper surface coefficients
and six lower surface coefficients are selected as the design variables. The NACA0012
airfoil is selected as the baseline airfoil in this design problem, and the coefficients of
NACA0012 are shown in Table 3-7. The upper and lower boundaries of the design
variables are 130% and 70% of the baseline.
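The CST parametrization of Eq. (3-11) is straightforward to implement. The following is a minimal Python sketch (not from the thesis) that uses the NACA0012 coefficients of Table 3-7 and assumes a closed trailing edge (∆𝜉𝑈 = ∆𝜉𝐿 = 0), as stated in the text:

```python
from math import comb

# NACA0012 baseline coefficients from Table 3-7.
AU = [0.1703, 0.1602, 0.1436, 0.1664, 0.1105, 0.1794]
AL = [-0.1703, -0.1602, -0.1436, -0.1664, -0.1105, -0.1794]

def cst_surface(psi, coeffs, d_xi=0.0):
    """Surface ordinate at non-dimensional abscissa psi (Eq. (3-11))."""
    class_fn = psi ** 0.5 * (1.0 - psi) ** 1.0        # class function
    shape = sum(                                       # Bernstein shape function
        a * comb(5, i) * psi ** i * (1.0 - psi) ** (5 - i)
        for i, a in enumerate(coeffs)
    )
    return class_fn * shape + psi * d_xi

# With a closed trailing edge the surfaces vanish at both ends.
print(cst_surface(0.0, AU), cst_surface(1.0, AU))  # 0.0 0.0
```

Varying the twelve coefficients within ±30% of these baseline values, as in the design problem, changes the shape function while the class function keeps the round-nose, sharp-trailing-edge airfoil character.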
[Figure: schematic of the airfoil with incoming flow velocity 𝑉∞ at attack angle 𝛼, the x-y coordinate system with origin O, and trailing-edge thicknesses ∆𝜉𝑈 and ∆𝜉𝐿]
Figure 3-6: Airfoil design problem.
Table 3-7: Parameters of NACA0012.
Parameter | 𝐴𝑢0 | 𝐴𝑢1 | 𝐴𝑢2 | 𝐴𝑢3 | 𝐴𝑢4 | 𝐴𝑢5
Initial value | 0.1703 | 0.1602 | 0.1436 | 0.1664 | 0.1105 | 0.1794
Parameter | 𝐴𝑙0 | 𝐴𝑙1 | 𝐴𝑙2 | 𝐴𝑙3 | 𝐴𝑙4 | 𝐴𝑙5
Initial value | -0.1703 | -0.1602 | -0.1436 | -0.1664 | -0.1105 | -0.1794
The objective of the airfoil design problem is to maximize the lift-to-drag ratio (L/D). The
constraint of the problem is that the maximum thickness (𝑡𝑚𝑎𝑥) of the new airfoil is not
less than the baseline value (𝑡𝑚𝑎𝑥𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒). Thus, the optimization model is as shown in Eq.
(3-12).
min −𝐿/𝐷
s.t. 𝑡𝑚𝑎𝑥^𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒 − 𝑡𝑚𝑎𝑥 ≤ 0
     0.7𝑥𝑖^𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒 ≤ 𝑥𝑖 ≤ 1.3𝑥𝑖^𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒, 𝑖 = 1, 2, …, 12
(3-12)
Software XFOIL [148] is employed to calculate the value of L/D. In this test, the Mach
number of the flow is 0.5, and the Reynolds number is 5,000,000. The results obtained
by optimizing the RBF-HDMR model directly are also listed for comparison. Thirty independent
runs are carried out for each method. The settings of all methods are the same as in the
numerical tests. The average optimization results over 30 runs are shown in Table 3-8. It
should be noted that the constraint is considered as a cheap constraint.
Table 3-8: Optimization results with airfoil design problem.
Problem | dim | TR-PMO 𝑓∗ | TR-PMO NFE | PMO 𝑓∗ | PMO NFE | Complete RBF-HDMR 𝑓∗ | Complete RBF-HDMR NFE
Airfoil design | 12 | -117.94 | 295.1 | -106.87 | 175.5 | -51.06 | 584.5
Figure 3-7: Optimization results on the airfoil design problem: (a) 𝑓∗ of airfoil design; (b) NFE of airfoil design.
As shown in Table 3-8, the average NFE of PMO is only 30% of that used to construct a
complete RBF-HDMR model, but its optimum is more than twice as good as the
optimum value obtained from optimizing the RBF-HDMR model. TR-PMO achieves a better
optimum than PMO with more NFEs, which is still 50% of the cost of the RBF-HDMR
approach. Figure 3-7 illustrates the box-plots of 𝑓∗ and NFE of the three methods. It can
be found that PMO and TR-PMO have similar robustness, which is better than
optimizing RBF-HDMR. The variations of NFEs for the three approaches are very
similar.
3.7. Summary
This chapter proposed a Partial Metamodel-based Optimization (PMO) algorithm to deal
with High-dimensional, Expensive, and Black-box (HEB) problems. Instead of building
the complete RBF-HDMR model, a series of partial RBF-HDMR models are constructed
to reduce the number of function evaluations in high-dimensional optimization problems.
To balance the exploration and exploitation phases of the method, a roulette wheel
selection process is employed to select variables to construct the partial HDMR model,
according to the sensitivity index values of all variables. The cut center of the partial
HDMR model at each iteration moves to the newly found optimum point to achieve
higher optimization performance. The HDMR model in previous iterations is used to
predict the function values used in constructing a new partial RBF-HDMR model. The
proposed method is compared with optimizing a complete RBF-HDMR using ten
numerical benchmark functions. PMO obtained better optimum solutions than optimizing
a complete RBF-HDMR, using fewer function calls in almost all the problems. A trust
region strategy is combined with PMO to improve the performance of PMO, and thus the
trust region based PMO method (TR-PMO) is developed. When the sample points are
scarce, TR-PMO method shows comparable or better performance than both TRMPS
and OMID. The proposed approaches are successfully applied to an airfoil design
problem. Note that TR-PMO provides a method to improve the performance of PMO.
Other space reduction-based methods that improve the searching ability for partial
metamodel optimization can also be employed to modify the PMO method. In the next
chapter, causal relations will be employed to help with dimension reduction.
Chapter 4. Dimension reduction method employing causal relations
Many dimension reduction methods have been developed to reduce the dimensionality
of large-scale problems. Those strategies usually consider design problems as black-
box functions. However, practitioners usually have certain knowledge of their problem. In
this chapter, a method leveraging causal graph and qualitative analysis is developed to
reduce the dimensionality of the problem by systematically modeling and incorporating
the knowledge about the design problem into optimization. Causal graph is created to
show the input-output relationships between variables. A qualitative analysis algorithm
using design structure matrix (DSM) is developed to automatically find the variables
whose values can be determined without resorting to optimization. According to the
impact of the variables, the problem is divided into two sub-problems: one optimization
problem with respect to the most important variables, and the other with respect to
variables of lower importance.
4.1. Dimension reduction method description
A causal relationship assisted dimension reduction method is developed in this section.
By building a causal graph, input-output relations between variables in the numerical
model are illustrated. Dimensional analysis combined with qualitative analysis provides
a method to detect sources of contradictions, which supports reducing the dimensionality
before performing optimization. The DSM, constructed according to the causal graph, is
employed to automatically find the variables leading to no contradiction. Calculation of
the impact of variables helps to divide the optimization problem into sub-problems. The
following sub-sections describe the proposed method in detail.
4.1.1. Overall process
The overall process of our proposed dimension reduction method includes constructing
causal graph, performing qualitative analysis, removing variables, calculating the weight
of each link, simplifying causal graph and performing two-stage optimization. The steps
of the method are described as follows.
Step 1. Construct a causal graph based on cause-effect relationship. A causal graph is
an oriented graph showing the causal relations between variables. Figure 4-1 is an
example of a causal graph. In the graph, the nodes represent variables, the arrows give
the input-output relations and labels “+1” and “-1” represent how the input influences the
output. For example, the “+1” on the arrow from A to C means that C increases when A
increases. It should be noted that the input and output in one link should have monotonic
relations. This can be achieved by defining design space carefully. Additionally, the more
elaborate the causal graph is, the simpler the causal relations will be in each link and the
easier it will be to achieve monotonic relations for each link. Also, values of the
variables in an engineering problem are usually larger than zero, which helps to avoid
non-monotonic links to some extent.
Reference [142] gives a process of constructing a causal graph. First, all the
fundamental variables are listed and located in a functional structure that represents the
functional flow of the system. Then, the causal rules are employed to define the causality
for each variable. Finally, all the variables are linked together to form the causal graph.
The causal graph should not miss important links and this requirement can be satisfied if
the designers are familiar with the design problem. Once the causal graph is
constructed, the links in the causal graph can be checked by giving a perturbation on
each design variable. If the causal graph can reflect the changes on each intermediate
variable and objective, the causal graph can be regarded as correct.
[Figure: causal graph with nodes A, B, C, D, E, and F; A influences F via C and via D, B influences F via D and via E, and each arrow is labeled “+1” or “-1”]
Figure 4-1: Causal graph example.
Step 2. Perform a qualitative analysis. The causal graph is used to detect variables with
or without contradictions. By multiplying the labels on the arrows of one route, the
relation between the input and the final output of the route can be detected. For the
example in Figure 4-1, if multiplying “+1” on the arrow from A to C by the “+1” on the
arrow from C to F, the multiplication result is “+1”, which means that F is monotonically
increasing with respect to A. If all the relations between design variables and the
objectives are calculated, contradictions of the variable can be found according to the
multiplications. In Figure 4-1, A influences F via C or D (i.e., -1 via D and +1 via C). A
thus generates a contradictory influence on F when both routes are considered.
On the other hand, F is a monotonically increasing function with respect to B no matter if
it traverses through D or E. Therefore, A has a contradiction and B is a variable without
contradictions. The vector of design variables is represented as 𝒙 . After qualitative
analysis, the design variables can be divided into two parts, variables with contradictions
( 𝒙𝑐 ) and variables without contradictions ( 𝒙𝑢𝑐 ). This qualitative analysis can be
performed by checking the causal graph manually. However, in optimization, all the
steps are desired to be executed automatically. Thus, a DSM based qualitative analysis
method is proposed to fulfill the requirement and will be described in Section 4.1.2 in
more detail.
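The route-sign multiplication of this step can be sketched as a small path enumeration over a signed graph. The signs of the A-to-C and A-to-D edges below follow the text; the remaining edges are assumed to be +1, consistent with the discussion of variable B:

```python
# Signed causal graph of the Figure 4-1 example (edge signs partly assumed).
edges = {
    "A": [("C", +1), ("D", -1)],
    "B": [("D", +1), ("E", +1)],
    "C": [("F", +1)],
    "D": [("F", +1)],
    "E": [("F", +1)],
    "F": [],
}

def path_signs(node, target, sign=1):
    """Sign product of every route from node to target (graph is acyclic)."""
    if node == target:
        return [sign]
    out = []
    for succ, s in edges[node]:
        out.extend(path_signs(succ, target, sign * s))
    return out

def has_contradiction(var, objective="F"):
    """A variable has a contradiction if its routes disagree in sign."""
    return len(set(path_signs(var, objective))) > 1

print(has_contradiction("A"), has_contradiction("B"))  # True False
```

A's two routes give +1 and -1, so it is contradictory, while both of B's routes give +1, matching the discussion above. The DSM-based method of Section 4.1.2 automates the same check without explicit path enumeration.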
Step 3. Determine values of 𝒙𝑢𝑐 and remove them from the causal graph and the set of
design variables. After qualitative analysis, the way design variables in 𝒙𝑢𝑐 influence the
objective can be confirmed. Thus, 𝒙𝑢𝑐 can be regarded as a constant variable set at its
lower (or upper) bounds. Taking Figure 4-1 as an example, B has no contradiction and
thus decrease of B leads to the decrease of the objective F. If a minimum F is desired, B
should be set at its lower bound value. Thus, the optimal value of B is determined before
optimization. Now the design variable set of the optimization problem becomes 𝒙𝑐 only.
Variables without contradictions can thus be removed from the causal graph and from
the optimization variable set.
Step 4. Calculate the weight of each link. The causal graph can be further simplified by
considering the weight of each link. In this step, the Taguchi method is used to calculate
the weight of each link [142]. Section 4.1.3 describes the approach in detail. Before
calculating the weights, the range of every variable, including design variables and state
variables is required. There exist two methods to determine those ranges. First, for
engineering problems, the recommended range of variables can be found from
references and it can be used in the sensitivity analysis. Second, the range can be
determined by sampling a certain number of random points and calculating the
responses of the samples. The maximal and minimal values can be used as the upper
and lower bound, respectively.
Step 5. Simplify the causal graph according to calculated weights. The link whose weight
is lower than a threshold is regarded as a low importance link and is removed from the
causal graph. The threshold can be selected according to the weights obtained from
Step 4. The main principle of threshold selection is that the threshold should be neither
so high that important links are missed, nor so low that the simplification is ineffective. A
higher threshold causes more variables to be regarded as unimportant, which may
increase the number of iterations in the two-stage optimization, and removing more
variables from the important variable set may reduce the accuracy of the optimization.
larger than 15%. In this thesis, 10% is selected based on the different case studies that I
have tested during the development of the approach. It provides a good balance
between number of iterations and accuracy of the optimization. The value can be
adjusted if needed.
After removing those deemed less important links, some variables may not affect the
objective at all. Those removed variables are represented by 𝒙𝑢𝑛. On the other hand,
contradictions of some variables (represented by 𝒙𝑢𝑛𝑐) may disappear due to removal of
the less important links. Then, in a similar way to Step 3, values of such variables can be
determined according to the qualitative analysis. Thus, the design variables 𝒙𝑐 (variables
with contradictions) can be divided into two parts, the kept variables with contradictions
𝒙𝑘𝑒, and less important variables 𝒙𝑟𝑒, which includes both 𝒙𝑢𝑛 and 𝒙𝑢𝑛𝑐.
Step 6. Use a two-stage optimization process to obtain the final optimal solution. The
original optimization problem is divided into two sub-problems: one with respect to 𝒙𝑘𝑒
and the other with respect to 𝒙𝑟𝑒.
Then the two optimization problems are optimized separately. Results of the two
optimization problems are combined together to form the final optimal solution. Details of
the two-stage optimization process are shown in Section 4.1.4.
4.1.2. Qualitative Analysis based on design structure matrix
The qualitative analysis process is designed to find the variables without contradictions
and it has to be executed automatically for optimization. Thus, a novel design structure
matrix (DSM)-based qualitative analysis method is developed in this section.
In the DSM-based qualitative analysis, two matrices, [A] and [A1] are built according to
the causal graph, where [A] shows the input-output relations between each pair of
variables, and [A1] additionally gives the direction (sign) of each link. For both [A] and [A1], the first
rows (columns) refer to design variables, the last row (column) is for the objective, and
the intermediate variables are in between. [A] and [A1] are n-by-n matrices, where n is
the number of entities including design variables, intermediate variables, and the
objective. For convenience and consistency with DACM nomenclature, I refer to these
entities as variables with the understanding that the global objective is located in the last
row (column) for both [A] and [A1]. Matrix [A] uses “1” to represent the links between two
variables. If 𝑖 is the input of 𝑗, then 𝑎𝑖𝑗 = 1; otherwise, 𝑎𝑖𝑗 = 0. In matrix [A1], the numbers
“+1” and “-1” are used to represent the relationship between the input and output. If
variable 𝑗 decreases with 𝑖 increasing, 𝑎𝑖𝑗 = −1; otherwise, 𝑎𝑖𝑗 = +1. I assume that the
optimization problem is a single objective problem. Thus, the last column of [A] and [A1]
shows the objective and its direct inputs. It is also possible to consider a multi-objective
problem. By checking the absolute values of elements in the last column of [A1] and [A],
variables without contradictions can be detected. Details of DSM-based qualitative
analysis method are shown as follows. Section 4.1.5 gives a numeric example for better
explanations.
Assume that the number of design variables is 𝑛𝑉𝑎𝑟 and the number of intermediate
variables is 𝑛𝐼𝑛𝑡, then 𝑛 = 𝑛𝑉𝑎𝑟 + 𝑛𝐼𝑛𝑡 + 1, for a single objective problem.
Step 1. Find coupled variables. In practice, if a “1” appears under the diagonal in DSM,
one can recognize that there is a feedback link. However, these feedback links do not
necessarily represent a loop and those links that do not represent loops should be
moved above the diagonal to simplify the DSM. This can be accomplished via a simple
strategy modifying [A]. The rows of [A] are checked one-by-one. If there is a “1” element
under the diagonal, i.e., 𝑎𝑖𝑗 = 1, 𝑖 > 𝑗, variable j is re-ordered to be before 𝑖. The number
of “1” elements under the diagonal (i.e., 𝑛𝑓) is counted after one movement, which is
compared with the smallest number of “1”s under the diagonal (𝑛𝑓∗ ). If 𝑛𝑓 < 𝑛𝑓∗ ,
𝑛𝑓∗ = 𝑛𝑓 and the sequence of the variables is recorded. If the value of 𝑛𝑓∗ does not
change for a given time (i.e., 5 iterations), the modification will stop and the sequence
with the smallest number of feedbacks is used to reconstruct [A] and [A1] and to obtain
[A’] and [A1’]. The location of the “1” element under the diagonal is used to give the
coupled variables. For example, if 𝑎𝑖𝑗 = 1 and 𝑖 > 𝑗, then variables 𝑖 and 𝑗 are coupled.
The coupled variables are stored in a 2-by-𝑛𝑓∗ matrix 𝐹𝐵 , each column of which is
shown as one pair of coupled variables.
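The reordering loop of Step 1 can be sketched as follows. This is a simplified deterministic variant of the procedure in the text (move the source of a below-diagonal "1" ahead of its sink, keep the move only if the feedback count improves, and stop after a few non-improving passes), shown on a small assumed three-variable DSM:

```python
def feedback_count(A, order):
    """Number of "1" marks below the diagonal for a given variable ordering."""
    n = len(order)
    return sum(A[order[i]][order[j]]
               for i in range(n) for j in range(n) if i > j)

def reorder(A, stall_limit=5):
    n = len(A)
    order = list(range(n))
    best = feedback_count(A, order)
    stall = 0
    while best > 0 and stall < stall_limit:
        improved = False
        for i in range(n):
            for j in range(i):
                if A[order[i]][order[j]]:          # feedback mark at (i, j)
                    cand = order[:i] + order[i + 1:]
                    cand.insert(j, order[i])       # move source before sink
                    nf = feedback_count(A, cand)
                    if nf < best:
                        best, order, improved = nf, cand, True
                        break
            if improved:
                break
        stall = 0 if improved else stall + 1
    return order, best

# Assumed example: edges v0 -> v1 and v2 -> v0; the order [v2, v0, v1]
# places every link above the diagonal (no feedback marks remain).
A = [[0, 1, 0],
     [0, 0, 0],
     [1, 0, 0]]
print(reorder(A))  # ([2, 0, 1], 0)
```

When a genuine loop is present, as in the numerical example of Section 4.1.5, the count cannot reach zero and the surviving below-diagonal "1"s identify the coupled variable pairs stored in 𝐹𝐵.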
Step 2. Calculate the number of links in the longest route. To detect the contradictions of
the 𝑖-th design variable, all the routes from 𝑖-th design variable to the objective should be
identified. The longest route should be determined and the other shorter routes are
checked at the same time. In some cases, the coupling relations make it difficult to find
the longest route because of the presence of the feedback loop. Thus the longest route
is considered in this thesis to be obtained by going through each feedback link once and
only once.
The number of links in the longest route contains two parts: one is the number of links
(𝑛𝑁𝑜𝐶 ) in the longest route without feedbacks and others are the number of links
(𝑛𝐶𝑖, 𝑖 = 1, 2, …, 𝑛𝑓∗) in all loops. Summing 𝑛𝑁𝑜𝐶 and all 𝑛𝐶𝑖 together, the final
number of links (𝑛𝑀𝑎𝑥) can be obtained as follows:

𝑛𝑀𝑎𝑥 = 𝑛𝑁𝑜𝐶 + Σ_{𝑖=1}^{𝑛𝑓∗} 𝑛𝐶𝑖
(4-1)
To count the number of links in the route without feedbacks, the “1” elements under the
diagonal in [A’] are turned to “0” to obtain matrix [Anoc]. Reference [149] used the
number of multiplications to represent the links between two variables. It reported that
after multiplying matrix [A] by itself 𝑘 times, if 𝑎𝑖,𝑛 is non-zero, then variable 𝑖
influences the objective through a route with 𝑘 + 1 links. [Anoc] is multiplied by itself,
and when the objective column contains non-zero elements, the number of multiplication
(𝑚𝑁𝑜𝑐) is recorded. After multiplying 𝑛 − 1 times, the largest 𝑚𝑁𝑜𝑐 gives the number of
links, i.e., 𝑛𝑁𝑜𝐶 = 𝑚𝑁𝑜𝑐 + 1.
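The matrix-power route counting can be sketched directly; the tiny DSM below is an assumed example (two design variables, one intermediate variable, and the objective), not one from the thesis:

```python
def matmul(X, Y):
    """Plain integer matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Assumed acyclic DSM: x1 -> v -> obj and x2 -> obj.
A = [
    [0, 0, 1, 0],  # x1 -> v
    [0, 0, 0, 1],  # x2 -> obj
    [0, 0, 0, 1],  # v  -> obj
    [0, 0, 0, 0],  # obj
]

def route_lengths(A, src, obj):
    """Lengths m such that A^m has a nonzero (src, obj) entry,
    i.e. src influences the objective through a route of m links."""
    lengths, P, n = [], A, len(A)
    for m in range(1, n):
        if P[src][obj]:
            lengths.append(m)
        P = matmul(P, A)
    return lengths

print(route_lengths(A, 0, 3))  # x1 -> v -> obj: one route of 2 links
print(route_lengths(A, 1, 3))  # x2 -> obj: one route of 1 link
```

The largest such length over all design variables gives 𝑛𝑁𝑜𝐶 for the loop-free part of the graph.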
After multiplying a DSM matrix by itself several times, if non-zero elements exist on the
diagonal of the DSM matrix, it means there is at least one loop in the problem, and a
non-zero diagonal element means that this variable goes through the loop once and
back to itself.
Therefore, once a loop has been detected, by counting the times of multiplications
before non-zero elements appear in the diagonal, the number of links in the loop can be
identified [149].
For the 𝑖-th coupling loop, the two coupled variables are 𝐹𝐵1,𝑖 and 𝐹𝐵2,𝑖. The variables
from 𝐹𝐵1,𝑖 to 𝐹𝐵2,𝑖 are used to construct a small DSM with one coupling loop, [𝐶𝑖].
Between 𝐹𝐵1,𝑖 and 𝐹𝐵2,𝑖, there are 𝑛𝐿 = 𝐹𝐵2,𝑖 − 𝐹𝐵1,𝑖 links. The matrix [𝐶𝑖] is multiplied
by itself 𝑛𝐿 + 1 times, and when 𝑐1,1 = 𝑐𝑛𝐿+1,𝑛𝐿+1 = 1, the number of multiplication (𝑚𝐶𝑖)
is recorded. After 𝑛𝐿 + 1 times of multiplication, the largest 𝑚𝐶𝑖 gives the number of links
in the coupling loop, i.e., 𝑛𝐶𝑖 = 𝑚𝐶𝑖 + 1.
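The loop-length count can be sketched in the same matrix-power style; the three-variable loop below corresponds to the D-B-C coupling that appears later in the numerical example (Table 4-7):

```python
def matmul(X, Y):
    """Plain integer matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Small DSM holding one coupling loop: D -> B -> C -> D.
C = [
    [0, 1, 0],  # D -> B
    [0, 0, 1],  # B -> C
    [1, 0, 0],  # C -> D (feedback)
]

def loop_links(C):
    """Number of links in the loop: multiply [C] by itself until the
    diagonal entries of the coupled variables become nonzero."""
    P = C
    for m in range(1, len(C) + 1):
        P = matmul(P, C)          # after m multiplications, P = C^(m+1)
        if P[0][0] and P[-1][-1]:
            return m + 1          # n_C = m_C + 1
    return None

print(loop_links(C))  # the D-B-C loop has 3 links
```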
Step 3. Find the variables without contradictions. After obtaining the number of links
𝑛𝑀𝑎𝑥, matrices [A’] and [A1’] are multiplied by themselves (𝑛𝑀𝑎𝑥 − 1) times to check
the contradictions.
In general, at the 𝑘-th multiplication, if 𝑎𝑖,𝑛^𝑘 (𝑖 = 1, …, 𝑛𝑉𝑎𝑟), the (𝑖, 𝑛) element of the
multiplied [A’], is non-zero, it means that variable 𝑖 has an impact on the objective
through 𝑘 + 1 links, and the absolute values in the objective column, |𝑎𝑖,𝑛^𝑘| and
|𝑎1𝑖,𝑛^𝑘|, are compared. The value |𝑎𝑖,𝑛^𝑘| shows that there are |𝑎𝑖,𝑛^𝑘| routes with
(𝑘 + 1) links from variable 𝑖 to the objective. If |𝑎𝑖,𝑛^𝑘| ≠ |𝑎1𝑖,𝑛^𝑘|, there is at least one
route through which the objective changes in the opposite direction compared with the
other routes; thus variable 𝑖 has contradictions. If |𝑎𝑖,𝑛^𝑘| = |𝑎1𝑖,𝑛^𝑘| for all the
multiplications, the sign of the non-zero 𝑎1𝑖,𝑛^𝑘 in every multiplication is checked. If the
signs of 𝑎1𝑖,𝑛^𝑘 differ, the objective changes in opposite directions through different
routes; therefore, variable 𝑖 has contradictions. Otherwise, if the signs of 𝑎1𝑖,𝑛^𝑘 are the
same for all the multiplications, variable 𝑖 has no contradiction.
After multiplying [A’] and [A1’] by themselves (𝑛𝑀𝑎𝑥 − 1) times, the variables without
contradictions can be picked out. The sign of 𝑎1𝑖,𝑛^𝑘 indicates the relation between
variable 𝑖 and the objective. Assuming that the objective is to be minimized, if the sign of
𝑎1𝑖,𝑛^𝑘 is “+”, variable 𝑖 should be set at its lower bound value; otherwise, the upper
bound value should be selected.
4.1.3. Weight calculation
Since the impact of each design variable on the objective is different, a practitioner often
focuses on important variables. By selecting important variables, the problem
dimensionality can be reduced further. Thus, the weight of each link in the causal graph
is calculated in the proposed method, and the original optimization problem is divided
into two sub-problems according to the weights of the links. Several methods have been
developed to calculate the weights, including analysis of variance (ANOVA) [82] and
principal component analysis (PCA) [81]. The Taguchi method [150], [151], one of the
design of experiment tools, offers a simple and systematic approach to calculate the
impact of each input on the output. In this thesis, a two-level Taguchi approach is
selected to calculate the weight of each link. Assume that an equation is as follows.
𝑦 = 𝑓(𝒙), 𝑥 = {𝑥1, 𝑥2, … , 𝑥𝑡} (4-2)
There are 𝑡 inputs of 𝑦 in Eq. (4-2); 𝑦 can represent for example an intermediate variable
and 𝑥𝑖 represent the variables influencing 𝑦 . First, the sample points are generated
according to the Taguchi orthogonal arrays. In this thesis, it is assumed the boundary of
design variables and the intermediate variables are appropriately selected so that the
output is monotonic or nearly monotonic with respect to each input. Therefore, a two-
level Taguchi design has the capability to capture the impact of each input. The two-level
Taguchi orthogonal array selected is shown in Table 4-1.
Table 4-1: The Taguchi orthogonal array for t=7.
Experiment | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
2 | 1 | 1 | 1 | 2 | 2 | 2 | 2
3 | 1 | 2 | 2 | 1 | 1 | 2 | 2
4 | 1 | 2 | 2 | 2 | 2 | 1 | 1
5 | 2 | 1 | 2 | 1 | 2 | 1 | 2
6 | 2 | 1 | 2 | 2 | 1 | 2 | 1
7 | 2 | 2 | 1 | 1 | 2 | 2 | 1
8 | 2 | 2 | 1 | 2 | 1 | 1 | 2
“1” means that the variable takes the value of the lower bound and “2” the upper bound.
For the equation shown in Eq. (4-2), eight sample points are generated according to
Table 4-1 and the responses are calculated at each sample. The symbol 𝑖 represents
the columns of the table. The effect of 𝑥𝑖 (𝑖 = 1, 2, …, 𝑡) on 𝑦 can be calculated as
follows.
𝐸𝑓𝑓𝑒𝑐𝑡𝑥𝑖−𝑦 = [ Σ_{𝑗=1}^{𝑚} 𝑦𝑗 for 𝑥𝑖 at level 2 (high) ] / (𝑚/2) − [ Σ_{𝑗=1}^{𝑚} 𝑦𝑗 for 𝑥𝑖 at level 1 (low) ] / (𝑚/2)
(4-3)

where 𝑚 is the number of experiments. In this case, 𝑚 = 8. Then, the effect is
normalized by Eq. (4-4) and the normalized effect is the weight of link 𝑥𝑖 to 𝑦:

𝑊𝑒𝑖𝑔ℎ𝑡𝑥𝑖−𝑦 = 𝐸𝑓𝑓𝑒𝑐𝑡𝑥𝑖−𝑦 / Σ_{𝑘=1}^{𝑛} 𝐸𝑓𝑓𝑒𝑐𝑡𝑥𝑘−𝑦
(4-4)
For problems with seven or fewer variables, as shown in Table 4-1, only eight
samples are needed. One sample corresponds to one system analysis, or a complete
simulation of the whole system. With these samples, one can perform the Taguchi
computation of weights for each link, and thus the added cost of function evaluations is
eight. For problems of larger scale, the added expense is determined by the specific
orthogonal array that one chooses. The size of the orthogonal array is dependent on the
number of variables in the problem. Note that although only the influence of each single
input is calculated, the cross effects of the inputs are considered in the Taguchi method
because the sampling array is designed to account for cross effects while employing as
small a number of sample points as possible.
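The effect and weight computation of Eqs. (4-3) and (4-4) can be sketched with the L8 array of Table 4-1. The response function below is an assumed toy example, and the weights are normalized by the sum of absolute effects (a slight variation of Eq. (4-4) that keeps weights non-negative when effects have mixed signs):

```python
# L8 two-level orthogonal array from Table 4-1.
L8 = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 2, 2, 2, 2],
    [1, 2, 2, 1, 1, 2, 2],
    [1, 2, 2, 2, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 2, 1],
    [2, 2, 1, 1, 2, 2, 1],
    [2, 2, 1, 2, 1, 1, 2],
]

def taguchi_weights(f, lb, ub, t):
    """Effect of each of the first t inputs (Eq. (4-3)), normalized
    by the sum of absolute effects (cf. Eq. (4-4))."""
    m = len(L8)
    ys = []
    for row in L8:
        x = [lb[i] if row[i] == 1 else ub[i] for i in range(t)]
        ys.append(f(x))
    effects = []
    for i in range(t):
        hi = sum(y for y, row in zip(ys, L8) if row[i] == 2) / (m / 2)
        lo = sum(y for y, row in zip(ys, L8) if row[i] == 1) / (m / 2)
        effects.append(hi - lo)
    total = sum(abs(e) for e in effects)
    return [abs(e) / total for e in effects]

# Assumed toy response: y = 4*x1 + x2, with x3 having no influence.
w = taguchi_weights(lambda x: 4 * x[0] + x[1], [0, 0, 0], [1, 1, 1], 3)
print([round(v, 2) for v in w])  # [0.8, 0.2, 0.0]
```

Because every column of the array is balanced and mutually orthogonal, the eight runs isolate each input's main effect; here the inert third input correctly receives zero weight.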
Each link represents the influence of a variable on another variable and instead of
calculating the importance of a variable to the final objective, the weight of every single
link in the causal graph is estimated. By removing the links with low importance, the
causal graph can be simplified. Then, another qualitative analysis is performed on the
simplified causal graph to find variables without contradictions as well as variables that
have no links to the objective. Optimal values of these variables can thus be determined
and removed from the set of important optimization variables.
4.1.4. Two-stage optimization process
After the second simplification, design variables are divided into two parts, the important
variables 𝒙𝑘𝑒 and the less important variables 𝒙𝑟𝑒. Then, two optimization problems are
constructed as shown in Eqs. (4-5) and (4-6) and optimized sequentially.
Problem 1:

find 𝒙𝑘𝑒
min 𝑓(𝒙𝑘𝑒, 𝒙𝑟𝑒, 𝒙𝑢𝑐)
s.t. 𝑔(𝒙𝑘𝑒, 𝒙𝑟𝑒, 𝒙𝑢𝑐) ≤ 0
𝒙𝑘𝑒^𝑙𝑏 ≤ 𝒙𝑘𝑒 ≤ 𝒙𝑘𝑒^𝑢𝑏
(4-5)

Problem 2:

find 𝒙𝑟𝑒
min 𝑓(𝒙𝑟𝑒, 𝒙𝑘𝑒, 𝒙𝑢𝑐)
s.t. 𝑔(𝒙𝑟𝑒, 𝒙𝑘𝑒, 𝒙𝑢𝑐) ≤ 0
𝒙𝑟𝑒^𝑙𝑏 ≤ 𝒙𝑟𝑒 ≤ 𝒙𝑟𝑒^𝑢𝑏
(4-6)
In both problems, 𝒙𝑢𝑐 is fixed at the value determined by the qualitative analysis. When
optimizing Problem 1, 𝒙𝑟𝑒, 𝒙𝑢𝑛𝑐 are fixed at the determined values, while the value of 𝒙𝑘𝑒
is fixed at the optimal solution obtained from Problem 1 when optimizing Problem 2. In
the following tests, MATLAB function fmincon(.) is employed to solve the optimization
problem. Other optimization methods can also be used to solve the sub-problems. The
stopping criterion is checked at the end of each problem. If the optimum is not found yet
after optimizing Problem 2, the sequential optimization process is performed again.
For the purpose of comparing the efficiency with other methods, one needs to fix the
quality of the solution. Therefore, if the relative difference between the optimal result
from Problem 1 (or Problem 2), 𝑓1∗ (or 𝑓2∗), and the given optimal result 𝑓∗ is less than
a given tolerance (i.e., 10^−4), the optimization process terminates. The relative
difference is defined as follows:

𝜀 = |𝑓∗ − 𝑓1∗| / 𝑓∗
(4-7)
The sequential optimization method may get stuck in a sub-optimum when dealing with
multimodal problems. In the proposed decomposition method, it should be noted that the
unimportant variables include two categories: the variables without contradictions and
the variables having less impact on the objective. For the variables without
contradictions, the optimal solution can be accurately determined according to the
qualitative analysis results. Separating the rest of the variables into unimportant and
important variables using knowledge also reduces the risk of falling into a sub-optimum.
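The sequential two-stage scheme can be sketched as follows. The quadratic objective, the split into "important" and "less important" variables, and the crude bounded 1-D searches are all assumptions for illustration; the thesis uses MATLAB's fmincon(.) for each sub-problem instead:

```python
def line_min(f, x, i, lb, ub, steps=2000):
    """Crude bounded 1-D minimization over coordinate i (stand-in for
    a proper solver such as fmincon)."""
    best_v, best_y = x[i], f(x)
    for k in range(steps + 1):
        v = lb + (ub - lb) * k / steps
        x[i] = v
        y = f(x)
        if y < best_y:
            best_v, best_y = v, y
    x[i] = best_v
    return best_y

def two_stage(f, x, ke, re, lb, ub, tol=1e-6, max_cycles=50):
    """Optimize the important variables (ke), then the rest (re),
    repeating until the objective stops improving (cf. Eq. (4-7))."""
    prev = f(x)
    for _ in range(max_cycles):
        for i in ke:                 # Problem 1: important variables
            line_min(f, x, i, lb, ub)
        for i in re:                 # Problem 2: less important variables
            line_min(f, x, i, lb, ub)
        cur = f(x)
        if abs(prev - cur) < tol:    # stopping criterion
            break
        prev = cur
    return x, cur

# Assumed test function with minimum at (1, 2); x0 is treated as the
# important variable and x1 as the less important one.
f = lambda x: (x[0] - 1) ** 2 + 0.1 * (x[1] - 2) ** 2
x, fx = two_stage(f, [0.0, 0.0], ke=[0], re=[1], lb=0.0, ub=3.0)
print(round(x[0], 2), round(x[1], 2))  # 1.0 2.0
```

In the real method the fixed variables 𝒙𝑢𝑐 would also be held at the bounds chosen by the qualitative analysis while each sub-problem is solved.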
4.1.5. Numerical example
A simple numerical problem is employed in this section to explain how the proposed
method works. The expression of the problem is as follows.
Find 𝒙 = [𝐴, 𝐸, 𝐻, 𝐼]
min 𝐺 = 10𝐷𝐹^−2 + 100𝐶^2
where 𝐹 = 2𝐶^1.8𝐷^−2𝐸^−2.2𝐻^2.5
      𝐷 = 2𝐼^−1.5 − 𝐶^4
      𝐶 = 0.5𝐸^0.3𝐵^−1.2
      𝐵 = 2𝐴𝐷
s.t. 1 ≤ 𝐴, 𝐸, 𝐻, 𝐼 ≤ 2
(4-8)
Step 1. The causal graph (Figure 4-2) is constructed according to Eq. (4-8). The design
variables are drawn at the left side and the objective is located at the right side. As
shown in the causal graph, a coupling loop involving B, C, and D exists in the problem.
The labels “+1” and “-1” are assigned above the arrows according to each equation.
Taking the equation 𝐶 = 0.5𝐸^0.3𝐵^−1.2 as an example, C increases when E increases or
when B decreases. Thus, a “+1” is located above the arrow from E to C and a “-1” is
added above the arrow from B to C.
[Figure: causal graph of the numerical example; design variables A, E, H, and I on the left, intermediate variables B, C, D, and F in between, and the objective G on the right, with each arrow labeled “+1” or “-1”; B, C, and D form a coupling loop]
Figure 4-2: Causal graph of a numerical example.
Step 2. Qualitative analysis based on design structure matrix is performed to find the
design variables without contradictions. The two matrices [A] and [A1] are constructed
as shown in Table 4-2 and Table 4-3. The first four columns refer to design variables
and the last column G shows the objective. For example, B is the output of A and the
input of C, so the elements (A, B) and (B, C) are “1” in [A]. The labels “+1” and “-1” above
the arrows in the causal graph are used to construct [A1]. The process of DSM-based
qualitative analysis is presented step-by-step.
Table 4-2: Matrix [A] for the numerical example.
A E H I B C F D G
A 0 0 0 0 1 0 0 0 0
E 0 0 0 0 0 1 1 0 0
H 0 0 0 0 0 0 1 0 0
I 0 0 0 0 0 0 0 1 0
B 0 0 0 0 0 1 0 0 0
C 0 0 0 0 0 0 1 1 1
F 0 0 0 0 0 0 0 0 1
D 0 0 0 0 1 0 1 0 1
G 0 0 0 0 0 0 0 0 0
Table 4-3: Matrix [A1] for the numerical example.
A E H I B C F D G
A 0 0 0 0 +1 0 0 0 0
E 0 0 0 0 0 +1 -1 0 0
H 0 0 0 0 0 0 +1 0 0
I 0 0 0 0 0 0 0 +1 0
B 0 0 0 0 0 -1 0 0 0
C 0 0 0 0 0 0 +1 -1 +1
F 0 0 0 0 0 0 0 0 -1
D 0 0 0 0 +1 0 -1 0 +1
G 0 0 0 0 0 0 0 0 0
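The two matrices above can be assembled programmatically from a signed edge list. The variable order (design variables first, objective G last) and the edges below reproduce Tables 4-2 and 4-3:

```python
# Assembling [A] and [A1] for the numerical example from signed edges.
variables = ["A", "E", "H", "I", "B", "C", "F", "D", "G"]
signed_edges = [
    ("A", "B", +1), ("E", "C", +1), ("E", "F", -1), ("H", "F", +1),
    ("I", "D", +1), ("B", "C", -1), ("C", "F", +1), ("C", "D", -1),
    ("C", "G", +1), ("F", "G", -1), ("D", "B", +1), ("D", "F", -1),
    ("D", "G", +1),
]
idx = {v: k for k, v in enumerate(variables)}
n = len(variables)
A = [[0] * n for _ in range(n)]   # adjacency: a_ij = 1 if i feeds j
A1 = [[0] * n for _ in range(n)]  # signed adjacency: +1 / -1

for src, dst, sign in signed_edges:
    A[idx[src]][idx[dst]] = 1
    A1[idx[src]][idx[dst]] = sign

# The "1" below the diagonal (row D, column B) signals the coupling loop.
print(A[idx["D"]][idx["B"]], A1[idx["C"]][idx["D"]])  # 1 -1
```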
Step 2.1. The coupled variables are found in this step. As shown in Figure 4-2, there is
one loop involving B, C, and D. In Table 4-2, non-zero elements exist under the diagonal
(boldfaced). For detecting the loop, the sequence of the columns in matrix [A] is changed.
First, the element (D, B) is detected and to remove the “1” under the diagonal, variable D
is moved to the front of B. Then, the modified [A] (named [A’]) is shown in Table 4-4,
and the number of “1”s under the diagonal in the modified matrix is one, i.e., 𝑛𝑓∗ = 1.
After repeating this step five times, 𝑛𝑓∗ does not change.
Thus, the new sequence of the variables and objective is
𝑆𝑒𝑞 = [𝐴, 𝐸, 𝐻, 𝐼, 𝐷, 𝐵, 𝐶, 𝐹, 𝐺] (4-9)
The modified [A’] and [A1’] are listed in Table 4-4 and Table 4-5. There is one loop (“1” in
boldface) detected through this step and the pair of the coupled variables are D and C.
Table 4-4: Modified matrix [A’] for the numerical example.
A E H I D B C F G
A 0 0 0 0 0 1 0 0 0
E 0 0 0 0 0 0 1 1 0
H 0 0 0 0 0 0 0 1 0
I 0 0 0 0 1 0 0 0 0
D 0 0 0 0 0 1 0 1 1
B 0 0 0 0 0 0 1 0 0
C 0 0 0 0 1 0 0 1 1
F 0 0 0 0 0 0 0 0 1
G 0 0 0 0 0 0 0 0 0
Table 4-5: Modified matrix [A1’] for the numerical example.
A E H I D B C F G
A 0 0 0 0 0 +1 0 0 0
E 0 0 0 0 0 0 +1 -1 0
H 0 0 0 0 0 0 0 +1 0
I 0 0 0 0 +1 0 0 0 0
D 0 0 0 0 0 +1 0 -1 +1
B 0 0 0 0 0 0 -1 0 0
C 0 0 0 0 -1 0 0 +1 +1
F 0 0 0 0 0 0 0 0 -1
G 0 0 0 0 0 0 0 0 0
Step 2.2. The number of links in the longest route is counted in this step. The "1"
element in (C, D) in matrix [A'] is set to "0" to construct the matrix [Anoc] (as
shown in Table 4-6). [Anoc] is multiplied by itself eight times; the objective column
contains non-zero elements only at the first, second, third, and fourth multiplications. Thus 𝑛𝑁𝑜𝐶 = 5.
Table 4-6: Matrix [Anoc] for the numerical example.
A E H I D B C F G
A 0 0 0 0 0 1 0 0 0
E 0 0 0 0 0 0 1 1 0
H 0 0 0 0 0 0 0 1 0
I 0 0 0 0 1 0 0 0 0
D 0 0 0 0 0 1 0 1 1
B 0 0 0 0 0 0 1 0 0
C 0 0 0 0 0 0 0 1 1
F 0 0 0 0 0 0 0 0 1
G 0 0 0 0 0 0 0 0 0
In the example, only one coupling exists, so one matrix [C] is built. As shown in Table
4-4, the "1" showing the feedback appears in the element (C, D). Thus, the variables from D to
C in Table 4-4 (i.e., D, B, and C) are used to construct the matrix [C], which is shown in
Table 4-7. In matrix [C], the two coupled variables, D and C, are located at the first and
third columns, so 𝑛𝐿 = 3 − 1 = 2. [C] is multiplied by itself three times; at the second
multiplication, 𝑐1,1 = 𝑐𝑛𝐿+1,𝑛𝐿+1 = 1, which means 𝑛𝐶 = 3. Thus, the total number of links
(nMax) in the longest route is 𝑛𝑀𝑎𝑥 = 𝑛𝑁𝑜𝐶 + 𝑛𝐶 = 5 + 3 = 8. As one can see from
Figure 4-2, the longest path is A-B-C-D-B-C-D-F-G with eight links.
Table 4-7: Matrix [C] for the numerical example.
D B C
D 0 1 0
B 0 0 1
C 1 0 0
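The route counting of Step 2.2 can be sketched with plain adjacency-matrix powers. The node order and edges below are read off Tables 4-6 and 4-7; variable names in the code are illustrative, and the loops index matrix powers rather than multiplication counts.

```python
import numpy as np

# Node order follows Seq = [A, E, H, I, D, B, C, F, G]; [Anoc] is [A'] with
# the feedback element (C, D) set to zero (Table 4-6).
labels = ["A", "E", "H", "I", "D", "B", "C", "F", "G"]
idx = {v: i for i, v in enumerate(labels)}
edges = [("A", "B"), ("E", "C"), ("E", "F"), ("H", "F"), ("I", "D"),
         ("D", "B"), ("D", "F"), ("D", "G"), ("B", "C"),
         ("C", "F"), ("C", "G"), ("F", "G")]
Anoc = np.zeros((9, 9), dtype=int)
for u, v in edges:
    Anoc[idx[u], idx[v]] = 1

# n_NoC: the G column of Anoc^k is non-zero iff some k-link route reaches G,
# so the longest decoupled route is the largest such k.
P, n_noc = Anoc.copy(), 0
for k in range(1, 10):
    if P[:, idx["G"]].any():
        n_noc = k
    P = P @ Anoc

# n_C: length of the coupling loop D -> B -> C -> D from matrix [C]
# (Table 4-7); the loop closes at the power where the diagonal returns to 1.
Cm = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]])
P, n_c = Cm.copy(), 0
for k in range(1, 4):
    if P[0, 0] == 1:
        n_c = k
        break
    P = P @ Cm

n_max = n_noc + n_c
print(n_noc, n_c, n_max)  # 5 3 8
```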
Step 2.3. The variables without contradictions are detected in this step. Because
𝑛𝑀𝑎𝑥 = 8, [A'] and [A1'] are multiplied by themselves seven times. Table 4-8 lists,
for the four design variables, the values in the objective columns of [A'] and [A1'] after
each multiplication.
Table 4-8: Element values in the objective column in [A’] and [A1’].
Multiplication No.   1       2       3       4       5       6       7
                     G   G1  G   G1  G   G1  G   G1  G   G1  G   G1  G   G1
A                    0   0   1   -1  2   2   1   1   1   -1  2   2   1   1
E                    2   2   2   -2  1   -1  1   1   2   -2  1   -1  1   1
H                    1   -1  0   0   0   0   0   0   0   0   0   0   0   0
I                    1   1   1   1   1   -1  2   2   1   1   1   -1  2   2
In Table 4-8, G is the element value in the objective column in [A’] while G1 gives the
values in the objective column in [A1’] in every multiplication. As shown in Table 4-8, the
absolute values of G and G1 are the same for every variable in every multiplication.
Now, checking the first and second multiplications for variable E, one finds that in the
first multiplication G1 has the sign "+" while in the second multiplication G1's sign is
"-". This means that increasing E increases G through one route with two links, but may
decrease G through another route with three links. Therefore, a contradiction exists in
variable E. The same applies to A and I when checking the second and third
multiplications. For H, on the other hand, the values in (H, G) and (H, G1) are non-zero
only at the first multiplication and their absolute values are the same, which means that
H influences the objective G only through one route with two links. Therefore, H has no
contradiction. Thus, A, E, and I are variables containing contradictions while H is
without contradictions, i.e., 𝒙𝑢𝑐 = 𝐻 and 𝒙𝑐 = [𝐴, 𝐸, 𝐼].
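The contradiction test of Step 2.3 can be reproduced by propagating the signed matrix [A1'] alongside the unsigned [A']; the sign labels below are read off the causal graph (Table 4-5), and the contradiction rule (mixed signs across powers, or cancellation making |G| ≠ |G1|) is the one described in the text.

```python
import numpy as np

# Signed adjacency [A1'] in the reordered sequence [A, E, H, I, D, B, C, F, G].
labels = ["A", "E", "H", "I", "D", "B", "C", "F", "G"]
idx = {v: i for i, v in enumerate(labels)}
signed_edges = {("A", "B"): 1, ("E", "C"): 1, ("E", "F"): -1, ("H", "F"): 1,
                ("I", "D"): 1, ("D", "B"): 1, ("D", "F"): -1, ("D", "G"): 1,
                ("B", "C"): -1, ("C", "D"): -1, ("C", "F"): 1, ("C", "G"): 1,
                ("F", "G"): -1}
A = np.zeros((9, 9), dtype=int)
A1 = np.zeros((9, 9), dtype=int)
for (u, v), s in signed_edges.items():
    A[idx[u], idx[v]] = 1
    A1[idx[u], idx[v]] = s

# Multiply [A'] and [A1'] by themselves n_max - 1 = 7 times and record the
# objective-column entries G and G1 after each multiplication (Table 4-8).
n_max = 8
g_hist = {v: [] for v in ["A", "E", "H", "I"]}
P, P1 = A.copy(), A1.copy()
for _ in range(n_max - 1):
    P, P1 = P @ A, P1 @ A1
    for v in g_hist:
        g_hist[v].append((P[idx[v], idx["G"]], P1[idx[v], idx["G"]]))

# A variable is contradiction-free if its signed routes never cancel
# (|G| == |G1| at every power) and never change sign across powers.
def has_contradiction(hist):
    signs = {np.sign(g1) for g, g1 in hist if g != 0}
    cancels = any(abs(g) != abs(g1) for g, g1 in hist)
    return cancels or len(signs) > 1

x_uc = [v for v in g_hist if not has_contradiction(g_hist[v])]
print(x_uc)  # ['H']
```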
Step 3. After qualitative analysis, one variable (H) is found without contradictions. Since
the sign of element (𝐻, 𝐺1) is “-”, H should be set at the upper bound value because G is
to be minimized. The other three variables (A, E and I) will go through the next steps.
Step 4. The weight of each link is calculated using the Taguchi method. The objective
function 𝐺 = 10𝐷𝐹−2 + 100𝐶2 is taken as an example to show the process. Because D,
F and C are intermediate variables, their ranges are decided by calculating the
responses of 50 random sample points. In this case, the ranges of D, F and C are
[0.7140, 1.9848], [0.0026, 0.8635], and [0.0525, 0.3388], respectively. Next, the Taguchi table is
constructed as shown in Table 4-9 and the response of each sample is calculated.
Table 4-9: Taguchi sampling table of objective function.
Experiment number   C        D        F        G
1                   0.0525   0.714    0.0026   1056213
2                   0.0525   0.714    0.0026   1056213
3                   0.0525   1.9848   0.8635   26.89465
4                   0.0525   1.9848   0.8635   26.89465
5                   0.3388   0.714    0.8635   21.05431
6                   0.3388   0.714    0.8635   21.05431
7                   0.3388   1.9848   0.0026   2936106
8                   0.3388   1.9848   0.0026   2936106
Using Eqs. (4-3) and (4-4), the weights of the three input links (i.e., D to G, F to G, and
C to G) are 24.2%, 51.6%, and 24.2%, respectively. Using the same method to calculate
the weight of all the links and using the weight to replace the “1” element in matrix [A],
the weighted matrix [Aw] is constructed as shown in Table 4-10.
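The weight calculation can be sketched as follows. Eqs. (4-3) and (4-4) are defined earlier in the thesis and are not repeated in this section, so a standard range-analysis weight (the effect range of each factor divided by the sum of all ranges) is assumed here; on the Table 4-9 data it reproduces the reported 24.2% / 51.6% / 24.2% split to within a few tenths of a percent.

```python
import numpy as np

# Factor levels per experiment (0 = low, 1 = high) for C, D, F, and the
# responses G, both taken from Table 4-9.
levels = np.array([
    [0, 0, 0], [0, 0, 0],
    [0, 1, 1], [0, 1, 1],
    [1, 0, 1], [1, 0, 1],
    [1, 1, 0], [1, 1, 0],
])
y = np.array([1056213, 1056213, 26.89465, 26.89465,
              21.05431, 21.05431, 2936106, 2936106])

# Effect of each factor: |mean response at high level - mean at low level|;
# the weight of a link is its effect divided by the sum of all effects.
effects = np.array([abs(y[levels[:, j] == 1].mean() - y[levels[:, j] == 0].mean())
                    for j in range(3)])
weights = effects / effects.sum()
print({k: round(float(w), 3) for k, w in zip("CDF", weights)})
```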
Table 4-10: Weighted matrix [Aw] for numerical example.
A E H I B C F D G
A 0 0 0 0 15.0% 0 0 0 0
E 0 0 0 0 0 3.6% 25.5% 0 0
H 0 0 0 0 0 0 25.6% 0 0
I 0 0 0 0 0 0 0 96.1% 0
B 0 0 0 0 0 96.4% 0 0 0
C 0 0 0 0 0 0 27.3% 3.9% 24.2%
F 0 0 0 0 0 0 0 0 51.6%
D 0 0 0 0 85.0% 0 21.6% 0 24.2%
G 0 0 0 0 0 0 0 0 0
Step 5. The causal graph is simplified according to the weight and the variable sets, 𝒙𝑘𝑒
and 𝒙𝑟𝑒 are detected in this step. In this case, the threshold is selected as 10%.
Comparing the weights of each link with the threshold, the links E -> C and C -> D are
removed and the simplified causal graph is shown in Figure 4-3.
Figure 4-3: Simplified causal graph for the numerical example.
From Figure 4-3, it can be seen that the coupling loop is decoupled because the link
between C and D is cut. In the simplified graph, the variables without contradictions are
detected through qualitative analysis. In this case, variable E is found to be without
contradictions, and the objective G decreases as E decreases. Therefore, the kept
variables to be optimized are 𝒙𝑘𝑒 = [𝐴, 𝐼] and the less important variable is 𝑥𝑟𝑒 = 𝐸.
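The link pruning of Step 5 amounts to thresholding the weighted matrix [Aw]; using the Table 4-10 weights, exactly the two low-weight links E → C (3.6%) and C → D (3.9%) fall below the 10% threshold, which breaks the B-C-D coupling loop.

```python
# Link weights from Table 4-10, expressed as fractions.
weights = {("A", "B"): 0.150, ("E", "C"): 0.036, ("E", "F"): 0.255,
           ("H", "F"): 0.256, ("I", "D"): 0.961, ("B", "C"): 0.964,
           ("C", "F"): 0.273, ("C", "D"): 0.039, ("C", "G"): 0.242,
           ("F", "G"): 0.516, ("D", "B"): 0.850, ("D", "F"): 0.216,
           ("D", "G"): 0.242}
threshold = 0.10

# Links below the threshold are cut from the causal graph.
removed = {link for link, w in weights.items() if w < threshold}
kept = {link: w for link, w in weights.items() if w >= threshold}
print(sorted(removed))  # [('C', 'D'), ('E', 'C')]
```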
Step 6. The two-stage optimization problem is constructed as follows.
Problem 1:

find 𝒙𝑘𝑒 = [𝐴, 𝐼]
min 𝐺 = 𝑓(𝒙𝑘𝑒, 𝐸, 𝐻)
s.t. 1 ≤ 𝒙𝑘𝑒 ≤ 2
where 𝐸 = 1, 𝐻 = 2          (4-10)

Problem 2:

find 𝑥𝑟𝑒 = 𝐸
min 𝑓(𝑥𝑟𝑒, 𝒙𝑘𝑒, 𝐻)
s.t. 1 ≤ 𝑥𝑟𝑒 ≤ 2
where 𝒙𝑘𝑒 = [𝐴∗, 𝐼∗], 𝐻 = 2          (4-11)
The optimal value 𝑓∗ = 7.9735 is set to be the stopping criterion value. Matlab function
fmincon(.) is employed to perform the optimization and the results are shown in Table
4-11. The starting point of the original problem is randomly generated in the design
space. For the two-stage optimization problem, the starting point of the two stages is the
same as that in the original problem. For example, if the starting point in the original
problem is 𝑥0 = [𝐴0, 𝐼0, 𝐸0, 𝐻0] = [1.2,1.3,1.4,1.5], then the starting point for Problem 1 will
be 𝒙𝑘𝑒,0 = [𝐴0, 𝐼0] = [1.2,1.3] and the starting point for Problem 2 will be 𝑥𝑟𝑒,0 = 𝐸0 = 1.4.
Since the optimal value is reached in the first-stage optimization, the second-stage
optimization is not run in this case. 𝑓∗ is the optimal value and SA stands for system
analysis. The optimization is repeated 11 times so that the median is an actually tested value.
The median number of SA is shown in Table 4-11. The optimal value and the optimal
points are the results of the run with the median number of SA.
Table 4-11: Optimization results of the original problem and decomposed problem.
            𝑥∗             𝑓∗      # of SA  Variance of # of SA
Original    [1.365,1,2,2]  7.9735  91       [60,131]
Decomposed  [1.365,1,2,2]  7.9735  41       [38,48]
As shown in Table 4-11, the number of system analyses for the two-stage optimization is
41, including eight system analyses for weight calculation, which is 45% of the number of
analyses used in optimizing the original problem. This is because the four-dimensional
problem is reduced to a two-dimensional problem.
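The two-stage scheme of Eqs. (4-10) and (4-11) can be sketched as follows. The full expression of G(A, I, E, H) is not repeated in this section, so a hypothetical surrogate with dominant variables A and I stands in for it, and a small pattern search stands in for fmincon(.).

```python
import itertools

# Hypothetical surrogate for G: A and I dominate, E and H are minor.
def g(a, i, e, h):
    return (a - 1.365) ** 2 + (i - 1.0) ** 2 + 0.01 * (e - 1.0) ** 2 + 0.01 * (2.0 - h) ** 2

def pattern_search(f, x0, bounds, step=0.25, tol=1e-6):
    """Minimize f over a box by coordinate moves with a shrinking step."""
    x, fx = list(x0), f(list(x0))
    while step > tol:
        improved = False
        for j, d in itertools.product(range(len(x)), (-step, step)):
            y = list(x)
            y[j] = min(max(y[j] + d, bounds[j][0]), bounds[j][1])
            fy = f(y)
            if fy < fx - 1e-12:
                x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx

e_fix, h_fix = 1.0, 2.0     # values fixed by the qualitative analysis (Step 3)
f_target = 0.0              # known optimum, used as the stopping criterion

# Stage 1: optimize only the kept variables x_ke = [A, I].
(a_s, i_s), f1 = pattern_search(lambda x: g(x[0], x[1], e_fix, h_fix),
                                [1.2, 1.3], [(1, 2), (1, 2)])
if f1 <= f_target + 1e-6:   # optimum reached; the second stage is skipped
    x_star, f_star = [a_s, i_s, e_fix, h_fix], f1
else:                       # Stage 2: optimize the remaining variable x_re = E
    (e_s,), f2 = pattern_search(lambda x: g(a_s, i_s, x[0], h_fix),
                                [1.4], [(1, 2)])
    x_star, f_star = [a_s, i_s, e_s, h_fix], f2
print(x_star, f_star)
```

As in the thesis example, stage 1 already reaches the stopping value on this surrogate, so stage 2 is never entered.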
To test the influence of the threshold, the threshold is selected as 15%. Then, link A -> B
is removed from Figure 4-3 as well, which means A has less impact on the final
objective. Thus, the kept variables 𝑥𝑘𝑒 = 𝐼 and the less important variable 𝒙𝑟𝑒 = [𝐴, 𝐸].
Using fmincon(.) function to optimize the decomposed problem, the results are shown in
Table 4-12.
Table 4-12: Comparison of two thresholds (10% and 15%).

Threshold  𝑥∗             𝑓∗      # of SA  Variance of # of SA
10%        [1.365,1,2,2]  7.9735  41       [38,48]
15%        [1.365,1,2,2]  7.9735  55       [41,65]
As shown in Table 4-12, the number of SA when using 10% threshold is smaller than
that with 15% threshold. When using 10% as threshold, after optimizing the important
variable 𝒙𝒌𝒆 the optimum is reached and the optimization process is terminated.
However, when selecting 15% as the threshold, optimizing 𝑥𝑘𝑒 cannot reach the target value
because the important variable A is mistakenly classified as less important. As a result, the
unimportant variables 𝒙𝑟𝑒 need to be optimized as well, which increases the number of SA.
Therefore, missing important variables leads to more function calls. To avoid mistakenly
removing important variables, a smaller threshold is preferred, i.e., 10%.
4.2. Engineering case studies
4.2.1. Power converter design problem
A power converter design problem [152], [153] is used to test the performance of the
proposed dimension reduction methodology. The design problem has six design
variables, as shown in Table 4-13. The upper and lower bounds defined in [154] are
used in this thesis. The objective of the problem is to minimize the weight of the power
converter as shown in Eq. (4-12). The formulation of the problem is defined as follows
and all constant values are taken from [152].
min𝑦1 = 𝑊𝑐 +𝑊𝑤 +𝑊𝑐𝑎𝑝 +𝑊ℎ𝑠 (4-12)
where 𝑊𝑐 = |𝐷𝐼𝑦6(𝑍𝑃1 + 𝑦7)|, 𝑍𝑃1 = 2(1 + 𝐾2)𝑥6, 𝑊𝑤 = |(𝑋𝑀𝐿𝑇)(𝐷𝐶)𝑥2𝑥3|, 𝑋𝑀𝐿𝑇 =
2𝑥1(1 + 𝐾1)𝐹𝐶, 𝑊𝑐𝑎𝑝 = |𝐷𝐾5𝑥5|, and 𝑊ℎ𝑠 = |𝑃𝑂𝐾𝐻(1/𝑦2 − 1)|.
Electrical design state analysis duty cycle:

𝑦3 = 𝐸𝑂 / (2𝑦2𝐸𝐼(𝑋𝑁))          (4-13)

Minimum duty cycle:

𝑦4 = 𝐸𝑂 / (2𝑦2𝐸𝐼𝑀𝐴𝑋(𝑋𝑁))          (4-14)

Inductor resistance:

𝑦5 = 𝑋𝑀𝐿𝑇 𝑥2(𝑅𝑂) / 𝑥3          (4-15)

Core cross-sectional area:

𝑦6 = 𝐾1𝑥1²          (4-16)

Magnetic path length:

𝑦7 = (𝜋/2)𝑥1          (4-17)

Inductor value:

𝑦8 = (𝐸𝑂 + 𝑉𝐷)(1 − 𝑦3) / (𝑦6𝑥2(𝐹𝑅))          (4-18)

Loss design state analysis:

𝑦2 = 𝑃𝑂 / (𝑃𝑄 + 𝑃𝐷 + 𝑃𝑂𝐹 + 𝑃𝑋𝐹𝑅)          (4-19)
Table 4-13: Design variables in power converter design.
Variables Name Description Lower Bound Upper Bound
𝑥1 𝐶𝑤 Core center leg width (m) 0.001 0.1
𝑥2 𝑇𝑢𝑟𝑛𝑠 Inductor turns 1.0 10
𝑥3 𝐴𝑐𝑝 Copper size (m2) 7.29e-8 1.0e-5
𝑥4 𝐿𝑓 𝑃𝐼𝑁𝐷𝑈𝐶⁄ Inductance (H) 1.0e-6 1.0e-5
𝑥5 𝐶𝑓 Capacitance (F) 1.0e-5 0.01
𝑥6 𝑤𝑤 Core window width (m) 0.001 0.01
Figure 4-4: Causal graph of the power converter problem.
The proposed dimension reduction method is employed to solve this six-dimensional
multidisciplinary design optimization problem. It is to be noted that this problem entails
mathematical expressions, which are used to build the causal graph shown in Figure
4-4. In most engineering problems, one does not have equations and thus should use
domain knowledge to construct a causal graph. By employing the qualitative analysis, it can
be found that all variables contain contradictions. To further simplify the causal graph,
the less important links are removed according to the weights and the two-stage
optimization is constructed as shown in Eqs. (4-20) and (4-21).
Problem 1:

find 𝒙𝑘𝑒 = [𝑥1, 𝑥2, 𝑥5]ᵀ
min 𝑦1 = 𝑓(𝒙𝑘𝑒, 𝒙𝑟𝑒)
s.t. 𝒙𝑘𝑒^𝑙𝑏 ≤ 𝒙𝑘𝑒 ≤ 𝒙𝑘𝑒^𝑢𝑏          (4-20)

Problem 2:

find 𝒙𝑟𝑒 = [𝑥3, 𝑥4, 𝑥6]ᵀ
min 𝑦1 = 𝑓(𝒙𝑟𝑒, 𝒙𝑘𝑒)
s.t. 𝒙𝑟𝑒^𝑙𝑏 ≤ 𝒙𝑟𝑒 ≤ 𝒙𝑟𝑒^𝑢𝑏          (4-21)
When optimizing Problem 1 for the first time, the design variables 𝒙𝑟𝑒 are fixed at the
given value determined by the qualitative analysis. According to the previous qualitative
analysis results, the upper bounds of 𝑥3 and 𝑥4 and the lower bound of 𝑥6 should be
selected. In this case, 𝑥3 = 1e-5, 𝑥4 = 1e-5, and 𝑥6 = 0.001.
The MATLAB function fmincon(.) is employed to optimize the two problems. The starting
point is generated randomly in the design space for the original problem. The starting
points for the problems 1 and 2 in the two-stage optimization are the same as the
starting point for the original problem. The original problem with six design variables is
optimized first. The optimal result of the original problem is used as the stopping criterion
for the two-stage optimization. Both optimizations are repeated 11 times and the median
number of SA and the optimal results in that run are shown in Table 4-14.
Table 4-14: Optimization results for the power converter problem.
            𝑥∗                                   𝑓∗      # of SA  Variance of # of SA
Original    [0.003,3.605,1e-5,1e-5,8e-5,0.001]  0.9864  887      [636,1442]
Decomposed  [0.003,3.586,1e-5,1e-5,1e-4,0.001]  0.9866  210      [186,557]
For the two-stage optimization, after optimizing Problems 1 and 2 once, the optimal
value reaches 0.9866. The number of SA in the two-stage optimization is 210 including
eight system analyses in sensitivity analysis, which is only 23% of that used in the
original optimization. The significant reduction in SA is due to the reduction of
dimensionality. In the original problem, six design variables need to be optimized. Although
all the variables contain contradictions at the beginning, three variables with weak
contradictions are selected from the original design variable set in the second
simplification, and the six-dimensional problem is divided into two lower-dimensional
problems with three variables each. The reduction of dimensionality significantly
improves the optimization efficiency. Although the description seems tedious, the
qualitative analysis and dimension reduction are automatically conducted using the
developed algorithm and code.
To illustrate the efficiency of the proposed method, the decomposed problem is
compared with the original problem with the same number of function evaluations. In this
case, the number of function calls is fixed at 250 for both problems. Note that for the
decomposed problem, the maximum number of function calls for Problem 1 is set as
250. If the Problem 1 optimization terminates before 250 function calls, Problem 2
continues to run to reach 250 function evaluations. This test is also repeated 11 times
for both methods and the results are shown in Table 4-15.
Table 4-15: Comparison of optimization results with a fixed number of SA for the power converter problem.
            𝑥∗                                    𝑓∗      Variance of 𝑓∗   # of SA
Original    [0.0028,3.618,9e-6,8e-6,1e-4,0.001]  1.0024  [0.9887,1.0222]  250
Decomposed  [0.0030,3.384,1e-5,1e-5,1e-4,0.001]  0.9865  [0.9864,0.9893]  250
When the number of function evaluations is fixed, optimizing the decomposed problem
obtains better results than optimizing the original problem. For the original problem,
250 function calls are not enough to reach the optimum. For the decomposed problem,
however, Problem 1 usually needs about 200 SAs to find the optimal solution for the
important variables. Then, around 50 function evaluations are used in Problem 2 to
obtain the final optimal results. To summarize, the proposed dimension reduction
method helps achieve better results when the number of function evaluations is fixed.
4.2.2. Aircraft concept design problem
The aircraft concept design problem [66] is used to test the performance of the proposed
method. There are ten design variables (listed in Table 4-16) and three coupled
disciplines (structure, aerodynamics, and propulsion). The objective of the problem is to
maximize the range computed by the Breguet equation. The causal graph is shown in
Figure 4-5. By employing the proposed method, it can be found that variable ℎ has no
contradiction and the upper bound of ℎ is desired. Then, the original problem is divided
into two optimization problems,
Problem 1:
𝑓𝑖𝑛𝑑 𝒙𝑘𝑒 = [𝑀, 𝑇, 𝑆𝑅𝐸𝐹 , 𝑡 𝑐⁄ ,Λ, 𝑥, 𝐶𝑓]𝑇
max𝑅(𝒙𝑘𝑒 , 𝒙𝑟𝑒 , 𝒙𝑢𝑐)
𝑠. 𝑡. 𝑔(𝒙𝑘𝑒 , 𝒙𝑟𝑒 , 𝒙𝑢𝑐) ≤ 0
𝒙𝑘𝑒𝑙𝑏 ≤ 𝒙𝑘𝑒 ≤ 𝒙𝑘𝑒
𝑢𝑏
(4-22)
Problem 2:
𝑓𝑖𝑛𝑑 𝒙𝑟𝑒 = [𝜆, 𝐴𝑅]𝑇
max𝑅(𝒙𝑟𝑒 , 𝒙𝑘𝑒 , 𝒙𝑢𝑐) 𝑠. 𝑡. 𝑔(𝒙𝑟𝑒 , 𝒙𝑘𝑒 , 𝒙𝑢𝑐) ≤ 0 𝒙𝑟𝑒
𝑙𝑏 ≤ 𝒙𝑟𝑒 ≤ 𝒙𝑟𝑒𝑢𝑏
(4-23)
Figure 4-5: Causal graph of the aircraft concept design problem.
When optimizing Problem 1 for the first time, the design variables 𝒙𝑟𝑒 and 𝒙𝑢𝑐 are fixed at
the given values determined by the qualitative analysis. In this case, ℎ = 60000
and 𝐴𝑅 = 2.5. Because 𝜆 has no impact on the objective, 𝜆 is set to its initial
value of 0.25. The details of how the proposed method performs in the aircraft concept
design can be found in [155].
Table 4-16: Design variables in aircraft concept design.

    Variables  Description                    Lower Bound  Upper Bound
1   𝑀          Mach number                    1.4          1.8
2   𝑇          Throttle setting               0.1          1.0
3   𝑆𝑅𝐸𝐹       Wing surface area (ft2)        500          1500
4   𝐴𝑅         Aspect ratio                   2.5          8.5
5   𝑡/𝑐        Thickness/chord ratio          0.01         0.09
6   𝜆          Wing taper ratio               0.1          0.4
7   Λ          Wing sweep (deg)               40           70
8   𝑥          Wingbox x-section area (ft2)   0.9          1.25
9   ℎ          Altitude (ft)                  38000        60000
10  𝐶𝑓         Skin friction coefficient      0.75         1.25
The MATLAB function fmincon(.) is employed to optimize the two problems. The starting
points are selected randomly in the design space. The original problem with ten design
variables is optimized first. The optimal result of the original problem is used as the
stopping criterion in two-stage optimization. Each optimization is run 11 times and the
results are shown in Table 4-17.
Table 4-17: Optimization results of aircraft concept design.
            𝑥∗                                               𝑓∗    # of SA  Variance of # of SA
Original    [1.4,0.265,1500,2.5,0.09,0.1,70,0.9,60000,0.75]  4459  453      [420,724]
Decomposed  [1.4,0.265,1500,2.5,0.09,0.1,70,0.9,60000,0.75]  4459  210      [166,231]
As shown in Table 4-17, after decomposition the two-stage optimization reaches the
same optimal value with 210 function evaluations, which is half the function calls used
in the original optimization. In the original problem, ten design variables need to be
optimized. After employing the causal knowledge to analyze the problem, one finds that
there exists one monotonic variable, so the ten-dimensional problem becomes a
nine-dimensional problem. After simplification, the nine-dimensional problem is divided
into two problems with seven and two variables, respectively. The reduction of
dimensionality significantly improves the efficiency of the optimization.
Then, the two-stage optimization is compared with the original optimization with a fixed
number of SA. In this case, the maximum number of function evaluations is set to be 180
for both optimizations. Each optimization is run 11 times and the median results are
shown in Table 4-18. It can be found that with the fixed number of function evaluations,
the result of the two-stage optimization is much better than the optimal value obtained
from the original problem.
Table 4-18: Comparison of optimization results with a fixed number of SA for the aircraft problem.
            𝑥∗                                               𝑓∗    Variance of 𝑓∗  # of SA
Original    [1.4,0.366,870,2.5,0.09,0.14,70,0.9,50535,0.75]  1885  [951,4413]      180
Decomposed  [1.4,0.263,1500,2.5,0.09,0.1,70,0.9,60000,0.75]  4458  [4458,4459]     180
4.3. Summary
This chapter proposed a dimension reduction method using causal graphs and qualitative
analysis, and the method was applied to two optimization problems to test its
efficiency. Causal graphs are constructed to show the input-output
relationships between variables. To find the variables without contradictions
automatically, a novel design structure matrix (DSM) based qualitative analysis method
is developed. Then, the values of the variables without contradictions can be determined
before optimization and the dimensionality of the problem can be reduced. Taguchi
method is employed to calculate the weight of each relationship and the original problem
is divided into two sub-problems, one with important variables and the other with less
important variables. Thus, the number of variables in each sub-problem is reduced
compared with the original problem. Finally, the two sub-problems are optimized
sequentially to obtain the optimal solution. The proposed method is employed to solve a
power converter design problem and an aircraft concept design problem from the literature,
and the results are compared with those obtained by optimizing the original problem. With
the same optimal value, the efficiency of the proposed method is significantly higher than
that of optimizing the original problem. On the other hand, with the same number
of function calls, the proposed method arrives at a better optimal solution. Nevertheless,
the method reaches its limit if all variables in the optimization problem have
contradictions and no simplifications can be made with the developed approach.
It is to be noted that the only function calls added by the proposed method are
those for the weight calculation; the total number of these calls is limited according to the
problem dimension and the corresponding orthogonal array. The other steps of the proposed
method are only analyses of the causal graph with matrix operations. The associated
cost of those operations is negligible and the operations are automated. Thus, the
dimension reduction method can be performed as a pre-analysis before launching the
optimization.
In addition to assisting dimension reduction, can one use causal relations to build a more
accurate metamodel? The next chapter addresses this question.
Chapter 5. Causal-Artificial Neural Network (Causal-ANN) and its application
To reduce the computational cost in engineering design, expensive high-fidelity
simulation models are approximated by metamodels. Typical metamodeling methods
assume that expensive simulation models are black-box functions. A totally unknown
design space implies that more sample points are needed to gather enough
information to construct an accurate metamodel over the entire design space. To
improve the efficacy of metamodels, knowledge about engineering design problems is
employed to help develop a novel metamodel, named the causal artificial neural network
(causal-ANN). Cause-effect relations intrinsic to the design problem are employed to
decompose an ANN into sub-networks and values of intermediate variables are utilized
to train these sub-networks.
Apart from giving a good prediction, an accurate metamodel can be used in different
applications in engineering design. Considering the structural representation of a causal-
ANN, not only the objective values, but also values of intermediate variables can be
predicted by the causal-ANN. Therefore, combined with the theory of Bayesian
networks [138], the distributions of the variables and objectives can be estimated through the
causal-ANN. By analyzing these distributions, attractive design subspaces, i.e., the
subspaces where the optimal solution is likely to be located, can be identified. In this thesis,
the application of the causal-ANN in identifying attractive design subspaces is also developed.
5.1. Causal ANN and application in attractive sub-space identification
In this section, causal relations are employed to help the neural network construction.
According to the causal graph, the entire network is divided into multiple sub-networks.
Intermediate variables are used together with the design variables and objective to train
each sub-network. The constructed causal-ANN can be used to identify the attractive
sub-spaces where the optimum design may be located. The likelihood of the design variables
can be estimated from the causal-ANN, and the attractive sub-spaces can be selected
through the likelihood distribution. In this section, the process of constructing a causal-ANN
is described and its application in identifying attractive sub-spaces is presented.
Case studies in Section 5.2 give a more detailed description of each step.
5.1.1. Causal artificial neural network
The main challenge of using ANN in engineering design is the large number of sample
points needed to build a reasonably accurate model. Engineers have a certain
understanding of the problem at hand; furthermore, during engineering simulation,
values of some intermediate variables can be obtained from a single simulation along
with the objective value. But those values are typically not employed in constructing
metamodels. In this chapter, causal relations are used to form the structure of the ANN,
and values of the intermediate variables are used in training the ANN. The process of
constructing the causal-ANN is presented as follows.
Step 1. Generate causal relations of the design problem. A high-level causal relation
map (i.e., simplified causal graph) is needed before constructing a causal-ANN. The
simplified causal graph needs inputs, output, and key intermediate variables. Such
intermediate variables can be the coupling variables, the outputs from each discipline, or
variables whose values can be obtained from simulation as by-products. Usually, key
intermediate variables can be selected according to the problem simulation process and
experience of the designers. There are two ways to generate a high-level causal graph.
One simple method is to simplify an existing causal graph. A causal graph of an
engineering problem usually contains all of the variables involved in the problem. By
keeping the key variables and removing others, a causal graph can be simplified to
represent the high-level causal relations. If the causal graph does not exist, knowledge
of the design problem can be used to generate the high-level causal-relations. By
connecting inputs and output with the selected key intermediate variables, a high-level
causal graph can be generated. Case studies in Section 5.2 will show examples.
Step 2. Generate sub-networks according to the causal relations. The high-level causal
graph is divided into multiple sub-graphs, which include only two layers, inputs and
outputs. For example, for the causal graph in Figure 5-1, two sub-graphs can be
generated, [A, B] to C and [C, D] to E.
Figure 5-1: An example of a high-level causal graph.
Step 3. Construct neural networks on the sub-graphs. Apart from causal relations, other
knowledge can also be employed in the causal-ANN. According to the kind of knowledge
applied in the problem, causal-ANNs can be divided into three categories.
The first category is when cheap models, i.e., mathematical models or inexpensive
simulation models, exist as part of the prediction model. Since the network is divided into
several sub-nets in the causal-ANN, some of the sub-nets can be replaced by the
existing cheap models. Taking the problem in Figure 5-1 as an example, if a
computationally expensive model is needed to calculate C while calculating E from [C, D]
is cheap, then a causal-ANN with a cheap model can be constructed as shown
in Figure 5-2. Thus, there will be one ANN to be trained with two inputs. By involving the
cheap model in the causal-ANN, the accuracy of the prediction model can be improved.
Additionally, the reduced number of weights in the causal-ANN can reduce the training
cost. Moreover, the incorporation of cheap models in causal-ANN has negligible
computational overhead for ANN training.
Figure 5-2: Causal-ANN with a cheap model.
The second category of causal-ANN is when values of the intermediate variables can be
obtained as a by-product of the output evaluation, usually from running expensive
simulations. Thus, the causal-ANN can be divided into multiple independent sub-nets.
For the example problem shown in Figure 5-1, if the value of variable C can be obtained
from simulation, then two separate sub-nets can be constructed. If a sub-ANN is
between intermediate variables and the objective (e.g., [C, D] to E in Figure 5-3), the
actual values of the intermediate variable C are used as the inputs of the ANN. In
contrast, if one builds a single system causal-ANN, after the model is constructed and used
for prediction, values of the intermediate variables are only estimated from previous
layers of the network instead of taking their actual values. For this category of causal-ANN,
the complex prediction model can be divided into multiple sub-networks of lower
complexity, which may lead to higher accuracy of each sub-network.
Figure 5-3: Two separate sub-networks.
The last category of causal-ANN is when both the values of the intermediate variables
and cheap models are available in the problem. Thus, the entire causal-ANN can be
divided into multiple sub-networks and some of them can be replaced by cheap
models, as shown in Figure 5-4. The number of sub-networks to be trained is then
reduced.
Figure 5-4: Causal-ANN with known intermediate variables and cheap models.
The purpose of the causal-ANN construction method is to employ knowledge in
building more accurate metamodels. The structure of the ANN is determined
according to the causal relations and the problem is divided into several sub-ANNs. Then,
the sub-ANNs are trained based on values of the intermediate variables. The main
advantage of the causal-ANN is the reduced complexity of each ANN. It is often difficult
to train an ANN to approximate large-scale nonlinear problems. Thus, by dividing the
entire network into several sub-networks, the complexity of each network is reduced and
the accuracy of the entire model can be improved. Furthermore, by generating sample
points from the neural network, Bayesian probability inference can be performed at a
lower computational cost than on the actual simulation.
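The chained sub-model idea for the Figure 5-1 graph can be sketched as follows: two separate sub-models [A, B] → C and [C, D] → E, where the recorded intermediate values of C (a simulation by-product) serve as training inputs for the second sub-model. Linear least-squares models stand in for the sub-networks, and the toy relations C = 2A + B and E = C − 3D are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, D = rng.uniform(0, 1, (3, 50))   # design variables from 50 "simulations"
C = 2 * A + B                          # intermediate variable, logged per run
E = C - 3 * D                          # final output

def fit_linear(X, y):
    """Fit y = X w + b by least squares; return a predictor function."""
    Xb = np.column_stack([X, np.ones(len(y))])       # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xnew: np.column_stack([Xnew, np.ones(len(Xnew))]) @ w

sub1 = fit_linear(np.column_stack([A, B]), C)        # sub-model [A, B] -> C
sub2 = fit_linear(np.column_stack([C, D]), E)        # trained on ACTUAL C values

# Prediction chains the two sub-models: C is first estimated from (A, B).
a, b, d = 0.3, 0.5, 0.2
c_hat = sub1(np.array([[a, b]]))
e_hat = sub2(np.column_stack([c_hat, [d]]))
print(float(e_hat[0]))  # close to (2*0.3 + 0.5) - 3*0.2 = 0.5
```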
5.1.2. Attractive sub-space identification method
The distribution of the obtained objective values is one kind of important information for
guiding sampling and performing optimization. In the Mode-Pursuing Sampling (MPS)
method [6], a large number of cheap samples are generated by evaluating the
metamodel, and the distribution of the objective values is estimated through those cheap
samples and their responses. Then, new sample points are generated following the
distribution of the objective values to balance exploration and exploitation.
Bayesian network is one kind of belief graphic modeling method that gives the joint
distribution of each variable. By constructing a Bayesian network of the engineering
design problem, the distribution of the objective 𝑝(𝑓|𝒙, 𝐷, 𝐺) can be found, where f is the
objective, 𝒙 is the design variables, D is the data and G is the graph structure. After
obtaining the distribution of the objective, the likelihood of the objective 𝑝(𝒙|𝑓, 𝐷, 𝐺) can
be calculated via Bayes' theorem as shown in the following equation:

𝑝(𝒙|𝑓, 𝐷, 𝐺) = 𝑝(𝑓|𝒙, 𝐷, 𝐺) 𝑝(𝒙) / 𝑝(𝑓)          (5-1)
where 𝑝(𝒙) and 𝑝(𝑓) are the distributions of the design variables and the objective,
respectively, which can be estimated by analyzing the sample data. In general, for
a given engineering design problem, the designers or decision makers often have an
expected objective value or range. The likelihood of the objective gives information
about which area (or range) of the design variables has a higher probability of generating
the expected designs. Details of the method are as follows.
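On discretized samples, Eq. (5-1) reduces to counting: the likelihood p(x | f in expected range) is the fraction of samples in each interval of x among those whose objective falls in the range. The 1-D toy model f = (x − 0.7)² below is an assumption standing in for the causal-ANN predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 20000)      # cheap samples of the design variable
f = (x - 0.7) ** 2                # toy stand-in for the metamodel response

n = 10                                            # intervals per variable
x_bin = np.minimum((x * n).astype(int), n - 1)    # interval index 0 .. n-1
good = f < 0.01                                   # expected objective range

# p(x = a | f in range) = N(x = a and f in range) / N(f in range)
likelihood = np.array([(good & (x_bin == a)).sum() for a in range(n)]) / good.sum()
print(likelihood.round(2))  # mass concentrates in the bins around x = 0.7
```

The two intervals around x = 0.7 carry essentially all the likelihood mass, identifying them as the attractive sub-space.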
Step 1. Generate sample points. Sample points are generated following the uniform
distribution (or other variable distributions if known). The causal-ANN model is evaluated
to calculate the responses of the sample points. Note that the responses include the
objectives and also the intermediate variables.
Step 2. Discretize all the variables and the objective. Most BNs deal only with discrete
variables, while the variables in design problems are usually continuous. One way to
deal with this is to discretize the variables and the objective.
At the beginning, all the variables including inputs, intermediate variables, and outputs
are assumed to follow uniform distribution. Then, the range of each variable is divided
into n intervals with certain indices, as shown in Figure 5-5.
[Diagram not reproduced: the range of 𝑥𝑖 from 𝑙𝑏 to 𝑢𝑏 is divided into 𝑛 intervals indexed 1 to 𝑛.]
Figure 5-5: Variable discretization.
𝑙𝑏 and 𝑢𝑏 are the lower and upper bounds of the variable, respectively. If a sample falls
between 𝑚(𝑢𝑏 − 𝑙𝑏)/𝑛 + 𝑙𝑏 and (𝑚 + 1)(𝑢𝑏 − 𝑙𝑏)/𝑛 + 𝑙𝑏, 𝑚 = 0, … , 𝑛 − 1, the index of
the sample is 𝑚 + 1. Note that when a variable does not have fixed lower and upper
bounds, a rough bound can be determined and two additional sections are added, one
below the lower bound and one above the upper bound, as shown in Figure 5-6.
[Diagram not reproduced: two extra intervals with indices 0 and 𝑛 + 1 are added below 𝑙𝑏 and above 𝑢𝑏.]
Figure 5-6: Discretization for a variable without fixed bounds.
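The interval indexing of Step 2, including the two extra sections of Figure 5-6 for variables without fixed bounds, can be sketched as follows. This is an illustrative helper, not code from the thesis:

```python
def discretize(x, lb, ub, n):
    """Map a sample value x to an interval index as in Step 2.

    Indices 1..n cover [lb, ub]; for variables without fixed bounds,
    values below lb map to 0 and values above ub map to n + 1
    (the two extra sections of Figure 5-6).
    """
    if x < lb:
        return 0
    if x > ub:
        return n + 1
    if x == ub:  # put the upper bound itself in the last regular interval
        return n
    m = int(n * (x - lb) / (ub - lb))  # m = 0, ..., n - 1
    return m + 1
```

For a bounded variable the extra indices 0 and n + 1 are simply never produced.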
Step 3. Calculate the joint probability of the objective, 𝑝(𝑓|𝒙, 𝐷, 𝐺). An approximate
inference method is employed to generate the conditional distribution of each variable,
𝑝(𝑥𝑖|𝑃𝑥𝑖, 𝐷, 𝐺), where 𝑥𝑖 is an intermediate variable and 𝑃𝑥𝑖 denotes the parents of 𝑥𝑖.
The conditional distribution can be calculated as

𝑝(𝑥𝑖 = 𝑎|𝑃𝑥𝑖 = 𝑏, 𝐷, 𝐺) = 𝑁𝑥𝑖=𝑎,𝑃𝑥𝑖=𝑏 / 𝑁𝑃𝑥𝑖=𝑏 (5-2)
where 𝑁𝑃𝑥𝑖=𝑏 is the number of samples with 𝑃𝑥𝑖 = 𝑏, and 𝑁𝑥𝑖=𝑎,𝑃𝑥𝑖=𝑏 is the number of
samples with both 𝑥𝑖 = 𝑎 and 𝑃𝑥𝑖 = 𝑏. Because the design variables are generated
following the uniform distribution, the prior probability of each design variable can be
calculated as 𝑝(𝑥 = 𝑎) = 1/𝑛. Then, the joint probability of the objective can be
calculated as follows
𝑝(𝑓 = 𝑎|𝑥, 𝐷, 𝐺) = ∑𝑖1=1…𝑛1 ⋯ ∑𝑖𝑘=1…𝑛𝑘 ∑𝑖𝑥=1…𝑛𝑥 (𝑝(𝑓 = 𝑎|𝑃𝑥𝑖1)𝑝(𝑃𝑥𝑖1|𝑃𝑥𝑖2) ⋯ 𝑝(𝑃𝑥𝑖𝑘|𝑥)𝑝(𝑥))
= ∑𝑖1=1…𝑛1 (𝑝(𝑓 = 𝑎|𝑃𝑥𝑖1) ⋯ ∑𝑖𝑘=1…𝑛𝑘 𝑝(𝑃𝑥𝑖𝑘|𝑥) ∑𝑖𝑥=1…𝑛𝑥 𝑝(𝑥)) (5-3)
where 𝑛𝑘 is the number of discrete intervals of each parent variable (i.e., intermediate
variable), and 𝑛𝑥 is the number of discrete intervals of the design variables. By counting
the data and analyzing the Bayesian network, the joint probability of the objective can
be estimated.
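The counting estimator of Eq. (5-2) amounts to tabulating interval indices. A minimal sketch over discretized samples follows; `conditional_table` and its column-index arguments are illustrative helpers, not code from the thesis:

```python
from collections import Counter

def conditional_table(child_idx, parent_idx, samples):
    """Estimate p(child = a | parent = b) by counting, as in Eq. (5-2).

    `samples` is a sequence of tuples of interval indices (one tuple per
    sample point); child_idx and parent_idx select the two columns.
    """
    joint = Counter((s[child_idx], s[parent_idx]) for s in samples)
    marg = Counter(s[parent_idx] for s in samples)
    return {(a, b): n / marg[b] for (a, b), n in joint.items()}
```

Chaining such tables along the graph, as in Eq. (5-3), then yields the joint probability of the objective.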
Step 4. Estimate the likelihood of the design variables and find the interesting area of
each variable. The likelihood is estimated according to the Bayesian theorem. 𝑝(𝑓) is
estimated through the function, 𝑝(𝑓 = 𝑎) =𝑁𝑓=𝑎
𝑁, where 𝑁 is the number of samples and
𝑁𝑓=𝑎 is the number of samples where the objective value falling in the section 𝑎. Finally,
the likelihood of the design variable is estimated via (5-3). The interval with the largest
likelihood of the design variables is selected as the interesting area.
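Assuming the per-interval probabilities 𝑝(𝑓 = 𝑎|𝑥 = 𝑏) (e.g., from Eq. (5-3)), the prior 𝑝(𝑥 = 𝑏), and 𝑝(𝑓 = 𝑎) have already been estimated, Step 4 reduces to Bayes' theorem plus an argmax. The function name and dictionary layout below are illustrative:

```python
def interesting_interval(p_f_given_x, p_x, p_f):
    """Apply Eq. (5-1) per interval b of one design variable for a fixed
    objective interval a, then return the interval with the largest
    likelihood (the "interesting area") and the full likelihood map.
    p_f_given_x[b] = p(f = a | x = b), p_x[b] = p(x = b), p_f = p(f = a).
    """
    likelihood = {b: p_f_given_x[b] * p_x[b] / p_f for b in p_f_given_x}
    best = max(likelihood, key=likelihood.get)
    return best, likelihood
```

Running this once per design variable and collecting the winning intervals forms the interesting design sub-space.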
Note that when one variable has multiple parents, the correlations among those parents
should be considered. However, estimating the joint distribution with these correlations
requires a huge number of samples to cover all possible combinations of the parents.
One workaround is to assume that the probability distribution given each parent is
independent. For example, if A and B are the parents of C, the distributions 𝑝(𝐶|𝐴) and
𝑝(𝐶|𝐵) are calculated independently. However, ignoring the correlations between
parents may lead to wrong likelihood estimates when those correlations are strong.
Therefore, a method named "Noisy-or" is employed to estimate the probability
distribution. In the Noisy-or method, the joint distribution given multiple parents is
calculated as

𝑃(𝑓 = 𝑎|𝑥1, 𝑥2, … , 𝑥𝑛) = 1 − ∏𝑖=1…𝑛 𝑃(𝑓 ≠ 𝑎|𝑥𝑖) (5-4)

With the Noisy-or method, the joint distribution accounting for correlation can be
estimated from the single-parent distributions, which reduces the number of required
samples significantly.
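Eq. (5-4) reduces to a one-line product over the single-parent estimates; a minimal sketch:

```python
def noisy_or(p_not_a_given_parent):
    """Eq. (5-4): combine single-parent estimates P(f != a | x_i) into
    P(f = a | x_1, ..., x_n) under the Noisy-or assumption."""
    prod = 1.0
    for p in p_not_a_given_parent:
        prod *= p
    return 1.0 - prod
```

Only n single-parent tables are needed instead of one table over all parent combinations, which is the source of the sample savings.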
By comparing the likelihoods of the intervals, the interesting sub-space can be
determined. However, the number of samples needed for likelihood estimation is usually
very large. In this work, I use the causal-ANN to generate the samples, and thus the
computational cost of attractive sub-space identification is negligible.
5.2. Case studies

5.2.1. Power converter design problem

The power converter design problem [152], [153] used in Chapter 4 is employed to test
the performance of the proposed method. The design problem has six design variables,
as shown in Table 5-1. The objective of the problem is to minimize the weight of the
power converter.
Table 5-1: Design variables in power converter design.
Variable   Name          Description                   Lower Bound   Upper Bound
𝑥1         𝐶𝑤            Core center leg width (m)     0.001         0.1
𝑥2         𝑇𝑢𝑟𝑛𝑠         Inductor turns                1.0           10
𝑥3         𝐴𝑐𝑝           Copper size (m2)              7.29e-8       1.0e-5
𝑥4         𝐿𝑓/𝑃𝐼𝑁𝐷𝑈𝐶     Inductance (H)                1.0e-6        1.0e-5
𝑥5         𝐶𝑓            Capacitance (F)               1.0e-5        0.01
𝑥6         𝑤𝑤            Core window width (m)         0.001         0.01
min 𝑦1 = 𝑊𝑐 + 𝑊𝑤 + 𝑊𝑐𝑎𝑝 + 𝑊ℎ𝑠 (5-5)
[Graph not reproduced: the six design variables x1–x6 connect through 21 intermediate variables (e.g., DELI, XMLT, ZP1, CIRMS, XIRMS, XIMIN, XIP, ESR, PQ, PD, POF, y2, y3, y5–y8, Wc, Ww, Wcap, Whs) to the objective y1, with +1/−1 edge polarities.]
Figure 5-7: Causal graph of the power converter problem.
The problem is mainly dominated by the coupling between 𝑦2 and 𝑦3, where 𝑦2 is the
circuit efficiency and 𝑦3 is the duty cycle. It is to be noted that this problem comes with
mathematical expressions, which are used to build the causal graph shown in Figure
5-7. In most engineering problems, equations are not available, and designers should
instead use their knowledge to construct a causal graph. The six variables on the left
side are the design variables and the one on the right side is the objective. There are 21
intermediate variables in the problem, listed between the design variables and the
objective in Figure 5-7. As shown in the figure, 𝑦2 is influenced by 𝑦3 through different
routes and 𝑦2 influences 𝑦3 directly. All the design variables are involved in these loops
through different links and finally influence the objective.
Constructing causal-ANN
The causal graph can be simplified to generate a high-level causal graph. Since the
objective of the problem is to minimize the total mass of the power converter, the
masses of the four components, i.e., 𝑊𝑐, 𝑊𝑤, 𝑊𝑐𝑎𝑝, and 𝑊ℎ𝑠, can be outputs from the
simulation. Additionally, the circuit efficiency 𝑦2, as one of the coupled variables, can
also be an output from the simulation. The simplified causal graph is shown in Figure 5-8.
[Graph not reproduced: the design variables x1–x6 feed y2, Wc, Ww, and Wcap; y2 feeds Whs; and the component masses feed the objective y1.]
Figure 5-8: Simplified causal graph of the power converter design problem.
According to the simplified causal graph, the network is divided into six sub-networks, as
shown in Figure 5-9. Note that the objective is the sum of the component masses. Thus,
the third category of causal-ANN can be constructed, which means constructing ANNs
for the first to the fifth sub-networks and using Eq. (5-5) for the sixth. For the fourth ANN,
the actual values of 𝑦2 are used as its input. Each sub-network has two hidden layers
with four hidden neurons per layer. The activation function is the tangent sigmoid
function.
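The thesis trains each sub-network with the Matlab neural network toolbox. As a language-neutral illustration, the forward pass of such a sub-network (two tanh hidden layers of four neurons, linear output) can be sketched in Python with NumPy; the random weights stand in for trained ones, and the input shape (two inputs, as in a hypothetical (x1, x6) → Wc sub-network) is only an example:

```python
import numpy as np

def subnet_forward(x, weights, biases):
    """Forward pass of one causal-ANN sub-network: hidden layers use the
    tangent sigmoid (tanh), and the output layer is linear."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]

# Two hidden layers of four neurons each, two inputs, one output.
rng = np.random.default_rng(0)
shapes = [(2, 4), (4, 4), (4, 1)]
weights = [rng.normal(size=s) for s in shapes]
biases = [np.zeros(s[1]) for s in shapes]
y = subnet_forward(np.array([[0.05, 0.005]]), weights, biases)
```

In the full causal-ANN, the output of an upstream sub-network (here, e.g., y2) would be fed as an input to the downstream one.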
[Sub-networks not reproduced: (x1, x6) → Wc; (x1, x2, x3) → Ww; (x1, …, x6) → y2; y2 → Whs; x5 → Wcap; (Wc, Ww, Whs, Wcap) → y1.]
Figure 5-9: Six sub-networks for the power converter design problem.
In this case, 100, 200, and 500 sample points are generated by Latin hypercube design
to train the causal-ANN model. The Matlab neural network toolbox is employed to
construct the ANNs. To test the accuracy, another 2,000 samples are generated, and
the 𝑅2 value and mean absolute error (MAE) defined in Eqs. (5-6) and (5-7) are
calculated, where 𝑓𝑖 is the actual output value, 𝑓̂𝑖 is the predicted value, 𝑓̅ is the average
of the actual outputs, and 𝑁 is the number of test samples. Additionally, an RBF model
and an ANN with two hidden layers and four hidden neurons per layer are constructed
on the same training sample sets. Their 𝑅2 values and MAE are compared with those of
the causal-ANN in Table 5-2.

𝑅2 = 1 − ∑𝑖(𝑓𝑖 − 𝑓̂𝑖)2 / ∑𝑖(𝑓𝑖 − 𝑓̅)2 (5-6)

𝑀𝐴𝐸 = (1/𝑁) ∑𝑖=1…𝑁 |𝑓𝑖 − 𝑓̂𝑖| (5-7)
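Eqs. (5-6) and (5-7) can be computed directly from the test predictions; a small NumPy sketch (the function name is illustrative):

```python
import numpy as np

def r2_mae(f_true, f_pred):
    """R^2 (Eq. 5-6) and mean absolute error (Eq. 5-7) over test samples."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    ss_res = np.sum((f_true - f_pred) ** 2)
    ss_tot = np.sum((f_true - f_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.mean(np.abs(f_true - f_pred)))
```

An 𝑅2 close to 1 and a small MAE both indicate an accurate metamodel; 𝑅2 can be negative when the prediction is worse than the mean of the data, which is how the failed causal-ANNs in Table 5-16 show up.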
Table 5-2: Comparison of accuracy among three metamodels.
# of samples Criteria Causal-ANN ANN RBF
100 𝑅2 0.634 0.217 0.372
MAE 33.3 56.1 69.3
200 𝑅2 0.949 0.718 0.410
MAE 10.6 30.2 67.7
500 𝑅2 0.965 0.878 0.484
MAE 8.8 13.1 61.5
As shown in
Table 5-2, the 𝑅2 value of the causal-ANN is the highest among the three metamodeling
methods, and the causal-ANN also has the smallest MAE, which shows that it is the
most accurate metamodel. The lower 𝑅2 values of the ANN and RBF are caused by the
high non-linearity of the problem, especially for the ANN. Additionally, the accuracy of all
three metamodels increases with the number of training samples. Notably, the causal-
ANN with 200 training samples is more accurate than the ANN and RBF with 500
samples. To further illustrate the performance of the causal-ANN, the 𝑅2 value and MAE
of each sub-network are shown in Table 5-3. All sub-networks are accurate. Note that
the third sub-network maps all six design variables to 𝑦2 and thus has the same number
of inputs as the entire design problem; nevertheless, its accuracy is much higher than
that of the whole-model metamodels. By dividing the entire network into sub-networks to
reduce the complexity of each one, the accuracy of each sub-network can be improved.
Table 5-3: Accuracy of each sub-network.
# of samples   Criteria   𝑦2      𝑊𝑐      𝑊𝑤      𝑊ℎ𝑠     𝑊𝑐𝑎𝑝
100            𝑅2         0.770   0.999   0.997   0.634   0.980
               MAE        0.023   1e-4    3.824   33.2    7e-05
200            𝑅2         0.883   0.999   0.988   0.916   0.988
               MAE        0.015   9e-05   8e-05   10.6    6e-05
500            𝑅2         0.988   0.999   0.971   0.919   0.990
               MAE        0.005   8e-05   1e-4    8.84    5e-05
Attractive sub-space identification
After constructing the causal-ANN, the probability distribution of the objective values and
the likelihood of the design variables can be estimated on samples generated from the
causal-ANN. In this test, 200 samples are used to train the causal-ANN. First, the design
variables, the intermediate variables, and the objective are discretized. For this case, the
upper and lower bounds are used to determine the intervals of the design variables,
while for the intermediate variables and the objective, the minima and maxima are used
to determine the interval boundaries. All the variables and the objective are divided into
five intervals based on their own bounds.
The objective of the power converter problem is to minimize the mass, so a smaller
objective value is desired. Therefore, the first interval of the objective, i.e., 𝑦 = 1, is
selected, and the conditional probability 𝑃(𝑦 = 1|𝒙) and likelihood 𝑃(𝒙|𝑦 = 1) are
estimated. Considering the correlations among the six design variables, the Noisy-or
method is employed, and the probability distribution of each design variable,
𝑃(𝑦 ≠ 1|𝑥𝑖), 𝑖 = 1, 2, … , 6, is calculated. To estimate the probability distribution and the
likelihood, 10,000 samples are generated from both the actual model and the causal-
ANN. The probability distributions estimated from the actual model and the prediction
model, 𝑃(𝑦 ≠ 1|𝑥𝑖) and 𝑃𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑜𝑛(𝑦 ≠ 1|𝑥𝑖), are shown in Table 5-4 and Table 5-5,
where 𝑥𝑖 = 1 means the sample falls in the first interval of 𝑥𝑖.
Table 5-4: Probability distribution 𝑷(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 on the actual model.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0       0.0005  0.038   0.004   0.007   0.007
𝑥𝑖 = 2   0.002   0.002   0       0.0095  0.009   0.006
𝑥𝑖 = 3   0.0055  0.0075  0       0.009   0.012   0.009
𝑥𝑖 = 4   0.0105  0.014   0       0.011   0.006   0.007
𝑥𝑖 = 5   0.02    0.014   0       0.0045  0.004   0.009
Table 5-5: Probability distribution 𝑷𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 on the causal-ANN.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0       0.0005  0.038   0.004   0.007   0.007
𝑥𝑖 = 2   0.002   0.002   0       0.0095  0.009   0.006
𝑥𝑖 = 3   0.0055  0.0075  0       0.009   0.012   0.009
𝑥𝑖 = 4   0.0105  0.014   0       0.011   0.006   0.007
𝑥𝑖 = 5   0.02    0.014   0       0.0045  0.004   0.009
As shown in both tables, the probability distribution estimated from the prediction model
is the same as that calculated from the actual model, which means the causal-ANN can
estimate the distribution accurately. However, in some cases the probability is equal to
zero, for example when 𝑥3 = 2, meaning that whenever the third coordinate of a sample
falls in the second interval, the objective value falls in its first interval. This is caused by
the distribution of the objective values: with the upper bound of the objective set at the
maximum value, over 95% of the objective values fall in the first interval. Such an ill-
defined boundary of the objective may render the likelihood estimation useless, because
the likelihood of some intervals may reach 100% according to Eq. (5-1). Therefore, the
upper bound of the objective should be reduced to avoid zeros in the probability
distribution. In this case, 11 is selected as the upper bound according to the distribution
of the objective values, and the objective is then discretized into six intervals. The first
interval of the objective is still the desired space. The probability distributions estimated
on the actual model and the causal-ANN are shown in Table 5-6 and Table 5-7.
Table 5-6: Probability distribution 𝑷(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 with new upper bound.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0.0505  0.5775  0.7915  0.6455  0.6290  0.6035
𝑥𝑖 = 2   0.2245  0.6075  0.6315  0.6470  0.6345  0.6160
𝑥𝑖 = 3   0.9270  0.6580  0.6215  0.6360  0.6295  0.6570
𝑥𝑖 = 4   1       0.6695  0.5780  0.6425  0.6455  0.6435
𝑥𝑖 = 5   1       0.6895  0.5795  0.6310  0.6635  0.6820
Table 5-7: Probability distribution 𝑷𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏(𝒚 ≠ 𝟏|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟔 with new upper bound.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6
𝑥𝑖 = 1   0.0495  0.5820  0.7915  0.6445  0.6290  0.6035
𝑥𝑖 = 2   0.2215  0.6060  0.6295  0.6465  0.6355  0.6145
𝑥𝑖 = 3   0.9265  0.6540  0.6200  0.6340  0.6290  0.6530
𝑥𝑖 = 4   1       0.6670  0.5765  0.6420  0.6420  0.6435
𝑥𝑖 = 5   1       0.6885  0.5800  0.6305  0.6620  0.6830
As shown in Table 5-6 and Table 5-7, the probability distributions estimated from the
prediction model are close to those estimated from the actual model. Note that only 200
expensive points are used to construct the causal-ANN, and the probability estimation is
performed on the causal-ANN, whose cost is negligible.
Then, by employing the Noisy-or method and Bayes' theorem, the interval of each
design variable with the largest likelihood can be determined, as shown in Table 5-8. In
the table, the number for each design variable represents its interval. As in the
comparison above, the likelihood is estimated on both the actual and predicted models.
Additionally, the interval in which the optimum is located is also listed in the table. The
interesting interval generated from the prediction model is the same as that from the
actual model, and it matches the interval where the actual optimum is located, except for
𝑥2. This is because the second design variable of the optimum point lies near the
boundary of the first and second intervals, and the likelihood distribution cannot capture
it accurately.
Table 5-8: Interesting interval with the largest likelihood.
                   𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6
Actual model       1    1    5    5    1    1
Predicted model    1    1    5    5    1    1
Optimal solution   1    2    5    5    1    1
5.2.2. Aircraft concept design problem
The aircraft concept design problem [66] is also used to test the performance of the
proposed method. There are nine design variables (listed in Table 5-9) and three
coupled disciplines (structure, aerodynamics, and propulsion). The objective of the
problem is to maximize the range computed by the Breguet equation. The causal graph
is shown in Figure 5-10.
[Graph not reproduced: the nine design variables connect through intermediate variables such as WT, WF, D, SFC, L/D, CL, CD, and ESF to the range R, with +1/−1 edge polarities.]
Figure 5-10: Causal graph of the aircraft concept design problem.
Table 5-9: Design variables in aircraft concept design.
Variable   Symbol   Description                     Lower Bound   Upper Bound
1          𝑀        Mach number                     1.4           1.8
2          𝑇        Throttle setting                0.1           1.0
3          𝑆𝑅𝐸𝐹     Wing surface area (ft2)         500           1500
4          𝐴𝑅       Aspect ratio                    2.5           8.5
5          𝑡/𝑐      Thickness/chord ratio           0.01          0.09
6          𝜆        Wing taper ratio                0.1           0.4
7          Λ        Wing sweep (°)                  40            70
8          𝑥        Wingbox x-section area (ft2)    0.9           1.25
9          𝐶𝑓       Skin friction coefficient       0.75          1.25
To simplify the causal graph, the two coupled variables, the total weight of the aircraft
𝑊𝑇 and the drag 𝐷, are selected as intermediate variables. Also, the weight of the fuel
(𝑊𝐹) from the structural discipline and the specific fuel consumption (𝑆𝐹𝐶) from the
propulsion discipline are selected as the other two intermediate variables. The simplified
causal graph is shown in Figure 5-11.
[Graph not reproduced: the nine design variables feed the intermediate variables WT, SFC, WF, and D, which feed the range R.]
Figure 5-11: Simplified causal graph for aircraft concept design.
[Sub-networks not reproduced: (M, x, λ, AR, Sref, Cf, t/c, Λ, T) → WT; (M, T) → SFC; (M, x, λ, AR, Sref, Cf, t/c, Λ, T) → D; (AR, Sref, t/c) → WF; (WT, SFC, WF, D) → R.]
Figure 5-12: Sub-networks for aircraft concept design.
The simplified causal graph can be divided into five sub-networks, as shown in Figure
5-12. Because the actual values of the intermediate variables can be obtained from one
simulation run and no simple equations exist, this problem belongs to the second
category of causal-ANN. ANNs with two hidden layers and four hidden neurons per layer
are constructed for the five sub-networks. The activation function is the tangent sigmoid
function. Note that for the fifth network, the actual values of 𝑊𝑇, 𝑊𝐹, 𝑆𝐹𝐶, and 𝐷 are
used as its inputs. 100, 200, and 500 training samples are generated from the simulation
model. The Matlab toolbox is employed to train the neural networks. 2,000 testing points
are generated, and the 𝑅2 value and MAE are calculated to illustrate the estimation error
of the causal-ANN. Additionally, the accuracies of an RBF model and an ANN built on
the entire problem are calculated for comparison, as shown in Table 5-10. The causal-
ANN is more accurate than the ANN and RBF when the number of samples is 100 or
200, and comparable with them at 500 samples. In this case, the ANN and RBF are also
accurate, given their high 𝑅2 values. As the number of training samples increases, the
accuracy of the causal-ANN increases. Table 5-11 gives the 𝑅2 value and MAE of each
sub-network. The sub-network between all design variables and 𝑊𝑇 has the lowest
accuracy, which brings down the overall accuracy of the causal-ANN. The reason for
this lower accuracy is that the coupling among the three disciplines is involved in this
sub-network, which increases the complexity of the sub-problem.
Table 5-10: Comparison of accuracy among three metamodels.
# of samples   Criteria   Causal-ANN   ANN     RBF
100            𝑅2         0.905        0.797   0.902
               MAE        74.6         111.5   78.6
200            𝑅2         0.968        0.943   0.940
               MAE        41.7         49.7    52.7
500            𝑅2         0.980        0.990   0.987
               MAE        35.4         23.1    25.7
Table 5-11: Accuracy of each sub-network.
# of samples   Criteria   𝑊𝑇       𝑊𝐹      𝑆𝐹𝐶     𝐷
100            𝑅2         0.743    0.988   0.997   0.956
               MAE        6128.9   252.6   0.015   326.0
200            𝑅2         0.906    0.987   0.983   0.980
               MAE        3217.3   253.3   0.009   103.7
500            𝑅2         0.931    0.993   0.999   0.997
               MAE        1696.8   227.7   0.009   67.4
Once the causal-ANN is constructed with 200 training samples, the likelihood is
estimated based on samples generated from the neural network. To illustrate the
performance of the likelihood estimation on the neural network, 10,000 testing samples
are generated on the actual model and the causal-ANN. The design variables,
intermediate variables, and the objective are discretized into five intervals. As the
objective is to maximize the range, the fifth interval of the objective is desired. To
estimate the likelihood through the Noisy-or method, the probability distribution
𝑃(𝑦 ≠ 5|𝑥𝑖), 𝑖 = 1, 2, … , 9, is estimated on the actual model and the causal-ANN, as
shown in Table 5-12 and Table 5-13. The probability distribution estimated from the
causal-ANN is similar to that from the actual model. Then, the likelihood is calculated via
Bayes' theorem, and the interval with the largest likelihood is listed in Table 5-14,
together with the interval in which the optimal solution is located. The interesting interval
generated from the causal-ANN is the same as that obtained from the actual model, and
it is exactly where the optimal solution is located. Therefore, by employing the causal-
ANN and the likelihood estimation method, interesting design subspaces of the problem
can be detected with few expensive function evaluations.
Table 5-12: Probability distribution 𝑷(𝒚 ≠ 𝟓|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟗 on the actual model.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6      𝑥7      𝑥8      𝑥9
𝑥𝑖 = 1   1       0.9955  0.9950  1       1       0.9965  0.9990  0.9985  0.9975
𝑥𝑖 = 2   1       0.9980  0.9995  1       0.9990  1       0.9990  0.9985  0.9960
𝑥𝑖 = 3   0.9990  1       0.9990  1       0.9995  0.9990  0.9965  0.9980  1
𝑥𝑖 = 4   0.9970  1       1       0.9995  0.9985  0.9995  0.999   0.9985  1
𝑥𝑖 = 5   0.9975  1       1       0.9940  0.9965  0.9985  1       1       1
Table 5-13: Probability distribution 𝑷𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒊𝒐𝒏(𝒚 ≠ 𝟓|𝒙𝒊), 𝒊 = 𝟏, 𝟐, … , 𝟗 on the causal-ANN.
         𝑥1      𝑥2      𝑥3      𝑥4      𝑥5      𝑥6      𝑥7      𝑥8      𝑥9
𝑥𝑖 = 1   1       0.9875  0.9830  1       0.9965  0.9905  0.9920  0.9940  0.9900
𝑥𝑖 = 2   1       0.9930  0.9945  1       0.9955  0.9975  0.9950  0.9955  0.9875
𝑥𝑖 = 3   0.9985  0.9965  0.9980  1       0.9985  0.9960  0.9940  0.9940  0.9975
𝑥𝑖 = 4   0.9880  0.9990  0.9990  0.9995  0.9950  0.9950  0.9980  0.9940  1
𝑥𝑖 = 5   0.9885  0.9990  1       0.9755  0.9895  0.9960  0.9960  0.9975  1
Table 5-14: Interesting interval with the largest likelihood.
                   𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6   𝑥7   𝑥8   𝑥9
Actual model       5    1    1    5    5    1    1    1    2
Predicted model    5    1    1    5    5    1    1    1    2
Optimal solution   5    1    1    5    5    1    1    1    2
5.2.3. Discussion
Generation of high-level causal graph
The causal relations are employed as the primary knowledge in the causal-ANN.
However, it is hard to generate an accurate and complete causal graph. In this thesis,
only a high-level causal graph including key intermediate variables is needed to
represent the cause-effect relations in the design problem. As described in Section 5.1,
finding the important intermediate variables is the key step in constructing a high-level
causal graph. One criterion for selecting intermediate variables is whether the variable
value can be calculated or is an output from the simulation; such intermediate variables
can be called by-product variables. In general, the coupling variables, the outputs of
each discipline, and the by-product variables can be selected as key intermediate
variables in the high-level causal graph. Another suggestion is to simplify the structure of
an existing causal graph. Involving many variables in the causal graph may cause
difficulty in constructing the causal-ANN; thus, a causal graph with at most two
intermediate layers is recommended. Additionally, for problems with coupling loops, one
variable in each coupling relation is selected to avoid loops in the causal graph, since
BNs cannot deal with coupling well. Finally, the intermediate variables that have a direct
and prominent impact on the objective are usually selected as key variables.
In this chapter, complete causal graphs exist for the two case study problems. Thus, the
high-level causal relations can be generated by simplifying the causal graphs. For the
power converter problem, since the objective is to minimize the total weight of the
converter, the weight of each component can be selected as a key intermediate
variable. Also, one of the coupling variables, the circuit efficiency (𝑦2), is kept in the
high-level causal graph. For the aircraft design problem, the total weight of the aircraft
(𝑊𝑇), the drag (𝐷), the weight of the fuel (𝑊𝐹) from the structural discipline, and the
specific fuel consumption (𝑆𝐹𝐶) from the propulsion discipline, which directly influence
the final objective, i.e., the range, are picked as the key variables. Additionally, 𝑊𝑇 and
𝐷 are the coupled variables, while 𝑊𝐹 and 𝑆𝐹𝐶 are outputs from the structure and
propulsion disciplines. If a complete causal graph does not exist, high-level knowledge
about the design problem can be utilized to generate the causal relations in the causal-
ANN.
Fault tolerance studies on causal relations
Even though only high-level causal relations are required to construct a causal-ANN,
there might be errors in defining these causal relations, which may influence the
accuracy of the causal-ANN. Thus, the impact of faulty causal relations on the accuracy
of the causal-ANN is discussed in this section.
First, the influence of the number of layers in the causal relations is discussed. Figure
5-8 illustrates a high-level causal graph including two intermediate layers. As shown in
Figure 5-13, one intermediate variable, 𝑦2, is removed from the causal graph to reduce
the number of intermediate layers to one. Compared with the original causal-ANN, the
sub-network [𝑥1, … , 𝑥6] − 𝑦2 − 𝑊ℎ𝑠 is replaced by direct links from the design variables
to 𝑊ℎ𝑠. Thus, the total number of sub-networks to be trained is four. The causal-ANN
with one intermediate layer is trained with 200 samples, and the 𝑅2 values of the
objective and the intermediate variables are calculated on 2,000 testing samples, as
shown in Table 5-15. Note that the same training and test samples as in Section 5.2.1
are used in this test and the following tests. Compared with the causal-ANN with 𝑦2, the
accuracy of the new causal-ANN decreases. Comparing the 𝑅2 values of 𝑊ℎ𝑠 in Table
5-3 and Table 5-15 shows that involving more intermediate variables in the complex
sub-networks can improve the prediction accuracy. On the other hand, compared with
the ANN and RBF models, the accuracy of the new causal-ANN is still better, which
means even a simple high-level causal graph can improve the accuracy of the prediction
model.
[Graph not reproduced: x1–x6 connect directly to Wc, Ww, Whs, and Wcap, which feed y1.]
Figure 5-13: Causal graph with one intermediate layer for power converter design.
Table 5-15: 𝑹𝟐 values of the objective and intermediate variables for the causal-ANN without 𝒚𝟐.
     𝑦1     𝑊𝑐     𝑊𝑤     𝑊ℎ𝑠    𝑊𝑐𝑎𝑝
𝑅2   0.886  1.000  0.969  0.805  1.000
Second, the influence of missing links is studied. Six causal graphs, each with one of the
links from [𝑥1, … , 𝑥6] to 𝑦2 missing, are employed to construct causal-ANNs, and the
accuracies of those causal-ANNs are calculated. The 𝑅2 values of the objective and the
intermediate variables 𝑦2 and 𝑊ℎ𝑠 are listed in Table 5-16. Missing links decrease the
accuracy of the causal-ANN model. If any of the links from [𝑥1, 𝑥2, 𝑥3] to 𝑦2 is removed,
the causal-ANN fails. In a causal-ANN with multiple layers, the accuracy of an upstream
sub-network has a large impact on the next sub-network, and errors accumulate through
the sub-networks. Thus, the low accuracy of 𝑦2 when removing the links from
[𝑥1, 𝑥2, 𝑥3] to 𝑦2 leads to a failed prediction of 𝑦1, indicated by the negative 𝑅2 values.
However, if any of the links from [𝑥4, 𝑥5, 𝑥6] to 𝑦2 is missing, the prediction accuracy
does not decrease much compared with the correct causal graph. Table 5-17 gives the
ANOVA results for [𝑥1, … , 𝑥6] with respect to 𝑦2, which show that [𝑥1, 𝑥2, 𝑥3] are
important variables while [𝑥4, 𝑥5, 𝑥6] are not. Therefore, missing the links of important
variables decreases the prediction accuracy significantly, while missing the links of
unimportant variables influences the accuracy only slightly. Additionally, another causal
graph with all the links from [𝑥4, 𝑥5, 𝑥6] to 𝑦2 removed is used to build a causal-ANN,
and its 𝑅2 values are shown in Table 5-16 as well. The results show that even with three
unimportant links missing from the causal graph, the accuracy of the causal-ANN is still
acceptable. In engineering design, the chance of missing less important variables is
much larger than that of missing important variables, and missing those less important
variables influences the accuracy of the causal-ANN only slightly. On the other hand, if
important variables are missing from the causal graph, the prediction of the causal-ANN
will be poor or unacceptable.
Table 5-16: Comparison of 𝑹𝟐 values when missing links in causal graphs.
Missing link(s) 𝑦1 𝑦2 𝑊ℎ𝑠
None 0.967 0.994 0.934
𝑥1 − 𝑦2 -60.280 0.0913 -122.265
𝑥2 − 𝑦2 -4.120 0.6768 -9.300
𝑥3 − 𝑦2 -62.245 -1.002 -126.222
𝑥4 − 𝑦2 0.922 0.992 0.844
𝑥5 − 𝑦2 0.949 0.992 0.897
𝑥6 − 𝑦2 0.931 0.993 0.861
[𝑥4, 𝑥5, 𝑥6] − 𝑦2 0.909 0.993 0.818
Table 5-17: ANOVA analysis results of [𝒙𝟏, … , 𝒙𝟔] to 𝒚𝟐
𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 𝑥6
Prob>F 0 0 0 0.111 0.165 0.565
Impact of variable correlations
In engineering problems, design variables usually correlate with each other, but
considering the correlations may lead to higher computational cost because a large
number of variable combinations must be covered. To reduce this cost, multiple parents
are usually assumed to be independent of each other in common probability inference.
In that case, each design variable is considered independently, the interval of each
design variable with the largest likelihood is determined separately, and those intervals
are put together to form the interesting design subspace [17]. However, ignoring variable
correlations may introduce extra errors in probability inference. Thus, the impact of
variable correlations is discussed in this section. To illustrate the difference between
considering variable correlations and not, the interval with the largest likelihood is
estimated under the independence assumption on the same 10,000 samples, and the
results are shown in Table 5-18 and Table 5-19 for the power converter and aircraft
design problems, respectively. Comparing Table 5-8 and Table 5-18 for the power
converter design problem shows that ignoring the variable correlations may lead to
completely wrong results. This can be explained as follows: for a highly nonlinear
problem, optimizing along each dimension separately cannot find the optimal solution.
When the design variables are highly correlated, their combined influence may dominate
the variance of the objective value. On the other hand, as shown in Table 5-19 for the
aircraft design problem, only the fifth design variable lands in a different interval
compared with the results considering correlations and the interval containing the
optimal solution in Table 5-14. In this case, the influence of the correlations is weaker
than in the power converter problem, so the interesting interval estimated independently
is near the actual one. Therefore, correlations between design variables should be
considered in probability inference.
Table 5-18: Interesting area detected with independence assumption in power converter design.
                  𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6
Actual model      1    2    5    5    1    1
Predicted model   3    5    2    4    4    2
Table 5-19: Interesting area detected with independence assumption in aircraft concept design.
                  𝑥1   𝑥2   𝑥3   𝑥4   𝑥5   𝑥6   𝑥7   𝑥8   𝑥9
Actual model      5    1    1    5    5    1    1    1    2
Predicted model   5    1    1    5    4    1    1    1    2
5.3. Summary
To improve metamodel accuracy, knowledge of the engineering design problem is
employed in building the metamodel. The cause-effect relations are combined with an
ANN to develop the causal-ANN model. The entire ANN is divided into several
sub-networks according to the causal graph, and the values of intermediate variables
are used in constructing the sub-networks. The causal-ANN is applied to two
engineering case studies, and the results show that its prediction accuracy exceeds
that of the ANN and RBF models. To further explore the applications of the causal-ANN,
a causal-ANN based attractive space identification method is developed. Likelihood
distributions of the design variables are estimated through Bayesian networks using
samples predicted by the causal-ANN. In both engineering cases, the proposed method
finds the attractive sub-spaces. Since the samples used in the distribution estimation
come from the causal-ANN, no expensive simulation is involved and the cost is low.
Additionally, the impacts of errors in the causal graph and of variable correlations are
discussed based on the test results. For the causal graph, involving intermediate
variables in the complex sub-networks improves the prediction accuracy. Missing less
important links does not affect the accuracy much, but missing important links causes
the prediction to fail. Variable correlations influence the likelihood estimation and
should be considered in attractive design space detection.
This method is next applied to a real-world problem sponsored by a local company,
which is the topic of the next chapter.
Chapter 6. Applying causal-ANN in energy consumption prediction
The Residential End-Use Stock and Flow (REUSF) model is developed to predict the
energy consumption according to the unit energy consumption of 19 end-uses
(appliances) and the market shares of different end-use technologies. This model can be
used by power companies and governments for planning and policy making purposes.
To improve the efficiency and accuracy of the market share prediction model, the
causal-ANN is applied to replace the original logit model. The causal relations of the
market prediction model are used to construct the ANN structure. To reduce the training
difficulty, the simulation model within REUSF, named the stock turnover engine, is
used to replace part of the ANN. The causal-ANN is trained via optimization, minimizing
the error between the predicted market shares and the historical data.
6.1. Residential End-Use Stock and Flow Model
In recent years, there has been growing interest in reducing residential energy
consumption, as the residential sector is the largest consumer of energy compared to
other sectors such as commercial and industrial. The difficulty in developing an energy
model for the residential sector is the uncertainty around consumers' decision-making
processes [156]. Currently, two main methods, top-down and bottom-up, are used to
estimate the historical and future energy consumption of the residential sector [156].
The top-down approach only considers the total energy consumption at an aggregate
level and uses macroeconomic indicators such as housing starts, energy prices, and
weather to estimate historical and future energy growth. Because of its relatively
simple model structure and the ease of access to its inputs, the top-down approach is
widely used in long-term energy forecasting [157]–[160]. However, the top-down
approach is unable to capture short-term behavior changes, and therefore it is
difficult to generate specific policies based on its outcomes. On the other hand, the
bottom-up approach looks at individual end-uses and housing types to develop an
energy forecast [161]. The bottom-up approach can be developed using either the
statistical or engineering methodology. During the calibration period of such models,
historical data such as annual energy consumption, end-use efficiencies, fuel prices,
and life expectancy are used to develop an energy forecast [162]–[165]. The engineering
methodology requires specific inputs such as power ratings and heat transfer rates to
estimate the historical and future load growth [166]–[169]. By considering the energy
consumption of individual end-uses, the bottom-up approach enables researchers to
study the causality of the historical load as well as the impact of different scenarios
around government policies and consumer decisions in the forecast period. However,
both bottom-up methodologies rely heavily on large amounts of survey data and on
technical knowledge of individual end-uses.
The U.S. Energy Information Administration (EIA) uses a hybrid approach in its
National Energy Modeling System (NEMS) to take advantage of both bottom-up and
top-down methodologies [170]. The NEMS model uses individual end-use data, such as
energy consumption and market shares, as well as inputs typically used in a top-down
model, such as temperature and fuel prices, to estimate historical and future energy
growth. In addition, the model's logit function is able to take into account the impact
of the consumer decision-making process. As such, the NEMS model can be used to
evaluate the effects of different policies.
The Residential End-Use Stock and Flow (REUSF) model uses a similar approach to
the NEMS model to provide 30-year projections of energy consumption. Similarly, the
REUSF model uses a logit function to calculate each end-use's market shares, which
are then used to calculate the total energy consumption. Specifically, the REUSF
model considers 13 end-uses, including dish washers, refrigerators, TVs, freezers,
heating modules, cooktops, ovens, clothes washers, clothes dryers, set-top boxes,
water heaters, air conditioners, and lighting. Note that the heating module contains
four sub-modules covering primary and secondary heating with two kinds of energy
sources, electricity and fuel. The lighting module is divided into four parts:
general-purpose screw-in bulbs, general-purpose reflectors, linear fluorescents, and
others. Thus, 19 end-uses in total are involved in the model. Additionally, the model
decomposes end-uses into different end-use technologies (brand models), which are
delineated by efficiency tiers. For example, dish washers include two end-use
technologies, named basic, with higher energy consumption, and Energy Star, with
lower energy consumption. Different end-uses include different numbers of end-use
technologies, and the largest number of end-use technologies is ten. The yearly
energy consumption of each end-use is the sum of the consumption of each end-use
technology, as presented in Eq. (6-1).
E = Σ_{i=1}^{ET} e_i = Σ_{i=1}^{ET} UEC_i × SU_i    (6-1)
where e_i is the energy consumption of the i-th end-use technology, calculated as the
product of its unit energy consumption (UEC) and its stock units (SU). As shown in
Figure 6-1, the SU of each end-use technology is estimated through a logit model and a
stock turnover engine. The logit model estimates customer preferences for a specific
end-use technology according to its total life cycle cost (TLCC) and capital cost
(CC). The stock turnover engine calculates the SU in the current year based on
customer preferences, the saturation of the end-use technology, and the replacement
rate or new-instrument rate. The saturation represents the percentage of dwellings
that have at least one unit of the end-use technology. To specify the saturation, the
province of BC is divided into four regions and the houses are categorized into four
types. The saturation of the end-use technologies varies across region and housing
type combinations, with values based on the residential end-use survey in BC. The
following sub-sections introduce the main parts of the REUSF model.
[Figure: Model Inputs → Total Life Cycle Cost Calculation → Logit Model (market
shares prediction) → Stock Turnover Engine → Stock Units → Energy Consumption
Prediction]
Figure 6-1: Flow chart of Residential End-Use Stock and Flow Model.
6.1.1. Total life cycle cost calculation
In the REUSF model, the TLCC of an end-use technology is regarded as the key factor
influencing customer preferences when buying new stocks, while the CC is considered
when replacing stocks. The TLCC is the cost of the end-use technology over its entire
life, which includes the CC, fuel costs, and other costs. In the CC calculation, the
incentives given by the government for using more efficient technologies are also
considered. The fuel costs represent the total energy costs over the technology's
entire life, calculated as the product of the unit energy consumption, the life
expectancy, and the fuel price. The other costs comprise operation and maintenance
costs.
6.1.2. Logit model
In REUSF, customer preferences are quantified by replacement shares and new shares.
Instead of directly estimating the market shares (MS) of each end-use technology, the
dynamic changes in the market are captured by the replacement and new shares applied
when residents replace or buy new stocks. The logit model predicts the new and
replacement shares according to the TLCC and CC of the end-use technology. The
equations of the logit model are shown in Eq. (6-2), where α and β are coefficients
that must be determined from historical data. The logit model assumes that customer
preferences follow a logit distribution with respect to the TLCC and CC. Since the
values of CC and TLCC are large, they are normalized in the logit model.
ReplaShare_{Y,i} = e^{α−βCC_i} / Σ_{i=1}^{ET} e^{α−βCC_i}
NewShare_{Y,i} = e^{α−βTLCC_i} / Σ_{i=1}^{ET} e^{α−βTLCC_i}    (6-2)
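The per-technology coefficients in Table 6-2 suggest the shares are computed as a normalized exponential over α_i − β_i·c_i. The following is a minimal sketch of Eq. (6-2), assuming the unspecified normalization of CC/TLCC is division by their sum across technologies (the thesis only states that the costs are normalized):

```python
import math

def logit_shares(costs, alpha, beta):
    """Logit shares per Eq. (6-2): share_i ∝ exp(alpha_i - beta_i * c_i),
    where c_i is the cost normalized across technologies (here: by the sum,
    an assumption -- the thesis only says CC/TLCC are normalized)."""
    total = sum(costs)
    normed = [c / total for c in costs]
    weights = [math.exp(a - b * c) for a, b, c in zip(alpha, beta, normed)]
    z = sum(weights)
    return [w / z for w in weights]

# Dish-washer capital costs from Table 6-2 (basic, Energy Star)
repl = logit_shares([908, 929], alpha=[-0.6, -0.6], beta=[0.7, 0.5])
print([round(s, 2) for s in repl])  # roughly [0.48, 0.52]
```

Under this assumed normalization the result is consistent with the 48%/52% replacement shares reported for dish washers in Section 6.1.4.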
6.1.3. Stock turnover engine
The stock turnover engine predicts the number of stocks of each end-use technology in
a specific year from the number of stocks in the previous year and the new and
replacement shares. As shown in Figure 6-2, within a one-year period, the decayed and
end-of-life stocks are replaced by new stocks. In addition, as the number of houses
grows, the number of stocks increases as well. Therefore, the number of stocks for the
i-th end-use technology in the Y-th year can be represented as

SU_{Y,i} = SU_{Y−1,i} + RS_{Y,i} + NS_{Y,i} − DS_{Y,i} − EOL_{Y,i}    (6-3)
[Figure: flow into the stocks from last year (new stocks, replacement stocks) and flow
out (decay stocks, end-of-life stocks)]
Figure 6-2: Flow of stocks in the stock turnover engine.
where SU_Y and SU_{Y−1} are the numbers of stock units in the Y-th and (Y−1)-th years.
DS_Y is the number of decayed stocks and EOL_Y is the number of end-of-life stocks.
DS is calculated via the decay rate, while EOL is calculated from the life expectancy
and the number of stocks in the previous year. For the i-th end-use technology,
DS_{Y,i} and EOL_{Y,i} follow the market shares in the previous year, as shown in Eq.
(6-4), where MS_{Y−1,i} represents the market share of the i-th end-use technology in
the previous year.

DS_{Y,i} = DS_Y × MS_{Y−1,i}
EOL_{Y,i} = EOL_Y × MS_{Y−1,i}    (6-4)

The decayed and end-of-life stocks are replaced by new stocks following the
replacement shares estimated from the logit model, as in Eq. (6-5). Note that the
total number of replacement stocks (RS_Y) equals the sum of DS_Y and EOL_Y for an
end-use, while the number of replacement stocks for the i-th end-use technology
(RS_{Y,i}) will not necessarily equal the sum of DS_{Y,i} and EOL_{Y,i}, since the
market shares of the i-th end-use technology may change.

RS_{Y,i} = ReplaShare_{Y,i} × RS_Y    (6-5)
NS_Y is the number of new stocks in the current year. The number of houses increases
every year, and new stocks are needed for the new houses. NS_Y for an end-use can be
calculated as

NS_Y = NewHouse × Saturation × SPH    (6-6)

where SPH is the number of stocks per house. Note that SPH is the average number of
stocks with respect to the number of houses having at least one unit of the end-use
technology. The number of new stocks for the i-th end-use technology can be
calculated as

NS_{Y,i} = NewShare_{Y,i} × NS_Y    (6-7)
6.1.4. Example of REUSF model
The dish washer model in REUSF is used to illustrate the process of estimating the
energy consumption in the year 2010. The parameters used in the model are shown in
Table 6-1. Two end-use technologies are considered in this model, the basic and
Energy Star models. The performance data and the logit model coefficients of the two
technologies are listed in Table 6-2.
Table 6-1: The parameters of dish washers in 2010.
Parameter                 Value
TNS_2009                  433,064
DS & EOL                  137,248
Number of new houses      10,071
Saturation                85.2%
SPH                       1
Table 6-2: The inputs of the two end-use technologies.
Input                                    Basic    Energy Star
TLCC ($)                                 1,166    1,167
CC ($)                                   908      929
Coefficient α                            -0.6     -0.6
Coefficient β                            0.7      0.5
Market shares, 2009                      38%      62%
Unit energy consumption per year (kWh)   995      212
The total number of stocks in 2009 (TNS_2009) is 433,064, and the numbers of stocks
for the two end-use technologies (SU_{2009,B} and SU_{2009,E}) are 164,564 and
268,500, respectively, according to the market shares in that year. Employing the
logit function in Eq. (6-2), the new shares of the two end-use technologies are
calculated as 47% and 53%, and the replacement shares as 48% and 52%, respectively.
According to the market shares in 2009, the stocks to be replaced (DS_i + EOL_i) in
2010 for the two end-use technologies are 52,154 and 85,093, respectively. The total
number of replaced stocks in 2010 is thus 137,248, which equals the number of decayed
and end-of-life stocks. Following Eq. (6-5), RS_{2010,B} and RS_{2010,E} for the two
end-use technologies are 65,430 and 71,818, respectively. For the new stocks, the
total number of new stocks in 2010 is 8,580, calculated from Eq. (6-6), and
NS_{2010,B} and NS_{2010,E} are 4,033 and 4,547, respectively. Thus, the numbers of
stocks for the two end-use technologies can be calculated as follows.

SU_{2010,B} = 164,564 + 4,033 + 65,430 − 52,154 = 181,873
SU_{2010,E} = 268,500 + 4,547 + 71,818 − 85,093 = 259,772    (6-8)

Therefore, the market shares of the two end-use technologies in 2010 are 41% and
59%, respectively. Finally, the annual energy consumption of dish washers is obtained
as 236 GWh.
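The arithmetic of this worked example can be checked directly by coding Eqs. (6-1) and (6-3) with the intermediate figures reported above (the replacement and new stock splits are taken from the text rather than recomputed, since the exact normalization inside the logit model is not given):

```python
# Figures from Tables 6-1/6-2 and the worked example (dish washers,
# basic "B" and Energy Star "E" technologies)
SU_2009 = {"B": 164564, "E": 268500}   # stocks in 2009 (38% / 62% of 433,064)
replaced = {"B": 52154, "E": 85093}    # DS + EOL per technology, Eq. (6-4)
RS = {"B": 65430, "E": 71818}          # replacement stocks, Eq. (6-5)
NS = {"B": 4033, "E": 4547}            # new stocks, Eq. (6-7)
UEC = {"B": 995, "E": 212}             # kWh per unit per year (Table 6-2)

# Stock update, Eq. (6-3): SU_Y = SU_{Y-1} + RS + NS - DS - EOL
SU_2010 = {t: SU_2009[t] + RS[t] + NS[t] - replaced[t] for t in SU_2009}
total = sum(SU_2010.values())
MS_2010 = {t: SU_2010[t] / total for t in SU_2010}

# Energy consumption, Eq. (6-1): E = sum_i UEC_i * SU_i, reported in GWh
E_GWh = sum(UEC[t] * SU_2010[t] for t in SU_2010) / 1e6
print(SU_2010, {t: round(m, 2) for t, m in MS_2010.items()}, round(E_GWh))
```

Running this reproduces the stock counts of Eq. (6-8), the 41%/59% market shares, and the 236 GWh annual consumption.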
6.1.5. Logit model training
The coefficients α and β in the logit model are estimated from historical data. Note
that only the market shares of end-use technologies can be obtained through surveys,
not the new and replacement shares. Therefore, the stock turnover engine is also
needed to predict the market shares when training the coefficients. In REUSF, 12
years of market shares are obtained from market surveys for all the end-use
technologies, and the training goal is to find a set of coefficients that minimizes
the errors between the predicted shares and the historical data over the 12 years.
Optimization algorithms are employed to train the logit model.
One shortcoming of the logit model is the training efficiency. Due to the large
number of end-uses and the large number of end-use technologies within each end-use,
the number of design variables is extremely large. In the REUSF model, there are 19
end-uses/sub-end-uses, each containing from two to ten end-use technologies; the
total number of variables is 178. Additionally, since four regions and four housing
types are considered in REUSF, the total number of coefficients is 178 × 4 × 4 =
2848. Since the different end-uses, regions, and housing types are independent, the
training problem can be divided into 19 × 4 × 4 = 304 sub-problems. Thus, the number
of design variables in each sub-problem varies between four and 20. Although the
dimensionality of each optimization problem is reduced, the large number of
optimization problems makes the computational cost unacceptable. Hence, a fast
training model needs to be developed to replace the logit model.
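The coefficient and sub-problem counts above can be verified with a few lines:

```python
end_uses = 19
regions, housing_types = 4, 4
coeffs_per_combination = 178        # logit coefficients across all end-uses

total_coeffs = coeffs_per_combination * regions * housing_types
sub_problems = end_uses * regions * housing_types
print(total_coeffs, sub_problems)   # 2848 304

# Each technology contributes two coefficients (alpha, beta), so an end-use
# with 2-10 technologies yields a sub-problem of 4-20 design variables.
min_vars, max_vars = 2 * 2, 2 * 10
print(min_vars, max_vars)           # 4 20
```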
Another issue with the logit model is accuracy. The logit model requires the market
shares to follow a logit distribution, which cannot be guaranteed for every end-use.
If the logit distribution assumption is violated, the accuracy of the prediction
model will be very low.
To improve the training efficiency and accuracy, the proposed causal-ANN is employed
to replace the logit model in predicting the market shares. The causal-ANN is
constructed between TLCC/CC and the market shares, which means the stock turnover
engine becomes one component in the causal-ANN structure. The details of the
causal-ANN construction are described in Section 6.2.
6.2. Applying causal-ANN in market share prediction
To construct the causal-ANN for market share prediction, the flow chart of the market
share prediction model is first shown in Figure 6-3. Note that TLCC, CC, SU, and MS
are all vectors in the figure, and the number of elements in each vector equals the
number of end-use technologies. Thus, if the number of end-use technologies is ET,
the number of inputs of the market share prediction model is 3 × ET + 1 and the
number of outputs is ET. According to the flow chart of the prediction model, the
high-level causal relations can be constructed as in Figure 6-4.
[Figure: TLCC and CC feed the logit model, which outputs NewShares and ReplaShares;
these, together with SU_{Y−1} and TNS, feed the stock turnover engine, which outputs
MS]
Figure 6-3: Flow chart of the market share prediction.
[Figure: causal graph with TLCC → NewShares, CC → ReplaShares, and {NewShares,
ReplaShares, SU_{Y−1}, TNS} → MS]
Figure 6-4: High-level causal relations of the market share prediction model.
As shown in Figure 6-4, the entire high-level causal graph can be divided into three
sub-networks: TLCC to NewShares, CC to ReplaShares, and [NewShares, ReplaShares,
SU_{Y−1}, TNS] to MS. In the market share prediction model, the first two
sub-networks are black-box models originally predicted by logit models, while the
last sub-network can be computed by the given stock turnover engine, which is a cheap
model containing only mathematical functions. On the other hand, the values of the
intermediate variables (NewShares and ReplaShares) are unknown. Therefore, a
causal-ANN of the first category can be constructed as shown in Figure 6-5. The new
and replacement shares are estimated by a neural network from TLCC and CC. Then, the
outputs of the network, together with the other inputs including the SUs and TNS, are
used in the stock turnover engine to calculate the market shares.
[Figure: TLCC and CC feed a neural network that outputs NewShares and ReplaShares;
these, together with SU_{Y−1} and TNS, feed the stock turnover engine, which outputs
MS]
Figure 6-5: Structure of the causal-ANN to predict market shares.
Training the causal-ANN differs from training a traditional ANN because there are no
training data for the outputs of the ANN, i.e., the new and replacement shares in
this case. However, actual data for the final outputs, MS, are provided. Optimization
is employed to search for the weights of the ANN that minimize the error between the
estimated MS and the historical data (HS). The root mean square error (RMSE)
criterion is used to measure the error. Thus, the training model of the causal-ANN
can be presented as Eq. (6-9), where w denotes the weights of the ANN. In this
chapter, a Genetic Algorithm (GA) is employed to optimize the weights of the network
to minimize the RMSE value.
find w
min RMSE = (1/ET) Σ_{i=1}^{ET} sqrt( Σ_{Y=2002}^{2013} (HS_{Y,i} − MS_{Y,i})² / 12 )    (6-9)
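A minimal sketch of this training scheme follows, with synthetic 12-year data standing in for the proprietary REUSF history, a tiny tanh network with softmax outputs in place of the thesis's ANN, and a simple random search standing in for the GA (all sizes and data here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def ann_shares(w, tlcc, cc, ET=2, H=4):
    """Tiny one-hidden-layer net: (TLCC, CC) -> (new, replacement) shares.
    Softmax outputs keep each share vector summing to one."""
    x = np.concatenate([tlcc, cc])                  # 2*ET inputs
    n_in, n_out = 2 * ET, 2 * ET
    W1 = w[:H * n_in].reshape(H, n_in)
    b1 = w[H * n_in:H * n_in + H]
    W2 = w[H * n_in + H:H * n_in + H + n_out * H].reshape(n_out, H)
    b2 = w[H * n_in + H + n_out * H:]
    h = np.tanh(W1 @ x + b1)
    z = (W2 @ h + b2).reshape(2, ET)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    new, repl = e / e.sum(axis=1, keepdims=True)
    return new, repl

def stock_engine(su_prev, new, repl, rs_tot, ns_tot, replaced):
    """Cheap stock-turnover model, Eqs. (6-3)-(6-7), returning market shares."""
    su = su_prev + repl * rs_tot + new * ns_tot - replaced
    return su / su.sum()

def rmse(w, data):
    """Training objective, Eq. (6-9)."""
    err = []
    for su_prev, rs_tot, ns_tot, replaced, tlcc, cc, hs in data:
        new, repl = ann_shares(w, tlcc, cc)
        ms = stock_engine(su_prev, new, repl, rs_tot, ns_tot, replaced)
        err.append((hs - ms) ** 2)
    e = np.array(err)                               # years x ET
    return np.mean(np.sqrt(e.mean(axis=0)))

# Synthetic 12-year history (illustration only -- not the REUSF data)
data = []
su = np.array([160000.0, 270000.0])
for _ in range(12):
    hs = su / su.sum() + rng.normal(0, 0.01, 2)
    data.append((su.copy(), 137000.0, 8500.0, 0.3 * su,
                 np.array([1166.0, 1167.0]) / 2333,
                 np.array([908.0, 929.0]) / 1837, hs / hs.sum()))
    su = su * 1.01

# Random-search stand-in for the GA weight optimization
n_w = 4 * 4 + 4 + 4 * 4 + 4
best = rng.normal(0, 0.5, n_w)
best_f = rmse(best, data)
for _ in range(300):
    cand = best + rng.normal(0, 0.2, n_w)
    f = rmse(cand, data)
    if f < best_f:
        best, best_f = cand, f
print(round(best_f, 3))
```

The key design point is that gradients never need to pass through the stock turnover engine: the weight search treats the ANN-plus-engine composition as a single objective, which is why a population-based optimizer such as a GA is a natural fit.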
6.3. Results and discussion
In this section, the causal-ANN is used to predict the market shares in the REUSF
model. First, one end-use, the dish washer, is selected as an example to illustrate
the accuracy and efficiency of the causal-ANN. Then, the proposed method is employed
to estimate the market shares of all end-uses.
6.3.1. Case study: dish washer
There are two end-use technologies for the dish washer: one named basic and the other
Energy Star, with higher energy efficiency. Therefore, there are seven inputs to the
causal-ANN, specifically four inputs to predict the new and replacement shares and
three used in the stock turnover engine. The proposed method is compared with the
logit model in both RMSE value and computational time. Additionally, a feedforward
ANN with one hidden layer and four hidden nodes is employed to predict the market
shares from the seven inputs by treating the prediction model as a black box. The
Matlab ANN toolbox is used to train this original ANN. Note that all the tests are
run on a computer with a Core i7 @ 3.40 GHz and 16 GB of memory; the tests on all
end-uses are also run in the same computational environment. Table 6-3 lists the
comparison results, and Figure 6-6 compares the market share curves of dish washers
among the historical data and the three predicted shares.
Comparing the causal-ANN with the logit model in Table 6-3, the RMSE value of the
proposed method is 0.0195, much smaller than that of the logit model. Thus, the
accuracy of the causal-ANN is higher. As shown in Figure 6-6, the market shares
obtained from the causal-ANN are almost the same as the historical data, while there
is an obvious gap between the shares predicted by the logit model and the historical
data. This is because the logit model cannot capture the trend of the new and
replacement shares accurately. The logit model requires the shares to follow a logit
distribution, which may be violated for some end-uses, whereas an ANN can fit a wide
range of nonlinear relations. Thus, the accuracy of the ANN can be higher than that
of the logit model when the data do not follow the logit distribution. Additionally,
training the causal-ANN takes only a couple of seconds, far less than training the
logit model. The main difficulty in training the logit model is that when the
exponent is negative, the output varies within a very small range. In the prediction
model, the exponent in the logit model is usually negative, which makes it difficult
for the optimization to converge to the optimum; therefore, the computational cost of
logit model training is usually very large. In the causal-ANN, the optimization of
the weights is much easier, so the causal-ANN is more efficient than the logit model.
Besides, although training the traditional ANN is the fastest, the RMSE values and
the share curves show that the ANN fails to model the problem accurately. This is
because of the lack of training data: there are only 12 years of data for training,
which is so scarce that the ANN cannot be well trained in this case. For the
causal-ANN, the neural network covers only one part of the model, and the other part
employs the cheap model, which reduces the nonlinearity the network must capture.
Additionally, there are four inputs to the neural network of the causal-ANN compared
with seven inputs to the original ANN. The reduced dimensionality helps reduce the
difficulty of training the network.
Table 6-3: Comparison in RMSE and time among the three approximation models.
           Causal-ANN   Logit model   ANN
RMSE       0.0195       0.1154        0.2357
Time (s)   2            27            0.5
[Figure, two panels: (a) Basic, (b) Energy Star]
Figure 6-6: Market share comparison for dish washers.
6.3.2. Full model prediction
The causal-ANN is employed to predict the market shares of all 19 end-uses in four
housing types and four regions, meaning 16 combinations of regions and housing types
are involved. For different combinations, parameters such as the total number of
stocks and the saturation differ. Because the training process is decomposable, the
full prediction model can be divided into 304 sub-models, which can be trained
separately. However, the number of coefficients is still large for some end-uses,
which leads to a high computational cost when training the coefficients using
optimization. Moreover, the total number of training runs is so large that optimizing
the problems sequentially takes a long time. To improve efficiency, the Parallel
Computing Toolbox in Matlab is employed in training the market share prediction
model.
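Because the 304 sub-models are independent, their training parallelizes trivially. Below is a sketch of the same farm-out pattern using Python's concurrent.futures as an analogue of the Matlab toolbox; train_sub_model is a hypothetical stand-in that returns a fake RMSE rather than running the actual GA weight search:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_sub_model(key):
    """Hypothetical stand-in for training one causal-ANN sub-model.
    A real version would run the GA weight search of Eq. (6-9) for the
    given (end-use, region, housing-type) combination."""
    end_use, region, housing = key
    return key, 0.02 + 0.001 * ((end_use + region + housing) % 5)  # fake RMSE

# One job per (end-use, region, housing-type): 19 * 4 * 4 = 304 in total
keys = list(product(range(19), range(4), range(4)))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = dict(pool.map(train_sub_model, keys))
    print(len(results))  # 304
```

Since each job touches only its own data, no synchronization is needed and the speedup is limited mainly by the number of available cores and the uneven job sizes across end-uses.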
In the full prediction model, both the causal-ANN and the logit models are trained
sequentially as well as in parallel for the entire Stock and Flow model. The results
are shown in Table 6-4. The RMSE values shown in the table are the averages over the
19 end-uses in the 16 combinations of housing types and regions. Since the ANN failed
to predict the market shares due to data scarcity in the single-end-use tests, it is
not used for the entire Stock and Flow model. As shown in Table 6-4, the average RMSE
over all prediction models obtained from the causal-ANN is 0.023, which is much lower
than that obtained from the original logit model. Therefore, the accuracy of the
market share prediction is improved significantly. Moreover, the computational cost
of the proposed method is much smaller than that of the logit model: training the
causal-ANN takes under half an hour sequentially and only six minutes in parallel,
compared with nine hours of sequential training and two and a half hours of parallel
training for the logit model. The number of coefficients in the logit model has a
large impact on the computational cost of model training; thus, training an end-use
with a large number of end-use technologies takes much longer than training dish
washers. In contrast, the training cost of the causal-ANN increases little as the
number of end-use technologies grows. One application of the REUSF model is to test
the influence of different policies on residential energy consumption. For each test,
the market share prediction model needs to be retrained since the input data may
change. Therefore, the higher training efficiency achieved with the causal-ANN
enables more policy-testing iterations to find better choices.
Finally, the market share prediction model with the causal-ANN is employed in the
REUSF model to predict the energy consumption from 2014 to 2034, and the results are
shown in Figure 6-7. Two cases are considered in the test: static and natural. In the
static case, the new and replacement shares remain fixed over the forecasting period,
while in the natural case they vary following the causal-ANN prediction. The lower
energy consumption in the natural case indicates that, when the varying new and
replacement shares are considered, more and more low-efficiency end-use technologies
are replaced by high-efficiency ones.
Table 6-4: Approximation results of the causal-ANN and the logit model.
                   Causal-ANN   Logit model
Average RMSE       0.023        0.128
Time (sequential)  25 mins      9 hrs
Time (parallel)    6 mins       2.5 hrs
Figure 6-7: Energy consumption prediction using the causal-ANN.
6.4. Summary
Training a logit model to predict market shares in the REUSF model is time-consuming,
and the trained logit model is not sufficiently accurate. To improve the efficiency
and accuracy of the prediction model, a causal-ANN is proposed and applied in the
REUSF model. The causal relations are employed to construct the structure of the
network. In the REUSF model, the cheap model, the stock turnover engine, can be
employed to reduce the training difficulty, although the values of the intermediate
variables, the new and replacement shares, are unknown. To train the causal-ANN, the
weights of the network are optimized to minimize the RMSE between the predicted
shares and the historical data. The predictions of the causal-ANN are compared with
those of the original logit model, and the results show that the causal-ANN improves
both accuracy and efficiency significantly. In particular, the training time is
reduced to six minutes in parallel, compared with 2.5 hours using the logit model.
Chapter 7. Conclusions and future work
7.1. Conclusions
A knowledge-assisted metamodeling and optimization methodology is discussed and
developed in this thesis. First, based on the concepts of knowledge and existing
applications of knowledge in optimization, different potential applications of
knowledge-assisted optimization are proposed. Next, two types of knowledge are
employed to assist metamodeling and optimization. A PMO method is developed that
employs sensitivity information to improve the efficiency of dealing with large-scale
optimization problems. Causal relations are employed to reduce the dimensionality of
optimization problems by determining the variables without contradiction. Moreover,
the causal-ANN is developed by combining causal relations and ANN structures to
improve the accuracy and efficiency of the metamodel. Combined with Bayesian theory,
attractive design spaces can be identified efficiently through the causal-ANN.
Finally, causal-ANNs are applied in an energy forecasting model, and the accuracy and
efficiency of the prediction are improved significantly.
Specific knowledge, such as algorithmic and symbolic knowledge, has been employed in
optimization to improve its effectiveness and efficiency. However, there is no
systematic way to employ different kinds of knowledge together on one problem.
Through an analysis of the potential applications of knowledge in optimization, it is
found that different knowledge can be applied at different stages of optimization,
from problem formulation to optimization strategies. Equations and graphical
knowledge, such as causal graphs and Bayesian networks, are particularly attractive
for assisting large-scale optimization. In this thesis, two categories of knowledge,
sensitivity information and causal relations, are employed in optimization to assist
in dimension reduction, metamodeling, and the optimization process.
To decrease the dimensionality of the optimization problem, sensitivity information
is employed to develop the Partial Metamodel-based Optimization (PMO) algorithm,
which seeks good optimal solutions with scarce samples. Instead of constructing a
complete RBF-HDMR model, a series of partial RBF-HDMR models is constructed, based on
the fundamental belief that optimization can be performed on an imperfect or
incomplete metamodel. The sensitivity information is used to quantify the importance
of each variable, and the more important variables are more likely to be modeled in
the partial RBF-HDMR. The roulette wheel selection operator is used to select the
variables to model, balancing exploration and exploitation. To pay more attention to
the accuracy around the interesting area, the cut center moves to the current optimal
point at every iteration and a new partial RBF-HDMR is built at the new cut center.
To reduce the number of real function evaluations, most of the points used in
constructing the new partial model can be predicted by the RBF-HDMR from the previous
iteration. Compared with optimization on a complete RBF-HDMR, PMO obtains better
optimal solutions with fewer function evaluations. Moreover, the trust-region-based
PMO (TR-PMO) is developed to further improve performance by focusing on the most
attractive design area. The test results show that TR-PMO performs comparably to or
better than TRMPS and OMID with scarce samples.
Next, causal relations are used to reduce the dimensionality of engineering design
problems. The main idea of the dimension reduction method is to find the variables
without contradiction, defined as the variables having a monotonic effect on the
objectives. To distinguish the variables without contradiction, the causal graph is
applied to the design problem to show the routes from the design variables to the
objective. A DSM-based qualitative analysis method is developed to automatically
identify the variables without contradiction. By transferring the causal graph to a
DSM and multiplying the DSM by itself multiple times, the impact of the design
variables on the objective is analyzed to find the contradictions. The Taguchi method
is used to calculate the weight of each link to simplify the causal graph. According
to the simplified causal graph, the design variables can be divided into two groups:
the important variables and the less important variables. A two-stage optimization
process is used to optimize the two groups of variables sequentially. The proposed
method is used to solve the power converter design and aircraft concept design
problems, and the number of function evaluations after dimension reduction is reduced
significantly. On the other hand, when the number of function evaluations is fixed,
the two-stage optimization process obtains better solutions.
To capture more information from the design problem rather than blindly constructing
metamodel, the causal relations are combined with neural networks to construct the
causal-ANN. The high-level causal graph is used to guide the structure of the neural
networks. Considering other kinds of knowledge involved in the causal-ANN, the causal-
ANN can be classified into three categories, involving cheap models, involving values of
intermediate variables, and involving both. Using cheap models to replace sub-networks
in the causal-ANN can improve the approximation accuracy. If the values of the
intermediate variables can be obtained from simulation, each sub-network can be
trained separately. By dividing the complex networks to multiple sub-networks can
reduce the complexity of each sub-model and improve the prediction accuracy.
Compared to directly constructing a metamodel between the design variables and the
objective, the causal-ANN is more accurate on the two test problems. Apart from giving
accurate predictions, the causal-ANN model is used to detect the attractive design
space. Using the prediction values from the causal-ANN, the likelihood values of the
design variables with respect to the objective are estimated, and the attractive design
space is detected by comparing the likelihood values of different intervals. The test
results show that the interval in which the optimal point lies can be determined by
estimating the likelihood from causal-ANN models.
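The interval-comparison step can be sketched as below. This is a hedged illustration, not the thesis procedure: a placeholder function stands in for the causal-ANN, cheap predictions at random points define a "good" objective level, and the likelihood of each interval of a design variable given a good outcome is estimated by counting; the thresholds, interval count, and model are all assumptions.

```python
import random

random.seed(0)
model = lambda x: (x - 0.7) ** 2             # placeholder for the causal-ANN predictor
xs = [random.random() for _ in range(2000)]  # cheap predictions, not simulations
ys = [model(x) for x in xs]

# Treat the best 10% of predicted objective values as the "good" outcome.
cutoff = sorted(ys)[len(ys) // 10]
good = [x for x, y in zip(xs, ys) if y <= cutoff]

# Estimate P(x in interval | good outcome) for five equal intervals of x
# and keep the most likely one as the attractive design space.
intervals = [(i / 5, (i + 1) / 5) for i in range(5)]
likelihood = [sum(lo <= x < hi for x in good) / len(good)
              for lo, hi in intervals]
best = intervals[likelihood.index(max(likelihood))]
```

With the minimum of the placeholder model at x = 0.7, the comparison singles out the interval (0.6, 0.8), mirroring how the likelihood estimates localize the interval containing the optimum.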
Finally, the causal-ANN model is applied in a residential energy consumption forecasting
model, a project sponsored by a local power company. The logit model in the REUSF
model is replaced by the causal-ANN model to predict the market shares of different
end-use technologies. Besides the causal relations, a cheap model, i.e., the stock
turnover engine, is involved in the causal-ANN, while the values of the intermediate
variables cannot be obtained. Compared to the original logit model, the accuracy of the
predicted market shares is improved by employing the causal-ANN. Moreover, the
training time of the causal-ANN is reduced significantly, from several hours to a few
minutes. Therefore, applying knowledge in metamodeling for energy consumption
forecasting improves both accuracy and efficiency.
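For context, the kind of logit market-share model being replaced can be sketched as a multinomial logit over technology utilities. This is a generic textbook form with illustrative numbers, not the REUSF model's actual utilities or coefficients:

```python
import math

def logit_shares(utilities):
    """Multinomial logit: market share of each end-use technology is
    proportional to exp(utility)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. utilities of three competing heating technologies, derived elsewhere
# from attributes such as capital cost and efficiency (values illustrative).
shares = logit_shares([1.0, 0.5, -0.2])
```

The shares always sum to one and are ordered by utility; the causal-ANN replaces this fixed functional form with a learned mapping structured by the causal graph.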
7.2. Future Research
This thesis offers a novel way to break the "curse of dimensionality" by involving
knowledge in metamodeling and optimization. Beyond the research in this thesis, the
following future work is proposed.
7.2.1. Knowledge validation, correction, and updating
Correct knowledge can assist metamodeling and optimization, but wrong information
could lead to erroneous results. Therefore, knowledge needs to be validated before being
used in optimization, especially knowledge obtained through experience or experiments.
The limitation of the current knowledge acquisition approach is that the knowledge comes
from a single source. To validate the knowledge, different sources can be used; for
instance, experimental data can be used to validate knowledge obtained from experience,
while experience can be used to judge the correctness of data. When errors are observed
in the current knowledge base, they should be corrected through appropriate
methodologies, and different sources of knowledge can be combined to correct the errors.
Another direction is knowledge updating. As the optimization proceeds, new knowledge
can be obtained as more samples are generated. How to update the current knowledge
base and how to apply the newly obtained knowledge in optimization remain open
research questions.
7.2.2. Employing different kinds of knowledge
Involving more kinds of knowledge in the optimization process will further improve the
efficiency and accuracy of optimization. One aspect of this challenge is how to combine
linguistic knowledge with data. Linguistic knowledge can be applied in the problem
formulation stage to formulate a more reasonable problem; however, its usage during the
optimization process itself may be limited. Within the optimization process, more
information can be obtained, and the majority of it is data-related. The task is thus how
to systematically utilize different types of knowledge at different stages of optimization.
During optimization, different kinds of knowledge can be obtained, and employing only a
single kind has limitations in dealing with complex problems. Therefore, a systematic
methodology for organizing different kinds of knowledge is needed to concertedly assist
optimization.
7.2.3. Knowledge-assisted optimization strategies
Besides the sensitivity-information- and causal-graph-assisted optimization strategies
developed in this thesis, other knowledge-assisted optimization strategies can be
proposed. The causal-ANN, combined with Bayesian theory, is applied to detect the
attractive design area. This information can be used to generate new samples in the
most promising design spaces. Combined with the input-output relations represented by
causal graphs, the component selection in partial metamodels can be made more
effective.
120
References
[1] S. Shan and G. G. Wang, “Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions,” Struct. Multidiscip. Optim., vol. 41, no. 2, pp. 219–241, 2010.
[2] J. H. Holland, Adaptation in natural and artificial systems : an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992.
[3] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by Simulated Annealing,” Science (80-. )., vol. 220, no. 4598, pp. 671–680, 1983.
[4] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE international conference on neural networks, 1995, vol. 4, pp. 1942–1948.
[5] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient Global Optimization of Expensive Black-Box Functions,” J. Glob. Optim., vol. 13, no. 4, pp. 455–492, 1998.
[6] L. Wang, S. Shan, and G. G. Wang, “Mode-pursuing sampling method for global optimization on expensive black-box functions,” Eng. Optim., vol. 36, no. 4, pp. 419–438, 2004.
[7] G. H. Cheng, A. Younis, K. Haji Hajikolaei, and G. Gary Wang, “Trust Region Based Mode Pursuing Sampling Method for Global Optimization of High Dimensional Design Problems,” J. Mech. Des., vol. 137, no. 2, p. 21407, Feb. 2015.
[8] K. Haji Hajikolaei, G. H. Cheng, and G. G. Wang, “Optimization on Metamodeling-Supported Iterative Decomposition,” J. Mech. Des., vol. 138, no. 2, p. 21401, Dec. 2015.
[9] R. G. Regis and C. A. Shoemaker, “Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization,” Eng. Optim., vol. 45, no. 5, pp. 529–555, May 2013.
[10] D. Wu, K. H. Hajikolaei, and G. G. Wang, “Employing partial metamodels for optimization with scarce samples,” Struct. Multidiscip. Optim., pp. 1–15, Sep. 2017.
[11] R. Bellman, Dynamic programming. Univ. Pr, 1972.
[12] P. A. Boghossian, Fear of knowledge : against relativism and constructivism. Clarendon Press, 2006.
[13] M. Beynon, D. Cosker, and D. Marshall, “An expert system for multi-criteria decision making using Dempster Shafer theory,” Expert Syst. Appl., vol. 20, no. 4,
121
pp. 357–367, May 2001.
[14] M. B. Islam and G. Governatori, “RuleRS: a rule-based architecture for decision support systems,” Artif. Intell. Law, vol. 26, no. 4, pp. 315–344, Dec. 2018.
[15] P. Kim and Y. Ding, “Optimal Engineering System Design Guided by Data-Mining Methods,” Technometrics, vol. 47, no. 3, pp. 336–348, Aug. 2005.
[16] A. Cutbill and G. G. Wang, “Mining constraint relationships and redundancies with association analysis for optimization problem formulation,” Eng. Optim., vol. 48, no. 1, pp. 115–134, 2016.
[17] P. B. Backlund, D. W. Shahan, and C. C. Seepersad, “Classifier-guided sampling for discrete variable, discontinuous design space exploration: Convergence and computational performance,” Eng. Optim., vol. 47, no. 5, pp. 579–600, May 2015.
[18] P. B. Backlund, C. C. Seepersad, and T. M. Kiehne, “All-Electric Ship Energy System Design Using Classifier-Guided Sampling,” IEEE Trans. Transp. Electrif., vol. 1, no. 1, pp. 77–85, Jun. 2015.
[19] C. Sharpe, C. Morris, B. Goldsberry, C. C. Seepersad, and M. R. Haberman, “Bayesian Network Structure Optimization for Improved Design Space Mapping for Design Exploration With Materials Design Applications,” in Proceeding of ASME 2017 International Design Engineering Technical Conferences August 6-9, 2017, p. V02BT03A004.
[20] S. Russell and P. Norvig, Artificial Intelligence A Modern Approach. New Jersey: Prentice Hall, 2003.
[21] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, vol. 60, no. 4. Prentice Hall, 2003.
[22] D. L. Poole, A. Mackworth, and R. G. Goebel, “Computational Intelligence: A Logical Approach,” Comput. Intell. A Log. Approach, vol. 2, no. 2, pp. 146–149, 1998.
[23] G. W. Ernst and A. Newell, GPS: A case study in generality and problem solving. Academic Pr, 1969.
[24] B. Chandrasekaran, “Generic tasks in knowledge-based reasoning: High-level building blocks for expert system design,” IEEE Expert, vol. 1, no. 3, pp. 23–30, 1986.
[25] B. G. Buchanan, D. Barstow, R. Bechtal, J. Bennett, W. Clancey, C. Kulikowski, T. Mitchell, and D. A. Waterman, “Constructing an expert system,” Build. Expert Syst., vol. 50, pp. 127–167, 1983.
[26] S. Liao, “Expert system methodologies and applications—a decade review from
122
1995 to 2004,” Expert Syst. Appl., vol. 28, no. 1, pp. 93–103, Jan. 2005.
[27] F. Hayes-Roth, D. Waterman, and D. Lenat, Building expert systems. Boston, MA: Addison-Wesley Longman Publishing Co., Inc, 1984.
[28] F. C. Bartlett and C. Burt, “Remembering: A study in experimental and social psychology,” Br. J. Educ. Psychol., vol. 3, no. 2, pp. 187–192, 1933.
[29] J. A. Bernard, “Use of a rule-based system for process control,” IEEE Control Syst. Mag., vol. 8, no. 5, pp. 3–13, Oct. 1988.
[30] K. J. Åström, J. J. Anton, and K.-E. Årzén, “Expert control,” Automatica, vol. 22, no. 3, pp. 277–286, May 1986.
[31] G. DeSanctis and R. B. Gallupe, “A Foundation for the Study of Group Decision Support Systems,” Manage. Sci., vol. 33, no. 5, pp. 589–609, May 1987.
[32] Z. Pawlak, “Rough set approach to knowledge-based decision support,” Eur. J. Oper. Res., vol. 99, no. 1, pp. 48–57, May 1997.
[33] M. H. Richer and M. H., AI tools and techniques. Ablex Pub, 1989.
[34] Y. Li, D. McLean, Z. A. Bandar, J. D. O’Shea, and K. Crockett, “Sentence similarity based on semantic nets and corpus statistics,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 8, pp. 1138–1150, Aug. 2006.
[35] R. Rada, H. Mili, E. Bicknell, and M. Blettner, “Development and application of a metric on semantic nets,” IEEE Trans. Syst. Man. Cybern., vol. 19, no. 1, pp. 17–30, 1989.
[36] S. Mankovskii, M. Gogolla, S. D. Urban, S. W. Dietrich, S. D. Urban, S. W. Dietrich, M.-H. Yang, G. Dobbie, T. W. Ling, T. Halpin, B. Kemme, N. Schweikardt, A. Abelló, O. Romero, R. Jimenez-Peris, R. Stevens, P. Lord, T. Gruber, P. De Leenheer, A. Gal, S. Bechhofer, N. W. Paton, C. Li, A. Buchmann, N. Hardavellas, I. Pandis, B. Liu, M. Shapiro, L. Bellatreche, P. M. D. Gray, W. M. P. Aalst, N. Palmer, N. Palmer, T. Risch, W. Galuba, S. Girdzijauskas, and S. Bechhofer, “OWL: Web Ontology Language,” in Encyclopedia of Database Systems, Boston, MA: Springer US, 2009, pp. 2008–2009.
[37] N. Guarino, “Formal Ontology and Information Systems,” in Proceedings of FOIS’98, 1998, pp. 3–15.
[38] T. R. Gruber, “A Translation Approach to Portable Ontology Specifications,” Appear. Knowl. Acquis., vol. 5, no. 2, pp. 199–220, 1993.
[39] A. Singhal, “Modern information retrieval: A brief overview,” IEEE Data Eng. Bull., vol. 24, no. 4, pp. 35–43, 2001.
123
[40] H. Hong, Y. Yin, and X. Chen, “Ontological modelling of knowledge management for human–machine integrated design of ultra-precision grinding machine,” Enterp. Inf. Syst., vol. 10, no. 9, pp. 970–981, 2016.
[41] P. Sainter, K. Oldham, A. Larkin, A. Murton, and R. Brimble, “Product knowledge management within knowledge-based engineering systems,” in Design Engineering Technical Conference, Baltimore, Setembro, 2000.
[42] S. Sunnersjö, “A taxonomy of engineering knowledge for design automation,” in Proceedings of TMCE 2010 symposium, April 12-16, 2010.
[43] S. K. Chandrasegaran, K. Ramani, R. D. Sriram, I. Horváth, A. Bernard, R. F. Harik, and W. Gao, “The evolution, challenges, and future of knowledge representation in product design systems,” Comput. Des., vol. 45, no. 2, pp. 204–228, 2013.
[44] R. Owen and I. Horváth, “Towards product-related knowledge asset warehousing in enterprises,” in Proceedings of the 4th international symposium on tools and methods of competitive engineering, TMCE, April 22-26, Wuhan, China, 2002, vol. 2002, pp. 155–170.
[45] I. Nonaka, The knowledge-creating company. Harvard Business Press, 2008.
[46] J. F. Sowa, Knowledge representation: logical, philosophical, and computational foundations, vol. 13. MIT Press, 2000.
[47] S. Gorti, A. Gupta, G. Kim, R. Sriram, and A. Wong, “An object-oriented representation for product and design processes,” Comput. Des., vol. 30, no. 7, pp. 489–501, Jun. 1998.
[48] Y. Rezgui, S. Boddy, M. Wetherill, and G. Cooper, “Past, present and future of information and knowledge sharing in the construction industry: Towards semantic service-based e-construction?,” Comput. Des., vol. 43, no. 5, pp. 502–515, 2011.
[49] Z. Li, V. Raskin, and K. Ramani, “Developing Engineering Ontology for Information Retrieval,” J. Comput. Inf. Sci. Eng., vol. 8, no. 1, p. 11003, Mar. 2008.
[50] M. N. Huhns and M. P. Singh, “Ontologies for agents,” IEEE Internet Comput., vol. 1, no. 6, pp. 81–83, 1997.
[51] G. La Rocca, “Knowledge based engineering: Between AI and CAD. Review of a language based technology to support engineering design,” Adv. Eng. Informatics, vol. 26, no. 2, pp. 159–179, Apr. 2012.
[52] G. La Rocca, “Knowledge based engineering techniques to support aircraft design and optimization,” TU Delft, 2011.
124
[53] P. . Lovett, A. Ingram, and C. . Bancroft, “Knowledge-based engineering for SMEs — a methodology,” J. Mater. Process. Technol., vol. 107, no. 1–3, pp. 384–389, Nov. 2000.
[54] G. La Rocca and M. J. L. Van Tooren, “Enabling distributed multi-disciplinary design of complex products: a knowledge based engineering approach,” J. Des. Res., vol. 5, no. 3, p. 333, 2007.
[55] A. H. Van Der Laan and M. J. L. Van Tooren, “Parametric Modeling of Movables for Structural Analysis,” J. Aircr., vol. 42, no. 6, pp. 1605–1613, Nov. 2005.
[56] R. Van Dijk, R. d’Ippolito, G. Tosi, and G. La Rocca, “Multidisciplinary design and optimization of a plastic injection mold using an integrated design and engineering environment,” in NAFEMS World Congress, Boston, 2011.
[57] D. Wu, E. Coatanea, and G. G. Wang, “Employing Knowledge on Causal Relationship to Assist Multidisciplinary Design Optimization,” J. Mech. Des., vol. 141, no. 4, p. 41402, Jan. 2019.
[58] J. R. R. a Martins and A. B. Lambe, “Multidisciplinary Design Optimization: A Survey of Architectures,” AIAA J., vol. 51, no. 9, pp. 2049–2075, 2013.
[59] R. S. Krishnamachari and P. Y. Papalambros, “Optimal Hierarchical Decomposition Synthesis Using Integer Programming,” J. Mech. Des., vol. 119, no. 4, pp. 440–447, Dec. 1997.
[60] N. F. Michelena and P. Y. Papalambros, “A Network Reliability Approach to Optimal Decomposition of Design Problems,” J. Mech. Des., vol. 117, no. 3, pp. 433–440, 1995.
[61] N. F. Michelena and P. Y. Papalambros, “A Hypergraph Framework for Optimal Model-Based Decomposition of Design Problems,” Comput. Optim. Appl., vol. 8, no. 2, pp. 173–196, 1997.
[62] T. C. Wagner and P. Y. Papalambros, “General framework for decomposition analysis in optimal design.,” ASME Des Eng Div Publ De., ASME, New York, NY(USA), vol. 65, pp. 315–325, 1993.
[63] L. Chen, Z. Ding, and S. Li, “A formal two-phase method for decomposition of complex design problems,” J. Mech. Des., vol. 127, no. 2, pp. 184–195, 2005.
[64] J. Sobieszczanski-Sobieski, “Optimization by decomposition: A step from hierarchic to non-hierarchic systems,” NASA Tech. Rep., pp. 51–78, 1988.
[65] D. R. Braun, “Collaborative optimization: an architecture for large-scale distributed design.” Department of Aeronautics and Astronautics, Stanford University, Standford, CA, 1996.
125
[66] J. Sobieszczanski -Sobieski, J. S. Agte, and R. Sandusky, “Bi-Level Integrated System Synthesis,” AIAA J., vol. 38, no. 1, pp. 164–172, 2000.
[67] N. P. Tedford and J. R. R. A. Martins, “Benchmarking multidisciplinary design optimization algorithms,” Optim. Eng., vol. 11, no. 1, pp. 159–183, 2010.
[68] D. Morris, A. Antoniades, and C. C. Took, “On making sense of neural networks in road analysis,” in Proceedings of 2017 International Joint Conference on Neural Networks (IJCNN), May 14-19, 2017, pp. 4416–4421.
[69] R. Jin, W. Chen, and T. W. Simpson, “Comparative studies of metamodelling techniques under multiple modelling criteria,” Struct. Multidiscip. Optim., vol. 23, no. 1, pp. 1–13, Dec. 2001.
[70] D. Beasley, D. R. Bull, and R. R. Martin, “An overview of genetic algorithms: Part 2, research topics,” Univ. Comput., vol. 15, no. 4, pp. 170–181, 1993.
[71] S. J. Louis and F. Zhao, “Incorporating problem specific information in genetic algorithms,” Children, vol. 1, no. P2, p. C2, 1994.
[72] Y. Hu and S. X. Yang, “A knowledge based genetic algorithm for path planning of a mobile robot,” in Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on, 2004, vol. 5, pp. 4350–4355.
[73] H. Piroozfard, K. Y. Wong, and A. Hassan, “A hybrid genetic algorithm with a knowledge-based operator for solving the job shop scheduling problems,” J. Optim., vol. 2016, pp. 1–13, 2016.
[74] E. H. Winer and C. L. Bloebaum, “Development of visual design steering as an aid in large-scale multidisciplinary design optimization . Part I : method development,” Struct. Multidiscip. Optim., vol. 23, no. 6, pp. 412–424, 2002.
[75] E. H. Winer and C. L. Bloebaum, “Development of visual design steering as an aid in large-scale multidisciplinary design optimization. Part II: method validation,” Struct. Multidiscip. Optim., vol. 23, no. 6, pp. 425–435, Jul. 2002.
[76] A. I. J. Forrester, A. Sóbester, and A. J. Keane, “Multi-fidelity optimization via surrogate modelling,” Proc. R. Soc. A Math. Phys. Eng. Sci., vol. 463, no. 2088, pp. 3251–3269, Dec. 2007.
[77] Fang Wang and Qi-Jun Zhang, “Knowledge-based neural models for microwave design,” IEEE Trans. Microw. Theory Tech., vol. 45, no. 12, pp. 2333–2343, 1997.
[78] Z. Yang, D. Eddy, S. Krishnamurty, I. Grosse, P. Denno, Y. Lu, and P. Witherell, “Investigating Grey-Box Modeling for Predictive Analytics in Smart Manufacturing,” in Proceedings of ASME 2017 International Design Engineering Technical Conferences , August 6-9, 2017, p. V02BT03A024.
126
[79] M. Kurek, M. P. Deisenroth, W. Luk, and T. Todman, “Knowledge Transfer in Automatic Optimisation of Reconfigurable Designs,” in Proceedings of 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 1-3, 2016, pp. 84–87.
[80] M. Kurek, T. Becker, T. C. P. Chau, and W. Luk, “Automating Optimization of Reconfigurable Designs,” in Proceedings of 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, May 11-13, 2014, pp. 210–213.
[81] C. Ding, X. He, H. Zha, and H. D. Simon, “Adaptive dimension reduction for clustering high dimensional data,” in Proceedings of 2002 IEEE International Conference on Data Mining, Dec 9-12, 2002, pp. 147–154.
[82] M. D. Morris and T. J. Mitchell, “Exploratory designs for computational experiments,” J. Stat. Plan. Inference, vol. 43, no. 3, pp. 381–402, 1995.
[83] M. H. Karwan, V. Lotfi, J. Telgen, and S. Zionts, Redundancy in mathematical programming: A state-of-the-art survey, vol. 206. Springer Science & Business Media, 2012.
[84] Z.-L. Liu, Z. Zhang, and Y. Chen, “A scenario-based approach for requirements management in engineering design,” Concurr. Eng., vol. 20, no. 2, pp. 99–109, Jun. 2012.
[85] W. Chen and M. Fuge, “Beyond the Known: Detecting Novel Feasible Domains Over an Unbounded Design Space,” J. Mech. Des., vol. 139, no. 11, p. 111405, Oct. 2017.
[86] B. J. Larson and C. A. Mattson, “Design Space Exploration for Quantifying a System Model’s Feasible Domain,” J. Mech. Des., vol. 134, no. 4, p. 41010, Apr. 2012.
[87] T. H. Lee and J. J. Jung, “A sampling technique enhancing accuracy and efficiency of metamodel-based RBDO: Constraint boundary sampling,” Comput. Struct., vol. 86, no. 13–14, pp. 1463–1476, Jul. 2008.
[88] H. Z. Yang, J. F. Chen, N. Ma, and D. Y. Wang, “Implementation of knowledge-based engineering methodology in ship structural design,” Comput. Des., vol. 44, no. 3, pp. 196–202, Mar. 2012.
[89] P. Geyer, “Component-oriented decomposition for multidisciplinary design optimization in building design,” Adv. Eng. Informatics, vol. 23, no. 1, pp. 12–31, 2009.
[90] S. Ahmed, S. Kim, and K. M. Wallace, “A Methodology for Creating Ontologies for Engineering Design,” J. Comput. Inf. Sci. Eng., vol. 7, no. 2, pp. 132–140, Jun. 2007.
127
[91] J. Jinxin Lin, M. S. Fox, and T. Bilgic, “A Requirement Ontology for Engineering Design,” Concurr. Eng., vol. 4, no. 3, pp. 279–291, Sep. 1996.
[92] E. Stachtiari, A. Mavridou, P. Katsaros, S. Bliudze, and J. Sifakis, “Early validation of system requirements and design through correctness-by-construction,” J. Syst. Softw., vol. 145, pp. 52–78, Nov. 2018.
[93] D. Wu, E. Coatanea, and G. G. Wang, “Dimension Reduction and Decomposition Using Causal Graph and Qualitative Analysis for Aircraft Concept Design Optimization,” in Volume 2B: 43rd Design Automation Conference, 2017.
[94] A. Viswanath, A. I. J. Forrester, and A. J. Keane, “Dimension Reduction for Aerodynamic Design Optimization,” AIAA J., vol. 49, no. 6, pp. 1256–1266, 2011.
[95] K. Sutha and J. J. Tamilselvi, “A review of feature selection algorithms for data mining techniques,” Int. J. Comput. Sci. Eng., vol. 7, no. 6, p. 63, 2015.
[96] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, Jan. 2014.
[97] I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Mach. Learn. Res., vol. 3, no. Mar, pp. 1157–1182, 2003.
[98] C. Lazar, J. Taminau, S. Meganck, D. Steenhoff, A. Coletta, C. Molter, V. de Schaetzen, R. Duque, H. Bersini, and A. Nowe, “A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis,” IEEE/ACM Trans. Comput. Biol. Bioinforma., vol. 9, no. 4, pp. 1106–1119, Jul. 2012.
[99] J. Reunanen, “Overfitting in Making Comparisons Between Variable Selection Methods,” J. Mach. Learn. Res., vol. 3, no. Mar, pp. 1371–1382, 2003.
[100] A. Alexandridis, P. Patrinos, H. Sarimveis, and G. Tsekouras, “A two-stage evolutionary algorithm for variable selection in the development of RBF neural network models,” Chemom. Intell. Lab. Syst., vol. 75, no. 2, pp. 149–162, Feb. 2005.
[101] S. Shan and G. G. Wang, “Turning Black-Box Functions Into White Functions,” J. Mech. Des., vol. 133, no. 3, p. 31003, 2011.
[102] K. Haji Hajikolaei, G. H. Cheng, and G. G. Wang, “Optimization on Metamodeling-Supported Iterative Decomposition,” J. Mech. Des., vol. 138, no. 2, p. 21401, Dec. 2015.
[103] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.
[104] A. Ghanbari, S. M. R. Kazemi, F. Mehmanpazir, and M. M. Nakhostin, “A Cooperative Ant Colony Optimization-Genetic Algorithm approach for construction of energy demand forecasting knowledge-based expert systems,” Knowledge-
128
Based Syst., vol. 39, pp. 194–206, Feb. 2013.
[105] M. H. Fazel Zarandi, B. Rezaee, I. B. Turksen, and E. Neshat, “A type-2 fuzzy rule-based expert system model for stock price analysis,” Expert Syst. Appl., vol. 36, no. 1, pp. 139–154, Jan. 2009.
[106] J. Zhang, Z. Ghahramani, and Y. Yang, “Flexible latent variable models for multi-task learning,” Mach. Learn., vol. 73, no. 3, pp. 221–242, Dec. 2008.
[107] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient Global Optimization of Expensive Black-Box Functions,” J. Glob. Optim., vol. 13, no. 4, pp. 455–492, 1998.
[108] G. G. Wang, “Adaptive Response Surface Method Using Inherited Latin Hypercube Design Points,” J. Mech. Des., vol. 125, no. 2, pp. 210–220, 2003.
[109] G. G. Wang, Z. Dong, and P. Attchison, “Adaptive Response Surface Method - A Global Optimization Scheme for Approximation-based Design Problems,” Eng. Optim., vol. 33, no. 6, pp. 707–733, Aug. 2001.
[110] T. Long, D. Wu, X. Guo, G. G. Wang, and L. Liu, “Efficient adaptive response surface method using intelligent space exploration strategy,” Struct. Multidiscip. Optim., vol. 51, no. 6, pp. 1335–1362, Jun. 2015.
[111] T. Wuest, D. Weimer, C. Irgens, and K.-D. Thoben, “Machine learning in manufacturing: advantages, challenges, and applications,” Prod. Manuf. Res., vol. 4, no. 1, pp. 23–45, Jan. 2016.
[112] G. Köksal, İ. Batmaz, and M. C. Testik, “A review of data mining applications for quality improvement in manufacturing industry,” Expert Syst. Appl., vol. 38, no. 10, pp. 13448–13467, Sep. 2011.
[113] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[114] R. Shi, L. Liu, T. Long, and J. Liu, “Sequential Radial Basis Function Using Support Vector Machine for Expensive Design Optimization,” AIAA J., vol. 55, no. 1, pp. 214–227, Jan. 2017.
[115] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, Oct. 1986.
[116] J. Wang, Y. Ma, L. Zhang, and R. X. Gao, “Deep learning for smart manufacturing: Methods and applications,” J. Manuf. Syst., vol. 48, pp. 144–156, Jul. 2018.
[117] O. Maimon and L. Rokach, Eds., Data Mining and Knowledge Discovery Handbook. Boston, MA: Springer US, 2010.
129
[118] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, “Data mining with big data,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 1, pp. 97–107, Jan. 2014.
[119] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[120] İ. B. Topçu and M. Sarıdemir, “Prediction of compressive strength of concrete containing fly ash using artificial neural networks and fuzzy logic,” Comput. Mater. Sci., vol. 41, no. 3, pp. 305–311, Jan. 2008.
[121] S. Tasdemir, I. Saritas, M. Ciniviz, and N. Allahverdi, “Artificial neural network and fuzzy expert system comparison for prediction of performance and emission parameters on a gasoline engine,” Expert Syst. Appl., vol. 38, no. 11, pp. 13912–13923, Oct. 2011.
[122] H. Rabitz, Ö. Alis, and Ö. F. Alış, “General foundations of high-dimensional model representations,” J. Math. Chem., vol. 25, no. 2–3, pp. 197–233, 1999.
[123] S. Shan and G. G. Wang, “Metamodeling for High Dimensional Simulation-Based Design Problems,” J. Mech. Des., vol. 132, no. May 2010, p. 51009, 2010.
[124] X. Cai, H. Qiu, L. Gao, P. Yang, and X. Shao, “An enhanced RBF-HDMR integrated with an adaptive sampling method for approximating high dimensional problems in engineering design,” Struct. Multidiscip. Optim., vol. 53, no. 6, pp. 1209–1229, 2016.
[125] Z. Huang and H. Qiu, “An adaptive SVR-HDMR model for approximating high dimensional problems,” Eng. Comput. Int. J. Comput. Eng. Softw., vol. 32, no. 3, pp. 643–667, 2015.
[126] H. Wang, L. Tang, and G. Y. Li, “Adaptive MLS-HDMR metamodeling techniques for high dimensional problems,” Expert Syst. Appl., vol. 38, no. 11, pp. 14117–14126, 2011.
[127] E. Ebrahimi, M. Monjezi, M. R. Khalesi, and D. J. Armaghani, “Prediction and optimization of back-break and rock fragmentation using an artificial neural network and a bee colony algorithm,” Bull. Eng. Geol. Environ., vol. 75, no. 1, pp. 27–36, Feb. 2016.
[128] H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for short-term load forecasting: a review and evaluation,” IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, 2001.
[129] D. J. Fonseca, D. O. Navaresse, and G. P. Moynihan, “Simulation metamodeling through artificial neural networks,” Eng. Appl. Artif. Intell., vol. 16, no. 3, pp. 177–183, Apr. 2003.
[130] G. Zhang, B. Eddy Patuwo, and M. Y. Hu, “Forecasting with artificial neural
130
networks:: The state of the art,” Int. J. Forecast., vol. 14, no. 1, pp. 35–62, Mar. 1998.
[131] B. Cheng and D. M. Titterington, “Neural networks: A review from a statistical perspective,” Stat. Sci., pp. 2–30, 1994.
[132] R. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Mag., vol. 4, no. 2, pp. 4–22, 1987.
[133] F. S. Wong, “Time series forecasting using backpropagation neural networks,” Neurocomputing, vol. 2, no. 4, pp. 147–159, Jul. 1991.
[134] S. Y. Kang, “An investigation of the use of feedforward neural networks for forecasting.,” Kent State University, 1992.
[135] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[136] J. Liu, M. Gong, Q. Miao, X. Wang, and H. Li, “Structure Learning for Deep Neural Networks Based on Multiobjective Optimization,” IEEE Trans. Neural Networks Learn. Syst., vol. 29, no. 6, pp. 2450–2463, Jun. 2018.
[137] V. Maniezzo, “Genetic evolution of the topology and weight distribution of neural networks,” IEEE Trans. Neural Networks, vol. 5, no. 1, pp. 39–53, 1994.
[138] I. Ben-Gal, F. Ruggeri, F. Faltin, and R. Kenett, “Bayesian networks, encyclopedia of statistics in quality and reliability.” John Wiley and Sons, 2007.
[139] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 2014.
[140] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Mach. Learn., vol. 29, no. 2–3, pp. 131–163, 1997.
[141] P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction, and search. MIT press, 2000.
[142] E. Coatanea, R. Roca, H. Mokhtarian, F. Mokammel, and K. Ikkala, “A Conceptual Modeling and Simulation Framework for System Design,” Comput. Sci. Eng., vol. 18, no. 4, pp. 42–52, Jul. 2016.
[143] E. Adorio and U. Diliman, “MVF–multivariate test functions library in c for unconstrained global optimization,” pp. 1–56, 2005.
[144] K. Schittkowski, “More Test Examples for Nonlinear Programming Codes,” in Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, 1987.
131
[145] X. Duan, G. G. Wang, X. Kang, Q. Niu, G. Naterer, and Q. Peng, “Performance study of mode-pursuing sampling method,” Eng. Optim., vol. 41, no. 1, pp. 1–21, 2009.
[146] N. M. Alexandrov, J. E. Dennis, R. M. Lewis, and V. Torczon, “A trust-region framework for managing the use of approximation models in optimization,” Struct. Optim., vol. 15, no. 1, pp. 16–23, Feb. 1998.
[147] B. Kulfan and J. Bussoletti, “‘Fundamental’ Parameteric Geometry Representations for Aircraft Component Shapes,” 11th AIAA/ISSMO Multidiscip. Anal. Optim. Conf., vol. 1, pp. 547–591, 2006.
[148] “XFOIL.” [Online]. Available: http://web.mit.edu/drela/Public/web/xfoil/.
[149] J. Warfield, “Binary Matrices in System Modeling,” IEEE Trans. Syst. Man. Cybern., vol. SMC-3, no. 5, pp. 441–449, 1973.
[150] M. S. Phadke, “Quality Engineering Using Design of Experiment, Quality Control, Robust Design and Taguchi Method,” Wadsworth, Los Angeles, CA, 1998.
[151] J. . Ghani, I. . Choudhury, and H. . Hassan, “Application of Taguchi method in the optimization of end milling parameters,” J. Mater. Process. Technol., vol. 145, no. 1, pp. 84–92, 2004.
[152] “Test Suite Problem 2.5, POWER CONVERTER,” NASA MultiDisciplinary Optimization Branch, 2018. [Online]. Available: http://www.eng.buffalo.edu/Research/MODEL/mdo.test.orig/class2prob5/descr.html.
[153] D. Wang, G. G. Wang, and G. F. Naterer, “Extended collaboration pursuing method for solving larger multidisciplinary design optimization problems,” AIAA J., vol. 45, no. 6, p. 14, 2007.
[154] D. Wang, G. Wang, and G. Naterer, “Collaboration Pursuing Method for MDO Problems,” AIAA J., vol. 45, no. 5, pp. 1091–1103, 2007.
[155] D. Wu, E. Coatanea, and G. G. Wang, “Dimension Reduction and Decomposition Using Causal Graph and Qualitative Analysis for Aircraft Concept Design Optimization,” in Volume 2B: 43rd Design Automation Conference, 2017, p. V02BT03A035.
[156] L. G. Swan and V. I. Ugursal, “Modeling of end-use energy consumption in the residential sector: A review of modeling techniques,” Renew. Sustain. Energy Rev., vol. 13, no. 8, pp. 1819–1835, Oct. 2009.
[157] E. Hirst, W. Lin, and J. Cope, “Residential energy use model sensitive to demographic, economic, and technological factors,” Q. Rev. Econ. Bus., 1977.
[158] H. K. Ozturk, O. E. Canyurt, A. Hepbasli, and Z. Utlu, “Residential-commercial energy input estimation based on genetic algorithm (GA) approaches: an application of Turkey,” Energy Build., vol. 36, no. 2, pp. 175–183, Feb. 2004.
[159] Q. Zhang, “Residential energy consumption in China and its comparison with Japan, Canada, and USA,” Energy Build., vol. 36, no. 12, pp. 1217–1225, Dec. 2004.
[160] R. Haas and L. Schipper, “Residential energy demand in OECD-countries and the role of irreversible efficiency improvements,” Energy Econ., vol. 20, no. 4, pp. 421–442, Sep. 1998.
[161] M. Kavgic, A. Mavrogianni, D. Mumovic, A. Summerfield, Z. Stevanovic, and M. Djurovic-Petrovic, “A review of bottom-up building stock models for energy consumption in the residential sector,” Build. Environ., vol. 45, no. 7, pp. 1683–1697, Jul. 2010.
[162] R. Ghedamsi, N. Settou, A. Gouareh, A. Khamouli, N. Saifi, B. Recioui, and B. Dokkar, “Modeling and forecasting energy consumption for residential buildings in Algeria using bottom-up approach,” Energy Build., vol. 121, pp. 309–317, Jun. 2016.
[163] L. G. Swan, V. I. Ugursal, and I. Beausoleil-Morrison, “Occupant related household energy consumption in Canada: Estimation using a bottom-up neural-network technique,” Energy Build., vol. 43, no. 2–3, pp. 326–337, Feb. 2011.
[164] J. Yang, H. Rivard, and R. Zmeureanu, “Building energy prediction with adaptive artificial neural networks,” in Ninth International IBPSA Conference, Montréal, Canada, August, 2005, pp. 15–18.
[165] E. Hirst, R. Goeltz, and D. White, “Determination of household energy using ‘fingerprints’ from energy billing data,” Int. J. Energy Res., vol. 10, no. 4, pp. 393–405, Oct. 1986.
[166] Y. Ji and P. Xu, “A bottom-up and procedural calibration method for building energy simulation models based on hourly electricity submetering data,” Energy, vol. 93, pp. 2337–2350, Dec. 2015.
[167] H. Farahbakhsh, V. I. Ugursal, and A. S. Fung, “A residential end-use energy consumption model for Canada,” Int. J. Energy Res., vol. 22, no. 13, pp. 1133–1143, Oct. 1998.
[168] A. Capasso, W. Grattieri, R. Lamedica, and A. Prudenzi, “A bottom-up approach to residential load modeling,” IEEE Trans. Power Syst., vol. 9, no. 2, pp. 957–964, May 1994.
[169] R. Kadian, R. P. Dahiya, and H. P. Garg, “Energy-related emissions and mitigation opportunities from the household sector in Delhi,” Energy Policy, vol. 35, no. 12, pp. 6195–6211, Dec. 2007.
[170] J. T. Wilkerson, D. Cullenward, D. Davidian, and J. P. Weyant, “End use technology choice in the National Energy Modeling System (NEMS): An analysis of the residential and commercial building sectors,” Energy Econ., vol. 40, pp. 773–784, 2013.
Appendix A. Numerical Benchmark Functions
SUR-T1-14 function, \(n = 10, 20, 30\)
\[
f(\mathbf{x}) = (x_1 - 1)^2 + (x_n - 1)^2 + n \sum_{i=1}^{n-1} (n - i)\left(x_i^2 - x_{i+1}\right)^2
\]
\[
-3 \le x_i \le 2, \quad i = 1, 2, \dots, n
\]
(A-1)
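For concreteness, (A-1) can be evaluated with a few lines of Python; the function name and list-based interface below are illustrative choices, not part of the benchmark definition. At \(x_i = 1\) for all \(i\), every term vanishes, so the function should return 0:

```python
def sur_t1_14(x):
    """Evaluate the SUR-T1-14 function (A-1) at a point x of length n."""
    n = len(x)
    # Weighted sum over consecutive pairs; weight (n - i) for i = 1..n-1
    tail = sum((n - (i + 1)) * (x[i] ** 2 - x[i + 1]) ** 2 for i in range(n - 1))
    return (x[0] - 1) ** 2 + (x[n - 1] - 1) ** 2 + n * tail
```

For example, `sur_t1_14([1.0] * 10)` evaluates to 0, consistent with the known optimum at \(x_i = 1\).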
Rosenbrock function, \(n = 10\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n-1} \left( 100\left(x_{i+1} - x_i^2\right)^2 + (x_i - 1)^2 \right)
\]
\[
-5 \le x_i \le 5, \quad i = 1, 2, \dots, n
\]
(A-2)
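A direct Python transcription of (A-2) follows; the name and interface are illustrative. The global minimum is 0 at \(x_i = 1\):

```python
def rosenbrock(x):
    """Evaluate the Rosenbrock function (A-2) at a point x."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))
```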
Trid function, \(n = 10\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n} (x_i - 1)^2 - \sum_{i=2}^{n} x_i x_{i-1}
\]
\[
-n^2 \le x_i \le n^2, \quad i = 1, 2, \dots, n
\]
(A-3)
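A Python sketch of (A-3), with an illustrative name and interface. The Trid function has the known minimum \(f^* = -n(n+4)(n-1)/6\) at \(x_i = i(n+1-i)\), which for \(n = 10\) gives \(f^* = -210\):

```python
def trid(x):
    """Evaluate the Trid function (A-3) at a point x."""
    square_term = sum((xi - 1) ** 2 for xi in x)
    cross_term = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return square_term - cross_term
```

For example, `trid([i * (11 - i) for i in range(1, 11)])` evaluates to \(-210\).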
F16 function, \(n = 16\)
\[
f(\mathbf{x}) = \sum_{i=1}^{16} \sum_{j=1}^{16} a_{ij} \left(x_i^2 + x_i + 1\right)\left(x_j^2 + x_j + 1\right)
\]
\[
-1 \le x_i \le 1, \quad i = 1, 2, \dots, n
\]
(A-4)
where \(a_{ij}\) denotes the entry in row \(i\), column \(j\) of the fixed \(16 \times 16\) binary (0–1) coefficient matrix that defines the F16 benchmark.
Griewank function, \(n = 10, 20, 30\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1
\]
\[
-300 \le x_i \le 300, \quad i = 1, 2, \dots, n
\]
(A-5)
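The Griewank function (A-5) can be transcribed directly into Python; the name and list interface are illustrative. At the origin the sum term is 0 and the product of cosines is 1, so the value is 0:

```python
import math

def griewank(x):
    """Evaluate the Griewank function (A-5) at a point x."""
    square_sum = sum(xi ** 2 for xi in x) / 4000.0
    # Product of cos(x_i / sqrt(i)) with 1-based index i
    cos_prod = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return square_sum - cos_prod + 1.0
```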
Ackley function, \(n = 10, 20, 30\)
\[
f(\mathbf{x}) = 20 + e - 20\, e^{-\frac{1}{5}\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}} - e^{\frac{1}{n}\sum_{i=1}^{n} \cos(2\pi x_i)}
\]
\[
-30 \le x_i \le 30, \quad i = 1, 2, \dots, n
\]
(A-6)
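A Python sketch of (A-6), with illustrative naming. At the origin both exponentials equal \(e^0 = 1\) and \(e^1 = e\), respectively, so the terms cancel and the value is 0:

```python
import math

def ackley(x):
    """Evaluate the Ackley function (A-6) at a point x."""
    n = len(x)
    rms_term = -0.2 * math.sqrt(sum(xi ** 2 for xi in x) / n)  # -1/5 coefficient
    cos_term = sum(math.cos(2 * math.pi * xi) for xi in x) / n
    return 20 + math.e - 20 * math.exp(rms_term) - math.exp(cos_term)
```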
Rastrigin function, \(n = 20\)
\[
f(\mathbf{x}) = 10 \times 20 + \sum_{i=1}^{20} \left(x_i^2 - 10\cos(2\pi x_i)\right)
\]
\[
-5.12 \le x_i \le 5.12, \quad i = 1, 2, \dots, 20
\]
(A-7)
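The Rastrigin function (A-7) in Python, written for a general dimension \(n = \) `len(x)` (an illustrative generalization of the fixed \(n = 20\) above). Its global minimum is 0 at the origin:

```python
import math

def rastrigin(x):
    """Evaluate the Rastrigin function (A-7) at a point x."""
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)
```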
SUR-T1-16, \(n = 20\)
\[
f(\mathbf{x}) = \sum_{i=1}^{5} \left[ (x_i + 10 x_{i+5})^2 + 5(x_{i+10} - x_{i+15})^2 + (x_{i+5} - 2 x_{i+10})^4 + 10(x_i - x_{i+15})^4 \right]
\]
\[
-2 \le x_i \le 5, \quad i = 1, 2, \dots, n
\]
(A-8)
Powell function, \(n = 20\)
\[
f(\mathbf{x}) = \sum_{i=1}^{n/4} \left[ (x_{4i-3} + 10 x_{4i-2})^2 + 5(x_{4i-1} - x_{4i})^2 + (x_{4i-2} - 2 x_{4i-1})^4 + 10(x_{4i-3} - x_{4i})^4 \right]
\]
\[
-4 \le x_j \le 5, \quad j = 1, 2, \dots, n
\]
(A-9)
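The standard Powell function (A-9) groups the variables in blocks of four; a Python sketch (illustrative name and interface) makes the indexing explicit. The global minimum is 0 at the origin:

```python
def powell(x):
    """Evaluate the Powell function (A-9); len(x) must be a multiple of 4."""
    total = 0.0
    for i in range(1, len(x) // 4 + 1):
        # 1-based indices 4i-3, 4i-2, 4i-1, 4i mapped to 0-based list positions
        x1, x2, x3, x4 = x[4 * i - 4], x[4 * i - 3], x[4 * i - 2], x[4 * i - 1]
        total += ((x1 + 10 * x2) ** 2 + 5 * (x3 - x4) ** 2
                  + (x2 - 2 * x3) ** 4 + 10 * (x1 - x4) ** 4)
    return total
```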
Perm function, \(n = 20\)
\[
f(\mathbf{x}) = \sum_{k=1}^{n} \left[ \sum_{i=1}^{n} \left(i^k + \beta\right)\left(\left(x_i / i\right)^k - 1\right) \right]^2
\]
\[
-n \le x_i \le n, \quad i = 1, 2, \dots, n
\]
\[
\beta = 0.5
\]
(A-10)
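A Python sketch of (A-10), with illustrative naming and \(\beta\) exposed as a keyword argument. Since \((x_i/i)^k = 1\) whenever \(x_i = i\), the inner sum vanishes and the global minimum is 0 at \(x_i = i\):

```python
def perm(x, beta=0.5):
    """Evaluate the Perm function (A-10) at a point x."""
    n = len(x)
    total = 0.0
    for k in range(1, n + 1):
        inner = sum((i ** k + beta) * ((x[i - 1] / i) ** k - 1)
                    for i in range(1, n + 1))
        total += inner ** 2
    return total
```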
Appendix B. List of Publications during PhD Studies
Journals
D. Wu, K. H. Hajikolaei, and G. G. Wang, “Employing partial metamodels for
optimization with scarce samples,” Struct. Multidiscip. Optim., pp. 1–15, Sep. 2017.
D. Wu, E. Coatanea, and G. G. Wang, “Employing Knowledge on Causal Relationship to
Assist Multidisciplinary Design Optimization,” J. Mech. Des., vol. 141, no. 4, p. 041402,
Jan. 2019.
E. T. Woldemariam, E. Coatanéa, G. G. Wang, H. G. Lemu, and D. Wu, “Customized
dimensional analysis conceptual modelling framework for design optimization—a case
study on the cross-flow micro turbine model,” Eng. Optim., vol. 51, no. 7, pp. 1168–1184,
Jul. 2019.
D. Wu and G. G. Wang, “Knowledge Assisted Optimization for Large-scale Design
Problems: A Review and Proposition,” Journal of Mechanical Design, Accepted with
revisions, 2019.
D. Wu and G. G. Wang, “Causal Artificial Neural Network and its Applications in
Engineering Design,” submitted to Expert Systems with Applications, 2019.
D. Wu, G. G. Wang, and H. Jarollahi, “Developing Causal-Artificial Neural Network in
Residential Energy Consumption Forecasting,” submitted to Expert Systems with
Applications, 2019.
Conferences
D. Wu, E. Coatanea, and G. G. Wang, “Dimension Reduction and Decomposition Using
Causal Graph and Qualitative Analysis for Aircraft Concept Design Optimization,” IDETC
2017-67601, Cleveland, Ohio, USA, August 6–9, 2017.
D. Wu and G. G. Wang, “Knowledge Assisted Optimization for Large-Scale Problems: A
Review and Proposition,” IDETC 2018-85325, Quebec City, Quebec, Canada, August
26–29, 2018.