ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN
BEHAVIOR IMITATION USING STATISTICAL MODELING
TECHNIQUES
By
Osama Salah Eldin Farag
A Thesis submitted to the
Faculty of Engineering at Cairo University in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
In
Computer Engineering
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2015
ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN
BEHAVIOR IMITATION USING STATISTICAL MODELING
TECHNIQUES
By
Osama Salah Eldin Farag
A Thesis submitted to the
Faculty of Engineering at Cairo University in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
In
Computer Engineering
Under the Supervision of
Prof. Dr. Magda Bahaa Eldin Fayek
Professor of
Computer Engineering
Faculty of Engineering, Cairo University
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2015
ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN
BEHAVIOR IMITATION USING STATISTICAL MODELING
TECHNIQUES
By
Osama Salah Eldin Farag
A Thesis submitted to the
Faculty of Engineering at Cairo University in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
In
Computer Engineering
Approved by the
Examining Committee
____________________________________________________________
Prof. Dr. Magda Bahaa Eldin Fayek, Thesis Main Advisor
Prof. Dr. Samia Abdulrazik Mashaly,
- Department of Computers and Systems,
Electronics Research Institute
Prof. Dr. Mohamed Moustafa Saleh
- Department of Operations Research &
Decision Support, Faculty of Computers
and Information, Cairo University
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2015
Engineer’s Name: Osama Salah Eldin Farag
Date of Birth: 11 / 2 / 1987
Nationality: Egyptian
E-mail: [email protected]
Phone: +20/ 100 75 34 156
Address: Egypt – Zagazig City
Registration Date: 1 / 10 / 2010
Awarding Date: / / 2015
Degree: Master of Science
Department: Computer Engineering
Supervisors:
Prof. Magda B. Fayek
Examiners:
Prof. Samia Abdulrazik Mashaly
Prof. Mohamed Moustafa Saleh
Prof. Magda B. Fayek
Title of Thesis:
Enhancing intelligent agents by improving human behavior imitation using
statistical modeling techniques
Keywords:
Intelligent agent; Cognitive agent; Human imitation; Evolutionary computation;
Machine Learning
Summary:
This thesis introduces a novel non-neurological method for modeling human
behaviors. It integrates statistical modeling techniques with “the society of mind” theory
to build a system that imitates human behaviors. The introduced Human Imitating
Cognitive Modeling Agent (HICMA) can autonomously change its behavior according
to the situation it encounters.
Acknowledgements
“All the praises and thanks be to Allah, Who has guided us to this, and never could we
have found guidance, were it not that Allah had guided us”
Immeasurable appreciation and deepest gratitude for the help and support are
extended to the following persons who, in one way or another, contributed to making this
work possible.
Sincere gratitude goes to Prof. Magda B. Fayek for her support, valuable advice,
guidance, precious comments, suggestions, and patience, all of which benefited me greatly
in completing this work. I heartily appreciate her effort to impart her experience and
knowledge to my work.
I would also like to acknowledge with much appreciation all participants in the
Robocode experiments. Many thanks to Ali El-Seddeek and his colleagues, the students of
the Computers department at the Faculty of Engineering, Cairo University. Many thanks
also to Ahmed Reda and his students at the Faculty of Engineering, Zagazig University,
and to my friends and coworkers who kindly participated in these experiments.
Deep thanks to Mahmoud Ali and Mohammed Hamdy for helping to obtain material
and information that supported this work.
Finally, I warmly thank my family, who have kept motivating me to move
forward. My deepest appreciation goes to all those who helped me complete this work.
Table of Contents
Acknowledgements ........................................................................................................ ix
Table of Contents ........................................................................................................... xi
List of Tables ................................................................................................................ xiv
List of Figures ............................................................................................................... xv
List of Abbreviations ................................................................................................ xviii
Nomenclature ............................................................................................................... xix
Abstract ........................................................................................................................ xxi
Chapter 1: Introduction ........................................................................................... 1
Problem Statement ...................................................................................... 1
Literature Review ....................................................................................... 1
Previous Work ............................................................................................ 2
Contributions of this Work ......................................................................... 4
Applications ................................................................................................ 4
1.5.1 Brain Model Functions ........................................................................ 5
1.5.2 Artificial Personality ........................................................................... 5
1.5.3 Ambient Intelligence and Internet of Things ...................................... 6
1.5.4 Ubiquitous Computing and Ubiquitous Robotics ............................... 7
Techniques .................................................................................................. 7
1.6.1 Feature Selection ................................................................................. 7
1.6.2 Modeling ........................................................................................... 10
Organization of the Thesis ........................................................................ 11
Chapter 2: Background ......................................................................................... 13
Introduction .............................................................................................. 13
The Society of Mind ................................................................................. 13
2.2.1 Introduction ...................................................................................... 13
2.2.2 Example – Building a Tower of Blocks ........................................... 13
Evolutionary Computation (EC) .............................................................. 16
Evolutionary Algorithms (EA) ................................................................ 16
Genetic Algorithms (GA) ........................................................................ 17
Optimization Problem .............................................................................. 19
2.6.1 Introduction ...................................................................................... 19
2.6.2 Mathematical Definition ................................................................... 20
Evolution Strategies ................................................................................. 23
2.7.1 Basic Evolution Strategies ................................................................ 23
2.7.2 Step-size Adaptation Evolution Strategy (σSA-ES ) ........................ 26
2.7.3 Cumulative Step-Size Adaptation (CSA) ......................................... 28
2.7.4 Covariance Matrix Adaptation Evolution Strategy (CMA-ES) ....... 30
Nelder-Mead Method ............................................................................... 38
2.8.1 Introduction ...................................................................................... 38
2.8.2 What is a simplex ............................................................................. 38
2.8.3 Operation .......................................................................................... 38
2.8.4 Nelder-Mead Algorithm ................................................................... 40
Robocode Game ....................................................................................... 44
2.9.1 Robot Anatomy................................................................................. 44
2.9.2 Robot Code ....................................................................................... 44
2.9.3 Scoring .............................................................................................. 44
Chapter 3: Human Imitating Cognitive Modeling Agent (HICMA) ................ 45
Introduction .............................................................................................. 45
The Structure of HICMA ......................................................................... 45
3.2.1 Modeling Agent ................................................................................ 46
3.2.2 Estimation Agent ............................................................................... 48
3.2.3 Shooting Agent .................................................................................. 48
3.2.4 Evolver Agents .................................................................................. 49
The Operation of HICMA ........................................................................ 55
Chapter 4: Experiments and Results .................................................................... 65
Human-Similarity Experiments ................................................................ 66
4.1.1 Human Behavior Imitation ................................................................ 66
4.1.2 Human Performance Similarity ......................................................... 69
Modeling-Agent Evolution ....................................................................... 70
Chapter 5: Conclusions and Future Work ........................................................... 77
Conclusions .............................................................................................. 77
Future Work .............................................................................................. 78
References...................................................................................................................... 79
Feature Selection ................................................................................. 85
A.1. Mutual Information .................................................................................. 85
A.1.1. Histogram Density Estimation .......................................................... 85
A.1.2. Kernel Density Estimation ................................................................ 87
A.2. Correlation ................................................................................................ 91
A.2.1. Pearson Correlation Coefficient (PCC) ............................................. 91
A.2.2. Distance Correlation .......................................................................... 92
List of Tables
Table 2-1: A simple CMA-ES code ............................................................................... 35
Table 2-2: An example of “fitness” function ................................................................. 36
Table 2-3: An example of “sortPop” function ............................................................... 36
Table 2-4: An example of “recomb” function ............................................................... 36
Table 2-5: Simplexes in different dimensions ............................................................... 38
Table 2-6: Nelder-Mead Algorithm ............................................................................... 40
Table 2-7: Iteration count for different initial guesses of Nelder-Mead Algorithm ...... 42
Table 3-1: The parameters of modeling agent ............................................................... 47
Table 3-2: The function lexicon ..................................................................................... 50
Table 4-1: Robocode simulation parameters ................................................................. 66
Table 4-2: Human behavior interpretation ..................................................................... 68
Table 4-3: Description of human behaviors modeled by mathematical functions ........ 68
Table 4-4: The initial state of HICMA .......................................................................... 69
Table 4-5: Human Players Data ..................................................................................... 69
Table 4-6: The parameters of Nelder-Mead evolver used in the experiment ................ 75
Table 4-7: Bad parameter values of Nelder-Mead evolver (used for verification) ......... 75
Table A-1: Common Kernel density functions .............................................................. 88
Table A-2: Rule-of-thumb constants ............................................................................. 89
Table A-3: Common kernel constants ........................................................................... 90
List of Figures
Fig. 1-1: A simple IoT system ......................................................................................... 7
Fig. 1-2: A model of battery-decay-rate ........................................................................... 8
Fig. 1-3: Mathematical modeling techniques utilized or experimented with in this work ... 8
Fig. 1-4: Modeling-related technique ............................................................................... 9
Fig. 2-1: AI Disciplines map ......................................................................................... 14
Fig. 2-2: A builder agent with its sub-agents .................................................................. 15
Fig. 2-3: Sub-agents of add agent ................................................................................... 15
Fig. 2-4: A general scheme of evolutionary algorithms ................................................. 17
Fig. 2-5: A 2D fitness landscape .................................................................................... 17
Fig. 2-6: Basic GA flowchart ......................................................................................... 18
Fig. 2-7: An objective function ...................................................................................... 19
Fig. 2-8: Local and global minima ................................................................................. 20
Fig. 2-9: An example application of mathematical optimization ................................... 21
Fig. 2-10 Minimization example in a 2D search space .................................................. 22
Fig. 2-11: Basic steps of evolution strategies ................................................................. 24
Fig. 2-12: Visualization of the search process of a (1/1,100)-ES ................................... 25
Fig. 2-13: One-Sigma ellipse of bivariate normal distribution N(0,I) [µ=0, σ=I] .......... 26
Fig. 2-14: Two random probability distributions with (a) σ = 1.0 and (b) σ = 3.0. The
circles are the one-sigma ellipses ............................................................................. 27
Fig. 2-15: A 2D normal distribution (a) 2D vector of points and (b) two 1D histograms
................................................................................................................................. 28
Fig. 2-16: The principle of Cumulative Step-size Adaptation (CSA), ........................... 28
Fig. 2-17: A population of (a) Step-size Adaptation and (b) Covariance Matrix
Adaptation ............................................................................................................... 30
Fig. 2-18: A 2D normal distribution N(0,C) [µ=0, σ=C]................................................ 31
Fig. 2-19: Optimization of 2D problem using CMA-ES. ............................................... 31
Fig. 2-20: Operations of Nelder-Mead algorithm .......................................................... 39
Fig. 2-21: Twelve iterations of a practical run of Nelder-Mead algorithm ................... 42
Fig. 2-22: Nelder-Mead algorithm flowchart................................................................. 43
Fig. 2-23: A Robocode robot anatomy .......................................................................... 44
Fig. 3-1: The structure of HICMA’s Robocode agent ................................................... 45
Fig. 3-2: The block diagram of HICMA’s modeling agent ........................................... 46
Fig. 3-3: The hybrid optimization method ..................................................................... 47
Fig. 3-4: The block diagram of HICMA’s modeling agent ........................................... 48
Fig. 3-5: The block diagram of HICMA’s shooting agent ............................................ 48
Fig. 3-6: The operation of Nelder-Mead evolver agent ................................................. 49
Fig. 3-7: A chromosome example .................................................................................. 51
Fig. 3-8: The flowchart of the mutation process ............................................................ 54
Fig. 3-9: Operation phases flowchart of HICMA .......................................................... 55
Fig. 3-10: Solving the estimating problem .................................................................... 58
Fig. 3-11: The initialization phase of HICMA ............................................................... 59
Fig. 3-12: The operation phase of HICMA .................................................................... 60
Fig. 3-13: The evolution phase of HICMA .................................................................... 61
Fig. 3-14: The evolution sequence of parameters .......................................................... 62
Fig. 3-15: The optimization operation of the modeling agent ....................................... 63
Fig. 3-16: The operation of the estimation agent ........................................................... 64
Fig. 4-1: Experiment map with HICMA’s agents .......................................................... 65
Fig. 4-2: The behavior of every function pair against: (a) Shadow and (b) Walls ......... 67
Fig. 4-3: Performance similarity with human players ................................................... 70
Fig. 4-4: Performance difference with human players ................................................... 70
Fig. 4-5: The evolution of the parameters of the modeling agent .................................. 74
Fig. A-1: A histogram .................................................................................................... 85
Fig. A-2: A 2D histogram with origin at (a) (-1.5, -1.5) and (b) (-1.625, -1.625) .......... 87
Fig. A-3: Kernel density estimation. The density distribution (dotted curve) is estimated
by the accumulation (solid curve) of Gaussian function curves (dashed curves) ... 88
Fig. A-4: 2D kernel density estimate (a) individual kernels and (b) the final KDE ....... 90
Fig. A-5: Pearson Correlations of different relationships between two variables .......... 91
Fig. A-6: Distance Correlation of linear and non-linear relationships ........................... 92
Fig. A-7: Mutual Information vs. Distance Correlation as dependence measures .......... 92
List of Abbreviations
AI Artificial Intelligence
AmI Ambient Intelligence
ANN Artificial Neural Network
CMA-ES Covariance Matrix Adaptation Evolution Strategy
CSA-ES Cumulative Step-size Adaptation Evolution Strategy
EA Evolutionary Algorithm
EC Evolutionary Computation
ES Evolution Strategy
GA Genetic Algorithm
HICMA Human Imitating Cognitive Modeling Agent
IA Intelligent Agent
IoT Internet of Things
LAD Least Absolute Deviations
LRMB Layered Reference Model of the Brain
MFT Modeling Field Theory
PCC Pearson Correlation Coefficient
PIR Passive Infrared
RL Reinforcement Learning
RSS Residual Sum of Squares
SA-ES Step-size Adaptation Evolution Strategy
Ubibot Ubiquitous Robot
Ubicomp Ubiquitous Computing
Nomenclature
A
agency ................................................ 14
Ambience ............................................. 6
Ambient intelligence ............................ 6
Artificial Neural Network .................... 2
C
comma-selection .......................... 22, 24
cost function ....................................... 20
covariance .......................................... 89
D
direct behavior imitation ...................... 2
E
elite .................................................... 22
elitist selection ................................... 22
Euclidean norm ........................... 28, 32
evaluation function ............................ 15
evolver agent...................................... 45
F
feature selection ................................... 7
fitness function ............................ 18, 22
fitness landscape .......................... 16, 22
functional form .................................. 10
G
Genetic Algorithms ............................. 1
global optimum ............................ 16, 18
H
heuristic search .................................. 37
Histogram .......................................... 83
I
indicator function ............................... 84
indirect behavior imitation ................... 2
Intelligent Agent .................................. 1
Internet of Things (IoT) ....................... 6
J
joint probability distribution .............. 83
K
kernel .................................................. 85
Kernel Density Estimation ................. 85
Koza-style GP ...................................... 2
L
local minimum ................................... 37
local optimum .............................. 16, 18
loss function ....................................... 18
M
marginal probability distribution . 83, 87
mutation strength ............................... 25
N
Nelder-Mead ...................................... 37
neurons ................................................. 2
O
object parameter ................................ 23
objective function ......................... 18, 20
one-σ ellipse ....................................... 29
one-σ line ........................................... 24
P
plus-selection ............................... 22, 24
R
Reinforcement Learning ...................... 1
reinforcement signal ............................ 1
Robocode ............................................. 2
S
sample standard deviation .................. 87
search costs ........................................ 22
search space ....................................... 20
search-space ...................................... 16
Silverman’s rule of thumb ................. 87
society of mind ..................................... 1
standard deviation .............................. 89
statistical model ................................... 7
stochastic optimization problem ........ 15
stochastic search ................................ 15
T
training dataset .................................... 2
U
ubibots ................................................. 7
Ubicomp .............................................. 7
Ubiquitous Computing ........................ 7
Abstract
Human intelligence is the greatest source of inspiration for artificial intelligence
(AI). The ambition of AI research is to build systems that behave in a human-like
manner. Toward this goal, researchers follow one of two directions: either studying the
neurological theories of the human brain, or investigating how human-like intelligence can
stem from non-biological methods. This research follows the second school. It employs
statistical methods for imitating human behaviors. A Human-Imitating Cognitive
Modeling Agent (HICMA) is introduced. It combines different non-neurological
techniques for building and tuning models of the environment. Every combination of the
parameters of these models represents a typical human behavior. HICMA is a society of
intelligent agents that interact together. HICMA adopts the society of mind theory and
extends it by introducing a new type of agent: the evolver agent. An evolver is a special
agent whose function is to adapt other agents to the encountered
situations. HICMA’s simple representation of human behaviors allows an evolver agent
to dress the entire system in a suitable human-like behavior (or personality) according to
the given situation.
HICMA is tested on Robocode [1]. Robocode is a game where autonomous tanks
battle in an arena. Every tank is equipped with a gun and a radar. The proposed system
consists of a society of five agents (including two evolver agents) that cooperate to
control a Robocode tank in a human-like manner. The individuals of the society are
based on statistical and evolutionary methods: CMA-ES, Nelder-Mead algorithm, and
Genetic Algorithms (GA).
Results show that HICMA develops human-like behaviors in Robocode
battles and, furthermore, selects a suitable behavior for each battle.
Chapter 1: Introduction
Problem Statement
An Intelligent Agent (IA) is an entity that interacts with the environment by
observing it via sensors and acting on it via actuators. This interaction aims at achieving
some goals. An IA usually builds and keeps models of the environment and the
interesting objects within it. Such models represent how an IA sees the
environment and, consequently, how it behaves in it. They are continuously
adapted to environmental changes, and these adaptations are reflected in the
behavior of the IA. The ability of an IA to conform to environmental changes thus
depends mainly on the mechanisms it uses to build and adapt its internal model(s) of the
environment. These internal models have become the focus of extensive research in the field of
artificial intelligence (AI). As human intelligence is the greatest source of inspiration for
AI, much research has focused on imitating human intelligence by modeling how the human
brain works. However, the operation of the mind is very complicated, so it may be better to
imitate human behavior without emulating the operation of the mind. This thesis tackles
imitating human behavior with simple mathematical models.
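The observe-model-act cycle described above can be sketched in a few lines. The scalar internal model and its update rule below are illustrative assumptions for this sketch, not part of any particular agent architecture:

```python
class IntelligentAgent:
    """A minimal sense-model-act loop: the agent keeps an internal
    model of the environment and adapts it after every observation."""

    def __init__(self):
        self.model = 0.0  # internal estimate of an environment quantity

    def observe_and_update(self, observation, rate=0.5):
        # Adapt the internal model toward the new observation.
        self.model += rate * (observation - self.model)

    def act(self):
        # Behavior is a function of the current internal model.
        return "advance" if self.model > 0 else "retreat"

agent = IntelligentAgent()
for obs in [1.0, 2.0, -4.0, -4.0, -4.0]:
    agent.observe_and_update(obs)
print(agent.act())  # prints "retreat": the model has tracked the change
```

The behavior changes as the internal model tracks the environment, which is exactly the coupling this thesis exploits.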
This work integrates the society of mind [2] theory with statistical approaches to
achieve human-like intelligence. An enhanced adaptable model of the society of mind
theory is proposed, where statistical approaches are used to autonomously build and
optimize models of the environment and the interesting objects in this environment.
Literature Review
The problem of providing intelligent agents with human-like behavior has been
tackled in many studies using different AI techniques such as Reinforcement
Learning (RL), Genetic Algorithms (GA), and Artificial Neural Networks (ANN).
Reinforcement Learning (RL) addresses the problem of which behavior an agent
should adopt to maximize a reward called the reinforcement signal. The agent learns the
proper behavior through trial-and-error interactions with the dynamic environment. It observes
the state of the environment through its sensors and selects an action to apply to the
environment through its actuators. Each of the available actions has a different effect on the
environment and earns a different reinforcement signal. The agent should
choose actions that maximize the cumulative reinforcement signal. Learning the proper
behavior is a systematic trial-and-error process guided by a wide variety of RL
algorithms.
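As a concrete (and heavily simplified) illustration, the sketch below applies tabular Q-learning, one common RL algorithm, to a toy three-state environment. The states, rewards, and hyperparameters are invented for illustration only:

```python
import random

random.seed(0)

# Tabular Q-learning on a toy 3-state chain: taking "right" in the last
# state earns reward 1 and resets the agent; every other step earns 0.
N_STATES, ACTIONS = 3, ["left", "right"]
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    if action == "right":
        if state == N_STATES - 1:
            return 0, 1.0                     # goal reached: reward, reset
        return state + 1, 0.0
    return max(state - 1, 0), 0.0

alpha, gamma, epsilon = 0.5, 0.9, 0.3
state = 0
for _ in range(10000):
    if random.random() < epsilon:             # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, reward = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = nxt

# The learned greedy policy prefers "right" in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

The trial-and-error character is visible here: the agent stumbles onto the reward through exploration, then the Q-value update propagates that reward backward along the chain.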
Genetic Algorithms (GA) are a type of evolutionary algorithm that imitates
natural evolution across generations. A GA encodes candidate solutions of a problem as
chromosomes and generates a population of individuals, each
represented by one or more chromosomes. The individuals (candidate solutions) are then
adapted iteratively in the hope of eventually finding a good one. An overview of GA is given in
section 2.5.
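The encode-evaluate-select-vary cycle can be sketched on the classic "one-max" toy problem (maximize the number of 1-bits in a chromosome); the operators and parameters below are illustrative choices, not those used later in this thesis:

```python
import random

random.seed(1)

# A minimal GA: evolve 12-bit chromosomes toward the all-ones string
# ("one-max"), a classic toy fitness function.
LENGTH, POP, GENS = 12, 20, 60

def fitness(chrom):
    return sum(chrom)                          # number of 1-bits

def crossover(a, b):
    cut = random.randrange(1, LENGTH)          # one-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.02):
    return [bit ^ (random.random() < rate) for bit in chrom]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                   # truncation selection (elitist)
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(fitness(best))
```

Keeping the parents unchanged (elitism) guarantees the best fitness never decreases from one generation to the next.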
An Artificial Neural Network (ANN) is an imitation of the biological neural networks
found in the brains of living organisms. Both consist of neurons organized and
connected in a specific way. The neurons are grouped into three kinds of layers: an input
layer, an output layer, and one or more intermediate hidden layers. Some or all neurons
of every layer are connected to the neurons of the next layer via weighted directed
connections. The inputs of the network are received via the input layer and propagated to the
hidden and output layers over the weighted connections. The network is trained on a
training dataset that consists of observations of inputs along with their corresponding
outputs. The goal of this learning stage is to adapt the weights of the interconnections so
that the output of the network for a given input is as close as possible to the corresponding
output in the training dataset. ANNs have been utilized in a wide variety of fields
including, but not limited to, computer vision, speech recognition, robotics, control
systems, game playing, and decision making.
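A minimal sketch of such a network follows. For brevity it replaces back-propagation with a numerical-gradient weight update, and the dataset (logical AND) and the 2-2-1 architecture are illustrative assumptions:

```python
import math
import random

random.seed(2)

# A minimal 2-2-1 feedforward network. Training nudges each weight
# along a numerical gradient of the squared error on a tiny dataset
# (logical AND) -- a stand-in for back-propagation, chosen for brevity.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

weights = [random.uniform(-1, 1) for _ in range(9)]  # 6 hidden + 3 output

def forward(w, x1, x2):
    h1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])       # hidden neuron 1
    h2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])       # hidden neuron 2
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])     # output neuron

DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def loss(w):
    return sum((forward(w, *x) - y) ** 2 for x, y in DATA)

for _ in range(4000):                                # training epochs
    for i in range(len(weights)):
        w_plus = weights[:]
        w_plus[i] += 1e-4                            # finite difference
        grad = (loss(w_plus) - loss(weights)) / 1e-4
        weights[i] -= 0.5 * grad                     # gradient descent

print([round(forward(weights, *x)) for x, _ in DATA])
```

After training, the network's rounded outputs reproduce the AND truth table, which is exactly the "match the training dataset" goal described above.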
This section examines previous works based on RL, GA, and ANN
that made novel contributions to AI and can inspire human-imitation
techniques. The review of each work focuses on how it demonstrates human
behavior imitation, regardless of how efficient (with respect to scoring) it is in
comparison with similar works. The novel contributions of the previous works are
extracted to support the human imitation pursued in this thesis.
Previous Work
An extensive comparison between different human behavior-imitation techniques
is introduced in [3]. They are divided into direct and indirect behavior imitation.
In direct behavior imitation, a controller is trained to output the same actions a
human took when faced with the same situation. This means that the performance of the
controller depends on the performance of the human exemplar, which raises a dilemma:
should the human exemplar be skillful or amateur? If skillful, the IA may
encounter hard situations that the exemplar never faced, precisely because the
exemplar is clever enough to avoid falling into them. On the other
hand, if the human exemplar is amateur, then imitating that behavior will not endow the
IA with good performance.
On the other hand, indirect behavior imitation uses an optimization algorithm to
optimize a fitness function that measures the human similarity of an IA. There is no
human exemplar, so indirect techniques can achieve more generalization than direct ones
[3]. Controllers trained with GA (indirect) have been found to perform more similarly to
human players than those trained with back-propagation (direct). All previous works
presented in this section employ indirect techniques.
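The indirect scheme can be sketched as an optimization loop over a human-similarity fitness. The recorded "human" trace, the linear controller form, and the (1+1) hill-climber below are all invented for illustration:

```python
import random

random.seed(3)

# Indirect imitation sketch: a (1+1) hill-climber tunes a two-parameter
# controller so that its action trace matches a recorded human trace.
HUMAN_TRACE = [0.0, 0.4, 0.8, 1.2, 1.6]        # hypothetical recorded actions

def controller(params, t):
    gain, bias = params
    return gain * t + bias                     # toy linear controller

def similarity(params):
    # Negative sum of squared differences: higher means more human-like.
    return -sum((controller(params, t) - h) ** 2
                for t, h in enumerate(HUMAN_TRACE))

params = [0.0, 0.0]
best = similarity(params)
for _ in range(5000):
    candidate = [p + random.gauss(0, 0.1) for p in params]
    score = similarity(candidate)
    if score > best:                           # keep only improvements
        params, best = candidate, score

print([round(p, 2) for p in params])
```

Note that no human exemplar drives the search directly; only the aggregate similarity score does, which is what gives indirect techniques their extra room to generalize.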
Genetic Programming (GP) is used in [4] to build a robot for the Robocode game.
In Robocode, programmed autonomous robots fight against each other, this involves
shooting at the enemy robots and dodging their bullets. The authors used Koza-style GP
where every individual is a program composed of functions and terminals. The functions
they used are arithmetic and logical ones (e.g. addition, subtraction, OR, etc.). Their
evolved robot won third place in a competition against 26 other manually
programmed robots.
The authors of [4] applied the same technique, GP, to evolve car-driving
controllers for the Robot Auto Racing Simulator (RARS) [5]. The top evolved controllers
took second and third place among 14 other hand-coded RARS controllers.
In [6], GP is also used for evolving car racing controllers. GP builds a model of
the track and a model of the driving controller. The model of the track is built and stored
in memory. Then the driving controller uses this model during the race to output the
driving commands. The driving controller is a two-branch tree of functions and terminals.
The output of the first sub-tree is interpreted as a driving command (gas/break) and the
output of the second one is the steering command (left/right). The functions of a tree are
mathematical ones in addition to memory functions for reading the model stored in
memory. The terminals of a tree are the readings of some sensors provided by the
simulation environment (e.g. distances to track borders, distance traveled, and speed).
A work similar to [6] is introduced in [7] where virtual car driving controllers for
RARS are evolved using GP. An individual consists of two trees; one controls the
steering angle and the other triggers the gas and brake pedals. The controller of the
steering angle is a simple proportional controller that tries to keep the car as close as
possible to the middle of the road. A proportional controller merely gives a steering angle
proportional to the current deviation from the middle of the road without considering
previous deviations or the expected future trend of deviation. The best-evolved
controller performed well but not enough to compete with other elaborate manually
constructed controllers.
An interesting comparison between GP and artificial neural networks (ANN) in
evolving controllers is made in [8]. Car controllers similar to those of [6] and [7] were
built using GP and ANN and their performances were compared. It is found that GP
controllers evolve much faster than ANN controllers do. However, the ANN controllers
ultimately reach higher fitness. In addition, the ANN controllers outperform the GP
controllers in generality; ANN controllers could perform significantly better on tracks
for which they have not been evolved. Finally, it is found that both GP and ANN could
use the controllers trained for one track as seeds for evolving other controllers trained for
all tracks. Both GP and ANN could generalize these controllers proficiently on the
majority of eight different tracks. However, it is found that ANN controllers generalized
better.
In [9], an ANN is trained using GA to ride simulated motorbikes in a computer
game. It is found that GA could create a human-like performance and even find
solutions that no human has previously found. GA is then compared with the
back-propagation learning algorithm. As GA requires no training data, it could adapt to any
new track. However, its solutions are not optimal and not as good as the solutions of a
good human player or the solutions of the back-propagation algorithm. On the other hand,
back-propagation requires training data recorded from a game played by a good human
player, but cannot be trained to deal with unusual situations.
Contributions of this Work
This work introduces a flexible and expandable model of IAs. The evolved IA can
autonomously adapt itself to the encountered situation to get higher fitness. Its fitness
function seems as if it implicitly involves an unlimited number of fitness functions from
which it selects the most appropriate one. For example, in Robocode game, our IA is
required to get the highest possible score. It autonomously finds that saving its power
implicitly increases its score. Consequently, it evolves a behavior that targets decreasing
power consumption.
Furthermore, the proposed IA not only evolves human-like behaviors but also
allows for the manual selection among these evolved behaviors. This can be used in
computer games to generate intelligent opponents with various human-like behaviors.
This can make computer games more exciting and challenging. In addition, the evolution
process requires no supervision; it is a type of indirect human imitation.
This thesis introduces the following to the field of AI:
• An evolvable model of the society of mind theory, introducing a new type of
agent (evolver agents) that facilitates autonomous behavior adaptation
• Automatic unsupervised behavior selection according to the encountered
situation
• Simple mathematical representations of different personalities of IAs without
emulating the brain
In addition to the automatic behavior selection, the proposed agent can be
influenced towards a certain behavior. This influence is easily achieved by changing the
fitness function. A suitable fitness function is selected and the agent automatically
changes its behavior to satisfy it. This is similar to a child who acquires the manners and
behaviors of his ideals (e.g. his parents). Also, as the fitness of the agent can be fed back
directly from a human exemplar (e.g. the user), the agent can autonomously learn to have
a satisfying behavior to that person. Experiments show that a wise selection of the fitness
function guides the agent toward the required behavior.
The simple representation of behaviors enables the agent to survive in different
environments. It can learn different behaviors and autonomously select the suitable
behavior for an environment. For example, a robot can live with different persons and
select the behavior that satisfies each person.
Applications
Intelligent Agents are used in almost all fields of life due to their flexibility and
autonomy. They contribute to several emerging disciplines such as ambient intelligence
and cognitive robotics. The benefits of intelligent systems are unlimited. They can be
useful in educating children, guiding tourists, and helping handicapped and elderly
persons, etc. This section presents how emerging disciplines can benefit from the work
introduced in this thesis.
1.5.1 Brain Model Functions
Several models of the brain have been introduced to AI literature. These models
can be categorized into two classes: Models that imitate the architecture of the brain, and
models that imitate the human behavior without imitating the architecture of the brain.
The brain models of the first class develop simple architectures similar to those of
the brain of an organism. They normally adopt Artificial Neural Networks (ANN). An
example of this category is the confabulation theory. It is proposed as the fundamental
mechanism of all aspects of cognition (vision, hearing, planning, language, initiation of
thought and movement, etc.) [10]. This theory has been hypothesized to be the core
explanation for the information processing effectiveness of thought.
The other class of brain modeling theories tries to develop a model of the brain that
does not necessarily resemble the brain of an organism. These models aim to develop
models of the brain that rigorously implement its functions. This implementation can
mainly depend on non-neurological basis such as mathematics, statistics, probability
theory … etc. Examples of this category are: the Modeling Field Theory (MFT) [11],
[12], the Layered Reference Model of the Brain (LRMB) [13]–[16], Bayesian models of
cognition [17], the society of mind [2] outlined in Section 2.2, and the Human Imitating
Cognitive Modeling Agent (HICMA) [18] introduced in Section 2.9. These theories adopt
different approaches for modeling the brain. However, they all have the same target: developing an artificial
brain that behaves like a human (or animal) brain when encountering real-life situations.
This imitation includes not only the brain’s strong points (e.g. adaptability and learning
capability) but also its weak points such as memory loss over time. Inheriting the weak
points of the brain is not necessarily a drawback. It can be useful in some applications
such as in computer games, where a human-like opponent is more amusing than a genius
one.
1.5.2 Artificial Personality
Personality can be defined as a characteristic way of thinking, feeling, and
behaving. It includes behavioral characteristics, both inherent and acquired, that
distinguish one person from another [19]. When imitating human behaviors, personality
must be taken into account, as personality is the engine of behavior [20]. As an important
trait of humans, it has gained much research focus. For a genetic robot’s personality,
genes are considered as key components in defining a creature’s personality [21]. That
is, every robot has its own genome in which each chromosome, consisting of many genes,
contributes to defining the robot’s personality.
In this work, human behaviors are represented by mathematical models. The
parameters of a model define not only a human behavior but also the degree of that
behavior. For example, a robot can have a tricky behavior and another robot can be
trained to be even trickier.
Another advantage of mathematical modeling of behaviors is that it opens the door
to the great facilities of mathematical optimization techniques such as CMA-ES and
Nelder-Mead described in sections 2.7 and 2.8 respectively.
1.5.3 Ambient Intelligence and Internet of Things
Ambience is the character and atmosphere of some place [22]. It is everything
surrounding us in the environment. This includes lights, doors and windows, TV sets,
computers … etc. Ambient intelligence (AmI) is providing the ambience with enough
intelligence to understand the user’s preferences and adapt to his needs. It incorporates
smartness into the environment to make it comfortable, safe, secure, healthy, and energy
conserving. The applications of AmI in life are unlimited. They include helping
elderly and disabled persons, nursing children, guiding tourists … etc.
An example of an AmI application is a smart home that detects the arrival of the
user and automatically takes the actions that the user likely requires. It can switch on
certain lights, turn on the TV and switch to the user’s favorite channel with the preferred
volume, suggest and order a meal from a restaurant and so on. All of these actions are
taken automatically by the AmI system depending on preferences the system has learned
before about the user.
The evolution of AmI has led to the emergence of the Internet of Things (IoT). IoT
provides the necessary intercommunication system between the smart things in the
environment including sensors, processors, and actuators. Sensors (e.g. temperature,
clock, humidity … etc.) send their information to a central processor. The processor
receives their information and guesses what tasks the user may like to be done. Finally,
the processor decides what actions need to be done and sends commands to the concerned
actuators to put these actions into effect.
An example scenario: A PIR (Passive Infra-Red) sensor detects the arrival of the
user and sends a trigger to a processor. The processor receives this trigger along with
information from a clock and a temperature sensor. It reviews its expertise in the user’s
preferences and guesses that he likely desires a shower when he arrives at that time in
such a hot weather. Consequently, the processor sends a command to the water heater to
get the water heated at the user’s preferred degree. The same principle can be scaled up
to hundreds or thousands of things connected together via a network. A simple IoT
system is shown in Fig. 1-1.
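The scenario above amounts to a rule-dispatch loop inside the central processor. The following toy sketch illustrates the idea; all sensor names, thresholds, and rules are hypothetical and not part of any real IoT stack:

```python
# Hypothetical sensor readings collected by the central processor.
readings = {"pir_motion": True, "hour": 18, "temperature_c": 35}

def decide(readings):
    """Toy rule base: map sensor readings to actuator commands."""
    commands = []
    # Rule 1: user arrived and the weather is hot -> preheat the water.
    if readings["pir_motion"] and readings["temperature_c"] > 30:
        commands.append(("water_heater", "preheat"))
    # Rule 2: user arrived in the evening -> switch the lights on.
    if readings["pir_motion"] and readings["hour"] >= 18:
        commands.append(("lights", "on"))
    return commands

print(decide(readings))  # [('water_heater', 'preheat'), ('lights', 'on')]
```

A real AmI system would learn such rules from the user's history instead of hard-coding them, which is where the behavior modeling proposed in this thesis fits in.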
This work proposes a simple mathematical modeling of human behaviors. This
modeling can be useful for providing the smart environment with human behaviors so
that it interacts with the user in a human-like way. Furthermore, it is simple enough for the
environment to tailor different behaviors for different users.
1.5.4 Ubiquitous Computing and Ubiquitous Robotics
Ambient Intelligence is based on Ubiquitous Computing (ubicomp). The word
“ubiquitous” means existing or being everywhere at the same time [23]. Ubicomp is a
computer science concept where computing exists everywhere and is not limited to
personal computers and servers. The world humans will live in is expected to be
fully ubiquitous, with everything networked. Information may flow freely among
lamps, TVs, vehicles, cellular phones, computers, smart watches and glasses … etc.
Amid this ubiquitous life, ubiquitous robots (ubibots) may live as artificial
individuals. More than other things, they deal directly with their human users. This requires
them to understand and imitate human behaviors. Again, mathematical behavior
modeling can be useful.
Techniques
The main part of this work is the statistical modeling. Given a set of observations
or samples, a statistical model is the mathematical representation of the process assumed
to have generated these samples. This section outlines statistical techniques utilized or
experimented with in this work for building and optimizing a statistical model. Fig. 1-3
illustrates how these techniques contribute to the modeling process. An extended chart
of related techniques is also given in Fig. 1-4.
1.6.1 Feature Selection
A model can be thought of as a mapping between a set of input features and one or
more output responses. When the model of a process is unknown, it is usually unknown
which features affect the output of that process. For example, assume that a mobile robot
is equipped with a battery and some sensors: humidity, temperature, gyroscope,
accelerometer, and speedometer. Assume that it is required to find the battery decay rate
as a function of sensor readings as shown in Fig. 1-2. Obviously, the decay rate does not
change with all sensor readings. The purpose of feature selection is to identify which
features (i.e. sensor readings) are relevant to the output (i.e. battery decay rate). Thus, the
modeling process tries to map the output to only a subset of all features. This makes
modeling faster, less complicated, and more accurate. An introduction to feature selection
is provided in [24].
Fig. 1-1: A simple IoT system
In general, any feature-selection method tends to select the features that have:
• Maximum relevance with the observed feature (output)
• Minimum redundancy with (i.e. relation to) other input features
Mutual Information and correlation are detailed more in Appendix A.1 and
Appendix A.2 respectively.
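As a toy illustration of correlation-based feature selection, candidate features can be ranked by their absolute Pearson correlation with the output. The sketch below uses synthetic data (the sensor names and the linear decay relation are assumptions made only for this example):

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)
# Synthetic readings: speed drives the battery decay rate, humidity does not.
speed = [rng.uniform(0, 10) for _ in range(100)]
humidity = [rng.uniform(30, 90) for _ in range(100)]
decay = [0.5 * s + rng.gauss(0, 0.1) for s in speed]

# Rank features by absolute correlation with the observed output.
scores = {"speed": abs(pearson(speed, decay)),
          "humidity": abs(pearson(humidity, decay))}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # "speed" should rank first
```

A full feature-selection pass would also penalize redundancy between the selected inputs, as noted above; this sketch shows only the relevance half of the criterion.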
Fig. 1-2: A model of the battery decay rate (inputs: sensor readings; output: decay rate)
Fig. 1-3: Mathematical modeling techniques utilized or experimented with in this work:
• Feature Selection: Mutual Information (via Histogram or Kernel density estimation); Correlation (Pearson Correlation, Distance Correlation)
• Model Selection (Functional Form): Linear, Polynomial, Logarithmic, Exponential
• Parameter Estimation (Optimization): CMA-ES, Nelder-Mead, BOBYQA, Powell
• Model Evaluation: Least Absolute Deviations (LAD)
Fig. 1-4: Modeling-related techniques
1.6.2 Modeling
A model of a process is the relation between its inputs and its output. This relation
comprises:
• Input features (independent variables)
• Observed output (dependent variable)
• Parameters
For example, in Eq. (1-1), the inputs are x1 and x2, the output is y, and the
parameters are a, b, c, d, and e.

y = a·x1² + b·x2 + c·e^(d·x1) + e    (1-1)
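Eq. (1-1) translates directly into code. A minimal sketch follows; the parameter values are arbitrary, chosen so the exponential term vanishes and the result is easy to check by hand:

```python
import math

def model(x1, x2, a, b, c, d, e):
    """Eq. (1-1): y = a*x1^2 + b*x2 + c*exp(d*x1) + e."""
    return a * x1 ** 2 + b * x2 + c * math.exp(d * x1) + e

# With c = 0 the exponential term drops out: 2*2^2 + 3*5 + 0 + 1 = 24
print(model(2.0, 5.0, a=2.0, b=3.0, c=0.0, d=1.0, e=1.0))  # 24.0
```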
The modeling process consists of three main steps:
1. Model Selection
2. Model Optimization
3. Model Evaluation
The following subsections describe these steps.
1.6.2.1 Model Selection
The first step in formulating a model equation is selecting its functional form,
namely the form of the function that represents the process. For example, the formula
(1-1) consists of a quadratic function, a linear function, an exponential function, and a
constant. Making such a selection depends on experience and trials. Even with
experience of the modeled process, trials must be conducted to find the most suitable
functional form. In this work, a Genetic Algorithm searches for a good functional form as
described in section 3.2.4.2.
1.6.2.2 Model Optimization
After selecting a suitable functional form of a model, the function of the
optimization stage is to estimate the values of the parameters that best fit the model to
the real process. As this stage is a main part of this work, it is described in detail in section
2.6.
1.6.2.3 Model Evaluation
The function of the model evaluation stage is to evaluate optimized models. This
allows for selecting the fittest model from among the available ones. For example,
different functional forms (polynomial, linear, logarithmic … etc.) can be optimized for
modeling a process, and the best one can then be chosen. The meaning of best depends
on the model-evaluation method. For example, the residual sum of squares (RSS) adds
up the squares of the differences between the observed (real) output and the predicted
(modeled) output. A different method is Pearson Correlation Coefficient (PCC),
described in Appendix B.1, which calculates the correlation between the real output and
the predicted output. The simplest method is the Least Absolute Deviations (LAD),
which is used in this work.
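The evaluation criteria mentioned above differ only in how residuals are aggregated. A small sketch of RSS and LAD follows (the observation and prediction values are invented for illustration):

```python
def rss(observed, predicted):
    """Residual sum of squares: add up the squared differences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def lad(observed, predicted):
    """Least absolute deviations: add up the absolute differences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted))

obs = [1.0, 2.0, 3.0]
pred = [1.5, 2.0, 2.0]
print(rss(obs, pred), lad(obs, pred))  # 1.25 1.5
```

Because LAD grows linearly rather than quadratically with each residual, it is less sensitive to outliers than RSS, which is one reason it is attractive for noisy observations.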
Organization of the Thesis
This thesis is organized as follows: Chapter 2 gives a background about the
underlying theories of this work. It overviews the society of mind theory and briefly
summarizes Evolutionary Computation (EC), Evolutionary Algorithms (EA), and
Genetic Algorithms (GA). It next explains in detail the mathematical optimization
problem. Two optimization methods are then explained: Covariance Matrix Adaptation
(CMA) and Nelder-Mead method. Finally, the Robocode game is presented as the
benchmark of this work.
Chapter 3 explains in detail the structure of the proposed agent (HICMA). It
comprises five sections that describe the five agents of HICMA. Chapter 4 introduces the
experiments and results of HICMA as a Robocode agent. Finally, Chapter 5 discusses
the conclusions and possible future works.
Chapter 2: Background
Introduction
This chapter overviews the basic disciplines behind this work. It is organized as
follows. Section 2.2 briefly overviews the society of mind theory and how it is utilized
and extended in this work. Section 2.3 generally overviews evolutionary computation.
Section 2.4 reviews evolutionary algorithms. Section 2.5 overviews Genetic Algorithms
(GA). Section 2.6 defines the optimization problem. Section 2.7 explains solving
optimization problems using the evolution strategies techniques focusing on the
Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Section 2.8 explains the
Nelder-Mead algorithm combined in this work with CMA-ES to form a hybrid
optimization technique, which is the main engine of the proposed system.
The relations between the aforementioned disciplines and similar ones are depicted in
Fig. 2-1. The disciplines used in this work are bounded by double outlines. This map
provides a good reference to different substitutes that can be used for extending this work
in the future.
The Society of Mind
2.2.1 Introduction
The society of mind theory was introduced by Marvin Minsky in 1980 [2]. It tries
to explain how minds work and how intelligence can emerge from non-intelligence. It
envisions the mind as a collection of many little parts, each mindless by itself; each part is
called an agent. Each agent by itself can only do some simple thing that needs no mind or
thought at all. Yet when these agents are joined in societies, in certain very special ways,
this leads to true intelligence. The agents of the brain are connected in a lattice where
they cooperate to solve problems.
2.2.2 Example – Building a Tower of Blocks
Imagine that a child wants to build a tower with blocks, and imagine that his mind
consists of a number of mental agents. Assume that a “builder” agent is responsible for
building towers of blocks. The process of building a tower is not that simple. It involves
other sub-processes: choosing a place to install the tower, adding new blocks to the tower,
and deciding whether the tower is high enough. It may be better to break up this complex
task into simpler ones and dedicate an agent to each one such as in Fig. 2-2.
Again, adding new blocks is too complicated for the single agent “add” to
accomplish. It would be helpful to break this process into smaller and simpler sub-
processes such as finding an unused block, getting this block, and putting it onto the
tower as shown in Fig. 2-3.
Fig. 2-1: AI Disciplines map
Fig. 2-3: Sub-agents of add agent
In turn, the agent get can be broken up into: “grasp” sub-process that grasps a
block, and “move” sub-process that moves it to the top of the tower. Generally, when an
agent is found to have to do something complicated, it is replaced with a sub-society of
agents that do simpler tasks.
It is clear that none of these agents alone can build a tower, and even all of them
cannot do unless the interrelations between them are defined, that is, how every agent is
connected with the others. In fact, an agent can be examined from two perspectives: from
outside and from inside its sub-society. If an agent is examined from the outside with no
idea about its sub-agents, it will appear as if it knows how to accomplish its assigned
task. However, if the agent’s sub-society is examined from the inside, the sub-agents will
appear to have no knowledge about the task they do.
To distinguish these two different perspectives of an agent, the word agency is used
for the system as a black box, and agent is used for every process inside it.
A clock can be given as an example. As an agency, if examined from its front, its
dial seems to know the time. However, as an agent it consists of some gears that appear
to move meaninglessly with no knowledge about time.
To sense the importance of viewing a system of agents as an agency, one can
examine a steering wheel of a car. As an agency, it changes the direction of the car
without taking into account how this works. However, if it is disassembled, it appears as an
agent that turns a shaft, which turns a gear, which pulls a rod that shifts the axle of a wheel.
Bearing this detailed view in mind while driving a car can cause a crash because it
takes too long to work through every time the wheels are to be steered.
In summary, to understand a society of agents, the following points must be known:
1. How each separate agent works
2. How each agent interacts with other agents of the society
3. How all agents of the society cooperate to accomplish a complex task
Fig. 2-2: A builder agent with its sub-agents
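The builder decomposition above can be sketched as a tiny agent hierarchy. This is an illustrative toy (the class and agent names are mine), not the model proposed in this thesis:

```python
class Agent:
    """A mindless agent: either does one simple thing (a leaf),
    or delegates to a sub-society of simpler agents (an agency)."""
    def __init__(self, name, action=None, sub_agents=()):
        self.name = name
        self.action = action
        self.sub_agents = list(sub_agents)

    def run(self, log):
        if self.action:              # leaf agent: perform its simple task
            log.append(self.action)
        for sub in self.sub_agents:  # agency: delegate to its sub-society
            sub.run(log)
        return log

# The "builder" society from Minsky's tower example.
grasp = Agent("grasp", action="grasp a block")
move = Agent("move", action="move block to tower top")
find = Agent("find", action="find an unused block")
get = Agent("get", sub_agents=[grasp, move])
put = Agent("put", action="put block onto tower")
add = Agent("add", sub_agents=[find, get, put])
builder = Agent("builder", sub_agents=[add])

steps = builder.run([])
print(steps)
```

Seen from the outside, `builder` appears to know how to build a tower; inspected from the inside, each sub-agent only performs one meaningless step, which is exactly the agency/agent distinction described above.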
This thesis extends the society of mind theory to present a novel model of an
intelligent agent that behaves like a human. The previous points are expanded so that the
entire society evolves according to environmental changes. Section 2.9 describes the
proposed model and how it adopts the society of mind theory.
Evolutionary Computation (EC)
Evolutionary Computation (EC) is a subfield of Artificial Intelligence (AI) that
solves stochastic optimization problems. A stochastic optimization problem is the
problem of finding the best solution from all possible solutions by means of a stochastic
search process, that is, a process that involves some randomness. EC methods are used
for solving black-box problems where there is no information to guide the search process.
An EC method tests some of the possible solutions, trying to target the most promising
ones. EC methods adopt the principle of evolution of generations. They generate a
population of candidate solutions and evaluate every individual solution in this
population. Then, a new generation of, hopefully fitter, individuals is generated. The
evolution process is repeated until a satisfying result is obtained.
Evolutionary Algorithms (EA)
An Evolutionary Algorithm (EA) is an Evolutionary Computation subfield that
adopts the principle of survival of the fittest. EA methods are inspired by evolution
in nature: a population of candidate solutions is evaluated using an evaluation function,
and the fittest individuals, called parents, are granted a better chance to reproduce the
offspring of the next generation. Reproduction is done by recombining pairs of the
selected parents to produce offspring. The offspring are then mutated in such a way that
they hold some of the traits inherited from their parents in addition to their own developed
traits. The rates of recombination and mutation are selected to achieve a balance between
utilization of parents’ good traits and exploration of new traits. For improving the fitness
over generations, the process of reproduction ensures that the good traits are not only
inherited over generations but also developed by the offspring. After a predetermined
termination condition is satisfied, the evolution process is stopped. The fittest individual
in the last generation is then selected as the best solution of the given problem. Fig. 2-4
shows the general scheme of evolutionary algorithms.
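This general scheme can be condensed into a minimal evolutionary loop. The sketch below is a simplified illustration; the toy one-dimensional problem, the truncation selection, and the Gaussian mutation are assumptions made for this example, not the operators used later in this work:

```python
import random

def evolve(fitness, init, mutate, pop_size=20, generations=50, seed=1):
    """Minimal evolutionary loop: evaluate, keep the fittest half
    as parents, and mutate copies of them to refill the population."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=fitness)

# Toy problem: maximize -(x - 3)^2, whose optimum is x = 3.
best = evolve(fitness=lambda x: -(x - 3.0) ** 2,
              init=lambda rng: rng.uniform(-10, 10),
              mutate=lambda x, rng: x + rng.gauss(0, 0.5))
print(round(best, 1))  # close to 3.0
```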
The search process of an evolutionary algorithm tries to cover the search-space of
the problem, which is the entire range of possible solutions, without exhaustively
testing every solution. The fitness of all individuals in the search-space can be
represented by a fitness landscape as shown in Fig. 2-5. The horizontal axes represent
the domain of candidate solutions (i.e. individuals) and the vertical axis represents their
fitness. The optimum solution within a limited sub-range of the search-space is called a
local optimum, while the absolute optimum solution over the entire search-space is called
the global optimum. An EA tries to find the global optimum and not to fall into one of
the local optima.
Genetic Algorithms (GA)
Genetic Algorithm (GA) is a type of evolutionary algorithm that was first
introduced by John Holland in the 1960s and further developed during the 1960s
and the 1970s [25]. It is designed to imitate the natural evolution of generations. It
encodes the solution of a problem into the form of chromosomes. A population of
candidate-solution individuals (also called phenotypes) is generated, where each solution
is represented by one or more chromosomes (also called genotypes). In turn, each
chromosome consists of a number of genes. GA then selects parents to reproduce from
the fittest individuals in the population. Reproduction involves crossover of parents to
produce offspring, and mutation of offspring’s genes. The newly produced offspring
population represents the new generation, which is hopefully fitter, on average, than the
previous one. The evolution process is repeated until a satisfying fitness level is achieved
or a maximum limit of generations is exceeded. The general GA flowchart is illustrated
in Fig. 2-6. Every block is briefly explained next.

Fig. 2-4: A general scheme of evolutionary algorithms
Fig. 2-5: A 2D fitness landscape (global and local optima marked)
Encoding
Encoding is how a candidate solution is represented by one or more chromosomes.
The most common type of chromosomes is the binary chromosome. The genes of this
chromosome can hold either 0 or 1.
Initial Population
An initial population of candidate solutions is generated to start the evolution
process. It is often generated randomly, but sometimes candidate solutions are seeded
into it.
Evaluation
The candidate solutions, represented by individuals, are evaluated by a fitness
function. The fitness of an individual determines its probability to be selected for
reproduction.
Selection
Selection is the process of choosing the parents of the next generation from among
the individuals of the current generation. It is a critical part of GA, as it must ensure a
good balance between exploitation and exploration. Exploitation is giving the fittest
individuals a better chance to survive over generations, while exploration is searching for
new useful individuals.

Fig. 2-6: Basic GA flowchart

The tradeoff between exploitation and exploration is critical. Too
much exploitation may lead to a local optimum and too much exploration may greatly
increase the number of required generations to find a good solution of the problem.
Selection methods include roulette-wheel, stochastic universal sampling, rank, and
tournament selection.
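As an example of the first of these methods, roulette-wheel selection picks each parent with probability proportional to its fitness. The sketch below assumes non-negative fitness values, and the toy two-individual population is invented for illustration:

```python
import random

def roulette_select(population, fitnesses, rng):
    """Roulette-wheel selection: spin a wheel whose slices are
    proportional to each individual's (non-negative) fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(42)
pop = ["weak", "strong"]
fits = [1.0, 9.0]
picks = [roulette_select(pop, fits, rng) for _ in range(1000)]
print(picks.count("strong"))  # roughly 900 out of 1000
```

Note how the weaker individual is still selected occasionally; this is the exploration half of the exploitation/exploration balance discussed above.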
Crossover
Crossover is the recombination of two, or more, selected parents to produce
offspring. It is performed by dividing parents’ chromosomes into two or more portions,
and randomly copying every portion to either offspring.
Mutation
Mutation is the adaptation of an individual’s genes. For example, a binary gene is
mutated by flipping it with a specified probability.
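For binary chromosomes, both operators take only a few lines each. The following is a hedged sketch; the chromosome length, mutation rate, and seed are arbitrary:

```python
import random

def one_point_crossover(p1, p2, rng):
    """Split both parents at the same random point and swap the tails."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def bit_flip_mutation(chromosome, rng, rate=0.1):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

rng = random.Random(7)
c1, c2 = one_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], rng)
print(c1, c2)                      # the two children stay bit-complementary
print(bit_flip_mutation(c1, rng))  # a few genes may have flipped
```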
Termination
The evolution process is repeated until a termination criterion is satisfied.
Common termination criteria are:
1. A satisfying solution is found
2. A maximum limit of generations is reached.
3. Significant fitness improvement is no longer achieved.
Optimization Problem
2.6.1 Introduction
Optimization is the minimization or maximization of a non-linear objective
function (also called a fitness function or loss function). An objective function maps an
n-dimensional input vector to a single output value as shown in Fig. 2-7. That is, if y = f(x)
is a non-linear function of an n-dimensional vector x ∈ ℝⁿ, then minimizing f(x) is finding
the n components of x that give the minimum value of y ∈ ℝ.
Fig. 2-7: An objective function
The optimum value (maximum or minimum) of a function within a limited range
is called local optimum while the optimum value of the function over its domain is called
global optimum. A function can have several local optima, but only one global optimum.
An example of local and global minima is shown in Fig. 2-8.
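The distinction can be seen numerically on a simple one-dimensional function, chosen here only for illustration, that has one local and one global minimum:

```python
def f(x):
    """A quartic with two basins: one local and one global minimum."""
    return x ** 4 - 4 * x ** 2 + x

# Coarse grid scan over [-3, 3] to locate the minimum of each basin.
grid = [i / 1000.0 for i in range(-3000, 3001)]
global_min = min(grid, key=f)                    # deepest point overall
left = min((x for x in grid if x < 0), key=f)    # minimum of the left basin
right = min((x for x in grid if x > 0), key=f)   # minimum of the right basin
print(round(left, 2), round(right, 2))  # the left basin is the global one here
```

An optimizer that starts in the right basin and only ever moves downhill will converge to the right (local) minimum and never discover the deeper left one, which is exactly the trap an EA tries to avoid.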
Fig. 2-8: Local and global minima
2.6.2 Mathematical Definition
Given a number n of observations (samples) of p variables (e.g. sensor readings)
forming a 2-D matrix M with dimensions n × p, such that column j represents the
observations (samples) of the jth variable, and row i represents the ith observation
(sample):

M = [m1,1 m1,2 ⋯ m1,p ; ⋮ ; mn,1 mn,2 ⋯ mn,p], where mi,j is the ith sample of the jth variable (sensor)
The p variables may correspond to, for example, a number of sensors attached to a
robot such as (thermometers, speedometers, accelerometers … etc.) which represent the
senses of that robot. For example, suppose that a robot plays the goalkeeper role in a
soccer game. Like a skillful human goalkeeper, the robot should predict the future
location of the ball as it approaches the goal to block it in time. Assume that the motion
of the ball over time is modeled by a quadratic function of time, that is:
Location (x, y, z) = f(t)    (2-1)

Equivalently:

x(t) ≡ fx(t) = ax·t² + bx·t + cx    (2-2)
y(t) ≡ fy(t) = ay·t² + by·t + cy    (2-3)
z(t) ≡ fz(t) = az·t² + bz·t + cz    (2-4)
Let the axes of the playground be as illustrated in Fig. 2-9. The robot can use
equation (2-3) to predict the time T at which the ball will arrive at the goal line (this
prediction can be done using optimization). Then the robot can use equations (2-2) and
(2-4) to predict the location of the ball at the goal line (xg, yg, zg) at time T, and can move
to there in time.
Solving this problem (blocking the ball) is done as follows:
1. Get n samples (observations) of the locations of the moving ball (x, y, and z) at
fixed intervals of time (t).
2. Store the (x, t) samples in matrix Mx , where x and t represent the p variables
(sensors).
3. Similarly, store (y, t), and (z, t) samples in My and Mz
The three matrices will appear like these:

Mx = [x0 t0 ; ⋮ ; xn−1 tn−1], My = [y0 t0 ; ⋮ ; yn−1 tn−1], Mz = [z0 t0 ; ⋮ ; zn−1 tn−1]
The function of the optimization strategy is to find the values of the functions’
parameters (ax, bx, cx, ay, by, …) in equations (2-2), (2-3), and (2-4) that make every
function fit the sampled data (observations) optimally. Finding such values is a
search problem, where the search space is the range of all possible values of the
parameters. This is what the following two steps do. To minimize, for example, the fitting error of fx:
4. Re-organize the given equation so that the left-hand side equals zero when the
model fits perfectly:
x − (ax·t² + bx·t + cx) = ex (2-5)
Where x is an observed location and ex is the error of the optimization process.
This equation is called the objective function, cost function, or loss function.
5. Search for the values of the parameters (ax, bx, and cx) that minimize the error ex.
This is a 3-D search problem as the algorithm searches for the optimum values of
three parameters. The optimization (minimization) of the function 𝑓𝑥 is done as
follows:
a. Guess a value for (ax0, bx0, cx0)
b. Substitute into (2-5) for (ax, bx, cx) by (ax0, bx0, cx0), and for x and t
by x0 and t0 respectively from the matrix Mx, and calculate the error ex0
Fig. 2-9: An example application of mathematical optimization
x0 − (ax0·t0² + bx0·t0 + cx0) = ex0 (2-6)
c. Repeat step (b) for all of the n observations and accumulate the
errors into ex:
ex = Σ(i=0 to n−1) |xi − (ax0·ti² + bx0·ti + cx0)| (2-7)
Where |X| is the absolute value of X.
The optimization algorithm improves its guesses of the parameters (a, b, and
c) over the iterations to find the optimum values in the search space that minimize the
error ex.
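The steps above can be sketched end-to-end in a short Python program. This is only an illustration: the sample trajectory x(t) = 2t² + 3t + 1 and the naive random-search optimizer are made-up assumptions, not the optimization techniques used later in this thesis.

```python
import random

# Synthetic observations of the ball's x-coordinate at fixed time intervals,
# generated from an assumed trajectory x(t) = 2t^2 + 3t + 1 (made-up values).
samples = [(2*t**2 + 3*t + 1, t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)]

def error(a, b, c):
    # Accumulated absolute error e_x over all n observations, as in Eq. (2-7).
    return sum(abs(x - (a*t**2 + b*t + c)) for x, t in samples)

# Steps 4-5 as a naive random search: guess parameter values, keep a guess
# whenever it lowers the error, and slowly shrink the guessing range.
random.seed(1)
best, best_err = (0.0, 0.0, 0.0), error(0.0, 0.0, 0.0)
step = 2.0
for _ in range(20000):
    cand = tuple(p + random.uniform(-step, step) for p in best)
    e = error(*cand)
    if e < best_err:
        best, best_err = cand, e
    step *= 0.9995

ax, bx, cx = best   # should approach the true (2, 3, 1)
```

The fitted (ax, bx, cx) can then be stored and reused as the model fx(t) of the ball's motion.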
Fig. 2-10 visualizes a 2D search space (two parameters), where the horizontal axes
represent the domains of the two parameters and the vertical axis represents the value of
the objective function (the error). The optimization algorithm should start from any point
in the search space and move iteratively towards the minimum error (i.e. the global
optimum).
Searching the search space for the optimum (minimum or maximum) solution
depends on the optimization algorithm. The next section explains evolution-strategy
optimization algorithms.
After optimization, the fitted function is saved for use as a model of an observed
phenomenon or an environmental object. For example, the goalkeeper robot can keep the
optimized functions (2-2), (2-3), and (2-4) for similar future shots (assuming that all shots
have similar paths).
Fig. 2-10 Minimization example in a 2D search space
2.7 Evolution Strategies
Evolution Strategies (ESs) are optimization techniques belonging to the class of
Evolutionary Computation (EC) [26], [27]. An evolution strategy searches for the
optimum solution in a search space similarly to Genetic Algorithms (GA). It generates a
population of individuals representing candidate solutions (i.e. vectors of the parameters
to be optimized). Every individual in the population is then evaluated by a fitness function
that measures how promising it is for solving the given problem. The fittest individuals
are then selected and mutated to reproduce another generation of offspring. This process
is repeated until a termination condition is met. Mutation represents the search steps that
the algorithm takes in the search-space; it is done by adding normally distributed random
vectors to the individuals.
The fitness of all individuals in the search space can be represented by a fitness
landscape as shown in Fig. 2-5. The horizontal axes represent candidate solutions
(individuals) and the vertical axis represents their fitness. The goal of the optimization
algorithm is to converge to the global optimum solution with the minimum search costs
represented by the number of objective function evaluations.
In optimization problems, the search space is the domain of the parameters of the
optimized function (minimized or maximized) which is also used as an objective
function. For example, in maximization problems, the goal is to find the values of the
parameters that maximize a function, so the value of that function represents the fitness
of the given set of parameters: the higher the value of the function, the fitter the solution.
2.7.1 Basic Evolution Strategies
The basic evolution strategy can be defined by:
(µ/ρ, λ)-ES and (µ/ρ + λ)-ES
Where:
µ is the number of parents (fittest individuals) in the population
ρ is the number of parents that produce offspring
λ is the number of generated offspring
The “,” means that the new µ parents are selected from only the current λ offspring,
this is called comma-selection, while the “+” means that the new parents are selected
from the current offspring and the current parents, this is called plus-selection.
For example, a (4/2, 10)-ES keeps the fittest 4 individuals of the current population
as parents, and every new offspring is produced by recombining and mutating 2 of them
(ρ = 2), generating 10 new offspring. The 4 parents are selected from the current 10
offspring only.
On the other hand, a (4/2 + 10)-ES selects the fittest 4 parents from the current 10
offspring together with the 4 parents of the previous generation. This is a
type of elitist selection where the elite individuals are copied to the next generation
without mutation.
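The difference between the two selection schemes can be sketched with made-up fitness values (lower is better, i.e. minimization):

```python
# Toy populations of (fitness, name) pairs; all values are made up.
mu = 2
parents   = [(0.1, "p1"), (0.9, "p2")]
offspring = [(0.3, "o1"), (0.7, "o2"), (1.2, "o3"), (0.4, "o4")]

# Comma-selection (mu, lambda): new parents come from the offspring only,
# so even the elite old parent p1 is discarded.
comma = sorted(offspring)[:mu]          # [(0.3, 'o1'), (0.4, 'o4')]

# Plus-selection (mu + lambda): old parents compete with the offspring,
# so the elite p1 survives unchanged (elitism).
plus = sorted(parents + offspring)[:mu]  # [(0.1, 'p1'), (0.3, 'o1')]
```

Note that plus-selection preserves p1 although it is fitter than every offspring, while comma-selection forgets it.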
The basic steps of an evolutionary strategy are:
1. Generating candidate solutions (Mutating parent individuals)
2. Selecting the fittest solutions
3. Updating the parameters of the selected solutions
Fig. 2-11 illustrates the previous three steps.
Fig. 2-11: Basic steps of evolution strategies
An ES individual x is defined as follows:
x = [y, s, F(y)] (2-8)
Where:
y is the parameter vector to be optimized, it is called object parameter vector
s is a set of parameters used by the strategy, it is called strategy parameter vector
F(y) is the fitness of y
Strategy parameters s are the parameters used by the strategy during the search
process, they can be thought of as the tools used by the strategy, they are similar to a
torchlight a person may use for finding an object (optimum solution) in a dark room
(search space). The most important strategy parameter is the step-size described later.
Notice that an evolution strategy not only searches for the optimum solution of y, but also
searches for the optimum strategy parameter s. This is similar to trying several types of
torchlights to find the best one to use for finding the lost object in the dark room.
Obviously, finding the optimum strategy-parameter vector speeds-up the search process.
The basic Algorithm of a (µ/ρ +, λ)-ES is given in Algorithm 2-1:
Algorithm 2-1: A basic ES algorithm
1. Initialize the initial parent population Pµ = {p1, p2 … pµ}
2. Generate initial offspring population Pλ = {x1, x2 … xλ} as follows:
a. Select ρ random parents from Pµ
b. Recombine the selected ρ parents to form an offspring x
c. Mutate the strategy parameter vector s of the offspring x
d. Mutate the object parameter vector y of the offspring x using the
mutated parameter set s
e. Evaluate the offspring x using the given fitness function
f. Repeat for all λ offspring
3. Select the fittest µ parents from either
{Pµ∪ Pλ} if plus-selection (µ/ρ + λ)-ES
{Pλ} if comma-selection (µ/ρ , λ)-ES
4. Repeat 2 and 3 until a termination condition is satisfied
Normally distributed random vectors are used to mutate the strategy-parameter set
s and the object-parameter vector y at steps 2.c and 2.d respectively. The mutation process
is explained in more detail in the next section.
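Algorithm 2-1 can be sketched in Python. The sphere fitness function, the population sizes, and the fixed step-size decay (used here for brevity instead of mutating the strategy parameters in step 2.c) are illustrative assumptions:

```python
import random

random.seed(0)

def fitness(y):
    # Sphere function: a common toy objective with its optimum at the origin.
    return sum(v*v for v in y)

n, mu, rho, lam = 2, 3, 2, 12     # illustrative sizes, not tuned
sigma = 1.0
parents = [[random.uniform(-5, 5) for _ in range(n)] for _ in range(mu)]

for gen in range(200):
    offspring = []
    for _ in range(lam):
        # Steps 2.a-2.b: recombine rho randomly selected parents (averaging)...
        chosen = random.sample(parents, rho)
        child = [sum(p[k] for p in chosen) / rho for k in range(n)]
        # Step 2.d: ...then mutate with a normally distributed random vector.
        child = [v + sigma * random.gauss(0, 1) for v in child]
        offspring.append(child)
    # Step 3: comma-selection keeps the fittest mu offspring as new parents.
    parents = sorted(offspring, key=fitness)[:mu]
    sigma *= 0.97   # fixed decay in place of step 2.c, for brevity

best = min(parents, key=fitness)   # approaches the origin
```

The loop follows the (µ/ρ, λ) scheme: here every generation draws λ = 12 offspring from µ = 3 parents with ρ = 2 recombination.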
Fig. 2-12 visualizes the search process described above for solving a 2D problem
(i.e. optimizing two parameters, where the object parameter vector y ∈ ℝ²). Both µ and
ρ equal 1, and λ equals 100. That is, one parent is selected to produce 100 offspring.
Every circle represents the one-σ line of the normal distribution at a generation. The
center of every circle is the parent that was mutated by the normally distributed random
vectors to produce the rest of the population represented by ‘.’, ‘+’ and ‘*’ marks. The
black solid line represents the direction of search in the search space.
A one-σ line is the horizontal cross-section of the 2D normal-distribution curve at
one standard deviation σ from the mean. It is an ellipse (or a circle); each coordinate
taken alone lies within ±σ with probability 68.27%, while the ellipse itself encloses
about 39.35% of the 2D samples. This ellipse is useful in studying normal distributions.
In Fig. 2-12, the one-σ lines are unit circles because both of the two sampled variables
have a standard deviation σ = 1.
Fig. 2-12: Visualization of the search process of a (1/1,100)-ES
Generation g Generation g+1 Generation g+2
Fig. 2-13 shows the one-σ ellipse (circle in this case) of a 2D normal distribution
represented on a 3D graph.
2.7.2 Step-size Adaptation Evolution Strategy (σSA-ES)
The goal of an optimization algorithm is to take steps towards the optimum; the
faster an optimization algorithm reaches the optimum the better it is. Clearly, if the
optimum is far from the starting point, it is better to take long steps towards the optimum
and vice versa. In an optimization algorithm, the step size is determined by the amount
of mutation of parent individuals. That is, high mutation of a parent causes its offspring
to be highly distinct from it and thus very far from it in the search space.
Usually, there is no detailed knowledge about a good choice of the strategy parameters,
including the step-size. Therefore, step-size adaptation evolution strategies adapt the
mutation strength of parents at every generation in order to get the optimum step-size
that quickly reaches the optimum. As mutation is done by adding a normally distributed
random vector to the parent, the standard deviation σ of that random vector represents
the mutation strength. A large value of σ means that the random vector will more likely
hold larger absolute values and thus more mutation strength. The principle of step-size
adaptation is illustrated in Fig. 2-14, where the standard deviation in (a) equals 1.0 and
in (b) equals 3.0. It is clear that the step-size in (b) is larger than in (a).
An individual of a σ-SA strategy is defined as follows:
a = [y, σ, F(y)] (2-9)
Fig. 2-13: One-sigma ellipse of a bivariate normal distribution N(0, I) [µ = 0, Σ = I]
Recall the general definition of an ES individual in Eq. (2-8). It is clear that σ is
the only element in the strategy parameter vector s. σ is the mutation strength parameter
that would be used to mutate the parameter vector y if that individual is selected for
generating offspring. An offspring generated from a parent is defined as follows:
xi(t+1) = {
  σi(t+1) ← σ(t)·exp(τ·Ni(0,1))
  yi(t+1) ← y(t) + σi(t+1)·Ni(0, I)
  Fi ← F(yi(t+1))
} (2-10)
As shown in Eq. (2-10), the mutation strength parameter σ is self-adapted every
generation t. The learning parameter τ controls the amount of self-adaptation of σ per
generation. A typical value of τ is 1/√(2n) [28]. The step-size σ in Eq.
(2-10) is a parameter of every individual in the population, and because the fittest
individuals are selected to produce the offspring of the next generation, the best step-sizes
are inherited by the new offspring. This enables the algorithm to find the optimum step-size
that quickly reaches the optimum solution.
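The σSA loop of Eqs. (2-9) and (2-10) can be sketched as follows; the sphere fitness function and the (1, 20) configuration are illustrative assumptions:

```python
import math, random

random.seed(3)
n = 2
tau = 1 / math.sqrt(2 * n)          # learning parameter tau

def fitness(y):
    # Sphere function: toy objective with its optimum at the origin.
    return sum(v*v for v in y)

# Parent individual a = [y, sigma, F(y)] as in Eq. (2-9).
parent = {"y": [4.0, -3.0], "sigma": 1.0, "F": fitness([4.0, -3.0])}

for gen in range(300):
    offspring = []
    for _ in range(20):
        # Eq. (2-10): mutate sigma first, then mutate y with the new sigma.
        s = parent["sigma"] * math.exp(tau * random.gauss(0, 1))
        y = [v + s * random.gauss(0, 1) for v in parent["y"]]
        offspring.append({"y": y, "sigma": s, "F": fitness(y)})
    # (1, 20)-selection: the fittest offspring becomes the next parent, so
    # good step-sizes are inherited together with good object vectors.
    parent = min(offspring, key=lambda a: a["F"])
```

Because σ is carried inside every individual, selection on F(y) indirectly selects step-sizes that make progress.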
The exponential function in Eq. (2-10) is usually used in evolution strategies, but
other functions can also be used to mutate the step-size [29].
N(0,1) is a normally distributed random scalar (i.e. a random number sampled from
a normal distribution with mean = 0 and standard deviation = 1). N(0, I) is a normally
distributed random vector with the same dimensions as the optimized object parameter
vector y. Fig. 2-15 shows (a) a normal distribution of 2D points (i.e. two object parameters) and
(b) the probability density function (PDF) of every parameter. It is obvious that the
density of samples is high around the mean and decreases as we move away. That is, the
number of random samples near the mean is larger than far from it.
Fig. 2-14: Two random probability distributions with (a) σ = 1.0 and (b) σ = 3.0. The
circles are the one-sigma ellipses
Normal distributions are used for the following reasons [27]:
1. Widely observed in nature
2. The only stable distribution with finite variance, that is, the sum of independent
normal distributions is also a normal distribution. This feature is helpful in the
design and the analysis of algorithms
3. Most convenient way to generate isotropic search points, that is, no favor to any
direction in the search space
2.7.3 Cumulative Step-Size Adaptation (CSA)
CSA-ESs update the step-size depending on the accumulation of all the steps the
algorithm has made; the importance of a step decreases exponentially with time [30].
The goal of CSA is to adapt the mutation strength (i.e. step-size) such that the correlations
between successive steps are eliminated [30], [31]. Correlation represents how much two
vectors agree in direction. Highly correlated steps are replaced with a long step, and low-
correlated steps are replaced with a short step. The concept of Cumulative Step-size
Adaptation is illustrated in Fig. 2-16. The thick arrow represents step adaptation
according to the cumulation of the previous six steps. Every arrow represents a transition
of the mean m of the population.
Fig. 2-15: A 2D normal distribution: (a) a 2D vector of points and (b) two 1D histograms
Fig. 2-16: The principle of Cumulative Step-size Adaptation (CSA)
The CSA works as follows:
1. Get λ random normally distributed samples around the mean solution (i.e. the
parent of the population):
xi = mt + σt·N(0, I)
Equivalently:
xi = N(mt, σt²·I)
Where mt is the solution at iteration t, and σt is the standard deviation of the
selection (i.e. the step-size). That is, select λ random samples from around the
solution mt with probability decreasing as we move away from mt. The samples
appear as shown in Fig. 2-15-a.
2. Evaluate the λ samples, and get the fittest µ of them
3. Calculate the average Zt of the fittest µ samples as follows:
Zt = (1/µ)·Σ(i=1 to µ) xi
4. Calculate the cumulative path:
Pc(t+1) = (1 − c)·Pc(t) + √(µ·c·(2 − c))·Zt , where 0 < c ≤ 1 (2-11)
The parameter c is called the cumulation parameter; it determines how rapidly
the information stored in Pc(t) fades, that is, how long (over generations) the effect
of a step-size at generation t lasts. The typical value of c is between 1/√n and 1/n.
The normalization term √(µ·c·(2 − c)) is chosen such that it normalizes the
cumulative path. That is, if Pt follows a normal distribution with a zero mean and
a unit standard deviation (i.e. Pt ∼ N(0, I)), Pt+1 also follows N(0, I) [30].
5. Update the mutation strength (i.e. step-size):
σt+1 = σt·exp((c/(2dσ))·(‖Pc(t+1)‖²/n − 1)) (2-12)
Where ‖X‖ is the Euclidean norm of the vector X ∈ ℝn:
‖X‖ = √(x1² + x2² + … + xn²)
The damping parameter dσ determines how much the step-size can change. It is
set to 1 as indicated in [30], [31].
The fittest µ parents are then selected and mutated by the new step-size σt+1 to
form a new population of λ offspring.
6. Repeat all steps until a termination condition is satisfied.
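The six steps above can be sketched in Python. The sphere fitness function and the parameter values are illustrative assumptions; the path is cumulated here from the normalized selected steps zi = (xi − mt)/σt, which is what keeps it comparable to N(0, I):

```python
import math, random

random.seed(5)
n, lam, mu = 2, 10, 3
c = 1.0 / n                  # cumulation parameter, within the typical range
d_sigma = 1.0                # damping parameter

def fitness(x):
    # Sphere function: toy objective with its optimum at the origin.
    return sum(v*v for v in x)

m = [5.0, 5.0]               # mean solution m_t
sigma = 1.0                  # step-size sigma_t
path = [0.0] * n             # cumulative path Pc

for t in range(250):
    # 1. Sample lambda offspring around the mean: x_i = m_t + sigma_t*N(0, I).
    zs = [[random.gauss(0, 1) for _ in range(n)] for _ in range(lam)]
    xs = [[m[k] + sigma * z[k] for k in range(n)] for z in zs]
    # 2-3. Select the fittest mu samples and average their normalized steps.
    sel = sorted(range(lam), key=lambda i: fitness(xs[i]))[:mu]
    Z = [sum(zs[i][k] for i in sel) / mu for k in range(n)]
    # 4. Cumulate the path; sqrt(mu*c*(2-c)) normalizes it (Eq. 2-11).
    w = math.sqrt(mu * c * (2 - c))
    path = [(1 - c) * p + w * z for p, z in zip(path, Z)]
    # 5. Update the step-size (Eq. 2-12): a long path grows sigma,
    #    a short one shrinks it.
    sigma *= math.exp(c / (2 * d_sigma) * (sum(p*p for p in path) / n - 1))
    # 6. Move the mean to the average of the selected samples and repeat.
    m = [sum(xs[i][k] for i in sel) / mu for k in range(n)]
```

After the loop, m lies near the optimum and σ has shrunk accordingly.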
2.7.4 Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
2.7.4.1 Introduction
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [32] is a state-of-the-
art evolution strategy. It extends the CSA strategy described in section 2.7.3, which
adapts the step-size σ every generation and uses the updated step-size to mutate the parent
solution. CMA-ES differs from CSA in that it uses a covariance matrix C, instead of the
identity matrix I, to generate the random mutating vectors. This means that the different
components of the random vector are generated from normal distributions with different
standard deviations. That is, every component has a different step size. For example,
imagine the problem of optimizing a 2D vector, and assume that the optimum solution is
{10, 30} and the initial guess is {0, 0}. It is obvious that the optimum value of the second
parameter is farther than the first one. Therefore, it is better to take larger steps in the
direction of the second one. This is what CMA-ES does and this is why it requires fewer
generations to find the optimum solution. In brief, CMA is more directive than SA. Fig.
2-17 shows the same population generated from a parent at the origin. In (a) the
covariance matrix is the identity matrix I, and in (b) the covariance matrix equals
[1 1; 1 3].
Fig. 2-18 shows a 2D normal distribution shaped by a covariance matrix C. The
black ellipse is the one-σ ellipse of the distribution.
Fig. 2-17: A population of (a) Step-size Adaptation and (b) Covariance Matrix Adaptation
In addition to using a covariance matrix to adapt the shape of the mutation
distribution, the covariance matrix itself is set as a strategy parameter. Consequently, it
is adapted every generation so that the mutation distributions could adapt to the shape of
the fitness landscape and converge faster to the optimum. The operation of CMA-ES is
further illustrated in Fig. 2-19, where the population concentrates on the global optimum
after six generations. The ‘*’ symbols represent individuals, and the dashed lines
represent the distribution of the population. Background color represents the fitness
landscape, where darker color represents lower fitness.
Fig. 2-18: A 2D normal distribution N(0, C) [µ = 0, Σ = C], where C = [1 1; 1 3]
Fig. 2-19: Optimization of a 2D problem using CMA-ES (generations 1 to 6)
2.7.4.2 CMA-ES Algorithm
The basic (µ/µ, λ) CMA-ES works as follows [27]:
Initialization:
I.1 λ Number of offspring (i.e. population size)
I.2 µ Number of parents (the number of solutions involved in updating m, C, and σ)
I.3 m ∈ ℝ(n×1) Initial n-dimensional solution (the mean of the population)
I.4 C = I(n×n) Initial covariance matrix = identity matrix
I.5 σ ∈ ℝ+ Initial step-size
I.6 cσ ≈ 4/n Decay rate for evolution (cumulation) path for step-size σ
I.7 dσ ≈ 1 Damping parameter for σ change
I.8 cc ≈ 4/n Decay rate for evolution (cumulation) path of C
I.9 c1 ≈ 2/n² Learning rate for rank-one update of C
I.10 cµ ≈ µw/n² Learning rate for rank-µ update of C
I.11 Pσ = 0 Step-size cumulation path
I.12 Pc = 0 Covariance-matrix cumulation path
The constant n is the number of state parameters (i.e. the parameters of the objective
function). The left column maps these parameters with the code given in Table 2-1.
Generation Loop: Repeat until a termination criterion is met:
1. Generate λ offspring by mutating the mean m
xi = m + σ·yi, 0 < i ≤ λ
Where: yi is an (n x 1) random vector generated according to a normal distribution
with zero mean and covariance C [yi ~ Ni(0, C)], as shown in Fig. 2-17.b and
Fig. 2-18.
2. Evaluate the λ offspring by the fitness function
F(xi) = f(xi)
3. Sort the offspring by fitness so that:
f(x1:λ) < f(x2:λ) < … < f(xλ:λ)
Where 𝑥1:𝜆 is the fittest individual in the population.
4. Update the mean m of the population:
m = Σ(i=1 to µ) wi·xi:λ
Equivalently:
m = m + σ·yw , where yw = Σ(i=1 to µ) wi·yi:λ
Where xi:λ is the ith best individual in the population, and the constants wi are
selected such that [27]:
w1 ≥ w2 ≥ w3 ≥ … ≥ wµ ≥ 0,
Σ(i=1 to µ) wi = 1,
µw = 1 / Σ(i=1 to µ) wi² ≈ λ/4
5. Update the step-size cumulation path Pσ:
Pσ = (1 − cσ)·Pσ + √(1 − (1 − cσ)²)·√µw·C^(−1/2)·yw
Where Pσ ∈ ℝ(n×1). The square root of the matrix C can be calculated using Matrix
Decomposition [32] or using Cholesky Decomposition [26].
6. Update the covariance-matrix cumulation path Pc:
Pc = (1 − cc)·Pc + √(1 − (1 − cc)²)·√µw·yw
Where Pc ∈ ℝ(n×1)
7. Update the step-size σ:
σ = σ·exp((cσ/dσ)·(‖Pσ‖ / E‖N(0, I)‖ − 1))
According to [30] this formula can be simplified to:
σ = σ·exp((cσ/(2dσ))·(‖Pσ‖²/n − 1))
Where ‖X‖ is the Euclidean norm of the vector X.
8. Update the covariance matrix C:
C = (1 − c1 − cµ)·C + c1·Pc·PcT + cµ·Σ(i=1 to µ) wi·yi:λ·yi:λT
The term c1·Pc·PcT is called the “rank-one update”. It reduces the number of
function evaluations. The constant c1 is called the “rank-one learning rate”.
The term cµ·Σ(i=1 to µ) wi·yi:λ·yi:λT is called the “rank-µ update”. It increases the
learning rate in large populations and can reduce the number of necessary
generations. The constant cµ is the “rank-µ” learning rate [27].
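A single update of C (step 8) in the 2 × 2 case can be checked with plain arithmetic; every number below (the learning rates, C, the path, the steps, and the weights) is made up for illustration:

```python
# One covariance-matrix update for a 2x2 case; all values are made up.
c1, cmu = 0.1, 0.2
C  = [[1.0, 0.0], [0.0, 1.0]]     # current covariance matrix
Pc = [0.8, 0.6]                   # covariance cumulation path
ys = [[1.0, 0.5], [0.5, 1.0]]     # selected (sorted) mutation steps y_i
ws = [0.7, 0.3]                   # recombination weights w_i, sum to 1

def outer(u, v):
    # Outer product u v^T of two 2-vectors.
    return [[ui * vj for vj in v] for ui in u]

rank_one = outer(Pc, Pc)          # Pc Pc^T
rank_mu = [[sum(w * y[r] * y[col] for w, y in zip(ws, ys))
            for col in range(2)] for r in range(2)]   # sum_i w_i y_i y_i^T

# C <- (1 - c1 - cmu) C + c1 Pc Pc^T + cmu sum_i w_i y_i y_i^T
C = [[(1 - c1 - cmu) * C[r][col]
      + c1 * rank_one[r][col] + cmu * rank_mu[r][col]
      for col in range(2)] for r in range(2)]
# The updated C remains symmetric and leans towards the sampled directions.
```

The off-diagonal entries become positive because both the path and the steps point into the first quadrant, so subsequent mutations favor that direction.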
Termination:
Some example termination criteria used in [32] are:
Stop if the best objective function values of the most recent 10 + 30n/λ
generations are zero
Stop if the average fitness of the most recent 30% of M generations is not better
than the average of the first 30% of M generations, where M is 20% of all
generations, such that 120 + 30n/λ ≤ M ≤ 20,000 generations.
Stop if all of the best objective function values over the last 10 + 30n/λ generations
are below a certain limit. A common initial guess of that limit is 10⁻¹²
Stop if the standard deviations (step-sizes) in all coordinates are smaller than a
certain limit. A common limit is 10⁻¹² of the initial σ.
Usually, the algorithm is bounded to a limited search space, but in our experiments
it could find the global optimum even if the search space is unbounded (i.e. the domain
of a component of the solution vector is (−∞, ∞)).
A simple MATLAB/Octave CMA-ES code is given in Table 2-1. The left column
of the table maps the given code with the steps given above in the initialization stage.
Table 2-1: A simple CMA-ES code
%Initialization
I.1 lambda = LAMBDA; % number of offspring
I.2 mu = MU; % number of parents
I.3 yParent = INIT_SOL; % Initial solution vector
n = length(yParent); % Problem dimensions
I.4 Cov = eye(n); % Initial covariance matrix
I.5 sigma = INIT_SIGMA; % Initial sigma (step-size)
I.6 Cs = 1/sqrt(n); % Learning rate of step-size
I.8 Cc = 1/sqrt(n); % Decay rate of Pc
I.10 Cmu = 1/n^2; % Learning rate of C
I.11 Ps = zeros(n,1); % Step-size cumulation
I.12 Pc = zeros(n,1); % Cov. matrix cumulation
I.13 minSigma = 1e-3; %Min. step-size…
% … termination condition
% Generation Loop: Repeat until termination criterion
while(1)
SqrtCov = chol(Cov)'; % square root of cov. …
% … matrix
for l = 1:lambda; % generate lambda …
% … offspring
1
offspr.std = randn(n,1); % standard normal sample N(0,I)
offspr.w = sigma*(SqrtCov*offspr.std); % σ·√C·N(0,I) ~ N(0, σ²C)
offspr.y = yParent + offspr.w; % Mutate the parent
2 offspr.F = fitness(offspr.y); % Evaluate the offspring
offspringPop{l} = offspr; % offspring complete
end; % end for
3 ParentPop = sortPop(offspringPop, mu); % sort pop. and take µ …
% … best individuals
4 yw = recomb(ParentPop); % Calculate yw
yParent = yParent + yw.w; % new mean (parent)
5 Ps=(1-Cs)*Ps+sqrt(mu*Cs*
(2-Cs))*yw.std;
% Update Ps
6 Pc=(1-Cc)*Pc+sqrt(mu*Cc*(2-Cc))*yw.w; % Update Pc
7 sigma=sigma*exp((Ps'*Ps - n)
/(2*n*sqrt(n)));
% Update step-size
8 Cov = (1-Cmu)*Cov + Cmu*Pc*Pc'; % Update cov. matrix
Cov = (Cov + Cov')/2; % enforce symmetry
% Termination
if (sigma < minSigma) % termination condition
printf("solution="); % The solution is…
disp(ParentPop{1}.y'); % … the first parent
break; % Terminate the loop
end; % end if
end % end while
The upper-case words, such as LAMBDA, are predefined constants. The function
fitness evaluates the candidate solutions. The function sortPop sorts the individuals by
fitness and extracts the best µ ones. The function recomb recombines the selected µ
parents to form a new parent for the next generation. A simple recombination is to
average the solution vector and the step-size vector of the selected µ parents.
MATLAB/Octave examples of these three functions are given in Table 2-2, Table 2-3,
and Table 2-4 respectively.
Table 2-2: An example of “fitness” function
function out = fitness(x)
out = norm(x-[5 -5]'); % The global optimum is at [5, -5]
end
Table 2-3: An example of “sortPop” function
function sorted_pop = sortPop(pop, mu);
for i=1:length(pop);
fitnesses(i) = pop{i}.F;
end;
[sorted_fitnesses, index] = sort(fitnesses);
for i=1:mu;
sorted_pop{i} = pop{index(i)};
end;
end
Table 2-4: An example of “recomb” function
function recomb = recomb(pop);
recomb.w = 0; recomb.std = 0;
for i=1:length(pop);
recomb.w = recomb.w + pop{i}.w;
recomb.std = recomb.std + pop{i}.std;
end;
recomb.w = recomb.w/length(pop);
recomb.std = recomb.std/length(pop);
end
The previous code snippets are modifications of the code provided in [26].
Implementations for C, C++, Fortran, Java, MATLAB/Octave, Python, R, and Scilab are
provided in [33].
2.7.4.3 Advantages of CMA-ES
CMA-ES is efficient for solving:
Non-separable problems¹
Non-convex functions²
Multimodal optimization problems, where there are possibly many local optima
Objective functions with no available derivatives
High-dimensional problems
2.7.4.4 Limitations of CMA-ES
CMA-ES can be outperformed by other strategies in the following cases:
Partly separable problems (i.e. optimization of an n-dimensional objective function
can be divided into a series of n optimizations of every single parameter)
The derivative of the objective function is easily available (Gradient
Descent/Ascent is better)
Small-dimension problems
Problems that can be solved using a relatively small number of function
evaluations (e.g. < 10n evaluations; Nelder-Mead may be better)
¹ An n-dimensional separable problem can be divided into n 1-dimensional separate problems.
² A function is convex if the line segment between any two points lies above the curve of the function.
2.8 Nelder-Mead Method
2.8.1 Introduction
The Nelder-Mead method [34] is a non-linear optimization technique that uses a
heuristic search; that is, its solution is not guaranteed to be optimal. It is suitable for
solving problems where the derivatives of the objective function are unknown or too
costly to compute. Normally, it is faster than CMA-ES, but it easily falls into local
optima. This method uses the simplex concept described in the next section.
2.8.2 What is a simplex
A simplex is a geometric structure consisting of n+1 vertices in n dimensions.
Table 2-5 contains examples of simplexes:
Table 2-5: Simplexes in different dimensions
Dim. Shape Graph
0 Point
1 Line
2 Triangle
3 Tetrahedron
4 Pentachoron
2.8.3 Operation
To optimize an n-dimensional function (i.e. with n parameters), the Nelder-Mead
algorithm constructs an initial simplex of n+1 vertices and tries to capture the optimum
point inside it while reducing the size of the simplex. A simplex is similar to a team of
police officers chasing a criminal: every simplex point represents a police officer, the
optimum solution is the criminal, and Nelder-Mead is the plan the police officers follow
to catch the criminal. Selecting the initial simplex is critical and problem-dependent, as
a very small initial simplex can lead to a local minimum. This is why the Nelder-Mead
method is usually used only when a local optimum is satisfactory, such as its usage in
the hybrid optimization technique described in section 3.2.4.
After constructing the initial simplex, it is iteratively updated using four types of
operations. Fig. 2-20 illustrates these operations on a 2D simplex (triangle). The shaded
and the blank regions represent the simplex before and after the operation respectively.
P̅ is the mean of all points except for the worst. Ph is the highest (worst) point, Pl is the
lowest (best) point, P* is the reflected point, P** is the expanded or the contracted point.
Every operation is described next.
(a) reflection (b) expansion (c) contraction (d) reduction
(resizing)
Fig. 2-20: Operations of Nelder-Mead algorithm
a) Reflection:
If Ph is the worst point, it is expected to find a better point at the reflection of Ph
on the other side of the simplex. The reflection P* of Ph is:
P* = (1 + α) P̅ – α Ph , where:
α ∈ ℝ+ is the reflection coefficient, and [P* P̅] = α[P̅ Ph]
b) Expansion:
If the reflection point P* is better than the best Pl:
f(P*) < f(Pl)
Then expand P* to the expansion point P**:
P** = γ P* + (1-γ) P̅ , where:
γ > 1 is the expansion coefficient: the ratio of [P** P̄ ] to [P* P̄ ]
c) Contraction:
If the reflection point is worse than all points except for the worst (i.e. worse than
the second worst point):
f(P*) > f(Pi), for i ≠ h
Then, define a new Ph to be either the old Ph or P* whichever is better, and
contract P* to P**:
P** = β Ph + (1 – β) P̄
The contraction coefficient β lies between 0 and 1 and is the ratio of [P** P̄ ] to
[P* P̄ ]
d) Reduction (Resizing):
If, after contraction, the contracted point P** is found worse than the second worst
point, then replace every point i with (Pi + Pl) / 2. This contracts the entire simplex
towards the best point Pl and, thus, reduces the size of the simplex.
Reduction handles the rare case of having a failed contraction, which can
happen if one of the simplex points is much farther than the others from the
minimum (optimum) value of the function. Contraction may thus move the
reflected point away from the minimum value and, consequently, further
contractions are useless. In this case, reduction is the proposed action in [34] to
bring all points to a simpler fitness landscape.
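The four operations (a)–(d) can be checked numerically on a small 2D simplex; the vertices and the common coefficient choices α = 1, β = 1/2, γ = 2 are illustrative:

```python
# Made-up 2D simplex: best Pl, middle Pm, worst Ph; standard coefficients.
alpha, beta, gamma = 1.0, 0.5, 2.0
Pl = (0.0, 0.0)      # lowest (best) point
Pm = (2.0, 0.0)      # middle point
Ph = (0.0, 2.0)      # highest (worst) point

# Mean of all points except the worst.
Pbar = tuple((a + b) / 2 for a, b in zip(Pl, Pm))

# (a) Reflection: P* = (1 + alpha) Pbar - alpha Ph
Pr = tuple((1 + alpha) * m - alpha * h for m, h in zip(Pbar, Ph))

# (b) Expansion: P** = gamma P* + (1 - gamma) Pbar
Pe = tuple(gamma * r + (1 - gamma) * m for r, m in zip(Pr, Pbar))

# (c) Contraction: P** = beta Ph + (1 - beta) Pbar
Pc = tuple(beta * h + (1 - beta) * m for h, m in zip(Ph, Pbar))

# (d) Reduction: move every point halfway towards the best point Pl.
reduced = [tuple((a + b) / 2 for a, b in zip(P, Pl)) for P in (Pl, Pm, Ph)]
```

Reflection places P* on the opposite side of the simplex from the worst vertex, expansion pushes twice as far in that direction, and contraction pulls back towards the centroid.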
2.8.4 Nelder-Mead Algorithm
A flowchart of the Nelder-Mead method is illustrated in Fig. 2-22, and the
corresponding MATLAB/Octave code is given in Table 2-6. This algorithm is explained
in detail in [35].
Table 2-6: Nelder-Mead Algorithm
function [x, fmax] = nelder_mead (fun, x)
% Initialization
minVal = 1e-4; % Min. value to achieve
maxIter = length(x)*200; % Max. number of iterations
n = length (x); % Problem dimension
S = zeros(n,n+1); % Empty simplex
y = zeros (n+1,1); % Empty simplex fitness
S(:,1) = x; % The initial guess
y(1) = feval (fun,x); % Evaluate the initial guess
iter = 0; % Iteration counter
for j = 2:n+1 % Build initial simplex
S(:,j) = x; % Start each vertex from the initial guess …
S(j-1,j) = S(j-1,j) + (1/n); % … and perturb one coordinate
y(j) = feval (fun,S(:,j));
endfor
[y,j] = sort (y,'ascend'); % Sort simplex points
S = S(:,j); % Re-arrange simplex points
alpha = 1; % Reflection coefficient
beta = 1/2; % Contraction coefficient
gamma = 2; % Expansion coefficient
while (1) % Main loop
if (++iter > maxIter) % Stop if exceeded max. iterations
break;
endif
if (abs(y(1)) <= minVal) % Stop if target min. value achieved
break;
endif
mean = (sum (S(:,1:n)')/n)'; % Calculate the mean point
Pr =(1+alpha)*mean - alpha*S(: ,n+1); % Calculate the reflected point
Yr = feval (fun,Pr); % Evaluate the reflected point
if (Yr < y(n)) % Is Reflected better than 2nd worst?
if (Yr < y(1)) % Is reflected better than best?
Pe=gamma*Pr+ (1-gamma)*mean; % Calculate expanded point
Fe = feval (fun,Pe); % Evaluate expanded point
if (Fe < y(1)) % Is expanded better than best?
S(:,n+1) = Pe; % Replace worst with expanded
y(n+1) = Fe;
else
S(:,n+1) = Pr; % Replace worst by reflected
y(n+1) = Yr;
endif
else
S(:,n+1) = Pr; % Replace worst by reflected
y(n+1) = Yr;
endif
else
if (Yr < y(n+1)) % Is reflected better than worst?
S(:,n+1) = Pr; % Replace worst by reflected
y(n+1) = Yr;
endif
Pc = beta*S(:,n+1) + (1-beta)*mean; % Calculate contracted point
Yc = feval (fun,Pc); % Evaluate contracted point
if (Yc < y(n)) % Is contracted better than 2nd worst?
S(:,n+1) = Pc; % Replace worst by contracted
y(n+1) = Yc;
else
for j = 2:n+1 % Shrink the simplex (Reduction)
S(:,j) = (S(:,1) + S(:,j))/2;
y(j) = feval (fun,S(:,j));
endfor
endif
endif
[y,j] = sort(y,'ascend'); % Sort the simplex
S = S(:,j);
endwhile
x = S(:,1); % The best solution
fmax = y(1); % The minimum value
endfunction
Fig. 2-21 shows 12 of the 19 iterations of a practical Nelder-Mead run for
minimizing the function f(x, y) = (x−5)² + (y−8)⁴ with the starting point (6, 6). The reason
for selecting a starting point close to the optimum solution is just to view all simplex
updates on the same axes without shifting or scaling the axes at every iteration, which
illustrates the operation of the algorithm better. The same problem was solved again several
times using different initial guesses, and the number of algorithm iterations is recorded
for every starting point in Table 2-7.
Table 2-7: Iteration count for different initial guesses of the Nelder-Mead algorithm

Initial Guess | Number of iterations
(6, 6) | 19
(0, 60) | 21
(-60, 60) | 32
(100, 100) | 34
(-200, 300) | 38
For all results in Table 2-7, the algorithm terminated after achieving the targeted
minimum value (0.0001). It never exceeded the maximum iteration count (400).
Fig. 2-21: Twelve iterations of a practical run of the Nelder-Mead algorithm:
(1) Initial, (2) Contraction, (3) Reflection, (4) Contraction, (5) Contraction, (6) Contraction, (7) Contraction, (8) Contraction, (9) Contraction, (10) Reflection, (11) Contraction, (12) Reflection
Fig. 2-22: Nelder-Mead algorithm flowchart
2.9 Robocode Game
The proposed IA is tested on the Robocode game [1], a Java programming game where programmed tanks compete in a battle arena. The tanks are completely autonomous; that is, programmers have no control over them during the battle. Therefore, Robocode is ideal for testing IAs. Furthermore, a tank has no perfect knowledge of the environment (i.e. the arena); its knowledge is restricted to what its radar can read. The Robocode game has been used as a benchmark in many previous works, including [4], [36], [37]. The following subsections describe the Robocode game in detail.
2.9.1 Robot Anatomy
As shown in Fig. 2-23, every robot consists of a body, a gun, and a radar. The body
carries the gun and the radar. The radar scans for other robots and the gun shoots bullets
with a configurable speed. The energy consumption of a bullet depends on its damage
strength.
Fig. 2-23: A Robocode robot anatomy
2.9.2 Robot Code
Every robot has a main thread and up to five other running threads. The main thread usually runs an infinite loop where actions are taken. Robocode also provides listener classes for triggering specific actions at certain events, such as colliding with a robot, detecting a robot, hitting a robot with a bullet, or being hit by a bullet.
2.9.3 Scoring
There are several factors for ranking the battling tanks in Robocode. The factor
considered in this work is the “Bullet Damage”; it is the score awarded to a robot when
one of its bullets hits an opponent.
Chapter 3: Human Imitating Cognitive Modeling Agent (HICMA)
3.1 Introduction
This thesis introduces the Human Imitating Cognitive Modeling Agent (HICMA): a cognitive agent that models human behaviors using statistical approaches. We believe that some human-like behaviors can be represented by simple statistical models. This is much simpler than imitating the complicated operation of the mind, as targeted by several previous works. HICMA models every environmental object or phenomenon with a statistical model that has a set of parameters. Different combinations of these parameters are used in such a way that each combination is interpreted as a human behavior. HICMA presents an updated version of Minsky's society of mind theory described in section 2.2: it introduces to the agents' society a new type of agent that is capable of evolving other agents. This evolution is done by GA (Section 2.5) and the Nelder-Mead method (Section 2.8); other methods, such as CMA-ES (Section 2.7.4), can also be used.
Section 3.2 describes the components of HICMA and section 3.3 explains how
these components interact together.
3.2 The Structure of HICMA
To test the behavior of HICMA, a software agent based on it was embedded in a Robocode robot. The structure of this agent is shown in Fig. 3-1.
Fig. 3-1: The structure of HICMA’s Robocode agent
Every agent in the society has the following attributes:
- An identifier
- A fitness measure
- A set of evolvable parameters (e.g. maximum speed, minimum temperature, etc.) that define the state of the agent
- A default state (defined by default parameter values)
Each of the agent's parameters has the following attributes:
- An identifier
- An initial value
- A list of evolver agents
An evolver is an agent that implements an optimization technique (e.g. Nelder-Mead). It is responsible for adapting the parameters of another agent based on that agent's fitness. The fitness of an agent is calculated by its parent agent. For example, in Fig. 3-1 the shooting agent is the parent of the modeling agent and is therefore responsible for evaluating it.
The following subsections describe in detail HICMA’s Robocode agents.
3.2.1 Modeling Agent
The modeling agent is responsible for building and optimizing models of the interesting features and objects of the environment. In this work, it models the behavior of an enemy Robocode tank (i.e. how it dodges bullets). The block diagram of the proposed modeling agent is shown in Fig. 3-2.
Fig. 3-2: The block diagram of HICMA’s modeling agent
HICMA's modeling agent samples the location of the target robot to construct a model of its motion. Building the motion model is done by a hybrid optimization technique that integrates the Nelder-Mead method [34] and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [28]. The Nelder-Mead method can find a local optimum, whereas CMA-ES can find the global optimum or a near-optimum solution. Nonetheless, experiments showed that CMA-ES is less accurate than the Nelder-Mead method; therefore, the hybrid optimization method is proposed. As shown in Fig. 3-3, the hybrid method uses CMA-ES to get a rough estimate of the model parameters and then uses the Nelder-Mead method to fine-tune them.
Fig. 3-3: The hybrid optimization method
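The two-stage flow of Fig. 3-3 can be sketched as follows. This is an illustrative stand-in, not the thesis implementation: coarse uniform sampling plays the role of the CMA-ES global stage, and a shrinking-step pattern search plays the role of the Nelder-Mead fine-tuning stage.

```python
import random

def hybrid_minimize(f, bounds, n_samples=300, iters=60, seed=0):
    """Stage 1: coarse global sampling inside `bounds` (stand-in for
    CMA-ES). Stage 2: shrinking-step pattern search around the rough
    estimate (stand-in for the Nelder-Mead fine-tuning stage)."""
    rng = random.Random(seed)
    # Stage 1: rough global estimate of the parameters.
    best = min((tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                for _ in range(n_samples)), key=f)
    x = list(best)
    # Stage 2: fine-tune; bounds are not re-enforced in this sketch.
    step = [(hi - lo) / 10.0 for lo, hi in bounds]
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (-step[i], step[i]):
                cand = list(x)
                cand[i] += d
                if f(cand) < f(x):
                    x = cand
                    improved = True
        if not improved:                 # stuck: halve every step size
            step = [s / 2.0 for s in step]
    return x, f(x)
```

The design point is the same as in the thesis: the global stage only has to land near the basin of the optimum; the cheap local stage then polishes the estimate.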
A similar technique is proposed in [38] where GA is employed instead of CMA-
ES. The proposed hybrid optimization method does not have a fixed objective function.
Instead, it composes an objective function from a set of elementary mathematical
functions such as linear, polynomial, cosine, and exponential functions. As described
later, the function combination is suggested by a GA evolver agent. This allows for
selecting a good choice from a wide search space of function combinations. Every
combination is evaluated by its bullet miss rate:
miss rate = (number of bullets that miss the target) / (total number of bullets)

where a lower miss rate means a higher fitness.
As mentioned above, every agent has a set of evolvable parameters, and every parameter has an evolver that searches for its optimum value. The parameters of the modeling agent, along with their evolvers, are described in Table 3-1. For instance, the optimum value of the initial guess parameter is sought by a Nelder-Mead evolver that uses the fitness provided by the shooting agent (refer to Fig. 3-1).
Table 3-1: The parameters of the modeling agent

Symbol | Description | Evolver Type
function set | The set of basic mathematical functions that form the objective-function composition | GA
initial guesses | The initial guess of every optimized parameter | Nelder-Mead
step sizes | The step size of the search process of every parameter | Nelder-Mead
lower bounds | The lower bound of every parameter in the search space (a) | Nelder-Mead
upper bounds | The upper bound of every parameter in the search space (a) | Nelder-Mead
tolerance | The maximum allowed error in the optimization process | Nelder-Mead
trials | The maximum number of trials the CMA-ES makes to find a solution (b) | Nelder-Mead
sample count | The number of samples used in the optimization process (b) | Nelder-Mead

(a) Experiments show that the CMA-ES algorithm finds good solutions for almost any bounds.
(b) More trials and a larger sample size give more accurate solutions but consume more time, so a compromise is important.
So far, only the modeling agent is allowed to evolve its parameters; this is just for studying the effect of evolving an agent. Evolving the parameters of the other agents is left as future work.
3.2.2 Estimation Agent
The function of the estimation agent is to predict or estimate the future of an entity (i.e. the environment or an object in it). It uses one or more models (built by the modeling agent) at a time. For instance, the proposed estimation agent uses two models of the target's motion in the arena: a model of the vertical motion of the target tank (along the y-axis) and another of its horizontal motion (along the x-axis). The output of this agent is an estimate or prediction of where (location) and when (time) to shoot at the target tank. This output is sent to the shooting agent described next. Obviously, to predict the future, time must be considered as a parameter in the modeling process. The block diagram of the proposed estimation agent is shown in Fig. 3-4.
Fig. 3-4: The block diagram of HICMA's estimation agent
3.2.3 Shooting Agent
The shooting agent acts as:
1. A parent of the modeling agent
2. An interface between the estimation agent and the gun of the Robocode robot
As the parent of the modeling agent, the shooting agent is responsible for triggering the modeling agent's evolvers (GA and Nelder-Mead, as given in Table 3-1) and providing these evolvers with a fitness corresponding to the modeling agent's current state. The shooting agent is illustrated in Fig. 3-5.
Fig. 3-5: The block diagram of HICMA’s shooting agent
3.2.4 Evolver Agents
As mentioned above (refer to Table 3-1), the parameters of the modeling agent are optimized by a Nelder-Mead evolver agent and a GA evolver agent. The following subsections describe both in detail.
3.2.4.1 Nelder-Mead Agent
As described in section 2.8, Nelder-Mead is a simplex-based optimization algorithm. Its function as an evolver agent is to find the parameter values of the modeling agent (Table 3-1) that optimize its modeling process. This optimizer agent tunes the parameters of the modeling agent one by one: it first finds the best initial guess, i.e. the one that obtains a good solution quickly; then it finds the best step size, the best bounds of the search space, and so on.
It is important not to confuse this evolver with the Nelder-Mead used as part of the hybrid method (CMA-ES + Nelder-Mead) implemented in the modeling agent. As a modeling method, Nelder-Mead optimizes objective functions; as an evolver, it tunes the parameters of another agent. The two roles are completely unrelated.
The operation of the Nelder-Mead evolver agent is illustrated in Fig. 3-6. It requires frequent evaluations of the candidate solutions (i.e. the simplex points shown in Table 2-5). Each candidate solution holds a parameter combination of the modeling agent and is evaluated by the shooting agent, as it is the parent of the modeling agent.
Fig. 3-6: The operation of Nelder-Mead evolver agent
However, the parent (i.e. the shooting agent) has no explicit evaluation function for its child (the modeling agent); it must run a complete Robocode round for every candidate solution. This requires the Nelder-Mead algorithm to be interrupted at every evaluation point. A modified version of the Nelder-Mead algorithm with the interrupt points is given in Algorithm 3-1. The algorithm is interrupted at steps 4, 7, and 11; therefore, a single iteration of the algorithm requires up to three Robocode rounds.
Algorithm 3-1: A modified version of the Nelder-Mead algorithm with interrupt points
1. Order the points of Nelder-Mead simplex
2. Calculate x0: the center of gravity of all points except the worst xn+1
3. Calculate the reflected point xr
4. Evaluate the objective function at the reflected point xr
5. If xr is better than the second worst xn, but not better than the best x1, replace xn+1
by xr and go to step 1
6. If xr is the best point, compute the expanded point xe
7. Evaluate xe
8. If xe is better than xr, replace xn+1 by xe and go to 1
9. Else, replace xn+1 by xr and go to step 1
10. At this step, the selected point is not better than xn. Compute the contracted point
xc
11. Evaluate the objective function at the contracted point xc
12. If xc is better than xn+1, replace xn+1 by xc and go to step 1
13. Else (xc is not better than xn+1), update all points of the simplex except the best x1 (reduction) and go to step 1
The definitions of the reflected, expanded, and contracted points, and of the "Reduction" operation, are given in section 2.8.3.
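Algorithm 3-1 can be sketched as a Python generator: each interrupt point becomes a `yield`, so the caller evaluates the candidate externally (in HICMA, one Robocode round per evaluation) and sends the fitness back. The data layout (lists mutated in place) is an illustrative assumption:

```python
def nm_step(simplex, y, alpha=1.0, beta=0.5, gamma=2.0):
    """One Nelder-Mead iteration with interrupt points (a sketch of
    Algorithm 3-1). `simplex` is a list of points (lists of floats)
    and `y` their fitnesses; both are sorted and updated in place.
    Every candidate point is *yielded* for external evaluation."""
    order = sorted(range(len(y)), key=y.__getitem__)
    simplex[:] = [simplex[i] for i in order]            # step 1: order
    y[:] = [y[i] for i in order]
    n = len(simplex) - 1
    centroid = [sum(p[i] for p in simplex[:-1]) / n     # step 2
                for i in range(len(simplex[0]))]
    refl = [(1 + alpha)*c - alpha*w                     # step 3
            for c, w in zip(centroid, simplex[-1])]
    yr = yield refl                                     # interrupt (step 4)
    if yr < y[0]:                                       # step 6: expand
        exp_ = [gamma*r + (1 - gamma)*c
                for r, c in zip(refl, centroid)]
        ye = yield exp_                                 # interrupt (step 7)
        if ye < yr:
            simplex[-1], y[-1] = exp_, ye
        else:
            simplex[-1], y[-1] = refl, yr
    elif yr < y[-2]:                                    # step 5: accept
        simplex[-1], y[-1] = refl, yr
    else:                                               # step 10: contract
        con = [beta*w + (1 - beta)*c
               for w, c in zip(simplex[-1], centroid)]
        yc = yield con                                  # interrupt (step 11)
        if yc < y[-1]:
            simplex[-1], y[-1] = con, yc
        else:                                           # step 13: reduction
            for i in range(1, n + 1):
                simplex[i] = [(a + b) / 2.0
                              for a, b in zip(simplex[0], simplex[i])]
                y[i] = yield simplex[i]
```

A driving loop creates the generator, calls `next()` to get the first candidate, and repeatedly `send()`s the measured fitness until `StopIteration` ends the iteration.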
3.2.4.2 GA Evolver Agent

As aforementioned, the function combination used by the hybrid optimization method is suggested by a GA evolver. It suggests different combinations of elementary mathematical functions until the modeling agent gets a satisfying result (i.e. a good fitness). GA is employed for this purpose to allow for a large range of elementary functions (although in the test case at hand only 4 functions are used).

3.2.4.2.1 Chromosome Encoding
A set of elementary functions (called function lexicon) contains the basic building
blocks of the objective-function composition. Each elementary function has an index in
the function lexicon as shown in Table 3-2.
Table 3-2: The function lexicon

Index | Function | Formula
0 | Quadratic | a·x² + b·x + c
1 | Linear | a·x + b
2 | Exponential | a·e^(b·x)
3 | Cosine | a·cos(b·x + c)
Each chromosome represents an objective-function composition. It consists of a
set (no duplicates) of integers; each represents the index of an elementary function in
the lexicon.
Fig. 3-7 shows a chromosome example with its decoded objective function.
Fig. 3-7: A chromosome example
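The encoding can be sketched as follows, assuming (as an illustration of the decoding in Fig. 3-7, not the thesis implementation) that the decoded objective is the sum of the selected elementary terms and that coefficients are supplied as a flat parameter vector:

```python
import math

# Function lexicon of Table 3-2: index -> (coefficient count, function).
LEXICON = {
    0: (3, lambda p, x: p[0]*x**2 + p[1]*x + p[2]),     # quadratic
    1: (2, lambda p, x: p[0]*x + p[1]),                 # linear
    2: (2, lambda p, x: p[0]*math.exp(p[1]*x)),         # exponential
    3: (3, lambda p, x: p[0]*math.cos(p[1]*x + p[2])),  # cosine
}

def decode(chromosome, params, x):
    """Evaluate the composed objective at x. `chromosome` is a list of
    distinct lexicon indices; `params` is a flat coefficient list
    consumed in chromosome order."""
    total, pos = 0.0, 0
    for gene in chromosome:
        k, func = LEXICON[gene]
        total += func(params[pos:pos + k], x)
        pos += k
    return total
```

For example, the chromosome [1] with coefficients a = 3, b = 2 decodes to 3x + 2, so `decode([1], [3.0, 2.0], 4.0)` evaluates the model at x = 4.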
3.2.4.2.2 Initial Population
The initial population contains five distinct individuals generated randomly.
Every gene holds a random integer representing a function index in the function lexicon.
The size of the initial population was selected empirically; it is one of the parameters of the GA evolver agent and can hence be evolved so that the optimum population size is obtained. The selection of the initial population size has been studied extensively in [39]–[42].
3.2.4.2.3 Fitness Function
As aforementioned, the shooting agent is responsible for evaluating the modeling agent. The fitness measure is the bullet miss rate: the ratio of missed bullets to the total number of shot bullets during a complete Robocode round. Hence, smaller fitness values are better. Notice that evaluating one individual requires a complete Robocode round. Therefore, every generation introduces only one new individual, which replaces an individual from the old generation. The new individual is then evaluated over the next round, after which another individual is replaced by a new one, and so on.
3.2.4.2.4 Selection Method
Roulette-wheel selection is used; fitter individuals are more likely selected.
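Since lower miss rates are better, the roulette weights must be inverted before the spin. A sketch (the 1e-9 floor is an assumption of this sketch, added to keep a zero miss rate finite):

```python
import random

def roulette_select(population, miss_rates, rng=random):
    """Roulette-wheel selection where fitness is the bullet miss rate
    (lower is better): each individual's wheel slice is proportional
    to the inverse of its miss rate."""
    weights = [1.0 / (m + 1e-9) for m in miss_rates]
    spin = rng.random() * sum(weights)
    acc = 0.0
    for individual, w in zip(population, weights):
        acc += w
        if spin <= acc:          # the spin landed in this slice
            return individual
    return population[-1]        # numerical-edge fallback
```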
3.2.4.2.5 Elitism
One elite is copied to the next generation. Elitism allows the IA to use the best solution found so far in case it has to interrupt the evolution process for any reason. For example, if the Robocode battle (several rounds) terminates before the proposed modeling agent obtains the optimum model of the target's motion, then trying to hit the target using the best model at hand is better than staying inactive. This is similar to what a human player might do in the same situation.
3.2.4.2.6 Mutation and Crossover
Mutation is performed as follows; the procedure prevents duplicate genes within one chromosome:
1. Initialize a temporary lexicon (temp) with the set of all function indices in the function lexicon, and define an empty new chromosome.
2. Define a mutation constant Pmc.
3. Generate a random real number R within the interval [0, 1]:
a. If R ≤ Pmc, copy a random gene (i.e. an integer) from the temporary lexicon to the new chromosome, and remove that gene from the temporary lexicon so that it cannot be selected again.
b. Otherwise, copy the corresponding gene of the old chromosome to the new chromosome, provided it does not already exist there; if it does, copy the next unused gene of the old chromosome instead.
4. If the new chromosome is not filled yet, go to step 3.
Hence, the mutation probability Pm of the ith gene is:

Pm = Pmc · (1 − 1/L²) · (1 − [(i − 1)/L · (1 − Pmc)]),   for 1 ≤ i ≤ L      (3-1)

where L is the size of the lexicon.
The constant Pmc is the mutation constant. The second term represents the probability that the gene copied from the temp lexicon differs from the old one at the same position. The length of the lexicon is L; therefore, the probability of selecting any index j from the lexicon is 1/L. Likewise, the probability that any index (an allele) occupies any given gene of the old chromosome is 1/L. Hence, the probability that an index (allele) j is copied from the lexicon while the same allele sits at the corresponding position in the old chromosome is 1/L². Thus, the probability that the copied allele (from the lexicon) differs from the old one (in the old chromosome) is:

1 − 1/L²      (3-2)
The last term is the probability that the new gene (from temp) has not previously been copied from the old chromosome. At the ith iteration, the new chromosome is already filled with i − 1 genes selected randomly with equal probabilities (1/L for every allele). Hence, the probability that any index (allele) already exists in the new chromosome at the ith iteration is:

(i − 1)/L      (3-3)

At the first iteration (i = 1), the new chromosome is still empty, so the probability that the copied index already exists is zero, as formula (3-3) confirms. To further clarify the formula, assume the length of the lexicon is 10. At the 6th iteration, 5 indices have already been copied to the new chromosome, so the probability that the 6th index duplicates one of them is 1/2; substituting into formula (3-3) gives the same result.
Formula (3-3) would be exact if the ith index were copied from all L indices in the lexicon. However, the mutation procedure above removes every index selected from the temporary lexicon so that it cannot be re-selected. This means that the only source of a repeated allele is the old chromosome. Therefore, the probability that the ith index is duplicated is the probability that it has previously been selected from the lexicon (formula (3-3)) multiplied by the probability of copying from the old chromosome:

(i − 1)/L · (1 − Pmc)      (3-4)

From formula (3-4), the probability that the ith index is not duplicated is:

1 − [(i − 1)/L · (1 − Pmc)]      (3-5)
For a gene in the new chromosome (copied from the lexicon) to differ from the corresponding one in the old chromosome:
1. Mutation must occur [Pmc]
2. The new gene must hold a different value [formula (3-2)]
3. The copied allele must not be duplicated [formula (3-5)]
Combining these three conditions gives formula (3-1). This means that the probability of mutation decreases as we proceed from one gene to the next.
Pmc and L are empirically selected. In the experiments, the initial value of Pmc is
0.5 and the chromosome length L is 4 as shown in the function lexicon in Table 3-2.
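The duplicate-free mutation can be condensed into a Python sketch (one reading of the procedure; it assumes a duplicate-free old chromosome no longer than the lexicon, and the tie-breaking for the no-mutation branch is an assumption):

```python
import random

def mutate(old, lexicon_size, p_mc, rng=random):
    """Duplicate-free mutation sketch: each position mutates with
    probability p_mc by drawing an index not yet used in the new
    chromosome; otherwise the next unused gene of the old chromosome
    is kept. Assumes `old` has distinct genes, len(old) <= lexicon_size."""
    temp = list(range(lexicon_size))   # temporary lexicon
    new = []
    for _ in range(len(old)):
        if rng.random() <= p_mc:       # mutate: draw an unused index
            gene = rng.choice([g for g in temp if g not in new])
        else:                          # keep: next unused old gene
            gene = next(g for g in old if g not in new)
        new.append(gene)
    return new
```

By construction the result never contains duplicates, which is exactly the property the derivation of formula (3-1) relies on.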
Fig. 3-8 shows the flowchart of the mutation process. After mutation, the new
chromosome is decoded into an objective-function composition for the modeling agent.
Then, it is evaluated by the shooting agent as aforementioned. It is noteworthy that more
decision variables, such as the starting point of Nelder-Mead’s simplex, can be encoded
as proposed in [43].
Fig. 3-8: The flowchart of the mutation process
3.3 The Operation of HICMA
This section describes the operation of HICMA as a Robocode robot. It describes
the interactions between its society agents.
The operation of HICMA can be divided into three phases:
1. Initialization phase: initializing HICMA's agents
2. Operation phase: playing a Robocode round
3. Evolution (learning) phase: evaluating the performance of HICMA's agents in the previous round
Fig. 3-9 shows a flowchart of these phases in a Robocode game run.
Fig. 3-9: Operation phases flowchart of HICMA
Initialization Phase
The initialization phase begins just after the game starts. It includes the following:
- Constructing the agents of the society
- Assigning an ID to every agent
- Setting the initial state of every agent (parameter initialization)
- Restoring the states of agents learned from previous experiences if any
As shown in Fig. 3-11, the initialization phase starts with initializing the shooting
agent, which then initializes the estimation agent and two instances of the modeling
agent. After initialization, HICMA is ready for a Robocode battle.
Operation Phase
The operation phase shown in Fig. 3-12 is the core of the battle. After the battle
begins, the shooting agent – as the parent of other agents – initializes an agent index to
be used in the evolution phase. Next, it initializes an instance of the estimation agent and
two instances of the modeling agent. This initialization is done at the beginning of every
battle to recall any stored experience from a previous battle.
After that, the shooting agent uses the radar to sample the position of the opponent robot. It then forwards the collected samples to the two modeling agents and commands each of them to start building a model of the target's motion path. One modeling agent builds a model of the motion in the horizontal direction (i.e. along the x-axis) and the other in the vertical direction (i.e. along the y-axis). What happens inside a modeling agent is described next.
Once the shooting agent gets the motion models, it reads the current position of HICMA's robot in the arena. Next, it forwards the two models along with the current position to the estimation agent, which estimates the most suitable time and angle to shoot at the target. This is done by simultaneously solving two equations (of the two models) with a single unknown (time). The hybrid method is used for solving these equations, but other methods can be used. The operation of the estimation agent is described later.
Finally, after getting an estimated time and angle for hitting the target, the shooting
agent validates this estimate against the borders of the arena and translates it to a
command to the gun. This command takes the form: “At time τ, turn to angle Θ and shoot
a bullet”.
The operation phase is repeated for every bullet while the gun is cooling down.
Therefore, it does not cause any delay.
Evolution Phase
The evolution phase represents the learning capability of the system. It is based on
the performance of an agent during its operation phase over the previous round.
As shown in Fig. 3-13, the evolution phase begins after every Robocode round. The shooting agent uses the agent index, initialized in the operation phase, to evolve its child agents one by one. In turn, every child evolves its parameters one by one, as shown in Fig. 3-14. Each parameter is allowed to evolve until no further improvement in fitness is achieved. A single evolution step of one parameter takes one complete Robocode round.
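The per-parameter evolution loop can be sketched as follows; `evaluate` and `propose` are hypothetical placeholders for playing a full Robocode round and for an evolver agent's next suggestion, respectively:

```python
def evolve_agent(params, evaluate, propose):
    """One-child evolution sketch: parameters evolve one by one, each
    for as long as the round fitness keeps improving.
    `evaluate(params)` plays one round and returns the miss rate
    (lower is better); `propose(params, name)` suggests the next
    candidate value for one parameter. Both are placeholders."""
    best = evaluate(params)
    for name in list(params):            # parameters one by one
        while True:                      # until no more improvement
            candidate = dict(params)
            candidate[name] = propose(params, name)
            score = evaluate(candidate)  # one Robocode round
            if score < best:
                params, best = candidate, score
            else:
                break
    return params, best
```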
The fitness of a parameter value is determined during the round. For example, the
fitness of the first parameter (the function pair) of the modeling agent is the ratio of the
bullets that missed the target to the total number of bullets during the previous round. It
is calculated by the shooting agent, as it is the parent of the modeling agent.
After a child agent finishes evolving all its parameters, the parent agent advances
the agent index to point to the next agent. This is shown in the last condition in Fig. 3-13.
Finally, after the parent agent finishes the evolution of all of its child agents, it
declares the end of the evolution phase. This means that there will be no more evolution
hereafter. Thus, only the operation phase will be repeated over every round until the
entire battle ends.
Modeling Agent Operation
The optimization process shown in Fig. 3-14 is the core of the modeling agent
operation. It consists of the following:
- Covariance Matrix Adaptation – Evolution Strategy (CMA-ES) algorithm
- Nelder-Mead algorithm
- An objective function
- A termination condition
- A solution vector
First, the shooting agent initializes the CMA-ES algorithm: its initial guess, step size, search-space bounds, sample size, etc. Next, the samples collected in the first loop shown in Fig. 3-12 are stored in the buffer of the modeling agent.
The shooting agent then commands the first modeling agent to optimize its
objective function and to return the result, which models the motion of the target along
the x-axis. Next, the modeling agent runs a multi-start CMA-ES algorithm and stores the
best solution (the one with minimum optimization error). The best solution is then used
by a multi-start Nelder-Mead algorithm as its initial guess. This integration between
CMA-ES and Nelder-Mead is called the hybrid algorithm.
As mentioned in section 3.2.1, the output of the CMA-ES fed to the Nelder-Mead
is a rough estimate of the final solution. Therefore, this rough estimate might be too rough
for Nelder-Mead to converge to the global optimum. Thus, the hybrid algorithm may fall
into a local optimum. To overcome this problem, the entire optimization process (CMA-
ES + Nelder-Mead) is repeated for a number of trials to make sure that the final solution
is the optimum one. The optimization process loops until the maximum number of trials
is exceeded or a tolerable error is achieved.
Similarly, the whole optimization process is repeated for modeling target’s motion
along the y-axis. The results are then returned to the shooting agent as described in the
operation phase.
Estimation Agent Operation
The operation of the estimation agent is similar to that of the modeling agent. However, the objective function here is a univariate function with time t as the sole variable. It is composed of the following:
- The two models returned by the two modeling agents
- The speed of the bullet vb
- The time t
- The current position of the shooting robot (x, y)
It takes the following form:

vb·t = √[ (x − xmodel(t))² + (y − ymodel(t))² ]      (3-6)
Solving this equation gives the time at which the target robot, represented by (xmodel, ymodel), can be at the same location as the shooter's bullet. This process is illustrated in Fig. 3-10. Such an equation normally has several solutions; therefore, the estimation agent selects the soonest one (i.e. the minimum time t).
An example of an objective function of the estimation agent is given in Eq. (3-7):

11·t = √[ (50 + 3·t − 2·cos(3·t + 4) + 2)² + (90 − 3·t + 2·e^(9·t) − 5)² ]      (3-7)
First, the estimation agent receives a command from the shooting agent with the current (x, y) position of the shooting robot and the x-model and y-model returned by the modeling agents. Next, an objective function similar to Eq. (3-7) is composed and optimized by CMA-ES and Nelder-Mead as shown in Fig. 3-16. The process then continues as described in the operation of the modeling agent.
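To make concrete what solving Eq. (3-6) for the soonest t means, here is a simple grid scan with bisection refinement (a sketch only; the thesis solves the equation with the CMA-ES + Nelder-Mead hybrid):

```python
import math

def intercept_time(vb, shooter, x_model, y_model, t_max=200.0, dt=0.1):
    """Smallest t satisfying vb*t = distance(shooter, target(t))
    (Eq. 3-6): scan a coarse time grid for a sign change of the
    residual, then refine the bracket by bisection."""
    def g(t):  # g(t) = 0 exactly at an interception time
        dx = shooter[0] - x_model(t)
        dy = shooter[1] - y_model(t)
        return vb*t - math.hypot(dx, dy)
    t = 0.0
    while t < t_max:
        if g(t) <= 0.0 <= g(t + dt):   # root bracketed in [t, t+dt]
            lo, hi = t, t + dt
            for _ in range(40):        # bisection refinement
                mid = (lo + hi) / 2.0
                if g(mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            return (lo + hi) / 2.0
        t += dt
    return None                        # no reachable interception
```

Scanning forward in time and returning the first bracketed root is what implements "select the soonest solution".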
Fig. 3-10: Solving the estimation problem (the bullet, travelling a distance vb·t from the shooter at (x, y), meets the target's modelled position (xmodel(t), ymodel(t)) at the estimated location (xs, ys))
Fig. 3-11: The initialization phase of HICMA
Fig. 3-12: The operation phase of HICMA
Fig. 3-13: The evolution phase of HICMA
Fig. 3-14: The evolution sequence of parameters
Fig. 3-15: The optimization operation of the modeling agent
Fig. 3-16: The operation of the estimation agent
Chapter 4: Experiments and Results
This chapter presents two types of experiments:
1. Human-similarity experiments: two experiments that measure how, and to what extent, HICMA behaves like a human
2. Evolution experiment: an experiment that examines the benefit of the Nelder-Mead evolver agent
Every experiment targets one or more agents of the system as shown in Fig. 4-1.
Fig. 4-1: Experiment map with HICMA’s agents
All experiments examine the behavior of the HICMA agent in a Robocode robot. Only the five agents described in section 3.2 are implemented: modeling, estimation, shooting, Nelder-Mead evolver, and GA evolver. There are no agents for dodging the opponent's bullets. Therefore, HICMA's robot is tested only for its targeting efficiency; that is, the opponent robots only dodge bullets without shooting. Two Robocode robots are involved as opponents:
1. Shadow 3.66d: A powerful robot used in a previous Robocode targeting contest
as a reference robot. It uses a sophisticated dodging technique called “wave
surfing”. It is used for testing HICMA’s efficiency in shooting at advanced
dodging robots.
2. Walls: A simple robot that always follows the walls of the arena. It is used for testing how efficiently HICMA can model simple behaviors.
As mapped in Fig. 4-1, the human behavior imitation experiment targets the GA evolver agent, the human performance likeness experiment targets the modeling, estimation, and shooting agents, and the modeling-agent evolution experiment targets the Nelder-Mead evolver agent.

Table 4-1 gives the parameters of the Robocode simulation environment.
Table 4-1: Robocode simulation parameters

Rule | Value
Gun Cooling Rate | 0.1
Inactivity Time | 250
Arena Size | 600 x 600
4.1 Human-Similarity Experiments
This section presents two experiments:
1. Human behavior imitation: an experiment that illustrates how HICMA imitates typical human behaviors such as wisdom, carelessness, and recklessness
2. Human performance similarity: an experiment that compares the performance of HICMA with that of human players and hand-coded Robocode robots
These experiments are described in the next two subsections.
4.1.1 Human Behavior Imitation
As mentioned in section 3.2.1, the objective function of the modeling agent is composed of a pair of basic mathematical functions. An interesting observation is that every function pair models a typical human behavior. To affirm that, HICMA was forced to use every possible pair of functions, and for every pair its behavior was observed over ten rounds against both Shadow and Walls. The following observations were then taken, as shown in Fig. 4-2:
1. Hit %: the average percentage of bullets that hit the target (reflects absolute efficiency)
2. Round Time: the average time that HICMA took to beat its enemy (reflects absolute goal reaching). Notice that there is no round time in Fig. 4-2.a because HICMA could not beat Shadow with any function pair except pair (1, 2).
3. Avg. Shooting Interval: the average time interval between every two consecutive bullets shot by HICMA (reflects attack tendency)
4. Avg. Shooting-Interval Standard Deviation: the standard deviation of the shooting interval (reflects regularity of shooting)
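For concreteness, the four observations can be computed from raw per-round logs as follows (the log structure and field names are illustrative, not taken from the thesis):

```python
from statistics import mean, stdev

def shooting_observations(shot_ticks, hits, round_times):
    """The four observations of Fig. 4-2 from raw logs: `shot_ticks`
    are the times at which bullets were fired, `hits` flags whether
    each bullet hit the target, `round_times` are round lengths."""
    intervals = [b - a for a, b in zip(shot_ticks, shot_ticks[1:])]
    return {
        "hit_pct": 100.0 * sum(hits) / len(hits),   # Hit %
        "round_time": mean(round_times),            # Round Time
        "avg_interval": mean(intervals),            # Avg. Shooting Interval
        "interval_std": stdev(intervals),           # its Std. Deviation
    }
```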
The mapping of every function pair into a certain human behavior is derived from:
1) the observations in Fig. 4-2 and 2) the behavior of HICMA in every battle. The second
point is illustrated by a video at [44].
The previous results show that every function pair imitates a human behavior. For
example, the function pair [0, 1] imitates a wise behavior. Interpreting this function pair
as “wise” behavior is justified by the fact that this function pair makes HICMA shoot
moderately (moderate average shooting interval), regularly (small standard deviation),
and wisely (relatively high hit rate). Similarly, the behavior of the function pair [1, 2] is
interpreted as tricky behavior as HICMA keeps silent for a while and suddenly shoots at
the target; it seems as if it deceives its opponent. This behavior has an average shooting
interval almost similar to that of the wise behavior. However, its standard deviation (and
its variance) is higher, meaning that it fires bullets less regularly than the wise behavior.
The titles of the imitated human behaviors are derived from the Longman English dictionary, by mapping the definitions of these titles to the observations defined above. For example, Longman's definition of the word "tricky" is "a tricky person is clever and likely to deceive you". This definition agrees with the behavior of the function pair [1, 2] described above. The other function pairs can be mapped to human behaviors similarly, guided by the video at [44] and Table 4-2, where a darker color indicates a better result.
Fig. 4-2: The behavior of every function pair against: (a) Shadow and (b) Walls
Table 4-2: Human behavior interpretation

Metric                  Opponent   Wise     Careless   Reckless   Tricky    Artless   Unwise
Hit %                   Shadow     1.325    0.000      5.591      6.022     4.831     1.553
                        Walls      53.037   6.673      7.951      43.556    31.030    4.573
Round Time              Walls      418.90   2147.60    1970.50    481.600   617.100   1480.10
Avg. shooting interval  Shadow     41.400   455.900    109.500    54.800    44.800    58.400
                        Walls      36.600   83.900     59.100     34.500    31.200    39.300
Std. deviation          Shadow     8.959    166.500    38.968     12.426    8.702     10.543
                        Walls      2.836    13.924     11.628     4.972     4.185     3.234
Table 4-3 summarizes the observations given in Table 4-2.
Table 4-3: Description of human behaviors modeled by mathematical functions

Function Pair   Description                                                    Behavior
[0, 1]          Moderately shoots at the target                                wise
[0, 2]          Shoots even if the probability of hitting the target is low    careless
[0, 3]          Shoots rashly                                                  reckless
[1, 2]          Suddenly shoots at the target when it gets close enough        tricky
[1, 3]          Can hit the target only if its motion is very predictable      artless
[2, 3]          Shoots almost continuously and unwisely                        unwise
The shooting agent is the parent of two instances of the modeling agent: one for
modeling the opponent's motion along the vertical axis (y-axis) and the other along the
horizontal axis (x-axis). The results showed that HICMA could mix two behaviors into a
new hybrid one by selecting a function pair for the x-axis different from that of the y-
axis. For instance, when battling the Walls robot, HICMA tends to use pairs [0, 1] (wise)
and [1, 3] (artless), as if it senses that Walls is not tricky and there is no need to worry
about it. On the other hand, when HICMA encounters Shadow, it tends to use pair [1, 2]
(tricky) for both models, as Shadow uses an advanced technique for misleading its
opponents. In almost all battles against Shadow, the function pair [1, 2] was
autonomously selected. By experimenting with other pairs manually (i.e. with the GA
evolver disabled), it was found that pair [1, 2] is the only one that could beat Shadow.
This is because this pair makes HICMA behave like a tricky person, firing bullets
suddenly and unexpectedly after a period of silence.
4.1.2 Human Performance Similarity
This experiment measures how much HICMA performs like a human player. A
number of competitors played ten rounds against both Shadow and Walls, and their scores
were recorded. The competitors are:
1. HICMA: The proposed agent. It autonomously optimizes its parameters from the
initial state, as given in Table 4-4, to a suitable state.
2. Three hand-coded robots:
- DrussGT [45]: A very advanced robot; it has been the 1-vs-1 Robocode
champion since 2008
- GresSuffurd [46]: An advanced robot that uses an advanced targeting
technique called “Guess Factor” [47]
- Wallaby [48]: A competitive robot with a sophisticated targeting technique
called “Circular Targeting” [49]
To test only the targeting functionality of these robots, they were modified to
shoot without moving.
3. Twelve human players (each played against Shadow and Walls). Their data is
shown in Table 4-5.
Table 4-4: The initial state of HICMA

Parameter                            Value
Function composition indices         {2, 3} a
Nelder-Mead Simplex initial guess    {{0.5, 0.5}, {5.0, 0.5, 0.5}}
Nelder-Mead Simplex step sizes       {{5.0, 0.5}, {2.5, 0.5, 0.25}}
Nelder-Mead Simplex lower bounds     {{-500, -500}, {-500, -500, -500}}
Nelder-Mead Simplex upper bounds     {{500, 500}, {500, 500, 500}}
Tolerance b                          1.0e-10
Trials b                             1
Sample size                          3
Table 4-5: Human Players Data

Gender                      Males
Age                         21-27
Education                   10 engineering students, 2 communication engineers
Familiarity with the game   Novice
The average scores of the competitors over the ten rounds are shown in Fig. 4-3.
These results are further clarified in Fig. 4-4. It is obvious that HICMA has the most
similar performance to that of the human players. Furthermore, HICMA’s behavior
resembles the human performance in that it does not always outperform other IAs; both
HICMA and the human players performed better against Walls than against Shadow.
Fig. 4-3: Performance similarity with human players
Fig. 4-4: Performance difference with human players
4.2 Modeling-Agent Evolution
These experiments test the benefit of the Nelder-Mead evolver agent in optimizing
the modeling agent. They are conducted as follows: The first parameter of the modeling
agent (i.e. the objective-function composition) was manually set to [0, 1]. This parameter
is out of the scope of this experiment as it is evolved by the GA evolver agent (covered
in section 4.1.1). Next, the Nelder-Mead evolver agent is activated to evolve the
parameters of the modeling agent as shown in Fig. 4-5-a (the left column). The parameters
of the Nelder-Mead agent are manually tuned as given in Table 4-6. The description of
every parameter is given in Table 3-1. The initial values are carefully set so that the step
size almost equals the difference between initial guess and the expected optimum value.
For example, the step size of the initial-guess parameter is set to 10, as the initial value
of that parameter is 10 and its expected optimum value is close to 0.
Fig. 4-5: The evolution of the parameters of the modeling agent. The left column,
panels (a1)-(a7), shows the first experiment and the right column, panels (b1)-(b7), the
second; row by row, the panels plot, against the round number, the initial guess, the
step size, the lower bound, the upper bound, the tolerance, the trials, and the sample
size.
Table 4-6: The parameters of Nelder-Mead evolver used in the experiment

#   Parameter         Fitness Measure      Initial Value   Step Size   Threshold a
1   Initial guesses   Evaluation count b   10              10          550
2   Step sizes        Evaluation count     10              5           600
3   Lower bounds      Evaluation count     -200            100         550
4   Upper bounds      Evaluation count     200             100         580
5   Tolerance         Evaluation count     1×10^-100       1×10^-99    350
6   Trials            Minimum error c      1               1           1×10^-11
7   Sample count      Minimum error        3               5           1×10^-11

a If the difference between the fitness of two solutions is less than the threshold, they are considered equal.
b Evaluation count is the number of objective-function evaluations that the modeling agent performs to get a solution.
c The minimum error of the modeling process that the modeling agent could get (i.e. the error of the best solution).
To further study the Nelder-Mead evolver, the entire experiment was repeated
with bad parameter values, as given in Table 4-7. "Bad" here means that the step size
is far from the difference between the initial value and the optimum value. For
example, the step size of the initial-guess parameter is set to 1, although the initial
value is 10 and the expected optimum value is close to 0. For a fair comparison between
the two experiments, the thresholds and the initial values of the parameters are set
identically. The results of this experiment are shown in Fig. 4-5-b (the right column).
Table 4-7: Bad parameter values of Nelder-Mead evolver (used for verification)

#   Parameter         Fitness Measure      Initial Value   Step Size   Threshold a
1   Initial guesses   Evaluation count b   10              1           550
2   Step sizes        Evaluation count     10              1           600
3   Lower bounds      Evaluation count     -200            1           550
4   Upper bounds      Evaluation count     200             1           580
5   Tolerance         Evaluation count     1×10^-100       1×10^-100   350
6   Trials            Minimum error c      1               10          1×10^-11
7   Sample count      Minimum error        3               10          1×10^-11
The following observations can be extracted from the results of both experiments
as shown in Fig. 4-5:
1. The initial guess in the first experiment was evolved to 7, which is slightly better
than in the second experiment (i.e. 9), as shown in Fig. 4-5.1. However, the
optimum initial guess is 0, so both results are poor. Although the algorithm was
given a good step size in the first experiment, it fell into a local optimum.
2. As shown in Fig. 4-5.2, the step size in the first experiment was evolved to 8.3,
slightly worse than in the second experiment. The Nelder-Mead algorithm took
only a single step away from its given initial value (10). This is one reason why
the Nelder-Mead method easily falls into local optima; other methods, such as
CMA-ES, usually take several steps away from their given initial values.
3. In the first experiment, as shown in Fig. 4-5.3 and Fig. 4-5.4, evolving the upper-
and lower-bound parameters gained a small benefit: the evolver could tighten the
search space from [-200, 200] to [-166, 200]. However, this improvement has a
trivial effect on performance. Furthermore, although the algorithm was given a
good step size (i.e. 100) to tighten the search space further, it again fell into a
local optimum. In the second experiment, the evolver gained no improvement.
4. As shown in Fig. 4-5.5, evolving the tolerance parameter worsened its initial
value. To reduce the required number of objective-function evaluations, it is
normal to tolerate a wider range of errors. Unexpectedly, the evolver tightened
the tolerance instead of loosening it. Apparently, this is because the fitness
function (i.e. a Robocode round) is very noisy; no two rounds are the same. The
evolver must have made more evaluations before it settled on a solution. This is
another reason why the Nelder-Mead method easily falls into local optima.
5. Evolving the trials parameter was quite good in the first experiment; the evolver
selected to run the modeling process once. This agrees with the manual
experiments, which showed that the hybrid method usually finds the optimum
solution in a single run. However, in the second experiment, the evolver
worsened the initial value, selecting 11 instead of 1. As shown in Fig. 4-5-b.6,
the evolver made a long step away from the initial value to a local optimum. It
followed this long jump with a number of short steps, none of which was long
enough to return to the optimum value. This may be a third reason why the
Nelder-Mead method easily falls into local optima: it tends to make consecutive
contractions (refer to section 2.8.3) when it takes a step to a worse solution. Once
the simplex is contracted, the following steps become shorter and recovering the
lost ground becomes less probable.
6. In evolving the sample-size parameter, the first experiment, shown in Fig. 4-5-
a.7, produced rather good results: it selected a sample size of 8 instead of 3 (the
initial guess). This is expected, as collecting more samples decreases the error
and vice versa. However, in the second experiment, shown in Fig. 4-5-b.7, the
evolver settled on a sample size of 3, which seems insufficient for running a good
modeling process.
It is concluded from the previous observations that the benefit of the Nelder-Mead
evolver mainly depends on its own parameters. This is because the Nelder-Mead method
can easily fall into a local optimum, as mentioned in section 2.8. This calls for either of
the following solutions:
- Using CMA-ES or the hybrid method (explained in section 3.2.1) instead of
Nelder-Mead method
- Fine-tuning the Nelder-Mead evolver by another evolver
The second solution seems less practical, as the new evolver may itself need a third
evolver. On the other hand, CMA-ES and the hybrid method are less sensitive to their
initial parameters and usually converge to global optima.
Chapter 5: Conclusions and Future Work
Conclusions
Statistical modeling techniques can greatly enhance human-behavior imitation of
intelligent agents. They provide not only a simple method for modeling human behaviors
but also robust tools for adapting the behavior of the agent. This work introduces an
intelligent agent that incorporates statistical modeling techniques into a multi-agent
system. It presents a Human Imitating Cognitive Modeling Agent (HICMA) that
represents an enhanced model of the society of mind theory. HICMA introduces a new
type of agent, the evolver agent, which is the source of behavior and performance
evolution. HICMA was implemented and tested as a Robocode robot. It comprises five
agents: a modeling agent, an estimation agent, a shooting agent, and two novel evolver
agents.
The modeling agent observes the location of the enemy while moving in the arena
and builds a model for its motion path. This is done by a hybrid optimization method of
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Nelder-Mead method.
Combining these two methods gains the global search of CMA-ES along with the speed
of Nelder-Mead.
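The combination can be sketched as follows. This is not the thesis implementation: the CMA-ES stage is stood in for by a simple (1+1) evolution strategy with 1/5th-success-rule step-size adaptation, the local stage is a minimal textbook Nelder-Mead simplex, and the Rastrigin function is a hypothetical stand-in for the modeling error; all names are illustrative.

```python
import random
import math

def rastrigin(x):
    # Multimodal test objective; global minimum 0 at the origin.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def es_explore(f, dim, iters=200, sigma=2.0, seed=1):
    """Global stage: a (1+1)-ES with 1/5th-rule step-size adaptation
    (a lightweight stand-in for CMA-ES)."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = f(x)
    for _ in range(iters):
        y = [xi + sigma * rng.gauss(0, 1) for xi in x]
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.22   # success: widen the search
        else:
            sigma *= 0.95   # failure: contract
    return x

def nelder_mead(f, x0, step=0.5, iters=300):
    """Local stage: a minimal Nelder-Mead simplex polish seeded by the ES."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        refl = [2 * centroid[j] - worst[j] for j in range(n)]
        if f(refl) < f(best):                      # try to expand
            exp = [3 * centroid[j] - 2 * worst[j] for j in range(n)]
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):             # accept reflection
            simplex[-1] = refl
        else:                                      # contract, else shrink
            contr = [0.5 * (centroid[j] + worst[j]) for j in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                simplex = [best] + [
                    [0.5 * (best[j] + p[j]) for j in range(n)] for p in simplex[1:]
                ]
    return min(simplex, key=f)

seed_point = es_explore(rastrigin, dim=2)   # coarse global exploration
best = nelder_mead(rastrigin, seed_point)   # fast local refinement
```

The local stage starts its simplex at the point the global stage found, so the final result is never worse than the exploration result.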
The estimation agent uses the model built by the modeling agent to predict the
future path of the enemy. It finds the soonest time at which HICMA’s bullet can catch
the enemy and estimates the location of the enemy at that time. This estimation is also
made by the hybrid optimization method used by the modeling agent.
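For illustration only, the earliest-intercept search can be sketched with a simple forward scan in time rather than the hybrid optimizer used by the estimation agent. The circular enemy model, the function names, and the time step are hypothetical; the bullet-speed rule (20 − 3 × firepower) is Robocode's standard one.

```python
from math import hypot, cos, sin

BULLET_SPEED = 20 - 3 * 1.0   # Robocode: bullet speed = 20 - 3 * firepower

def intercept_time(enemy_pos, gun_xy=(0.0, 0.0), dt=0.1, horizon=200.0):
    """Scan forward in time for the earliest t at which a bullet fired now
    could reach the enemy's predicted position enemy_pos(t)."""
    t = 0.0
    while t <= horizon:
        ex, ey = enemy_pos(t)
        if BULLET_SPEED * t >= hypot(ex - gun_xy[0], ey - gun_xy[1]):
            return t, (ex, ey)   # time to fire for, and the aim point
        t += dt
    return None                  # no intercept within the horizon

# Hypothetical model: enemy circles the gun at radius 100, 0.02 rad per tick
circle = lambda t: (100 * cos(0.02 * t), 100 * sin(0.02 * t))
t_hit, aim_at = intercept_time(circle)
```

The shooting agent would then convert the returned aim point into a gun rotation angle, as described below.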
The shooting agent acts as the interface between the estimation agent and the gun
of the Robocode robot. It converts the location estimated by the estimation agent into a
rotation angle of the gun.
The two evolvers use GA and the Nelder-Mead method to adapt the parameters of the
modeling agent. The GA evolver builds the objective function of the modeling agent; the
different forms of this objective function reflect different human-like behaviors. The
Nelder-Mead evolver finds the optimum values of some parameters of the hybrid method
in the modeling agent, such as the sample size and the initial guess.
Results show that HICMA behaves in a human-like manner. It evolved behaviors that
map to typical human behaviors such as wisdom and carelessness. Furthermore, the
employed evolution technique was found to autonomously produce the behavior most
suitable for the given situation.
Future Work
The proposed system (HICMA) implemented five agents in a Robocode robot:
shooting, modeling, estimation, GA evolver, and Nelder-Mead evolver. A future work is
to add one or more agents for maneuvering so that HICMA can also dodge opponent’s
bullets and participate in real 1-vs-1 Robocode battles.
Currently, the modeling agent is the only evolvable one. Future work includes adding
evolution functionality to the shooting and estimation agents. In addition, evolving the
evolver agents themselves is expected to enhance the performance of the entire system.
HICMA could gracefully imitate human behaviors. However, statistical modeling
alone seems insufficient for performing well in a real environment such as a Robocode
battle. This suggests integrating statistical models with other techniques, such as ANNs;
this cooperation may greatly enhance the performance of the intelligent agent.
The hybrid optimization method (Nelder-Mead + CMA-ES) is used for modeling
environmental phenomena. It finds a model that relates an output phenomenon to a set of
input features. In this work, the inputs and the output of the model are manually
However, a feature-selection agent can do this task autonomously. It can collect samples
from all available sensors (e.g. temperature, humidity, light … etc.) and relate them to
the available actuators (e.g. motor, gun, heater … etc.). Thus, the modeling agent can
learn how to control every feature. For example, a robot can learn how to conserve its
energy by learning the relation between the battery consumption rate and the slope of the
path.
The results presented in section 4.2 reflect the poor performance of Nelder-Mead
method as an evolver. This calls for adding a new evolver based on CMA-ES or the
hybrid method rather than Nelder-Mead method.
For testing the general principle of the evolver agent, a Nelder-Mead evolver is
used for adapting the parameters of the CMA-ES in the Modeling agent. However, the
CMA-ES is self-adaptive and most of its parameters need no external evolution. For
example, the step-size of its search process is automatically updated at every iteration. A
future work is to confine the evolution to non-self-adaptive parameters such as the sample
size. This suggests adding a flag to every parameter of an agent to indicate whether this
parameter is evolvable or not.
References
[1] “Robocode Home.” [Online]. Available: http://robocode.sourceforge.net/.
[2] M. Minsky, "K-Lines: A Theory of Memory," Cogn. Sci., vol. 4, no. 2, pp. 117–133,
1980.
[3] J. Ortega, N. Shaker, J. Togelius, and G. N. Yannakakis, “Imitating human playing
styles in Super Mario Bros,” Entertain. Comput., vol. 4, no. 2, pp. 93–104, Apr.
2013.
[4] Y. Shichel, E. Ziserman, and M. Sipper, “GP-Robocode : Using Genetic
Programming to Evolve Robocode Players,” Proc. 8TH Eur. Conf. Genet.
Program., pp. 143–154, 2005.
[5] Y. Shichel and M. Sipper, “GP-RARS: evolving controllers for the Robot Auto
Racing Simulator,” Memetic Comput., vol. 3, no. 2, pp. 89–99, May 2011.
[6] A. Agapitos, M. O'Neill, A. Brabazon, and T. Theodoridis, "Learning environment
models in car racing using stateful GA," 2011 IEEE Conf. Comput. Intell. Games,
pp. 219–226, Aug. 2011.
[7] M. Ebner and T. Tiede, “Evolving Driving Controllers using Genetic
Programming,” Comput. Intell. Games, pp. 279–286, 2009.
[8] A. Agapitos, J. Togelius, and S. M. Lucas, “Evolving Controllers for Simulated
Car Racing using Object Oriented Genetic Programming Categories and Subject
Descriptors,” GECCO ’07 Proc. 9th Annu. Conf. Genet. Evol. Comput., vol. 2, pp.
1543–1550, 2007.
[9] B. Chaperot and C. Fyfe, “Motocross and Artificial Neural Networks,” in Game
Design and Technology Workshop, 2008.
[10] R. Hecht-Nielsen, “The Mechanism of Thought,” 2006 IEEE Int. Jt. Conf. Neural
Netw. Proc., no. May, pp. 419–426, 2006.
[11] L. I. Perlovsky, “Toward physics of the mind: Concepts, emotions, consciousness,
and symbols,” Phys. Life Rev., vol. 3, no. 1, pp. 23–55, Mar. 2006.
[12] L. Perlovsky, "Modeling Field Theory of Higher Cognitive Functions," Artif. Cogn.
Syst., pp. 65–106, 2007.
[13] Y. Wang, Y. Wang, S. Patel, and D. Patel, "A Layered Reference Model of the
Brain (LRMB)," IEEE Trans. Syst. Man, Cybern. (Part C), vol. 36, no. 2, 2006.
[14] Y. Wang and V. Chiew, “On the cognitive process of human problem solving,”
Cogn. Syst. Res., vol. 11, no. 1, pp. 81–92, Mar. 2010.
[15] Y. Wang, "A Cognitive Informatics Reference Model of Autonomous Agent
Systems (AAS)," Int'l J. Cogn. Informatics Nat. Intell., vol. 3, pp. 1–16, Mar. 2009.
[16] Y. Wang, "Formal RTPA Models for a Set of Meta-Cognitive Processes of the
Brain," Int'l J. Cogn. Informatics Nat. Intell., vol. 2, Dec. 2008.
[17] T. L. Griffiths, C. Kemp, and J. B. Tenenbaum, “Bayesian models of cognition,”
Cambridge Handb. Comput. Cogn. Model., pp. 1–49, 2008.
[18] M. B. Fayek and O. S. Farag, "HICMA: A Human Imitating Cognitive Modeling
Agent using Statistical Methods and Evolutionary Computation," Comput. Intell.
Human-like Intell. (CIHLI), 2014 IEEE Symp., pp. 4–9, Dec. 2014.
[19] P. S. Holzman, “Personality,” Encyclopedia Britannica. Encyclopædia Britannica,
Inc., 2013.
[20] A. Ortony, “On Making Believable Emotional Agents Believable,” Emotions in
humans and artifacts. pp. 189–212, 2003.
[21] K.-H. Lee, “Evolutionary algorithm for a genetic robot’s personality,” Appl. Soft
Comput., vol. 11, no. 2, pp. 2286–2299, Mar. 2011.
[22] Oxford Dictionary, “definition of ambience in English.” [Online]. Available:
http://www.oxforddictionaries.com/definition/english/ambience.
[23] Merriam-Webster, "Ubiquitous - Merriam-Webster." [Online]. Available:
http://www.merriam-webster.com/dictionary/ubiquitous.
[24] I. Guyon, “An Introduction to Variable and Feature Selection,” J. Mach. Learn.
Res., vol. 3, pp. 1157–1182, 2003.
[25] M. Mitchell, "An introduction to genetic algorithms," Comput. Math. with Appl.,
vol. 32, p. 133, 1996.
[26] H.-G. Beyer, “Evolution strategies,” Scholarpedia, vol. 2, no. 8, p. 1965, 2007.
[27] N. Hansen and A. Auger, "Evolution Strategies and CMA-ES (Covariance Matrix
Adaptation)," in Proceedings of the 2014 Conference Companion on Genetic and
Evolutionary Computation Companion, 2014, pp. 513–534.
[28] N. Hansen and A. Ostermeier, "Completely derandomized self-adaptation in
evolution strategies," Evol. Comput., vol. 9, no. 2, pp. 159–195, 2001.
[29] N. Hansen, “An analysis of mutative sigma-self-adaptation on linear fitness
functions.,” Evol. Comput., vol. 14, pp. 255–275, 2006.
[30] A. Chotard, A. Auger, and N. Hansen, "Cumulative Step-size Adaptation on
Linear Functions: Technical Report," Springer, Jun. 2012.
[31] D. V. Arnold, "Evolution Strategies with Cumulative Step Length Adaptation on
the Noisy Parabolic Ridge," Technical Report CS-2006-02, pp. 1–30, 2006.
[32] N. Hansen, "The CMA Evolution Strategy: A Tutorial," 2011.
[33] Nikolaus Hansen, “CMA-ES Source Code,” 2015. [Online]. Available:
https://www.lri.fr/~hansen/cmaes_inmatlab.html. [Accessed: 30-Jan-2015].
[34] J. A. Nelder and R. Mead, "A simplex method for function minimization,"
Comput. J., vol. 7, no. 4, pp. 308–313, 1965.
[35] J. E. Dennis and D. J. Woods, "Optimization on microcomputers: the Nelder-Mead
simplex algorithm," in ARO Workshop on Microcomputers, 1985.
[36] A.-H. Tan and G.-W. Ng, “A Biologically-Inspired Cognitive Agent Model
Integrating Declarative Knowledge and Reinforcement Learning,” 2010
IEEE/WIC/ACM Int. Conf. Web Intell. Intell. Agent Technol., pp. 248–251, Aug.
2010.
[37] V. Alexiev, “Machine Learning through Evolution : Training Algorithms through
Competition,” Trinity University, 2013.
[38] N. E. Mastorakis, “On the solution of ill-conditioned systems of linear and non-
linear equations via genetic algorithms (GAs) and Nelder-Mead simplex search,”
EC’05 Proc. 6th WSEAS Int. Conf. Evol. Comput., pp. 29–35, Jun. 2005.
[39] M. O. Odetayo, “Optimal population size for genetic algorithms: an
investigation,” in Genetic Algorithms for Control Systems Engineering, IEE
Colloquium on, 1993, pp. 2/1–2/4.
[40] J. T. Alander, “On optimal population size of genetic algorithms,” in CompEuro
’92 . “Computer Systems and Software Engineering”,Proceedings., 1992, pp. 65–
70.
[41] O. Roeva, S. Fidanova, and M. Paprzycki, “Influence of the population size on the
genetic algorithm performance in case of cultivation process modelling,” in
Computer Science and Information Systems (FedCSIS), 2013 Federated
Conference on, 2013, pp. 371–376.
[42] C. R. Reeves, “Using Genetic Algorithms With Small Populations,” in
Proceedings of the Fifth International Conference on Genetic Algorithms, 1993,
pp. 92–99.
[43] N. Maehara and Y. Shimoda, “Application of the genetic algorithm and downhill
simplex methods (Nelder–Mead methods) in the search for the optimum chiller
configuration,” Appl. Therm. Eng., vol. 61, no. 2, pp. 433–442, Nov. 2013.
[44] O. S. Farag, “Human-like behavior in Robocode - HICMA,” 2015. [Online].
Available: https://youtu.be/FNM8z97momM.
[45] Skilgannon, “DrussGT - RoboWiki,” 2013. [Online]. Available:
http://robowiki.net/wiki/DrussGT.
[46] GrubbmGait, “GresSuffurd - RoboWiki,” 2013. [Online]. Available:
http://robowiki.net/wiki/GresSuffurd. [Accessed: 01-Jan-2015].
[47] RoboWiki, “GuessFactor Targeting Tutorial - RoboWiki,” 2010. [Online].
Available: http://robowiki.net/wiki/GuessFactor_Targeting_Tutorial. [Accessed:
01-Jan-2015].
[48] Wompi, “Wallaby - RoboWiki,” 2012. [Online]. Available:
http://robowiki.net/wiki/Wallaby. [Accessed: 01-Jan-2015].
[49] RoboWiki, “Circular Targeting/Walkthrough - RoboWiki,” 2009. [Online].
Available: http://robowiki.net/wiki/Circular_Targeting/Walkthrough. [Accessed:
01-Jan-2015].
[50] R. Steuer, J. Kurths, C. O. Daub, J. Weise, and J. Selbig, “The mutual information:
detecting and evaluating dependencies between variables.,” Bioinformatics, vol.
18 Suppl 2, pp. S231–40, Jan. 2002.
[51] H. C. Peng, F. H. Long, and C. Ding, “Feature selection based on mutual
information: Criteria of max-dependency, max-relevance, and min-redundancy,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, 2005.
[52] G. Herman, B. Zhang, Y. Wang, G. Ye, and F. Chen, “Mutual information-based
method for selecting informative feature sets,” Pattern Recognit., vol. 46, no. 12,
pp. 3315–3327, Dec. 2013.
[53] A. Khan, M. Ishtiaq, and M. A. Jaffar, “A Hybrid Feature Selection Approach by
Combining miD and miQ,” IEEE ICET, 2010.
[54] H. Li, X. Wu, Z. Li, and W. Ding, “Group Feature Selection with Streaming
Features,” 2013 IEEE 13th Int. Conf. Data Min., pp. 1109–1114, Dec. 2013.
[55] K. H. Knuth, “Optimal Data-Based Binning for Histograms.” Departments of
Physics and Informatics, University at Albany, 2013.
[56] L. Birgé and Y. Rozenholc, "How many bins should be put in a regular
histogram," ESAIM Probab. Stat., vol. 10, Jan. 2006.
[57] L. Devroye and G. Lugosi, “Bin width selection in multivariate histograms by the
combinatorial method,” Test, vol. 13, no. 1, pp. 129–145, 2004.
[58] B. Silverman, “Density estimation for statistics and data analysis,” Chapman Hall,
vol. 37, no. 1951, pp. 1–22, 1986.
[59] B. E. Hansen, “Lecture Notes on Nonparametrics,” Univ. Wisconsin, 2009.
[60] Y. Soh, Y. Hae, A. Mehmood, R. H. Ashraf, and I. Kim, “Performance Evaluation
of Various Functions for Kernel Density Estimation,” Open J. Appl. Sci., vol.
2013, no. March, pp. 58–64, 2013.
[61] “Comparison of Kernel Density Estimators,” Thail. Stat., vol. 8, no. July, pp. 167–
181, 2010.
[62] M. Clark, “A comparison of correlation measures,” Center for social research,
2013. [Online]. Available:
http://www3.nd.edu/~mclark19/learn/CorrelationComparison.pdf. [Accessed: 20-
Apr-2015].
Appendix A: Feature Selection
A.1. Mutual Information
Mutual Information (MI) provides a general measure of dependency between
variables [50]. It has been widely used in feature selection [51]–[54]. The mathematical
definition of MI between two discrete random variables X and Y is:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y)\, \log\!\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$$

where:
- p(x) is the marginal probability distribution function of the variable X: the
probability that X takes the value x.
- p(x, y) is the joint probability distribution function of the variables X and Y: the
probability that X and Y take the values x and y, respectively.

The main disadvantage of MI is the relative complexity of estimating p(x) and
p(x, y). These probabilities are estimated by methods such as the histograms and kernels
described in the next subsections.
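For illustration, I(X;Y) can be estimated directly from the empirical frequencies of two already-discrete samples, sidestepping the density-estimation issue. The function name and the toy data below are illustrative only.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) in bits for two discrete samples, using empirical frequencies."""
    n = len(xs)
    px = Counter(xs)           # marginal counts of X
    py = Counter(ys)           # marginal counts of Y
    pxy = Counter(zip(xs, ys)) # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint / (p(x) * p(y)) rewritten with counts to avoid tiny divisions
        mi += p_joint * log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# X fully determines Y here, so I(X;Y) equals the entropy of X (1 bit).
print(mutual_information([0, 0, 1, 1], [5, 5, 7, 7]))  # → 1.0
```

When the two samples are statistically independent, the log ratio is zero for every pair and the estimate drops to 0.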
A.1.1. Histogram Density Estimation
The histogram is the most basic density estimation method. It is a graphical
representation of the distribution of variables (features). Fig. A-1 shows the probability
distribution of a variable (solid line) and a histogram representing that distribution (bars).
The horizontal axis represents the values of the variable and the vertical axis represents
the probability of these values.
A histogram is constructed as follows:
1. Divide the domain of the sampled data into b equal intervals called bins.
2. Select the starting point xo of the histogram (the origin of the histogram).
3. For every sample, add a block of size 1.0 to the bin whose interval contains that
sample.

Fig. A-1: A histogram
A histogram is formally defined as follows:

$$f(x) = \frac{1}{nh} \sum_{i=1}^{b} \sum_{j=1}^{n} I(x \in B_i)\, I(X_j \in B_i) \qquad \text{(A-1)}$$

where:
- b is the number of bins
- n is the number of samples
- B_i is the i-th bin of the histogram
- X_j is the j-th sample of the variable X
- I is the indicator function: $I(x \in A) = 1$ if $x \in A$ and $0$ otherwise
- h is the width of a bin; there are several methods for selecting the optimal bin
size [55]–[57]

The first summation in Eq. (A-1) finds the bin to which x belongs, and the second
summation counts how many samples belong to that bin. The function f(x) thus
estimates the density of observations that are close to x.
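Eq. (A-1) can be sketched directly; the function name and the toy data are illustrative only.

```python
def histogram_density(samples, x, x0, h, b):
    """Histogram density estimate f(x) per Eq. (A-1): locate the bin B_i that
    contains x, count the samples falling in it, and normalise by n*h."""
    n = len(samples)
    i = int((x - x0) // h)            # index of the bin containing x
    if not 0 <= i < b:
        return 0.0                    # x lies outside the histogram's domain
    lo, hi = x0 + i * h, x0 + (i + 1) * h
    count = sum(1 for s in samples if lo <= s < hi)
    return count / (n * h)

data = [0.1, 0.2, 0.25, 0.8, 0.9]
# 2 bins of width 0.5 over [0, 1): 3 samples fall in [0, 0.5), 2 in [0.5, 1)
print(histogram_density(data, 0.3, x0=0.0, h=0.5, b=2))  # → 1.2
```

Note that the two bin heights, 1.2 and 0.8, times the bin width 0.5 sum to 1, so the estimate is a proper density.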
The final histogram appears like that in Fig. A-1: every column represents the
number of samples that fall within its interval. Such a histogram can estimate the
marginal probability distribution p(x) of a single variable x. The same method can be
extended to estimate the joint probability distribution p(x1, x2, …, xd) of d variables; in
this case, the bin width becomes a d-dimensional vector h = (h1, h2, …, hd). A 2D
histogram is shown in Fig. A-2.
Despite their simplicity, histograms have some drawbacks:
1. Sensitivity to the bin size
2. Sensitivity to the selection of the origin (compare Fig. A-2 (a) and (b))
3. Lack of smoothness
More details about histogram density estimation are provided in [58].
A.1.2. Kernel Density Estimation
The idea of Kernel Density Estimation (KDE) is rather similar to that of
histograms. However, instead of stacking rectangular blocks, KDE stacks a different
function called a kernel. A function k(u) can be used as a kernel if it satisfies the
following criteria [59]:
1. It integrates to one: $\int_{-\infty}^{\infty} k(u)\, du = 1$
2. Non-negativity: $k(u) \geq 0$ for all u
3. Symmetry: $k(u) = k(-u)$
Common kernel functions used in density estimation are given in Table A-1. A complete
list of KDEs is given in [59], [60].
Fig. A-3 illustrates the estimation of a probability distribution using the Gaussian kernel
function.
Fig. A-2: A 2D histogram with origin at (a) (-1.5, -1.5) and (b) (-1.625, -1.625)
By Drleft (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL
(http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
Table A-1: Common kernel density functions

Kernel         Equation
Uniform        $k(u) = \frac{1}{2},\ |u| \leq 1$
Gaussian       $k(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2}$
Epanechnikov   $k(u) = \frac{3}{4}(1 - u^2),\ |u| \leq 1$
KDE is used as follows:
1. Choose a kernel function k (Uniform, Gaussian, Epanechnikov, etc.). A
comparison between the different kernel functions is given in [60], [61]; the
Epanechnikov and Gaussian kernels are the most common ones.
2. Estimate the probability distribution by a function f(x) that measures the density
of observations close to x:

$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} k\!\left(\frac{x_i - x}{h}\right) \qquad \text{(A-2)}$$
Fig. A-3: Kernel density estimation. The density distribution (dotted curve) is estimated by the
accumulation (solid curve) of Gaussian function curves (dashed curves)
where:
- n is the number of samples
- h is the bandwidth of the kernel; it controls the degree of smoothness of the
estimate
3. Estimate the bandwidth using Silverman's rule of thumb:

$$h = \hat{\sigma}\, C(K)\, n^{-\frac{1}{5}}$$

where:
- $\hat{\sigma} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ is the sample standard deviation of x; n is the
number of samples, x_i is the i-th sample of x, and $\bar{x}$ is the mean of all
samples of x.
- C(K) is the rule-of-thumb constant given in Table A-2.
Table A-2: Rule-of-thumb constants

Kernel (K)     C(K)
Uniform        1.84
Gaussian       1.06
Epanechnikov   2.34
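The recipe above, a Gaussian kernel plus Silverman's rule of thumb with C(K) = 1.06 from Table A-2, can be sketched as follows; the names and data are illustrative only.

```python
from math import exp, pi, sqrt

def gaussian_kernel(u):
    # Gaussian kernel from Table A-1: k(u) = (1/sqrt(2*pi)) * exp(-u^2/2)
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def silverman_bandwidth(samples, c=1.06):
    # Rule of thumb: h = sigma_hat * C(K) * n^(-1/5), with Gaussian C(K) = 1.06
    n = len(samples)
    mean = sum(samples) / n
    sigma = sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return sigma * c * n ** (-1 / 5)

def kde(samples, x, h=None):
    # Eq. (A-2): f(x) = (1/(n*h)) * sum_i k((x_i - x)/h)
    if h is None:
        h = silverman_bandwidth(samples)
    n = len(samples)
    return sum(gaussian_kernel((s - x) / h) for s in samples) / (n * h)

data = [-1.0, -0.5, 0.0, 0.5, 1.0]
density_at_zero = kde(data, 0.0)
```

Unlike the histogram, the resulting estimate is smooth and does not depend on any choice of origin, while still integrating to one.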
Formula (A-2) estimates the marginal probability distribution function p(x) of a
single variable x. It can be extended to estimate the joint probability distribution function
p(x1, x2, …, xd) of d variables as follows:

f(x) = (1/(n|H|)) Σ_{i=1}^{n} ∏_{j=1}^{d} k((X_{i,j} − x_j)/h_j)    (A-3)

where:
- H = (h1, h2, …, hd) is the bandwidth vector and |H| = h1·h2···hd
- x = (x1, x2, …, xd) is the feature vector and X_{i,j} is the j-th component of
the i-th sample
The bandwidth h_j of a variable x_j is calculated using the rule of thumb:

h_j = σ̂_j C(K, d) n^(−1/(4+d)), where:

- d is the dimension of the feature vector (i.e. the number of variables)
- C(K, d) is the rule-of-thumb constant:

C(K, d) = ( (4 π^(d/2) 2^(d+1) R(K)^d) / (2 C_k²(K) (d + 2)) )^(1/(4+d))

For d = 1 this reduces to the constants C(K) of Table A-2.
R(K) and Ck(K) are called the roughness and the second moment respectively. Their
values are given in Table A-3.
Table A-3: Common kernel constants

Kernel (K)     R(K)        Ck(K)
Uniform        1/2         1/3
Gaussian       1/(2√π)     1
Epanechnikov   3/5         1/5
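As a sanity check on the multivariate rule of thumb, the sketch below (illustrative helper names, Gaussian product kernel assumed, not the thesis code) implements formula (A-3) with per-dimension bandwidths; for d = 1 the constant C(K, d) reproduces the Gaussian value 1.06 of Table A-2:

```python
import numpy as np

R_K = 1 / (2 * np.sqrt(np.pi))   # roughness R(K) of the Gaussian kernel (Table A-3)
CK_K = 1.0                       # second moment Ck(K) of the Gaussian kernel

def rot_constant(d):
    """Rule-of-thumb constant C(K, d) for the Gaussian product kernel."""
    num = 4 * np.pi ** (d / 2) * 2 ** (d + 1) * R_K ** d
    den = 2 * CK_K ** 2 * (d + 2)
    return (num / den) ** (1 / (4 + d))

def kde_product(samples, x):
    """Joint density estimate at a point x per formula (A-3); samples is (n, d)."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    h = samples.std(axis=0, ddof=1) * rot_constant(d) * n ** (-1 / (4 + d))
    u = (samples - np.asarray(x, dtype=float)) / h      # scaled per-dimension distances
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)      # kernel applied per dimension
    return np.prod(k, axis=1).sum() / (n * np.prod(h))  # 1/(n|H|) * sum of products
```

rot_constant(1) evaluates to about 1.059, matching the 1.06 of Table A-2; for the Gaussian kernel the whole expression collapses to the familiar (4/(d+2))^(1/(d+4)).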
A 2-D KDE is shown in Fig. A-4.
KDE has the following advantages over histograms [50]:
1. Better mean square error
2. Insensitive to the choice of the origin
3. The ability to specify more sophisticated window shapes than the rectangular
window
However, the main disadvantage of KDE is its computational complexity: it requires a
large number of calculations and is consequently unsuitable for real-time applications
such as robotics and computer games.
Fig. A-4: 2D kernel density estimate (a) individual kernels and (b) the final KDE
By Drleft (talk) 00:04, 16 September 2010 (UTC) (Own work) [GFDL
(http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 4.0-3.0-2.5-2.0-1.0
(http://creativecommons.org/licenses/by-sa/4.0-3.0-2.5-2.0-1.0)], via Wikimedia Commons
A.2. Correlation
Correlation represents how closely two variables co-vary. It ranges between −1
and +1, where +1 means perfect positive correlation, −1 means perfect negative
correlation, and 0 means no correlation. The main advantage of correlation is that it is
much less complex than mutual information, because it does not involve the time-
consuming process of density estimation. Two types of correlation are experimented
with in this work: the Pearson Correlation Coefficient and Distance Correlation. They are
overviewed in the next two subsections.
A.2.1. Pearson Correlation Coefficient (PCC)
The Pearson Correlation Coefficient (PCC) is the most common correlation
measure. However, it is limited to linear relationships between variables (features).
PCC is defined as the covariance between two variables divided by the product
of their standard deviations:

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ( √(Σ_{i=1}^{n} (x_i − x̄)²) √(Σ_{i=1}^{n} (y_i − ȳ)²) )    (A-4)
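Formula (A-4) is cheap to compute directly, which is the point of using correlation instead of mutual information. A minimal Python sketch (illustrative, not the thesis code):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient per formula (A-4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0])
r_lin = pearson(x, 2 * x + 1)   # perfectly linear relation -> +1.0
r_neg = pearson(x, -x)          # perfectly linear, decreasing -> -1.0
r_sq = pearson(x, x ** 2)       # monotone but nonlinear -> high, yet below 1
```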
Fig. A-5 shows different relations between two variables. The graphs in the first
row illustrate different strengths of correlation between the two variables: the more
linear the relation, the stronger the correlation. The second row illustrates perfect
correlations. Notice that the correlation is undefined in the middle graph because one of
the variables has zero variance (i.e. it is constant). In the third row, the two variables
seem related, yet the correlations are zero because there are no linear relationships
between them. This reflects the fact that PCC captures only linear relationships.
Fig. A-5: Pearson Correlations of different relationships between two variables
By DenisBoigelot, original uploader was Imagecreator (Own work, original uploader was
Imagecreator) [CC0], via Wikimedia Commons
A.2.2. Distance Correlation
Distance Correlation is a relatively new measure of statistical dependence between
two variables. In contrast to Pearson Correlation, it gives a zero correlation if and only if
the two variables are statistically independent. It also takes into account the non-linear
relationships between variables, as shown in Fig. A-6. However, experiments in [62]
show that mutual information gives a stronger dependence measure than distance
correlation.
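A compact way to see the difference is to compute the measure on a nonlinear pattern. The sketch below implements the sample distance correlation of Székely et al. (helper name illustrative, 1-D case only, not the thesis code): on a symmetric parabola, where Pearson correlation is essentially zero, distance correlation is clearly positive.

```python
import numpy as np

def dist_corr(x, y):
    """Sample distance correlation between two 1-D variables (a sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])    # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                 # squared distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

x = np.linspace(-1.0, 1.0, 201)
r_linear = dist_corr(x, 2 * x)             # perfect linear dependence -> 1.0
r_parab = dist_corr(x, x ** 2)             # nonlinear dependence -> clearly nonzero
```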
The author also compared Mutual Information with Distance Correlation using
similar non-linear patterns. The results are shown in Fig. A-7.
Fig. A-6: Distance Correlation of linear and non-linear relationships
By Naught101 [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Fig. A-7: Mutual Information vs Distance Correlation as dependence measures
The author suggests that: “Where distance correlation might be better at detecting
the presence of (possibly weak) dependencies, the MIC is more geared toward the
assessment of strength and detecting patterns that we would pick up via visual
inspection”.
Engineer's Name: Osama Salah Eldin Farag
Date of Birth: 11/2/1987
Nationality: Egyptian
Registration Date: 1/10/2010
Award Date: / /2015
Degree: Master of Science
Department: Computer Engineering
Supervisors: Prof. Dr. Magda Bahaa Eldin Fayek
Examiners: Prof. Dr. Samia Abdulrazik Mashaly, Prof. Dr. Mohamed Moustafa Saleh, Prof. Dr. Magda Bahaa Eldin Fayek
Thesis Title: Enhancing Intelligent Agents by Improving Human Behavior Imitation Using Statistical Modeling Techniques
Keywords: intelligent agent; cognitive agent; human behavior modeling; evolutionary computation; machine learning
Thesis Summary: This research presents a new method for modeling human behavior that does not rely on neural theories of the mind. The method combines statistical modeling techniques with the "Society of Mind" theory to build a system that imitates human behavior. The cognitive intelligent agent presented in this research can adjust its behavior automatically according to the situation it faces.
Thesis Abstract

Human intelligence is the greatest source of inspiration for artificial intelligence:
building systems that behave like humans is an ambition of all artificial intelligence
researchers. To this end, researchers follow one of two directions: either studying the
neuro-biological theories of the mind, or finding a way to obtain human-like artificial
intelligence by methods that do not mirror the neural theories of the mind.

This research follows the second approach, using statistical methods to model
human behavior. It presents an intelligent agent called HICMA that combines a number
of non-biological techniques to build and refine models of the environment surrounding
the agent. By varying the parameters of these models, behavior similar to human
behavior can be obtained. The agent presented here, HICMA, is a group, or society, of
several interacting agents. It builds on the "Society of Mind" theory and improves it by
introducing a new type of agent: the evolver agent. The evolver agent is a special kind
of agent whose role is to modify or tune other agents according to the problem the
intelligent system faces.

The simple method this system follows to model human behavior allows the evolver
agent to give the system a personality, or a human-like behavior, suited to the problem
it faces.
Enhancing Intelligent Agents by Improving Human Behavior Imitation Using Statistical
Modeling Techniques

By
Osama Salah Eldin Farag

A Thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment
of the requirements for the degree of Master of Science in Computer Engineering

Approved by the Examining Committee:
Prof. Dr. Magda Bahaa Eldin Fayek, Main Advisor
Prof. Dr. Samia Abdulrazik Mashaly
- Department of Computers and Systems, Electronics Research Institute
Prof. Dr. Mohamed Moustafa Saleh
- Department of Operations Research & Decision Support, Faculty of Computers and
Information, Cairo University

Faculty of Engineering, Cairo University
Giza, Arab Republic of Egypt
2015
Enhancing Intelligent Agents by Improving Human Behavior Imitation Using Statistical
Modeling Techniques

By
Osama Salah Eldin Farag

A Thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment
of the requirements for the degree of Master of Science in Computer Engineering

Under the Supervision of
Prof. Dr. Magda Bahaa Eldin Fayek
Faculty of Engineering, Cairo University

Faculty of Engineering, Cairo University
Giza, Arab Republic of Egypt
2015
Enhancing Intelligent Agents by Improving Human Behavior Imitation Using Statistical
Modeling Techniques

By
Osama Salah Eldin Farag

A Thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment
of the requirements for the degree of Master of Science in Computer Engineering

Faculty of Engineering, Cairo University
Giza, Arab Republic of Egypt
2015