
ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN

BEHAVIOR IMITATION USING STATISTICAL MODELING

TECHNIQUES

By

Osama Salah Eldin Farag

A Thesis submitted to the

Faculty of Engineering at Cairo University in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

In

Computer Engineering

FACULTY OF ENGINEERING, CAIRO UNIVERSITY

GIZA, EGYPT

2015


ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN

BEHAVIOR IMITATION USING STATISTICAL MODELING

TECHNIQUES

By

Osama Salah Eldin Farag

A Thesis submitted to the

Faculty of Engineering at Cairo University in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

In

Computer Engineering

Under the Supervision of

Prof. Dr. Magda Bahaa Eldin Fayek

Professor of

Computer Engineering

Faculty of Engineering, Cairo University

FACULTY OF ENGINEERING, CAIRO UNIVERSITY

GIZA, EGYPT

2015


ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN

BEHAVIOR IMITATION USING STATISTICAL MODELING

TECHNIQUES

By

Osama Salah Eldin Farag

A Thesis submitted to the

Faculty of Engineering at Cairo University in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

In

Computer Engineering

Approved by the

Examining Committee

Prof. Dr. Magda Bahaa Eldin Fayek, Thesis Main Advisor

Prof. Dr. Samia Abdulrazik Mashaly, Department of Computers and Systems, Electronics Research Institute

Prof. Dr. Mohamed Moustafa Saleh, Department of Operations Research & Decision Support, Faculty of Computers and Information, Cairo University

FACULTY OF ENGINEERING, CAIRO UNIVERSITY

GIZA, EGYPT

2015


Engineer’s Name: Osama Salah Eldin Farag

Date of Birth: 11 / 2 / 1987

Nationality: Egyptian

E-mail: [email protected]

Phone: +20/ 100 75 34 156

Address: Egypt – Zagazig City

Registration Date: 1 / 10 / 2010

Awarding Date: / / 2015

Degree: Master of Science

Department: Computer Engineering

Supervisors:

Prof. Magda B. Fayek

Examiners:

Prof. Samia Abdulrazik Mashaly

Prof. Mohamed Moustafa Saleh

Prof. Magda B. Fayek

Title of Thesis:

Enhancing intelligent agents by improving human behavior imitation using

statistical modeling techniques

Keywords:

Intelligent agent; Cognitive agent; Human imitation; Evolutionary computation;

Machine Learning

Summary:

This thesis introduces a novel non-neurological method for modeling human behaviors. It integrates statistical modeling techniques with “the society of mind” theory to build a system that imitates human behaviors. The introduced Human Imitating Cognitive Modeling Agent (HICMA) can autonomously change its behavior according to the situation it encounters.


Acknowledgements

“All the praises and thanks be to Allah, Who has guided us to this, and never could we

have found guidance, were it not that Allah had guided us”

Immeasurable appreciation and deepest gratitude for their help and support are extended to the following persons who, in one way or another, contributed to making this work possible.

Sincere gratitude goes to Prof. Magda B. Fayek for her support, valuable advice, guidance, insightful comments, suggestions, and patience, all of which greatly benefited me in completing this work. I heartily appreciate her effort to impart her experience and knowledge to this work.

I would also like to acknowledge with much appreciation all participants in the Robocode experiments. Many thanks to Ali El-Seddeek and his fellow students of the Computer Engineering Department at the Faculty of Engineering, Cairo University; to Ahmed Reda and his students at the Faculty of Engineering, Zagazig University; and to my friends and coworkers who kindly participated in these experiments.

Deep thanks to Mahmoud Ali and Mohammed Hamdy for helping to gather material and information that supported this work.

Finally, I warmly thank my family, who have kept me motivated to move forward. My deepest appreciation goes to all those who helped me complete this work.


Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Nomenclature
Abstract

Chapter 1: Introduction
    1.1 Problem Statement
    1.2 Literature Review
    1.3 Previous Work
    1.4 Contributions of this Work
    1.5 Applications
        1.5.1 Brain Model Functions
        1.5.2 Artificial Personality
        1.5.3 Ambient Intelligence and Internet of Things
        1.5.4 Ubiquitous Computing and Ubiquitous Robotics
    1.6 Techniques
        1.6.1 Feature Selection
        1.6.2 Modeling
    1.7 Organization of the Thesis

Chapter 2: Background
    2.1 Introduction
    2.2 The Society of Mind
        2.2.1 Introduction
        2.2.2 Example – Building a Tower of Blocks
    2.3 Evolutionary Computation (EC)
    2.4 Evolutionary Algorithms (EA)
    2.5 Genetic Algorithms (GA)
    2.6 Optimization Problem
        2.6.1 Introduction
        2.6.2 Mathematical Definition
    2.7 Evolution Strategies
        2.7.1 Basic Evolution Strategies
        2.7.2 Step-size Adaptation Evolution Strategy (σSA-ES)
        2.7.3 Cumulative Step-Size Adaptation (CSA)
        2.7.4 Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
    2.8 Nelder-Mead Method
        2.8.1 Introduction
        2.8.2 What is a Simplex
        2.8.3 Operation
        2.8.4 Nelder-Mead Algorithm
    2.9 Robocode Game
        2.9.1 Robot Anatomy
        2.9.2 Robot Code
        2.9.3 Scoring

Chapter 3: Human Imitating Cognitive Modeling Agent (HICMA)
    3.1 Introduction
    3.2 The Structure of HICMA
        3.2.1 Modeling Agent
        3.2.2 Estimation Agent
        3.2.3 Shooting Agent
        3.2.4 Evolver Agents
    3.3 The Operation of HICMA

Chapter 4: Experiments and Results
    4.1 Human-Similarity Experiments
        4.1.1 Human Behavior Imitation
        4.1.2 Human Performance Similarity
    4.2 Modeling-Agent Evolution

Chapter 5: Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work

References

Appendix A: Feature Selection
    A.1 Mutual Information
        A.1.1 Histogram Density Estimation
        A.1.2 Kernel Density Estimation
    A.2 Correlation
        A.2.1 Pearson Correlation Coefficient (PCC)
        A.2.2 Distance Correlation


List of Tables

Table 2-1: A simple CMA-ES code
Table 2-2: An example of a “fitness” function
Table 2-3: An example of a “sortPop” function
Table 2-4: An example of a “recomb” function
Table 2-5: Simplexes in different dimensions
Table 2-6: Nelder-Mead algorithm
Table 2-7: Iteration count for different initial guesses of the Nelder-Mead algorithm
Table 3-1: The parameters of the modeling agent
Table 3-2: The function lexicon
Table 4-1: Robocode simulation parameters
Table 4-2: Human behavior interpretation
Table 4-3: Description of human behaviors modeled by mathematical functions
Table 4-4: The initial state of HICMA
Table 4-5: Human players data
Table 4-6: The parameters of the Nelder-Mead evolver used in the experiment
Table 4-7: Bad parameter values of the Nelder-Mead evolver (used for verification)
Table A-1: Common kernel density functions
Table A-2: Rule-of-thumb constants
Table A-3: Common kernel constants


List of Figures

Fig. 1-1: A simple IoT system
Fig. 1-2: A model of battery decay rate
Fig. 1-3: Mathematical modeling techniques utilized or experimented with in this work
Fig. 1-4: Modeling-related techniques
Fig. 2-1: AI disciplines map
Fig. 2-2: A builder agent with its sub-agents
Fig. 2-3: Sub-agents of the add agent
Fig. 2-4: A general scheme of evolutionary algorithms
Fig. 2-5: A 2D fitness landscape
Fig. 2-6: Basic GA flowchart
Fig. 2-7: An objective function
Fig. 2-8: Local and global minima
Fig. 2-9: An example application of mathematical optimization
Fig. 2-10: Minimization example in a 2D search space
Fig. 2-11: Basic steps of evolution strategies
Fig. 2-12: Visualization of the search process of a (1/1,100)-ES
Fig. 2-13: One-sigma ellipse of the bivariate normal distribution N(0,I) [µ=0, σ=I]
Fig. 2-14: Two random probability distributions with (a) σ = 1.0 and (b) σ = 3.0. The circles are the one-sigma ellipses
Fig. 2-15: A 2D normal distribution: (a) 2D vector of points and (b) two 1D histograms
Fig. 2-16: The principle of Cumulative Step-size Adaptation (CSA)
Fig. 2-17: A population of (a) Step-size Adaptation and (b) Covariance Matrix Adaptation
Fig. 2-18: A 2D normal distribution N(0,C) [µ=0, σ=C]
Fig. 2-19: Optimization of a 2D problem using CMA-ES
Fig. 2-20: Operations of the Nelder-Mead algorithm
Fig. 2-21: Twelve iterations of a practical run of the Nelder-Mead algorithm
Fig. 2-22: Nelder-Mead algorithm flowchart
Fig. 2-23: A Robocode robot anatomy
Fig. 3-1: The structure of HICMA’s Robocode agent
Fig. 3-2: The block diagram of HICMA’s modeling agent
Fig. 3-3: The hybrid optimization method
Fig. 3-4: The block diagram of HICMA’s modeling agent
Fig. 3-5: The block diagram of HICMA’s shooting agent
Fig. 3-6: The operation of the Nelder-Mead evolver agent
Fig. 3-7: A chromosome example
Fig. 3-8: The flowchart of the mutation process
Fig. 3-9: Operation phases flowchart of HICMA
Fig. 3-10: Solving the estimating problem
Fig. 3-11: The initialization phase of HICMA
Fig. 3-12: The operation phase of HICMA
Fig. 3-13: The evolution phase of HICMA
Fig. 3-14: The evolution sequence of parameters
Fig. 3-15: The optimization operation of the modeling agent
Fig. 3-16: The operation of the estimation agent
Fig. 4-1: Experiment map with HICMA’s agents
Fig. 4-2: The behavior of every function pair against (a) Shadow and (b) Walls
Fig. 4-3: Performance similarity with human players
Fig. 4-4: Performance difference with human players
Fig. 4-5: The evolution of the parameters of the modeling agent
Fig. A-1: A histogram
Fig. A-2: A 2D histogram with origin at (a) (-1.5, -1.5) and (b) (-1.625, -1.625)
Fig. A-3: Kernel density estimation. The density distribution (dotted curve) is estimated by the accumulation (solid curve) of Gaussian function curves (dashed curves)
Fig. A-4: 2D kernel density estimate: (a) individual kernels and (b) the final KDE
Fig. A-5: Pearson correlations of different relationships between two variables
Fig. A-6: Distance correlation of linear and non-linear relationships
Fig. A-7: Mutual Information vs. Distance Correlation as dependence measures


List of Abbreviations

AI Artificial Intelligence

AmI Ambient Intelligence

ANN Artificial Neural Network

CMA-ES Covariance Matrix Adaptation Evolution Strategy

CSA-ES Cumulative Step-size Adaptation Evolution Strategy

EA Evolutionary Algorithm

EC Evolutionary Computation

ES Evolution Strategy

GA Genetic Algorithm

HICMA Human Imitating Cognitive Modeling Agent

IA Intelligent Agent

IoT Internet of Things

LAD Least Absolute Deviations

LRMB Layered Reference Model of the Brain

MFT Modeling Field Theory

PCC Pearson Correlation Coefficient

PIR Passive Infrared

RL Reinforcement Learning

RSS Residual Sum of Squares

SA-ES Step-size Adaptation Evolution Strategy

Ubibot Ubiquitous Robot

Ubicomp Ubiquitous Computing


Nomenclature

A: agency, Ambience, Ambient intelligence, Artificial Neural Network
C: comma-selection, cost function, covariance
D: direct behavior imitation
E: elite, elitist selection, Euclidean norm, evaluation function, evolver agent
F: feature selection, fitness function, fitness landscape, functional form
G: Genetic Algorithms, global optimum
H: heuristic search, histogram
I: indicator function, indirect behavior imitation, Intelligent Agent, Internet of Things (IoT)
J: joint probability distribution
K: kernel, Kernel Density Estimation, Koza-style GP
L: local minimum, local optimum, loss function
M: marginal probability distribution, mutation strength
N: Nelder-Mead, neurons
O: object parameter, objective function, one-σ ellipse, one-σ line
P: plus-selection
R: Reinforcement Learning, reinforcement signal, Robocode
S: sample standard deviation, search costs, search space, Silverman’s rule of thumb, society of mind, standard deviation, statistical model, stochastic optimization problem, stochastic search
T: training dataset
U: ubibots, Ubicomp, Ubiquitous Computing


Abstract

Human intelligence is the greatest source of inspiration for artificial intelligence (AI). The ambition of all AI research is to build systems that behave in a human-like manner. Toward this goal, researchers follow one of two directions: either studying the neurological theories of the human brain, or finding how human-like intelligence can stem from non-biological methods. This research follows the second school. It employs statistical methods for imitating human behaviors. A Human-Imitating Cognitive Modeling Agent (HICMA) is introduced. It combines different non-neurological techniques for building and tuning models of the environment. Every combination of the parameters of these models represents a typical human behavior. HICMA is a society of intelligent agents that interact together. It adopts the society of mind theory and extends it by introducing a new type of agent: the evolver agent. An evolver is a special type of agent whose function is to adapt other agents according to the encountered situations. HICMA’s simple representation of human behaviors allows an evolver agent to dress the entire system in a suitable human-like behavior (or personality) according to the given situation.

HICMA is tested on Robocode [1], a game where autonomous tanks battle in an arena. Every tank is provided with a gun and a radar. The proposed system consists of a society of five agents (including two evolver agents) that cooperate to control a Robocode tank with human-like behavior. The individuals of the society are based on statistical and evolutionary methods: CMA-ES, the Nelder-Mead algorithm, and Genetic Algorithms (GA).

Results show that HICMA can develop human-like behaviors in Robocode battles. Furthermore, it can select the behavior suited to each battle.


Chapter 1: Introduction

1.1 Problem Statement

An Intelligent Agent (IA) is an entity that interacts with the environment by observing it via sensors and acting on it via actuators. This interaction aims at achieving some goals. An IA usually builds and keeps models of the environment and the interesting objects in it. Such models represent how an IA sees the environment and, consequently, how it behaves in it. These models are continuously adapted according to environmental changes, and these adaptations are reflected in the behavior of the IA. The ability of an IA to conform to environmental changes depends mainly on the mechanisms it uses for building and adapting its internal model(s) of the environment. These internal models have become the focus of extensive research in the field of artificial intelligence (AI). As human intelligence is the greatest source of inspiration for AI, much research has focused on imitating human intelligence by envisioning how the human brain works. However, the operation of the mind is very complicated, so it may be better to imitate human behavior without emulating the operation of the mind. This thesis tackles imitating human behavior with simple mathematical models.

This work integrates the society of mind theory [2] with statistical approaches to achieve human-like intelligence. An enhanced adaptable model of the society of mind theory is proposed, where statistical approaches are used to autonomously build and optimize models of the environment and the interesting objects in it.

1.2 Literature Review

The problem of providing intelligent agents with human-like behavior has been tackled in much research using different AI techniques such as Reinforcement Learning (RL), Genetic Algorithms (GA), and Artificial Neural Networks (ANN).

Reinforcement Learning (RL) targets the problem of which behavior an agent should adopt to maximize a reward called the reinforcement signal. The agent learns the proper behavior through trial-and-error interactions with the dynamic environment. It observes the state of the environment through its sensors and selects an action to apply to the environment with its actuators. Each of the available actions has a different effect on the environment and earns a different reinforcement signal. The agent should choose the actions that maximize the cumulative reinforcement signal. Learning the proper behavior is a systematic trial-and-error process guided by any of a wide variety of RL algorithms.
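
To make the idea concrete, the following is a minimal sketch of tabular Q-learning, one common RL algorithm; the thesis does not prescribe a particular algorithm, and the state/action encoding and learning parameters here are illustrative assumptions.

import java.util.Random;

/** A minimal sketch of tabular Q-learning (illustrative only). The table
 *  q[s][a] estimates the cumulative reinforcement signal of taking action a
 *  in state s. */
public class QLearningSketch {
    final double[][] q;          // Q(s, a) value table
    final double alpha = 0.1;    // learning rate
    final double gamma = 0.9;    // discount factor for future rewards
    final double epsilon = 0.1;  // exploration probability
    final Random rng = new Random();

    QLearningSketch(int numStates, int numActions) {
        q = new double[numStates][numActions];
    }

    /** Epsilon-greedy selection: mostly exploit the best-known action,
     *  sometimes explore a random one (the trial-and-error part). */
    int selectAction(int state) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(q[state].length);
        int best = 0;
        for (int a = 1; a < q[state].length; a++)
            if (q[state][a] > q[state][best]) best = a;
        return best;
    }

    /** Move Q(s, a) toward the observed reward plus the discounted value of
     *  the best action available from the next state. */
    void update(int state, int action, double reward, int nextState) {
        double bestNext = Double.NEGATIVE_INFINITY;
        for (double v : q[nextState]) bestNext = Math.max(bestNext, v);
        q[state][action] += alpha * (reward + gamma * bestNext - q[state][action]);
    }
}

The agent would call selectAction, apply the chosen action via its actuators, observe the reward and the new state via its sensors, and call update, repeating until the Q-values (and hence the behavior) converge.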

Genetic Algorithms (GA) are a type of evolutionary algorithm that imitates the natural evolution of generations. A GA encodes the solution of a problem in the form of chromosomes and generates a population of candidate-solution individuals, each represented by one or more chromosomes. The individuals (candidate solutions) are then adapted iteratively, in the hope of eventually finding a good one. An overview of GA is given in section 2.5.
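
As a minimal illustration of this cycle (evaluation, selection, crossover, mutation), the following sketch evolves a population of bit-string chromosomes against the toy “one-max” fitness function; the encoding, operators, and parameter values are illustrative assumptions, not the implementation used in this thesis.

import java.util.Arrays;
import java.util.Random;

/** A minimal GA sketch: evaluate, select, recombine, mutate, repeat. */
public class GASketch {
    static final Random RNG = new Random();

    /** Illustrative fitness: the number of 1-bits (the "one-max" toy problem). */
    static int fitness(boolean[] chromosome) {
        int f = 0;
        for (boolean gene : chromosome) if (gene) f++;
        return f;
    }

    /** Binary tournament selection: pick two at random, keep the fitter. */
    static boolean[] tournament(boolean[][] pop) {
        boolean[] a = pop[RNG.nextInt(pop.length)];
        boolean[] b = pop[RNG.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    public static void main(String[] args) {
        int popSize = 20, genes = 32, generations = 100;
        boolean[][] pop = new boolean[popSize][genes];
        for (boolean[] ind : pop)                       // random initial population
            for (int g = 0; g < genes; g++) ind[g] = RNG.nextBoolean();

        for (int gen = 0; gen < generations; gen++) {
            boolean[][] next = new boolean[popSize][];
            for (int i = 0; i < popSize; i++) {
                boolean[] p1 = tournament(pop), p2 = tournament(pop);
                boolean[] child = new boolean[genes];
                int cut = RNG.nextInt(genes);           // one-point crossover
                for (int g = 0; g < genes; g++)
                    child[g] = (g < cut) ? p1[g] : p2[g];
                if (RNG.nextDouble() < 0.05)            // occasional mutation
                    child[RNG.nextInt(genes)] ^= true;
                next[i] = child;
            }
            pop = next;                                 // next generation
        }
        int best = Arrays.stream(pop).mapToInt(GASketch::fitness).max().getAsInt();
        System.out.println("best fitness: " + best);
    }
}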


An Artificial Neural Network (ANN) is an imitation of the biological neural networks found in the brains of living organisms. Both consist of neurons organized and connected in a specific way. The neurons are grouped into three kinds of layers: an input layer, an output layer, and one or more intermediate hidden layers. Some or all neurons of every layer are connected to the neurons of the next layer via weighted directed connections. The inputs of the network are received via the input layer and passed through the hidden layers to the output layer over the weighted connections. The network is trained on a training dataset that consists of observations of inputs along with their corresponding outputs. The goal of this learning stage is to adapt the weights of the interconnections so that the output of the network for a given input is as close as possible to the output in the training dataset for the same input. ANNs have been utilized in a wide variety of fields including, but not limited to, computer vision, speech recognition, robotics, control systems, game playing, and decision making.
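
For concreteness, one forward pass through such a network can be sketched as follows; the layer sizes, weights, and sigmoid activation are illustrative assumptions, and the training step (e.g. back-propagation) that would adjust the weights is omitted.

/** A minimal sketch of an ANN forward pass: each layer multiplies its input
 *  by a weight matrix, adds a bias, and applies a sigmoid activation. */
public class ForwardPassSketch {
    static double[] layer(double[] in, double[][] w, double[] bias) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) sum += w[j][i] * in[i];
            out[j] = 1.0 / (1.0 + Math.exp(-sum));   // sigmoid activation
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {0.5, -1.2};                   // input layer (2 neurons)
        double[][] wHidden = {{0.3, -0.8}, {1.1, 0.2}}; // hidden weights (2 -> 2)
        double[][] wOut = {{0.7, -0.4}};                // output weights (2 -> 1)
        double[] hidden = layer(input, wHidden, new double[]{0.1, -0.1});
        double[] output = layer(hidden, wOut, new double[]{0.0});
        // Training would adapt wHidden and wOut so this output approaches
        // the desired output in the training dataset.
        System.out.println("network output: " + output[0]);
    }
}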

This section examines some previous works based on RL, GA, and ANN that made novel contributions to AI and can inspire human-imitation techniques. The review of each work focuses on how it exhibits human behavior imitation, regardless of how efficient (with respect to scoring) it is in comparison with similar works. The novel contributions of these works are extracted to support the human imitation of this thesis.

1.3 Previous Work

An extensive comparison between different human behavior-imitation techniques is introduced in [3]. They are divided into direct and indirect behavior imitation.

In direct behavior imitation, a controller is trained to output the same actions a human took when faced with the same situation. This means that the performance of the controller depends on the performance of the human exemplar, which raises a dilemma: should the human exemplar be skillful or an amateur? A skillful exemplar is clever enough to avoid falling into hard situations, so the IA may encounter hard situations that the exemplar never faced. On the other hand, if the human exemplar is an amateur, imitating his behavior will not endow the IA with good performance.

Indirect behavior imitation, in contrast, uses an optimization algorithm to optimize a fitness function that measures the human similarity of an IA. There is no human exemplar, so indirect techniques can generalize better than direct ones [3]. It was found that controllers trained using GA (indirect) performed more similarly to human players than those trained using back-propagation (direct). All previous works presented in this section employ indirect techniques.

Genetic Programming (GP) is used in [4] to build a robot for the Robocode game. In Robocode, programmed autonomous robots fight against each other; this involves shooting at the enemy robots and dodging their bullets. The authors used Koza-style GP, where every individual is a program composed of functions and terminals. The functions they used are arithmetic and logical ones (e.g. addition, subtraction, OR, etc.). Their evolved robot took third place in a competition against 26 manually programmed robots.

The authors of [4] used the same technique, GP, to evolve car-driving controllers for the Robot Auto Racing Simulator (RARS) [5]. The top evolved controllers took second and third place against 14 hand-coded RARS controllers.

In [6], GP is also used for evolving car-racing controllers. GP builds a model of the track and a model of the driving controller. The model of the track is built and stored in memory; the driving controller then uses this model during the race to output the driving commands. The driving controller is a two-branch tree of functions and terminals. The output of the first sub-tree is interpreted as a driving command (gas/brake) and the output of the second one as the steering command (left/right). The functions of a tree are mathematical ones, in addition to memory functions for reading the model stored in memory. The terminals of a tree are the readings of some sensors provided by the simulation environment (e.g. distances to track borders, distance traveled, and speed).

A work similar to [6] is introduced in [7], where virtual car-driving controllers for RARS are evolved using GP. An individual consists of two trees; one controls the steering angle and the other triggers the gas and brake pedals. The controller of the steering angle is a simple proportional controller that tries to keep the car as close as possible to the middle of the road: it merely gives a steering angle proportional to the current deviation from the middle of the road, without considering previous deviations or the expected future trend of deviation. The best-evolved controller performed well, but not well enough to compete with other elaborate, manually constructed controllers.

An interesting comparison between GP and artificial neural networks (ANN) in evolving controllers is made in [8]. Car controllers similar to those of [6] and [7] were built using GP and ANN, and their performances were compared. It was found that the GP controllers evolve much faster than the ANN ones. However, the ANN controllers ultimately reach higher fitness. In addition, the ANN controllers outperform the GP controllers in generality: ANN controllers could perform significantly better on tracks for which they had not been evolved. Finally, both GP and ANN could use the controllers trained for one track as seeds for evolving controllers trained for all tracks. Both could generalize these controllers proficiently on the majority of eight different tracks, but the ANN controllers generalized better.

In [9], an ANN is trained using GA to ride simulated motorbikes in a computer game. It was found that GA could create human-like performance and could even find solutions that no human had previously found. GA is then compared with the back-propagation learning algorithm. As GA requires no training data, it could adapt to any new track. However, its solutions are not optimal: not as good as the solutions of a good human player or those of the back-propagation algorithm. On the other hand, back-propagation requires training data recorded from a game played by a good human player, and cannot be trained to deal with unusual situations.


1.4 Contributions of this Work

This work introduces a flexible and expandable model of IAs. The evolved IA can autonomously adapt itself to the encountered situation to attain higher fitness. Its fitness function behaves as if it implicitly involved an unlimited number of fitness functions from which it selects the most appropriate one. For example, in the Robocode game, our IA is required to get the highest possible score. It autonomously finds that saving its power implicitly increases its score; consequently, it evolves a behavior that targets decreasing power consumption.

Furthermore, the proposed IA not only evolves human-like behaviors but also allows for manual selection among these evolved behaviors. This can be used in computer games to generate intelligent opponents with various human-like behaviors, making the games more exciting and challenging. In addition, the evolution process requires no supervision; it is a type of indirect human imitation.

This thesis introduces the following to the field of AI:

- An evolvable model of the society of mind theory, introducing a new type of agent (the evolver agent) that facilitates autonomous behavior adaptation
- Automatic unsupervised behavior selection according to the encountered situation
- Simple mathematical representations of different personalities of IAs without emulating the brain

In addition to automatic behavior selection, the proposed agent can be influenced towards a certain behavior. This influence is easily achieved by changing the fitness function: a suitable fitness function is selected, and the agent automatically changes its behavior to satisfy it. This is similar to a child who acquires the manners and behaviors of his role models (e.g. his parents). Also, as the fitness of the agent can be fed back directly from a human exemplar (e.g. the user), the agent can autonomously learn a behavior that satisfies that person. Experiments show that a wise selection of the fitness function steers the agent toward the required behavior.

The simple representation of behaviors enables the agent to survive in different environments. It can learn different behaviors and autonomously select the behavior suitable for an environment. For example, a robot can live with different persons and select the behavior that satisfies each person.

1.5 Applications

Intelligent agents are used in almost all fields of life due to their flexibility and autonomy. They contribute to several emerging disciplines such as ambient intelligence and cognitive robotics. The benefits of intelligent systems are countless: they can be useful in educating children, guiding tourists, helping disabled and elderly persons, etc. This section presents which emerging disciplines can benefit from the work introduced in this thesis, and how.

1.5.1 Brain Model Functions

Several models of the brain have been introduced in the AI literature. These models can be categorized into two classes: models that imitate the architecture of the brain, and models that imitate human behavior without imitating the architecture of the brain.

The brain models of the first class develop simple architectures similar to those of the brain of an organism. They normally adopt Artificial Neural Networks (ANN). An example of this category is the confabulation theory, proposed as the fundamental mechanism of all aspects of cognition (vision, hearing, planning, language, initiation of thought and movement, etc.) [10]. This theory has been hypothesized to be the core explanation for the information-processing effectiveness of thought.

The other class of brain-modeling theories develops models of the brain that do not necessarily resemble the brain of an organism but aim to rigorously implement its functions. Such implementations can rest mainly on non-neurological bases such as mathematics, statistics, and probability theory. Examples of this category are the Modeling Field Theory (MFT) [11], [12], the Layered Reference Model of the Brain (LRMB) [13]–[16], Bayesian models of cognition [17], the society of mind [2] outlined in section 2.2, and the Human Imitating Cognitive Modeling Agent (HICMA) [18] introduced in Chapter 3. These models adopt different theories for modeling the brain, yet they all have the same target: developing an artificial brain that behaves like a human (or animal) brain when encountering real-life situations. This imitation includes not only the brain’s strong points (e.g. adaptability and learning capability) but also its weak points, such as memory loss over time. Inheriting the weak points of the brain is not necessarily a drawback; it can be useful in applications such as computer games, where a human-like opponent is more amusing than a genius one.

1.5.2 Artificial Personality

Personality can be defined as a characteristic way of thinking, feeling, and behaving. It includes behavioral characteristics, both inherent and acquired, that distinguish one person from another [19]. When imitating human behaviors, personality must be taken into account, as personality is the engine of behavior [20]. As an important trait of humans, it has received much research attention. In a genetic robot’s personality, genes are considered key components in defining a creature’s personality [21]. That is, every robot has its own genome in which each chromosome, consisting of many genes, contributes to defining the robot’s personality.

In this work, human behaviors are represented by mathematical models. The parameters of a model define not only a human behavior but also the degree of that behavior. For example, a robot can have a tricky behavior, and another robot can learn to be trickier.


Another advantage of the mathematical modeling of behaviors is that it opens the door to the powerful machinery of mathematical optimization techniques such as CMA-ES and the Nelder-Mead method, described in sections 2.7 and 2.8 respectively.

1.5.3 Ambient Intelligence and Internet of Things

Ambience is the character and atmosphere of a place [22]; it is everything surrounding us in the environment, including lights, doors and windows, TV sets, computers, etc. Ambient intelligence (AmI) is providing the ambience with enough intelligence to understand the user’s preferences and adapt to his needs. It incorporates smartness into the environment to make it comfortable, safe, secure, healthy, and energy-conserving. The applications of AmI in life are countless; they include helping elderly and disabled persons, nursing children, guiding tourists, etc.

An example of an AmI application is a smart home that detects the arrival of the user and automatically takes the actions that the user likely requires. It can switch on certain lights, turn on the TV and switch to the user’s favorite channel at the preferred volume, suggest and order a meal from a restaurant, and so on. All of these actions are taken automatically by the AmI system, depending on preferences the system has previously learned about the user.

The evolution of AmI has led to the emergence of the Internet of Things (IoT). IoT provides the necessary intercommunication system between the smart things in the environment, including sensors, processors, and actuators. Sensors (e.g. temperature, clock, humidity) send their information to a central processor. The processor receives this information and guesses what tasks the user may like to be done. Finally, the processor decides what actions need to be taken and sends commands to the concerned actuators to put these actions into effect.

An example scenario: a PIR (Passive Infrared) sensor detects the arrival of the user and sends a trigger to a processor. The processor receives this trigger along with information from a clock and a temperature sensor. It reviews its expertise in the user’s preferences and guesses that he likely desires a shower when he arrives at that time in such hot weather. Consequently, the processor sends a command to the water heater to get the water heated to the user’s preferred temperature. The same principle can be scaled up to hundreds or thousands of things connected together via a network. A simple IoT system is shown in Fig. 1-1.

This work proposes a simple mathematical modeling of human behaviors. This modeling can be useful for providing the smart environment with human behaviors so that it interacts with the user in a human-like way. Furthermore, it is simple enough for the environment to tailor different behaviors for different users.


1.5.4 Ubiquitous Computing and Ubiquitous Robotics

Ambient intelligence is based on Ubiquitous Computing (ubicomp). The word “ubiquitous” means existing or being everywhere at the same time [23]. Ubicomp is a computer-science concept in which computing exists everywhere and is not limited to personal computers and servers. The world humans will be living in is expected to be fully ubiquitous, with everything networked. Information may flow freely among lamps, TVs, vehicles, cellular phones, computers, smart watches and glasses, etc. Amid this ubiquitous life, ubiquitous robots (ubibots) may live as artificial individuals. They, more than other things, deal directly with their human users. This requires them to understand and imitate human behaviors. Again, mathematical behavior modeling can be useful here.

1.6 Techniques

The main part of this work is statistical modeling. Given a set of observations or samples, a statistical model is the mathematical representation of the process assumed to have generated these samples. This section outlines the statistical techniques utilized or experimented with in this work for building and optimizing a statistical model. Fig. 1-3 illustrates how these techniques contribute to the modeling process. An extended chart of related techniques is also given in Fig. 1-4.

1.6.1 Feature Selection

A model can be thought of as a mapping between a set of input features and one or more output responses. When the model of a process is unknown, it is usually also unknown which features affect the output of that process. For example, assume that a mobile robot is provided with a battery and some sensors: humidity, temperature, gyroscope, accelerometer, and speedometer. Assume that it is required to find the battery decay rate as a function of the sensor readings, as shown in Fig. 1-2. Obviously, the decay rate does not depend on all sensor readings. The purpose of feature selection is to identify which features (i.e. sensor readings) are relevant to the output (i.e. the battery decay rate). The modeling process then maps the output to only a subset of all features, which makes modeling faster, less complicated, and more accurate. An introduction to feature selection is provided in [24].

Fig. 1-1: A simple IoT system


In general, any feature-selection method tends to select the features that have:

- Maximum relevance to the observed feature (output)
- Minimum redundancy with (relation to) other input features

Mutual information and correlation, two such measures, are detailed further in Appendix A.1 and Appendix A.2 respectively.
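
As a minimal sketch of the relevance half of this idea, the following ranks each input feature by the absolute value of its Pearson correlation with the output; the sample data are hypothetical, and a complete selector would also score redundancy between the input features themselves (e.g. with mutual information).

/** A minimal sketch of correlation-based relevance ranking (illustrative;
 *  the thesis also considers mutual information and distance correlation). */
public class FeatureRankingSketch {
    /** Pearson correlation coefficient of two equal-length samples. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx  += (x[i] - mx) * (x[i] - mx);
            vy  += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        // Hypothetical observations: each row is one feature's readings
        // (e.g. temperature, speed); y is the observed battery decay rate.
        double[][] features = {{20, 25, 30, 35}, {1.0, 0.5, 2.0, 1.5}};
        double[] y = {0.11, 0.14, 0.17, 0.20};
        for (int f = 0; f < features.length; f++)
            System.out.printf("feature %d relevance: %.3f%n",
                              f, Math.abs(pearson(features[f], y)));
    }
}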

Fig. 1-2: A model of battery decay rate (sensor readings in, decay rate out)

Fig. 1-3: Mathematical modeling techniques utilized or experimented with in this work. The chart groups them into: feature selection (mutual information via histogram or kernel density estimation; Pearson and distance correlation), model selection of the functional form (linear, polynomial, logarithmic, exponential), parameter estimation/optimization (CMA-ES, Nelder-Mead, BOBYQA, Powell), and model evaluation (Least Absolute Deviations, LAD).

Fig. 1-4: Modeling-related techniques

1.6.2 Modeling

A model of a process is the relation between its inputs and its output. This relation comprises:

- Input features (independent variables)
- Observed output (dependent variable)
- Parameters

For example, in Eq. (1-1), the inputs are $x_1$ and $x_2$, the output is $y$, and the parameters are $a$, $b$, $c$, $d$, and $e$:

$y = a x_1^2 + b x_2 + c\,e^{d x_1} + e$   (1-1)
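
Read as code, Eq. (1-1) is simply a function of the inputs and the parameters. The following direct transcription (with hypothetical parameter values) is the kind of object that the optimization step below tunes:

/** Eq. (1-1) as a function: given inputs x1, x2 and parameters (a..e),
 *  return the predicted output y. The parameter values used here are
 *  hypothetical; estimating good values is the job of model optimization. */
public class ModelSketch {
    static double predict(double x1, double x2,
                          double a, double b, double c, double d, double e) {
        return a * x1 * x1 + b * x2 + c * Math.exp(d * x1) + e;
    }

    public static void main(String[] args) {
        System.out.println(predict(2.0, 3.0, 1.0, 0.5, 0.1, 0.3, -1.0));
    }
}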

The modeling process consists of three main steps:

1. Model selection
2. Model optimization
3. Model evaluation

The following subsections describe these steps.

1.6.2.1 Model Selection

The first step in formulating a model equation is selecting its functional form, namely the form of the function that represents the process. For example, formula (1-1) consists of a quadratic term, a linear term, an exponential term, and a constant. Making such a selection depends on experience and trials: even with experience of the modeled process, trials must be conducted to find the most suitable functional form. In this work, Genetic Algorithms search for a good functional form, as described in section 3.2.4.2.

1.6.2.2 Model Optimization

After a suitable functional form has been selected, the optimization stage estimates the values of the parameters that best fit the model to the real process. As this stage is a main part of this work, it is described in detail in section 2.6.

1.6.2.3 Model Evaluation

The model-evaluation stage scores optimized models, allowing the fittest model to be selected from among the available ones. For example, different functional forms (polynomial, linear, logarithmic, etc.) can be optimized for modeling a process, and the best one can then be chosen. The meaning of “best” depends on the model-evaluation method. For example, the residual sum of squares (RSS) adds up the squares of the differences between the observed (real) output and the predicted (modeled) output. A different method is the Pearson Correlation Coefficient (PCC), described in Appendix A.2.1, which calculates the correlation between the real output and the predicted output. The simplest method is the Least Absolute Deviations (LAD), which is used in this work.
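
A minimal sketch contrasting the two residual-based measures mentioned above follows; the observed/predicted values are hypothetical. LAD sums the absolute residuals while RSS sums their squares, so RSS penalizes large outliers more heavily.

/** A minimal sketch of two model-evaluation measures: RSS and LAD. */
public class ModelEvaluationSketch {
    static double rss(double[] observed, double[] predicted) {
        double s = 0;
        for (int i = 0; i < observed.length; i++) {
            double r = observed[i] - predicted[i];   // residual
            s += r * r;
        }
        return s;
    }

    static double lad(double[] observed, double[] predicted) {
        double s = 0;
        for (int i = 0; i < observed.length; i++)
            s += Math.abs(observed[i] - predicted[i]);
        return s;
    }

    public static void main(String[] args) {
        double[] observed  = {1.0, 2.0, 3.0, 4.0};   // hypothetical data
        double[] predicted = {1.1, 1.8, 3.3, 3.9};   // a candidate model's output
        System.out.println("RSS = " + rss(observed, predicted));
        System.out.println("LAD = " + lad(observed, predicted));
        // The candidate model with the smallest score (under the chosen
        // measure) is selected as the best fit.
    }
}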

1.7 Organization of the Thesis

This thesis is organized as follows. Chapter 2 gives background on the underlying theories of this work: it overviews the society of mind theory and briefly summarizes Evolutionary Computation (EC), Evolutionary Algorithms (EA), and Genetic Algorithms (GA); it then explains the mathematical optimization problem in detail and presents two optimization methods, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and the Nelder-Mead method; finally, the Robocode game is presented as the benchmark of this work.

Chapter 3 explains in detail the structure of the proposed agent (HICMA); it comprises five sections that describe the five agents of HICMA. Chapter 4 introduces the experiments and results of HICMA as a Robocode agent. Finally, Chapter 5 discusses the conclusions and possible future work.


Chapter 2: Background

2.1 Introduction

This chapter overviews the basic disciplines behind this work. It is organized as follows. Section 2.2 briefly overviews the society of mind theory and how it is utilized and extended in this work. Section 2.3 gives a general overview of evolutionary computation. Section 2.4 reviews evolutionary algorithms. Section 2.5 overviews Genetic Algorithms (GA). Section 2.6 defines the optimization problem. Section 2.7 explains solving optimization problems using evolution strategies, focusing on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Section 2.8 explains the Nelder-Mead algorithm, combined in this work with CMA-ES to form a hybrid optimization technique, which is the main engine of the proposed system.

The relations between the aforementioned disciplines and similar ones are depicted in Fig. 2-1, where the disciplines used in this work are bounded by double outlines. This map provides a good reference to different substitutes that can be used for extending this work in the future.

2.2 The Society of Mind

2.2.1 Introduction

The society of mind theory was introduced by Marvin Minsky in 1980 [2]. It tries to explain how minds work and how intelligence can emerge from non-intelligence. It envisions the mind as a multitude of little parts, each mindless by itself; each part is called an agent. Each agent by itself can only do some simple thing that needs no mind or thought at all. Yet when these agents are joined in societies, in certain very special ways, true intelligence emerges. The agents of the brain are connected in a lattice where they cooperate to solve problems.

2.2.2 Example – Building a Tower of Blocks

Imagine that a child wants to build a tower of blocks, and imagine that his mind consists of a number of mental agents. Assume that a “builder” agent is responsible for building towers of blocks. The process of building a tower is not that simple; it involves other sub-processes: choosing a place to build the tower, adding new blocks to the tower, and deciding whether the tower is high enough. It is therefore better to break this complex task up into simpler ones and dedicate an agent to each, as in Fig. 2-2.

Again, adding new blocks is too complicated for the single agent “add” to accomplish. It helps to break this process into smaller and simpler sub-processes: finding an unused block, getting this block, and putting it onto the tower, as shown in Fig. 2-3.

Page 36: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

14

Fig. 2-1: AI Disciplines map

Page 37: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

15

Fig. 2-3: Sub-agents of add agent

In turn, the agent get can be broken up into: “grasp” sub-process that grasps a

block, and “move” sub-process that moves it to the top of the tower. Generally, when an

agent is found to have to do something complicated, it is replaced with a sub-society of

agents that do simpler tasks.

It is clear that none of these agents alone can build a tower, and even all of them

cannot do unless the interrelations between them are defined, that is, how every agent is

connected with the others. In fact, an agent can be examined from two perspectives: from

outside and from inside its sub-society. If an agent is examined from the outside with no

idea about its sub-agents, it will appear as if it knows how to accomplish its assigned

task. However, if the agent’s sub-society is examined from the inside, the sub-agents will

appear to have no knowledge about the task they do.

To distinguish these two different perspectives of an agent, the word agency is used

for the system as a black box, and agent is used for every process inside it.

A clock can be given as an example. As an agency, if examined from its front, its

dial seems to know the time. However, as an agent it consists of some gears that appear

to move meaninglessly with no knowledge about time.

To sense the importance of viewing a system of agents as an agency, one can

examine a steering wheel of a car. As an agency, it changes the direction of the car

without taking into account how this works. However, if it disassembled, it appears as an

agent that turns a shaft that turns a gear to pull a rod that shifts the axle of a wheel.

Bearing this detailed view in mind while driving a car can cause a crash because it

requires too long time to be realized every time the wheels are to be steered.

In summary, to understand a society of agents, the following points must be known:

1. How each separate agent works

2. How each agent interacts with other agents of the society

3. How all agents of the society cooperate to accomplish a complex task

Fig. 2-2: A builder agent with its sub-sgents

Page 38: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

16

This thesis extends the society of mind theory to present a novel model of an

intelligent agent that behaves like a human. The previous points are expanded so that the

entire society evolves according to the environmental changes. 2.9 describes the

proposed model and how it adopts the society of mind theory.

Evolutionary Computation (EC)

Evolutionary Computation (EC) is a subfield of Artificial Intelligence (AI) that

solves stochastic optimization problems. A stochastic optimization problem is the

problem of finding the best solution from all possible solutions by means of a stochastic

search process, that is, a process that involves some randomness. EC methods are used

for solving black-box problems where there is no information to guide the search process.

An EC method tests some of the possible solutions, trying to target the most promising

ones. EC methods adopt the principle of evolution of generations. They generate a

population of candidate solutions and evaluate every individual solution in this

population. Then, a new generation of, hopefully fitter, individuals is generated. The

evolution process is repeated until a satisfying result is obtained.

Evolutionary Algorithms (EA)

An Evolutionary Algorithm (EA) is an Evolutionary Computation subfield that

adopts the principle: survival of the fittest. EA methods are inspired from the evolution

in nature, a population of candidate solutions is evaluated using an evaluation function,

and the fittest individuals, called parents, are granted better chance to reproduce the

offspring of the next generation. Reproduction is done by recombining pairs of the

selected parents to produce offspring. The offspring are then mutated in such a way that

they hold some of the traits inherited from their parents in addition to their own developed

traits. The rates of recombination and mutation are selected to achieve a balance between

utilization of parents’ good traits and exploration of new traits. For improving the fitness

over generations, the process of reproduction ensures that the good traits are not only

inherited over generations but also developed by the offspring. After a predetermined

termination condition is satisfied, the evolution process is stopped. The fittest individual

in the last generation is then selected as the best solution of the given problem. Fig. 2-4

shows the general scheme of evolutionary algorithms.

Page 39: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

17

The search process of an evolutionary algorithm tries to cover the search-space of

the problem, which is the entire range of possible solutions, without exhaustively

experimenting every solution. The fitness of all individuals in the search-space can be

represented by a fitness landscape as shown in Fig. 2-5. The horizontal axes represent

the domain of candidate solutions (i.e. individuals) and the vertical axis represents their

fitness. The optimum solution within a limited sub-range of the search-space is called a

local optimum, while the absolute optimum solution over the entire search-space is called

the global optimum. An EA tries to find the global optimum and not to fall into one of

the local optima.

Genetic Algorithms (GA)

Genetic Algorithm (GA) is a type of evolutionary algorithms that was first

introduced by John Holland in the 1960s and were further developed during the 1960s

and the 1970s [25]. It is designed to imitate the natural evolution of generations. It

encodes the solution of a problem into the form of chromosomes. A population of

candidate-solution individuals (also called phenotypes) is generated, where each solution

is represented by one or more chromosomes (also called genotypes). In turn, each

chromosome consists of a number of genes. GA then selects parents to reproduce from

the fittest individuals in the population. Reproduction involves crossover of parents to

produce offspring, and mutation of offspring’s genes. The newly produced offspring

population represents the new generation, which is hopefully fitter, on average, than the

previous one. The evolution process is repeated until a satisfying fitness level is achieved

Fig. 2-5: A 2D fitness landscape

Population

Parents Offspring

Fitness

function

Termination

Condition

Selection

Initialization

Crossover &

Mutation

Evaluation

Solution

Fig. 2-4: A general scheme of evolutionary algorithms

Global optimum Local optimum

Page 40: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

18

or a maximum limit of generations is exceeded. The general GA flowchart is illustrated

in Fig. 2-6. Every block is briefly explained next.

Encoding

Encoding is how a candidate solution is represented by one or more chromosomes.

The most common type of chromosomes is the binary chromosome. The genes of this

chromosome can hold either 0 or 1.

Initial Population

An initial population of candidate solutions is generated to start the evolution

process. It is often generated randomly, but sometimes candidate solutions are seeded

into it.

Evaluation

The candidate solutions, represented by individuals, are evaluated by a fitness

function. The fitness of an individual determines its probability to be selected for

reproduction.

Selection

Selection is the process of choosing the parents of the next generation from among

the individuals of the current generation. It is a critical part of GA, as it must ensure a

good balance between exploitation and exploration. Exploitation is giving the fittest

individuals better chance to survive over generations while exploration is searching for

Start

Stop

New Generation

No

Yes

Parents

Offspring

Encoding

Initializing Parameters

Generating Initial Population

Evaluating Population

Selection

Crossover

Mutation

Terminate?

Fig. 2-6: Basic GA flowchart

Page 41: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

19

new useful individuals. The tradeoff between exploitation and exploration is critical. Too

much exploitation may lead to a local optimum and too much exploration may greatly

increase the number of required generations to find a good solution of the problem.

Selection methods include roulette-wheel, stochastic universal sampling, rank, and

tournament selection.

Crossover

Crossover is the recombination of two, or more, selected parents to produce

offspring. It is performed by dividing parents’ chromosomes into two or more portions,

and randomly copying every portion to either offspring.

Mutation

Mutation is the adaptation of an individual’s genes. For example, a binary gene is

mutated by flipping it with a specified probability.

Termination

The evolution process is repeated until a termination criterion is satisfied.

Common termination criteria are:

1. A satisfying solution is found

2. A maximum limit of generations is reached.

3. Significant fitness improvement is no longer achieved.

Optimization Problem

2.6.1 Introduction

Optimization is the minimization or the maximization of a non-linear objective

function (fitness function, loss function). An objective function is a mapping of n-

dimension input vector to a single output value as shown in Fig. 2-7. That is, if y = f(x)

is a non-linear function of n-dimension vector x ∈ ℝ(n) , then minimizing f(x) is finding

the n values of x that gives the minimum value of y ∈ ℝ.

Fig. 2-7: An objective function

The optimum value (maximum or minimum) of a function within a limited range

is called local optimum while the optimum value of the function over its domain is called

global optimum. A function can have several local optima, but only one global optimum.

An example of local and global minima is shown in Fig. 2-8.

x(n) f(x) y

X

Page 42: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

20

Fig. 2-8: Local and global minima

2.6.2 Mathematical Definition

Given a number n of observations (samples) of p variables (e.g. sensor readings)

forming a 2-D matrix M with dimensions n x p, such that column j represents the

observations (samples) of the jth variable, and row i represents the ith observation

(sample).

M =[

𝑚1,1 𝑚1,2 ⋯ 𝑚1,𝑝

⋮ ⋮ ⋮ ⋮𝑚𝑛,1 𝑚𝑛,2 ⋯ 𝑚𝑛,𝑝

], where mi,j is the ith sample of the jth variable (sensor)

The p variables may correspond to, for example, a number of sensors attached to a

robot such as (thermometers, speedometers, accelerometers … etc.) which represent the

senses of that robot. For example, suppose that a robot plays the goalkeeper role in a

soccer game. Like a skillful human goalkeeper, the robot should predict the future

location of the ball as it approaches the goal to block it in time. Assume that the motion

of the ball over time is modeled by a quadratic function of time, that is:

Location (x, y, z) = f (t) (2-1)

Equivalently:

Location (x) ≡ 𝑓𝑥(𝑥) = 𝑎𝑥. 𝑡2 + 𝑏𝑥. 𝑡 + 𝑐𝑥 (2-2)

Location (y) ≡ 𝑓𝑦(𝑦) = 𝑎𝑦. 𝑡2 + 𝑏𝑦. 𝑡 + 𝑐𝑦 (2-3)

Location (z) ≡ 𝑓𝑧(𝑧) = 𝑎𝑧. 𝑡2 + 𝑏𝑧 . 𝑧 + 𝑐𝑧 (2-4)

Let the axes of the playground be as illustrated in Fig. 2-9. The robot can use

equation (2-3) to predict the time T at which the ball will arrive at the goal line (this

prediction can be done using optimization). Then the robot can use equations (2-2) and

(2-4) to predict the location of the ball at the goal line (xg, yg, zg) at time T, and can move

to there in time.

y=f(x)

global minimum local minimum x local minimum

Page 43: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

21

Solving this problem (blocking the ball) is done as follows:

1. Get n samples (observations) of the locations of the moving ball (x, y, and z) at

fixed intervals of time (t).

2. Store the (x, t) samples in matrix Mx , where x and t represent the p variables

(sensors).

3. Similarly, store (y, t), and (z, t) samples in My and Mz

The three matrices will appear like these:

Mx =[

𝑥0 𝑡0⋮ ⋮

𝑥𝑛−1 𝑡𝑛−1

], My =[

𝑦0 𝑡0⋮ ⋮

𝑦𝑛−1 𝑡𝑛−1

], Mz =[

𝑧0 𝑡0⋮ ⋮

𝑧𝑛−1 𝑡𝑛−1

]

The function of the optimization strategy is to find the values of functions’

parameters (ax, bx, cx, ay, by …) in equations (2-2), (2-3), and (2-4) that make every

function optimally fits into the sampled data (observations). Finding such values is a

search problem, where the search space is the range of all possible values of the

parameters. This is what the following two steps do. To minimize, for example, 𝑓𝑥(𝑥):

4. Re-organize the given equation so that the left-hand side equals zero:

𝑓𝑥(𝑥) − 𝑎𝑥. 𝑡2 + 𝑏𝑥. 𝑡 + 𝑐𝑥= ex (2-5)

Where ex is the error of the optimization process. This equation is called

the objective function, cost function, or loss function.

5. Search for the values of the parameters (ax, bx, and cx) that minimize the error ex.

This is a 3-D search problem as the algorithm searches for the optimum values of

three parameters. The optimization (minimization) of the function 𝑓𝑥 is done as

follows:

a. Guess a value for (ax0, bx0, cx0)

b. Substitute into (2-5) for (ax, bx, cx) by (ax0, bx0, cx0), and for 𝑓𝑥(𝑥) and t

by x0 and t0 respectively into the matrix Mx , and calculate the error ex0

Fig. 2-9: An example application of mathematical optimization

Page 44: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

22

x0 − 𝑎𝑥0. 𝑡02 + 𝑏𝑥0. 𝑡0 + 𝑐𝑥0 = 𝑒𝑥0 (2-6)

c. Repeat steps (b) and (c) for all of the n observations and accumulate the

errors into ex

𝑒𝑥 = ∑|xi − 𝑎𝑥𝑖. 𝑡𝑖2 + 𝑏𝑥𝑖. 𝑡𝑖 + 𝑐𝑥𝑖|

𝑛−1

𝑖=0

(2-7)

Where |X| is the absolute value of X.

The optimization algorithm tries to improve its guesses of the parameters (a, b, and

c) over the iterations to find the optimum values in the search space that minimizes the

error ex.

Fig. 2-10 visualizes a 2D search space (two parameters), where the horizontal axes

represent the domains of the two parameters and the vertical axis represents the value of

the objective function (the error). The optimization algorithm should start from any point

on the search space and move iteratively towards the minimum error (i.e. the global

optimum).

Searching the search space for the optimum (minimum or maximum) solution

depends on the optimization algorithm. The next section explains evolution-strategy

optimization algorithms.

After optimizing a function, it is saved for usage as a model for an observed

phenomenon or an environmental object. For example, the goalkeeper robot can keep the

optimized functions (1), (2), and (3) for similar future shoots (Assuming that all shoots

have similar paths).

Fig. 2-10 Minimization example in a 2D search space

Page 45: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

23

Evolution Strategies

Evolution Strategies (ESs) are optimization techniques belonging to the class of

Evolutionary Computation (EC) [26], [27]. An evolution strategy searches for the

optimum solution in a search-space similarly as Genetic Algorithms (GA). It generates a

population of individuals representing candidate solutions (i.e. vectors of the parameters

to be optimized). Every individual in the population is then evaluated by a fitness function

that measures how promising it is for solving the given problem. The fittest individuals

are then selected and mutated to reproduce another generation of offspring. This process

is repeated until a termination condition is met. Mutation represents the search steps that

the algorithm takes in the search-space; it is done by adding normally distributed random

vectors to the individuals.

The fitness of all individuals in the search space can be represented by a fitness

landscape as shown in Fig. 2-5.The horizontal axes represent candidate solutions

(individuals) and the vertical axis represents their fitness. The goal of the optimization

algorithm is to converge to the global optimum solution with the minimum search costs

represented by the number of objective function evaluations.

In optimization problems, the search space is the domain of the parameters of the

optimized function (minimized or maximized) which is also used as an objective

function. For example, in maximization problems, the goal is to find the values of the

parameters that maximizes a function, so the value of that function represents the fitness

of the given set of parameters, the higher the value of the function, the fitter the solution.

2.7.1 Basic Evolution Strategies

The basic evolution strategy can be defined by:

(µ/ρ, λ)-ES and (µ/ρ + λ)-ES

Where:

µ is the number of parents (fittest individuals) in the population

ρ is the number of parents that produce offspring

λ is the number of generated offspring

The “,” means that the new µ parents are selected from only the current λ offspring,

this is called comma-selection, while the “+” means that the new parents are selected

from the current offspring and the current parents, this is called plus-selection.

For example, a (4/2, 10)-ES selects the fittest 4 parents from the current population

(10 individuals) and randomly mutates two of them to generate 10 new offspring. The 4

parents are selected from the current 10 offspring only.

On the other hand, a (4/2 + 10)-ES selects the fittest 4 parents from the current

population (10 individuals) along with the 4 parents of the previous generation. This is a

type of elitist selection where the elite individuals are copied to the next generation

without mutation.

Page 46: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

24

The basic steps of an evolutionary strategy are:

1. Generating candidate solutions (Mutating parent individuals)

2. Selecting the fittest solutions

3. Updating the parameters of the selected solutions

Fig. 2-11 illustrates the previous three steps.

Fig. 2-11: Basic steps of evolution strategies

An ES individual x is defined as follows:

x = [y, s, F(y)] (2-8)

Where:

y is the parameter vector to be optimized, it is called object parameter vector

s is a set of parameters used by the strategy, it is called strategy parameter vector

F(y) is the fitness of y

Strategy parameters s are the parameters used by the strategy during the search

process, they can be thought of as the tools used by the strategy, they are similar to a

torchlight a person may use for finding an object (optimum solution) in a dark room

(search space). The most important strategy parameter is the step-size described later.

Notice that an evolution strategy not only searches for the optimum solution of y, but also

searches for the optimum strategy parameter s. This is similar to trying several types of

torchlights to find the best one to use for finding the lost object in the dark room.

Obviously, finding the optimum strategy-parameter vector speeds-up the search process.

Generating Selecting Updating

Page 47: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

25

The basic Algorithm of a (µ/ρ +, λ)-ES is given in Algorithm 2-1:

Algorithm 2-1: A basic ES algorithm

1. Initialize the initial parent population Pµ = {p1, p2 … pµ}

2. Generate initial offspring population Pλ = {x1, x2 … xλ} as follows:

a. Select ρ random parents from Pµ

b. Recombine the selected ρ parents to form an offspring x

c. Mutate the strategy parameter vector s of the offspring x

d. Mutate the object parameter vector y of the offspring x using the

mutated parameter set s

e. Evaluate the offspring x using the given fitness function

f. Repeat for all λ offspring

3. Select the fittest µ parents from either

{Pµ∪ Pλ} if plus-selection (µ/ρ + λ)-ES

{Pλ} if comma-selection (µ/ρ , λ)-ES

4. Repeat 2 and 3 until a termination condition is satisfied

Normally distributed random vectors are used to mutate the strategy-parameter set

s and the object-parameter vector y at steps 2.c and 2.d respectively. The mutation process

is explained in more detail in the next section.

Fig. 2-12 visualizes the search process described above for solving a 2D problem

(i.e. Optimizing two parameters where the object parameter vector y⊂ℝ(2) ). Both µ and

ρ equals 1, and λ equals 100. That is, one parent is selected to produce 100 offspring.

Every circle represents the one-σ line of the normal distribution at a generation. The

center of every circle is the parent that was mutated by the normally distributed random

vectors to produce the rest of the population represented by ‘.’, ‘+’ and ‘*’ marks. The

black solid line represents the direction of search in the search space.

A one-σ line is the horizontal cross-section of the normal-distribution 2D curve at

σ (standard deviation). It is an ellipse (or a circle) surrounding 68.27% of the samples.

This ellipse is useful in studying normal distributions. In Fig. 2-12, the one-σ lines are

unit circles because both of the two sampled variables has a standard deviation σ = 1.

Fig. 2-12: Visualization of the search process of a (1/1,100)-ES

Generation g Generation g+1 Generation g+2

Page 48: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

26

Fig. 2-13 shows the one-σ ellipse (circle in this case) of a 2D normal distribution

represented on a 3D graph.

2.7.2 Step-size Adaptation Evolution Strategy (σSA-ES )

The goal of an optimization algorithm is to take steps towards the optimum; the

faster an optimization algorithm reaches the optimum the better it is. Clearly, if the

optimum is far from the starting point, it is better to take long steps towards the optimum

and vice versa. In an optimization algorithm, the step size is determined by the amount

of mutation of parent individuals. That is, high mutation of a parent causes its offspring

to be highly distinct from it and thus very far from it in the search space.

Usually, there is no detailed knowledge about the good choice of the strategy parameters

including the step-size. Therefore, step-size adaptation evolution strategies adapt the

mutation strength of parents at every generation in order to get the optimum step-size

that quickly reaches the optimum. As mutation is done by adding a normally distributed

random vector to the parent, the standard deviation σ of that random vector represents

the mutation strength. A large value of σ means that the random vector will more likely

hold larger absolute values and thus more mutation strength. The principle of step-size

adaptation is illustrated in Fig. 2-14, where the standard deviation in (a) equals 1.0 and

in (b) equals 3.0. It is clear that the step-size in (b) is larger than in (a).

An individual of a σ-SA strategy is defined as follows:

a = [y, σ, F(y)] (2-9)

Fig. 2-13: One-Sigma ellipse of bivariate normal distribution N(0,I) [µ=0, σ=I]

Page 49: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

27

Recall the general definition of an ES individual in Eq. (2-8). It is clear that σ is

the only element in the strategy parameter vector s. σ is the mutation strength parameter

that would be used to mutate the parameter vector y if that individual is selected for

generating offspring. An offspring generated from a parent is defined as follows:

xt

i

)1( = {

σi(t+1) ← σ(t). eτ𝑁𝑖(0,1)

𝑦𝑖(𝑡+1) ← 𝑦𝑖

(𝑡) + σ(t). 𝑁𝑖(0, 𝐼)

𝐹𝑖 ← F(yt)

(2-10)

As shown in eq. (2-10), the mutation strength parameter σ is self-adapted every

generation t. The learning parameter τ controls the amount of self-adaptation of σ per

generation. A typical value of τ is 1/ 2n [28]. It is obvious that the step-size σ in Eq.

(2-10) is a parameter of every individual in the population, and because the fittest

individuals are selected to produce the offspring of the next generation, the best step sizes

are inherited by the new offspring. This enables the algorithm to find the optimum step

size that quickly reaches the optimum solution.

The exponential function in Eq. (2-10) is usually used in evolution strategies, but

other functions can also be used to mutate the step-size [29]

N(0,1) is a normally distributed random scalar (i.e. a random number sampled from

a normal distribution with mean = 0, and standard deviation = 1). N(0, I) is a normally

distributed random vector with the same dimensions of the optimized parameter vector

s. Fig. 2-15 shows a (a) normal distribution of 2D points (i.e. Two object parameters) and

(b) the probability density function (PDF) of every parameter. It is obvious that the

density of samples is high around the mean and decreases as we move away. That is, the

number of random samples near the mean is larger than far from it.

(a) (b)

Fig. 2-14: Two random probability distributions with (a) σ = 1.0 and (b) σ = 3.0. The

circles are the one-sigma elipses

Page 50: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

28

Normal distributions are used for the following reasons [27]:

1. Widely observed in nature

2. The only stable distribution with finite variance, that is, the sum of independent

normal distributions is also a normal distribution. This feature is helpful in the

design and the analysis of algorithms

3. Most convenient way to generate isotropic search points, that is, no favor to any

direction in the search space

2.7.3 Cumulative Step-Size Adaptation (CSA)

CSA-ESs updates the step-size depending on the accumulation of all steps the

algorithm has made. The importance of a step decreases exponentially with time [30].

The goal of CSA is to adapt the mutation strength (i.e. step-size) such that the correlations

between successive steps are eliminated [30], [31]. Correlation represents how much two

vectors agree in direction. Highly correlated steps are replaced with a long step, and low-

correlated steps are replaced with a short step. The concept of Cumulative Step-seize

Adaptation is illustrated in Fig. 2-16. The thick arrow represents step adaptation

according to the cumulation of the previous six steps. Every arrow represents a transition

of the mean m of the population.

(a) (b)

Fig. 2-15: A 2D normal distribution (a) 2D vector of points and (b) two 1D histograms

Fig. 2-16: The principle of Cumulative Step-size Adaptation (CSA),

Page 51: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

29

The CSA works as follows: 1. Get λ random normally distributed samples around the mean solution (i.e. the

parent of the population):

𝑥𝑖 = 𝑚𝑡 + 𝜎𝑡. 𝑁(0, 𝐼)

Equivalently:

𝑥𝑖 = 𝑁(𝑚𝑡, 𝜎𝑡2)

Where mt is the solution at iteration t, and σt is the standard deviation of the

selection (i.e. the step-size). That is, select λ random samples from around the

solution mt with probability decreasing as we move away from mt. The samples

appear as shown in Fig. 2-15-a.

2. Evaluate the λ samples ,and get the fittest µ of them

3. Calculate the average Zt of the fittest µ samples as follows:

𝑍𝑡 =1

𝜇∑𝑥𝑖

𝜇

𝑖=1

4. Calculate the cumulative path:

Pct+1 = (1-c)Pct + μ.c(2-c) Zt , where 0 < 𝑐 ≤ 1 (2-11)

The parameter c is called the cumulation parameter; it determines how rapidly

the information stored in Pct fades. That is, how long (over generations) the effect

of a step size at generation t lasts. The typical value of c is between 1

√n and

1

n .

It is chosen such that the normalization term µ.c (2-c) normalizes the

cumulative path. That is, if Pt follows a normal distribution with a zero mean and

a unit standard deviation (i.e. Pt ∼ N(0 ,I)), Pt+1 also follows N(0 ,I) [30].

5. Update the mutation strength (i.e. step-size):

𝜎𝑡+1 = 𝜎𝑡 exp(𝑐

2𝑑𝜎(‖𝑃𝑐𝑡+1‖

2

𝑛− 1)) (2-12)

Where ‖𝑋‖ is the Euclidean norm of the vector X ∈ℝn :

‖𝑋‖ = √𝑥12 + 𝑥2

2 + ⋯+ 𝑥𝑛2

The damping parameter dσ determines how much the step-size can change. It is

set to 1 as indicated in [30], [31].

Page 52: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

30

The fittest µ parents are then selected and mutated by the new step-size σt+1 to

form a new population of λ offspring.

6. Repeat all steps until a termination condition is satisfied.

2.7.4 Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

2.7.4.1 Introduction

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [32] is a state-of-the-

art evolution strategy. It extends the CSA strategy described in section 2.7.3, which

adapts the step-size σ every generation and uses the updated step-size to mutate the parent

solution. CMA-ES differs from CSA in that it uses a covariance matrix C, instead of the

identity matrix I, to generate the random mutating vectors. This means that the different

components of the random vector are generated from normal distributions with different

standard deviations. That is, every component has a different step size. For example,

imagine the problem of optimizing a 2D vector, and assume that the optimum solution is

{10, 30} and the initial guess is {0, 0}. It is obvious that the optimum value of the second

parameter is farther than the first one. Therefore, it is better to take larger steps in the

direction of the second one. This is what CMA-ES does and this is why it requires fewer

generations to find the optimum solution. In brief, CMA is more directive than SA. Fig.

2-17 shows the same population generated from a parent at the origin. In (a) the

covariance matrix is the identity matrix I, and in (b) the covariance matrix equals[1 11 3

].

Fig. 2-18 shows a 2D normal distribution shaped by a covariance matrix C. The

black ellipse is the one-σ ellipse of the distribution.

(a) (b)

Fig. 2-17: A population of (a) Step-size Adaptation and (b) Covariance Matrix Adaptation

Page 53: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

31

In addition to using a covariance matrix to adapt the shape of the mutation

distribution, the covariance matrix itself is set as a strategy parameter. Consequently, it

is adapted every generation so that the mutation distributions could adapt to the shape of

the fitness landscape and converge faster to the optimum. The operation of CMA-ES is

further illustrated in Fig. 2-19, where the population concentrate on the global optimum

after six generations. The ‘*’ symbols represent individuals, and the dashed lines

represent the distribution of the population. Background color represents the fitness

landscape, where darker color represents lower fitness.

Fig. 2-18: A 2D normal distribution N(0,C) [µ=0, σ=C]

Fig. 2-19: Optimization of 2D problem using CMA-ES.

C = [1 11 3

]

Generation 1 Generation 2 Generation 3

Generation 4 Generation 5 Generation 6

Page 54: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

32

2.7.4.2 CMA-ES Algorithm

The basic (µ/µ, λ) CMA-ES works as follows [27]:

Initialization:

I.1 λ Number of offspring (i.e. population size)

I.2 µ/ µ Number of parents / number of solutions involved in

updating m, C, and σ

I.3 m ∈ ℝ(nx1) n-dimension Initial solution (the mean of the population)

I.4 C = I(nxn) Initial covariance matrix = Identity matrix

I.5 σ ∈ ℝ+𝑛𝑥1 Initial step size

I.6 cσ ≈ 4/n Decay rate for evolution (cumulation) path for step-size σ

I.7 dσ ≈ 1 Damping parameter for σ change

I.8 cc ≈ 4/n Decay rate for evolution (cumulation) path of C

I.9 c1 ≈ 2/n2 Learning rate for rank-one update of C

I.10 cµ ≈ µw/n2 Learning rate for rank-µ update of C

I.11 Pσ = 0 Step-size cumulation path

I.12 Pc = 0 Covariance-matrix cumulation path

The constant n is the number of state parameters (i.e. the parameters of the objective

function). The left column maps these parameters with the code given in Table 2-1.

Generation Loop: Repeat until a termination criterion is met:

1. Generate λ offspring by mutating the mean m

xi = m + yi, 0 < i ≤ λ

Where: yi is an (n x 1) random vector generated according to a normal distribution

with zero mean and covariance C [yi ~ Ni (σ2, C)] as shown in Fig. 2-17.b and

Fig. 2-18.

2. Evaluate the λ offspring by the fitness function

F(xi) = f(xi)

3. Sort the offspring by fitness so that:

f(x1:λ) < f(x2:λ) < … < f(xλ:λ)

Where 𝑥1:𝜆 is the fittest individual in the population.

4. Update the mean m of the population

𝑚 = ∑𝑤𝑖. 𝑥𝑖:𝜆

µ

𝑖=1

m = m + σ. yw

Page 55: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

33

Where:

𝑦𝑤 = ∑𝑤𝑖 . 𝑦𝑖:𝜆

µ

𝑖=1

Hence:

m = 𝑚 + σ∑𝑤𝑖. 𝑦𝑖:𝜆

µ

𝑖=1

Where xi:λ is the ith best individual in the population, and the constants wi are

selected such that [27]:

w1 ≥ w2 ≥ w3 ≥ ⋯ ≥ wµ ≥ 0,

∑ 𝑤𝑖µ𝑖=1 = 1,

μ𝑤 =1

∑ 𝑤𝑖2µ

𝑖=1

≈𝜆

4

5. Update step-size cumulation path Pσ :

Pσ = (1-cσ)Pσ + 1-(1-cσ)2 µw C yw

Where Pσ ∈ ℝ(nx1)

The square root of the matrix C can be calculated using Matrix Decomposition

[32] or using Cholesky Decomposition [26].

6. Update the covariance-matrix cumulation path Pc :

Pc = (1-cc)Pc + 1-(1-cc)2 µw yw

Where Pc ∈ ℝ(nx1)

7. Update the step-size σ:

𝜎 = 𝜎. exp(𝑐𝜎𝑑𝜎

(‖𝑃𝜎‖

𝐸‖𝑁(0, 𝐼)‖− 1))

According to [30] this formula can be simplified to:

𝜎 = 𝜎. exp(𝑐𝜎2𝑑𝜎

(‖𝑃𝜎‖

2

𝑛− 1))

Where ||X|| is the Euclidean norm of the vector X.

Page 56: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

34

8. Update the covariance matrix C:

𝐶 = (1 − 𝑐1)𝐶 + 𝑐1𝑃𝑐𝑃𝑐𝑇 + (1 − 𝑐𝜇)𝐶 + 𝑐𝜇 ∑𝑤𝑖𝑦𝑖:𝜆𝑦𝑖:𝜆

𝑇

𝜇

𝑖=1

The expression (1 − 𝑐1) 𝐶 + 𝑐1𝑃𝑐 𝑃𝑐𝑇 is called “rank-one update”. It reduces the

number of function evaluations. The constant c1 is called “rank-one learning

rate”.

The expression (1 − 𝑐𝜇)𝐶 + 𝑐𝜇 ∑ 𝑤𝑖𝑦𝑖:𝜆𝑦𝑖:𝜆𝑇𝜇

𝑖=1 is called “rank-µ update”. It

increases the learning rate in large populations and can reduce the number of

necessary generations. The constant cµ is the “rank-µ” learning rate [27].

Termination:

Some example termination criteria used in [32] are:

Stop if the best objective function values of the most recent 10+30n

λ generations

are zero

Stop if the average fitness of the most recent 30% of M generations is not better

than the average of the first 30% of M generations. Where M is 20% all

generations, such that 120 + 30n

λ ≤ M ≤ 20,000 generations.

Stop if all of the best objective function values over the last 10+30n

λ generations

are below a certain limit. A common initial guess is of that limit is 10-12

Stop if the standard deviations (step-sizes) in all coordinates are smaller than a

certain limit. A common limit is 10-12 of the initial σ.

Usually, the algorithm is bounded to a limited search space, but in our experiments

it could find the global optimum even if the search space is unbounded (i.e. the domain

of a component of the solution vector is [-∞, ∞]).

A simple MATLAB/Octave CMA-ES code is given in Table 2-1. The left column

of the table maps the given code with the steps given above in the initialization stage.

Rank-one update Rank-µ update

Page 57: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

35

Table 2-1: A simple CMA-ES code

%Initialization

I.1 lambda = LAMBDA; % number of offspring

I.2 mu = MU; % number of parents

I.3 yParent = INIT_SOL; % Initial solution vector

n = length(yParent); % Problem dimensions

I.4 Cov = eye(n); % Initial covariance matrix

I.5 sigma = INIT_SIGMA; % Initial sigma (step-size)

I.6 Cs = 1/sqrt(n); % Learning rate of step-size

I.8 Cc = 1/sqrt(n); % Decay rate of Pc

I.10 Cmu = 1/n^2; % Learning rate of C

I.11 Ps = zeros(n,1); % Step-size cumulation

I.12 Pc = zeros(n,1); % Cov. matrix cumulation

I.13 minSigma = 1e-3; %Min. step-size…

% … termination condition

% Generation Loop: Repeat until termination criterion

while(1)

SqrtCov = chol(Cov)'; % square root of cov. …

% … matrix

for l = 1:lambda; % generate lambda …

% … offspring

1

offspr.std = randn(n,1); % offspring σ

offspr.w = sigma*(SqrtCov*offspr.std); % σ C N(0, I) ≡ N(σ, C)

offspr.y = yParent + offspr.w; % Mutate the parent

2 offspr.F = fitness(offspr.y); % Evaluate the offspring

offspringPop{l} = offspr; % offspring complete

end; % end for

3 ParentPop = sortPop(offspringPop, mu); % sort pop. and take µ …

% … best individuals

4 yw = recomb(ParentPop); % Calculate yw

yParent = yParent + yw.w; % new mean (parent)

5 Ps=(1-Cs)*Ps+sqrt(mu*Cs*

(2-Cs))*yw.std;

% Update Ps

6 Pc=(1-Cc)*Pc+sqrt(mu*Cc*(2-Cc))*yw.w; % Update Pc

7 sigma=sigma*exp((Ps'*Ps - n)

/(2*n*sqrt(n)));

% Update step-size

8 Cov = (1-Cmu)*Cov + Cmu*Pc*Pc'; % Update cov. matrix

Cov = (Cov + Cov')/2; % enforce symmetry

% Termination

if (sigma < minSigma) % termination condition

printf("solution="); % The solution is…

disp(ParentPop{1}.y'); % … the first parent

break; % Terminate the loop

end; % end if

end % end while

Page 58: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

36

The upper-case words, such as LAMBDA, are predefined constants. The function

fitness evaluates the candidate solutions. The function sortPop sorts the individuals by

fitness and extracts the best µ ones. The function recomb recombines the selected µ

parents to form a new parent for the next generation. A simple recombination is to

average the solution vector and the step-size vector of the selected µ parents.

MATLAB/Octave examples of these three functions are given in Table 2-2, Table 2-3,

and Table 2-4 respectively.

Table 2-2: An example of “fitness” function

function out = fitness(x)

out = norm(x-[5 -5]'); % The global optimum is at [5, -5]

end

Table 2-3: An example of “sortPop” function

function sorted_pop = sortPop(pop, mu);

for i=1:length(pop);

fitnesses(i) = pop{i}.F;

end;

[sorted_fitnesses, index] = sort(fitnesses);

for i=1:mu;

sorted_pop{i} = pop{index(i)};

end;

end

Table 2-4: An example of “recomb” function

function recomb = recomb(pop);

recomb.w = 0; recomb.std = 0;

for i=1:length(pop);

recomb.w = recomb.w + pop{i}.w;

recomb.std = recomb.std + pop{i}.std;

end;

recomb.w = recomb.w/length(pop);

recomb.std = recomb.std/length(pop);

end

The previous code snippets are modifications of the code provided in [26]. Codes

also for C, C++, Fortran, Java, MATLAB/Octave, Python, R, and Scilab are provided in

[33].

Page 59: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

37

2.7.4.3 Advantages of CMA-ES

CMA-ES is efficient for solving:

Non-separable problems 1

Non-convex functions 2

Multimodal optimization problems, where there are possibly many local optima

Objective functions with no available derivatives

High dimensional problems

2.7.4.4 Limitations of CMA-ES

CMA-ES can be outperformed by other strategies in the following cases:

Partly separable problems (i.e. optimization of an n-dimension objective function

can be divided into a series of n optimizations of every single parameter)

The derivative of the objective function is easily available (Gradient

Descend/Ascend is better)

Small dimension problems

Problems that can be solved using a relatively small number of function

evaluations (e.g. < 10n evaluations. Nelder-Mead may be better)

1 An n-dimensional separable problem can be divided into n 1-dimensional separate problems 2 A function is convex if the line segment between any two points lies above the curve of the function

Page 60: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

38

Nelder-Mead Method

2.8.1 Introduction

Nelder-Mead method [34] is a non-linear optimization technique that uses a

heuristic search, that is, its solution is not guaranteed to be optimal. It is suitable for

solving problems where the derivatives of the objective function are not known or too

costly to compute. Normally, it is faster than the CMA-ES, but it easily falls in local

optima. This method uses the simplex concept described in the next section.

2.8.2 What is a simplex

A simplex is a geometric structure consisting of n+1 vertices in n dimensions.

Table 2-5 contains examples of simplexes:

Table 2-5: Simplexes in different dimensions

Dim. Shape Graph

0 Point

1 Line

2 Triangle

3 Tetrahedron

4 Pentachoron

2.8.3 Operation

To optimize an n-dimensional function (i.e. with n parameters), Nelder-Mead

algorithm constructs an (n+1) initial simplex and tries to capture the optimum point inside

it while reducing the size of the simplex. A simplex is similar to a team of police officers

chasing a criminal; every simplex point represents a police officer, the optimum solution

is the criminal, and Nelder-Mead is the plan the police officers follow to catch the

criminal. Selecting the initial simplex is critical and problem-dependent as a very small

initial simplex can lead to a local minimum. This is why Nelder-Mead method is usually

used only when a local optimum is satisfying such as its usage in the hybrid optimization

technique described in section 3.2.4.

After constructing the initial simplex, it is iteratively updated using four types of

operations. Fig. 2-20 illustrates these operations on a 2D simplex (triangle). The shaded

Page 61: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

39

and the blank regions represent the simplex before and after the operation respectively.

P̅ is the mean of all points except for the worst. Ph is the highest (worst) point, Pl is the

lowest (best) point, P* is the reflected point, P** is the expanded or the contracted point.

Every operation is described next.

(a) reflection (b) expansion (c) contraction (d) reduction

(resizing)

Fig. 2-20: Operations of Nelder-Mead algorithm

a) Reflection:

If Ph is the worst point, it is expected to find a better point at the reflection of Ph

on the other side of the simplex. The reflection P* of Ph is:

P* = (1 + α) P̅ – α Ph , where:

α ∈ ℝ+ is the reflection coefficient, and [P* P̅] = α[P̅ Ph]

b) Expansion:

If the reflection point P* is better than the best Pl:

f(P*) < f(Pl)

Then expand P* to the expansion point P**:

P** = γ P* + (1-γ) P̅ , where:

γ > 1 is the expansion coefficient: the ratio of [P** P̄ ] to [P* P̄ ]

c) Contraction:

If the reflection point is worse than all points except for the worst (i.e. worse than

the second worst point):

f(P*) > f(Pi), for i ≠ h

Then, define a new Ph to be either the old Ph or P* whichever is better, and

contract P* to P**:

P** = β Ph (1 – β) P̄

Pl

𝑃 Ph

P*

P**

P*

Pl

𝑃 𝑃

Pl

P*

P**

Pl

P*

Page 62: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

40

The contraction coefficient β lies between 0 and 1 and is the ratio of [P** P̄ ] to

[P* P̄ ]

d) Reduction (Resizing):

If, after contraction, the contracted point P** is found worse than the second worst

point, then replace every point i with (Pi + Pl) / 2. This contracts the entire simplex

towards the best point Pl and, thus, reduces the size of the simplex.

Reduction handles the rare case of having a failed contraction, which can

happen if one of the simplex points is much farther than the others from the

minimum (optimum) value of the function. Contraction may thus move the

reflected point away from the minimum value and, consequently, further

contractions are useless. In this case, reduction is the proposed action in [34] to

bring all points to a simpler fitness landscape.

2.8.4 Nelder-Mead Algorithm

A flowchart of the Nelder-Mead method is illustrated in Fig. 2-22, and the

corresponding MATLAB/Octave code is given in Table 2-6. This algorithm is explained

in detail in [35].

Table 2-6: Nelder-Mead Algorithm

function [x, fmax] = nelder_mead (fun, x)

% Initialization

minVal = 1e-4; % Min. value to achieve

maxIter = length(x)*200; % Max. number of iterations

n = length (x); % Problem dimension

S = zeros(n,n+1); % Empty simplex

y = zeros (n+1,1); % Empty simplex fitness

S(:,1) = x; % The initial guess

y(1) = feval (fun,x); % Evaluate the initial guess

iter = 0; % Iteration counter

for j = 2:n+1

% Build initial simplex S(j-1,j) = x(j-1) + (1/n);

y(j) = feval (fun,S(:,j));

endfor

[y,j] = sort (y,'ascend'); % Sort simplex points

S = S(:,j); % Re-arrange simplex points

alpha = 1;

beta = 1/2;

gamma = 2;

% Reflection coefficient

% Contraction coefficient

% Expansion coefficient

while (1) % Main loop

if (++iter > maxIter)

% Stop if exceeded max. iterations break;

endif

if (abs(y(1)) <= minVal)

% Stop of target min. value achieved break;

endif

Page 63: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

41

mean = (sum (S(:,1:n)')/n)'; % Calculate the mean point

Pr =(1+alpha)*mean - alpha*S(: ,n+1); % Calculate the reflected point

Yr = feval (fun,Pr); % Evaluate the reflected point

if (Yr < y(n)) % Is Reflected better than 2nd worst?

if (Yr < y(1)) % Is reflected better than best?

Pe=gamma*Pr+ (1-gamma)*mean; % Calculate expanded point

Fe = feval (fun,Pe); % Evaluate expanded point

if (Fe < y(1)) % Is expanded better than best?

S(:,n+1) = Pe; % Replace worst with expanded

y(n+1) = Fe;

else

S(:,n+1) = Pr; % Replace worst by reflected

y(n+1) = Yr;

endif

else

S(:,n+1) = Pr; % Replace worst by reflected

y(n+1) = Yr;

endif

else

if (Yr < y(n+1)) % Is reflected better than worst?

S(:,n+1) = Pr; % Replace worst by reflected

y(n+1) = Yr;

endif

Pc = beta*S(:,n+1) + (1-beta)*mean; % Calculate contracted point

Yc = feval (fun,Pc); % Evaluate contracted point

if (Yc < y(n)) % Is contracted better than 2nd worst?

S(:,n+1) = Pc; % Replace worst by contracted

y(n+1) = Yc;

else

for j = 2:n+1

% Shrink the simplex (Reduction) S(:,j) = (S(:,1) + S(:,j))/2;

y(j) = feval (fun,S(:,j));

endfor

endif

endif

[y,j] = sort(y,'ascend'); % Sort the simplex

S = S(:,j);

endwhile

x = S(:,1); % The best solution

fmax = y(1); % The minimum value

endfunction

Fig. 2-21 shows 12 of 19 iterations of a Nelder-Mead algorithm run for

minimizing the function f (x, y) = (x-5)2 + (y-8)4, with the starting point (6, 6). The reason

of selecting a starting point close to the optimum solution is just to view all simplex

updates on the same axes without shifting or scaling the axes at every iteration. This

illustrates the operation of the algorithm better. The same problem was resolved several

Page 64: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

42

times using different initial guesses, and the number of algorithm iterations is recorded

for every starting point in Table 2-7.

Table 2-7: Iteration count for different initial guesses of Nelder-Mead Algorithm

Initial Guess Number of iterations

(6, 6) 19

(0, 60) 21

(-60, 60) 32

(100, 100) 34

(-200, 300) 38

For all results in Table 2-7, the algorithm terminated after achieving the targeted

minimum value (0.0001). It never exceeded the maximum iteration count (400).

Fig. 2-21: Twelve iterations of a practical run of Nelder-Mead algorithm

(1) Initial (2) Contraction (3) Reflection (4) Contraction

(5) Contraction (6) Contraction (7) Contraction (8) Contraction

(9) Contraction (10) flection (11) Contraction (12) Reflection

Page 65: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

43

Fig. 2-22: Nelder-Mead algorithm flowchart

Contraction

Expansion

Reflection

Resizing

Replace all Pi’s by

(Pi+Pl)/2

Get fitness function f Get termination condition

Get α, β, and γ

Yes

P** = (1+γ) P* - γ P̄

y** = f (P**)

End

Start

Sort the simplex

Determine Ph and Pl

Calculate P̄

P* = (1+α) P̄ - α Ph

y* = f (P*)

yr < yn ?

yr < yl ?

y* < yl ?

y** < yl ?

Replace Ph

by P**

Replace Ph

by P*

No

Yes

No

Terminate?

y* < yh?

y** < yn?

P** = β Ph + (1-β) P̄

y** = f (P**)

Replace Ph by P*

No

Yes

Replace Ph

by P**

No Yes

Yes

No

No

Yes

Initialize the simplex

Page 66: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

44

Robocode Game

The proposed IA is tested on Robocode game [1]. It is a Java-programming game

where programmed tanks compete in a battle arena. The tanks are completely

autonomous. That is, programmers have no control over them during the battle.

Therefore, Robocode is ideal for testing IAs. Furthermore, a tank has no perfect

knowledge about the environment (i.e. the arena); its knowledge is restricted to what its

radar can read. Robocode game has been used as a benchmark in many previous works

including [4], [36], [37]. The following subsections describe the Robocode game in

detail.

2.9.1 Robot Anatomy

As shown in Fig. 2-23, every robot consists of a body, a gun, and a radar. The body

carries the gun and the radar. The radar scans for other robots and the gun shoots bullets

with a configurable speed. The energy consumption of a bullet depends on its damage

strength.

Fig. 2-23: A Robocode robot anatomy

2.9.2 Robot Code

Every robot has a main thread and up to five other running threads. The main thread

usually runs an infinite loop where actions are taken. Robocode game also provides some

listener classes for triggering specific actions at certain events such as colliding a robot,

detecting a robot, hitting a robot by a bullet, being hit by a bullet … etc.

2.9.3 Scoring

There are several factors for ranking the battling tanks in Robocode. The factor

considered in this work is the “Bullet Damage”; it is the score awarded to a robot when

one of its bullets hits an opponent.

Page 67: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

45

Chapter 3: Human Imitating Cognitive

Modeling Agent (HICMA)

Introduction

This thesis introduces a Human Imitating Cognitive Modeling Agent (HICMA): a

cognitive agent that models human behaviors by using statistical approaches. We believe

that some human-like behaviors can be represented by simple statistical models. This is

much simpler than imitating the complicated operation of mind as targeted by several

previous works. HICMA models every environmental object or phenomenon by a

statistical model, which has some parameters. The different combinations of these

parameters are used in such a way that each combination is interpreted as a human

behavior. HICMA presents an updated version of Minsky’s society of mind theory

described in section 2.2. It introduces to the agents’ society a new type of agents that is

capable of evolving other agents. This evolution is done by GA (Section 2.5) and Nelder-

Mead method (Section 2.8). Other methods like CMA-ES (Section 2.7.4) can also be

used.

Section 3.2 describes the components of HICMA and section 3.3 explains how

these components interact together.

The Structure of HICMA

For testing the behavior of HICMA, a software agent based on it was implanted in

a Robocode robot. The structure of this agent is shown in Fig. 3-1.

Fig. 3-1: The structure of HICMA’s Robocode agent

Page 68: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

46

Every agent in the society has the following attributes:

An identifier

A fitness measure

A set of evolvable parameters (e.g. maximum speed, minimum temperature …

etc.) that define the state of the agent

A default state (defined by default parameter values)

Each of agent’s parameters has the following attributes:

An identifier

An initial value

A list of evolver agents

An evolver is an agent that implements an optimization technique (e.g. Nelder-

Mead). It is responsible for adapting the parameters of another agent based on its fitness.

The fitness of an agent is calculated by its parent agent. For example, in Fig. 3-1 the

shooting agent is the parent of the modeling agent and, therefore, responsible for

evaluating it.

The following subsections describe in detail HICMA’s Robocode agents.

3.2.1 Modeling Agent

The modeling agent is responsible for building and optimizing models of the

interesting features of the environment and any interesting objects in this environment.

In this work, it models the behavior of an enemy Robocode tank (i.e. how it dodges

bullets). The block diagram of the proposed modeling agent is shown in Fig. 3-2.

Fig. 3-2: The block diagram of HICMA’s modeling agent

HICMA’s modeling agent samples the location of the target robot to construct a

model for its motion. Building motion model is done by a hybrid optimization technique

which integrates Nelder-Mead method [34] and Covariance Matrix Adaptation Evolution

Strategy (CMA-ES) [28]. Nelder-Mead method can find a local optimum whereas CMA-

ES can find the global optimum or a near-optimum solution. Nonetheless, experiments

showed that CMA-ES is less accurate than Nelder-Mead method. Therefore, the hybrid

optimization method is proposed. As shown in Fig. 3-3, the hybrid optimization method

uses CMA-ES to get a rough estimate of model parameters then uses Nelder-Mead

method to fine-tune them.

Page 69: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

47

Fig. 3-3: The hybrid optimizatiob method

A similar technique is proposed in [38] where GA is employed instead of CMA-

ES. The proposed hybrid optimization method does not have a fixed objective function.

Instead, it composes an objective function from a set of elementary mathematical

functions such as linear, polynomial, cosine, and exponential functions. As described

later, the function combination is suggested by a GA evolver agent. This allows for

selecting a good choice from a wide search space of function combinations. Every

combination is evaluated by its bullet miss rate:

𝑚𝑖𝑠𝑠 𝑟𝑎𝑡𝑒 =# 𝑏𝑢𝑙𝑙𝑒𝑡𝑠 𝑡ℎ𝑎𝑡 𝑚𝑖𝑠𝑠 𝑡ℎ𝑒 𝑡𝑎𝑟𝑔𝑒𝑡

𝑡𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑢𝑙𝑙𝑒𝑡𝑠

Where a lower miss rate means a higher fitness.

As mentioned above, every agent has a set of evolvable parameters. Every

parameter has an evolver that searches for its optimum value. The parameters of the

modeling agent along with their evolvers are described in Table 3-1. For instance, the

optimum value of the initial guess parameter is searched by a Nelder-Mead evolver that

uses the fitness provided by the shooting agent (Refer to Fig. 3-1).

Table 3-1: The parameters of the modeling agent

Symbol           Description                                                           Evolver Type
function set     The set of basic mathematical functions that form the
                 objective-function composition                                        GA
initial guesses  The initial guess of every optimized parameter                        Nelder-Mead
step sizes       The step size of the search process of every parameter                Nelder-Mead
lower bounds     The lower bound of every parameter in the search space (a)            Nelder-Mead
upper bounds     The upper bound of every parameter in the search space (a)            Nelder-Mead
tolerance        The maximum allowed error in the optimization process                 Nelder-Mead
trials           The maximum number of trials the CMA-ES makes to find a solution (b)  Nelder-Mead
sample count     The number of samples used in the optimization process (b)            Nelder-Mead

(a) Experiments show that the CMA-ES algorithm finds good solutions for almost any bounds.
(b) More trials and a larger sample size give more accurate solutions but consume more time, so a compromise is important.


To study the effect of evolving an agent, the modeling agent is, so far, the only one allowed to evolve its parameters. Evolving the parameters of the other agents is left as future work.

3.2.2 Estimation Agent

The function of the estimation agent is to predict or estimate the future of an entity (i.e. the environment or an environment object). It uses one or more models (built by the modeling agent) at a time. For instance, the proposed estimation agent uses two models of the target's motion in the arena: a model for the vertical motion of the target tank (along the y-axis) and another model for its horizontal motion (along the x-axis). The output of this agent is an estimate or prediction of where (location) and when (time) to shoot at the target tank. This output is sent to the shooting agent described next. Obviously, to predict the future, time must be considered as a parameter in the modeling process. The block diagram of the proposed estimation agent is shown in Fig. 3-4.

Fig. 3-4: The block diagram of HICMA's estimation agent

3.2.3 Shooting Agent

The shooting agent acts as:

1. A parent of the modeling agent
2. An interface between the estimation agent and the gun of the Robocode robot

As a parent of the modeling agent, the shooting agent is responsible for triggering its evolvers (GA and Nelder-Mead, as given in Table 3-1) and for providing these evolvers with a fitness corresponding to its current state. The shooting agent is illustrated in Fig. 3-5.

Fig. 3-5: The block diagram of HICMA's shooting agent


3.2.4 Evolver Agents

As mentioned above (refer to Table 3-1), the parameters of the modeling agent are optimized by a Nelder-Mead evolver agent and a GA evolver agent. The following subsections describe both in detail.

3.2.4.1 Nelder-Mead Agent

As described in section 2.8, Nelder-Mead is a simplex-based optimization algorithm. Its function, as an evolver agent, is to find the parameter values of the modeling agent (Table 3-1) that optimize its modeling process. This evolver optimizes the parameters of the modeling agent one by one: it first finds the best initial guess, i.e. the one that obtains a good solution quickly; then it finds the best step size, the best bounds of the search space, and so on.

This evolver should not be confused with the Nelder-Mead method used as part of the hybrid method (CMA-ES + Nelder-Mead) implemented in the modeling agent. As a modeling method, Nelder-Mead is used for optimizing objective functions, while as an evolver it is used for tuning the parameters of another agent. The two roles are completely unrelated.

The operation of the Nelder-Mead evolver agent is illustrated in Fig. 3-6. It requires frequent evaluations of the candidate solutions (i.e. the simplex points shown in Table 2-5). Each candidate solution holds a parameter combination of the modeling agent. Every candidate solution is evaluated by the shooting agent, as it is the parent of the modeling agent.

Fig. 3-6: The operation of Nelder-Mead evolver agent

However, the parent (i.e. the shooting agent) has no explicit evaluation function for its child (the modeling agent); it must run a complete Robocode round for every candidate solution. This requires the Nelder-Mead algorithm to be interrupted at every evaluation point. A modified version of the Nelder-Mead algorithm with interrupt points is given in Algorithm 3-1. The algorithm is interrupted at steps 4, 7, and 11. Therefore, a single iteration of the algorithm requires up to three Robocode rounds.


Algorithm 3-1: A modified version of the Nelder-Mead algorithm with interrupt points

1. Order the points of the Nelder-Mead simplex
2. Calculate x0: the center of gravity of all points except the worst xn+1
3. Calculate the reflected point xr
4. Evaluate the objective function at the reflected point xr
5. If xr is better than the second worst xn but not better than the best x1, replace xn+1 by xr and go to step 1
6. If xr is the best point, compute the expanded point xe
7. Evaluate xe
8. If xe is better than xr, replace xn+1 by xe and go to step 1
9. Else, replace xn+1 by xr and go to step 1
10. At this step, the selected point is not better than xn. Compute the contracted point xc
11. Evaluate the objective function at the contracted point xc
12. If xc is better than xn+1, replace xn+1 by xc and go to step 1
13. If xc is not better than xn+1, update all points of the simplex except the best x1 (reduction) and go to step 1

The definitions of the reflected, expanded, and contracted points, and of the "reduction" operation, are given in section 2.8.3.
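Because a fitness value only becomes available after a full round, the interrupt points can be realized by letting the evolver expose the candidate it is waiting on and resume once the round reports a fitness. The following Java interface is an illustrative sketch of that pattern (the names are assumptions, not the thesis code):

/** Sketch of the interrupt-point pattern: the evolver exposes the candidate it
 *  needs evaluated (steps 4, 7, and 11 of Algorithm 3-1); the robot runs one
 *  Robocode round with that candidate and reports the miss rate back. */
public interface InterruptibleEvolver {
    /** The parameter vector whose fitness the evolver is currently waiting for. */
    double[] pendingCandidate();

    /** Called once per round end with the measured fitness (miss rate);
     *  advances the Nelder-Mead iteration to its next interrupt point. */
    void resume(double fitness);
}

// Typical use inside the shooting agent (illustrative):
//   round start:  configure the modeling agent with evolver.pendingCandidate()
//   round end:    evolver.resume(missedBullets / (double) totalBullets);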

3.2.4.2 GA Evolver Agent

As aforementioned, the function combination used by the hybrid optimization method is suggested by a GA evolver. It suggests different combinations of elementary mathematical functions until the modeling agent gets a satisfying result (i.e. a good fitness). GA is employed for this purpose to allow for a large range of elementary functions (although only four functions are used in the test case at hand).

3.2.4.2.1 Chromosome Encoding

A set of elementary functions (called the function lexicon) contains the basic building blocks of the objective-function composition. Each elementary function has an index in the function lexicon, as shown in Table 3-2.

Table 3-2: The function lexicon

Index  Function     Formula
0      Quadratic    $a x^2 + b x + c$
1      Linear       $a x + b$
2      Exponential  $a e^{b x}$
3      Cosine       $a \cos(b x + c)$

Each chromosome represents an objective-function composition. It consists of a set (no duplicates) of integers, each representing the index of an elementary function in the lexicon. Fig. 3-7 shows a chromosome example with its decoded objective function.
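As an illustration of this encoding, the following Java sketch (not the thesis code) decodes a chromosome into a composed objective function, assuming, as Eq. (3-7) later suggests, that the selected elementary functions are summed and that their coefficients a, b, c are the unknowns the hybrid optimizer fits to the samples:

import java.util.List;

/** Sketch: decoding a chromosome such as {1, 2} into the composition
 *  f(x) = (a1*x + b1) + (a2 * e^(b2*x)). The coefficient array p holds the
 *  unknowns that the hybrid optimization method later fits. */
public final class ChromosomeDecoder {

    /** One elementary function from the lexicon of Table 3-2. */
    interface Elementary {
        int paramCount();
        double eval(double x, double[] p, int offset);
    }

    static final Elementary[] LEXICON = {
        new Elementary() { // 0: quadratic a*x^2 + b*x + c
            public int paramCount() { return 3; }
            public double eval(double x, double[] p, int o) { return p[o]*x*x + p[o+1]*x + p[o+2]; } },
        new Elementary() { // 1: linear a*x + b
            public int paramCount() { return 2; }
            public double eval(double x, double[] p, int o) { return p[o]*x + p[o+1]; } },
        new Elementary() { // 2: exponential a*e^(b*x)
            public int paramCount() { return 2; }
            public double eval(double x, double[] p, int o) { return p[o]*Math.exp(p[o+1]*x); } },
        new Elementary() { // 3: cosine a*cos(b*x + c)
            public int paramCount() { return 3; }
            public double eval(double x, double[] p, int o) { return p[o]*Math.cos(p[o+1]*x + p[o+2]); } },
    };

    /** Evaluates the composed objective function for the chromosome's genes at x. */
    static double evalComposition(List<Integer> genes, double[] p, double x) {
        double sum = 0;
        int offset = 0;
        for (int gene : genes) {
            Elementary e = LEXICON[gene];
            sum += e.eval(x, p, offset);
            offset += e.paramCount();   // each elementary owns a slice of p
        }
        return sum;
    }
}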


Fig. 3-7: A chromosome example

3.2.4.2.2 Initial Population

The initial population contains five distinct individuals generated randomly. Every gene holds a random integer representing a function index in the function lexicon. The size of the initial population was selected empirically; it is one of the parameters of the GA evolver agent and, hence, can itself be evolved so that the optimum population size could be obtained. Selecting the initial population size has been studied extensively in [39]–[42].

3.2.4.2.3 Fitness Function

As aforementioned, the shooting agent is responsible for evaluating the modeling agent. The fitness measure is the bullet miss rate: the ratio of missed bullets to the total number of bullets shot during a complete Robocode round. Hence, smaller fitness values are better. Notice that evaluating one individual requires a complete Robocode round. Therefore, every generation has only one new individual, which replaces an individual from the old generation. The new individual is then evaluated during the next round; afterwards another individual is replaced by a new one, and so on.

3.2.4.2.4 Selection Method

Roulette-wheel selection is used; fitter individuals are more likely to be selected.
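A minimal Java sketch of this selection step (illustrative; since the fitness here is a miss rate where lower is better, each individual's wheel share is taken as 1 − miss rate, an assumption about how the ratio is inverted):

import java.util.Random;

/** Roulette-wheel selection sketch. Because the fitness measure is a miss
 *  rate (lower is better), each individual's wheel share is taken as
 *  (1 - missRate), so more accurate individuals get larger slices. */
public final class RouletteWheel {
    private static final Random RNG = new Random();

    static int select(double[] missRates) {
        double[] share = new double[missRates.length];
        double total = 0;
        for (int i = 0; i < missRates.length; i++) {
            share[i] = 1.0 - missRates[i];   // invert: lower miss rate => bigger slice
            total += share[i];
        }
        double spin = RNG.nextDouble() * total;   // where the ball lands
        double cumulative = 0;
        for (int i = 0; i < share.length; i++) {
            cumulative += share[i];
            if (spin <= cumulative) return i;
        }
        return share.length - 1;   // guard against floating-point rounding
    }
}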

3.2.4.2.5 Elitism

One elite is copied to the next generation. Elitism allows the IA to use the best solution found so far in case it has to interrupt the evolution process for any reason. For example, if the Robocode battle (several rounds) terminates before the proposed modeling agent obtains the optimum model of the motion of the target, then trying to hit the target using the best model at hand is better than staying inactive. This is similar to what a human player might do in the same situation.

3.2.4.2.6 Mutation and Crossover

Mutation is performed as follows. This prevents having duplicate genes in one chromosome:

1. Initialize a temporary lexicon (temp) with the set of all function indices in the function lexicon, and define an empty new chromosome.
2. Define a mutation constant Pmc.
3. Generate a random real number R within the interval [0, 1]. If R ≤ Pmc, do steps 4 and 5; otherwise, go to step 6.
4. Copy a random gene (i.e. an integer) from the temporary lexicon (temp) to the new chromosome, and remove that gene from the temporary lexicon so that it cannot be selected again.
5. If the selected gene already exists in the new chromosome, replace it by the next gene in the old chromosome and remove that gene from the old chromosome. Go to step 7.
6. Remove the corresponding gene from the old chromosome and copy it to the new chromosome if it does not already exist there; otherwise, go to step 4.
7. If the new chromosome is not filled yet, go to step 3.
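The following Java sketch implements one possible reading of this procedure (illustrative, not the thesis code): with probability Pmc a gene is drawn from temp; otherwise the old gene at that position is kept unless it would duplicate an earlier gene, in which case an unused allele is drawn from temp instead:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of the duplicate-free mutation described above (one reading of the
 *  procedure, not the thesis code). temp always holds the alleles not yet
 *  used in the new chromosome, so fallback draws can never duplicate. */
public final class Mutation {
    private static final Random RNG = new Random();

    static List<Integer> mutate(List<Integer> old, int lexiconSize, double pmc) {
        List<Integer> temp = new ArrayList<>();                 // step 1
        for (int i = 0; i < lexiconSize; i++) temp.add(i);
        List<Integer> fresh = new ArrayList<>();
        for (int i = 0; i < old.size(); i++) {
            int candidate;
            if (RNG.nextDouble() <= pmc) {                      // step 3
                candidate = temp.get(RNG.nextInt(temp.size())); // step 4: mutate
            } else {
                candidate = old.get(i);                         // step 6: keep old gene
            }
            if (fresh.contains(candidate)) {
                // Duplicate: take an unused allele instead (temp excludes fresh).
                candidate = temp.get(RNG.nextInt(temp.size()));
            }
            fresh.add(candidate);
            temp.remove(Integer.valueOf(candidate));            // each allele used once
        }
        return fresh;
    }
}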

Hence, the mutation probability $P_m$ of the $i$-th gene is:

$$P_m = P_{mc}\left(1-\frac{1}{L^2}\right)\left(1-\left[\frac{i-1}{L}(1-P_{mc})\right]\right),\quad \text{for } 1 \le i \le L \tag{3-1}$$

where $L$ is the size of the lexicon.

The constant $P_{mc}$ is the mutation constant. The second term represents the probability that the gene copied from the temp lexicon differs from the old one at the same position. The length of the lexicon is $L$; therefore, the probability of selecting any index $j$ from the lexicon is $1/L$. In addition, the probability that any index (an allele) exists in the old chromosome at any gene is $1/L$. Hence, the probability that an index (allele) $j$ is copied from the lexicon while the same allele exists at the corresponding position in the old chromosome is $1/L^2$. Thus, the probability that the copied allele (from the lexicon) is not the same as the old one (in the old chromosome) is

$$1-\frac{1}{L^2} \tag{3-2}$$

The last term is the probability that the new gene (from temp) has not previously been copied from the old chromosome. At the $i$-th iteration, the new chromosome is already filled with $i-1$ genes selected randomly with equal probabilities ($1/L$ for every allele). Hence, the probability that any index (allele) exists in the new chromosome at the $i$-th iteration is

$$\frac{i-1}{L} \tag{3-3}$$

At the first iteration ($i=1$), the new chromosome is still empty. Thus, the probability that the copied index already exists is zero. This is clear from formula (3-3). To further clarify this formula, assume that the length of the lexicon is 10. At the 6th iteration, 5 indices have already been copied to the new chromosome. Hence, the probability that the 6th index exists among the previous 5 is $1/2$. Substituting into formula (3-3) gives the same result.


Formula (3-3) would be exact if the $i$-th index were copied from all of the $L$ indices in the lexicon. However, step 4 in the mutation process above states that the selected gene is removed from the lexicon so that it cannot be re-selected. This means that the only source of a repeated allele must be the old chromosome. Therefore, the probability that the $i$-th index is duplicated is the probability that it has been previously selected from the lexicon (formula (3-3)) multiplied by the probability of copying from the old chromosome:

$$\frac{i-1}{L}(1-P_{mc}) \tag{3-4}$$

From formula (3-4), it is concluded that the probability that the $i$-th index is not duplicated is

$$1-\left[\frac{i-1}{L}(1-P_{mc})\right] \tag{3-5}$$

For a gene in the new chromosome (copied from the lexicon) to be different from the corresponding one in the old chromosome:

1. Mutation must occur [$P_{mc}$]
2. The new gene must hold a different value [formula (3-2)]
3. The copied allele must not be duplicated [formula (3-5)]

Combining these three conditions gives formula (3-1). This means that the probability of mutation decreases as we proceed from one gene to the next.

$P_{mc}$ and $L$ are selected empirically. In the experiments, the initial value of $P_{mc}$ is 0.5 and the chromosome length $L$ is 4, as shown in the function lexicon in Table 3-2.

Fig. 3-8 shows the flowchart of the mutation process. After mutation, the new

chromosome is decoded into an objective-function composition for the modeling agent.

Then, it is evaluated by the shooting agent as aforementioned. It is noteworthy that more

decision variables, such as the starting point of Nelder-Mead’s simplex, can be encoded

as proposed in [43].


Fig. 3-8: The flowchart of the mutation process


The Operation of HICMA

This section describes the operation of HICMA as a Robocode robot. It describes the interactions between the agents of its society.

The operation of HICMA can be divided into three phases:

1. Initialization phase: initializing HICMA's agents
2. Operation phase: playing a Robocode round
3. Evolution (learning) phase: evaluating the performance of HICMA's agents in the previous round

Fig. 3-9 shows a flowchart of these phases in a Robocode game run.

Fig. 3-9: Operation phases flowchart of HICMA

Initialization Phase

The initialization phase begins just after the game starts. It includes the following:

- Constructing the agents of the society

- Assigning an ID to every agent

- Setting the initial state of every agent (parameter initialization)

- Restoring the states of agents learned from previous experiences if any


As shown in Fig. 3-11, the initialization phase starts with initializing the shooting

agent, which then initializes the estimation agent and two instances of the modeling

agent. After initialization, HICMA is ready for a Robocode battle.

Operation Phase

The operation phase shown in Fig. 3-12 is the core of the battle. After the battle

begins, the shooting agent – as the parent of other agents – initializes an agent index to

be used in the evolution phase. Next, it initializes an instance of the estimation agent and

two instances of the modeling agent. This initialization is done at the beginning of every

battle to recall any stored experience from a previous battle.

After that, the shooting agent uses the radar to sample the position of the opponent robot. It then forwards the collected samples to the two modeling agents and commands each of them to start building a model of the target's motion path. One modeling agent builds a model of the motion in the horizontal direction (i.e. along the x-axis) and the other in the vertical direction (i.e. along the y-axis). What happens inside a modeling agent is described next.

Once the shooting agent gets the motion models, it reads the current position of HICMA's robot in the arena. Next, the shooting agent forwards the two models along with the current position to the estimation agent, which estimates the most suitable time and angle to shoot at the target. This is done by simultaneously solving the two model equations with a single unknown (time). The hybrid method is used for solving these equations, but other methods can be used. The operation of the estimation agent is described later.

Finally, after getting an estimated time and angle for hitting the target, the shooting

agent validates this estimate against the borders of the arena and translates it to a

command to the gun. This command takes the form: “At time τ, turn to angle Θ and shoot

a bullet”.

The operation phase is repeated for every bullet while the gun is cooling down.

Therefore, it does not cause any delay.
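For illustration, a minimal sketch of how such a command could be issued from a Robocode AdvancedRobot (the field names and the bullet power are assumptions; this is not the thesis code):

import robocode.AdvancedRobot;
import robocode.util.Utils;

public class GunCommand extends AdvancedRobot {
    private long pendingTime = -1;      // tau: the tick at which to fire
    private double pendingAngle;        // theta: absolute gun heading, radians
    private double bulletPower = 2.0;   // assumed power; the thesis does not fix it

    /** Called every tick; executes the queued command when its time arrives. */
    private void executePendingCommand() {
        if (pendingTime < 0) return;
        // Start turning the gun toward the requested absolute angle.
        double turn = Utils.normalRelativeAngle(pendingAngle - getGunHeadingRadians());
        setTurnGunRightRadians(turn);
        // Fire only when the scheduled tick is reached, the gun points the
        // right way, and the gun has cooled down.
        if (getTime() >= pendingTime
                && Math.abs(getGunTurnRemainingRadians()) < 0.01
                && getGunHeat() == 0) {
            setFire(bulletPower);
            pendingTime = -1;   // command consumed
        }
    }
}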

Evolution Phase

The evolution phase represents the learning capability of the system. It is based on

the performance of an agent during its operation phase over the previous round.

As shown in Fig. 3-13, the evolution phase begins after every Robocode round. The shooting agent uses the agent index, initialized in the operation phase, to evolve its child agents one by one. In turn, every child evolves its parameters one by one, as shown in Fig. 3-14. Each parameter is allowed to evolve until no more improvement in fitness is achieved. A single evolution step of one parameter takes one complete Robocode round.

The fitness of a parameter value is determined during the round. For example, the fitness of the first parameter (the function pair) of the modeling agent is the ratio of the bullets that missed the target to the total number of bullets during the previous round. It is calculated by the shooting agent, as it is the parent of the modeling agent.

After a child agent finishes evolving all its parameters, the parent agent advances

the agent index to point to the next agent. This is shown in the last condition in Fig. 3-13.

Finally, after the parent agent finishes the evolution of all of its child agents, it

declares the end of the evolution phase. This means that there will be no more evolution

hereafter. Thus, only the operation phase will be repeated over every round until the

entire battle ends.

Modeling Agent Operation

The optimization process shown in Fig. 3-14 is the core of the modeling agent

operation. It consists of the following:

- Covariance Matrix Adaptation – Evolution Strategy (CMA-ES) algorithm

- Nelder-Mead algorithm

- An objective function

- A termination condition

- A solution vector

First, the shooting agent initializes the CMA-ES algorithm. This includes

initializing its initial guess, step size, search-space bounds, sample size … etc. Next, the

samples collected in the first loop shown in Fig. 3-12 are stored in the buffer of the

modeling agent.

The shooting agent then commands the first modeling agent to optimize its

objective function and to return the result, which models the motion of the target along

the x-axis. Next, the modeling agent runs a multi-start CMA-ES algorithm and stores the

best solution (the one with minimum optimization error). The best solution is then used

by a multi-start Nelder-Mead algorithm as its initial guess. This integration between

CMA-ES and Nelder-Mead is called the hybrid algorithm.

As mentioned in section 3.2.1, the output of the CMA-ES fed to the Nelder-Mead

is a rough estimate of the final solution. Therefore, this rough estimate might be too rough

for Nelder-Mead to converge to the global optimum. Thus, the hybrid algorithm may fall

into a local optimum. To overcome this problem, the entire optimization process (CMA-

ES + Nelder-Mead) is repeated for a number of trials to make sure that the final solution

is the optimum one. The optimization process loops until the maximum number of trials

is exceeded or a tolerable error is achieved.

Similarly, the whole optimization process is repeated for modeling target’s motion

along the y-axis. The results are then returned to the shooting agent as described in the

operation phase.


Estimation Agent Operation

The operation of the estimation agent is similar to that of the modeling agent. However, the objective function here is a univariate function with the time t as the sole variable. It is composed of the following:

- The two models returned by the two modeling agents

- The speed of the bullet vb

- The time t

- The current position of the shooting robot (x, y)

It takes the following form:

$$v_b\,t = \sqrt{\big(x - x_{model}(t)\big)^2 + \big(y - y_{model}(t)\big)^2} \tag{3-6}$$

Solving this equation gives the time at which the target robot, represented by $(x_{model}, y_{model})$, can be at the same location as the shooter's bullet. This process is illustrated in Fig. 3-10. Such an equation normally has several solutions. Therefore, the estimation agent selects the soonest one (i.e. the minimum time t).

An example of an objective function of the estimation agent is given in Eq. (3-7):

$$11\,t = \sqrt{\big(50 + 3t - 2\cos(3t+4) + 2\big)^2 + \big(90 - 3t + 2e^{9t} - 5\big)^2} \tag{3-7}$$
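For illustration, the soonest root of Eq. (3-6) can also be found by a simple scan-and-bisection sketch (the thesis itself solves it with the hybrid method; the step size and tMax below are assumptions):

import java.util.function.DoubleUnaryOperator;

/** Sketch: find the smallest t >= 0 with vb*t = dist(t), where dist(t) is the
 *  shooter-to-model distance. Scans for a sign change of the residual
 *  g(t) = vb*t - dist(t) (negative while the bullet lags the target), then
 *  refines the root by bisection. */
public final class InterceptTime {
    static double soonestInterceptTime(DoubleUnaryOperator xModel, DoubleUnaryOperator yModel,
                                       double x, double y, double vb, double tMax) {
        DoubleUnaryOperator residual = t -> {
            double dx = x - xModel.applyAsDouble(t);
            double dy = y - yModel.applyAsDouble(t);
            return vb * t - Math.hypot(dx, dy);
        };
        double step = 0.5, prev = 0;                        // coarse scan
        for (double t = step; t <= tMax; t += step) {
            if (residual.applyAsDouble(prev) < 0 && residual.applyAsDouble(t) >= 0) {
                double lo = prev, hi = t;                   // bisection refinement
                for (int i = 0; i < 40; i++) {
                    double mid = 0.5 * (lo + hi);
                    if (residual.applyAsDouble(mid) < 0) lo = mid; else hi = mid;
                }
                return 0.5 * (lo + hi);                     // soonest solution
            }
            prev = t;
        }
        return Double.NaN;   // no interception within tMax
    }
}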

First, the estimation agent receives a command from the shooting agent with the current (x, y) position of the shooting robot and the x-model and y-model returned by the modeling agents. Next, an objective function similar to Eq. (3-7) is composed and optimized by CMA-ES and Nelder-Mead, as shown in Fig. 3-16. The process then continues as described in the operation of the modeling agent.

Fig. 3-10: Solving the estimation problem (the shooter at (x, y) fires a bullet that travels a distance $v_b t$ toward the target at ($x_{model}(t)$, $y_{model}(t)$); the solution ($x_s$, $y_s$) is the estimated interception location)


Fig. 3-11: The initialization phase of HICMA


Fig. 3-12: The operation phase of HICMA


Fig. 3-13: The evolution phase of HICMA


Fig. 3-14: The evolution sequence of parameters


Fig. 3-15: The optimization operation of the modeling agent


Fig. 3-16: The operation of the estimation agent


Chapter 4: Experiments and Results

This chapter presents two types of experiments:

1. Human-similarity experiments: two experiments that measure how, and to what extent, HICMA behaves like a human
2. Evolution experiment: an experiment that examines the benefit of the Nelder-Mead evolver agent

Every experiment targets one or more agents of the system, as shown in Fig. 4-1.

Fig. 4-1: Experiment map with HICMA’s agents

All experiments examine the behavior of the HICMA agent in a Robocode robot. Only the five agents described in section 3.2 are implemented: modeling, estimation, shooting, Nelder-Mead evolver, and GA evolver. There are no agents for dodging the opponent's bullets. Therefore, only the targeting efficiency of HICMA's robot is tested. That is, the opponent robots only dodge bullets without shooting. Two Robocode robots are involved as opponents:

involved as opponents:

1. Shadow 3.66d: A powerful robot used in a previous Robocode targeting contest

as a reference robot. It uses a sophisticated dodging technique called “wave

surfing”. It is used for testing HICMA’s efficiency in shooting at advanced

dodging robots.

2. Walls: A simple robot that always follows the walls of the arena. It is used for testing how efficiently HICMA can model simple behaviors.

Table 4-1 gives the parameters of the Robocode simulation environment.

[Content of Fig. 4-1: the human-likeness experiments comprise Human Behavior Imitation (GA evolver agent) and Human Performance Likeness (modeling, estimation, and shooting agents); the evolution experiment comprises Modeling-Agent Evolution (Nelder-Mead evolver agent).]


Table 4-1: Robocode simulation parameters

Rule              Value
Gun Cooling Rate  0.1
Inactivity Time   250
Arena Size        600 x 600

Human-Similarity Experiments

This section presents two experiments:

1. Human behavior imitation: an experiment that illustrates how HICMA imitates typical human behaviors such as wisdom, carelessness, recklessness, etc.
2. Human performance similarity: an experiment that compares the performance of HICMA with the performance of human players and hand-coded Robocode robots.

These experiments are described in the next two subsections.

4.1.1 Human Behavior Imitation

As mentioned in section 3.2.1, the objective function of the modeling agent is composed of a pair of basic mathematical functions. An interesting observation is that every function pair models a typical human behavior. To affirm that, HICMA was forced to use every possible pair of functions. For every pair, HICMA's behavior was observed over ten rounds against both Shadow and Walls. The following observations were then taken, as shown in Fig. 4-2:

1. Hit %: The average percentage of bullets that hit the target (reflects absolute efficiency)
2. Round Time: The average time that HICMA took to beat its enemy (reflects absolute goal reaching). Notice that there is no round time in Fig. 4-2(a) because HICMA could not beat Shadow with any function pair except pair (1, 2).
3. Avg. Shooting Interval: The average time interval between every two consecutive bullets shot by HICMA (reflects attack tendency)
4. Avg. Shooting-Interval Standard Deviation: The standard deviation of the Avg. Shooting Interval (reflects regularity of shooting)


The mapping of every function pair into a certain human behavior is derived from:

1) the observations in Fig. 4-2 and 2) the behavior of HICMA in every battle. The second

point is illustrated by a video at [44].

The previous results show that every function pair imitates a human behavior. For

example, the function pair [0, 1] imitates a wise behavior. Interpreting this function pair

as “wise” behavior is justified by the fact that this function pair makes HICMA shoot

moderately (moderate average shooting interval), regularly (small standard deviation),

and wisely (relatively high hit rate). Similarly, the behavior of the function pair [1, 2] is

interpreted as tricky behavior as HICMA keeps silent for a while and suddenly shoots at

the target; it seems as if it deceives its opponent. This behavior has an average shooting

interval almost similar to that of the wise behavior. However, its standard deviation (and

its variance) is higher, meaning that it fires bullets less regularly than the wise behavior.

The titles of the imitated human behaviors are merely derived from the Longman English dictionary. This is done by mapping the definitions of these titles to the observations defined above. For example, Longman's definition of the word "tricky" is "a tricky person is clever and likely to deceive you". This definition agrees with the behavior of the function pair [1, 2] described above. The other function pairs can be mapped to human behaviors similarly, guided by the video at [44] and Table 4-2, where a darker color indicates a better result.

Fig. 4-2: The behavior of every function pair against (a) Shadow and (b) Walls


Table 4-2: Human behavior interpretation

                        Opponent  Wise    Careless  Reckless  Tricky   Artless  Unwise
Hit %                   Shadow    1.325   0.000     5.591     6.022    4.831    1.553
                        Walls     53.037  6.673     7.951     43.556   31.030   4.573
Round Time              Walls     418.90  2147.60   1970.50   481.600  617.100  1480.10
Avg. shooting interval  Shadow    41.400  455.900   109.500   54.800   44.800   58.400
                        Walls     36.600  83.900    59.100    34.500   31.200   39.300
Std. deviation          Shadow    8.959   166.500   38.968    12.426   8.702    10.543
                        Walls     2.836   13.924    11.628    4.972    4.185    3.234

Table 4-3 summarizes the observations given in Table 4-2.

Table 4-3: Description of human behaviors modeled by mathematical functions

Function Pair  Description                                                   Behavior
[0,1]          Moderately shoots at the target                               wise
[0,2]          Shoots even if the probability of hitting the target is low   careless
[0,3]          Shoots rashly                                                 reckless
[1,2]          Suddenly shoots at the target when it gets close enough       tricky
[1,3]          Can hit the target only if its motion is very predictable     artless
[2,3]          Shoots almost continuously and unwisely                       unwise

The shooting agent is the parent of two instances of the modeling agent: one for

modeling opponent’s motion along the vertical axis (y-axis) and the other for the

horizontal axis (x-axis). The results showed that HICMA could mix two behaviors into a

new hybrid one by selecting a function pair for the x-axis different from that of the y-

axis. For instance, when battling Walls robot, HICMA tends to use pairs [0, 1] (wise) and

[1, 3] (artless) as if it feels that Walls is not tricky and there is no need to worry about it.

On the other hand, when HICMA encounters Shadow, it tends to use pair [1, 2] (tricky) for both models, as Shadow uses its advanced technique to mislead its opponents. In almost all battles against Shadow, the function pair [1, 2] was autonomously selected. By experimenting with other pairs manually (i.e. with the GA evolver disabled), it was found that pair [1, 2] is the only one that could beat Shadow. This is because this pair makes HICMA behave like a tricky person, firing bullets suddenly and unexpectedly after a period of silence.


4.1.2 Human Performance Similarity

This experiment measures how much HICMA performs like a human player. A number of competitors played ten rounds against both Shadow and Walls, and their scores were recorded. The competitors are:

1. HICMA: The proposed agent. It autonomously optimizes its parameters from the initial state, given in Table 4-4, to a suitable state.
2. Three hand-coded robots:
   - DrussGT [45]: A very advanced robot. It has been the champion of 1-vs-1 Robocode since 2008.
   - GresSuffurd [46]: An advanced robot that uses an advanced targeting technique called "Guess Factor" [47].
   - Wallaby [48]: A competitive robot with a sophisticated targeting technique called "Circular Targeting" [49].
   To test only the targeting functionality of these robots, they were modified to shoot without moving.
3. Twelve human players (each played against Shadow and Walls). Their data is shown in Table 4-5.

Table 4-4: The initial state of HICMA

Parameter                          Value
Function composition indices       {2, 3} (a)
Nelder-Mead Simplex initial guess  {{0.5, 0.5}, {5.0, 0.5, 0.5}}
Nelder-Mead Simplex step sizes     {{5.0, 0.5}, {2.5, 0.5, 0.25}}
Nelder-Mead Simplex lower bounds   {{-500, -500}, {-500, -500, -500}}
Nelder-Mead Simplex upper bounds   {{500, 500}, {500, 500, 500}}
Tolerance (b)                      1.0e-10
Trials (b)                         1
Sample size                        3

Table 4-5: Human players data

Gender                     Males
Age                        21-27
Education                  10 engineering students, 2 communication engineers
Familiarity with the game  Novice

The average scores of the competitors over the ten rounds are shown in Fig. 4-3. These results are further clarified in Fig. 4-4. It is obvious that HICMA's performance is the most similar to that of the human players. Furthermore, HICMA's behavior resembles human performance in that it does not always outperform other IAs; both HICMA and the human players performed better against Walls than against Shadow.

Fig. 4-3: Performance similarity with human players

Fig. 4-4: Performance difference with human players

Modeling-Agent Evolution

These experiments test the benefit of the Nelder-Mead evolver agent in optimizing the modeling agent. They are conducted as follows. The first parameter of the modeling agent (i.e. the objective-function composition) was manually set to [0, 1]. This parameter is out of the scope of this experiment, as it is evolved by the GA evolver agent (covered in section 4.1.1). Next, the Nelder-Mead evolver agent is activated to evolve the parameters of the modeling agent, as shown in Fig. 4-5(a) (the left column). The parameters of the Nelder-Mead agent are manually tuned as given in Table 4-6. The description of every parameter is given in Table 3-1. The initial values are carefully set so that the step size almost equals the difference between the initial guess and the expected optimum value. For example, the step size of the initial-guess parameter is set to 10, as the initial value of that parameter is 10 and its expected optimum value is close to 0.

Fig. 4-5: The evolution of the parameters of the modeling agent. Panels (a1)–(a7) show the first experiment (left column) and panels (b1)–(b7) the second experiment (right column); from top to bottom, the panels plot the initial guess, step size, lower bound, upper bound, tolerance, trials, and sample size against the round number.

Table 4-6: The parameters of the Nelder-Mead evolver used in the experiment

#  Parameter        Fitness Measure       Initial Value  Step Size  Threshold (a)
1  Initial guesses  Evaluation count (b)  10             10         550
2  Step sizes       Evaluation count      10             5          600
3  Lower bounds     Evaluation count      -200           100        550
4  Upper bounds     Evaluation count      200            100        580
5  Tolerance        Evaluation count      1×10^-100      1×10^-99   350
6  Trials           Minimum error (c)     1              1          1×10^-11
7  Sample count     Minimum error         3              5          1×10^-11

(a) If the difference between the fitness of two solutions is less than the threshold, they are considered equal.
(b) The evaluation count is the number of objective-function evaluations that the modeling agent performs to get a solution.
(c) The minimum error of the modeling process that the modeling agent could achieve (i.e. the error of the best solution).

To further study the Nelder-Mead evolver, the entire experiment was repeated with bad parameter values, as given in Table 4-7. Bad values here means that the step size is far from the difference between the initial value and the optimum value. For example, the step size of the initial-guess parameter is set to 1, although the initial value is 10 and the expected optimum value is close to 0. For a fair comparison between the two experiments, the thresholds and the initial values of the parameters are set identically. The results of this experiment are shown in Fig. 4-5(b) (the right column).

Table 4-7: Bad parameter values of the Nelder-Mead evolver (used for verification; the footnotes of Table 4-6 apply)

#  Parameter        Fitness Measure   Initial Value  Step Size  Threshold
1  Initial guesses  Evaluation count  10             1          550
2  Step sizes       Evaluation count  10             1          600
3  Lower bounds     Evaluation count  -200           1          550
4  Upper bounds     Evaluation count  200            1          580
5  Tolerance        Evaluation count  1×10^-100      1×10^-100  350
6  Trials           Minimum error     1              10         1×10^-11
7  Sample count     Minimum error     3              10         1×10^-11

The following observations can be extracted from the results of both experiments, as shown in Fig. 4-5:

1. The initial guess in the first experiment was evolved to 7, which is slightly better than in the second experiment (i.e. 9), as shown in Fig. 4-5.1. However, the optimum initial guess is 0 and, therefore, both results are poor. Although the algorithm was given a good step size in the first experiment, it fell into a local optimum.
2. As shown in Fig. 4-5.2, the step size in the first experiment was evolved to 8.3, which is slightly worse than in the second experiment. The Nelder-Mead algorithm took only a single step away from its given initial value (10). This is one reason why the Nelder-Mead method easily falls into local optima. Other methods, such as CMA-ES, usually take several steps away from their given initial values.

3. In the first experiment, as shown in Fig. 4-5.3 and Fig. 4-5.4, evolving the upper-bound and lower-bound parameters gained a small benefit. The evolver could tighten the search space from [-200, 200] to [-166, 200]. However, this improvement has a trivial effect on performance. Furthermore, the algorithm was given a good step size (i.e. 100) to tighten the search space further, but it fell again into a local optimum. In the second experiment, the evolver gained no improvement.
4. As shown in Fig. 4-5.5, evolving the tolerance parameter worsened its initial value. To reduce the required number of objective-function evaluations, it is normal to tolerate a wider range of errors. Unexpectedly, the evolver tightened the tolerance instead of loosening it. Apparently, this is because the fitness function (i.e. a Robocode round) is very noisy; no two rounds are the same. The evolver must have made more evaluations before it settled on a solution. This is another reason why the Nelder-Mead method easily falls into local optima.
5. Evolving the trials parameter was quite good in the first experiment; the evolver selected to run the modeling process once. This agrees with the manual experiments, which showed that the hybrid method usually finds the optimum solution in only one run. However, in the second experiment, the evolver worsened the initial value; it selected 11 instead of 1. As shown in Fig. 4-5(b).6, the evolver made a long step away from the initial value to a local optimum. It followed this long jump with a number of short steps, none of which was long enough to return to the optimum value. This may be a third reason why the Nelder-Mead method easily falls into local optima: it tends to make consecutive contractions (refer to section 2.8.3) when it takes a step to a worse solution. Once the simplex is contracted, the following steps become shorter and returning to the optimum value becomes less probable.
6. In evolving the sample-size parameter, the first experiment, shown in Fig. 4-5(a).7, got rather good results. It selected a sample size of 8 instead of 3 (the initial guess). This is normal, as collecting more samples decreases the error and vice versa. However, in the second experiment, shown in Fig. 4-5(b).7, the evolver settled on a sample size of 3, which seems insufficient for running a good modeling process but is, anyway, better than the initial guess.

It is concluded from the previous observations that the benefit of the Nelder-Mead evolver mainly depends on its own parameters. This is because the Nelder-Mead method can easily fall into a local optimum, as mentioned in section 2.8. This calls for either of the following solutions:

- Using CMA-ES or the hybrid method (explained in section 3.2.1) instead of the Nelder-Mead method
- Fine-tuning the Nelder-Mead evolver by another evolver

The second solution seems less practical, as the new evolver may itself need a third evolver. On the other hand, CMA-ES and the hybrid method are less sensitive to their initial parameters and usually converge to global optima.


Chapter 5: Conclusions and Future Work

Conclusions

Statistical modeling techniques can greatly enhance human-behavior imitation of

intelligent agents. They provide not only a simple method for modeling human behaviors

but also robust tools for adapting the behavior of the agent. This work introduces an

intelligent agent that incorporates statistical modeling techniques into a multi-agent

system. It presents a Human Imitating Cognitive Modeling Agent (HICMA) that

represents an enhanced model of the society of mind theory. HICMA introduces a new

type of agents called the evolver agent, which is the source of behavior and performance

evolution. HICMA was implemented and tested in a Robocode robot. It comprises five

agents: modeling agent, estimation agent, shooting agent, and two novel evolver agents.

The modeling agent observes the location of the enemy while moving in the arena

and builds a model for its motion path. This is done by a hybrid optimization method of

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Nelder-Mead method.

Combining these two methods gains the global search of CMA-ES along with the speed

of Nelder-Mead.

The estimation agent uses the model built by the modeling agent to predict the

future path of the enemy. It finds the soonest time at which HICMA’s bullet can catch

the enemy and estimates the location of the enemy at that time. This estimation is made

also by the hybrid optimization method used by the modeling agent.

The shooting agent acts as the interface between the estimation agent and the gun

of the Robocode robot. It converts the location estimated by the estimation agent into a

rotation angle of the gun.

The two evolvers use GA and Nelder-Mead method to adapt the parameters of the

modeling agent. The GA evolver builds the objective function of the modeling agent. The

different forms of this objective function reflect different human-like behaviors. The

Nelder-Mead evolver finds the optimum values of some parameters of the hybrid method

in the modeling agent such as the sample size, the initial guess … etc.

Results show that HICMA behaves in a human-like manner. It evolved behaviors that can be mapped to typical human behaviors such as wisdom, carelessness, etc. Furthermore, it is found that the employed evolution technique autonomously produces the behavior most suitable for the given situation.


Future Work

The proposed system (HICMA) implemented five agents in a Robocode robot:

shooting, modeling, estimation, GA evolver, and Nelder-Mead evolver. A future work is

to add one or more agents for maneuvering so that HICMA can also dodge opponent’s

bullets and participate in real 1-vs-1 Robocode battles.

Currently, the modeling agent is the only evolvable one. A future work is to add

the evolution functionality to shooting and estimation agents. In addition, evolving the

evolver agents is expected to enhance the performance of the entire system.

HICMA could gracefully imitate human behaviors. However, statistical modeling

alone seems insufficient to perform well in a real world such as a Robocode battle. This

suggests integrating statistical models with other techniques like ANN. This cooperation

may greatly enhance the performance of the intelligent agent.

The hybrid optimization method (Nelder-Mead + CMA-ES) is used for modeling

environmental phenomena. It finds a model that relates an output phenomenon to a set of

input features. In this work, the inputs and the output of the model are manually specified.

However, a feature-selection agent can do this task autonomously. It can collect samples

from all available sensors (e.g. temperature, humidity, light … etc.) and relate them to

the available actuators (e.g. motor, gun, heater … etc.). Thus, the modeling agent can

learn how to control every feature. For example, a robot can learn how to conserve its

energy by learning the relation between the battery consumption rate and the slope of the

path.

The results presented in section 4.2 reflect the poor performance of the Nelder-Mead method as an evolver. This calls for adding a new evolver based on CMA-ES or the hybrid method rather than the Nelder-Mead method.

For testing the general principle of the evolver agent, a Nelder-Mead evolver is

used for adapting the parameters of the CMA-ES in the Modeling agent. However, the

CMA-ES is self-adaptive and most of its parameters need no external evolution. For

example, the step-size of its search process is automatically updated at every iteration. A

future work is to confine the evolution to non-self-adaptive parameters such as the sample

size. This suggests adding a flag to every parameter of an agent to indicate whether this

parameter is evolvable or not.


References

[1] “Robocode Home.” [Online]. Available: http://robocode.sourceforge.net/.

[2] M. Minsky, "K-lines: A theory of memory," Cogn. Sci., vol. 33, pp. 117–133, 1980.

[3] J. Ortega, N. Shaker, J. Togelius, and G. N. Yannakakis, “Imitating human playing

styles in Super Mario Bros,” Entertain. Comput., vol. 4, no. 2, pp. 93–104, Apr.

2013.

[4] Y. Shichel, E. Ziserman, and M. Sipper, “GP-Robocode : Using Genetic

Programming to Evolve Robocode Players,” Proc. 8TH Eur. Conf. Genet.

Program., pp. 143–154, 2005.

[5] Y. Shichel and M. Sipper, “GP-RARS: evolving controllers for the Robot Auto

Racing Simulator,” Memetic Comput., vol. 3, no. 2, pp. 89–99, May 2011.

[6] A. Agapitos, M. O'Neill, A. Brabazon, and T. Theodoridis, "Learning environment models in car racing using stateful GA," 2011 IEEE Conf. Comput. Intell. Games, pp. 219–226, Aug. 2011.

[7] M. Ebner and T. Tiede, “Evolving Driving Controllers using Genetic

Programming,” Comput. Intell. Games, pp. 279–286, 2009.

[8] A. Agapitos, J. Togelius, and S. M. Lucas, “Evolving Controllers for Simulated

Car Racing using Object Oriented Genetic Programming Categories and Subject

Descriptors,” GECCO ’07 Proc. 9th Annu. Conf. Genet. Evol. Comput., vol. 2, pp.

1543–1550, 2007.

[9] B. Chaperot and C. Fyfe, “Motocross and Artificial Neural Networks,” in Game

Design and Technology Workshop, 2008.

[10] R. Hecht-Nielsen, “The Mechanism of Thought,” 2006 IEEE Int. Jt. Conf. Neural

Netw. Proc., no. May, pp. 419–426, 2006.

[11] L. I. Perlovsky, “Toward physics of the mind: Concepts, emotions, consciousness,

and symbols,” Phys. Life Rev., vol. 3, no. 1, pp. 23–55, Mar. 2006.

[12] L. Perlovsky, "Modeling field theory of higher cognitive functions," Artif. Cogn. Syst., pp. 65–106, 2007.

[13] Y. Wang, Y. Wang, S. Patel, and D. Patel, "A layered reference model of the brain (LRMB)," IEEE Trans. Syst. Man, Cybern. (Part C), vol. 36, no. 2, 2006.


[14] Y. Wang and V. Chiew, “On the cognitive process of human problem solving,”

Cogn. Syst. Res., vol. 11, no. 1, pp. 81–92, Mar. 2010.

[15] Y. Wang, “A cognitive Informatics reference Model of Autonomous Agent

Systems ( AAS ),” Int’l J. Cogn. Informatics Nat. Intell., vol. 3, no. March, pp. 1–

16, 2009.

[16] Y. Wang, "Formal RTPA models for a set of meta-cognitive processes of the brain," Int'l J. Cogn. Informatics Nat. Intell., vol. 2, no. December, 2008.

[17] T. L. Griffiths, C. Kemp, and J. B. Tenenbaum, “Bayesian models of cognition,”

Cambridge Handb. Comput. Cogn. Model., pp. 1–49, 2008.

[18] M. B. Fayek and O. S. Farag, “HICMA : A Human Imitating Cognitive Modeling

Agent using Statistical Methods and Evolutionary Computation,” Comput. Intell.

Human-like Intell. (CIHLI), 2014 IEEE Symp., pp. 4–9, Dec. 2014.

[19] P. S. Holzman, “Personality,” Encyclopedia Britannica. Encyclopædia Britannica,

Inc., 2013.

[20] A. Ortony, “On Making Believable Emotional Agents Believable,” Emotions in

humans and artifacts. pp. 189–212, 2003.

[21] K.-H. Lee, “Evolutionary algorithm for a genetic robot’s personality,” Appl. Soft

Comput., vol. 11, no. 2, pp. 2286–2299, Mar. 2011.

[22] Oxford Dictionary, “definition of ambience in English.” [Online]. Available:

http://www.oxforddictionaries.com/definition/english/ambience.

[23] Encyclopedia Britannica, “Ubiquitous - Merriam-Webster.” [Online]. Available:

http://www.merriam-webster.com/dictionary/ubiquitous.

[24] I. Guyon, “An Introduction to Variable and Feature Selection,” J. Mach. Learn.

Res., vol. 3, pp. 1157–1182, 2003.

[25] M. Melanie, “An introduction to genetic algorithms,” Comput. Math. with Appl.,

vol. 32, p. 133, 1996.

[26] H.-G. Beyer, “Evolution strategies,” Scholarpedia, vol. 2, no. 8, p. 1965, 2007.

[27] N. Hansen and A. Auger, "Evolution strategies and CMA-ES (covariance matrix adaptation)," in Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion, 2014, pp. 513–534.

[28] N. Hansen and A. Ostermeier, "Completely derandomized self-adaptation in evolution strategies," Evol. Comput., vol. 9, no. 2, pp. 159–195, 2001.


[29] N. Hansen, “An analysis of mutative sigma-self-adaptation on linear fitness

functions.,” Evol. Comput., vol. 14, pp. 255–275, 2006.

[30] A. Chotard, A. Auger, and N. Hansen, "Cumulative step-size adaptation on linear functions: Technical report," Springer, Jun. 2012.

[31] D. V. Arnold, "Evolution strategies with cumulative step length adaptation on the noisy parabolic ridge," Technical Report CS-2006-02, pp. 1–30, 2006.

[32] N. Hansen, “The CMA Evolution Strategy : A Tutorial,” 2011.

[33] Nikolaus Hansen, “CMA-ES Source Code,” 2015. [Online]. Available:

https://www.lri.fr/~hansen/cmaes_inmatlab.html. [Accessed: 30-Jan-2015].

[34] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, no. 4, pp. 308–313, 1965.

[35] J. E. Dennis and D. J. Woods, "Optimization on microcomputers: The Nelder-Mead simplex algorithm," in ARO Workshop on Microcomputers, 1985.

[36] A.-H. Tan and G.-W. Ng, “A Biologically-Inspired Cognitive Agent Model

Integrating Declarative Knowledge and Reinforcement Learning,” 2010

IEEE/WIC/ACM Int. Conf. Web Intell. Intell. Agent Technol., pp. 248–251, Aug.

2010.

[37] V. Alexiev, “Machine Learning through Evolution : Training Algorithms through

Competition,” Trinity University, 2013.

[38] N. E. Mastorakis, “On the solution of ill-conditioned systems of linear and non-

linear equations via genetic algorithms (GAs) and Nelder-Mead simplex search,”

EC’05 Proc. 6th WSEAS Int. Conf. Evol. Comput., pp. 29–35, Jun. 2005.

[39] M. O. Odetayo, “Optimal population size for genetic algorithms: an

investigation,” in Genetic Algorithms for Control Systems Engineering, IEE

Colloquium on, 1993, pp. 2/1–2/4.

[40] J. T. Alander, “On optimal population size of genetic algorithms,” in CompEuro

’92 . “Computer Systems and Software Engineering”,Proceedings., 1992, pp. 65–

70.

[41] O. Roeva, S. Fidanova, and M. Paprzycki, “Influence of the population size on the

genetic algorithm performance in case of cultivation process modelling,” in

Computer Science and Information Systems (FedCSIS), 2013 Federated

Conference on, 2013, pp. 371–376.


[42] C. R. Reeves, “Using Genetic Algorithms With Small Populations,” in

Proceedings of the Fifth International Conference on Genetic Algorithms, 1993,

pp. 92–99.

[43] N. Maehara and Y. Shimoda, “Application of the genetic algorithm and downhill

simplex methods (Nelder–Mead methods) in the search for the optimum chiller

configuration,” Appl. Therm. Eng., vol. 61, no. 2, pp. 433–442, Nov. 2013.

[44] O. S. Farag, “Human-like behavior in Robocode - HICMA,” 2015. [Online].

Available: https://youtu.be/FNM8z97momM.

[45] Skilgannon, “DrussGT - RoboWiki,” 2013. [Online]. Available:

http://robowiki.net/wiki/DrussGT.

[46] GrubbmGait, “GresSuffurd - RoboWiki,” 2013. [Online]. Available:

http://robowiki.net/wiki/GresSuffurd. [Accessed: 01-Jan-2015].

[47] RoboWiki, “GuessFactor Targeting Tutorial - RoboWiki,” 2010. [Online].

Available: http://robowiki.net/wiki/GuessFactor_Targeting_Tutorial. [Accessed:

01-Jan-2015].

[48] Wompi, “Wallaby - RoboWiki,” 2012. [Online]. Available:

http://robowiki.net/wiki/Wallaby. [Accessed: 01-Jan-2015].

[49] RoboWiki, “Circular Targeting/Walkthrough - RoboWiki,” 2009. [Online].

Available: http://robowiki.net/wiki/Circular_Targeting/Walkthrough. [Accessed:

01-Jan-2015].

[50] R. Steuer, J. Kurths, C. O. Daub, J. Weise, and J. Selbig, “The mutual information:

detecting and evaluating dependencies between variables.,” Bioinformatics, vol.

18 Suppl 2, pp. S231–40, Jan. 2002.

[51] H. C. Peng, F. H. Long, and C. Ding, “Feature selection based on mutual

information: Criteria of max-dependency, max-relevance, and min-redundancy,”

IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, 2005.

[52] G. Herman, B. Zhang, Y. Wang, G. Ye, and F. Chen, “Mutual information-based

method for selecting informative feature sets,” Pattern Recognit., vol. 46, no. 12,

pp. 3315–3327, Dec. 2013.

[53] A. Khan, M. Ishtiaq, and M. A. Jaffar, “A Hybrid Feature Selection Approach by

Combining miD and miQ,” IEEE ICET, 2010.

[54] H. Li, X. Wu, Z. Li, and W. Ding, “Group Feature Selection with Streaming

Features,” 2013 IEEE 13th Int. Conf. Data Min., pp. 1109–1114, Dec. 2013.

[55] K. H. Knuth, “Optimal Data-Based Binning for Histograms.” Departments of

Physics and Informatics, University at Albany, 2013.


[56] L. Birgé and Y. Rozenholc, "How many bins should be put in a regular histogram," ESAIM Probab. Stat., vol. 10, p. 22, Jan. 2006.

[57] L. Devroye and G. Lugosi, “Bin width selection in multivariate histograms by the

combinatorial method,” Test, vol. 13, no. 1, pp. 129–145, 2004.

[58] B. Silverman, “Density estimation for statistics and data analysis,” Chapman Hall,

vol. 37, no. 1951, pp. 1–22, 1986.

[59] B. E. Hansen, “Lecture Notes on Nonparametrics,” Univ. Wisconsin, 2009.

[60] Y. Soh, Y. Hae, A. Mehmood, R. H. Ashraf, and I. Kim, “Performance Evaluation

of Various Functions for Kernel Density Estimation,” Open J. Appl. Sci., vol.

2013, no. March, pp. 58–64, 2013.

[61] “Comparison of Kernel Density Estimators,” Thail. Stat., vol. 8, no. July, pp. 167–

181, 2010.

[62] M. Clark, “A comparison of correlation measures,” Center for social research,

2013. [Online]. Available:

http://www3.nd.edu/~mclark19/learn/CorrelationComparison.pdf. [Accessed: 20-

Apr-2015].


Appendix A: Feature Selection

A.1. Mutual Information

Mutual Information (MI) provides a general measure of dependency between

variables [50]. It has been widely used in feature selection [51]–[54]. The mathematical

definition of MI between two discrete random variables X and Y is:

I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \left( \frac{p(x, y)}{p(x)\, p(y)} \right)

Where:

p(x) is the marginal probability distribution function of the variable X; it is the probability that the variable X takes the value x.

p(x, y) is the joint probability distribution function of the variables X and Y; it is the probability that the variables X and Y take the values x and y, respectively.

The main disadvantage of MI is the relative complexity of estimating p(x) and p(x, y). Several methods can be used to estimate these probabilities, such as the histograms and kernels described in the next subsections.
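To make the estimation step concrete, the following minimal Python sketch (illustrative only; it is not the implementation used in this work) estimates MI from two continuous samples via a joint histogram, the density-estimation approach described in the next subsection. It assumes NumPy and equal-width binning; the function name and the bin count are arbitrary choices.

import numpy as np

def mutual_information(x, y, bins=10):
    # Joint histogram of the two samples, normalized to probabilities.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = p_xy > 0                           # skip empty cells: 0 log 0 = 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

For example, two independent samples give a value near zero, while strongly dependent samples give a large value. Note that the result depends on the bin count, which is exactly the binning sensitivity discussed below.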

A.1.1. Histogram Density Estimation

The histogram is the most basic density estimation method. It is a graphical representation of the distribution of a variable (feature). Fig. A-1 shows the probability distribution of a variable (solid line) and a histogram representing that distribution (bars). The horizontal axis represents the values of the variable and the vertical axis the probability of these values.

A histogram is constructed as follows:

1. Divide the domain of the sampled data into b equal intervals called bins.
2. Select the starting point x0 of the histogram (its origin).
3. For every sample, add a block of size 1.0 to the bin whose interval contains that sample.

Fig. A-1: A histogram

A histogram is formally defined as follows:

f(x) = \frac{1}{nh} \sum_{i=1}^{b} \sum_{j=1}^{n} I(x \in B_i)\, I(X_j \in B_i)    (A-1)

Where:

- b is the number of bins
- n is the number of samples
- B_i is the i-th bin of the histogram
- X_j is the j-th sample of the variable X
- I is the indicator function: I(x \in A) = 1 if x \in A, and 0 if x \notin A
- h is the width of a bin. There are several methods for selecting the optimal bin width [55]–[57].

The outer summation in Eq. (A-1) selects the bin to which x belongs, and the inner summation counts the samples that fall in that bin. The function f(x) thus measures the fraction of observations that are close to x, normalized by the bin width.

The final histogram appears like that in Fig. A-1. Every column represents the number of samples that fall within its interval. Such a histogram estimates the marginal probability distribution p(x) of a single variable x. The same method can be extended to estimate the joint probability distribution p(x1, x2, …, xd) of d variables; in this case, the bin width becomes a d-dimensional vector h = (h1, h2, …, hd). A 2D histogram is shown in Fig. A-2.
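A minimal Python sketch of Eq. (A-1) for a single variable follows (illustrative only, assuming NumPy; the function name is arbitrary and the bin count b and origin x0 are free parameters of the estimator):

import numpy as np

def histogram_density(samples, x, b=10, x0=None):
    # Equal-width bins starting at the chosen origin x0.
    samples = np.asarray(samples, dtype=float)
    lo = samples.min() if x0 is None else x0
    h = (samples.max() - lo) / b                     # bin width
    counts, edges = np.histogram(samples, bins=b, range=(lo, lo + b * h))
    i = min(max(int((x - lo) // h), 0), b - 1)       # bin containing x
    return counts[i] / (len(samples) * h)            # Eq. (A-1)

Shifting x0 by a fraction of a bin can noticeably change the estimate, which is the origin sensitivity illustrated in Fig. A-2.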


Despite their simplicity, histograms have some drawbacks:

1. They are sensitive to the bin width.
2. They are sensitive to the selection of the origin (compare Fig. A-2 (a) and (b)).
3. They are not smooth.

More details about histogram density estimation are provided in [58].

A.1.2. Kernel Density Estimation

The idea of Kernel Density Estimation (KDE) is rather similar to that of histograms. However, instead of stacking rectangular blocks, KDE stacks a different form of function called a kernel. A function k(u) can be used as a kernel if it satisfies the following criteria [59]:

1. It integrates to one: \int_{-\infty}^{\infty} k(u)\, du = 1
2. Non-negative: k(u) \geq 0 for all u
3. Symmetric: k(-u) = k(u)

Common kernel functions used in density estimation are given in Table A-1. A more complete list of kernel functions is given in [59], [60]. Fig. A-3 illustrates the estimation of a probability distribution using the Gaussian kernel function.

Fig. A-2: A 2D histogram with origin at (a) (-1.5, -1.5) and (b) (-1.625, -1.625)
By Drleft (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons


Table A-1: Common kernel density functions

Kernel         Equation
Uniform        k(u) = \frac{1}{2},  |u| \leq 1
Gaussian       k(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} u^2}
Epanechnikov   k(u) = \frac{3}{4} (1 - u^2),  |u| \leq 1

KDE is used as follows:

1. Choose a kernel function k (Uniform, Gaussian, Epanechnikov, etc.). A comparison between the different kernel functions is given in [60], [61]; the Epanechnikov and Gaussian kernels are the most common.

2. Estimate the probability distribution by a function f(x) that measures the fraction of observations close to x:

f(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left( \frac{x_i - x}{h} \right)    (A-2)

Where:

- n is the number of samples
- h is the bandwidth of the kernel; it controls the degree of smoothness of the estimate

Fig. A-3: Kernel density estimation. The density distribution (dotted curve) is estimated by the accumulation (solid curve) of Gaussian function curves (dashed curves)

3. Estimate the bandwidth using Silverman's rule of thumb:

h = \hat{\sigma}\, C(K)\, n^{-1/5}

Where:

- \hat{\sigma} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} is the sample standard deviation of x, where n is the number of samples, x_i is the i-th sample of x, and \bar{x} is the mean of all samples of x
- C(K) is the rule-of-thumb constant given in Table A-2.

Table A-2: Rule-of-thumb constants

Kernel (K)     C(K)
Uniform        1.84
Gaussian       1.06
Epanechnikov   2.34
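The three steps can be combined into a short Python sketch (illustrative only, assuming NumPy and a Gaussian kernel; the constant 1.06 is the Gaussian entry of Table A-2, and the function name is arbitrary):

import numpy as np

def gaussian_kde_at(samples, x):
    # Evaluate Eq. (A-2) at a single point x with a Gaussian kernel.
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    sigma = samples.std(ddof=1)                     # sample standard deviation
    h = 1.06 * sigma * n ** (-1 / 5)                # Silverman's rule of thumb
    u = (samples - x) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    return k.sum() / (n * h)                        # Eq. (A-2)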

Formula (A-2) estimates the marginal probability distribution function p(x) of a

single variable x. It can be extended to estimate the joint probability distribution function

p(x1, x2,… xd) of d variables as follows:

f(x) = \frac{1}{n\, |H|} \sum_{i=1}^{n} \left[ \prod_{j=1}^{d} k\left( \frac{X_{ij} - x_j}{h_j} \right) \right]    (A-3)

Where:

- H = (h1, h2, …, hd) is the bandwidth vector and |H| = h1 h2 ⋯ hd is the product of the bandwidths
- x = (x1, x2, …, xd) is the feature vector and X_{ij} is the j-th component of the i-th sample

The bandwidth h_j of a variable x_j is calculated using the rule of thumb:

h_j = \hat{\sigma}_j\, C(K, d)\, n^{-\frac{1}{4+d}}

Where:

- d is the dimension of the feature vector (i.e., the number of variables)
- C(K, d) is the rule-of-thumb constant:

C(K, d) = \left( \frac{4\, \pi^{d/2}\, 2^{d+1}\, R(K)^d}{2\, C_k^2(K)\, (d + 2)} \right)^{\frac{1}{4+d}}

For d = 1, this expression reduces to the univariate constants of Table A-2.


R(K) and C_k(K) are called the roughness and the second moment of the kernel, respectively. Their values are given in Table A-3.

Table A-3: Common kernel constants

Kernel (K)     R(K)               C_k(K)
Uniform        1/2                1/3
Gaussian       1/(2\sqrt{\pi})    1
Epanechnikov   3/5                1/5

A 2-D KDE is shown in Fig. A-4.
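As an illustration of Eq. (A-3), the following Python sketch (illustrative only, assuming NumPy and a Gaussian product kernel; the function name is arbitrary) evaluates a d-dimensional estimate at a point x. For the Gaussian kernel, substituting R(K) = 1/(2\sqrt{\pi}) and C_k(K) = 1 into C(K, d) simplifies it to (4 / (d + 2))^{1/(4+d)}:

import numpy as np

def product_kde_at(X, x):
    # X has shape (n, d): n samples of a d-dimensional feature vector.
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    C = (4.0 / (d + 2)) ** (1.0 / (4 + d))          # Gaussian C(K, d)
    h = X.std(axis=0, ddof=1) * C * n ** (-1.0 / (4 + d))  # per-variable h_j
    u = (X - np.asarray(x, dtype=float)) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    return float(k.prod(axis=1).sum() / (n * h.prod()))    # Eq. (A-3)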

KDE has the following advantages over histograms [50]:

1. Better mean square error
2. Insensitivity to the choice of the origin
3. The ability to specify more sophisticated window shapes than the rectangular window

However, the main disadvantage of KDE is its computational complexity: it requires too many calculations and is consequently not suitable for real-time applications such as robotics and computer games.

Fig. A-4: 2D kernel density estimate: (a) individual kernels and (b) the final KDE
By Drleft (talk) 00:04, 16 September 2010 (UTC) (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 4.0-3.0-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/4.0-3.0-2.5-2.0-1.0)], via Wikimedia Commons


A.2. Correlation

Correlation represents how closely two variables co-vary. It ranges between -1 and +1, where +1 means perfect positive correlation, -1 means perfect negative correlation, and 0 means no correlation. The main advantage of correlation is that it is much less complex than Mutual Information, because it does not involve the time-consuming process of density estimation. Two types of correlation are investigated in this work: the Pearson Correlation Coefficient and Distance Correlation. They are overviewed in the next two subsections.

A.2.1. Pearson Correlation Coefficient (PCC)

The Pearson Correlation Coefficient (PCC) is the most common correlation measure. However, it is limited to linear relationships between variables (features).

PCC, denoted ρ, is defined as the covariance between two variables divided by the product of their standard deviations:

\rho(X, Y) = \frac{cov(X, Y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}    (A-4)
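Unlike the density estimates of the previous section, Eq. (A-4) is a single pass over the samples, which is why correlation is attractive for real-time use. A minimal Python sketch (illustrative only, assuming NumPy; the function name is arbitrary) is:

import numpy as np

def pearson(x, y):
    # Compute Eq. (A-4) directly from two samples of equal length.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()      # deviations from the means
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

NumPy's built-in np.corrcoef(x, y)[0, 1] returns the same value.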

Fig. A-5 shows different relations between two variables. The graphs in the first row illustrate different strengths of correlation between the two variables: the more linear the relation, the stronger the correlation. The second row illustrates perfect correlations. Notice that the correlation in the middle graph is undefined because one of the variables has zero variance (i.e., it is constant). In the third row, the two variables seem related; however, the correlations are zero because the relationships are not linear. This reflects the fact that PCC captures only linear relationships.

Fig. A-5: Pearson Correlations of different relationships between two variables

By DenisBoigelot, original uploader was Imagecreator (Own work, original uploader was

Imagecreator) [CC0], via Wikimedia Commons

Page 114: Enhancing Intelligent Agents By Improving Human Behavior  Imitation Using Statistical Modeling Techniques

92

A.2.2. Distance Correlation

Distance Correlation is a relatively new measure of statistical dependence between two variables. In contrast to the Pearson Correlation, it gives a zero correlation if and only if the two variables are statistically independent. It also captures non-linear relationships between variables, as shown in Fig. A-6. However, experiments in [62] show that mutual information gives a stronger dependency measure than distance correlation.

The author of [62] also compared Mutual Information with Distance Correlation using similar non-linear patterns; the results are shown in Fig. A-7.
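For reference, the sample distance correlation can be computed directly from the pairwise distance matrices. The sketch below (illustrative only, assuming NumPy and one-dimensional samples; the function name is arbitrary, and the computation is O(n^2) in time and memory) follows the standard double-centering construction:

import numpy as np

def distance_correlation(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])     # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])     # pairwise distances within y
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                  # squared distance covariance
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / dvar2))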

Fig. A-6: Distance Correlation of linear and non-linear relationships

By Naught101 [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Fig. A-7: Mutual Information vs. Distance Correlation as dependence measures


The author suggests that: “Where distance correlation might be better at detecting

the presence of (possibly weak) dependencies, the MIC is more geared toward the

assessment of strength and detecting patterns that we would pick up via visual

inspection”.


Engineer's Name: Osama Salah Eldin Farag
Date of Birth: 11 / 2 / 1987
Nationality: Egyptian
Registration Date: 1 / 10 / 2010
Awarding Date: / / 2015
Degree: Master of Science
Department: Computer Engineering
Supervisors: Prof. Dr. Magda Bahaa Eldin Fayek
Examiners: Prof. Dr. Samia Abdulrazik Mashaly, Prof. Dr. Mohamed Moustafa Saleh, Prof. Dr. Magda Bahaa Eldin Fayek

Thesis Title:
Enhancing the intelligent agent by improving human behavior imitation using statistical modeling techniques

Keywords:
Intelligent agent; cognitive agent; human behavior imitation; evolutionary computation; machine learning

Thesis Summary:
This research presents a new method for imitating human behavior that does not rely on neural theories of the mind. The method integrates statistical modeling techniques with the "Society of Mind" theory to build a system that imitates human behavior. The cognitive, human-imitating intelligent agent presented in this research can adjust its behavior automatically according to the situation it faces.


Thesis Abstract

Human intelligence is the greatest source of inspiration for artificial intelligence, and building systems that behave like humans is an ambition of all artificial intelligence researchers. For this purpose, researchers follow one of two directions: either studying the neuro-biological theories of the mind, or finding a way to obtain human-like artificial intelligence by methods that do not correspond to the neural theories of the mind.

This research follows the second approach, where statistical methods are used to imitate human behavior. It presents an intelligent agent called HICMA that integrates a number of non-biological techniques to build and refine models of the environment surrounding the intelligent agent. By changing the parameters of these models, behavior similar to human behavior can be obtained. The agent presented here, HICMA, is a group, or society, of several interacting agents. It is built on the "Society of Mind" theory, which it improves by introducing a new type of agent: the developer agent. The developer agent is a special type of agent whose function is to modify or tune other agents according to the problem the intelligent system faces.

The simple method followed by the system presented here for imitating human behavior allows the developer agent to give the system a personality, or human behavior, appropriate to the problem it faces.


Enhancing the Intelligent Agent by Improving Human Behavior Imitation Using Statistical Modeling Techniques

By
Osama Salah Eldin Farag

A thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering

Approved by the Examining Committee:
Prof. Dr. Magda Bahaa Eldin Fayek, Main Advisor
Prof. Dr. Samia Abdulrazik Mashaly, Department of Computers and Systems, Electronics Research Institute
Prof. Dr. Mohamed Moustafa Saleh, Department of Operations Research and Decision Support, Faculty of Computers and Information, Cairo University

Faculty of Engineering, Cairo University
Giza, Arab Republic of Egypt
2015


Enhancing the Intelligent Agent by Improving Human Behavior Imitation Using Statistical Modeling Techniques

By
Osama Salah Eldin Farag

A thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering

Under the supervision of
Prof. Dr. Magda Bahaa Eldin Fayek
Faculty of Engineering, Cairo University

Faculty of Engineering, Cairo University
Giza, Arab Republic of Egypt
2015


Enhancing the Intelligent Agent by Improving Human Behavior Imitation Using Statistical Modeling Techniques

By
Osama Salah Eldin Farag

A thesis submitted to the Faculty of Engineering, Cairo University, in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering

Faculty of Engineering, Cairo University
Giza, Arab Republic of Egypt
2015