
Evolution, Learning and Imitation in Autonomous Agents

Elhanan Borenstein, School of Computer Sciences, Tel Aviv University, Tel-Aviv 69978, Israel

[email protected]

Research proposal for Ph.D. thesis in Computer Science, under the supervision of Prof. Eytan Ruppin

November 9, 2003


Contents

1 Introduction

2 Background
  2.1 Evolution and Learning
  2.2 Evolution and Imitation
  2.3 Learning by Imitation: A Neuroscience Perspective
  2.4 Evolutionary Autonomous Agents - EAA

3 Research Objectives and Expected Significance
  3.1 General motivation
  3.2 Specific Aims
    3.2.1 Study the Effects of Learning by Imitation on the Evolutionary Process
    3.2.2 Study the Emergence of Imitative Behavior Mechanisms in EAAs
  3.3 Significance

4 Preliminary Work
  4.1 Learning decreases evolutionary bottlenecks and accelerates evolution
    4.1.1 Introduction
    4.1.2 Model and Analysis
    4.1.3 Results
    4.1.4 Summary
  4.2 Enhancing Autonomous Agents Evolution with Learning by Imitation
    4.2.1 Introduction
    4.2.2 The Model
    4.2.3 The Tasks
    4.2.4 Results
    4.2.5 Summary

5 Future Research Plan
  5.1 The Interplay between Imitation and Evolution – Extending Previous Work
    5.1.1 Construct a Framework to Analyze the Effect of Learning on Evolution
    5.1.2 Examine the Effect of Learning by Imitation on Evolution
  5.2 The Emergence of Imitative Behavior – Work in Progress
    5.2.1 Study the Emergence of Imitative Behavior in EAA
    5.2.2 Enhance the EAAs' Experimental Setup, Studying More Complex Scenarios
    5.2.3 Obtain Testable Predictions Regarding Human/Nonhuman Imitation


1 Introduction

The research proposed here focuses on the complex interplay between evolution and learning by imitation. Although lifetime adaptations are not inherited in a pure Darwinian framework¹, they may still change the individual's fitness, and consequently the expected number of offspring that carry its genotype. Learning may thus dramatically alter the course of the evolutionary process. When learning is implemented via imitation, the resulting dynamics are further complicated, as the success of the learning process depends not only on the interaction between the individual and its environment but also on the state of other members of the evolving population.

We intend to put forward a computational and experimental framework to analyze these complex interactions, and to examine the interplay between evolution and learning in general, and learning by imitation in particular. Our key goal is to demonstrate how these learning mechanisms could have evolved and prevailed in the first place. To gain a comprehensive understanding of these phenomena we propose a two-pronged approach:

(a) Study the evolutionary benefits of learning by imitation.

(b) Study the emergence of imitative behavior mechanisms.

The main difference between these two approaches is the underlying assumption: In the first study, the ability and incentive to imitate are assumed to be instinctive, and the focus is on the effect of this mechanism on evolution. In the second study, the effect of imitation is assumed to be beneficial, and the emergence of the neuronal mechanisms that support imitative behavior is at the center of attention.

2 Background

2.1 Evolution and Learning

While searching for the optimal value of a multi-attribute objective function, optimization algorithms usually take one of two approaches: a global search scheme or a local one. When facing the challenge of developing an individual that best fits its environment, nature demonstrates an interesting combination of both methods. Genetic evolution can be viewed as a global search, operating on the population level, simultaneously exploring various optional genetic configurations which may differ considerably from one another. Individual learning, or any other form of phenotypic adaptation, is in essence a local search mechanism, allowing each individual to adjust its behavior to its environment within the constraints determined by its genotype. Although learned lifetime adaptations are not inherited in a pure Darwinian framework, they may still change the individual's fitness, and consequently the expected number of offspring that carry its genotype. Learning may thus dramatically alter the dynamics of the evolutionary process. Clearly, learning has an advantageous effect in a non-stationary environment, allowing individuals to adapt to rapid changes that cannot be tracked by the slow evolutionary process [43, 53, 60]. However, it has been suggested that learning may also be beneficial in static (or slowly changing) environments [6, 29, 52] where adaptation can be viewed as maximizing a given fitness function. In particular,

¹ Though acquired traits may be genetically assimilated through the Baldwin effect [6].


initially acquired traits may be genetically assimilated through the Baldwin effect [6, 44], as was empirically demonstrated in Drosophila by Waddington [61].

In recent years, a number of researchers have studied the complex interaction between learning and evolution, employing a variety of methodologies. In the seminal work of Hinton and Nowlan [29], a simple computational model was introduced to demonstrate how learning can guide and accelerate evolution. Despite its obvious limitations [52], Hinton and Nowlan's model provided a distilled demonstration of the Baldwin effect, bringing the interaction between learning and evolution back to the forefront of scientific research. A large body of work that followed Hinton and Nowlan's study [8, 18, 22, 25, 42, 43, 47, 49, 51, 52] further explored the beneficial effect of learning on evolution. These studies, focusing mainly on simulations and qualitative explanations, demonstrated the beneficial effect of learning in a wide range of stationary and non-stationary environments. It has become the accepted wisdom that in stationary environments, learning accelerates evolution through the Baldwin effect [46], setting up favorable selection preferences for those individuals whose genotypic configurations are in the vicinity of the global solution (as in Hinton and Nowlan's work).
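Hinton and Nowlan's model is compact enough to restate in a few lines of code. The sketch below is an illustrative reimplementation, not the authors' code; it follows the commonly cited parameters of the original paper (20-locus genomes over {0, 1, ?}, 1000 random learning trials, and fitness 1 + 19n/1000 where n is the number of trials remaining once the target is matched).

```python
import random

L, TRIALS = 20, 1000           # genome length and lifetime learning trials
TARGET = (1,) * L              # the single "good" phenotype configuration

def fitness(genome, rng=random):
    """Lifetime fitness of a genome over alleles {0, 1, '?'}: the agent makes
    up to TRIALS random guesses for its plastic ('?') loci, and fitness grows
    with the number of trials remaining when the target is first matched."""
    # any innately wrong allele makes the target unreachable by guessing
    if any(a not in ('?', t) for a, t in zip(genome, TARGET)):
        return 1.0
    plastic = [i for i, a in enumerate(genome) if a == '?']
    if not plastic:
        return 20.0            # target matched innately: no guessing needed
    for g in range(1, TRIALS + 1):          # g-th guessing trial
        if all(rng.random() < 0.5 for _ in plastic):
            return 1.0 + 19.0 * (TRIALS - g) / TRIALS
    return 1.0                 # target never found during this lifetime

print(fitness((1,) * 20))      # → 20.0  (fully hardwired solution)
print(fitness((0,) + (1,) * 19))  # → 1.0  (one wrong innate allele)
```

The key property the model isolates is visible here: partially plastic genomes earn intermediate fitness, so selection can climb a gradient toward the hardwired solution that a needle-in-a-haystack landscape would otherwise hide.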

Unfortunately, even with this ever-growing body of evidence for the advantageous effect of learning on evolution, rigorous theoretical analysis of this interaction is still scarce. Fontanari and Meir [21] performed a quantitative analysis of an asexual version of the Hinton and Nowlan model, corroborating the claims made by Hinton and Nowlan. However, studying a more general selection scenario based on a one-dimensional Gaussian fitness function, Anderson [4] found that while learning does have an obvious beneficial effect in changing environments, the advantage of learning in a fixed environment is transient. Dopazo et al. [16] analyzed a population of adaptive perceptrons, where some of the synapses can be modified through learning. They too found that learning has an adverse effect on the evolutionary dynamics.

There is a clear inconsistency between the findings of these more recent analytical studies and the beneficial effect of learning demonstrated in empirical simulations. We believe that the source of this discrepancy lies in the structure of the fitness landscapes analyzed in these studies. These relatively simple landscapes lack one of the most noticeable characteristics of real-life fitness landscapes, which are built upon an epistatic genotype space: their ruggedness [66]. Clearly, it is the ruggedness of the fitness landscape and the existence of multiple local optima that significantly slow down the convergence rate of the evolutionary process, making rugged landscapes the subject of recent numerical [33] and experimental [14, 41] studies. Such rugged landscapes introduce greater complexity and pose a greater challenge to the mathematical analysis of evolutionary dynamics in general, and of the effect of learning on evolution in particular. Preliminary work, examining analytically the effects of learning on evolution in rugged landscapes, is discussed further in Section 4.1.

2.2 Evolution and Imitation

Studying the effect of learning on evolution, the simulation studies cited in Section 2.1 employed various sources of training data, such as external oracles, regularities in the environment, or "self-generated" teaching data. There is, however, an additional source of training data, one which is naturally available within the evolutionary paradigm: the knowledge possessed by other members of the population. This knowledge can be harnessed to improve the evolutionary process in the


form of learning by imitation. The variety of definitions of imitative behavior ranges from strong definitions, which require an aspect of novelty in the imitated behavior, to very weak definitions, which have no clear boundary with forms of non-imitative social learning [9]. In our study, focusing on imitation within evolving populations, we will refer to any scenario in which the behavior of one agent in the population is used to train another agent as imitation.

Learning by imitation has already been applied by researchers in the fields of artificial intelligence and robotics in various experiments. Hayes and Demiris [26] presented a model of imitative learning to develop a robot controller. Billard and Dautenhahn [9] studied the benefits of social interactions and imitative behavior for grounding and use of communication in autonomous robotic agents. Imitation is also the force that drives cultural evolution: the evolution of ideas, thoughts, knowledge and beliefs. Cultures never stand still; they evolve, showing many similarities to biological evolutionary processes. Various frameworks that study the interaction between cultural transmission and evolution have already been well established [e.g., 13, 15, 39]. Gene-culture coevolution accounts for many adaptive traits [17]. Studies and simulations of the evolution of language [1, 5, 36] assume, by definition, some sort of cultural transmission.

Our goal is to merge these two approaches and to put forth a novel framework for studying learning by imitation within the scope of the interaction between learning and evolution. In particular, we wish to examine the most basic question: can imitation enhance the evolution of autonomous agents in a manner analogous to the results previously shown for supervised learning, and how? The motivation for using learning by imitation to enhance evolution is twofold. First, imitation is one of the most common methods of learning in nature and guides the behavior of a range of species [55]. Living organisms (not to mention humans) often imitate one another [34, 48, 65]. Second, while oracles or other forms of supervised training data are scarce in agent environments, learning by imitation is still a valid option, using other members of the population as teachers.

2.3 Learning by Imitation: A Neuroscience Perspective

Imitation is an effective and robust way to learn new traits by utilizing the knowledge already possessed by others. Human beings are by far the most imitative creatures; however, evidence for imitative behavior in other species continues to accumulate [28, 34, 64, 65]. The past twenty years have seen a renewed interest in imitation in various fields of research [55], such as developmental psychology [48], experimental studies of adult social cognition [7], and, most importantly, neurophysiology and neuropsychology [23]. Research in this last field has led to the exciting discovery of mirror neurons. These neurons, found in area F5 in monkeys, discharge both when the monkey performs an action and when it observes another individual making a similar action [56]. An analogous mechanism in humans, whereby motor centers of adults resonate during movement observation, was also demonstrated (using TMS, MEG and EEG). These neuronal devices, demonstrating an internal correlation between the representations of perceptual and motor functionalities, may form one of the underlying mechanisms of imitative ability. Our working hypothesis is that the neuronal structures and processes involved in imitative behavior can best be explained by the evolutionary process that produced them. We wish to demonstrate the emergence of similar neuronal mechanisms in a simple model of evolutionary autonomous agents.


2.4 Evolutionary Autonomous Agents - EAA

Recent years have witnessed a growing interest in the study of neurally-driven evolved autonomous agents (EAAs). These studies, part of the field of Evolutionary Computation and Artificial Life (see [2, 20, 40, 50] for general introductory textbooks), involve agents that live in an environment and autonomously perform typical animat tasks like gathering food, navigating, evading predators, and seeking prey and mating partners. Each agent is controlled by an Artificial Neural Network (ANN) "brain". This network receives and processes sensory inputs from the surrounding environment and governs the agent's behavior via the activation of the motors controlling its actions. The agents can be either software programs living in a simulated virtual environment, or hardware robotic devices. Their controlling networks are developed via Genetic Algorithms (GAs) that apply some of the essential ingredients of inheritance and selection to a population of agents that undergo evolution.

[Figure 1 is a block diagram of the EAA paradigm: a genotype is expanded through a developmental plan into a phenotype ("brain"), whose network dynamics produce behaviour through interaction with the environment; behaviour determines survival capability and hence reproduction probability, and selection plus variation (genetic operators) yield the next generation's genomes.]

Figure 1: The Paradigm of Evolutionary Autonomous Agents

A typical EAA experiment consists of a population of agents that are evolved using a genetic algorithm over many generations to best survive in a given environment (see Figure 1). In general, agents may have different kinds of controllers and may also encode sensors and motors in their genome, but we focus on agents with a genome that solely encodes their controlling neural network. At the beginning of each generation, a new population of agents is generated by selecting the fittest agents of the previous generation and letting them mate – i.e., form new agent genomes via genetic recombination followed by mutations that introduce additional variation into the population. The genomes formed in this process are "transcribed" to form new agents that are placed in the environment for a given amount of time, after which each agent receives a fitness score that designates how well it performed the evolutionary task. This ends a generation cycle, and a new generation is initiated. Typically, this evolutionary "search" process is repeated for many generations until the agents' fitness reaches a plateau and further evolutionary adaptation does not occur. The result is a final population of best-fitted agents, whose emergent behavior and underlying neural dynamics can now be thoroughly studied in "ideal conditions": one has full control over manipulating the


environment and other experimental conditions. More importantly, one has complete knowledge of the agents' behavior on the one hand, and of the controlling network's architecture and dynamics on the other (see [3] for a concrete example).

EAA studies typically evolve neurally-driven computer-simulated animats or robots that solve a variety of cognitive and behavioral tasks. As such, they form an intuitively appealing approach for modelling and studying biological nervous systems [58].
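The generation cycle described above can be sketched in a few lines. All specifics below (bit-string genomes, a counting-ones stand-in for the survival task, truncation selection, one-point recombination, per-bit mutation) are illustrative placeholders for whatever genome encoding, environment, and selection scheme a concrete EAA experiment would use; a real experiment would "transcribe" each genome into a neural controller and score it by behavior in the environment.

```python
import random

def evolve(fitness, genome_len=32, pop_size=50, generations=100,
           p_mut=0.02, rng=None):
    """Minimal genetic-algorithm loop: evaluate, select, recombine, mutate."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]            # truncation selection
        pop = [scored[0]]                           # elitism: keep the best
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)      # one-point recombination
            child = a[:cut] + b[cut:]
            child = [g ^ (rng.random() < p_mut) for g in child]  # mutation
            pop.append(child)
    return max(pop, key=fitness)

# toy evolutionary task: maximize the number of 1s in the genome
best = evolve(fitness=sum)
print(sum(best))
```

The plateau behavior mentioned above shows up here as well: once the population is dominated by near-optimal genomes, mutation and selection roughly balance and further improvement stalls.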

3 Research Objectives and Expected Significance

3.1 General motivation

The hypothesis that forms the basis of the proposed research is that the interplay between evolution and lifetime adaptation both alters the dynamics of the evolutionary process and has a significant influence on the evolving adaptive mechanisms [54]. We thus believe that without a comprehensive understanding of these interactions, any theory of evolutionary models or of learning/imitation mechanisms that studies them in isolation is bound to be lacking. A framework that incorporates the full variety of adaptive mechanisms found in nature can shed new light on the ways these mechanisms evolved and operate.

3.2 Specific Aims

As stated in Section 1, to develop a comprehensive theory of the interaction between evolution and imitation, a two-pronged research program is proposed. First, focusing on the population level and explicitly introducing imitation as an existing trait, we wish to examine the effect of imitation on the dynamics of the evolutionary process. In this study, we regard learning by imitation as a "black box", ignoring the perceptual and neuronal mechanisms it may require. Second, focusing on the individual level, we wish to study the emergence of the underlying neuronal mechanisms that support imitative behavior. We intend to construct an EAA-based experimental setup, where such adaptive mechanisms may emerge (without being explicitly introduced), and to analyze the evolving mechanisms from a neuroscience standpoint.

3.2.1 Study the Effects of Learning by Imitation on the Evolutionary Process

We first intend to construct a simple mathematical model for studying the interaction between evolution and learning in general. The model must meet the following criteria:

• Simple enough to allow a rigorous analysis of the resulting effects.

• Flexible enough to model a wide range of environments, tasks, and learning schemes.

• Powerful enough to incorporate the notion of imitative learning.

Our goal is to formulate the process of learning by imitation (and learning in general) as a transformation (a functional) that operates on the fitness landscape and may thus modify the selection pressures. Using a model of this kind, we can study which forms of learning/imitation are beneficial in an evolutionary process, and under which conditions.
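As a toy instance of such a functional, consider an idealized learner that performs up to k steps of greedy hill climbing from its innate phenotype; the induced effective fitness is then a transformed landscape in which small local optima are smoothed away. The construction below is our own illustrative sketch of the idea, not a committed model.

```python
def effective_fitness(F, k):
    """Transform an innate landscape F (list of fitness values over a 1-D
    genotype space) into the effective landscape induced by a learner that
    hill-climbs for up to k steps from its innate configuration."""
    N = len(F)
    Feff = []
    for x in range(N):
        pos = x
        for _ in range(k):                           # greedy local search
            nbrs = [i for i in (pos - 1, pos + 1) if 0 <= i < N]
            best = max(nbrs, key=lambda i: F[i])
            if F[best] <= F[pos]:
                break                                # reached a local optimum
            pos = best
        Feff.append(F[pos])                          # fitness after learning
    return Feff

innate = [0, 3, 1, 2, 5, 4]        # rugged: a local optimum at index 1
print(effective_fitness(innate, k=2))   # → [3, 3, 3, 5, 5, 5]
```

With k = 0 the transformation is the identity (no learning); as k grows, descents in the landscape shrink, which is exactly the kind of modified selection pressure the proposed framework is meant to capture.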


Once such a model is established, we can set out to examine the evolutionary advantage of learning by imitation. A large body of work that studied the interaction between individual learning and evolution (Section 2.1) has shown how lifetime learning can enhance and guide the evolutionary search. We intend to explore the effect of learning by imitation in a similar manner. Applying a framework analogous to the one used in these studies, we focus on the case where imitation takes place only among members of the same generation and does not percolate across generations via vertical cultural transmission.

We plan to tackle this issue employing two complementary approaches:

(a) A mathematical analysis of this specific type of interplay, using the model defined above.

(b) A simulation study, using Evolutionary Autonomous Agents (EAA) [58], where evolving individuals apply learning by imitation, utilizing other (successful) members of the population as teachers.
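A minimal version of approach (b) can be phrased as a lifetime-learning step in which each agent adjusts its behavior toward that of the generation's fittest member. In the sketch below, behaviors are plain trait vectors, the teacher-selection rule and the number of copied traits are arbitrary illustrative choices, and the key point is that only the acquired phenotype changes; the genotype, and hence what is inherited, is untouched.

```python
def imitate(learner, teacher, n_traits=3):
    """One bout of learning by imitation: the learner adopts up to n_traits
    behavioral traits (vector entries) on which it differs from the teacher.
    Returns the learner's acquired (phenotypic) behavior only."""
    acquired = list(learner)
    copied = 0
    for i, (a, b) in enumerate(zip(learner, teacher)):
        if a != b and copied < n_traits:
            acquired[i] = b                 # copy the teacher's trait
            copied += 1
    return acquired

population = [[0, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0]]
fitness = sum                               # toy: count of "correct" traits
teacher = max(population, key=fitness)      # the fittest agent teaches
phenotypes = [imitate(agent, teacher) for agent in population]
print(phenotypes)   # every learner's behavior moves toward the teacher's
```

Selection would then act on the fitness of these acquired phenotypes, which is how imitation, like any lifetime adaptation, can reshape selection pressures without Lamarckian inheritance.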

3.2.2 Study the Emergence of Imitative Behavior Mechanisms in EAAs

We wish to explore the hypothesis that low-level imitative behavior can (and will) emerge spontaneously from a simple model of learning. This is not to say that imitation in humans and primates does not involve a dedicated mechanism, but rather to suggest a hypothesis as to how these complex mechanisms initially evolved. While in the previous goal we assumed that imitation capabilities already exist and focused on the effects of these capabilities on the evolutionary dynamics, here we wish to examine the emergence of imitative behavior, focusing on the evolving neural mechanisms. This includes:

• Defining an EAA-based experimental setup where imitative behavior patterns may emerge. Clearly, this setup should not explicitly introduce any form of imitation.

• Implementing a simulation environment that embodies this setup, attempting to evolve EAAs that demonstrate true imitative learning.

• Examining the set of sufficient conditions under which imitative behaviors emerge.

Once such imitating agents evolve, we wish to systematically study and analyze the resulting agents' neurocontrollers (i.e., neural networks) and to compare them to biological systems. We hope to discover that the agents that evolve in our environment embody mechanisms analogous to those recently found in neuroscience research (e.g., mirror neurons). We intend to use existing methodologies [35] and to develop new analytical tools for analyzing such adaptive (learning) artificial neural networks. We believe that the analysis of these networks can lead to a significant improvement in understanding the neuronal mechanisms involved in the perception of observed actions and in imitation.


3.3 Significance

We expect the proposed research program to lead to significant advances in all areas mentioned, resulting in a better, integrated theory of both the interaction between imitation and evolution and the origin of imitative behavior.

Although the foundations of research on the interaction between evolution and learning date back to the late 19th century [6, 44], the issues we plan to address are presently at the forefront of theoretical and computational research. The Evolutionary Computation and Adaptive Behavior communities continuously study the dynamics of such interactions in a wide range of environments, tasks, and applications. However, the effects of imitation, which at least in humans forms one of the most common methods of learning, are still far from being fully understood. In a manner analogous to the studies of the interaction between conventional learning and evolution, we intend to address two questions:

(i) Does imitative behavior enhance the dynamics of the evolutionary process?

(ii) What are the environmental/evolutionary/adaptive factors that determine thenature of such interplay?

Furthermore, the study of imitation has been brought back to the center of attention in recent years [55]. High-level imitation is unique to humans, but can be seen in low-level forms in other species. These observations give rise to a fundamental question, which is also debated among linguistic researchers: "Is there a qualitatively unique mechanism (device) in the human brain that can account for this functionality, and if so, how could such a mechanism have evolved?"

According to Prinz and Meltzoff [55], there are two basic issues that need to be addressed by any theory of imitation:

(i) How are actions perceived?

(ii) How can similarity be effective between perception and action?

We believe that our research, focusing on the emergence of imitation in EAAs, can demonstrate the common underlying principles that give rise to imitative behavior. By studying the perception and performance of actions in imitating agents, we hope to shed new light on imitation in humans and primates, and ultimately to improve our knowledge regarding the issues raised by Prinz and Meltzoff. Due to the inherent limitations of neuroscience research, a fully detailed theory of the neuronal mechanisms involved in the perception and performance of actions is still out of our reach. Producing EAAs that embody a simple and fully accessible version of such mechanisms may form a simple, yet biologically plausible, model for imitation in natural neural networks. For instance, the emergence of mirror neurons in artificial neural networks, and a rigorous analysis of their activity and interaction with the rest of the network, can provide new insights regarding the evolution and function of mirror neurons in the brain.


4 Preliminary Work

This section presents a summary of the preliminary results of our research, based on detailed and extensive studies [11, 12].

4.1 Learning decreases evolutionary bottlenecks and accelerates evolution

As a first stage in the study of the interaction between evolution and learning by imitation, we put forward a novel framework to analytically study the effects of learning on evolution. Using random walk theory to derive a rigorous, quantitative analysis, we show that the convergence rate of the evolutionary process is dominated by the largest descents (drawdowns) in the fitness landscape, which form exponential-time "bottlenecks", and that learning accelerates evolution by reducing the extent of such drawdowns. Furthermore, considering an ideal model of learning and representing the innate fitness function as a superposition of Fourier basis components, it is shown that learning eliminates the highest-frequency component; hence, in fitness landscapes where this component is significant, learning markedly accelerates the evolutionary process.

4.1.1 Introduction

Following the seminal work of Hinton and Nowlan [29], the past fifteen years have seen many investigations of the effects that learning may have on the evolutionary process [8, 22, 25, 29, 42, 47, 49, 51, 52] (see also Section 2.1). However, the superiority of hybrid processes combining global evolutionary search with learning over pure evolutionary search has not yet been clearly demonstrated [38]. Moreover, rigorous theoretical analysis of the effect of learning on evolution has been scarce, focusing on various simplified unimodal scenarios [4, 16] and failing to provide a general framework for studying such effects in the more biologically plausible scenario of rugged landscapes [14, 33, 41, 66]. This study is the first to rigorously analyze the influence of learning on evolution in given arbitrary one-dimensional landscapes, establishing the fundamental advantageous effect of learning on the convergence rate of the evolutionary process and identifying its sources.

4.1.2 Model and Analysis

We examine the dynamics of the evolutionary process in two modes: In the first, nonadaptive mode, learning is absent, and the fitness value $F(x)$ assigned to each genotypic configuration $x$ is determined according to the innate survival and reproduction probability of the phenotype that it encodes. In the second, adaptive mode, phenotypes employ learning during their lifetime and as a result may gain a higher effective fitness value $F_{\mathrm{eff}}(x)$. Learning hence manifests itself as a transformation of the fitness landscape, replacing the innate fitness that initially governed selection with an effective fitness landscape. The strength of this observation lies in the fact that the complex dynamics of a hybrid process combining evolution and learning can be studied by examining the simpler dynamics of a pure evolutionary process on the appropriate effective fitness landscape. In particular, the effect of learning on the evolutionary convergence rate can be measured by comparing the time it takes the evolutionary process to obtain near-optimal solutions using the effective vs. the innate fitness functions for selection.


Analyzing the dynamics of an evolutionary search is a difficult challenge that has attracted considerable attention in recent years [32]. Most efforts have focused on studying geometric properties of fitness landscapes, including multimodality [24], autocorrelation [62], and neutrality [31], in an attempt to predict the difficulty of the search task [59]. These measures, however, do not provide a direct estimate of the expected convergence time to the global optimum for a given landscape. To obtain a rigorous mathematical analysis of the dynamics of the evolutionary process with and without learning, we employ a canonical, one-dimensional model, representing evolution as a simple random walk (RW) process in a changing environment, where the probabilities $p_i$ (taking a $+1$ step) and $q_i = 1 - p_i$ (taking a $-1$ step) for each location $i$ are determined according to the fitness landscape in the immediate neighborhood of $i$. Specifically, assuming that the genetic configuration in the first generation is 0 and letting $N$ denote the location of the global optimum, the expected first-passage time from 0 to $N$, $E_0^N$, measures the convergence time of the evolutionary process and can be explicitly derived for any given landscape.

Consider a simple RW St (±1 increments) in changing environment on {0, 1, 2, . . . , N}. Letpi = P (St+1 = i + 1|St = i) and let qi = 1 − pi = P (St+1 = i − 1|St = i). Let p0 = 1 and assumethat 0 < pi < 1 for all 0 < i < N . As demonstrated in Appendix A, the expected first-passage timefrom 0 to N on a given landscape is

E^N_0 = \sum_{i=1}^{K+1} n_i^2 + 2 \sum_{\substack{i \le j \\ 0 < i,j < K+1}} n_i\, n_{j+1} \prod_{k=i}^{j} \rho_{x_k}

where ρ_i denotes the odds-ratio q_i/p_i, x_1 < x_2 < ... < x_K denote the indices for which ρ_{x_i} ≠ 1 (p_{x_i} ≠ 1/2) (define also x_0 = 0 and x_{K+1} = N), and n_i = x_i − x_{i−1} denote the corresponding increments. This expression can also be represented as the quadratic form E^N_0 = V′AV, where V = (n_1 n_2 ... n_{K+1})′ and

A = \begin{pmatrix}
1 & \rho_{x_1} & \rho_{x_1}\rho_{x_2} & \rho_{x_1}\rho_{x_2}\rho_{x_3} & \cdots & \prod_{k=1}^{K} \rho_{x_k} \\
\rho_{x_1} & 1 & \rho_{x_2} & \rho_{x_2}\rho_{x_3} & \cdots & \prod_{k=2}^{K} \rho_{x_k} \\
\rho_{x_1}\rho_{x_2} & \rho_{x_2} & 1 & \rho_{x_3} & \cdots & \prod_{k=3}^{K} \rho_{x_k} \\
\vdots & & & \ddots & & \vdots \\
\prod_{k=1}^{K} \rho_{x_k} & \prod_{k=2}^{K} \rho_{x_k} & \prod_{k=3}^{K} \rho_{x_k} & \cdots & \rho_{x_K} & 1
\end{pmatrix}.

Furthermore, defining the "drawdown" R as the largest descent throughout the fitness landscape, that is,

R = \max_{\substack{i \le j \\ 0 \le i,j < K+1}} \prod_{k=i}^{j} \rho_{x_k},

the expected convergence time of the evolutionary process, E^N_0, is sharply bounded from above by N^2(1+R)/2 (see Appendix A for a full analysis).
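These quantities are easy to check numerically. The sketch below (function names are ours, not the proposal's) computes E^N_0 through the standard equivalent one-step recursion T_j = 1/p_j + (q_j/p_j)T_{j−1}, where T_j is the expected time to first move from j to j+1, and computes the drawdown R as the largest product of odds-ratios over any interval:

```python
def expected_first_passage(p):
    """Expected first-passage time from 0 to N for a birth-death chain.

    p[i] is the probability of a +1 step at location i (p[0] must be 1).
    Uses the standard recursion T_j = 1/p_j + (q_j/p_j) * T_{j-1}, where
    T_j is the expected time to first reach j+1 from j; E^N_0 is the sum.
    """
    assert p[0] == 1.0
    total, T = 0.0, 0.0
    for pi in p:
        T = 1.0 / pi + ((1.0 - pi) / pi) * T
        total += T
    return total

def drawdown(p):
    """Largest product of odds-ratios rho_i = q_i/p_i over any interval,
    found with a Kadane-style scan (valid since every rho_i is positive).
    By convention R >= 1 (the empty descent)."""
    best = run = 1.0
    for pi in p[1:]:              # p[0] = 1 gives rho = 0; skip it
        rho = (1.0 - pi) / pi
        run = max(rho, run * rho)
        best = max(best, run)
    return best

# Unbiased walk on {0, ..., 10}: E^10_0 = N^2 = 100, and the
# upper bound N^2 (1 + R)/2 is attained exactly (R = 1).
p = [1.0] + [0.5] * 9
E, R = expected_first_passage(p), drawdown(p)
print(E, R, len(p) ** 2 * (1 + R) / 2)   # 100.0 1.0 100.0
```

For the unbiased walk the bound is tight, which is what makes it "sharp"; for downward-biased stretches R grows multiplicatively with the depth of the descent.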

The expressions for E^N_0 and R provide an accurate measure of the expected convergence time for any given one-dimensional landscape, allowing a direct quantitative comparison between the convergence times on the innate and the effective fitness landscapes. We further examine a simple model of learning, for which the resulting effective fitness landscape can be explicitly constructed and studied. Learning, as a form of phenotypic plasticity, is usually regarded as an iterative process of phenotypic modifications aimed at increasing the individual's effective fitness. As in previous studies [4, 29], we focus on the simple case where the phenotypic and genotypic spaces are similar and where learning and evolution both operate on the same fixed fitness landscape, allowing learning to take the form of an iterative local search process in the genotype/phenotype


Figure 2: The effect of learning on the fitness landscape. (a) An individual with genotype configuration x and innate fitness value F(x) may adapt by learning (illustrated here as a simple gradient ascent process) and gain a fitness value of F(x + ∆x). As the genotype of this individual remains unchanged, the effective fitness value F_efc(x) = F(x + ∆x) is applied to x. (b) The innate fitness function (solid line) and the effective functions obtained with partial learning, i.e., after a limited number of hill-climbing iterations (dotted line), and with IDLL (dashed line). As demonstrated, in IDLL all configurations in the basin of attraction of a given local optimum (e.g., genotypes x1 and x2 in the interval [B, D]) acquire the same effective fitness value, that of the local optimum (C).

space. During each learning episode (iteration), an individual compares the innate fitness value of its current configuration with those of slightly modified configurations, and adopts a modified configuration if its innate fitness value is higher (Figure 2a). Figuratively speaking, one can think of the configuration x as denoting a vital behavioral strategy, and of F(x) as denoting its expected benefit. During the first learning episode, an individual with innate configuration x_0 examines the outcomes of applying a slightly different behavioral strategy (x_0 + ε or x_0 − ε) and adopts this new behavior (e.g., configuration x_0 + ε) if it turns out to be more successful (F(x_0 + ε) > F(x_0)). Such learning iterations may repeat, allowing the individual to adopt behaviors further away from its innate one. We consider an Ideal Deterministic Local Learning (IDLL) model, where ε = 1 and where each individual repeatedly employs such deterministic hill-climbing learning iterations until it converges to the nearest local optimum. As demonstrated in Figure 2b, in IDLL all genetic configurations in the region forming the basin of attraction of a given local optimum will eventually acquire the same effective fitness value, equal to the innate fitness of the local optimum, totally suppressing selection pressures within each such region. Consequently, after IDLL, the resulting effective fitness landscape can be partitioned into piecewise flat regions induced by the local optima of the innate fitness function, making the regional boundaries the sole points in the landscape where genuine selection pressures are still in effect. IDLL thus transforms each given consecutive pair of descending and ascending intervals in the innate fitness landscape into a single point step in the effective fitness landscape, whose height is equal to the difference between the extents of this descent and the consequent ascent (Figure 2b). Hence, the drawdown characterizing the effective fitness landscape is, by definition, smaller than (or, in the worst case, equal to) that induced by the original, innate fitness landscape, making the beneficial effect of IDLL evident.
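Under the model's assumptions (a one-dimensional configuration space with ε = 1 learning steps), the IDLL transformation of a landscape can be computed directly. The sketch below (ours; a deterministic best-neighbour climb) also accepts a cap on the number of iterations, which corresponds to the partial-learning mode examined later in this section:

```python
def effective_fitness(F, max_steps=None):
    """Map an innate fitness landscape F (a list of values over the 1-D
    genotype space) to the effective landscape induced by local
    hill-climbing learning.

    max_steps=None gives IDLL: every genotype climbs to the nearest local
    optimum, so whole basins of attraction become flat. A finite
    max_steps gives the partial-learning variant."""
    n = len(F)
    Fefc = []
    for x in range(n):
        pos, steps = x, 0
        while max_steps is None or steps < max_steps:
            # examine the two neighbouring configurations (pos ± 1)
            candidates = [pos]
            if pos > 0:
                candidates.append(pos - 1)
            if pos < n - 1:
                candidates.append(pos + 1)
            best = max(candidates, key=lambda c: F[c])
            if best == pos:       # local optimum reached
                break
            pos, steps = best, steps + 1
        Fefc.append(F[pos])       # genotype keeps x, gains the fitness at pos
    return Fefc

# Two local optima: the basin [0..2] flattens to 5, the basin [3..5] to 9.
print(effective_fitness([5, 4, 1, 2, 9, 7]))   # [5, 5, 5, 9, 9, 9]
```

Note how the only point of the effective landscape where fitness still changes is the basin boundary between indices 2 and 3, exactly the "single point step" described above.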



Figure 3: The effects of IDLL as a function of the innate fitness landscape structure. The expected first-passage time E^N_0 has been calculated for a series of innate fitness functions with varying ruggedness and for the corresponding effective fitness functions obtained with IDLL. Each innate function is defined over the interval [1, 200] as a superposition of two Fourier components with varying coefficients and a fixed monotonically increasing function, F(x) = sin(πx/400) + a1 sin(πx/80) + a2 sin(πx/16). The results are drawn on a logarithmic color scale. (a) The expected first-passage time E^N_0 obtained for the innate fitness functions described above. a1 and a2 denote the coefficients of the low and high frequency components, respectively. (b) The expected first-passage time E^N_0 obtained for the corresponding effective fitness functions. (c) The improvement gained by IDLL, measured as the convergence time ratio between the values obtained in the innate and the effective fitness functions.

Examining this benefit of IDLL as a function of the innate landscape structure further illuminates its effects on the evolutionary convergence rate. A common approach to studying the structure of a given fitness function is to decompose it into a superposition of some functional basis, such as the Fourier series [30, 32, 57, 63]. Following this approach and using the analysis presented above, the expected convergence time E^N_0 and the drawdown R can be derived for innate fitness landscapes with a varying degree of ruggedness, defined as a superposition of two Fourier components (high and low frequencies) with varying coefficients. As demonstrated in Figure 3a, the expected convergence times derived in the innate fitness landscapes are clearly correlated with the coefficients of both frequencies, i.e., as either of these coefficients increases, so does the evolutionary convergence time. However, examining these measures in the corresponding effective fitness landscapes obtained with IDLL, the correlation with the high frequency coefficient vanishes (Figure 3b), because IDLL has successfully eliminated the ruggedness induced by the high frequency component, cancelling out its inhibitory effect. Evidently, being a local search operator, IDLL can remove only the significant component with the highest frequency in the Fourier decomposition of the fitness function, thus forming a low-pass filter mechanism capable of neutralizing local perturbations in the innate landscape. Consequently, the improvement gained by learning, measured as the ratio between E^N_0 in the innate landscape and E^N_0 in the corresponding effective landscape, increases with the coefficient of the high frequency component, markedly accelerating the evolutionary process in landscapes where this component is significant (Figure 3c). Only when the high frequency coefficient is relatively small and the convergence time of the evolutionary process is dominated by the low frequency component may IDLL also attenuate the low frequency ruggedness and the drawdown it induces (Figure 3c, bottom-right triangular area). It is worthwhile mentioning, though, that other, less limited learning strategies may successfully eliminate additional frequencies, further accelerating the evolutionary process. A good candidate for such a superior learning strategy is social learning,


where learners can evaluate a wide range of behaviors, including ones that differ considerably from their current behavior, by examining the success (fitness) of other members of the population.

4.1.3 Results

To validate the results obtained in Section 4.1.2, and to further explore various more complicated scenarios, a complementary numerical simulation study was performed. We first tested the simple hill-climbing model described previously. A one-dimensional innate fitness function was defined on the interval [1, 200] as a sum of several Gaussian functions, yielding a continuous, rugged function F(x) with several optima (Figure 4a). IDLL was then applied to produce the corresponding effective fitness function. The evolutionary process was simulated as a simple random walk, as described in Section 4.1.2. The individual's genetic configuration x in the first generation of each evolutionary trial was set to 1. Figure 4b shows the average innate fitness value of the evolving individual as a function of generation, allowing us to track the extent of the Baldwin effect, i.e., how closely the encoded solution approaches the optimal one [46]. Evidently, individuals evolving in the adaptive mode converge much faster to the global optimum and gain higher fitness values. The mean first hitting time of each genetic configuration is illustrated in Figure 4c. The curves clearly agree with the results of our analysis for both modes.
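The essentials of such a simulation can be sketched end to end. The Gaussian landscape parameters and, in particular, the mapping from the local fitness gradient to the step probability (a logistic function below) are our illustrative assumptions; the proposal only states that p_i is determined by the landscape near i:

```python
import math, random

def innate_F(x):
    # Rugged 1-D innate fitness: a sum of Gaussians (illustrative parameters).
    peaks = [(20, 5, 5.0), (70, 5, 8.0)]   # (center, width, height)
    return sum(h * math.exp(-((x - m) ** 2) / (2 * s * s)) for m, s, h in peaks)

def idll(vals):
    # Effective fitness after IDLL: each x climbs to its nearest local optimum.
    n, out = len(vals), []
    for x in range(n):
        pos = x
        while True:
            nb = [c for c in (pos - 1, pos, pos + 1) if 0 <= c < n]
            best = max(nb, key=lambda c: vals[c])
            if best == pos:
                break
            pos = best
        out.append(vals[pos])
    return out

def first_hit(vals, target, rng, beta=1.0, max_t=200_000):
    # Random walk whose +1 step probability is a logistic function of the
    # local fitness gradient (our assumed mapping), with reflection at 0
    # (p_0 = 1) and at the right edge.
    n, x, t = len(vals), 0, 0
    while x != target and t < max_t:
        if x == 0:
            x = 1
        elif x == n - 1:
            x = n - 2
        else:
            up = 1.0 / (1.0 + math.exp(-beta * (vals[x + 1] - vals[x - 1])))
            x += 1 if rng.random() < up else -1
        t += 1
    return t

N, trials = 80, 15
innate = [innate_F(x) for x in range(N)]
effective = idll(innate)
target = max(range(N), key=lambda x: innate[x])   # global optimum (x = 70)
rng = random.Random(42)
t_innate = sum(first_hit(innate, target, rng) for _ in range(trials)) / trials
t_eff = sum(first_hit(effective, target, rng) for _ in range(trials)) / trials
print(t_eff < t_innate)
```

The innate walk is trapped behind the suboptimal peak (a large drawdown), while on the IDLL-flattened landscape the walk diffuses freely to the global basin, so the mean first hitting time drops by orders of magnitude.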


Figure 4: The effect of the IDLL learning scheme on the evolutionary process in the one-dimensional case. (a) The innate and effective fitness functions. (b) The mean innate fitness value as a function of evolutionary generation. The standard deviation of the main simulations is also illustrated. Each curve represents the average result of 100 simulation runs. (c) Mean first hitting time as a function of the genetic configuration x.

When learning "resources" are limited and individuals employ only a limited number of hill-climbing iterations, a partial adaptive mode is obtained instead of IDLL. In this mode, not all genetic configurations in the basin of attraction of each local optimum will inevitably gain the same effective fitness value. Consequently, the effective fitness landscape forms an intermediate state between the adaptive and nonadaptive modes, including both intervals of constant fitness and intervals with positive or negative slopes (see Figure 4a). As expected, this mode yields an intermediate convergence time, progressing more slowly than the adaptive mode, but still faster than the nonadaptive one (Figure 4c). Interestingly, as this form of learning does not entirely suppress the selection pressures in each optimum domain, it allows individuals that did hit the global optimum basin of attraction to converge closer to the exact global optimum configuration, resulting in overall better average innate fitness values than those obtained using IDLL (Figure 4b).


The effect of learning on evolution was also examined in a two-dimensional landscape (Figure 5) and in evolving populations (not detailed here), showing that the markedly beneficial effect of learning found in the simple model also applies to these more complex scenarios.

Both the IDLL and the partial learning schemes examined in this study embody two basic characteristics: locality and accuracy. However, the lack of complete environmental data, sensory input noise, imperfect information processing, and nondeterministic decision making all make a stochastic learning process more plausible as a model of learning in biology. We examined the effect of stochastic learning schemes on evolution using a simple variation of our model, where the hill-climbing learning algorithm used in the initial simulation is replaced with a simulated annealing (SA) optimization process [37]. Our results (not depicted here) show that even under a stochastic learning paradigm, adaptive individuals gain significantly higher innate fitness values than those evolving in the nonadaptive mode. Furthermore, when compared with the convergence rate obtained under a deterministic learning scheme, the results clearly demonstrate that stochastically adapting individuals not only converge much faster to the global optimum configuration than nonadaptive individuals, but in fact outperform those evolving under the IDLL scheme.
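A minimal sketch of such an SA-style stochastic learner follows; the cooling schedule and parameters are our illustrative choices, not the proposal's:

```python
import math, random

def sa_learn(F, x0, n, iters=200, t0=1.0, cooling=0.95, rng=None):
    """Stochastic lifetime learning: a simulated-annealing walk over the
    1-D configuration space {0, ..., n-1}.

    Worse configurations are accepted with probability exp(dF / T),
    letting the learner escape local optima; at t0 = 0 this degenerates
    to a greedy (deterministic) hill climb."""
    rng = rng or random.Random(0)
    x, T = x0, t0
    for _ in range(iters):
        cand = x + rng.choice((-1, 1))
        if 0 <= cand < n:
            dF = F(cand) - F(x)
            if dF >= 0 or (T > 0 and rng.random() < math.exp(dF / T)):
                x = cand
        T *= cooling
    return x          # learned configuration; Fefc(x0) = F(x)
```

The effective fitness induced by such a learner is itself a random variable, which is exactly what breaks the strict basin-flattening of IDLL and, as noted above, can speed convergence further.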

As demonstrated, different learning schemes yield different dynamics of the evolutionary process and result in different convergence rates. The number of learning iterations employed during life (which we will refer to as the learning rate) may also influence the convergence rate and the stability of the evolutionary process, as was demonstrated by the favorable effects of the partial learning scheme. To further explore and compare the effect of these learning schemes, and in particular the effect of varying learning rates, an additional set of simulations was carried out.

Two measures were examined for each scenario: the overall convergence time, which was taken as the first hitting time of the global optimum (x = 193), and the genetic stability of the evolving individuals, which was measured as the average genetic deviation from the global optimum configuration throughout the 1000 generations following the first hitting time. As shown in Figure 6a, the best convergence time for deterministic learning is obtained after 10 learning iterations. In fact, the convergence time in this scenario is shorter than the one obtained with IDLL. There is also a clear tradeoff between the convergence time and the genetic stability of the resulting evolutionary


Figure 5: The effect of IDLL on the evolutionary process in the two-dimensional case. (a) The innate fitness function F2(x, y). (b) The effective fitness function F2-efc(x, y) generated by the two-dimensional IDLL scheme. The solid lines in (a) and (b) represent a single random walk path over 50,000 generations in each mode. (c) The average and standard deviation of the innate fitness value as a function of generation. Each curve represents the average result of 100 simulation runs.



Figure 6: The effect of learning on evolution for varying learning rates. (a-b) The average convergence rate and genetic stability obtained for various deterministic and stochastic learning schemes. Most simulation runs using fewer than 6 deterministic learning iterations did not converge to the global optimum.

process. Figure 6b illustrates the results for stochastic learning schemes with a varying number of learning iterations. Evidently, a low number of stochastic learning iterations actually results in faster convergence rates than those obtained with deterministic schemes, while still yielding relatively stable genetic solutions. Only when the learning process employs a considerably large number of iterations does it diverge from the original structure of the innate fitness function. In summary, learning may significantly enhance evolution not only by using 'ideal' learning (IDLL), but also by employing partial and stochastic learning strategies.

4.1.4 Summary

Our study focuses on the effects of individual learning on the genetic evolutionary process in stationary environments. It is only when considering a more realistic fitness landscape, one that includes multiple local optima, that the true benefits of learning emerge. We show that in rugged fitness landscapes, introducing the adaptive mechanism of learning can significantly accelerate the convergence rate of evolution. We further find that a stochastic learning scheme, using relatively few learning iterations, results in the fastest convergence rate while still maintaining stable genetic solutions. An interesting extension of these findings would be to discover the structure of the optimal learning scheme, that is, the learning scheme which produces an effective fitness landscape that induces the fastest convergence rate while maintaining genetic stability. One natural method to embark upon this quest is to encode the learning scheme parameters as part of the individual's genotype, allowing the evolutionary process to reveal the optimal scheme. Presumably, phenotypic plasticity may have evolved in nature in an analogous way.

We find that applying a large number of stochastic learning iterations may break the correlation between the effective and innate fitness functions, leading to an unstable genetic evolution. Fortunately, nature imposes one fundamental constraint on any learning scheme - a limited lifespan. Even a stochastic learning algorithm requires relatively many iterations to discover remote optima. It is worthwhile mentioning that there is one learning paradigm that does break away from


the strict constraint of limited learning time. In social learning, accumulated acquired knowledge percolates across generations via vertical cultural transmission, building up from one generation to the next. A learning scheme based on cultural transmission may thus result in a total suppression of the genetic selection pressure, holding the genetic evolutionary process on a leash [10].

The cornerstone of our model is the observation that deterministic learning manifests itself as a transformation of the fitness landscape, replacing the innate fitness function that initially governed selection with the effective fitness function. Explicitly constructing the effective fitness landscape induced by each learning scheme, and comparing the expected convergence rate of the evolutionary process on the innate and effective fitness landscapes, facilitates the analysis of the effect of learning on evolution. Furthermore, studying the dynamics of the evolutionary process on such effective fitness landscapes, we may resort to well-established mathematical theories (e.g., random walk theory) to derive rigorous quantitative results. As demonstrated in this study, these analytical results can be further validated and extended using numerical simulations.

Our study focuses on the simple case where the phenotypic and genotypic landscapes are fixed and similar, and where learning is cost-free. Partially correlated landscapes and learning costs may clearly alter the dynamics of the interplay between learning and evolution [45]. Obviously, the simplicity of our model, and especially the fully correlated genotypic and phenotypic spaces it assumes, induces some inherent limitations, a claim that was also made about Hinton and Nowlan's model. We acknowledge these limitations; however, following in the footsteps of Hinton and Nowlan, we believe that the simplicity of the model outweighs its limitations. In particular, such fully correlated landscapes allow us to explicitly construct and examine the resulting effective fitness function and to derive a rigorous, quantitative analysis of the expected convergence rate. Furthermore, although real-life genotypic and phenotypic landscapes are not likely to be similar, it is fair to assume they are nonetheless at least partially correlated. Quantitative analysis of complex phenomena often requires simple models. Such models provide valuable insights and allow a thorough analytical study, proving to be a powerful and profound research tool. In particular, the model presented here can serve as a theoretical and experimental basis for future studies of the interplay between learning and evolution.


4.2 Enhancing Autonomous Agents Evolution with Learning by Imitation

In this study, we focus on the effects that imitation may have on the genetic evolutionary process, starting with the most basic question: Can imitation enhance the evolution of autonomous agents (in the absence of vertical transmission), in a manner analogous to the results previously shown for supervised learning, and how? We present a set of simulations, where lifetime learning by imitation was used to adapt individuals that go through an evolutionary process. The results are compared with those of a simple evolutionary process, where no lifetime learning is employed, and with those of an evolutionary process that employs conventional supervised learning.

4.2.1 Introduction

The numerous studies cited in Section 2.2 fall into two main categories: studies of the interaction between learning and evolution, and studies of various learning by imitation applications in artificial systems. Our goal is to merge these two approaches and to put forward a novel framework for studying learning by imitation within the scope of the interaction between learning and evolution. We wish to explore learning by imitation as an alternative to conventional supervised learning and to apply it as a tool to enhance genetic evolution. We will label this framework imitation enhanced evolution (IEE). The motivation for using learning by imitation to enhance evolution is clear. While oracles or other forms of supervised training data are scarce in agent environments, learning by imitation is still a valid option, using other members of the population as teachers. It is important to realize, though, that in contradistinction to the studies cited in Section 2.2, our framework does not employ cultural evolution. Following in the footsteps of the studies of the interaction between learning and evolution cited above, we thus avoid any form of acquired-knowledge transfer between generations, either genetically or culturally. In terms of cultural transmission (see [13] for a detailed definition), we allow horizontal transmission alone (where individuals of the same generation imitate each other) and exclude any form of vertical transmission (where members of the current generation transmit their knowledge to members of the next generation). We wish to further generalize this framework and study the effects of learning by imitation in a more realistic scenario of autonomous agent evolution (see [58] for a general review).

4.2.2 The Model

A haploid population of agents evolves to solve various tasks. Each agent's neurocontroller is a simple feed-forward (FF) neural network [27]. The initial weights of the network synapses are coded directly into the agent's genome (the network topology is static throughout the process). The initial population is composed of 100 individuals, each assigned randomly selected connection weights from the interval [-1, 1]. The innate fitness of each individual is determined according to its ability to solve the specific task upon birth. Within the pure evolutionary process, the innate fitness determines the reproductive probability of this individual. Each new generation is created by randomly selecting the best agents from the previous generation according to their innate fitness and allowing them to reproduce [50].
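The evolutionary loop can be sketched as follows. The proposal does not detail its variation operators, so the truncation selection and Gaussian mutation below are illustrative stand-ins (with a smaller population), and the toy fitness function merely exercises the loop:

```python
import random

def evolve(fitness, genome_len, pop_size=20, generations=50,
           elite=5, sigma=0.1, rng=None):
    """Sketch of the evolutionary loop: direct real-valued encoding of the
    synaptic weights, truncation selection on innate fitness, and Gaussian
    mutation (the variation operators are our illustrative choices)."""
    rng = rng or random.Random(0)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]                      # best individuals reproduce
        pop = parents + [
            [g + rng.gauss(0, sigma) for g in rng.choice(parents)]
            for _ in range(pop_size - elite)
        ]
    return max(pop, key=fitness)

# Toy innate fitness: closeness of the genome to a fixed target vector.
target = [0.5, -0.3, 0.8]
fit = lambda g: -sum((a - b) ** 2 for a, b in zip(g, target))
best = evolve(fit, 3)
print(fit(best))
```

In the actual model the genome would decode into the network weights and `fitness` would evaluate the agent on its task; keeping the elite unchanged makes the best innate fitness non-decreasing across generations.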

When conventional supervised learning is applicable (i.e., an explicit oracle can be found), we also examined the effect of supervised learning on the evolutionary process. Each individual in the population goes through a lifetime learning phase where the agent employs a back-propagation


algorithm [27], using the explicit oracle as a teacher. Its fitness is then reevaluated to determine its acquired fitness (i.e., its fitness level after learning takes place). This fitness value is then used to select the agents that will produce the next generation.

In the IEE paradigm, agents do not use conventional supervised learning, but rather employ learning by imitation. In every new generation of agents created by the evolutionary process, each agent in the population selects one of the other members of the population as an imitation model (teacher). Teachers are selected according to their innate fitness (i.e., their initial fitness levels before learning takes place). The agent employs a back-propagation algorithm, using the teacher's output for each input pattern as the target output, mimicking a supervised learning mode. As stated above, acquired knowledge does not percolate across generations. Each time a new generation is produced, all lifetime adaptations possessed by the members of the previous generation are lost. Newborn agents inherit only the genome of their parents, which does not encode the acquired network adaptations that took place during the parent's lifetime.
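One such imitation phase can be sketched with a small numpy network; the layer sizes, tanh activations, and learning rate are our illustrative choices:

```python
import numpy as np

def forward(w1, w2, x):
    """One hidden layer with tanh units -- a stand-in for the proposal's
    feed-forward neurocontroller [27]."""
    h = np.tanh(x @ w1)
    return np.tanh(h @ w2), h

def imitate(student, teacher, inputs, lr=0.05, epochs=200):
    """Lifetime learning by imitation: the teacher's output on each input
    pattern serves as the back-propagation target for the student,
    mimicking a supervised learning mode."""
    w1, w2 = student
    targets, _ = forward(*teacher, inputs)
    for _ in range(epochs):
        out, h = forward(w1, w2, inputs)
        err = (out - targets) / len(inputs)       # dLoss/dout (mean MSE)
        d_out = err * (1 - out ** 2)              # tanh derivative
        d_h = (d_out @ w2.T) * (1 - h ** 2)
        w2 -= lr * h.T @ d_out                    # in-place weight updates
        w1 -= lr * inputs.T @ d_h
    return w1, w2

rng = np.random.default_rng(0)
inputs = rng.uniform(-1, 1, (16, 3))
student = (rng.uniform(-1, 1, (3, 4)), rng.uniform(-1, 1, (4, 1)))
teacher = (rng.uniform(-1, 1, (3, 4)), rng.uniform(-1, 1, (4, 1)))
before = np.mean((forward(*student, inputs)[0] - forward(*teacher, inputs)[0]) ** 2)
imitate(student, teacher, inputs)
after = np.mean((forward(*student, inputs)[0] - forward(*teacher, inputs)[0]) ** 2)
print(after < before)
```

Only the student's run-time weights change; its genome, and therefore what its offspring inherit, is untouched, which is precisely how acquired adaptations are kept from percolating across generations.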

4.2.3 The Tasks

The model described in the previous section was tested on three different tasks. The first two are standard classification benchmark problems. The third is an agent-related task used in previous studies of the interaction between learning and evolution.

1. The Parity Problem: The agents evolved to solve the five-bit parity problem.

2. The Triangle Classification Problem: A simple two-dimensional geometrical classification problem was used in this task. The network receives as input a point from the unit square and should determine whether it falls within the boundaries of a predefined triangle.

3. Foraging: The task in this simulation is similar to the one described by Nolfi et al. [51]. An agent is placed on a two-dimensional grid-world. A number of food objects are randomly distributed in the environment. As its sensory input, the agent receives the angle (relative to its current orientation) and distance to the nearest food object. The agent's output determines one of four possible actions: turn 90 degrees left, turn 90 degrees right, move forward one cell, or do nothing (stay). If the agent encounters a food object while navigating the environment, it consumes the food object. The agent's fitness is the number of food objects consumed during its lifetime. In this task, unlike the previous ones, there is no explicit oracle we can use to train the agent. Nolfi et al. [51] used available data to train the agent on the task of predicting the next sensory input; in our model, however, we can still use the same mechanism of learning by imitation to train the agent on the original evolutionary task, using the best individuals in the population as teachers.
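The foraging environment itself can be sketched as a minimal grid-world; the sensor encoding details below are our illustrative choices, following the description above:

```python
import math

class Foraging:
    """Minimal grid-world for the foraging task: the agent senses the angle
    (relative to its heading) and distance to the nearest food object and
    acts with one of four actions."""
    HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # N, E, S, W

    def __init__(self, food, pos=(0, 0), heading=0):
        self.food = set(food)
        self.pos, self.heading = pos, heading
        self.eaten = 0            # fitness = food consumed during lifetime

    def sense(self):
        """Angle (radians, relative to heading) and distance to nearest food."""
        x, y = self.pos
        fx, fy = min(self.food, key=lambda f: math.hypot(f[0] - x, f[1] - y))
        dist = math.hypot(fx - x, fy - y)
        hx, hy = self.HEADINGS[self.heading]
        angle = math.atan2(fy - y, fx - x) - math.atan2(hy, hx)
        return angle, dist

    def step(self, action):
        """0: turn 90° left, 1: turn 90° right, 2: move forward, 3: stay."""
        if action == 0:
            self.heading = (self.heading - 1) % 4
        elif action == 1:
            self.heading = (self.heading + 1) % 4
        elif action == 2:
            dx, dy = self.HEADINGS[self.heading]
            self.pos = (self.pos[0] + dx, self.pos[1] + dy)
            if self.pos in self.food:
                self.food.discard(self.pos)
                self.eaten += 1
```

In the full model, `sense()` would feed the agent's neurocontroller and the network's output would select the action passed to `step()`.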

4.2.4 Results

We first studied IEE in the two classification tasks, where conventional supervised learning can still be applied. In these tasks we were able to compare the effects that both lifetime adaptation mechanisms (i.e., learning and imitation) have on the evolutionary process. The results clearly validate that the IEE model consistently yields an improved evolutionary process. The innate fitness


of the best individuals in populations generated by applying learning by imitation is significantly higher than that produced by standard evolution.

Figure 7a illustrates the innate performance of the best agent as a function of generation, in populations evolved to solve the triangle classification problem. The results of a simple evolutionary process (dashed line) and of an evolutionary process that employs conventional supervised learning (dotted line) are compared with those of an evolutionary process that employs learning by imitation (solid line), clearly demonstrating that applying either of the learning paradigms yields better performing agents than those generated by a simple evolutionary process. When facing the 5-bit parity task (not depicted here), the effect of applying lifetime adaptation is even more surprising. While simulations applying the IEE model still outperform the simple evolutionary process, using conventional supervised learning actually results in a significant decrease in performance.


Figure 7: The effect of learning by imitation on evolution. (a) The triangle classification task: the innate fitness of the best individual in the population as a function of generation. (b) The foraging task: the average innate fitness of the population as a function of generation. The results of a simple evolutionary process are compared with those of simulations that employed lifetime imitation with two distinct adaptation forces.

Evidently, learning by imitation is sufficient (if not superior) to enhance the evolutionary process in the same manner that was previously shown for conventional supervised learning. We then turned to use IEE to enhance evolution where explicit training data is not available. This is the case in the foraging task (Figure 7b). The average innate fitness of the population in a simple evolutionary process is compared with the average innate fitness of populations that applied learning by imitation. Autonomous agents produced by our model demonstrate better performance than those generated by the simple evolutionary process; that is, their innate capacity to find food in the environment is superior. We also examined the effect of employing different adaptation forces. The results illustrated in Figure 7b demonstrate that a higher adaptation force (i.e., a higher number of iterations in each imitation phase) further improves the performance of the resulting agents.

To further explore the effects of lifetime imitation on evolution, we examined the improvement in fitness during lifetime as a function of generation. Our results demonstrate that in very early stages of the evolutionary process, the best agents in the population already possess enough knowledge to improve the fitness of agents that imitate them. In fact, the contribution of imitative learning decreases as the evolutionary process proceeds, probably due to population convergence to high-performance solutions. An additional observation on the interaction between lifetime adaptation and evolution can be obtained from examining the diversity of the population throughout the evolutionary process. We find that throughout most of the evolutionary process, the diversity found in populations subject to lifetime adaptation by imitation is higher than the diversity of populations undergoing a simple evolutionary process. Allowing members of the population to improve their fitness through lifetime adaptation before natural selection takes place facilitates the survival of suboptimal individuals and helps to maintain a diversified population. This feature can partly account for the benefit gained by applying lifetime adaptation to agent evolution.
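The diversity comparison above requires some population-level measure. One simple candidate (the exact measure used in the simulations is not specified here) is the mean pairwise Hamming distance between genotypes:

```python
from itertools import combinations

def population_diversity(population):
    """Mean pairwise Hamming distance between genotypes, normalized to [0, 1].

    A simple diversity proxy; 0 means all genotypes identical, 1 means every
    pair differs at every locus.
    """
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs)
    return total / (len(pairs) * len(population[0]))
```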

4.2.5 Summary

This study demonstrates how learning by imitation can be applied to an evolutionary process of a population of agents, utilizing the knowledge possessed by members of the population. We show that introducing the adaptive mechanism of lifetime learning by imitation can significantly enhance the evolutionary process, resulting in better performing agents. This paradigm is particularly useful in evolutionary simulations of autonomous agents, when conventional supervised learning is not possible. Our IEE model proves to be a powerful tool that can successfully enhance evolutionary computation simulations in agents.


5 Future Research Plan

5.1 The Interplay between Imitation and Evolution – Extending Previous Work

5.1.1 Construct a Framework to Analyze the Effect of Learning on Evolution

We will continue the work described in Section 4.1, extending the mathematical model to further analyze the effect of learning on evolution in more complex scenarios. We wish to relax the inherent assumptions incorporated in the current model (e.g., similar genotypic and phenotypic landscapes, a static environment) and to validate and generalize the results we obtained on a variety of different fitness landscapes.

5.1.2 Examine the Effect of Learning by Imitation on Evolution

The beneficial effects of horizontal transmission (imitation within the same generation) were already demonstrated using a simulation study in Section 4.2. We intend to extend and apply the model and analysis tools presented in Section 4.1 to provide a rigorous analysis of the effect of imitation on evolution. Clearly, to examine the effect of imitation, our analysis must incorporate a model of the evolving population and the knowledge it possesses. Additionally, using our model, we wish to examine the effect of vertical cultural transmission on evolution. This form of social learning, where acquired knowledge percolates across generations, may in effect be equivalent to individual learning with a considerably long learning period. The results obtained in Section 4.1.3 suggest that this scheme may yield an uncorrelated effective fitness landscape and hinder the genetic evolutionary process.

5.2 The Emergence of Imitative Behavior – Work in Progress

5.2.1 Study the Emergence of Imitative Behavior in EAA

We intend to use evolutionary autonomous agent simulations, wherein agents can learn about their environment and adapt their behavior to it, using information from both static environmental input and the behavior of other successful agents.

The studies presented in Section 4 assume that the agents' ability and incentive to learn/imitate is instinctive. Quoting Billard and Dautenhahn [9], "our experiments address learning by imitation instead of learning to imitate". We wish to demonstrate that such imitative behavior patterns can spontaneously emerge under certain conditions. Coding various parameters of the imitation algorithm (e.g., learning rate, model selection scheme) into the genomes of the evolving individuals can serve as a first step, showing that under favorable conditions, imitation will increase and prevail. However, to examine the neuronal mechanisms that support imitative behavior and to demonstrate a true emergence phenomenon, a different approach should be employed. We thus intend to use an evolving population of adaptive agents, where the genotype of each individual encodes a Hebbian learning rule for each synapse, but not the synaptic weight itself [19]. We let these synapses adapt their weights online, during life, using the specified learning rules. Such agents can thus rely less on fixed genetically-inherited invariants and must develop a learning scheme that will allow them to achieve the task.
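The genotype-encodes-the-rule scheme can be sketched as follows. The rule set and parameters here are illustrative placeholders, not the exact Hebbian variants of [19]; the key point is that the rule and learning rate are inherited while the weight itself is shaped only during the agent's lifetime.

```python
import random

# Candidate plasticity rules, chosen per-synapse by the genome.
# x = presynaptic activity, y = postsynaptic activity, w = current weight.
RULE_SET = {
    "hebb":       lambda w, x, y: x * y,                  # strengthen on co-activation
    "anti_hebb":  lambda w, x, y: -x * y,                 # weaken on co-activation
    "covariance": lambda w, x, y: (x - 0.5) * (y - 0.5),  # sign depends on joint activity
    "decay":      lambda w, x, y: -w,                     # passive forgetting
}

class AdaptiveSynapse:
    def __init__(self, rule_name, learning_rate):
        self.rule = RULE_SET[rule_name]     # genetically inherited
        self.eta = learning_rate            # genetically inherited
        self.w = random.uniform(-0.1, 0.1)  # NOT inherited: shaped during life

    def update(self, pre, post):
        # Online weight change, applied at every time step of the agent's life.
        self.w += self.eta * self.rule(self.w, pre, post)
        self.w = max(-1.0, min(1.0, self.w))  # keep weights bounded

# A genome is then a list of (rule_name, learning_rate) pairs, one per synapse:
genome = [("hebb", 0.1), ("covariance", 0.05), ("decay", 0.01)]
synapses = [AdaptiveSynapse(rule, eta) for rule, eta in genome]
```

Mutation and crossover operate on the `(rule_name, learning_rate)` pairs, never on `w`, so what evolves is a learning scheme rather than a fixed behavior.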

Figure 8: The experimental setup for the study of the emergence of imitative behavior patterns. (a) The adaptive agents' neurocontroller. (b) Retinal representation of several model actions.

These adaptive autonomous agents are then placed in an artificial environment that promotes the emergence of imitation. Obviously, there is no sense in imitating another's actions if you do not know the context in which these actions were made. Evolving imitative agents thus requires two types of sensory input: actions and contexts. The agents in our simulation inhabit a world that can be in one of several world states in each time step (Figure 8a). These states can represent the presence of certain food items (e.g., banana, ants, nuts) and form the context in which actions are observed and performed. Occasionally, there is also a model (teacher) present in the environment, performing a certain action that is appropriate to the current world state (e.g., using a stone hammer to crack the nut). The sensory input of the agent includes an indication of the world state and a retinal representation of the model's action (Figure 8b); however, the visibility of either the world state or the model's action is stochastic, and with certain probabilities one of them may be hidden. The agent's output represents the motor action it selects (e.g., pick hammer), and the agent's fitness increases whenever the selected action is appropriate for the current world state (assuming the world state is visible). There is, however, an additional twist in this model: to prevent the agent from developing a simple instinctive behavior, the appropriate action assigned to each world state is randomly selected in every generation. In these settings, evolving agents can demonstrate successful behavior only if they develop an imitation strategy in which the appropriate action for each world state is learned from observing the model's action. It is important to note that in order to do so, agents must first develop some sort of innate internal representation, indicating that a certain observed retinal input corresponds to a certain motor action (for example, observation of the model's action when picking a stone corresponds to the activation of the agent's motors when doing the same). An initial version of this simulation environment has already been implemented, and the preliminary results are promising.
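A single round of this environment can be sketched as follows. The world states, actions, visibility probabilities, and the hand-written `copycat` policy are all hypothetical placeholders: in the actual simulation the policy is an evolved, adaptive neurocontroller and the action input is retinal, not symbolic.

```python
import random

WORLD_STATES = ["banana", "ants", "nut"]           # hypothetical contexts
ACTIONS = ["peel", "probe_stick", "stone_hammer"]  # hypothetical motor actions

def new_generation_mapping():
    # The appropriate action per world state is reshuffled every generation,
    # so a fixed instinctive policy cannot succeed across generations.
    shuffled = ACTIONS[:]
    random.shuffle(shuffled)
    return dict(zip(WORLD_STATES, shuffled))

def trial(agent_policy, mapping, p_state_visible=0.8, p_model_visible=0.8):
    state = random.choice(WORLD_STATES)
    # Either input may be stochastically hidden from the agent.
    state_input = state if random.random() < p_state_visible else None
    model_action = mapping[state] if random.random() < p_model_visible else None
    chosen = agent_policy(state_input, model_action)
    # Fitness credit is given only when the world state is visible.
    return int(state_input is not None and chosen == mapping[state])

# A trivially successful "agent": pure imitation whenever the model is visible.
def copycat(state_input, model_action):
    return model_action if model_action is not None else random.choice(ACTIONS)
```

A successful evolved agent must do better than `copycat`: it has to remember, from trials where both inputs were visible, which action belongs to which context, so that it can still act correctly when the model is absent.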

Assuming such imitative behavior patterns emerge, we shall examine the resulting agents' neurocontrollers from a neuroscience perspective and compare the underlying mechanisms with those found in humans and primates. In particular, we wish to explore whether the phenomenon of mirror neurons has emerged. We intend to apply various methodologies to analyze these networks, using tools that have already been developed in previous studies (e.g., FCA, MSA), and constructing new analytical tools when appropriate.


5.2.2 Enhance the EAAs’ Experimental Setup, Studying More Complex Scenarios

To examine a more biologically plausible scenario, and to study the effects of additional environmental parameters on the emergence of imitative behavior, the experimental setup described in Section 5.2.1 can be further extended. This includes:

• Enriching the agents' sensory input, adding information about the fitness of the observed model and the success of each observed action.

• Allowing the agent to choose between several teachers with varying success levels.

• Extending the visual sensory input, modelling a wider range of actions. In particular, we are interested in examining how the following scenarios may affect the resulting internal action representation.

– Alternative actions: Where there is more than one action appropriate for each world state. This scenario can also represent a case where the same action is observed from several viewpoints.

– Correlated actions: Where actions are divided into several categories (or hierarchies), each corresponding to a set of related world states. For example, certain actions may be combinations of other actions.

– Noisy visual input: Where stochastic “white” noise is added to the observed action.

5.2.3 Obtain Testable Predictions Regarding Human/Nonhuman Imitation

Assuming the EAA paradigm provides a vehicle to study information representation and processing in neural networks, our model can be used to obtain testable predictions about the neuronal mechanisms involved in imitative behavior. Such predictions may include:

(a) The set of sufficient conditions under which imitation emerges.

(b) The emergence of internal action representations and the resulting representation structure (e.g., hierarchy and categorization) as a function of the imitation process. For example, how does the set of observed actions and contexts determine the internal representation of these actions?

(c) The interdependencies between imitation, and particularly the performance of observed actions, and the development of internal action representations. For example, how will the representation of actions that cannot be performed by the observer (e.g., due to a different embodiment) differ from that of imitated actions?

(d) The correlation between imitative behavior and the success of the observed model and action. For example, how do the model's perceived fitness and the success of the observed action influence the observer's incentive to imitate?

We wish to collaborate with researchers in neuroscience and psychobiology to test these predictions and to gain new understanding of imitation in humans and animals that can be further examined by our model.


References

[1] D.H. Ackley and M.L. Littman. Altruism in the evolution of communication. In R.A. Brooks and P. Maes, editors, Artificial Life IV: Proceedings of the International Workshop on the Synthesis and Simulation of Living Systems. MIT Press, 1994.

[2] C. Adami. Introduction to Artificial Life. Springer-Verlag, New York, NY, 1998.

[3] R. Aharonov-Barki, T. Beker, and E. Ruppin. Emergence of memory-driven command neurons in evolved artificial agents. Neural Computation, 13:691–716, 2001.

[4] R.W. Anderson. Learning and evolution: A quantitative genetics approach. Journal of Theoretical Biology, 175:89–101, 1995.

[5] M. Arbib. The mirror system, imitation, and the evolution of language. In Chrystopher Nehaniv and Kerstin Dautenhahn, editors, Imitation in Animals and Artifacts. The MIT Press, 2001.

[6] J.M. Baldwin. A new factor in evolution. American Naturalist, 30:441–451, 1896.

[7] J.A. Bargh. The automaticity of everyday life. In R.S. Wyer, Jr., editor, The automaticity of everyday life: Advances in social cognition, pages 1–61. Erlbaum, Mahwah, NJ, 1997.

[8] R.K. Belew. Evolution, learning, and culture: computational metaphors for adaptive algorithms. Complex Systems, 4:11–49, 1990.

[9] A. Billard and K. Dautenhahn. Experiments in learning by imitation: grounding and use of communication in robotic agents. Adaptive Behavior, 7(3/4):411–434, 1999.

[10] S. Blackmore. The meme machine. Oxford University Press, Oxford, UK, 1999.

[11] E. Borenstein, I. Meilijson, and E. Ruppin. Learning decreases evolutionary bottlenecks and accelerates evolution. Preprint, 2003.

[12] E. Borenstein and E. Ruppin. Enhancing autonomous agents evolution with learning by imitation. Journal of Artificial Intelligence and the Simulation of Behavior, 1(4), 2003.

[13] R. Boyd and P.J. Richerson. Culture and the evolutionary process. The University of Chicago Press, Chicago, 1985.

[14] C.L. Burch and L. Chao. Evolution by small steps and rugged landscapes in the RNA virus φ6. Genetics, 151:921–927, 1999.

[15] L.L. Cavalli-Sforza and M.W. Feldman. Cultural transmission and evolution: a quantitative approach. Princeton University Press, 1981.

[16] H. Dopazo, M.B. Gordon, R. Perazzo, and S. Risau-Gusman. A model for the interaction of learning and evolution. Bulletin of Mathematical Biology, 63:117–134, 2001.

[17] M.W. Feldman and K.N. Laland. Gene-culture coevolutionary theory. Trends in Ecology and Evolution, 11(11):453–457, 1996.

[18] D. Floreano and F. Mondada. Evolution of plastic neurocontrollers for situated agents. In P. Maes, M. Mataric, J.A. Meyer, J. Pollack, and S. Wilson, editors, From Animals to Animats, volume IV, Cambridge, MA, 1996. MIT Press.

[19] D. Floreano and J. Urzelai. Evolutionary robots with on-line self-organization and behavioral fitness. Neural Networks, 13:431–443, 2000.

[20] D.B. Fogel. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ, 1995.

[21] J.F. Fontanari and R. Meir. The effect of learning on the evolution of asexual populations. Complex Systems, 4:401–414, 1990.

[22] R.M. French and A. Messinger. Genes, phenes and the Baldwin effect: Learning and evolution in a simulated population. In R.A. Brooks and P. Maes, editors, Artificial Life IV, pages 277–282. MIT Press, 1994.

[23] V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti. Action recognition in the premotor cortex. Brain, 119:593–609, 1996.

[24] D.E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, Reading, MA, 1989.

[25] F. Gruau and D. Whitley. Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Journal of Evolutionary Computation, 1(3):213–233, 1993.

[26] G. Hayes and J. Demiris. A robot controller using learning by imitation. In Proceedings of the 2nd International Symposium on Intelligent Robotic Systems, 1994.

[27] J. Hertz, A. Krogh, and R. Palmer. Introduction to the theory of neural computation. Santa Fe Institute, 1991.

[28] C. Heyes and B. Galef. Social learning in animals: The roots of culture. Academic Press, 1996.

[29] G.E. Hinton and S.J. Nowlan. How learning can guide evolution. Complex Systems, 1:495–502, 1987.

[30] W. Hordijk and P. Stadler. Amplitude spectra of fitness landscapes. J. Complex Systems, 1:39–66, 1998.

[31] M. Huynen, P. Stadler, and W. Fontana. Smoothness within ruggedness: The role of neutrality in adaptation. In Proc. Natl. Acad. Sci., volume 93, pages 397–401, 1996.

[32] L. Kallel, B. Naudts, and C.R. Reeves. Properties of fitness functions and search landscapes. In L. Kallel, B. Naudts, and A. Rogers, editors, Theoretical Aspects of Evolutionary Computing, pages 175–206. Springer, Berlin, 2001.

[33] S.A. Kauffman and S. Levin. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128:11–45, 1987.

[34] S. Kawamura. The process of sub-culture propagation among Japanese macaques. In C.H. Southwick, editor, Primates Social Behaviour, pages 82–90. Van Nostrand, New York, 1963.

[35] A. Keinan, I. Meilijson, and E. Ruppin. Controlled analysis of neurocontrollers with informational lesioning. Phil. Trans. R. Soc. Lond. A, to appear, 2003.

[36] S. Kirby and J. Hurford. Learning, culture and evolution in the origin of linguistic constraints. In P. Husbands and I. Harvey, editors, 4th European Conference on Artificial Life, volume IV, pages 493–502, Cambridge, MA, 1997. MIT Press.

[37] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[38] K.W.C. Ku, M.W. Mak, and W.C. Siu. Approaches to combining local and evolutionary search for training neural networks: a review and some new results. In Advances in evolutionary computing: theory and applications, pages 615–641. Springer-Verlag New York, Inc., 2003.

[39] K.N. Laland. A theoretical investigation of the role of social transmission in evolution. Ethology and Sociobiology, 13(2):87–113, 1992.

[40] C. Langton. Artificial Life: An Introduction. MIT Press, Boston, MA, 1995.

[41] R.E. Lenski, C. Ofria, T.C. Collier, and C. Adami. Genome complexity, robustness and genetic interactions in digital organisms. Nature, 400(6745):661–664, 1999.

[42] M. Littman. Simulations combining evolution and learning. In Richard K. Belew and Melanie Mitchell, editors, Adaptive Individuals in Evolving Populations: Models and Algorithms, pages 465–477. Addison Wesley, Reading, MA, 1996.

[43] M.L. Littman and D.H. Ackley. Adaptation in constant utility non-stationary environments. In R.K. Belew and L.B. Booker, editors, Proc. of the Fourth Int. Conf. on Genetic Algorithms, pages 136–142, San Mateo, CA, 1991. Morgan Kaufmann.

[44] C. Lloyd Morgan. On modification and variation. Science, 4:733–740, 1896.

[45] G. Mayley. Landscapes, learning costs and genetic assimilation. Evolutionary Computation, 4(3):213–234, 1996.

[46] G. Mayley. Guiding or hiding: Explorations into the effects of learning on the rate of evolution. In P. Husbands and I. Harvey, editors, Proceedings of the Fourth European Conference on Artificial Life. Bradford Books/MIT Press, 1997.

[47] J. Maynard-Smith. Natural selection: when learning guides evolution. Nature, 329:761–762, 1987.

[48] A. Meltzoff. The human infant as imitative generalist: a 20-year progress report on infant imitation with implications for comparative psychology. In C.M. Hayes and B.G. Galef, editors, Social Learning in Animals: The Roots of Culture. Academic Press, New York, 1996.

[49] F. Menczer and R.K. Belew. Evolving sensors in environments of controlled complexity. In R.A. Brooks and P. Maes, editors, Artificial Life IV, pages 210–221. MIT Press, 1994.

[50] M. Mitchell. An introduction to genetic algorithms. MIT Press, 1996.

[51] S. Nolfi, J.L. Elman, and D. Parisi. Learning and evolution in neural networks. Adaptive Behavior, 1(3):495–502, 1994.

[52] S. Nolfi and D. Floreano. Learning and evolution. Autonomous Robots, 7(1):89–113, 1999.

[53] S. Nolfi and D. Parisi. Learning to adapt to changing environment in evolving neural networks. Adaptive Behavior, 1:99–105, 1997.

[54] D. Parisi and S. Nolfi. The influence of learning on evolution. In R.K. Belew and M. Mitchell, editors, Adaptive individuals in evolving populations: Models and algorithms, pages 419–428. Addison Wesley, Reading, MA, 1996.

[55] W. Prinz and A.N. Meltzoff. An introduction to the imitative mind and brain. In A.N. Meltzoff and W. Prinz, editors, The imitative mind: Development, evolution and brain bases, pages 1–15. Cambridge University Press, Cambridge, MA, 2002.

[56] G. Rizzolatti, L. Fadiga, L. Fogassi, and V. Gallese. From mirror neurons to imitation: Facts and speculations. In A.N. Meltzoff and W. Prinz, editors, The imitative mind: Development, evolution and brain bases, pages 247–266. Cambridge University Press, Cambridge, MA, 2002.

[57] D. Rockmore, P. Kostelec, W. Hordijk, and P. Stadler. Fast Fourier transform for fitness landscapes. Appl. Comput. Harmonic Anal., 12:57–76, 2002.

[58] E. Ruppin. Evolutionary autonomous agents: A neuroscience perspective. Nature Reviews Neuroscience, 3(2):132–141, 2002.

[59] P.F. Stadler. Towards a theory of landscapes. In R. Lopez-Pena, R. Capovilla, R. García-Pelayo, H. Waelbroeck, and F. Zertuche, editors, Complex Systems and Binary Networks, pages 77–163. Springer Verlag, Berlin, New York, 1995.

[60] P.M. Todd and G.F. Miller. Exploring adaptive agency II: Simulating the evolution of associative learning. In J.A. Meyer and S.W. Wilson, editors, From animals to animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, pages 306–315, Cambridge, MA, 1991. MIT Press.

[61] C. Waddington. Genetic assimilation of an acquired character. Evolution, 7:118–126, 1953.

[62] E.D. Weinberger. Correlated and uncorrelated fitness landscapes and how to tell the difference. Biological Cybernetics, 63:325–336, 1990.

[63] E.D. Weinberger. Fourier and Taylor series on fitness landscapes. Biological Cybernetics, 65:321–330, 1991.

[64] A. Whiten, J. Goodall, W.C. McGrew, T. Nishida, V. Reynolds, Y. Sugiyama, C.E.G. Tutin, R.W. Wrangham, and C. Boesch. Cultures in chimpanzees. Nature, 399:682–685, 1999.

[65] A. Whiten and R. Ham. On the nature and evolution of imitation in the animal kingdom: Reappraisal of a century of research. In P.J.B. Slater, J.S. Rosenblatt, C. Beer, and M. Milinski, editors, Advances in the study of behavior, pages 239–283, San Diego, CA, 1992. Academic Press.

[66] S. Wright. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In Proceedings of the Sixth International Congress of Genetics 1, pages 356–366, 1932.