Responding efficiently to relevant stimuli using an emotion-based agent architecture

Rodrigo Ventura*, Carlos Pinto-Ferreira

Institute for Systems and Robotics, Instituto Superior Técnico, TULisbon, Lisbon, Portugal

Neurocomputing 72 (2009) 2923–2930. doi:10.1016/j.neucom.2008.09.019. Available online 8 April 2009. © 2009 Elsevier B.V. All rights reserved. Journal homepage: www.elsevier.com/locate/neucom

This work was partially supported by FCT (ISR/IST plurianual funding) through the POS-Conhecimento Program that includes FEDER funds.

* Corresponding author. Tel.: +351 218418195; fax: +351 218418291. E-mail address: [email protected] (R. Ventura).

Keywords: Emotions; Relevance; Autonomous agents

Abstract

It is widely accepted that one important role of emotions consists in providing a mechanism for adequate and efficient response to relevant stimuli. In this paper we propose a methodology for implementing such a mechanism, based on a previously presented emotion-based agent model. This agent model is biologically inspired by the emotion mechanisms found in the brain, following recent neurophysiological research. The model is founded on two principles: (1) stimuli are represented internally by two representations with different degrees of complexity and accuracy, and (2) the matching of these representations is implemented by a distance function. The mechanism considered in this paper amounts to matching the current stimulus the agent is perceiving with its past experience. This paper addresses a twofold strategy for optimizing the efficiency and accuracy of this mechanism. The first part consists in adapting the distance function employed in one of the representations, while the second has the goal of upgrading that representation with new relevant features. Techniques borrowed from nonmetric multidimensional scaling are used to approach these goals.

1. Introduction

When facing the design of intelligent agents, it is inevitable to consider human intelligence, for it provides the only a priori natural model of intelligence. Human intelligence, in a broad sense, involves two major capabilities: appropriate communication and appropriate decision-making. That emotions play a role in human communication is unquestionable: the expression of affect, the sensitivity to the affective state of others, the difficulty of faking emotions (e.g., actors often induce in themselves the emotions they need to display, in order for those emotional states to be believable). However, the human–computer interface is traditionally a cold one, where the computer is completely insensitive to the user's emotional state (e.g., user frustration). The idea of changing this state of affairs was first proposed by Picard, who coined the term affective computing: "computing that relates to, arises from, or deliberately influences emotions" [1].

The role that emotions play in appropriate decision-making is, however, more controversial. One source of controversy derives from the folk conception opposing sound and cold reasoning about an issue to being emotional about it. Emotions are often considered a threat to the goodness of cold reasoning. Recent neuroscientific evidence has, however, undermined that idea: the emotional mechanisms of the brain prove to be essential for appropriate decision-making. Research by Damásio and colleagues reports that patients with lesions in the prefrontal cortex show a severe impairment, for instance, in feeling emotions after being exposed to emotionally strong pictures. These patients, who nevertheless perform within the average range in I.Q. tests, display a striking inability to perform simple, daily-life tasks. They report the case of a patient taking disproportionately long periods of time to make up his mind about scheduling his next encounter with the physician who had been following his case.

Further research by Damásio has allowed him to assemble an explanation: the somatic marker hypothesis (SMH). According to this hypothesis, decision-making in normal individuals is assisted by "the appearance of a somatic signal that marks the ultimate consequences of the response option with a negative or positive somatic state" [2, p. 220]. These somatic signals can be either conscious or covert, but they are physically measurable in general. One such common measure is the change in skin conductance, termed skin conductance response (SCR). Conscious effects of this somatic marking are, for instance, the "gut feeling" when certain response options are considered. Covert effects include appetitive or aversive behaviors towards or away from certain response options [2].

This paper presents an agent model originally inspired by Damásio's SMH. However, it should be stressed that the goal of this research is not the emulation of human emotions, but rather
the construction of agents capable of dealing with complex and dynamic environments. The role of the biological inspiration boils down to the formulation of the conceptual model of the agent. Once the model is founded on a solid, well-defined, and self-contained formulation, references to emotions can be omitted.

The paper is organized as follows: Section 2 presents the emotion-based architecture, together with the problem statement; Section 3 describes the methodology taken to address this problem; and Section 4 presents experimental results illustrating the methodology. Section 5 closes this paper with some conclusions.

2. Emotion-based agents

Artificial intelligence first approached the construction of intelligent systems from the standpoint of high-level cognition, since pure rationality is commonly associated with logical reasoning. Concepts such as Newell and Simon's physical symbol system hypothesis [3] and McCarthy's representation of common sense [4] materialize this perspective, which eventually proved limiting whenever a system faces the complexity and uncertainty inherent to contact with the real world. This difficulty is, at least in part, due to the disembodiment of such systems, which can be traced back to a Cartesian mind-and-body dualism.

Brooks challenged this approach by proposing that intelligence requires embodiment [5]. Rather than aiming at an accurate representation of the outside world, an agent should use the world itself as its own representation. Thus, the idea of reactive agents emerged. However, this idea fell short of scaling up to more complex levels of competence. It has, nevertheless, impelled research in areas ranging from robotics to philosophy of mind (embodied cognition). Dreyfus has recently acknowledged this approach as an advance towards machine intelligence, but points out that it is limited, since the agents "respond only to fixed features of the environment" [6].

In the architecture that we propose here, we acknowledge the need for a reactive layer, in the following sense: perceptual features are extracted from stimuli entering the agent, where certain feature configurations elicit a quick response. We designate it the perceptual layer. The goal of this layer is to provide an efficient mechanism to deal with basic aspects of the environment (e.g., to ensure survival). Moreover, such a layer alone ascribes a primordial level of significance to stimuli, for it only responds in certain situations. However, it shows the same limitations as the reactive systems discussed above.

To tackle these limitations, we propose to add a new layer of processing, on top of the reactive one, that functions in parallel and asynchronously with it. When the agent is exposed to a stimulus, it is processed, simultaneously, by these two layers. The perceptual layer extracts a small set of features (termed perceptual image¹, a percept), being capable of responding quickly to certain stimulus configurations (e.g., in situations demanding urgent action). The novel one, here designated the cognitive layer, extracts from the stimulus a complex representation (the cognitive image), which is slow to process but can be stored in memory, thus allowing the agent to perform complex pattern matching operations when facing similar situations.

The biological plausibility of this hypothesis is backed up by the functional role of the thalamus in the brain: all sensory information originated by the sense organs is relayed by this structure, simultaneously to the cortex and to the amygdala [7]. The cortex, which is responsible for the higher cognitive functions (such as reasoning, planning, and rational decision-making), processes sensory information relayed by the thalamus. In parallel, the amygdala also processes the same sensory information, but is able to, first, respond much more quickly to certain stimuli, and second, elicit emotional responses through its strong influence on the body regulatory mechanisms.

¹ The term image is used here in a broad sense.

It is worthwhile to make here a few considerations about the role of the amygdala [8]. First, the amygdala resides in an evolutionarily older part of the brain, suggesting its role in more primitive functions (e.g., the ones related to basic survival). Second, it provides mechanisms for many reactive responses, for instance when animals face threatening situations, such as the presence of predators. And third, the amygdala modulates not only many body regulatory mechanisms, explaining the physiological changes that follow the elicitation of emotions (such as sweaty hands), but also cognitive processes, namely memory storage and decision-making (projections to the prefrontal cortex). According to Damásio, this latter modulation exerts covert biases on the decision-making mechanisms. Since the amygdala is sensitive to external stimuli (through the thalamus) as well as to mental imagery, decision-making can be strongly influenced by the amygdala response, in a fashion that is often outside the subject's awareness. Examples of this modulation can be observed in phenomena such as intuitions, "gut feeling," phobias, sexual drives, and so on. The amygdala is thus responsible for assessing external stimuli, as well as internal imagery. This assessment is performed according to basic aspects, such as threat, desirability, and repulsiveness.

In order to implement such an assessment capability, a third representation was also considered in the proposed architecture: the desirability vector. This vector represents the assessment of a perceptual image according to a set of pre-specified dimensions. These dimensions correspond to basic and relevant aspects of stimuli. Examples of possible desirability vector dimensions are danger, pleasantness, repulsiveness, and novelty. The elicitation of one of these dimensions, for instance, following a stimulus, can be interpreted as an emotion in this framework. Furthermore, since the desirability vector assesses stimuli according to the basic needs and drives of the agent, it can be understood as grounding the relevance and meaning of stimuli to the agent: only stimuli eliciting it are considered a priori relevant (although relevance can be extended to other stimuli), and the vector components that are elicited constitute an indicator of a primordial meaning of stimuli.

An emotionally strong stimulus, according to the SMH, induces an association between the mental imagery related to the stimulus and the body state representing the evoked emotion. In the framework of the agent architecture presented here, this idea corresponds to associating in memory the three representations described so far: the cognitive image, for it contains a detailed representation of the stimulus; the perceptual image, for it can be quickly compared with new stimuli in the future; and the desirability vector, for it represents the agent's assessment of the stimulus according to various dimensions of relevance for the agent. The establishment and storage in memory of these associations is designated the marking mechanism.

These associations, once stored in memory, can be recalled (the matching mechanism) in two distinct ways. One uses the perceptual representation (here designated perceptual matching): the simplicity of this representation allows for a quick match of a perceptual image extracted from the stimulus, and thus the retrieval of the other two representations associated with it. The other (cognitive matching) uses the cognitive representation: matching with this representation leads to more accurate results, but the corresponding complexity makes its use computationally hard. Thus, it has been proposed to use these two representations together in the following way (indexing mechanism): first, use the perceptual representation to quickly select a set of candidates from memory, and then restrict the cognitive match to this subset. In this way, the computationally hard part of the process (computing the cognitive distance) is limited to a smaller subset of candidate pairs.

Fig. 1. Diagram of the indexing mechanism: a perceptual image $i_p$ extracted from a stimulus $s$ is subject to perceptual matching, from which a subset of memory pairs (N-best) is obtained, which are then used in the cognitive match with the cognitive image $i_c$ extracted from the same stimulus $s$; the best match $i_c^*$ resulting from this cognitive matching is the outcome of the indexing mechanism.

2.1. Formal model

Let $s$ denote a stimulus received by the agent's sensors, whose domain is the set $\mathcal{S}$ of all possible stimuli. From this stimulus, the agent extracts two images, as outlined in Section 2: a cognitive image $i_c \in \mathcal{I}_c$ and a perceptual one $i_p \in \mathcal{I}_p$, where $\mathcal{I}_c$ and $\mathcal{I}_p$ denote the sets of all possible cognitive and perceptual images. These two representations of the stimulus, together with the desirability vector, here denoted $v_d \in \mathcal{V}_d$, can then be associated and stored in memory.

The remaining part of this paper focuses exclusively on the cognitive and perceptual representations: their properties, the associated mechanisms proposed above, and the consequences that can be derived from the assumptions made concerning the distinct complexity levels of the representations involved. The methodology followed in this paper is to isolate and analyze in depth the consequences of assuming a double-representation paradigm as introduced in the previous section. Research covering the full model as presented above can be found in previous publications [9–12].

The cognitive and perceptual representations of a given stimulus can be associated and stored in the agent memory, according to some policy (e.g., depending on the situation relevance, as assessed by the agent). Let us denote the agent memory as a set $\mathcal{M}$ of ordered pairs of the form $\langle i_c, i_p \rangle$ (a subset of $\mathcal{I}_c \times \mathcal{I}_p$).

A strong assumption is made at this point: both cognitive and perceptual matching are performed using distance functions. In other words, the degree of match of two given images (of the same nature, i.e., either both cognitive or both perceptual) is given by a nonnegative real-valued distance function. This assumption is necessary for degrees of match to be comparable, thus allowing us to derive theoretical results about the model. This assumption is biologically plausible insofar as one accepts that we are able to assign degrees of similarity between two images.

To formalize this idea, two distance functions are introduced here: one for comparing pairs of cognitive images,

$$d_c : \mathcal{I}_c \times \mathcal{I}_c \to \mathbb{R}_0^+, \tag{1}$$

where $\mathbb{R}_0^+$ denotes the set of nonnegative real numbers, and another one for pairs of perceptual images,

$$d_p : \mathcal{I}_p \times \mathcal{I}_p \to \mathbb{R}_0^+. \tag{2}$$

The matching mechanisms can now be formulated using these functions. Suppose that the agent is exposed to a stimulus $s$, from which the two representations are extracted, say $i_c$ and $i_p$. Finding in the memory $\mathcal{M}$ the image pair whose cognitive image best matches the one from the stimulus is designated full cognitive matching. The pair found can be written as

$$\langle i_c^*, i_p^* \rangle = \arg\min_{\langle i_c^M, i_p^M \rangle \in \mathcal{M}} \{ d_c(i_c, i_c^M) \}. \tag{3}$$

The computational complexity of this operation depends, on the one hand, upon the number of pairs in memory, and on the other, upon the time required to compute the distance function $d_c$. If we designate by $J_c$ the time it takes to compute a single cognitive distance, a full cognitive matching operation takes the time $J_c |\mathcal{M}|$ (where $|\mathcal{M}|$ denotes the cardinality of the set $\mathcal{M}$). Taking into account the assumption that the cognitive representation is complex, and thus $J_c$ is high, this matching process is inappropriate for real-time situations.

The indexing mechanism aims at reducing this complexity by using the perceptual representation to find a small set of candidates for the cognitive matching (see Fig. 1). This set of candidates is obtained by determining which pairs are closest to the stimulus, according to the perceptual representation (perceptual matching),

$$S_p(i_p, T_p) = \{ \langle i_c^M, i_p^M \rangle \in \mathcal{M} \mid d_p(i_p, i_p^M) < T_p \}, \tag{4}$$

where $T_p$ is a threshold value. A cognitive matching can now be performed, using this reduced set of pairs:

$$\langle i_c^+, i_p^+ \rangle = \arg\min_{\langle i_c^M, i_p^M \rangle \in S_p(i_p, T_p)} \{ d_c(i_c, i_c^M) \}. \tag{5}$$

The threshold value $T_p$ can be set a priori, or it can be set such that the number of pairs in $S_p(i_p, T_p)$ is upper bounded by some specified value $N$ (we designate this method N-best, since it corresponds to choosing the $N$ best candidates for the cognitive matching):

$$T_p = \arg\max_t |S_p(i_p, t)| \quad \text{restricted to} \quad |S_p(i_p, t)| \le N. \tag{6}$$

Doing so is useful whenever one desires to limit the time available for finding a match.
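The two-stage search of Eqs. (3)–(6) can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: images are assumed to be plain numeric tuples, both distances are assumed to be Euclidean, and the function names and the toy memory are hypothetical. The N-best selection is done by sorting on the perceptual distance, which ignores the tie-handling implied by the threshold formulation in Eq. (6).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long numeric tuples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def full_cognitive_match(i_c, memory, d_c=euclidean):
    """Exhaustive search over memory, as in Eq. (3)."""
    return min(memory, key=lambda pair: d_c(i_c, pair[0]))

def n_best_candidates(i_p, memory, n, d_p=euclidean):
    """Keep the n memory pairs with the smallest perceptual distance."""
    return sorted(memory, key=lambda pair: d_p(i_p, pair[1]))[:n]

def indexed_match(i_c, i_p, memory, n, d_c=euclidean, d_p=euclidean):
    """Cognitive match restricted to the N-best candidates, as in Eq. (5)."""
    candidates = n_best_candidates(i_p, memory, n, d_p)
    return min(candidates, key=lambda pair: d_c(i_c, pair[0]))

# Hypothetical memory of (cognitive image, perceptual image) pairs.
memory = [
    ((0.0, 0.0, 0.0, 0.0), (0.0,)),
    ((1.0, 1.0, 1.0, 1.0), (1.0,)),
    ((5.0, 5.0, 5.0, 5.0), (5.0,)),
    ((9.0, 9.0, 9.0, 9.0), (9.0,)),
]
stimulus_c, stimulus_p = (1.2, 0.9, 1.1, 1.0), (1.1,)

best_full = full_cognitive_match(stimulus_c, memory)
best_indexed = indexed_match(stimulus_c, stimulus_p, memory, n=2)
print(best_full == best_indexed)
```

In this toy memory the perceptual feature is informative, so two candidates suffice for the indexed search to return the same pair as the exhaustive one while evaluating the cognitive distance only twice.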

The efficiency of the indexing mechanism can be evaluated by comparing the temporal complexities of the indexing and of the full cognitive matching. Considering $J_p$ to be the time necessary to compute a single perceptual distance, the cost of the indexing is $J_p |\mathcal{M}| + J_c |S_p|$. The ratio $\eta$ of these two costs is then

$$\eta = \frac{J_p}{J_c} + \frac{|S_p|}{|\mathcal{M}|}. \tag{7}$$

A lower value of $\eta$ means a higher efficiency of the mechanism. As expected, a low value of $\eta$ requires, on the one hand, that $J_p \ll J_c$, meaning that $d_p$ should be much faster to compute than $d_c$, and on the other, that $|S_p| \ll |\mathcal{M}|$, meaning that the number of candidates after the perceptual matching should be small when compared with the total number of pairs in memory. The assumption that $d_p$ is faster to compute than $d_c$ follows from neurophysiological evidence, according to which the stimuli processing at the level of the amygdala shows significantly lower latency than the one at the cortical level. LeDoux has termed the former level the low-road, where the stimuli evaluation is rough but quick, and the latter the high-road, where it is evaluated with higher accuracy, but at the price of taking more time [8].
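As a quick numerical illustration of Eq. (7), the sketch below computes both costs and their ratio. The timings, memory size, and candidate count are invented for the example and do not come from the paper; the function names are likewise hypothetical.

```python
def indexing_cost(t_p, t_c, memory_size, n_candidates):
    """Cost of indexing: one perceptual distance per memory pair,
    plus one cognitive distance per retained candidate."""
    return t_p * memory_size + t_c * n_candidates

def full_matching_cost(t_c, memory_size):
    """Cost of full cognitive matching: one cognitive distance per pair."""
    return t_c * memory_size

def efficiency_ratio(t_p, t_c, memory_size, n_candidates):
    """Ratio of the two costs, as in Eq. (7)."""
    return t_p / t_c + n_candidates / memory_size

# Hypothetical values: perceptual distance 1000x cheaper than cognitive,
# 10,000 pairs in memory, 50 candidates kept after perceptual matching.
t_p, t_c = 1e-6, 1e-3
memory_size, n_candidates = 10_000, 50

eta = efficiency_ratio(t_p, t_c, memory_size, n_candidates)
print(eta)
```

Under these assumed numbers the first term contributes 0.001 and the second 0.005, so the candidate fraction, not the per-distance speed-up, dominates the ratio.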

In the ideal case, the resulting pair is the same as the one obtained by a full cognitive match in (3). In [13] the indexing mechanism was theoretically analyzed. One of the interesting results of this analysis was a set of theorems guaranteeing that, under certain conditions, the indexing obtains the same results as the pure cognitive matching, while taking a fraction of the time required for the full cognitive matching. For example, if the set of candidates $S_p$ is constructed incrementally, once a perceptual image in that set has a perceptual distance to the stimulus higher than the best cognitive distance found so far, the best match found so far is proved to also be the best global cognitive match. The cited paper also illustrates the theoretical results in a simple domain (handwritten digit recognition).
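The incremental construction mentioned above can be sketched as follows. The conditions of the theorems in [13] are not restated here, so this sketch rests on an assumption made only for illustration: the perceptual distance never exceeds the cognitive distance for the same pair of stimuli. The toy data satisfy this by taking the perceptual image to be the first coordinate of the cognitive image; all names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def incremental_match(i_c, i_p, memory, d_c=euclidean, d_p=euclidean):
    """Visit memory pairs by increasing perceptual distance and stop as soon
    as that distance exceeds the best cognitive distance found so far.
    Assumes d_p is a lower bound on d_c (an assumption of this sketch)."""
    ranked = sorted(memory, key=lambda pair: d_p(i_p, pair[1]))
    best_pair, best_dist, evaluated = None, math.inf, 0
    for pair in ranked:
        if d_p(i_p, pair[1]) > best_dist:
            break  # no remaining pair can be cognitively closer
        dist = d_c(i_c, pair[0])
        evaluated += 1
        if dist < best_dist:
            best_pair, best_dist = pair, dist
    return best_pair, evaluated

# Perceptual image = first coordinate of the cognitive image (a projection).
cognitive_images = [(0.0, 0.0), (1.0, 3.0), (2.0, 0.5), (6.0, 0.0), (9.0, 1.0)]
memory = [(c, (c[0],)) for c in cognitive_images]
stimulus_c = (2.2, 0.4)
stimulus_p = (stimulus_c[0],)

best, evaluated = incremental_match(stimulus_c, stimulus_p, memory)
exhaustive = min(memory, key=lambda pair: euclidean(stimulus_c, pair[0]))
print(best == exhaustive, evaluated)
```

Here the search stops after a single cognitive distance evaluation, because the second-ranked perceptual distance already exceeds the best cognitive distance found.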

2.2. Problem statement

The efficiency (7) of the indexing mechanism depends on two ratios: on $J_p / J_c$, which is intrinsic to the implementation of the distance functions, and on $|S_p| / |\mathcal{M}|$, which depends on the number of candidates chosen for the cognitive matching. While the former ratio is implementation dependent, the latter can be pre-specified, for instance, using the N-best algorithm described above. However, the lower the number of candidates, the higher the risk of missing the best cognitive match. This risk depends on the accuracy of the perceptual distance in finding good candidates. Therefore, the choice of the perceptual metric is crucial for the efficiency of the mechanism.

The research presented here concerns the following problem: how to construct a perceptual representation (and metric) with the goal of optimizing the indexing efficiency. In other words, the ideal perceptual representation and metric are the ones that yield small perceptual distances if and only if the corresponding cognitive distances are also small. In this way, the memory pairs whose cognitive images are closest to the stimulus are also the ones with the smallest perceptual distances, thus reducing the number of candidates needed in the indexing mechanism. To do so, two strategies are explored. One corresponds to adapting a perceptual metric, via a set of parameters, such that cognitive proximity implies perceptual nearness,

$$d_c(i_c^1, i_c^2) < d_c(i_c^1, i_c^3) \;\Rightarrow\; d_p(i_p^1, i_p^2) < d_p(i_p^1, i_p^3) \tag{8}$$

for all image pairs $\langle i_c^k, i_p^k \rangle$ ($k = 1, 2, 3$) obtainable from stimuli in a given environment. The second strategy addresses the improvement of the perceptual representation, in the following sense. Assuming that the perceptual representation is a vector of features extracted from stimuli, when these features are not sufficiently representative to satisfy (8), the goal is to upgrade the perceptual representation with new, more representative features. Both of these strategies are approached here using multidimensional scaling (MDS) techniques [14].
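One way to see how far a given perceptual representation is from satisfying implication (8) is to count the triples of stored pairs that violate it. The sketch below does this by brute force over all ordered triples; the function name, the Euclidean distances, and the two toy representations are assumptions made for illustration only.

```python
import math
from itertools import permutations

def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def count_order_violations(pairs, d_c=euclidean, d_p=euclidean):
    """Count ordered triples (1, 2, 3) where image 2 is cognitively closer
    to image 1 than image 3 is, but not perceptually closer (Eq. (8) fails)."""
    violations = 0
    for a, b, c in permutations(range(len(pairs)), 3):
        (a_c, a_p), (b_c, b_p), (c_c, c_p) = pairs[a], pairs[b], pairs[c]
        if d_c(a_c, b_c) < d_c(a_c, c_c) and not d_p(a_p, b_p) < d_p(a_p, c_p):
            violations += 1
    return violations

cognitive = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

# A perceptual feature that mirrors the cognitive geometry ...
good_pairs = [(c, (c[0],)) for c in cognitive]
# ... and a hypothetical poor feature that scrambles it.
poor_pairs = list(zip(cognitive, [(0.0,), (5.0,), (1.0,)]))

good = count_order_violations(good_pairs)
poor = count_order_violations(poor_pairs)
print(good, poor)
```

The loop is cubic in the number of stored pairs, so it is only meant as a diagnostic on small samples.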

2.3. Multidimensional scaling

Multidimensional scaling comprises a group of techniques from the field of statistics, sharing a common goal: given a set of $n$ objects, together with a measure of dissimilarity for each pair of them, to assign point coordinates in a metric space (usually of reduced dimensionality) to each one of the objects, so that their distances approximate as much as possible the given dissimilarities [14]. For a pair of objects $r$ and $s$, the dissimilarity between them is denoted by $\delta_{rs}$. It is here assumed that two properties are satisfied: $\delta_{rr} = 0$ (identity) and $\delta_{rs} = \delta_{sr}$ (symmetry).

In the context of MDS, the terms dissimilarity and distance have distinct and specific meanings: the dissimilarities $\{\delta_{rs}\}$ are given beforehand, and may or may not constitute a metric, while the distances $\{d_{rs}\}$ result from the given metric space for which the point coordinates are sought.

There are two fundamental MDS techniques [14]: metric MDS seeks to find coordinates for each object such that $d_{rs} \approx \delta_{rs}$, while nonmetric MDS is satisfied once $\delta_{rs} < \delta_{tu} \Rightarrow d_{rs} \le d_{tu}$ for all pairs of objects. Metric MDS can usually be solved in closed form, while the nonmetric one requires numerical optimization methods, such as gradient descent.

3. Methodology

A good perceptual representation is one which satisfies the implication (8) for all image pairs encountered by the agent. Note that this goal is similar to the MDS one, once one considers the cognitive distances to be the dissimilarities (in MDS terminology), and the perceptual ones to be the distances (in MDS terminology) among objects. However, there are differences. In the case of MDS, the metric is given while the object coordinates are sought. In the case of the indexing mechanism, the object coordinates (perceptual images) are known, while the goal is to find the metric.

A parametric approach was adopted in order to adapt the perceptual metric. In particular, we chose these parameters to assign a degree of relevance to each feature of the perceptual representation. The perceptual distance between two images $x = (x_1, \ldots, x_q)$ and $y = (y_1, \ldots, y_q)$ is then

$$d_p(x, y) = \sqrt{\sum_{i=1}^{q} \theta_i^2 (x_i - y_i)^2}, \tag{9}$$

where $\theta_1, \ldots, \theta_q$ are the parameters. We propose to perform a gradient descent, within the framework of nonmetric MDS, with respect to the parametrization of the perceptual metric, instead of the point coordinates. Regarding the construction of additional perceptual features, we propose to append to each perceptual image a pre-specified number of additional components. Their values are randomly initialized, and subject to gradient descent as in nonmetric MDS. After the optimization process, these values are changed such that the cost function is minimized. Thus, they represent the values that the new features ought to take for each one of the perceptual images in the training set, in order to maximize indexing efficiency. Concerning the computation of those added components for new stimuli, the idea we advance is to use the obtained values to construct a regression model. That regression model can then be used to obtain the new feature values for new stimuli.

Let a perceptual image $i_p^r$, consisting of the concatenation of $q$ numerical features $x_{r1}, \ldots, x_{rq}$, extracted from a given stimulus, with $p$ additional components $y_{r1}, \ldots, y_{rp}$, be denoted by the vector

$$i_p^r = (x_{r1}, \ldots, x_{rq}, y_{r1}, \ldots, y_{rp})^T. \tag{10}$$

These additional components $y_{rk}$ correspond to the values that the new features ought to take for that particular perceptual image. The perceptual metric employed here is parametrized by $q$ coefficients $\theta_1, \ldots, \theta_q$, taking the form

$$d_{rs} = \sqrt{\sum_{i=1}^{q} \theta_i^2 (x_{ri} - x_{si})^2 + \sum_{i=1}^{p} (y_{ri} - y_{si})^2}. \tag{11}$$

This parametrization corresponds to assigning a weight (relevance) to each perceptual feature, before calculating the Euclidean metric (see Eq. (9)). Moreover, when the algorithm assigns a zero weight to a feature, that feature can be deleted from the perceptual representation, since it is irrelevant to the cognitive matching. The additional components are not weighted, since doing so would just add redundant degrees of freedom.
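The weighted distance of Eqs. (9) and (11), and the removal of zero-weight features, can be sketched as below. The function names and the example numbers are hypothetical; the optional arguments `y_r` and `y_s` stand for the unweighted additional components.

```python
import math

def perceptual_distance(x_r, x_s, theta, y_r=(), y_s=()):
    """Weighted Euclidean distance of Eq. (11); with no additional
    components it reduces to Eq. (9)."""
    weighted = sum((t * (a - b)) ** 2 for t, a, b in zip(theta, x_r, x_s))
    extra = sum((a - b) ** 2 for a, b in zip(y_r, y_s))
    return math.sqrt(weighted + extra)

def prune_features(images, theta, tol=0.0):
    """Drop every feature whose weight magnitude is at most tol."""
    keep = [i for i, t in enumerate(theta) if abs(t) > tol]
    pruned = [tuple(image[i] for i in keep) for image in images]
    return pruned, [theta[i] for i in keep]

theta = [2.0, 0.0]                  # second feature has zero relevance
x_r, x_s = (1.0, 7.0), (4.0, -3.0)

d_plain = perceptual_distance(x_r, x_s, theta)
d_extended = perceptual_distance(x_r, x_s, theta, y_r=(0.0,), y_s=(8.0,))
pruned_images, pruned_theta = prune_features([x_r, x_s], theta)
d_pruned = perceptual_distance(pruned_images[0], pruned_images[1], pruned_theta)
print(d_plain, d_extended, d_pruned)
```

Pruning the zero-weight feature leaves the distance unchanged, which is why such features can be removed from the representation without affecting the matching.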

The cost function employed here is the sum of the MDS stress [14] with a regularization term penalizing the absolute values of the metric parameters. The MDS stress $S$ assesses the fit of the perceptual distances $\{d_{rs}\}$ to the ones resulting from the isotonic regression, here denoted as $\{\hat{d}_{rs}\}$:

$$S = \sqrt{\frac{S^*}{T^*}}, \qquad S^* = \sum_{r,s} (d_{rs} - \hat{d}_{rs})^2, \qquad T^* = \sum_{r,s} d_{rs}^2. \tag{12}$$

In these sums, the index ranges are $r = 1, \dots, (n-1)$ and $s = (r+1), \dots, n$, where $n$ is the total amount of image pairs available. The distances $\{\hat{d}_{rs}\}$ satisfy, by construction, two conditions: (a) they preserve the cognitive distances ordering, i.e., $d_c(i_c^r, i_c^s) < d_c(i_c^r, i_c^t) \Rightarrow \hat{d}_{rs} < \hat{d}_{rt}$, and (b) they minimize $S^*$ subject to condition (a). The cost function is then the sum of the MDS stress $S$ with a regularization term, corresponding to the sum of the absolute values of the coefficients $\theta_i$:

$$J = S + \xi \sum_{i=1}^{q} |\theta_i|. \tag{13}$$

This regularization term, weighted by $\xi$, is included in the cost function for two reasons. First, if the stress is invariant to the weight of some perceptual component, say $\theta_k$, the stress gradient with respect to $\theta_k$ would be zero, and therefore $\theta_k$ would remain at its initial value. The second reason is due to the quadratic contribution of the parameters to the stress: in order to prevent a slow asymptotic convergence to zero (and therefore never reaching zero exactly), the gradient of their absolute values forces them to approach zero faster.² In sum, this term contributes to reducing the number of nonzero parameters $\theta_i$, and therefore permits reducing the dimensionality of the perceptual representation.
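The quantities in Eqs. (12) and (13) can be sketched as follows. This is an illustrative Python/NumPy version, not the original implementation: the isotonic regression is done with a plain pool-adjacent-violators pass, ties in the cognitive distances are resolved by a stable sort (an assumption), and all function names are hypothetical.

```python
import numpy as np

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []                                    # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(count, total / count) for total, count in blocks])

def disparities(d_perc, d_cog):
    """Distances d_hat that follow the cognitive ordering and minimize S*."""
    order = np.argsort(d_cog, kind="stable")       # pairs sorted by cognitive distance
    d_hat = np.empty(len(d_perc))
    d_hat[order] = isotonic_fit(np.asarray(d_perc, dtype=float)[order])
    return d_hat

def stress(d, d_hat):
    """MDS stress S = sqrt(S* / T*) of Eq. (12)."""
    d, d_hat = np.asarray(d, dtype=float), np.asarray(d_hat, dtype=float)
    return float(np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2)))

def cost(d, d_hat, theta, xi):
    """Cost J = S + xi * sum_i |theta_i| of Eq. (13)."""
    return stress(d, d_hat) + xi * float(np.sum(np.abs(theta)))
```

Here `d_perc` and `d_cog` are flat arrays holding one entry per image pair, in the same pair order.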

In order to express the gradient of the stress, one can consider a parameter vector $\Lambda$ containing all variables subject to the gradient descent:

$$\Lambda = [\lambda_1 \cdots \lambda_{q+np}]^T = [\theta_1 \cdots \theta_q \mid y_{11} \cdots y_{np}]^T. \tag{14}$$

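The gradient is then obtained by partial differentiation of the cost with respect to each (nonzero) parameter $\lambda_k$:

$$\frac{\partial J}{\partial \lambda_k} = \frac{\partial S}{\partial \lambda_k} + \xi \operatorname{sgn}(\lambda_k), \tag{15}$$

$$\frac{\partial S}{\partial \lambda_k} = S \sum_{r,s} \left( \frac{d_{rs} - \hat{d}_{rs}}{S^*} - \frac{d_{rs}}{T^*} \right) \frac{\partial d_{rs}}{\partial \lambda_k}. \tag{16}$$

As before, the summation above is performed for $r = 1, \dots, (n-1)$ and $s = (r+1), \dots, n$.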
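The expression for the stress gradient follows from Eq. (12) by the chain rule. The short derivation below is a sketch that assumes the isotonic-regression distances $\hat{d}_{rs}$ are held fixed while differentiating, an assumption that is usual in nonmetric MDS but is not stated explicitly in the text.

```latex
\begin{align*}
\frac{\partial S}{\partial \lambda_k}
  &= \frac{1}{2S}\,\frac{\partial}{\partial \lambda_k}\!\left(\frac{S^*}{T^*}\right)
   = \frac{S}{2}\left(\frac{1}{S^*}\frac{\partial S^*}{\partial \lambda_k}
                     - \frac{1}{T^*}\frac{\partial T^*}{\partial \lambda_k}\right), \\
\frac{\partial S^*}{\partial \lambda_k}
  &= 2\sum_{r,s} (d_{rs} - \hat{d}_{rs})\,\frac{\partial d_{rs}}{\partial \lambda_k},
\qquad
\frac{\partial T^*}{\partial \lambda_k}
   = 2\sum_{r,s} d_{rs}\,\frac{\partial d_{rs}}{\partial \lambda_k}.
\end{align*}
```

Substituting the second line into the first cancels the factors of 2 and yields the sum over pairs given in Eq. (16).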

If $\lambda_k$ corresponds to a metric parameter $\theta_l$, then

$$\frac{\partial d_{rs}}{\partial \theta_l} = \frac{(x_{rl} - x_{sl})^2}{d_{rs}}\, \theta_l, \tag{17}$$

otherwise, if it corresponds to a component $y_{ui}$, then

$$\frac{\partial d_{rs}}{\partial y_{ui}} = \frac{y_{ri} - y_{si}}{d_{rs}} (\delta_{ru} - \delta_{su}), \tag{18}$$

where $\delta_{ij}$ is the usual Kronecker function (1 if $i = j$, and 0 otherwise).

² Numerically, this makes parameters close to zero oscillate around zero, so they are set to zero once they become negative. The implementation further forces them to stay at zero thereafter.
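The two partial derivatives of the distance, Eqs. (17) and (18), can be written compactly and checked against finite differences. The sketch below is illustrative only: `X` holds the original features (one row per image), `Y` the additional components, and the function names are assumptions made for this example.

```python
import numpy as np

def distance(theta, X, Y, r, s):
    """Perceptual distance d_rs of Eq. (11) between rows r and s."""
    return np.sqrt(np.sum((theta * (X[r] - X[s])) ** 2) + np.sum((Y[r] - Y[s]) ** 2))

def grad_distance(theta, X, Y, r, s):
    """Return (d d_rs / d theta, d d_rs / d Y) following Eqs. (17) and (18)."""
    d = distance(theta, X, Y, r, s)
    g_theta = theta * (X[r] - X[s]) ** 2 / d       # Eq. (17), one entry per theta_l
    g_Y = np.zeros_like(Y, dtype=float)
    g_Y[r] += (Y[r] - Y[s]) / d                    # Eq. (18) with delta_ru = 1
    g_Y[s] -= (Y[r] - Y[s]) / d                    # Eq. (18) with delta_su = 1
    return g_theta, g_Y
```

Only the rows of `Y` belonging to images `r` and `s` receive a nonzero gradient, which is what the Kronecker terms in Eq. (18) express.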

Taking into account these considerations, we propose the following algorithm, based on the standard nonmetric MDS algorithm [14].

(1) Start with an initial variables vector $\Lambda$. Here we consider the metric parameters $\theta_k$ initialized to all ones, and the additional components $\{y_{ri}\}$ randomly distributed with a uniform distribution.
(2) Normalize the metric parameter vector $\Theta = (\theta_1, \dots, \theta_q)^T$ to unit norm, since the stress is invariant to scaling of this vector. The additional components $\{y_{ri}\}$ are, however, not normalized.³
(3) Compute the distances set $\{d_{rs}\}$ using the parametrized perceptual metric (11).
(4) Perform the isotonic regression to obtain the set of distances $\{\hat{d}_{rs}\}$.
(5) Compute the cost; if its value is below a threshold $\epsilon$, stop the algorithm (stopping criterion).
(6) Find the gradient of the cost function (13) with respect to the variables vector $\Lambda$.
(7) Perform a step of the gradient descent method.
(8) Go to step (2).
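The steps above can be sketched as a single loop. The following Python/NumPy code is a minimal illustration under several assumptions that are not taken from the text: a fixed step size, the particular values of $\xi$ and $\epsilon$, a unit uniform range for the initial additional components, disparities held fixed while computing the gradient, and the regularization applied only to the metric parameters as in Eq. (13). The names `adapt_metric` and `isotonic_fit` are hypothetical.

```python
import numpy as np

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []                                    # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(c, t / c) for t, c in blocks])

def adapt_metric(X, d_cog, n_extra=0, xi=0.01, step=0.05, eps=0.02,
                 max_iter=200, seed=0):
    """X: (n, q) perceptual features; d_cog: cognitive distances of all
    pairs r < s, ordered as in np.triu_indices(n, k=1)."""
    rng = np.random.default_rng(seed)
    n, q = X.shape
    r, s = np.triu_indices(n, k=1)                 # all pairs with r < s
    order = np.argsort(d_cog, kind="stable")       # cognitive ordering of the pairs
    dx = X[r] - X[s]
    theta = np.ones(q)                             # step (1): weights start at one
    Y = rng.uniform(size=(n, n_extra))             # step (1): random extra components
    alive = np.ones(q, dtype=bool)                 # footnote 2: zeroed weights stay zero
    J = np.inf
    for _ in range(max_iter):
        norm = np.linalg.norm(theta)
        if norm == 0.0:
            break                                  # every weight was extinguished
        theta = theta / norm                       # step (2)
        dy = Y[r] - Y[s]
        d = np.sqrt(((theta * dx) ** 2).sum(axis=1) + (dy ** 2).sum(axis=1))
        d = np.maximum(d, 1e-12)                   # step (3), guarded against d = 0
        d_hat = np.empty_like(d)
        d_hat[order] = isotonic_fit(d[order])      # step (4)
        s_star, t_star = ((d - d_hat) ** 2).sum(), (d ** 2).sum()
        S = np.sqrt(s_star / t_star)
        J = S + xi * np.abs(theta).sum()           # step (5)
        if J < eps:
            break
        if s_star > 0.0:                           # step (6): pair weights of Eq. (16)
            w = S * ((d - d_hat) / s_star - d / t_star)
        else:
            w = np.zeros_like(d)
        g_theta = (w[:, None] * theta * dx ** 2 / d[:, None]).sum(axis=0)
        g_theta += xi * np.sign(theta)
        theta = np.where(alive, theta - step * g_theta, 0.0)   # step (7)
        negative = theta < 0.0
        theta[negative] = 0.0                      # clamp weights that crossed zero
        alive &= ~negative
        if n_extra:
            pair_term = w[:, None] * dy / d[:, None]
            g_Y = np.zeros_like(Y)
            np.add.at(g_Y, r, pair_term)           # delta_ru term of Eq. (18)
            np.add.at(g_Y, s, -pair_term)          # delta_su term of Eq. (18)
            Y = Y - step * g_Y
    return theta, Y, J                             # J is the last evaluated cost
```

Since the normalized weights satisfy $\sum_i |\theta_i| \ge 1$, the cost can never fall below $\xi$, so in this sketch the threshold `eps` has to be chosen larger than `xi` for the stopping criterion to be reachable.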

4. Results

To validate the proposed methodology, a simple test-bed was devised. Random points $x \in \mathbb{R}^c$ (simulating stimuli) were uniformly drawn from a hypercube with dimension $c$ and unit side length. The cognitive images $i_c \in \mathbb{R}^c$ were set to the components of $x$ multiplied by fixed coefficients $[w_1, \dots, w_c]$, randomly chosen between 0 and 2 before each run:

$$i_c = \operatorname{diag}(w_1, \dots, w_c)\, x = W x. \tag{19}$$

These coefficients introduce different degrees of relevance to the components of $i_c$. The perceptual images were obtained by concatenating two vectors: the first $p$ components of $x$ multiplied by a second set of fixed coefficients $[v_1, \dots, v_c]$ (for $p \le c$), also randomly chosen between 0 and 2; and $n$ random numbers (noise) between 0 and 1. (The upper limit of these intervals is arbitrary.) Thus, the perceptual images have $p + n$ components,

$$i_p = \begin{bmatrix} [\operatorname{diag}(v_1, \dots, v_p) \mid 0]\, x \\ u \end{bmatrix} = \begin{bmatrix} V x \\ u \end{bmatrix}, \tag{20}$$

where $0$ is a matrix of zeros with appropriate dimension, and $u$ is the noise vector. The random weights in $W$ and $V$, randomly drawn from the $[0, 2]$ interval, together with the numbers $c$, $p$, and $n$, define a world, represented by a tuple $\langle c, p, n, W, V \rangle$. The cognitive distances were calculated using the Euclidean distance, while the perceptual ones employ the metric (11).
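A world of this kind can be generated in a few lines. The sketch below is an assumed reading of Eqs. (19) and (20) in Python with NumPy; the function names `make_world` and `sample_images` are hypothetical, and images are stored one per row rather than as column vectors.

```python
import numpy as np

def make_world(c, p, rng):
    """Draw the fixed weights W (cognitive) and V (perceptual) from [0, 2]."""
    W = rng.uniform(0.0, 2.0, size=c)
    V = rng.uniform(0.0, 2.0, size=p)
    return W, V

def sample_images(W, V, n_noise, n_patterns, rng):
    """Return cognitive images (n_patterns, c) and perceptual images (n_patterns, p + n_noise)."""
    c, p = len(W), len(V)
    x = rng.uniform(0.0, 1.0, size=(n_patterns, c))       # stimuli in the unit hypercube
    i_c = x * W                                           # Eq. (19): diag(w) x, row-wise
    u = rng.uniform(0.0, 1.0, size=(n_patterns, n_noise)) # noise components
    i_p = np.hstack([x[:, :p] * V, u])                    # Eq. (20): [V x; u], row-wise
    return i_c, i_p
```

For the first experiment described below, one would call `make_world(10, 10, rng)` followed by `sample_images(W, V, 3, 100, rng)` with `rng = np.random.default_rng()`.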

Fig. 2. Weights obtained by the algorithm for the described test-bed ($c = p = 10$, $n = 3$). [Bar chart of weight (vertical axis, 0 to 0.6) against weight index (1 to 13), with series W and result.]

Table 1
Obtained eval-order performance values for the described test-bed ($c = p = 10$, $n = 3$).

Metric        Mean    Min    Max
Unweighted    11.4    1      92
Weighted      1.00    1      2

³ Otherwise it would constrain a priori the scaling factor of the additional components in relation with the original features in (11). Normalizing the parameters vector prevents its norm from growing or shrinking because of numerical errors. Moreover, because of (11), the additional components do not grow/shrink arbitrarily.
⁴ This is equivalent to a mean over all image pairs, since all test sets have the same size.
⁵ Random sampling from the 4950 dissimilarities originated by the 100 patterns of the training set.

In order to evaluate the results, a measure of performance called eval-order was introduced, aiming at assessing how well the indexing mechanism would behave, for a particular perceptual metric. Borrowing the terminology from supervised learning, the set of pairs in memory used to adapt the perceptual metric is designated the training set, while the ones used to evaluate the system form the test set. Inspired by the N-best indexing algorithm described in [13], the eval-order is defined in the following way: given a cognitive and perceptual image pair $\langle i_c, i_p \rangle$, all perceptual distances from it to all images in the training set are computed; then, after ordering them by the perceptual distances to the given perceptual image, the position on the resulting ordered list of the image pair $\langle i_c^k, i_p^k \rangle$ with the minimum cognitive distance to $\langle i_c, i_p \rangle$ is determined. This position is the eval-order. In the ideal case, it corresponds to the first one, and thus to an eval-order of 1. It can be easily verified that the eval-order corresponds to the minimum amount of candidates necessary (in $S_p$) in order for the best cognitive match to be found within $S_p$. Therefore, the smaller the eval-order, the higher the efficiency (7) of the indexing mechanism.
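The eval-order definition above can be sketched as follows. This is an illustrative Python/NumPy function, with hypothetical names, that uses the weighted Euclidean metric on the original perceptual features only; additional components, if used, would simply be extra columns with unit weight.

```python
import numpy as np

def eval_order(ic, ip, train_ic, train_ip, theta=None):
    """1-based rank, in the perceptual ordering, of the best cognitive match.

    ic, ip             : cognitive and perceptual image of the query pair
    train_ic, train_ip : training images, one per row
    theta              : perceptual metric weights (all ones if omitted)
    """
    train_ic = np.asarray(train_ic, dtype=float)
    train_ip = np.asarray(train_ip, dtype=float)
    theta = np.ones(train_ip.shape[1]) if theta is None else np.asarray(theta, dtype=float)
    d_perc = np.sqrt((((train_ip - ip) * theta) ** 2).sum(axis=1))
    d_cog = np.sqrt(((train_ic - ic) ** 2).sum(axis=1))
    ranking = np.argsort(d_perc, kind="stable")    # nearest perceptual image first
    best = np.argmin(d_cog)                        # index of the best cognitive match
    return int(np.nonzero(ranking == best)[0][0]) + 1
```

A value of 1 means that the perceptually nearest training image is also the best cognitive match, so a single candidate suffices.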

The features in the perceptual images were all (training and test sets) normalized to zero mean and unit variance, prior to any experiment. Unless otherwise stated, the parametrization $\Theta$ of the perceptual metric was initialized to all ones. The additional components, when used, were initialized with a uniformly distributed random configuration, as in the nonmetric MDS algorithm.

In the first phase of experimentation, no additional perceptual components were considered, and the cognitive and perceptual dimensions were made equal ($c = p$). The data set consisted of 100 generated training sets with the same world parameters, each one containing 100 training patterns (and thus 4950 dissimilarities among them). The world parameters were $c = p = 10$ (cognitive and perceptual dimensions) and $n = 3$ (amount of perceptual components set to noise). For each training set, a test set containing 100 patterns was also generated, for subsequent eval-order assessment. Fig. 2 shows the obtained results: for each index $1, \dots, 10$ a pair of vertical bars is displayed, where the dark one corresponds to the weight from matrix $W$ (labeled W), and the light one to the mean value of the obtained metric parameter (labeled result) together with standard deviation error bars. Both vectors are normalized to unit norm in order to be comparable. The obtained values are identical to the weights from $W$, apart from a scaling factor, meaning that the relative importance of the $x$ coordinates in the cognitive metric is correctly reflected in the perceptual metric. The observed extinguishing of the third weight is due to the combined effect of its diminished importance (i.e., low value in $W$), and to the penalization of nonzero weights in (13). The indices $11, \dots, 13$ correspond to the noise components, for which the obtained weights are all zero, thus showing a successful capability of identifying irrelevant features. This further means that these components can be safely deleted from the perceptual representation, without compromising indexing efficiency.

Concerning the eval-order assessment, the results are presented in Table 1. Recall that the eval-order reflects the indexing efficiency (lower is better). In this table, the eval-order using the adapted perceptual metric is compared with the unadapted one. These values were obtained in the following way: for each run, a training set and a test set were randomly generated, as explained above; then, the training set was used to obtain the metric parameters, which were evaluated in the test set, calculating the mean, minimum, and maximum values of the obtained eval-orders for all images in the test set. The results shown here correspond to the mean of these means⁴ (central tendency), the minimum of all minima, and the maximum of all maxima (worst case of eval-order). The mean and maximum values of eval-order are close to the ideal value of 1 when using the adapted perceptual metric, whereas the unadapted metric yields eval-order values one to two orders of magnitude higher. These results show a significant improvement of the eval-order performance after using the metric weights found by the algorithm. Namely, the worst case (maximal eval-order) went down from 92 to just 2. Note that the test set has 100 image pairs; therefore, the worst possible eval-order value is 100.

The algorithm was run with several initial conditions in order to determine whether the solution is a local minimum. Apart from 11% of outlier runs, the metric weights converged to the correct values. These outlier runs were found to be caused by at least one weight initialized close to zero.

Because of the lower dimensionality of the parameters vector, the algorithm still converged to the correct solution using either about 10 training patterns, or about 0.5% of the total number of dissimilarities.⁵

The introduction of a strictly monotonic nonlinear distortion function $f$ was also tested by setting the cognitive distance to $d_c(i_c^1, i_c^2) = f(\|i_c^2 - i_c^1\|)$. The results were not altered, as expected, by construction of the underlying nonmetric MDS techniques.

The relationship between the cost values and the eval-order is critical to the success of the approach. The algorithm seeks the reduction of the cost function (13), while the quality of the result is measured by the eval-order performance metric. For this synthetic world, the relationship between the cost and the eval-order during the gradient descent was examined. Fig. 3 plots a sampling of the eval-order, the stress $S$, and cost values $J$ in the test set, from 25 runs, by randomly taking 1 out of 5 descent steps. This plot shows a tendency for smaller cost values to be associated with lower eval-order values. This means that a low cost value indicates, in principle, a better generalization ability. This kind of analysis is useful to assess the appropriateness of the method for a given world in what concerns the generalization performance to stimuli different from the ones in the training process.
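The invariance to a monotonic distortion of the cognitive distances, noted earlier in this section, comes from the fact that the isotonic regression only uses the rank order of those distances. The small check below illustrates this in Python with NumPy; it assumes a strictly increasing $f$, and the helper name `same_ordering` is hypothetical.

```python
import numpy as np

def same_ordering(d_cog, f):
    """True if applying f to the cognitive distances leaves their rank order unchanged."""
    d_cog = np.asarray(d_cog, dtype=float)
    return bool(np.array_equal(np.argsort(d_cog, kind="stable"),
                               np.argsort(f(d_cog), kind="stable")))

d_cog = [0.2, 1.4, 0.9, 2.7, 1.9]
print(same_ordering(d_cog, lambda t: t ** 3 + t))      # strictly increasing distortion
print(same_ordering(d_cog, lambda t: (t - 1.5) ** 2))  # non-monotonic distortion
```

The first call preserves the ordering and the second does not, so only the first kind of distortion leaves the nonmetric MDS solution unaffected.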


Fig. 3. Sampling of the cost (and stress) in function of the eval-order. [Scatter plot of eval-order (vertical axis, 1 to 14) against stress and cost (horizontal axis, 0 to 0.25), with series Stress (S) and Cost (J).]

The second phase of the experimentation comprised the introduction of new components to the perceptual representation. To do so, the dimension of the cognitive images was made higher than the perceptual one, i.e., $c > p$. Thus, the perceptual metric is computed with fewer components than the cognitive one. The first impact of this is that, without the introduction of new components, the final cost values were much higher than before, due to lack of fit (previous experiments resulted in final costs between 0.02 and 0.03). Fig. 4 shows the obtained initial and final costs, after testing four different generated worlds. The algorithm was run for several amounts of new components for each one of the worlds. The plots display the mean and the standard deviation of the initial and final costs, after 100 runs performed in each world. Error bars denote the standard deviation of the cost values across all runs. The only difference among runs sharing the same world parameters is the initial values for the additional components (initialized to random values, as explained above). The training set contained 20 patterns.

Fig. 4. Statistics of initial and final costs (vertical axis) in function of the number of new components, for various world parameters. (a) For $c = 6$, $p = 5$, and $n = 0$. (b) For $c = 6$, $p = 5$, and $n = 3$. (c) For $c = 8$, $p = 5$, and $n = 3$. (d) For $c = 10$, $p = 5$, and $n = 3$. [Four panels of cost (vertical axis, 0 to 0.35) with series initial cost and final cost; the horizontal axis is labeled "weight index" in the extracted figure.]

The final cost values shown in Fig. 4, when compared with the initial ones, illustrate how successful the algorithm was in adapting the perceptual metric. Thus, when the amount of perceptual components is insufficient, the final costs are not as low as in the cases where that amount is sufficient. For plots 4(a) and (b), the perceptual images are one feature short ($c - p = 1$), so that one or more additional perceptual components are enough to drive the final costs to lower values. The only difference between these two plots is the amount of components with noise, whose effect is negligible since the plots are identical. Plots 4(c) and (d) show the results for perceptual images that are three and five features short, respectively: as expected, only with the appropriate amount of additional components ($\ge 3$ for the first, and $\ge 5$ for the second) do the final costs stabilize around their lowest values as the number of additional components increases. These plots corroborate the idea that, once the number of new components reaches $c - p$, the final cost stabilizes on values close to the ones found in previous experiments. This observation suggests a methodology for estimating how many new components are required for a given problem of unknown structure: to try out successively higher amounts of new components, until the final cost value stabilizes.

Further experimentation showed more interesting results. In one of them, a single additional component to the cognitive representation was considered ($c = p + 1$). The algorithm was able to reconstruct, for the perceptual images in the training set, the values of that component. The reconstruction power, measured in terms of signal-to-noise ratio (SNR) between the missing component and the recovered dimension, yielded values of about 45 dB.


In another experiment, a linear regression model was employed to extract features for the perceptual representation. The obtained results were also satisfactory.
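The text does not detail the regression model, so the following is only a sketch of one plausible reading: an ordinary least-squares fit, with an intercept, from the original perceptual features to the additional components obtained for the training set, which can then be evaluated on new stimuli. It uses Python with NumPy, and the function names are hypothetical.

```python
import numpy as np

def fit_feature_regressor(X_train, Y_train):
    """Least-squares map from original features X to additional components Y."""
    A = np.hstack([X_train, np.ones((X_train.shape[0], 1))])   # append intercept column
    coef, *_ = np.linalg.lstsq(A, Y_train, rcond=None)
    return coef                                                # shape (q + 1, p)

def predict_features(coef, X_new):
    """Additional components predicted for new perceptual images."""
    A = np.hstack([X_new, np.ones((X_new.shape[0], 1))])
    return A @ coef
```

Here `X_train` would hold the original perceptual features of the training images and `Y_train` the additional components found by the gradient descent for those same images.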

5. Conclusions

The hypothesis underlying the research presented here is that the emotional mechanisms provide relevance and meaning, and thus their inclusion in artificial agents is important, from the standpoint of coping with complex and dynamic environments. The agent architecture proposed follows a biological inspiration from neurological research on emotional mechanisms, namely in what concerns their role in decision-making.

The efficiency of these mechanisms is paramount, in particular with regard to the agent's ability to match current situations with previous experiences (cognitive matching). This paper describes progress made towards an efficient mechanism for finding cognitive matches in memory. The devised method adaptively optimizes the perceptual metric, with respect to the efficiency of the indexing mechanism.

In order to validate the approach, a simple test-bed was devised and explored. The experimental results show that the method meets the expectations. In particular, the goal of obtaining a better indexing efficiency, after adapting the perceptual metric, is visible in the improved eval-order results. This means that, using that adapted metric, the N-best indexing algorithm described in [13] allows for cognitive matches to be found much more efficiently. However, due to the lack of exploitable structure of the current test-bed, whenever the cognitive dimensionality exceeds the perceptual one (including any additional components), the obtained results become significantly degraded, because of the poor fitting. For instance, noise components get zero weights assigned to them (as they should, given their irrelevance) only if a good fit is found. Two ways to circumvent these limitations are proposed here: one is to adopt a perceptual metric function with more degrees of freedom, so that it could better fit the data; another possibility is to allow for some misfit in the perceptual matching, as long as the cognitive one is able to solve the rest of the problem without significant loss of efficiency.

References

[1] R.W. Picard, Affective computing, Technical Report 321, M.I.T. Media Laboratory, Perceptual Computing Section, November 1995.

[2] A.R. Damasio, D. Tranel, H.C. Damasio, Frontal lobe function and dysfunction, in: Somatic Markers and the Guidance of Behavior, Oxford University Press, NY, 1991, pp. 217–229.

[3] A. Newell, H.A. Simon, Computer science as empirical inquiry: symbols and search, Communications of the ACM 19 (3) (1976) 113–126.

[4] J. McCarthy, Programs with common sense, in: Mechanization of Thought Processes, Proceedings of the Symposium of the National Physics Laboratory, Her Majesty’s Stationery Office, London, UK, 1958, pp. 77–84.

[5] R.A. Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation RA 2 (1) (1989) 14–23.

[6] H.L. Dreyfus, Why Heideggerian AI failed and how fixing it would require making it more Heideggerian, Philosophical Psychology 20 (2) (2007) 247–268.

[7] R.A. Wilson, F.C. Keil (Eds.), The MIT Encyclopedia of the Cognitive Sciences, MIT Press, Cambridge, MA, 1999.

[8] J. LeDoux, The Emotional Brain, Simon & Schuster, New York, 1996.

[9] M. Maçãs, L. Custodio, Multiple emotion-based agents using an extension of DARE architecture, Informatica 27 (2003) 185–195.

[10] R. Sadio, G. Tavares, R. Ventura, L. Custodio, An emotion-based agent architecture application with real robots, in: D. Canamero (Ed.), Emotional and Intelligent II: The Tangled Knot of Social Cognition, Papers from the AAAI Fall Symposium, Technical Report FS-01-02, The AAAI Press, Menlo Park, California, 2001, pp. 117–122.

[11] P. Vale, L. Custodio, Combining reinforcement learning with an emotion based architecture, in: Proceedings of International Conference on Artificial Intelligence and Applications 2002 (IASTED), ACTA Press, Malaga, Spain, 2002, pp. 41–46 (Special Session on Automated Reasoning: Perception and Emotions).

[12] B.D. Damas, L.M. Custodio, Emotion-based decision and learning using associative memory and statistical estimation, Informatica 27 (2003) 147–157.

[13] R. Ventura, C. Pinto-Ferreira, A formal indexing mechanism for an emotion-based agent, in: Proceedings of International Conference on Artificial Intelligence and Applications 2002 (IASTED), ACTA Press, Malaga, Spain, 2002, pp. 34–40 (Special Session on Automated Reasoning: Perception and Emotions).

[14] T.F. Cox, M.A.A. Cox, Multidimensional Scaling, Chapman & Hall, London, UK, 1994.

Rodrigo Ventura received a Licenciatura degree (1996), an M.Sc. degree (2000), and a Ph.D. degree (2008), all in Electrical Engineering and Computers from Instituto Superior Tecnico (Technical University of Lisbon), in Portugal. He is currently a member of the Institute for Systems and Robotics, a Portuguese private research institution located at the campus of IST, and an Assistant Professor at Instituto Superior Tecnico, where he has lectured since 1998. His current research interests include emotion-based agents, humanoid robotics, and field robotics.

Carlos Pinto-Ferreira received his MBA in 1982 from Universidade Nova de Lisboa, and his Ph.D. in Electrical and Computing Engineering in 1991 from Instituto Superior Tecnico (IST), Technical University of Lisbon, Portugal. He co-organized the Portuguese Conference on Artificial Intelligence (EPIA’95), as well as various other meetings. He is an Associate Professor at Instituto Superior Tecnico (IST), Technical University of Lisbon, Portugal. He is also a senior member of the Institute for Systems and Robotics, a Portuguese private research institution located at the campus of IST.