The physical symbol grounding problem



Cognitive Systems Research 3 (2002) 429–457
www.elsevier.com/locate/cogsys


Action editor: Tom Ziemke

Paul Vogt
IKAT/Infonomics, Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands

Received 14 May 2001; accepted 5 November 2001

Abstract

This paper presents an approach to solve the symbol grounding problem within the framework of embodied cognitive science. It will be argued that symbolic structures can be used within the paradigm of embodied cognitive science by adopting an alternative definition of a symbol. In this alternative definition, the symbol may be viewed as a structural coupling between an agent's sensorimotor activations and its environment. A robotic experiment is presented in which mobile robots develop a symbolic structure from scratch by engaging in a series of language games. In this experiment it is shown that robots can develop a symbolic structure with which they can communicate the names of a few objects with a remarkable degree of success. It is further shown that, although the referents may be interpreted differently on different occasions, the objects are usually named with only one form. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Symbol grounding problem; Embodied cognitive science; Symbolic structure; Robots

E-mail address: [email protected] (P. Vogt).

1389-0417/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S1389-0417(02)00051-7

1. Introduction

This paper tries to show how symbols can be redefined to describe cognitive functions within the paradigm of embodied cognition. Traditionally, cognitive scientists describe cognition in terms of symbol systems (Newell & Simon, 1976; Newell, 1980). This is very useful because they assume that cognitive agents manipulate symbols when they think, reason or use language. However, when explaining the processes underlying such (higher) cognitive functions in terms of symbol manipulation, at least two major fundamental problems arise: the frame problem (McCarthy & Hayes, 1969; Pylyshyn, 1987) and the symbol grounding problem (Harnad, 1990). These problems arise because symbols are defined as internal representations, which are supposed to relate to entities in the real world.

A recent approach in cognitive science tries to overcome these problems by describing cognition in terms of the dynamics of an agent's interaction with the world. This novel approach has been called embodied cognitive science (e.g. Pfeifer & Scheier, 1999). In embodied cognitive science it is assumed that intelligence can be described in terms of an agent's bodily experiences that are acquired through its interaction with its environment. That is, an agent's intelligence should be based on its past interactions with the physical world. Within this paradigm, the frame problem and the symbol grounding problem are avoided to some degree, because it is argued that symbolic representations are no longer necessary to implement intelligent behaviors (Brooks, 1990).


But is this true? Are symbols no longer necessary? Indeed much can be explained without using symbolic descriptions, but most of these explanations only dealt with low-level reactive behaviors such as obstacle avoidance, phototaxis, simple forms of categorization and the like (Pfeifer & Scheier, 1999). Higher cognitive functions such as language processing have been modeled most successfully using symbolic representations. This can be inferred from the fact that most natural language processing applications use symbolic processing, either hand-coded or acquired statistically from large corpora (see, e.g., ACL, 1997). It might therefore still be desirable to describe higher cognition in terms of symbol systems. However, to overcome the symbol grounding problem, the symbol system has to be embodied and situated (Pfeifer & Scheier, 1999). It has to be embodied in order to experience the world and it has to be situated so that it can acquire its knowledge through interactions with the world (see, e.g., (Sun, 2000) for a discussion). This leads to the central question of this paper: Is it possible to define and develop an embodied and situated symbol system?

This paper argues that this could be possible by adopting Peirce's triadic definition of symbols. To distinguish this triadic definition from the traditional definition, these symbols will be called semiotic symbols. As will be shown, these semiotic symbols are both embodied and situated. The proposed approach combines the paradigm of physical grounding (Brooks, 1990) with the symbol grounding problem (Harnad, 1990) such that semiotic symbols are grounded inherently and meaningfully from the physical interactions of robots with their environment. It will be argued that this approach reduces the symbol grounding problem to a technical problem, which will be called the physical symbol grounding problem (Vogt, 2000b). To illustrate how a system of semiotic symbols can be constructed from scratch, a concrete robotic experiment will be presented.

A similar argumentation has recently been put forward in AI and cognitive science by Dorffner and colleagues (Dorffner, Prem, & Trost, 1993; Prem, 1995). They apply the theory of semiotics for discussing the symbol grounding problem (Dorffner et al., 1993; Prem, 1995) and to illustrate their ideas they have simulated some aspects in a connectionist model of language acquisition (Dorffner, 1992). The major difference between their work and the current work is that Dorffner's model, besides being a connectionist model, has been tested only in simulations of language acquisition, whereas this work invokes a concrete robotic experiment that investigates language evolution. As the ideas behind their approach are very similar to the one presented below, they will not be discussed further.

In the next section, the traditional cognitivist approach, the embodied cognitive science approach, and their problems in relation to symbols are shortly reviewed. This review paves the way for introducing an alternative interpretation of symbols. As will be argued, the newly defined semiotic symbols are meaningful within themselves and fit well within the embodied cognition paradigm. From there on, the article will present a model by which agents can acquire a set of semiotic symbols of which the meaning is grounded through the agents' interactions with their environment. The model (explained in Section 3) is based on the language game model that has been proposed by Luc Steels to study the origins and evolution of language (Steels, 1996b). Section 4 will present the results of a concrete experiment in which robotic agents develop a set of semiotic symbols using the language game model. The results will be discussed in Section 5. Finally, Section 6 concludes.

2. Symbols

2.1. Symbols as internal representations

Traditionally, symbols are defined as internal representations with which computations can be carried out. As mentioned, this approach is subject to some fundamental problems such as the frame problem and the symbol grounding problem. In this section, a brief review of the classical cognitivist approach is given, together with a discussion of the symbol grounding problem. Although the frame problem is closely related, the discussion in this paper concentrates on the symbol grounding problem.


The discussion starts with the physical symbol system hypothesis as put forward by Newell and Simon (Newell & Simon, 1976; Newell, 1980). This hypothesis states that physical symbol systems are sufficient and necessary conditions for intelligence. Physical symbol systems (or symbol systems for short) are systems that can store, manipulate and interpret symbolic structures according to some specified rules.

In Newell and Simon's definition, symbols are considered to be patterns that provide distal access to some structure (Newell, 1990). These are internal representations that can be accessed from some external structure. (Where external is relative to the pattern, hence it could be some other pattern.) Although Newell (1990) admits that the relation between a symbol's meaning and the outside world is important, he leaves it unspecified. Other related conceptions of a symbol are also used in the cognitive science community. De Saussure, for instance, defines a sign as a relation between a meaning and some arbitrary form/label, or more concretely as "a link between . . . a concept and a sound pattern" (De Saussure, 1974, p. 66). Harnad defines a symbol as an 'arbitrary category name' (Harnad, 1993). All these definitions assume that symbols are internal representations.

These notions of symbols fit well with the mind as a computer metaphor. However, symbols in computers only have a meaning when interpreted by an external observer; computers manipulate symbols without being aware of their meaning. Naturally, this is not the way human cognition works. Humans are very well capable of interpreting the symbols which they manipulate, for instance during thought or while using language; they need no external observer to do this. Therefore, one would like to have symbols that agents can interpret themselves.

This problem led Searle to formulate his famous Chinese Room argument (Searle, 1980), which will not be discussed here. It also led Harnad to formulate his symbol grounding problem (Harnad, 1990). As argued, symbolic manipulation should be about something and the symbols should acquire their meaning from reality. This is what Harnad calls the symbol grounding problem. According to Harnad, symbols should be grounded from the bottom-up by invariantly categorizing sensorimotor signals. Harnad proposes that this should be done in three stages:

1. Iconization: Analogue signals need to be transformed to iconic representations (or icons).
2. Discrimination: "[The ability] to judge whether two inputs are the same or different, and, if different, how different they are."
3. Identification: "[The ability] to be able to assign a unique (usually arbitrary) response—a 'name'—to a class of inputs, treating them all as equivalent or invariant in some respect." (Harnad, 1990, my italics)

Iconization and discrimination, according to Harnad, yield sub-symbolic representations; symbols are the result of identification. Hence, identification is the goal of symbol grounding. The process of identification is task dependent (Sun, 2000). As will be argued in this paper, using language is a task that is particularly suited to do the identification. This is mainly because language, through its conventions, offers a basis for invariant labeling of the real world.

In Harnad's work symbols are still defined as names for categories of sensorimotor activity. As such the symbol grounding problem relates to the cognitivist paradigm, which concentrates on internal symbol processing (Ziemke, 1999). As will be argued, symbols could also be viewed as structural couplings between reality and the sensorimotor activations of an agent that arise from the agent–environment interaction. This reality may be a real world object or some internal state. When symbols are structures that inherently relate reality with internal structures, they are already meaningful in some sense and the symbol grounding problem is not a fundamental problem anymore.

2.2. Symbols or no symbols?

To overcome the problems of the cognitivist approach, embodied cognitive science came around in the late 1980s and has gained popularity ever since. The approach has strong roots in artificial intelligence, where it also became popular under the terms nouvelle AI and behavior-based robotics. Besides AI, it also has many roots in other disciplines of cognitive science such as psychology (Gibson, 1979), linguistics (Lakoff, 1987), philosophy (Boden, 1996), and neuroscience (Edelman, 1987; Johnson, 1997).


The essence of this modern approach will be discussed here briefly in line with the argumentation brought by Brooks, who introduced the physical grounding hypothesis (Brooks, 1990, 1991). This hypothesis states that intelligence should be grounded in the interaction between a physical agent and its environment. Furthermore, according to this hypothesis, symbolic representations are no longer necessary. Intelligent behavior can be established by parallel operating sensorimotor couplings.

When, as Brooks argues, symbolic representations are no longer necessary, it could be argued that the symbol grounding problem is no longer relevant since there are no symbols (Clancey, 1997; Pfeifer & Scheier, 1999). Another important aspect of the physical grounding hypothesis is that intelligent behaviors are often emergent phenomena (e.g. Pfeifer & Scheier, 1999). This means that intelligent behaviors may arise from mechanisms that appear not to be designed to perform the observed behavior. The physical grounding hypothesis lies at the heart of embodiment and situatedness. Intelligence is embodied through an agent's bodily experiences of its behavior and it is situated through the agent's interaction with the world.

Brooks and others showed that much of an agent's surprisingly intelligent behavior can be explained at the level of sensorimotor control (e.g. Steels & Brooks, 1995; Arkin, 1998; Pfeifer & Scheier, 1999). Very simple mechanisms that connect an agent's sensors with its motors can exhibit rather complex behavior without requiring symbolic representations (Braitenberg, 1984).

An example is the famous Cog experiment of Brooks and his colleagues (e.g. Brooks, Breazeal, Irie, Kemp, Marjanović, Scassellati, & Williamson, 1998). Cog is a humanoid robot that can mimic some human-like behaviors by connecting its sensory stimulation to some actuator response. Its behaviors are controlled according to the behavior-based paradigm. Several behaviors are modeled in a layered organization of sensorimotor couplings. These behavior modules are implemented as loosely coupled parallel processes and can be learned. Cog has learned, for instance, to detect human faces and gaze directions, to control hand–eye coordination, to saccade its eyes and to interact socially with humans (Brooks et al., 1998). All these behaviors are based on the physical grounding hypothesis.

Although many behaviors can be explained by the physical grounding hypothesis, the question still remains whether it is able to explain higher cognitive functions. The assumption taken in this paper is that intelligent behaviors such as thought and language do require some form of symbolic representation. The reason for this is twofold: First, as scientists like to describe overt behavior in terms of symbol manipulation, it is very useful to have a proper definition that fits well within the embodied cognition paradigm. This makes it easier to ascribe symbols to embodied cognitive agents from an observer's perspective. The second reason is that in order to facilitate higher cognitive functions such as language, agents might actually need symbols that they can manipulate. The question remains: how are symbols represented?

2.3. Symbols as structural couplings

As already discussed, the traditional approach to cognitive science and AI is confronted with problems such as the frame problem (McCarthy & Hayes, 1969; Pylyshyn, 1987), the symbol grounding problem (Harnad, 1990) and the Chinese Room 'problem' (Searle, 1980). At the heart of these problems lies the fact that the traditional symbols are neither situated nor embodied; see, e.g., (Clancey, 1997; Pfeifer and Scheier, 1999) for broad discussions on these problems. As mentioned, the physical grounding hypothesis (Brooks, 1990) doubts the necessity of symbolic representations, but if they would be necessary they should be both situated and embodied (Clancey, 1997). In this section a definition of symbols will be given that is both situated and embodied from an agent's point of view.

2.3.1. Semiotic symbols

Various scientists from the embodied cognitive science field assume that, should symbols be necessary to describe cognition, they should be defined as structural couplings connecting objects to their categories based on their sensorimotor projections (Clancey, 1997; Maturana & Varela, 1992).


There is, however, already a definition of a symbol that comes very close to such a structural coupling. This alternative definition stems from the work of Peirce (Peirce, 1931–1958).

Peirce's theory extends the semiotics of De Saussure. While De Saussure defines a sign as having a meaning (the signified) and a form (the signifier) (De Saussure, 1974), Peirce also includes its relation to a referent. A sign consists of what he calls a representamen, an interpretant and an object. These are defined as follows (Chandler, 1994):

Representamen: the form which the sign takes (not necessarily material).
Interpretant: . . . the sense made of the sign.
Object: to which the sign refers.

According to Peirce, the sign is called a symbol if the representamen in relation to its interpretant is either arbitrary or conventionalized, so that the relationship must be learned. In this respect the representamen could be, for instance, a word-form as used in language. The interpretant could be viewed "as another representation which is referred to the same object" (Eco, 1976, p. 68). The object can be viewed as a physical object in the real world, but may also be an abstraction, an internal state or another sign. In the experiments described below, the objects will be physical objects and the interpretant will be represented by a category that is formed from the visual interaction of a robot with the real world.

Often the term symbol is used to denote the representamen (e.g. Ogden & Richards, 1923; Harnad, 1993). Many scientists, including Peirce, tend to 'misuse' the term sign when referring to the representamen. However, the sign was originally defined by Peirce as the triadic relation (Chandler, 1994). In this paper the triadic interpretation of the sign is adopted as the definition of the symbol, provided that the representamen of the sign is either arbitrary or conventionalized. In order to distinguish this definition from the traditional interpretation, this alternative interpretation of the symbol shall be called a semiotic symbol. Also a more familiar terminology is adopted. Following Steels, the representamen is called a form, the interpretant a meaning and the object a referent (Steels & Kaplan, 1999).

The sign (or semiotic symbol) is often illustrated as a semiotic triangle such as the one introduced by Ogden and Richards (1923). The triangle displayed here (Fig. 1) only differs from the original one in its terminology.¹ The dotted line in the figure indicates that the relation between form and referent is not always explicitly observable.

Fig. 1. The semiotic triangle illustrates the relations that constitute a sign. When the form is either arbitrary or conventionalized, the sign can be interpreted as a symbol.

¹ In Ogden and Richards' original diagram, the term symbol was actually used instead of form. In addition, they call the meaning a thought or reference.

2.3.2. The meaning of meaning

The term meaning requires special attention. It has been (and still is) subject to much debate in philosophy and the cognitive sciences in general. This work tries to distill a definition of meaning that is suitable within the context of the current investigation.

According to Peirce, a semiotic symbol's meaning arises in its interpretation (Chandler, 1994). As such the meaning arises from the process of semiosis, which is the interaction between form, meaning and referent. This means that the meaning depends on how the semiotic symbol is constructed and with what function. This is comparable to the notion of meaning that is prominent in embodied cognitive science, where meaning depends on "the way we perceive the overall shape of things . . . and by the way we interact with things with our bodies" (Lakoff, 1987, p. 292).

So, the meaning of semiotic symbols can be viewed as a functional relation between a form and a referent. This relation is based on an agent's bodily experience and interaction with a referent.


The experience of an agent is based on its history of interactions. Each interaction between an agent and a referent can activate its past experiences, bringing forth a new experience. The way these bodily experiences are represented and memorized forms the internal representation of the meaning. The actual interaction between an agent and a referent 'defines' the functional relation.

In the experiment described below, robots develop a system of semiotic symbols through communicative interactions called language games. The robots have a very simple body and can only visually interact with objects and, in principle, point at them by orienting towards the objects. In the language games, communication is only used to name a visually detected referent (in case of the speaker) and to guess what visually detected referent the speaker names (in case of the hearer). The function (or use in Wittgenstein's terminology) of this naming is only to focus the hearer's attention towards a referent such that it, for instance, can go to or point at the referent. The meaning of the semiotic symbol in such a guessing game is conveyed by an agent's perception, categorization and naming of the referent, together with, in case of the hearer, an appropriate reaction. Such meanings may change dynamically over time as the robots (visually) interact more with the referents.

This use of meaning may not seem realistic, because the communication of semiotic symbols has no meaning with respect to the robots' survival as, for instance, pointed out by Ziemke and Sharkey (2001). But it is very much similar to the way infants seem to construct meanings upon their first visual encounter with some object. This is nicely illustrated by a series of experiments reported in (Tomasello and Barton, 1994). In these experiments, infants are shown a, for them, novel toy together with an invented name (i.e. a non-existing word), such as "toma". After that, the toy is hidden and the infants are requested to find the "toma". Even if they do not know where the toy is hidden, the infants are very successful in recognizing the "toma" when they find it. So, although the children are not yet able to grasp the entire meaning of the toy (they do not know, for instance, what they can or should do with it), they presumably do form some meaning for the object. This meaning is solely based on the infants' visual interaction with, and categorization of, the toy. Naturally, when the infants further interact with the object, e.g. by playing with it, they expand their meaning of the object and they come to learn more of its functions.

As Ziemke and Sharkey do, one might still argue that robots cannot use semiotic symbols meaningfully, since they are not rooted in the robot: the robots are designed rather than shaped through evolution or physical growth (Ziemke & Sharkey, 2001; Ziemke, 1999). In that respect, they continue, today's robots do not use semiotic symbols meaningfully, since whatever task they might have stems from their designer or is in the head of a human observer. With this in mind, it will be assumed that robots, once they can construct semiotic symbols, do so meaningfully. This assumption is made to illustrate how robots can construct semiotic symbols meaningfully.

2.3.3. Physical symbol grounding

Adopting Peirce's triadic notion of a symbol has at least two advantages. One advantage is that one may argue that the semiotic symbol is per definition grounded, because the triadic relation (i.e. the semiotic symbol) already bears the symbol's meaning with respect to reality. As meaning is defined by the functional relation between the form, meaning and referent, one might argue that such semiotic symbolic structures "are meaningful to begin with" (Lakoff, 1987, p. 372).² Hence the symbol grounding problem is no longer relevant. Not because symbols do not exist anymore, as argued by, e.g. Brooks, but because the semiotic symbols are already meaningful. This does not mean, however, that there is no problem anymore; the symbol grounding problem is just no longer a fundamental problem of interpreting symbols meaningfully. The problem is reduced to the process of semiosis, which is defined by Peirce as the interaction between the referent, meaning and form. The semiosis can be viewed as the process of constructing the semiotic triangle. Implementing this in autonomous systems, however, remains a hard problem.

² Note that Lakoff does not explicitly apply this quote to semiotic symbols, but his argument is similar to the one expressed here.


This problem shall be addressed as the physical symbol grounding problem (Vogt, 2000b) and will be treated as a technical problem.

The physical symbol grounding problem is a combination of the physical grounding problem and the symbol grounding problem. It is based on the idea that symbols should be grounded (cf. Harnad, 1990) and the idea that they should be grounded by physical agents that interact with the real world (cf. Brooks, 1990). To solve the physical symbol grounding problem, the three phases of symbol grounding identified by Harnad (1990) are still relevant. The next section will show how these phases (iconization, discrimination and identification) can be modeled in a concrete experiment.

The second advantage is that the semiotic symbol is situated and embodied. It should be acquired through some interaction between a physical agent and its environment. This allows one to connect semiotic symbols with embodied cognition.

So, constructing semiotic symbols is required for communication. Every time a semiotic symbol is used, it is (re-)constructed by its user. As a result, the semiotic symbols are not static, but may change dynamically whenever the agent–environment interaction requires so. In communication the form needs to be conventionalized (although still arbitrary to some extent). Establishing conventions about a semiotic symbol's form in relation to its referent is particularly useful for the invariant identification of the referent. As a referent is perceived differently under varying circumstances, it may be categorized differently. As will be shown, language use helps to identify the sensing of the referents invariantly. Therefore, it is assumed that meaning co-evolves with the language (Steels, 1997a), such that meaning can also be viewed as a cultural unit (cf. Eco, 1976). Similar arguments have been put forward to explain various aspects of language development (see, e.g., Whorf, 1956; Lakoff, 1987).

2.3.4. Summary

In this subsection a definition of a semiotic symbol has been adopted that provides an alternative to the traditional definitions. As Clancey has argued, a symbol within the embodied cognition paradigm should be some structural coupling (Clancey, 1997). The semiotic definition provided by Peirce yields a structural coupling between the real world object and some internal representation (or a sensorimotor activation pattern), especially in the process of semiosis. Since the semiotic symbol is defined by a relation between a form, meaning and referent, its meaning is an intrinsic property bearing the relation to the real world. Hence, it could be argued that the semiotic symbol is per definition grounded and the symbol grounding problem is not relevant anymore. However, there still remains the problem of constructing a semiotic symbol. Rather than a fundamental problem, this will be treated as a hard technical problem, and it is addressed as the physical symbol grounding problem. The semiotic symbols acquire their meaning through perception and categorization as a process that conveys the semiotic symbols' names to the referents they stand for.

As argued, and as the experimental results will reveal, semiotic symbols are constructed most efficiently in communication. Language development gives rise to the development of meaning and vice versa.

3. Adaptive language games

3.1. Synthetic modeling of language evolution

From the second half of the 1990s, research at the Vrije Universiteit Brussel and at the Sony Computer Science Laboratory in Paris has focussed on the study of language origins and evolution. These studies are based on the language game model as proposed by Steels (1996b). In this model language use is the central issue in language development, as has been hypothesized by the 'father of language games' (Wittgenstein, 1958). Steels hypothesized three mechanisms for language development: cultural interaction, individual adaptation and self-organization.

In this model, agents have a mechanism to exchange parts of their vocabulary with each other, called cultural interaction. When novel situations occur in a language game, agents can expand their lexicons. In addition, they evaluate each 'speech act' on its effectiveness, which they use to strengthen or weaken the used form-meaning associations. These mechanisms are called individual adaptations.
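To make the individual-adaptation mechanism concrete, the following Python sketch shows a lexicon of scored form-meaning associations. It is only an illustration of the mechanism described above, not code from the actual experiments; the class design, the initial score and the step size delta are assumptions.

# Illustrative sketch (not the experiment's code): a lexicon of scored
# form-meaning associations that is expanded in novel situations and
# whose scores are strengthened or weakened after each speech act.

class Lexicon:
    def __init__(self, delta=0.1):
        self.scores = {}              # (form, meaning) -> score in [0, 1]
        self.delta = delta            # assumed update step size

    def adopt(self, form, meaning, initial=0.01):
        # Expand the lexicon when a novel situation occurs.
        self.scores.setdefault((form, meaning), initial)

    def evaluate(self, form, meaning, success):
        # Strengthen the used association on success, weaken it on failure.
        s = self.scores.get((form, meaning), 0.0)
        s = min(1.0, s + self.delta) if success else max(0.0, s - self.delta)
        self.scores[(form, meaning)] = s

    def best_form(self, meaning):
        # Produce the highest-scored form for a given meaning, if any.
        candidates = [(s, f) for (f, m), s in self.scores.items() if m == meaning]
        return max(candidates)[1] if candidates else None

Iterating such local adoptions and score updates over many games is what lets a shared vocabulary self-organize, as described in the next paragraphs.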


Although agents have been implemented with mechanisms that model local behavior, such as communicating to another agent and individual adaptations, the iterative combination of these mechanisms yields the emergence of a global coherence in the agents' language. This process is very similar to the self-organizing phenomena that have been observed in many biological, physical and cultural systems (Prigogine & Strengers, 1984; Maynard-Smith & Szathmáry, 1995; Maturana & Varela, 1992). Self-organization is thought to be a basic mechanism of complex dynamical systems.

The above mechanisms for lexicon development have been tested extensively under different settings by the researchers in Brussels and Paris to investigate various aspects of lexicon origins, evolution and development. These aspects include lexicon formation (Steels, 1996b), lexicon dynamics (Steels & McIntyre, 1999), multiple word games (Van Looveren, 1999) and stochasticity (Steels & Kaplan, 1998). All these experiments reveal that the language game model (also called the naming game in relation to lexicon formation) is a strong model to explain lexicon formation in terms of the nurture paradigm. The model is very similar to those that have been successfully studied by, e.g., Batali (1998), Oliphant (1999) and Kirby and Hurford (1997). It contrasts the nativist approach advocated by, e.g., Chomsky (1980), Pinker and Bloom (1990) and Bickerton (1998).

The above mechanisms to explore lexicon formation have been extended to include meaning creation (hence modeling symbol grounding). The extension resulted in a model of discrimination games (Steels, 1996c). The mechanisms on which the discrimination games are based are very similar to those of the language games: agent–environment interaction, individual adaptation and self-organization. Agents are given a mechanism to perceive their environment. Another mechanism allows them to adapt their memories. They can construct new categories that form the basis of the meaning, and they can adapt them based on their use in grounded language games. The evolution of meaning using these mechanisms is an emergent property of the simple interaction and adaptations, so the term self-organization is in place. The discrimination game model has been studied separately in simulations (Steels, 1996c) and on mobile robots (Vogt, 1998b; De Jong & Vogt, 1998).

Other experiments have been done in which the discrimination game has been coupled to the naming game in simulations (De Jong, 2000; Belpaeme, 2001), on immobile robots called the 'Talking Heads' (Belpaeme, Steels, & van Looveren, 1998; Steels & Kaplan, 1999) and on mobile robots (Steels & Vogt, 1997; Vogt, 2000b). Again, and as will be shown in the next section, these experiments revealed that the three mechanisms are very powerful to explain how lexicons can evolve in co-evolution with the meaning, without implementing any prior knowledge of the lexicon and meaning.

Another variant of the language game that is worthwhile mentioning is the imitation game, which has been used to study the origins of human-like vowel systems (De Boer, 1997, 2000a). De Boer showed in his experiments that agents can develop repertoires of vowels that are very similar to vowel systems observed in human languages. These systems emerged based on the three mechanisms proposed by Steels as mentioned above. A crucial factor in the experiments' success is the realistic simulation of the vocal tract and auditory system with which the agents are modeled. It is important to realize that in the experiments none of the vowels have been preprogrammed, neither are their features as the Chomsky and Halle theory proposes (Chomsky & Halle, 1968).

For an overview of the experiments that are done in Brussels and Paris, consult (Steels, 1997b).

3.2. The experiment

The development of the language games has resulted in a model that has been implemented on various robotic platforms: on mobile LEGO robots (Steels & Vogt, 1997), on the Talking Heads, which are immobile pan-tilt cameras (Belpaeme et al., 1998) and most recently on the AIBO, which is a four-legged robot (Kaplan, 2000). The experiment reported here is based on the one reported in (Steels and Vogt, 1997). The goal of this experiment is that two mobile robots, given their bodies and interaction/learning mechanisms, develop a shared and grounded lexicon from scratch about the objects that the robots can detect in their environment. This means that the robots construct a vocabulary of form-meaning associations from scratch with which they can successfully communicate the names of the objects.


Although not always modeling lexicon evolution, lexicon grounding on real robots is becoming increasingly popular. Other research, however, does not investigate lexicon development from scratch, but assumes a part of the lexicon is given. Examples are the work of Billard and Hayes (1997) and Yanco and Stein (1993) in robot–robot communication, or Roy (2000) and Sugita and Tani (2000) in human–robot communication. It is beyond the scope of this paper to discuss these studies here, but all this work solves, to some extent, the physical symbol grounding problem.

In the experiments, the robots play a series of guessing games (Steels & Kaplan, 1999). The guessing game is a variant of a language game in which the hearer tries to guess what referent the speaker names. Much of the processing of the robots that is not directly involved with lexicon development, like sensing, turn-taking and evaluating the feedback, has been preprogrammed in a behavior-based manner; see (Vogt, 2000b) for details. The same holds for the learning mechanisms. The reason for this is not to complicate the system too much, so that a working experiment could be developed to investigate how robots can develop a grounded lexicon, given the mechanisms explained below. Other research that investigates various aspects of the origins of language and communication tries to explain how such mechanisms may have evolved. Examples of such research investigate the origins of communication channels (Quinn, 2001), feature detectors (Belpaeme, 1999) and communication as such (Noble, 2000; De Jong, 2000).

The basic scenario of a guessing game is illustrated in what Steels calls the semiotic square (Steels & Kaplan, 1999), see Fig. 2. The game is played by two robots. One robot takes the role of the speaker, while the other takes the role of the hearer. Both robots start sensing their surroundings, after which the sensory data is preprocessed. This way the robots acquire a context setting. The speaker selects the sensing of one referent as the topic; the hearer considers all detected referents as potential topics.

Fig. 2. The semiotic square illustrates the guessing game scenario. The two squares show the processes of the two participating robots. This figure is adapted from (Steels and Kaplan, 1999).

The robots categorize the preprocessed data, which results in a meaning if the categorization is used in the communication, i.e. when it is coupled to a form. After categorization, the speaker produces an utterance and the hearer tries to interpret this utterance. When the meaning of this utterance applies to the categorization of the sensed referents, the hearer can act on the appropriate referent, for instance, by pointing at it. The guessing game is successful if the hearer guessed the right topic from the speaker's utterance. The success is evaluated in the feedback, which is passed back to both robots so that they can adapt their ontology of categories (used as memorized representations of the meanings) and lexicon.

As can be seen in Fig. 2, when the segment node is ignored (although it is a necessary intermediate step), the guessing game allows the robots to construct a semiotic symbol. The game is thus similar to the process of semiosis. In this paper it is assumed that meaning arises from the sensing, segmentation and categorization. Hence semiotic symbols are illustrated with the triangle (Fig. 1) rather than the square. When a guessing game is successful, the relations between form, meaning and referent are established properly and one could argue that the constructed semiotic symbol is used meaningfully.
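The control flow of one guessing game can be summarized in a short Python sketch. This is a schematic reading of the scenario in Fig. 2; the robot objects and all method names are invented stand-ins for the preprogrammed behaviors, and the feedback test exploits the designed sensor/light-source correspondence described below.

import random
import numpy as np

# Schematic sketch of one guessing game round; illustrative names only.

def same_referent(f_a, f_b):
    # Assumed feedback test: the channel whose feature equals 1
    # identifies the corresponding light source (see Section 3.4.3).
    return int(np.argmax(f_a)) == int(np.argmax(f_b))

def guessing_game(speaker, hearer):
    context_s = speaker.sense_and_preprocess()      # feature vectors
    context_h = hearer.sense_and_preprocess()

    topic = random.choice(context_s)                # speaker selects the topic
    meaning = speaker.discrimination_game(topic, context_s)
    if meaning is None:                             # no distinctive category
        return False

    utterance = speaker.produce(meaning)            # naming phase: form
    guess = hearer.interpret(utterance, context_h)  # hearer guesses a referent

    success = guess is not None and same_referent(guess, topic)
    speaker.adapt(utterance, meaning, success)      # update ontology and lexicon
    hearer.adapt(utterance, guess, success)
    return success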

Below follows a detailed description of the experimental setup and the guessing game model. Each basic part of the model is exemplified with an illustrative example.

3.3. The experimental setup

The experiment makes use of two LEGO robots, see Fig. 3. The robots are equipped with four light sensors, two motors, a radio module and a sensorimotor board.


The light sensors are used to detect the objects in the robots' environment. The two motors control the robots' movements. The radio module is used to coordinate the two robots' behaviors and to send sensor data to a PC where most of the processing takes place.

The robots are situated in a small environment (2.5 × 2.5 m²) in which four light sources are placed at different heights. The light sources act as the objects that the robots try to name. The different light sensors of the robots are mounted at the same heights as the different light sources. Each sensor outputs its readings on a sensory channel. A sensory channel is said to correspond with a particular light source if the sensor has the same height as this light source.

Fig. 3. The LEGO robots in their environment.

The goal of the experiments is that the robots develop a lexicon with which they can successfully name the different light sources. Although this is not a realistic task for living organisms (or even for robots), complicating the task would not contribute much to the investigation of how robots can develop a lexicon to name things.

The reasons for designing the correspondence between light sources and sensors are twofold. First, it helps the observer in analyzing the experiments. It provides the observer with an easy tool to investigate what light source the robots saw based on the sensory data. Secondly, if the robots were to live in this environment and had to distinguish the light sources from each other in order to survive, evolution might have come up with a similar visual apparatus.

3.4. Sensing, segmentation and feature extraction

As a first step towards solving the symbol grounding problem, the robots have to construct what Harnad calls an iconic representation. Although Harnad refers to raw sensory data such as an image on the retina, in this work it is assumed that an iconic representation is the preprocessed sensory data that relates to the sensing of a light source. The resulting representation is called a feature vector, in line with the terminology from pattern recognition (see, e.g. Fu, 1976).

A feature vector is acquired in three stages: sensing, segmentation and feature extraction. Below these three stages are explained in more detail.

3.4.1. Sensing

During the sensing phase, the robots detect what is in their surroundings one by one. They do so by rotating 720° around their axis. While they do this, they record the sensor data of the middle 360° part of the rotation. This way the robots obtain a spatial view of their environment for each of the four light sensors, see Fig. 4. The robots rotate 720° instead of 360° in order to cancel out nasty side effects induced by the robots' acceleration and deceleration.

Fig. 4 shows, as an example, the sensing of two robots during a language game. Fig. 4(a) shows that robot r0 clearly detected the four light sources; there appears a peak for every light source. In each peak, the sensor that has the same height as the light source responsible for the peak (i.e. the corresponding sensor) has the highest intensity. The two robots do not sense the same view, as can be seen in Fig. 4(b). This is due to the fact that the robots are not located at the same place.

Fig. 4. The sensing of the two robots during a language game. The plots show the spatial view of the robots' environment, acquired during 360° of their rotation. The figures make clear that the two robots have a different sensing, since they stand at different positions. The y-axis shows the intensity of the sensors, while the x-axis determines the time (or angle) of the sensing in PDL units. A PDL unit takes about 1/40 second, hence the total time of this sensing event took 1.5 s for robot r0 and slightly less for robot r1. The robots have different rotation periods due to the noisy control of the robots. (a) Robot r0, (b) robot r1.

3.4.2. Segmentation

Segmentation is used by the robots to extract the sensory data that is induced by the detection of the light sources. The sensing of the light sources relates in the raw sensory data with the peaks of increased intensity. As can be seen in Fig. 4, between two peaks the sensory channels are noisy. The first step in the segmentation preprocesses the raw sensory data to remove this noise.


The preprocessed sensory channel data τ_{i,t}, for sensory channels i = 0, . . . , n at time step t, is acquired by subtracting an upper noise level Θ_i from the raw sensory data x_{i,t}. This is expressed in the following equation:

    τ_{i,t} = x_{i,t} − Θ_i    if x_{i,t} − Θ_i ≥ 0,
    τ_{i,t} = 0                if x_{i,t} − Θ_i < 0.        (1)

The upper noise levels Θ_i for the different sensory channels have been obtained empirically. Applying the above equation to the middle part of the scene from Fig. 4(a) results in the scene displayed in Fig. 5.

Fig. 5. The preprocessed sensory channel data of a part of the sensed view from Fig. 4(a). The vertical lines mark the boundaries of segments S_1 and S_2.

In this figure there are two connected regions that have positive sensory values for at least one sensory channel. These regions, of which the boundaries are marked with vertical lines, relate to the sensing of a light source and form the segments. Note that there is no noise anymore between the segments. The segmentation results in a set of segments {S_k}, where

    S_k = {s_{k,0}, . . . , s_{k,n−1}},        (2)

in which k is the number of the segment, s_{k,i} = (τ_{k,i,0}, . . . , τ_{k,i,m}) is the preprocessed sensory channel data, m is the length of the segment and n is the number of sensory channels (four in this case). The following holds for each segment S_k: for all j = 0, . . . , m there exists at least one sensory channel i for which τ_{k,i,j} > 0. This means for every segment that for every observation j there is at least one preprocessed sensory channel with a positive value. Applying this to the data shown in Fig. 5 yields the segments given in Table 1.

It is assumed that each segment relates to the detection of one light source. Due to the noisy control and sensing of the robots, this is not always the case. The set of segments constitutes what is called the context of the guessing game, i.e. Cxt = {S_1, . . . , S_N}, where N is the number of segments that are sensed. Each robot participating in the guessing game has its own context, which may differ from one another.
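For concreteness, the preprocessing of Eq. (1) and the segmentation step can be sketched in Python as follows. The noise levels in THETA are placeholders; the paper states only that the Θ_i were obtained empirically.

import numpy as np

# Sketch of Eq. (1) and the segmentation step; THETA holds assumed
# placeholder values for the empirically obtained noise levels.

THETA = np.array([30.0, 30.0, 30.0, 30.0])    # upper noise levels, one per channel

def preprocess(x):
    # tau_{i,t} = x_{i,t} - Theta_i where positive, 0 otherwise (Eq. (1)).
    return np.maximum(x - THETA, 0.0)

def segment(tau):
    # A segment is a maximal run of time steps in which at least one
    # preprocessed sensory channel is positive; the resulting list is
    # the context Cxt = {S_1, ..., S_N}.
    active = (tau > 0.0).any(axis=1)
    segments, start = [], None
    for j, on in enumerate(active):
        if on and start is None:
            start = j
        elif not on and start is not None:
            segments.append(tau[start:j])
            start = None
    if start is not None:
        segments.append(tau[start:])
    return segments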


Table 1
The segments S_1 and S_2 from Fig. 5

                S_1                                 S_2
           s_{1,0}  s_{1,1}  s_{1,2}  s_{1,3}   s_{2,0}  s_{2,1}  s_{2,2}  s_{2,3}
τ_{k,i,0}      2        0        0        0         0        1       11        0
τ_{k,i,1}     14        2        0        0         0        1       37        0
τ_{k,i,2}     32       17        1        0         0       16       39        1
τ_{k,i,3}     46       40        3        0         0       36       40        1
τ_{k,i,4}     51       54        3        0         0       28       35        1
τ_{k,i,5}     33       35        4        0         0       28       35        1
τ_{k,i,6}      3        1        4        0         0        0        8        0
τ_{k,i,7}      3        1        3        0         0        0        8        0

The preprocessed sensory channel data s_{k,i} is given in the columns. Note that for every segment each row has at least one positive value. Further note that here both segments have equal length. In general this is not the case.

3.4.3. Feature extraction

For all sensory channels, the feature extraction is a function φ(s_{k,i}) that normalizes the maximum intensity of sensory channel i to the overall maximum intensity from segment S_k:

    φ(s_{k,i}) = max_j(τ_{k,i,j}) / max_{S_k}(max_j(τ_{k,i,j}))        (3)

The result of applying a feature extraction to the data of sensory channel i will be called a feature f_{k,i}, so f_{k,i} = φ(s_{k,i}). Segment S_k can now be related to a feature vector f_k = (f_{k,0}, . . . , f_{k,n−1}), where n is the total number of sensory channels. Like a segment, a feature vector is assumed to relate to the sensing of a referent. The space that spans all possible feature vectors f is called the n-dimensional feature space ℱ = [0, 1]^n, or feature space for short.

Applying this feature extraction to the segments from Table 1 goes as follows. First the maximum values for each sensory channel (the numerator in Eq. (3)) have to be identified. In segment S_1 these are: 51, 54, 4 and 0 for sensory channels s_{1,0}, . . . , s_{1,3}, respectively. The maximum of these maxima (the denominator in Eq. (3)) is in sensory channel s_{1,1}. The features are extracted by normalizing each maximum value by the maximum value of s_{1,1}. The resulting feature vector for segment S_1 is thus f_1 = (0.94, 1.00, 0.06, 0.00).

Similarly, segment S_2 has maximum values of 0, 36, 40 and 1 for sensory channels 0 to 3. Normalizing each value to 40 yields feature vector f_2 = (0.00, 0.90, 1.00, 0.03). The context of this robot could be described in terms of the feature vectors as Cxt = {f_1, f_2}.

The fact that the sensor that reads the highest intensity during the sensing of a light source is mounted at the same height as the light is an invariant property of the sensing. The feature extraction from Eq. (3) has been designed to extract this property. In a feature vector there is one feature with value 1, the others have lower values. This feature indicates to which light source it corresponds. Because in the example feature f_{1,1} of feature vector f_1 has a value 1, one can infer that this feature vector corresponds to the light source that is at the same height as sensor s1.³ Similarly, f_2 corresponds to the light source at the same height as sensor s2.

³ Note that the sensors are numbered from 0 to 3 and that sensor s0 is the lowest in height.

The reasons for this feature extraction are manifold. First, it is useful to have a consistent representation of the sensed referents in order to categorize. It is easier to implement categorization when its input has a fixed format. As the segments vary in length, they have no fixed format. Second, the normalization to the maximum intensity within the segment (the 'invariance detection') is useful to deal with different distances between the robot and the light source. Furthermore, it helps to analyze the experiments from an observer's point of view and to evaluate feedback. Besides its use during feedback (see below), the robots are not 'aware' of this invariance. It should be noted that such feature extraction functions could well have been learned or evolved as, for instance, shown by De Jong and Steels (1999) and Belpaeme (1999).
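In code, Eq. (3) reduces to two nested maxima. The sketch below, applied to segment S_2 of Table 1, reproduces the feature vector derived above (up to rounding); the function name is illustrative.

import numpy as np

# Sketch of Eq. (3): per-channel maxima normalized by the overall
# maximum within the segment.

def extract_features(segment):
    # segment: array of shape (m, n) with preprocessed channel data.
    channel_max = segment.max(axis=0)     # numerator of Eq. (3)
    overall_max = channel_max.max()       # denominator of Eq. (3)
    return channel_max / overall_max if overall_max > 0 else channel_max

s2 = np.array([[0, 1, 11, 0], [0, 1, 37, 0], [0, 16, 39, 1], [0, 36, 40, 1],
               [0, 28, 35, 1], [0, 28, 35, 1], [0, 0, 8, 0], [0, 0, 8, 0]])
print(extract_features(s2))               # [0. 0.9 1. 0.025], i.e. f_2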

3.5. Meaning formation

In order to form a semiotic symbol, the robots have to categorize the sensing of a referent so that it is distinctive from the categories relating to the other referents.


This way the category can be used as the memorized representation of the meaning of this referent. Although a category should not be equated with a meaning, it is labeled as such when used in communication, because it forms the memorized representation of a semiotic symbol's meaning.

The 'meaning' formation is modeled with the so-called discrimination game (Steels, 1996c). The discrimination game models the discrimination phase in symbol grounding as proposed by Harnad (1990) and it searches for distinctive categories in three steps: (1) the feature vectors from the context are categorized, (2) categories that distinguish a topic from the other segments in the context are identified, and (3) the ontology of categories is adapted.

Each robot plays a discrimination game for each (potential) topic individually. A topic is a segment from the constructed context (described by its feature vector). The topic of the speaker is arbitrarily selected from the context and is the subject of communication. As the hearer tries to guess what the speaker intends to communicate, it considers all segments in its context as a potential topic.

3.5.1. Categorization

Let a category c = ⟨c, ν, ρ, κ⟩ be defined as a region in the feature space. It is represented by a prototype c = (x_0, . . . , x_{n−1}), i.e. a point in the n-dimensional feature space, and ν, ρ and κ are scores. The category is the region in ℱ in which all points have the nearest distance to c.

A feature vector f is categorized using the 1-nearest neighbor algorithm (see, e.g., Fu, 1976). This algorithm returns the category of which the prototype has the smallest Euclidean distance to f. Each robot categorizes all segments this way.

Consider the context of feature vectors from the example derived in the previous section. Further suppose that the robot has two categories in its ontology, c_1 and c_2, which are represented by the prototypes c_1 = (0.50, 0.95, 0.30, 0.00) and c_2 = (0.50, 0.95, 1.00, 0.00). Then the Euclidean distance between f_1 and c_1 is 0.50, and between f_1 and c_2 it is 1.04. Since the distance between f_1 and c_1 is smallest, f_1 is categorized with c_1. Likewise, f_2 is categorized with c_2.

In order to allow generalization and specialization in the categories, different versions of the feature space are available to a robot. These different feature spaces, indicated by ℱ_λ, allow different levels of generality specified by λ = 0, . . . , λ_max, where λ_max is the level at which maximum specialization can be reached. This maximum is introduced because the sensors have a limited resolution and to prevent unnecessary specialization, which makes the discrimination game computationally very inefficient. In each space a different resolution is obtained by allowing each dimension of ℱ_λ to be exploited up to 3^λ times. How this is done will be explained soon. The higher λ is, the more dense the distribution of categories in feature space ℱ_λ can be and the less general the categories in that feature space are. The categories of all feature spaces together constitute an agent's ontology.

The different feature spaces allow the robots to categorize a segment in different ways. The categorization of segment S_k results in a set of categories C_k = {c_0, . . . , c_m}, where m ≤ λ_max. So, assuming only one feature space, the two feature vectors from the example are categorized with the following sets of categories: C_1 = {c_1} and C_2 = {c_2}.

In the discrimination games, a prototypical representation has been used rather than the binary representation used in (Steels, 1996c; Steels and Vogt, 1997) or the subspace representation used by De Jong and Vogt (1998), which all yield more or less similar results (De Jong & Vogt, 1998; Vogt, 2000b). The reason for adopting a prototype representation is its biological plausibility as inferred by, for instance, psychologists (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976) and physiologists (Churchland, 1989).

3.5.2. Discrimination

Suppose that a robot wants to find distinctive categories for (potential) topic S_t, then a distinctive category set can be defined as follows:

    DC = {c_i ∈ C_t | ∀ S_k ∈ Cxt \ {S_t} : c_i ∉ C_k}.

Or in words: the distinctive category set consists of all categories of the topic that are not a category of any other segment in the context. That is, those categories that distinguish the topic from all other segments in the context.
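The categorization and discrimination steps can be sketched as follows, using the numbers of the running example and assuming, as the example does, a single feature space so that every segment is categorized with exactly one category. The names are illustrative.

import numpy as np

# Sketch of 1-nearest-neighbour categorization and of the distinctive
# category set DC, reproducing the running example.

ontology = {
    "c1": np.array([0.50, 0.95, 0.30, 0.00]),
    "c2": np.array([0.50, 0.95, 1.00, 0.00]),
}

def categorize(f):
    # Return the category whose prototype is closest to f (1-NN).
    return min(ontology, key=lambda name: np.linalg.norm(f - ontology[name]))

def distinctive_categories(topic, context):
    # DC: categories of the topic that categorize no other segment.
    topic_cats = {categorize(topic)}
    other_cats = {categorize(f) for f in context if f is not topic}
    return topic_cats - other_cats

f1 = np.array([0.94, 1.00, 0.06, 0.00])
f2 = np.array([0.00, 0.90, 1.00, 0.03])
print(categorize(f1))                          # 'c1' (distance 0.50 vs 1.04)
print(distinctive_categories(f1, [f1, f2]))    # {'c1'}: the game succeeds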


3.5.3. Adaptation
If DC = ∅, the discrimination game is a failure and some new categories are constructed. Suppose that the robot tried to categorize feature vector f = (f_0, ..., f_{n-1}); then new categories are created as follows (see also the example in Section 3.5.4):

1. Select an arbitrary feature f_i > 0.
2. Select a feature space l that has not been exploited 3 times in dimension i, for l as low as possible.
3. Create new prototypes c_j = (x_0, ..., x_{n-1}), where x_i = f_i and the other x_r are made of already existing prototypes in l.
4. Add the new prototypical categories c_j = ⟨c_j, ν_j, ρ_j, κ_j⟩ to feature space l, with ν_j = ρ_j = 0.01 and κ_j = 1 − l/l_max.

The score ν indicates the effectiveness of a category in the discrimination game, ρ indicates the effectiveness in categorization and κ indicates how general the category is (i.e., in which feature space l the category houses). Score κ is a constant, based on the feature space l and the feature space that has the highest resolution possible (i.e. l = l_max). This score implements a bias towards selecting the most general category. The other scores are updated as in reinforcement learning. It is beyond the scope of this paper to give exact details of the update functions; see (Vogt, 2000b) for these details.

The three scores together constitute the meaning score m = (ν + ρ + κ)/3, which is used in the naming phase of the experiment. The influence of this score is small, but it helps to select a form-meaning association in case of an impasse.

The reason to exploit only one feature of the topic during the construction of new prototypes, rather than the complete feature vector, is to speed up the construction.

If the distinctive category set DC ≠ ∅, the discrimination game is a success. The DC is forwarded to the naming game that models the naming phase of the guessing game. If a category c is successfully used in the guessing game, the prototype c of this category is moved towards the feature vector f it categorizes:

    c ← c + ε · (f − c),   (4)

where ε is the step size with which the prototype moves towards f. This way the prototype becomes a more representative sample of the feature vectors it categorizes. This update is similar to on-line k-means clustering (MacQueen, 1967).
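The update of Eq. (4) is a one-liner; the following sketch (illustrative, not the original implementation; the feature vector is again a hypothetical stand-in) shows how a successfully used prototype drifts towards the vectors it categorizes:

    def shift_prototype(c, f, eps=0.1):
        """Move prototype c towards feature vector f with step size eps, cf. Eq. (4)."""
        return tuple(ci + eps * (fi - ci) for ci, fi in zip(c, f))

    c1 = (0.50, 0.95, 0.30, 0.00)
    f1 = (0.90, 1.00, 0.10, 0.00)     # hypothetical feature vector
    print(shift_prototype(c1, f1))    # -> approximately (0.54, 0.955, 0.28, 0.0)

Applied repeatedly, the prototype approaches an exponentially weighted mean of the feature vectors it successfully categorizes, which is what links the update to on-line k-means clustering.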

3.5.4. An example
To continue the example from above, recall that the two feature vectors in the context are categorized with the following sets of categories: C_1 = {c_1} and C_2 = {c_2}. Suppose that S_1 is the topic of the discrimination; the distinctive category set is then DC = {c_1}, because c_1 ∉ C_2. Because DC ≠ ∅, the discrimination game is a success. If c_1 is further successfully used in naming the referent, its prototype shifts towards f_1 by applying Eq. (4). If ε = 0.1, which is the case in the experiment, then c_1 becomes c_1′ = (0.54, 0.96, 0.28, 0.00).

Now suppose that the context did not consist of two feature vectors, but of three: Cxt = {f_1, f_2, f_3}, where f_1 and f_2 are as before and f_3 = (1.00, 0.90, 0.05, 0.00). Further suppose that the robot has the same ontology as before; then f_3 is categorized with the category set C_3 = {c_1}, since c_1 is the closest to f_3. When S_1 is the topic, the distinctive category set is empty, since C_1 = {c_1} and c_1 ∈ C_3. The discrimination game fails and two new categories are formed, of which the prototypes are as follows when feature f_{1,0} is selected as an exemplar of topic S_1 (note that only one feature space is considered in this example): c_3 = (1.00, 0.95, 0.30, 0.00) and c_4 = (1.00, 0.95, 1.00, 0.00). Feature f_{1,0} is copied to the new prototypes, together with the other features of already existing prototypes; compare c_3 and c_4 with c_1 and c_2.

3.6. Naming
After both robots have obtained distinctive categories of the (potential) topic(s) from the discrimination game as explained above, the naming game (Steels, 1996b) starts. In the naming game, the robots try to communicate the topic.

The speaker tries to produce an utterance as the name of one of the distinctive categories of the topic. The hearer tries to interpret this utterance in relation to distinctive categories of its potential topics. This way the hearer tries to guess the speaker's topic.


If the hearer finds a possible interpretation, the guessing game is successful if both robots communicated about the same referent. This is evaluated by the feedback process, as will be explained below. According to the outcome of the game, the lexicon is adapted.

3.6.1. The lexicon
The lexicon L is defined as a set of form-meaning associations: L = {FM_i}, where FM_i = ⟨F_i, M_i, σ_i⟩ is a lexical entry. Here F_i is a form that is made of a combination of consonant–vowel strings, M_i is a (memorized part of a) meaning represented by a category, and σ_i is the association score that indicates the effectiveness of the lexical entry in language use.

3.6.2. Production
The speaker of the guessing game will try to name the topic. To do this it selects the distinctive category from the DC for which the meaning score m is highest. Then it searches its lexicon for form-meaning associations of which the meaning matches the distinctive category.

If it fails to do so, the speaker will first consider the next distinctive category from the DC. If all distinctive categories have been explored and still no entry has been found, the speaker may create a new form, as will be explained in the adaptation section.

If there are one or more lexical entries that fulfill the above condition, the speaker selects the entry that has the highest association score σ. The form that is thus produced is uttered to the hearer.

3.6.3. Interpretation
On receipt of the utterance, the hearer searches its lexicon for entries for which the form matches the utterance, and for which the meaning matches one of the distinctive categories of the potential topics.

If it fails to find one, the lexicon has to be expanded, as explained later.

If the hearer finds more than one, it will select the entry that has the highest score S = σ + α·m, where α = 0.1 is a constant weight. The potential topic that relates to this lexical entry is selected by the hearer as the topic of the guessing game. That is, this segment is what the hearer guessed to be the subject of communication. A topic relates to an entry when its distinctive category matches the entry's meaning.
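The hearer's selection rule can be summarized in a few lines of Python (an illustrative sketch; the tuple-based lexicon and the names below are assumptions, not the robots' actual data structures):

    def interpret(utterance, lexicon, meaning_scores, alpha=0.1):
        """Select the lexical entry (form, meaning, sigma) that matches the utterance
        and a distinctive category, maximizing S = sigma + alpha * m."""
        candidates = [(form, meaning, sigma)
                      for (form, meaning, sigma) in lexicon
                      if form == utterance and meaning in meaning_scores]
        if not candidates:
            return None   # the lexicon must be expanded (Section 3.6.5, case 2)
        return max(candidates, key=lambda e: e[2] + alpha * meaning_scores[e[1]])

    # meaning_scores maps the distinctive categories of the hearer's potential
    # topics to their meaning score m; the values here are made up for illustration.
    lexicon = [("tyfo", "c1'", 0.10), ("tyfo", "c3'", 0.70),
               ("labo", "c2'", 0.75), ("labo", "c3'", 0.05)]
    print(interpret("tyfo", lexicon, {"c1'": 0.5, "c2'": 0.5, "c3'": 0.5}))
    # -> ('tyfo', "c3'", 0.7)

The same maximization, restricted to the association score σ, implements the speaker's production step; the example lexicon above anticipates Table 2 below.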

3.6.4. Feedback
In the feedback, the outcome of the guessing game is evaluated. It is important to note that in this paper the term feedback is only used to indicate the process of evaluating a game's success by verifying whether both robots communicated about the same referent. As mentioned, the guessing game is successful when both robots communicated about the same referent. The feedback is established by comparing the feature vectors of the two robots relating to the topics. If these feature vectors correspond to each other, i.e. they both have a value of 1 in the same dimension, the robots have identified the same topic, cf. the invariance criterion mentioned in Section 3.4.3. The outcome of the feedback is known to both robots.

If the hearer selected a topic after the understanding phase, but this topic is not consistent with the speaker's topic, there is a mismatch in referent. This is the case when the invariance criterion is not met. If the speaker has no lexical entry that matches a distinctive category, or if the hearer could not interpret the speaker's utterance because it does not have a proper lexical entry in the current context, then the guessing game is a failure.

The feedback is evaluated rather artificially and not very realistically, since robots normally have no access to each other's internal states. However, previous attempts to implement feedback physically have failed (Vogt, 1998a). In these attempts the hearer pointed at the topic so that the speaker could verify whether the hearer identified the same topic. This failed because the speaker was not able to verify reliably at what object the hearer pointed. A similar pointing strategy has been implemented successfully in the Talking Heads experiment (Steels & Kaplan, 1999). To overcome this technical problem, it is assumed that the robots can do this. Naturally, this problem needs to be solved in the future.

3.6.5. Adaptation
Depending on the outcome of the game, the lexicon of the two robots is adapted. There are four possible outcomes/adaptations:


1. The speaker has no lexical entry: In this case the speaker creates a new form and associates this with the distinctive category it tried to name. This is done with a probability of P_s = 0.1.

2. The hearer has no lexical entry: The hearer adopts the form uttered by the speaker and associates this with the distinctive categories of a randomly selected segment from its context.

3. There was a mismatch in referent: Both robots lower the association score of the used lexical entry: σ ← η·σ, where η = 0.9 is a constant learning rate. In addition, the hearer adopts the utterance and associates it with the distinctive categories of a different randomly selected segment.

4. The game was a success: Both robots reinforce the association score of the used entry: σ ← η·σ + (1 − η). In addition, they lower competing entries (entries for which either the form or the meaning is the same as in the used entry): σ ← η·σ. The latter update is called lateral inhibition. (Independent of each other, Oliphant (1999), De Jong (2000) and Kaplan (2001) have shown that lateral inhibition is crucial for convergence in lexicon development.)

3.6.6. An example
The following example illustrates the naming phase of the guessing game. Suppose the speaker and the hearer have a lexicon as given in Table 2. Each agent has three private meanings c_i (or c_i′) associated with two different forms, tyfo and labo. In the cells of the table the association scores are given; a dash indicates that there is no association between that meaning and form. Further suppose that both robots each have detected three light sources, L1, L2 and L3.

Table 2
The lexicon of the speaker and hearer used in the example

    Speaker                     Hearer
    M_s    tyfo    labo         M_h     tyfo    labo
    c_1    0.20    0.25         c_1′    0.10    –
    c_2    –       0.65         c_2′    –       0.75
    c_3    0.95    –            c_3′    0.70    0.05

Each agent has associated three meanings (in the rows) with two forms (in the columns). Real-valued numbers indicate the association scores and a dash indicates there is no association.

Suppose that the speaker selected the segment relating to L3 as the topic, which it categorized distinctively with c_3. This category is only associated with the form tyfo, which it utters. Upon receiving the form tyfo, the hearer tries to interpret it. The hearer has two meanings associated with tyfo: c_1′ and c_3′. As the association score relating tyfo with c_3′ is highest, the hearer selects the segment belonging to c_3′ as the topic. (Note that the meaning score m is discarded to simplify the example.) If this segment relates to L3, the guessing game is successful and the scores are updated as follows: the speaker and the hearer both increase the association score of c_3 (or c_3′) with tyfo, which become 0.955 and 0.73, respectively (recall that η = 0.9). Competing associations are laterally inhibited: the association score of ⟨c_1, tyfo⟩ becomes 0.18, of ⟨c_1′, tyfo⟩ becomes 0.09 and of ⟨c_3′, labo⟩ becomes 0.045.

Reconsider the lexicon in Table 2, but now the speaker selected L1, which it categorized with c_1, as the topic. In this case the speaker will select labo to name L1, because this has the highest association score. The hearer interprets labo with c_2′, which unfortunately has been categorized for L2. There is a mismatch in referent. The association scores of the used associations are lowered: the association score of ⟨c_1, labo⟩ becomes 0.225 and that of ⟨c_2′, labo⟩ becomes 0.675.

If the speaker has categorized the topic, e.g. the segment relating to L1, with a category that is not yet in its lexicon, say c_4, then it may invent a new form. This new form, for instance gufi, is then uttered to the hearer. The hearer, however, does not yet know the form; it associates it with the categorization(s) of a randomly selected segment and the game is considered a failure.
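Cases 3 and 4 are compact enough to verify against the numbers in the example; the sketch below (illustrative; a plain dictionary stands in for the robots' actual lexicon representation) reproduces the hearer's updates after the successful tyfo game:

    def adapt(lexicon, used, success, eta=0.9):
        """lexicon: dict mapping (meaning, form) -> association score sigma.
        On success, reinforce the used entry and laterally inhibit entries sharing
        its meaning or form; on a mismatch in referent, only lower the used entry."""
        meaning, form = used
        if success:
            for key in lexicon:
                if key == used:
                    lexicon[key] = eta * lexicon[key] + (1 - eta)
                elif key[0] == meaning or key[1] == form:
                    lexicon[key] = eta * lexicon[key]   # lateral inhibition
        else:
            lexicon[used] = eta * lexicon[used]
        return lexicon

    hearer = {("c1'", "tyfo"): 0.10, ("c2'", "labo"): 0.75,
              ("c3'", "tyfo"): 0.70, ("c3'", "labo"): 0.05}
    adapt(hearer, ("c3'", "tyfo"), success=True)
    # hearer now maps ("c3'","tyfo") to 0.73, ("c1'","tyfo") to 0.09 and
    # ("c3'","labo") to 0.045, as in the example; ("c2'","labo") is untouched.

The creation and adoption of new entries in cases 1 and 2 simply add new (meaning, form) keys and are omitted here.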


3.7. Summary
This section presented the guessing game model by which the experiments are done. Two mobile robots try to construct a lexicon with which they can solve the physical symbol grounding problem. In each guessing game, the robots try to name one of the light sources that are in their surroundings. They do so by taking the following steps:

1. Sensing, segmentation and feature extraction.
2. Topic selection.
3. Discrimination games: (a) categorization, (b) discrimination and (c) ontology adaptation.
4. Naming: (a) production, (b) interpretation, (c) evaluating feedback and (d) lexicon adaptation.

The guessing game described above implements the three mechanisms hypothesized by Luc Steels (1996b) that can model lexicon development. (Cultural) interactions are modeled by the sensing, communication and feedback. Individual adaptation is modeled at the level of the discrimination and naming games. The selection of elements and the individual adaptations are the main sources for the self-organization of a global lexicon.

The coupling of the naming game with the discrimination games and the sensing part ensures that the emerging lexicon is grounded in the real world. The robots successfully solve the physical symbol grounding problem in some situation when the guessing game is successful. This is so because identification (Harnad, 1990) is established when the semiotic triangle (Fig. 1) is constructed completely. Identification is done at the naming level and is successful when the guessing game is successful. It is important to realize that internal representations stored in the robots' memories as such are not semiotic symbols; they only constitute part of a semiotic symbol when used in a language game. Only then is the relation with a referent assured.

4. The experiments

Using the defined model (and some variations of it), a series of experiments has been done to investigate various aspects of the model. These experiments are all reported in (Vogt, 2000b). One of them is reported in this section.

This section is organized as follows: before reporting the experiment, some expectations about the experiment's success are stated according to statistics calculated from the recorded sensory data. The measures with which the experiment is analyzed are specified in Section 4.2. Section 4.3 presents the results.

4.1. The sensory data

As mentioned, the guessing games are for a large part processed off-line on a PC. Only the sensing is done on-board. The recorded sensory data is re-used to do multiple experiments with the same data, but also to process more games than have been recorded. The most important reason for the off-line processing has to do with time efficiency: conducting a complete experiment on-board would take at least one week of full-time experimenting. Another advantage of processing off-board is that one can do multiple experiments in which various methods and parameter settings can be compared reliably.

The data that has been recorded for the experiments reported here consists (after preprocessing the raw sensory data) of approximately 1000 context settings. These context settings are used for 10 runs of 10 000 guessing games. Hence in each run the context settings are used approximately ten times. Statistics on the data set revealed that the average context size is about 3.5 segments per robot. In each game one robot is selected randomly to be the speaker, who arbitrarily selects one feature vector as the topic. Therefore it takes, in principle, approximately 7000 games until a particular situation re-occurs.

Other statistics on the preprocessed data set revealed that the a priori chance for success is around 23.5%. This means that when both the speaker and the hearer select a topic at random, in about 23.5% of the games they will select the same topic. This a priori chance is calculated from the average context size (3.5 segments) and the potential understandability. Since the two robots do not always detect the same surroundings (Fig. 4), the possibility exists that the speaker selects a topic that the hearer did not detect. Although in natural communication the hearer might try to find the missing information, this is not done here for practical reasons. Besides, human communication is not always perfect either, because humans may fail to construct a shared context. So there is a maximum in the success to be expected, which has been coined the potential understandability.
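Both statistics admit a quick back-of-the-envelope reconstruction (a plausible reading of the text, not a derivation spelled out by the author). A particular situation combines one of the roughly 1000 recorded context settings with one of two speaker roles and one of about 3.5 candidate topics, and a random guess by the hearer can only succeed when the topic falls in the shared part of the context:

    1000 contexts × 2 speaker roles × 3.5 topics ≈ 7000 distinct situations,

    P_apriori ≈ P_understandability × (1/3.5) ≈ 0.80/3.5 ≈ 0.23.

The small difference between this 23% and the measured 23.5% would then reflect variation of the actual context sizes around their 3.5 average.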

Page 18: The physical symbol grounding problem

446 P. Vogt / Cognitive Systems Research 3 (2002) 429–457

The potential understandability has been calculated to lie around 80%. For details on the calculation and other information about the sensory data, see (Vogt, 2000b).

4.2. Measures

The experiments are investigated using six different measures. Two of these (the discriminative and communicative success) measure the success rates of the discrimination games and the guessing games. The other measures indicate the quality of the system that emerges. These measures, which are based on the entropy measure taken from information theory (Shannon, 1948), were developed by Edwin De Jong (2000). They are called distinctiveness, parsimony, specificity and consistency, and are calculated every 200 guessing games. Below follows a description of these measures.

4.2.1. Discriminative success
Discriminative success (Steels, 1996c) measures the number of successful discrimination games averaged over the past 100 guessing games.

4.2.2. Distinctiveness
''Intuitively, distinctiveness expresses to what degree a meaning identifies the referent'' (De Jong, 2000, p. 76). For this one can measure how the entropy of a meaning in relation to a certain referent, H(r|m_i), decreases the uncertainty about the referent, H(r). To do this, one can calculate the difference between H(r) and H(r|m_i). Here r are the referents r_1, ..., r_n and m_i relates to one of the meanings m_1, ..., m_m for robot R. The distinctiveness D_R can now be defined as follows:

    H(r|m_i) = −Σ_{j=1}^{n} P(r_j|m_i) · log P(r_j|m_i),

    dist(m_i) = (H(r) − H(r|m_i)) / H(r) = 1 − H(r|m_i)/H(r),   (5)

    D_R = (Σ_{i=1}^{m} P_o(m_i) · dist(m_i)) / m,

where H(r) = log n and P_o(m_i) is the occurrence probability of meaning m_i. The use of P_o(m_i) as a weighting factor is to scale the importance of such a meaning to its occurrence.

4.2.3. Parsimony
The parsimony P_R is calculated similarly to the distinctiveness:

    H(m|r_i) = −Σ_{j=1}^{m} P(m_j|r_i) · log P(m_j|r_i),

    pars(r_i) = 1 − H(m|r_i)/H(m),   (6)

    P_R = (Σ_{i=1}^{n} P_o(r_i) · pars(r_i)) / n,

with H(m) = log m. Parsimony thus calculates to what degree a referent gives rise to a unique meaning.

4.2.4. Communicative success
Communicative success (Steels, 1996b) measures the number of successful guessing games averaged over the past 100 guessing games.

4.2.5. Specificity
''The specificity of a word[-form] is ... defined as the relative decrease of uncertainty in determining the referent given a word that was produced'' (De Jong, 2000, p. 115). It thus is a measure indicating how well a word-form can identify a referent. It is calculated analogously to the distinctiveness and parsimony. For a set of word-forms s_1, ..., s_q, the specificity is defined as follows:

    H(r|s_i) = −Σ_{j=1}^{n} P(r_j|s_i) · log P(r_j|s_i),

    spec(s_i) = 1 − H(r|s_i)/H(r),   (7)

    S_R = (Σ_{i=1}^{q} P_o(s_i) · spec(s_i)) / q,

where H(r) = log n is defined as before and P_o(s_i) is the occurrence probability of encountering word-form s_i.
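All four entropy-based measures share the same normalized information-gain shape; the sketch below (illustrative, not De Jong's code) computes the per-meaning distinctiveness of Eq. (5) from a list of observed (meaning, referent) co-occurrences, and turns into parsimony, specificity or consistency simply by swapping which element of the semiotic triangle plays which role:

    from collections import Counter
    from math import log

    def dist(pairs, meaning):
        """1 - H(r|m_i)/H(r) for one meaning m_i, cf. Eq. (5); pairs are observed
        (meaning, referent) co-occurrences and H(r) = log n for n >= 2 referents."""
        n = len({r for _, r in pairs})
        cond = Counter(r for m, r in pairs if m == meaning)
        total = sum(cond.values())
        h_cond = -sum(c / total * log(c / total) for c in cond.values())
        return 1.0 - h_cond / log(n)

    # A meaning that always co-occurs with one referent is fully distinctive (order);
    # one spread evenly over all referents carries no information (disorder).
    pairs = [("m1", "L1")] * 10 + [("m2", "L1")] * 5 + [("m2", "L2")] * 5
    print(dist(pairs, "m1"))   # -> 1.0
    print(dist(pairs, "m2"))   # -> 0.0

The robot-level aggregates D_R, P_R, S_R and C_R then weight these per-element values by the occurrence probabilities P_o, as in Eqs. (5)-(8).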


4.2.6. Consistency
Consistency measures how consistently a referent is named by a certain word-form. It is calculated as follows:

    H(s|r_i) = −Σ_{j=1}^{q} P(s_j|r_i) · log P(s_j|r_i),

    cons(r_i) = 1 − H(s|r_i)/H(s),   (8)

    C_R = (Σ_{i=1}^{n} P_o(r_i) · cons(r_i)) / n,

where H(s) = log q and P_o(r_i) is defined as before.

The four entropy-based measures specify whether or not there is order in the system. When a measure has the value 1, there is order in the system; when a measure has the value 0, there is disorder. All these entropy measures are calculated per robot every 200 games within one run; the other measures are calculated after every single game. When presented, all measures are averaged over the ten runs that are done for each experiment.

4.3. The results

The experiment is done with 10 runs of 10 000 guessing games. Fig. 6 shows the evolution of the different measures. The discriminative success approaches 100% early in the experiment (Fig. 6(a)). This indicates that the discrimination game is a very efficient model for categorizing different sensings. Similar results have been confirmed in other experiments using different representations of categories and varying numbers of objects, e.g. (Steels, 1996c; De Jong & Vogt, 1998). As the distinctiveness shows in Fig. 6(b), when a categorization (or meaning) is used, it usually stands for one referent only. That is, there is a one-to-one relation between meaning and referent. This, however, does not imply that there is a one-to-one relation between referent and meaning; the lower parsimony shows this (see Fig. 6(c)).

The communicative success (Fig. 6(d)) approaches the potential understandability of 80%. After 10 000 games, the communicative success is approximately 75%. Hence the robots are fairly well capable of constructing a shared lexicon. The specificity, shown in Fig. 6(e), increases to a value slightly above 0.9. It shows that when a form is used, it is mainly used to name one referent; hence there is little polysemy. The consistency (Fig. 6(f)) is lower than the specificity. This means that a referent is not always named with the same form. As will be shown later, this does not mean that the lexicon is inefficient; it rather means that the system bears some synonymy. It should be noted that although the parsimony is almost as high as the consistency, this does not mean that the number of meanings used is close to the number of forms. It merely indicates that the inconsistent use of forms happens about as often as the inconsistent use of meanings.

As can be seen in Figs. 6(e) and (f), the specificity and consistency rise very rapidly. To understand this, it should be realized that these measures are calculated relative to the successful use of forms (in the case of specificity) or to the successful naming of referents (in the case of consistency). Specificity and consistency are not relative to the number of successful guessing games. So this means that, whenever the robots communicate successfully, the used semiotic symbols reveal order in the lexicon. Similar arguments hold for the distinctiveness and parsimony (Figs. 6(b) and (c)).

Although still rather fast, the communicative success rises more slowly (see Fig. 6(d)). Because the agents tend to categorize the referents very differently on different occasions, much synonymy and polysemy arises in the system. Too much synonymy and polysemy causes many confusions, so many guessing games fail. Hence the robots must disambiguate the synonymy and polysemy, which takes some time. Nevertheless, the agents perform better than chance already after a few hundred games. Kaplan has shown that the speed of convergence in communicative success depends, amongst others, on the number of meanings/referents, the number of agents and noise in the transmission (Kaplan, 2001).

The run that will be discussed in more detail below resulted in the lexicon that is displayed in the semiotic landscape shown in Fig. 7. This figure shows the associations between referent, meaning and form for both robots, with a strength that indicates the relative occurrence frequency of connections that are successfully used over 10 000 guessing games.

Page 20: The physical symbol grounding problem

448 P. Vogt / Cognitive Systems Research 3 (2002) 429–457

Fig. 6. The evolution of the measures during the experiment.


Fig. 7. The semiotic landscape of the experiment. A semiotic landscape provides a way to illustrate how the semiotic symbols of the two robots are related. It illustrates the connections between referent/light source L, meaning M and forms such as sema, zigi, etc. The upper half of the graph shows the lexicon of robot r0, the lower half the lexicon of r1. The connections drawn indicate the relative occurrence frequencies (P) of referents, meanings and forms. The relations between referent and meaning are relative to the occurrence of the referent. The relations between meaning and form are relative to the occurrence of the form. Associations with an occurrence frequency of P < 0.005 are left out for clarity.

Ideally, the connections between referent–form–referent would be orthogonal. That is, the couplings between a referent and its form should not cross-connect with other form–referent relations. This orthogonality criterion is achieved for mety, luvu and possibly zigi. The word-forms kate and demi have cross-connections, but these are relatively unimportant because they occur with very low frequencies. More polysemy is found for sema and tyfo. As will be shown below, tyfo gets well established to name L1 almost unambiguously. The form sema, however, provides some instability in the system.

Fig. 8 shows various competition diagrams of robot r0, relating to referent L1 in one of the runs of the experiment. A competition diagram displays the relative co-occurrence frequency in time of, for instance, forms in relation to a referent (Steels & Kaplan, 1999). Fig. 8(a) shows the referent–form competition. This figure shows the successful co-occurrence of referent and form, where the occurrence of the form is calculated relative to the occurrence of the referent. Very infrequently occurring elements are left out for clarity. Fig. 8(a) shows that the form tyfo clearly wins the competition and is used nearly uniquely to name light source L1. Hence L1 has very little synonymy. Vice versa, the form–referent diagram shows that when the form tyfo is used, it is mostly used to name light source L1 (see Fig. 8(b)). This, however, happens after game 3000. Before this, the form tyfo shows quite some polysemy.
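Such a competition diagram is straightforward to recompute from a game log; the sketch below (illustrative; the log format is an assumption) tallies, per window of games, how often each form co-occurred with a given referent, relative to the referent's occurrence, as in Fig. 8(a):

    from collections import Counter

    def referent_form_competition(games, referent, window=200):
        """games: chronological (referent, form) pairs from successful guessing games.
        Yields one dict per window: the co-occurrence frequency of each form with
        `referent`, relative to how often the referent occurred in that window."""
        for start in range(0, len(games) - window + 1, window):
            forms = [f for r, f in games[start:start + window] if r == referent]
            yield {f: c / len(forms) for f, c in Counter(forms).items()} if forms else {}

Plotting each form's series against the game number reproduces trajectories of the kind shown in Fig. 8(a); exchanging the roles of referent and form in the filter gives the form–referent diagram of Fig. 8(b).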


Fig. 8. Some competition diagrams of robot r0 in one run of the experiment. (a) A referent–form competition for light source L1. The y-axis shows the co-occurrence frequencies of form and referent relative to the occurrence of the referent over the past 200 guessing games. The x-axis shows the number of guessing games. (b) The form–referent competition for tyfo. Again the y-axis shows the co-occurrence frequencies, but now of the form and referent relative to the occurrence of the form. (c) The referent–meaning competition for L1 and (d) the form–meaning competition for tyfo.

Fig. 8(c) shows that, throughout the run, L1 is categorized with more than one meaning, of which two are used most frequently. A similar competition can be seen in Fig. 8(d), where various meanings compete to be the meaning of tyfo. Apparently, these competitions compensate each other such that competitions as in Fig. 8(a) and (b) emerge.

That the competition does not always run so smoothly is shown in Fig. 9. Here there are two forms strongly competing to name light source L2. In most cases the forms are used to name only one referent in the end, as Fig. 9(b) shows, but sometimes this can fail. Nevertheless, the overall picture is that all referents are mostly named by only one form and all forms are mostly used to name only one referent. Such relations are much less frequently observed when investigating co-occurrences of referent and meaning or of form and meaning.

5. Discussion

5.1. Meeting the limits

The results make clear that the robots construct a communication system that meets its limits. The communicative success is in the end nearly as high as the potential understandability.


Fig. 9. (a) A referent–form competition for light source L2 and (b) a form–referent competition for the form luvu.

Both the discriminative success and the distinctiveness are very close to 1, and the specificity is also close to 1. So, when a robot uses a semiotic symbol successfully, it almost always refers to the same referent. This means that there is hardly any polysemy. The parsimony and consistency are somewhat lower than the distinctiveness and specificity. Hence there are some one-to-many relations between referent and meaning and between referent and form in the system. The semiotic landscape (Fig. 7) already showed that most of the synonymy does not necessarily mean that the communication is difficult. Usually, the hearer can rather easily interpret any speaker's utterance.

The landscape also shows that a one-to-many relationship between form and meaning does not necessarily mean polysemy. In fact, it is beneficial, since it cancels out the one-to-many mapping of referent to meaning for a great deal. Hence semiotic symbols may have different meanings. Referents are interpreted differently when observed under different circumstances. Yet the referents can be named invariantly to a high degree. One may argue that the robots use different semiotic symbols when they use different meanings, and sometimes this is even appropriate. However, when the semiotic symbols have the same form and relate to the same referent, attributing them to a single semiotic symbol can be useful, especially when the semiotic symbols are only used to name a referent. Therefore, one could also argue that the meaning of a semiotic symbol changes dynamically over time, depending on the situation the robots are in. Note, by the way, that the meanings also change dynamically over time through the prototype shift towards detected referents, cf. Eq. (4).

In what respect did the robots acquire meaningful symbols? In Section 2.3.2, meaning has been defined as a functional relation between form and referent. The robots do not acquire meaning in the sense that they use the communicated symbols to fulfill a 'life-task', neither is it rooted in the sense that the robots' bodies, their interaction and learning mechanisms are designed (see Ziemke, 1999). But given the robots' bodies, interaction mechanisms and learning mechanisms, the semiotic symbols are meaningful in that they are used by the robots to name referents. They could also be used to perform simple actions, such as pointing at the referent, as shown in (Vogt, 1998a) and in the Talking Heads experiment (Steels & Kaplan, 1999). However, because this pointing could not be used to evaluate feedback in the current experiments, it has not been implemented here. In more realistic experiments, the robots should use the communication to fulfill some task. Fulfilling tasks could then be used to evaluate the language game's success, as is the case in, e.g., (De Jong, 2000; Vogt, 2000a; Kaplan, 2000). Further research is currently in progress in which the robots use the lexicon to improve their learned capability to sustain their energy level in order to survive. This experiment combines the language game model with the viability experiments previously done at the AI Lab in Brussels, see, e.g., (Steels, 1996a; Birk, 1998). The meaning of the semiotic symbols that are constructed in such an experiment is then based on the robots' activity to remain viable.


But even then, as argued in, e.g., (Ziemke, 1999; Ziemke & Sharkey, 2001), the semiotic symbols will only be really meaningful to an agent when the agent is completely rooted by, for instance, evolution (Ziemke, 1999). Current studies in ALife focus on how robotic agents may evolve, for instance, their bodies (Lund, Hallam, & Lee, 1997), their control architecture (Nolfi & Floreano, 2000), their sensors (Jung, Dauscher, & Uthmann, 2001) and their communication channels (Quinn, 2001). These lines of research might help to explain how robots can become 'rooted' in their environment. For a broad discussion of these issues, see, e.g., Ziemke and Sharkey (2001).

In the current experiment the forms are transmitted as strings in the off-line processing on the PC. Previously, this has been done using radio communication (Steels & Vogt, 1997). In a more realistic setting, however, the transmission should occur in phonetic strings. In such a case, the phonetic strings are also physical objects and must be processed and categorized. Both processing and categorization could, for instance, be done in a way similar to that modeled by De Boer (1997), see Section 3.1. Using De Boer's model, vowels, or even more complex utterances (De Boer, 2000b; Oudeyer, 2001), could be developed in a similar way as the lexicon is developed. One non-trivial problem remains to be solved; it has to do with distinguishing utterances as forms from physical objects. How this problem can be solved is unclear, but it may depend on the context setting, and for that the agents need to develop more sophisticated means of recognizing a context. But perhaps it could also be solved by the evolution of communication channels as, for instance, is being investigated in (Quinn, 2001).

5.2. The dynamics of the system

What can be said about the dynamics of the system? Fig. 6(d) showed that the communicative success increases rapidly during the first 1000 games, after which the rate of increase decreases. Furthermore, the figure shows a slower increase than, e.g., the discriminative success shown in Fig. 6(a). So, what happens in the beginning? In the first few hundred games, when the robots try to name the various referents, there is a lot of confusion about what the speaker intends to communicate. The hearer of a game adopts forms that may already exist for the robot, but that are not applicable in the current context (i.e. the meaning does not apply to any distinctive category). Consequently, a lot of variety is introduced in the lexicon. As a result of adapting the association scores, the robots will tend to select effective elements appropriately more often. This induces a repetitive cycle that strengthens the effective elements and meanwhile weakens the ineffective ones. In the beginning this adaptation is more flexible than later on, when the association scores have become stronger. When the association scores are strong, it is more difficult for other associations to influence the communicative success. This, however, becomes less important as the communicative success rises to a satisfactory degree.

The dynamics of the system allows an important conclusion to be drawn, namely that the selection criteria and the dynamics of the association scores cause a self-organizing convergence of the lexicon. The conclusion that this is an emergent property can, amongst others, be drawn from the fact that the lexicon dynamics is controlled at the form–meaning level and not at the form–referent level. That is, adaptations in the lexicon occur only at the form–meaning level. A referent is categorized differently in different situations (see, e.g., Fig. 8(c)). At the same time, these different categories may be named by only one name (or, more concretely, one name may have different meanings in different situations), as Fig. 8(d) depicts. Fig. 8(a) and (b) show that the one-to-many relations at the referent–meaning and form–meaning levels cancel each other out at the referent–form level in both directions. This happens despite the fact that when a form–meaning association is used successfully, the strengths of competing form–meaning associations are laterally inhibited. Although this lateral inhibition helps to decrease polysemy and synonymy at the referent–form level, it is also an antagonizing force at the form–meaning level when the meaning is used to stand for the same referent. This antagonizing force, however, is not problematic, due to the context dependence of the guessing games. Selected lexical elements must be applicable within the context of a particular game. This is a nice consequence of the pragmatic approach. Furthermore, the feedback signals that operate at the form–referent level contribute largely to the convergence of the lexicon.


It has been shown in (Vogt, 2000b) that leaving out the feedback, without alternative ways of knowing the topic, does not lead to convergence in this experimental setup. This does not mean, however, that leaving out the feedback does not work in general. It may well be that not using such feedback, or any other non-verbal means of exchanging topic knowledge, might lead to convergence in a richer environment, cf. the results of simulations reported in (Smith, 2001). This is currently being investigated. Another strategy that could be beneficial in such cases is to use a cooperative rather than a competitive selection and adaptation scheme, as demonstrated in (Kaplan, 2001).

5.3. The relation to other work

The experiment presented here is unique in its modeling of the development of a lexicon grounded in reality from scratch using mobile robots. Although the Talking Heads experiment (Belpaeme et al., 1998; Steels & Kaplan, 1999) also models lexicon development from scratch and is also a grounded experiment, the Talking Heads are immobile (they can only move pan-tilt from a fixed position). This immobility helps to evaluate the feedback on guessing games, because the Talking Heads use calibrated knowledge about their environment to evaluate feedback. In addition, the different sensings of a referent are more similar on different occasions than with the mobile robots, as the Talking Heads sense their environment from a fixed position. In controlled experiments, this allowed the Talking Heads to speed up the lexicon development (Steels & Kaplan, 1999). Nevertheless, the overall success on both platforms is more or less comparable.

Similar findings emerge when comparing the work of this paper with the work of Billard and her colleagues on mobile robots (Billard & Hayes, 1997; Billard & Dautenhahn, 1998). In Billard's experiments, a student robot learns a grounded lexicon about its environment by imitating a teacher robot that has been preprogrammed with the lexicon. The overall results are similar to the results presented here, although the lexicon acquisition is much faster in (Billard & Dautenhahn, 1998). The latter result is presumably due to the fact that one of Billard's robots already knows the lexicon, while in the experiments here none of the robots has been preprogrammed with the lexicon.

Although not situated and embodied, the simulations of Oliphant, too, are relevant. He showed that populations of agents could rather easily develop highly efficient lexicons from scratch by associating given meanings with given forms (Oliphant, 1999). Also relevant is the work of De Jong, who experimented with language games whose meanings were grounded in simulations (De Jong, 2000). In addition, De Jong's agents tried to improve their (task-oriented) behavior using the developed lexicon. Both Oliphant and De Jong showed that agents could develop a coherent lexicon without using feedback on the effect of the game, provided the agents have access to both the form and the meaning during a language game. Although in this paper the robots did use such feedback, robotic experiments have confirmed Oliphant's and De Jong's results (Vogt, 2000b, 2001). In these experiments both robots had access to both the form and the topic by means of 'pointing', so that the hearer knows the topic in advance.

6. Conclusions

In order to overcome fundamental problems that exist in the cognitivist approach towards cognition, and to allow describing cognition in terms of symbols within the paradigm of embodied cognition, this paper proposes an alternative definition of symbols. This definition is not novel, but is adopted from Peirce's definition of symbols as the relation between a form, a meaning and a referent. The relation as such is not meaningful, but arises from its active construction and use. This process is called semiosis and has been modeled in robotic agents through adaptive language games.

As a result of the semiotic definition of symbols, it could be argued that the symbols are per definition grounded, because semiotic symbols have intrinsic meaning in relation to the real world (cf. Lakoff, 1987). Hence, the symbol grounding problem is no longer a fundamental problem, since a semiotic symbol is a relation between a form, a meaning and a referent, and the way this relation is formed and used specifies its meaning.


There remains, however, the problem of constructing the semiotic symbols, but this problem can be viewed as a technical problem. This problem is called the physical symbol grounding problem.

The experiment reported shows how robotic agents can develop a structured set of semiotic symbols which they can use to name various real-world objects invariantly. The semiotic symbols are constructed through the robots' interactions with their environment (including themselves), individual adaptations and self-organization. These three mechanisms, hypothesized by Luc Steels (1996a), are based on the core principles of embodied cognitive science: embodiment, situatedness and emergence. The semiotic symbols are structural couplings that are formed through some interaction of an agent with the real world, as proposed for instance by Clancey (1997).

An important result that the experiment reveals is that semiotic symbols need not be categorized the same under different circumstances. As the semiotic landscape shows, there is no one-to-one-to-one relation between a referent, meaning and form; this relation is rather one-to-many-to-one. In different situations, the robots detect the referents differently. Yet they are able to identify them invariantly at the form level. In the process of arriving at such invariant identification, which is the most important aspect of symbol grounding (Harnad, 1990), the co-evolution of form and meaning proves to be extremely important.

The experiment shows that the physical symbol grounding problem can be solved in the simple experimental setup, given the language game model and the designed robots, and under the assumption that feedback can be established by, for instance, using pointing. These given assumptions mean that the physical symbol grounding problem is not entirely solved, because for that the language game model, the robots and the other assumptions should be rooted by, e.g., evolution (Ziemke, 1999; Ziemke & Sharkey, 2001). The experiment nevertheless illustrates how semiotic symbols can be constructed and used, and is thus an important step towards solving the physical symbol grounding problem, or at least towards our understanding of cognition.

Although the guessing game works well in the current experimental setup, it should be realized that this setup is rather simplistic. Future work should confirm the scalability of the model in more realistic and more complex environments using more complex robots. Another improvement that is currently under investigation is that the communication system be used to perform concrete 'life-tasks' rather than just using and developing a lexicon. This would make the approach more realistic, since in natural systems communication is usually used to guide (task-oriented) behavior, such as coordinating each other's actions.

Acknowledgements

The author wishes to thank Ida Sprinkhuizen-Kuyper, Edwin De Jong and Eric Postma for proofreading earlier versions of this paper. Erik Myin, Tom Ziemke, Georg Dorffner and two anonymous reviewers are thanked for various useful comments that helped improve this paper a lot.

References

ACL (1997). Proceedings of the fifth conference on applied natural language processing, Washington, DC. Menlo Park: Association for Computational Linguistics.
Arkin, R. C. (1998). Behavior-based robotics. Cambridge, MA: MIT Press.
Batali, J. (1998). Computational simulations of the emergence of grammar. In Hurford, J. R., Studdert-Kennedy, M., & Knight, C. (Eds.), Approaches to the evolution of language. Cambridge, UK: Cambridge University Press.
Belpaeme, T. (1999). Evolution of visual feature detectors. In Evolutionary computation in image analysis and signal processing and telecommunications: first European workshops, EvoIASP99 and EuroEcTel99 joint proceedings, Göteborg, Sweden, LNCS 1596. Berlin: Springer.
Belpaeme, T. (2001). Simulating the formation of color categories. In Proceedings of the international joint conference on artificial intelligence 2001 (IJCAI'01), Seattle, WA.
Belpaeme, T., Steels, L., & van Looveren, J. (1998). The construction and acquisition of visual categories. In Birk, A., & Demiris, J. (Eds.), Learning robots: proceedings of the EWLR-6, Lecture notes on artificial intelligence 1545. Berlin: Springer.
Bickerton, D. (1998). Catastrophic evolution: the case for a single step from protolanguage to full human language. In Hurford, J., Knight, C., & Studdert-Kennedy, M. (Eds.), Approaches to the evolution of language. Cambridge: Cambridge University Press, pp. 341–358.


Billard, A., & Dautenhahn, K. (1998). Grounding communication in autonomous robots: an experimental study. Robotics and Autonomous Systems, 24(1–2), 71–79.
Billard, A., & Hayes, G. (1997). Robot's first steps, robot's first words. In Sorace, P., & Heycock, S. (Eds.), Proceedings of the GALA '97 conference on language acquisition, Edinburgh. University of Edinburgh: Human Communication Research Centre.
Birk, A. (1998). Robot learning and self-sufficiency: What the energy-level can tell us about a robot's performance. In Birk, A., & Demiris, J. (Eds.), Learning robots: proceedings of EWLR-6, Lecture notes on artificial intelligence 1545. Berlin: Springer, pp. 109–125.
Boden, M. A. (1996). The philosophy of artificial life. Oxford: Oxford University Press.
Braitenberg, V. (1984). Vehicles, experiments in synthetic psychology. Cambridge, MA: MIT Press.
Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6, 3–15.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–159.
Brooks, R. A., Breazeal-Ferrell, C., Irie, R., Kemp, C. C., Marjanović, M., Scassellati, B., & Williamson, M. M. (1998). Alternative essences of intelligence. In Proceedings of the fifteenth national conference on artificial intelligence. Menlo Park, CA: AAAI Press.
Chandler, D. (1994). Semiotics for beginners, http://www.aber.ac.uk/media/Documents/S4B/semiotic.html.
Chomsky, N. (1980). Rules and representations. The Behavioral and Brain Sciences, 3, 1–61.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. Cambridge, MA: MIT Press.
Churchland, P. M. (1989). A neurocomputational perspective. Cambridge, MA: MIT Press.
Clancey, W. J. (1997). Situated cognition. Cambridge, UK: Cambridge University Press.
De Boer, B. (1997). Generating vowels in a population of agents. In Husbands, C., & Harvey, I. (Eds.), Proceedings of the fourth European conference on artificial life. Cambridge, MA: MIT Press, pp. 503–510.
De Boer, B. (2000a). Emergence of vowel systems through self-organisation. AI Communications, 13, 27–39.
De Boer, B. (2000b). Imitation games for complex utterances. In Van den Bosch, A., & Weigand, H. (Eds.), Proceedings of the Belgian–Netherlands artificial intelligence conference, pp. 173–182.
De Jong, E. D. (2000). The development of communication. PhD thesis, Vrije Universiteit, Brussels.
De Jong, E. D., & Steels, L. (1999). Generation and selection of sensory channels. In Evolutionary computation in image analysis and signal processing and telecommunications: first European workshops, EvoIASP99 and EuroEcTel99 joint proceedings, Göteborg, Sweden, LNCS 1596. Berlin: Springer.
De Jong, E. D., & Vogt, P. (1998). How should a robot discriminate between objects. In Pfeifer, R., Blumberg, B., Meyer, J.-A., & Wilson, S. (Eds.), From animals to animats, Proceedings of the fifth international conference on simulation of adaptive behavior, vol. 5. Cambridge, MA: MIT Press.
De Saussure, F. (1974). Course in general linguistics. New York: Fontana.
Dorffner, G. (1992). Taxonomies and part-whole hierarchies in the acquisition of word meaning – a connectionist model. In Proceedings of the 14th annual conference of the cognitive science society. Hillsdale, NJ: Lawrence Erlbaum, pp. 803–808.
Dorffner, G., Prem, E., & Trost, H. (1993). Words, symbols, and symbol grounding, Technical report TR-93-30. Wien: Oesterreichisches Forschungsinstitut fuer Artificial Intelligence. Available on-line at http://www.ai.univie.ac.at/papers/oefai-tr-93-30.ps.gz.
Eco, U. (1976). A theory of semiotics. Bloomington, IN: Indiana University Press.
Edelman, G. M. (1987). Neural Darwinism. New York: Basic Books.
Fu, K. S. (Ed.), (1976). Digital pattern recognition. Berlin: Springer.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston, MA: Houghton-Mifflin.
Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335–346.
Harnad, S. (1993). Symbol grounding is an empirical problem: neural nets are just a candidate component. In Proceedings of the fifteenth annual meeting of the cognitive science society. Hillsdale, NJ: Lawrence Erlbaum.
Johnson, M. H. (1997). Developmental cognitive science. Oxford: Blackwell.
Jung, T., Dauscher, P., & Uthmann, T. (2001). Some effects of individual learning on the evolution of sensors. In Kelemen, J., & Sosík, P. (Eds.), Proceedings of the 6th European conference on artificial life, ECAL 2001, LNAI 2159. Berlin: Springer, pp. 432–435.
Kaplan, F. (2000). Talking aibo: First experimentation of verbal interactions with an autonomous four-legged robot. In Nijholt, A., Heylen, D., & Jokinen, K. (Eds.), Learning to behave: interacting agents. CELE-TWENTE workshop on language technology.
Kaplan, F. (2001). La naissance d'une langue chez les robots. Paris: Hermes Science.
Kirby, S., & Hurford, J. (1997). Learning, culture and evolution in the origin of linguistic constraints. In Husbands, C., & Harvey, I. (Eds.), Proceedings of the fourth European conference on artificial life. Cambridge, MA: MIT Press.
Lakoff, G. (1987). Women, fire and dangerous things. The University of Chicago Press.
Lund, H. H., Hallam, J., & Lee, W. (1997). Evolving robot morphology. In Proceedings of the IEEE fourth international conference on evolutionary computation. IEEE Press.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp. 281–297.
Maturana, H. R., & Varela, F. R. (1992). The tree of knowledge: the biological roots of human understanding. Boston: Shambhala.
Maynard-Smith, J., & Szathmáry, E. (1995). The major transitions in evolution. Oxford: W.H. Freeman.


McCarthy, J., & Hayes, P. J. (1969). Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence, 4, 463–502.
Newell, A. (1980). Physical symbol systems. Cognitive Science, 4, 135–183.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Newell, A., & Simon, H. A. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19, 113–126.
Noble, J. (2000). Talk is cheap: evolved strategies for communication and action in asymmetrical animal contests. In Meyer, J.-A., Berthoz, A., Floreano, D., Roitblat, H., & Wilson, S. W. (Eds.), From animals to animats, Proceedings of the sixth international conference on simulation of adaptive behavior, vol. 6. Cambridge, MA: MIT Press, pp. 481–490.
Nolfi, S., & Floreano, F. (2000). Evolutionary robotics: the biology, intelligence, and technology of self-organizing machines. Cambridge, MA: MIT Press.
Ogden, C. K., & Richards, I. A. (1923). The meaning of meaning: a study of the influence of language upon thought and of the science of symbolism. London: Routledge & Kegan Paul.
Oliphant, M. (1999). The learning barrier: Moving from innate to learned systems of communication. Adaptive Behavior, 7(3–4), 371–384.
Oudeyer, P.-Y. (2001). The origins of syllable systems: an operational model. In Proceedings of the international conference on cognitive science, COGSCI'2001, Edinburgh.
Peirce, C. S. (1931–1958). Collected papers, vol. I–VIII. Cambridge, MA: Harvard University Press.
Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. MIT Press.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–789.
Prem, E. (1995). Symbol grounding and transcendental logic. In Niklasson, L. F., & Bodén, M. B. (Eds.), Current trends in connectionism, Proceedings of the Swedish conference on connectionism. Hillsdale, NJ: Lawrence Erlbaum, pp. 271–282.
Prigogine, I., & Strengers, I. (1984). Order out of chaos. New York: Bantam Books.
Pylyshyn, Z. W. (Ed.), (1987). The robot's dilemma. New Jersey: Ablex Press.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In Kelemen, J., & Sosík, P. (Eds.), Proceedings of the 6th European conference on artificial life, ECAL 2001, LNAI 2159. Berlin: Springer, pp. 357–366.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Roy, D. (2000). A computational model of word learning from multimodal sensory input. In Proceedings of the international conference of cognitive modeling, Groningen, The Netherlands.
Searle, J. R. (1980). Minds, brains and programs. Behavioral and Brain Sciences, 3, 417–457.
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423; 623–656.
Smith, A. D. M. (2001). Establishing communication systems without explicit meaning transmission. In Kelemen, J., & Sosík, P. (Eds.), Proceedings of the 6th European conference on artificial life, ECAL 2001, LNAI 2159. Berlin: Springer, pp. 381–390.
Steels, L. (1996a). Discovering the competitors. Adaptive Behavior, 4(2), 173–199.
Steels, L. (1996b). Emergent adaptive lexicons. In Maes, P. (Ed.), From animals to animats, Proceedings of the fourth international conference on simulating adaptive behavior, vol. 4. Cambridge, MA: MIT Press.
Steels, L. (1996c). Perceptually grounded meaning creation. In Tokoro, M. (Ed.), Proceedings of the international conference on multi-agent systems. Menlo Park, CA: AAAI Press.
Steels, L. (1997a). Synthesising the origins of language and meaning using co-evolution, self-organisation and level formation. In Hurford, J., Knight, C., & Studdert-Kennedy, M. (Eds.), Approaches to the evolution of language. Cambridge: Cambridge University Press.
Steels, L. (1997b). The synthetic modeling of language origins. Evolution of Communication, 1(1), 1–34.
Steels, L., & Brooks, R. (Eds.), (1995). The 'artificial life' route to 'artificial intelligence'. Building situated embodied agents. New Haven, CT: Lawrence Erlbaum.
Steels, L., & Kaplan, F. (1998). Stochasticity as a source of innovation in language games. In Proceedings of Alive VI.
Steels, L., & Kaplan, F. (1999). Situated grounded word semantics. In Proceedings of IJCAI 99. San Mateo, CA: Morgan Kaufmann.
Steels, L., & McIntyre, A. (1999). Spatially distributed naming games. Advances in Complex Systems, 1(4), 1.
Steels, L., & Vogt, P. (1997). Grounding adaptive language games in robotic agents. In Husbands, C., & Harvey, I. (Eds.), Proceedings of the fourth European conference on artificial life. Cambridge, MA: MIT Press.
Sugita, Y., & Tani, J. (2000). A connectionist model which unifies the behavioral and the linguistic processes: Results from robot learning experiments, Technical report SCSL-TR-00-001. Sony CSL.
Sun, R. (2000). Symbol grounding: A new look at an old idea. Philosophical Psychology, 13(2), 149–172.
Tomasello, M., & Barton, M. (1994). Learning words in nonostensive contexts. Developmental Psychology, 30(5), 639–650.
Van Looveren, J. (1999). Multiple word naming games. In Postma, E., & Gyssens, M. (Eds.), Proceedings of the eleventh Belgium–Netherlands conference on artificial intelligence. University of Maastricht.
Vogt, P. (1998a). The evolution of a lexicon and meaning in robotic agents through self-organization. In La Poutré, H., & van den Herik, J. (Eds.), Proceedings of the Netherlands–Belgium artificial intelligence conference, Amsterdam. Amsterdam: CWI.
Vogt, P. (1998b). Perceptual grounding in robots. In Birk, A., & Demiris, J. (Eds.), Learning robots: proceedings of the EWLR-6, Lecture notes on artificial intelligence 1545. Berlin: Springer.


Vogt, P. (2000a). Grounding language about actions: Mobile robots playing follow me games. In Meyer, J.-A., Berthoz, A., Floreano, D., Roitblat, H., & Wilson, S. (Eds.), SAB2000 proceedings supplement book. Honolulu: International Society for Adaptive Behavior.
Vogt, P. (2000b). Lexicon grounding on mobile robots. PhD thesis, Vrije Universiteit, Brussels.
Vogt, P. (2001). The impact of non-verbal communication on lexicon formation. In Proceedings of the Belgian/Netherlands artificial intelligence conference, BNAIC'01.
Whorf, B. L. (1956). Language, thought, and reality. Cambridge, MA: MIT Press.
Wittgenstein, L. (1958). Philosophical investigations. Oxford: Blackwell.
Yanco, H., & Stein, L. (1993). An adaptive communication protocol for cooperating mobile robots. In Meyer, J.-A., Roitblat, H. L., & Wilson, S. (Eds.), From animals to animats, Proceedings of the second international conference on simulation of adaptive behavior, vol. 2. Cambridge, MA: MIT Press, pp. 478–485.
Ziemke, T. (1999). Rethinking grounding. In Riegler, A., Peschl, M., & von Stein, A. (Eds.), Understanding representation in the cognitive sciences: does representation need reality. New York: Plenum Press.
Ziemke, T., & Sharkey, N. E. (2001). A stroll through the worlds of robots and animals: Applying Jakob von Uexküll's theory of meaning to adaptive robots and artificial life. Semiotica, 134(1–4), 701–746.