Spatial Language for Mobile Robots:
The Formation and Generative Grounding of Toponyms
Ruth Jennifer Schulz
B.E. (Hons), B.Sc.
A thesis submitted for the degree of Doctor of Philosophy at
The University of Queensland in November 2008
School of Information Technology and Electrical Engineering
Declaration
This thesis is composed of my original work, and contains no material previously published or
written by another person except where due reference has been made in the text. I have clearly
stated the contribution by others to jointly-authored works that I have included in my thesis.
I have clearly stated the contribution of others to my thesis as a whole, including statistical
assistance, survey design, data analysis, significant technical procedures, professional editorial
advice, and any other original research work used or reported in my thesis. The content of my thesis
is the result of work I have carried out since the commencement of my research higher degree
candidature and does not include a substantial part of work that has been submitted to qualify for
the award of any other degree or diploma in any university or other tertiary institution. I have
clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.
I acknowledge that an electronic copy of my thesis must be lodged with the University Library
and, subject to the General Award Rules of The University of Queensland, immediately made
available for research and study in accordance with the Copyright Act 1968.
I acknowledge that copyright of all material contained in my thesis resides with the copyright
holder(s) of that material.
Statement of Contributions to Jointly Authored Works Contained in the Thesis and Published
Works by the Author Incorporated into the Thesis
1. Schulz, R., Prasser, D., Stockwell, P., Wyeth, G., & Wiles, J. (2008). The formation,
generative power, and evolution of toponyms: Grounding a spatial vocabulary in a cognitive
map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore:
World Scientific Press.
RS was responsible for designing and conducting the three studies and for
the majority of the writing; DP and PS contributed to design discussions;
GW and JW lead the RatSLAM and RatChat projects on which this work is
based; all authors contributed to editing.
Incorporated with more detail as Study 1 in Chapter 6 and Study 2 in
Chapter 7
2. Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial
concepts from RatSLAM representations. Robotics and Autonomous Systems - From
Sensors to Human Spatial Concepts, 55(5), 403-410.
MM and DP were responsible for the pose cell and experience map work;
RS was responsible for the conceptualisation work; GW and JW lead the
RatSLAM and RatChat projects on which this work is based; MM was
responsible for updating the workshop paper (jointly authored work 3) to
this paper with assistance from all authors.
Incorporated as Pilot Study 2B in Chapter 5
3. Schulz, R., Prasser, D., Wakabayashi, M., & Wiles, J. (2007). Robots and the evolution of
spatial language. Unrefereed poster presentation at the 8th Asia-Pacific Complex Systems
Conference (Complex07).
RS was responsible for designing and conducting the studies and for the
majority of the writing; DP and MW contributed to design discussions; MW
provided software development for one of the studies; JW leads the RatChat
project on which this work is based; all authors contributed to editing.
Incorporated as part of Study 1B in Chapter 6
4. Schulz, R., Milford, M., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial
concepts from RatSLAM representations. Paper presented at From Sensors to Human
Spatial Concepts, a workshop at the International Conference on Intelligent Robots and
Systems, Beijing, China.
MM and DP were responsible for the pose cell and experience map work;
RS was responsible for the conceptualisation work; GW and JW lead the
RatSLAM and RatChat projects on which this work is based; all authors
contributed to the writing.
Incorporated as Pilot Study 2B in Chapter 5
5. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Generalization in languages
evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L.
Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International
Conference on the Simulation and Synthesis of Living Systems (pp. 486-492). Cambridge,
Massachusetts: MIT Press.
RS was responsible for designing and conducting the studies and for the
majority of the writing; PS and MW contributed to design discussions; JW
leads the RatChat project on which this work is based; all authors
contributed to editing.
Incorporated as part of Pilot Study 2A in Chapter 5
6. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Towards a spatial language
for mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of
Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298).
Singapore: World Scientific Press.
RS was responsible for designing and conducting the studies and for the
majority of the writing; PS and MW contributed to design discussions; JW
leads the RatChat project on which this work is based; all authors
contributed to editing.
Incorporated as part of Pilot Study 2A in Chapter 5
Statement of Contributions by Others to the Thesis as a Whole
Janet Wiles and Gordon Wyeth put together the initial proposal for funding that resulted in the
conception of the RatChat project as a whole, of which this thesis was a part.
The RatSLAM team provided the robot base, including the Pioneer robots, the pose cell and
experience mapping algorithms, and the simulation world of the robots. The team provided
assistance throughout this project regarding the functions of RatSLAM and keeping the robots
functional. During the course of this thesis the RatSLAM team included: Gordon Wyeth, Michael
Milford, David Prasser, and Shervin Emami.
The RatChat team provided a forum for discussing the progress of the project. During the course
of this thesis the RatChat team included: Janet Wiles, David Prasser, Paul Stockwell, Mark
Wakabayashi, Steven Livingston, Jacinta Fitzgerald, and Andrew Schrauf.
Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree
None
Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it
None
Keywords
language, representation, game, robot, agent, spatial, concepts, words, grounding, generative
Australian and New Zealand Standard Research Classifications (ANZSRC)
080101 Adaptive Agents and Intelligent Robotics 100%
Abstract
For robots to interact with each other and with humans in a human environment, it is important for them
to be able to use language meaningfully in practical applications. Grounding connects words and
sentences with their meanings and is a necessary foundation for the meaningful usage of language.
Combining simple concepts provides a way to label other simple concepts. The process of forming
a simple concept from a combination of concepts is termed generative grounding in this thesis.
To understand how language may be used meaningfully in practical applications, the nature of
language and the concepts on which language is built must be understood. Concepts of space and
time are among those that are directly experienced and directly grounded. In particular, space is
used to form other concepts, with spatial metaphors describing mood, energy, emotion,
personal attributes, and time.
This thesis addresses the question of how robots can form spatial languages. The literature
review in Chapters 2 and 3 covers the diverse fields on which this thesis builds: linguistics,
computer science, psychology, neurology, and robotics. It explores the language
features that need to be addressed if language evolution and acquisition are to be understood, and
the features of space and spatial language that make the domain appropriate for investigating the
grounding and generative grounding of concepts.
The studies presented investigated grounding spatial language in mobile robots that had
constructed maps to represent their world. The pilot studies investigated the underlying spatial
representations and methods to produce and comprehend language. The experience map, which is
similar to a cognitive map of the world, proved the most appropriate representation
for forming a spatial language.
A key question is what impact interactions between agents have on the languages that form. For the two
major studies described in this thesis, agents interacted through language games and the experience
map provided the base representation for concept formation. A new way to associate concept
elements and words was developed: the distributed lexicon table. Each study had three sections: an
investigation into the experimental design of the language games in a simulation world based on a
grid, the implementation of the language games in simulated robots, and the implementation of
language games in the richer and less predictable real world.
The first study investigated spatial concept formation through collective experience and agent
interactions as they explored their environment and played ‘where are we’ games. Games were
played when the agents were within hearing distance, with shared attention defined by being
near each other. After playing many language games, the agents successfully formed a toponymic
language that labelled all visited locations.
The second study addressed the question of whether robots can form concepts for spatial
relationships. Agents played ‘where is there’ language games in addition to ‘where are we’
language games. In ‘where is there’ games, they referred to locations other than the current location.
Shared attention was established by being near each other, and a shared perspective was achieved when an
orientation location was named. A third target location was specified by name, direction, and
distance. Agents formed a comprehensive spatial language of directions and distances that were
combined to specify other locations in their world.
In summary, a computational language model for mobile robots was successfully developed, in
which the robots formed spatial concepts that were associated with words through interactions with
other agents. The features that facilitated spatial concept formation included an appropriate concept
representation, the distributed lexicon table with methods to produce and comprehend words, and
simple interactions from which the language emerged. With the addition of generative interactions,
the agents extended languages in which known locations were labelled into languages in which
external locations were also labelled. The result was robots that formed nouns (place names) and simple
prepositions (direction and distance terms).
The major conclusions of this thesis are that generative grounding for spatial concepts is
possible and that representations, methods, and social interactions influence the languages that
form. The meaningful usage of language in practical applications therefore requires appropriate
representations, interactions, and methods for grounding. This thesis has shown that it is
interactions building on innate abilities, rather than the directly perceivable world, that shape
the final structure of spatial languages.
List of Publications
Publications related to this project:
1. Schulz, R., Prasser, D., Stockwell, P., Wyeth, G., & Wiles, J. (2008). The formation,
generative power, and evolution of toponyms: Grounding a spatial vocabulary in a cognitive
map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore:
World Scientific Press.
2. Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial
concepts from RatSLAM representations. Robotics and Autonomous Systems - From
Sensors to Human Spatial Concepts, 55(5), 403-410.
3. Schulz, R., Milford, M., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial
concepts from RatSLAM representations. Paper presented at From Sensors to Human Spatial
Concepts, a workshop at the International Conference on Intelligent Robots and Systems,
Beijing, China.
4. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Generalization in languages
evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L.
Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International
Conference on the Simulation and Synthesis of Living Systems (pp. 486-492). Cambridge, Massachusetts: MIT Press.
5. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Towards a spatial language
for mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of
Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298).
Singapore: World Scientific Press.
Other publications:
6. Wiles, J., Schulz, R., Hallinan, J., Bolland, S., & Tonkes, B. (2001). Probing the persistent
question marks. In L. Spector, E. Goodman, A. Wu, W. B. Langdon, H.-M. Voigt, M. Gen,
S. Sen, M. Dorigo, S. Pezesk, M. Garzon & E. Burke (Eds.), Proceedings of the Genetic and
Evolutionary Computation Conference (GECCO-2001) (pp. 710-717). San Francisco, CA:
Morgan Kaufmann Publishers.
7. Wiles, J., Schulz, R., Bolland, S., Tonkes, B., & Hallinan, J. (2001). Selection procedures
for module discovery: Exploring evolutionary algorithms for cognitive science. In J. D.
Moore & K. Stenning (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive
Science Society (CogSci 2001) (pp. 1124-1129). Mahwah, NJ: Lawrence Erlbaum
Associates.
Acknowledgements
Firstly, thanks to my supervisor, Janet Wiles, for her encouragement, ideas, knowledge, and support.
We have had numerous meetings over the past three and a half years, from which I often left feeling
somewhat overwhelmed with information and ideas, but always more on track.
Thanks to my associate supervisor, Gordon Wyeth, for many suggestions that have helped to
improve both the studies and the final document, and for helping me to find the story in my thesis.
Thanks to the RatChat team, including Janet Wiles, David Prasser, Paul Stockwell, Mark
Wakabayashi, Steven Livingston, Jacinta Fitzgerald, and Andrew Schrauf, for providing a forum to
discuss the progress of the project; to the RatSLAM team, including Gordon Wyeth, Michael
Milford, David Prasser, and Shervin Emami, for providing the robot base for my experiments and
keeping the robots working; and to the Thinking Systems group for fascinating diversions from my
work and an environment for interesting discussions that were (mostly) relevant to my thesis.
Thanks to the people who were on level 5 of the Axon building, including the RatSLAM and
RatChat teams, Toby Smith, Chris Nolan, Peter Stratton, Daniel Angus, Damien Kee, David Ball,
Daniel Bradley, John Hawkins, James Watson, Nic Geard, Kai Willadsen, Stefan Maetschke, Jon
Witty, Mikael Boden, and Marcus Gallagher, for an enjoyable working environment, entertaining
lunch time discussions, and putting up with the beeping robots for the duration of my experiments.
Thanks to everyone who read drafts of this thesis and earlier papers. Your comments were
valuable, insightful, and have helped to improve this document in many ways.
Thanks to the ITIG for handling all of my technical requests quickly and competently.
Thanks to the School of Information Technology and Electrical Engineering at The University
of Queensland for the provision of computing facilities, office space, and an area in which the
robots could roam; to the Australian Government for support in the form of an Australian
Postgraduate Award; to the Australian Research Council Complex Open Systems Research
Network (COSNet) for funding which allowed me to travel internationally to a conference; and to
the Australian Research Council for the Discovery Grant which funded a top-up in my first year and
allowed me to travel internationally to a conference.
Thanks to everyone in Bai Rui Taekwon-Do, particularly Master Charles Birch, for providing an
environment where I could get away from my thesis for a while and come back feeling positive and
refreshed.
Finally, thanks to my family for always being there for me (even when they were all gallivanting
around overseas). And most importantly, thanks to David Ball for putting up with me, encouraging
me, supporting me, and helping me to maintain a positive outlook on life.
Table of Contents
Declaration ......................................................................................................................................i
Abstract ..........................................................................................................................................v
List of Publications.......................................................................................................................vii
Acknowledgements .......................................................................................................................ix
Table of Contents ..........................................................................................................................xi
List of Figures .............................................................................................................................xiv
List of Tables..............................................................................................................................xvii
Chapter 1 Grounding Spatial Concepts ....................................................................................1
1.1 Learning and Evolving Language ....................................................................................2
1.2 Space in Language ...........................................................................................................3
1.3 Understanding Space and Language ................................................................................4
1.4 Thesis Overview ..............................................................................................................5
Chapter 2 Understanding Language .........................................................................................7
2.1 The Importance of Grounding..........................................................................................7
2.2 Embodiment for Language Models..................................................................................9
2.3 Learning Language ........................................................................................................10
2.4 How Could Language Have Evolved?...........................................................................12
2.5 Translating Meanings and Signals .................................................................................13
2.6 Summary ........................................................................................................................15
Chapter 3 The Ubiquity of Space ...........................................................................................17
3.1 Talking About Where.....................................................................................................17
3.2 How Space Becomes Place ............................................................................................18
3.3 Describing Relationships Between Places .....................................................................19
3.4 Choosing Which Perspective To Use.............................................................................20
3.5 Universals Across Different Languages ........................................................................21
3.6 Spatial Language Models...............................................................................................21
3.7 Representing Space ........................................................................................................22
3.7.1 Cognitive Maps In Rats..........................................................................................23
3.7.2 Maps for Mobile Robots ........................................................................................23
3.7.3 RatSLAM...............................................................................................................24
3.8 RatChat...........................................................................................................................30
3.9 Summary ........................................................................................................................30
Chapter 4 A Location Language Game ..................................................................................31
4.1 Concept Representations................................................................................................34
4.2 Word Representations ....................................................................................................35
4.3 Lexicon...........................................................................................................................35
4.3.1 Simple Neural Networks........................................................................................36
4.3.2 Recurrent Neural Networks ...................................................................................38
4.3.3 Lexicon Table ........................................................................................................40
4.3.4 Distributed Lexicon Table .....................................................................................42
4.4 Population Dynamics .....................................................................................................47
4.5 Environment...................................................................................................................48
4.5.1 Grid World .............................................................................................................48
4.5.2 Simulation World...................................................................................................49
4.5.3 Real World .............................................................................................................49
4.6 Performance Measures ...................................................................................................51
4.6.1 Coherence...............................................................................................................52
4.6.2 Specificity ..............................................................................................................52
4.6.3 Language Size ........................................................................................................53
4.6.4 Word Coverage ......................................................................................................53
4.6.5 Language Layout....................................................................................................53
4.6.6 Word Locations ......................................................................................................53
4.6.7 Most Information Templates..................................................................................53
4.6.8 Toponym Value......................................................................................................54
4.7 Summary ........................................................................................................................54
Chapter 5 Experimental Design..............................................................................................55
5.1 Pilot Study 1: Methods – Recurrent Neural Networks and Lexicon Tables ..................55
5.1.1 Pilot Study 1A: Recurrent Neural Networks..........................................................56
5.1.2 Pilot Study 1B: Lexicon Tables .............................................................................64
5.1.3 Discussion for Pilot Study 1...................................................................................69
5.2 Pilot Study 2: Representations – Pose Cells, Vision, and Experiences .........................70
5.2.1 Pilot Study 2A – Pose Cells and Vision.................................................................70
5.2.2 Pilot Study 2B: Pose Cells and Experiences ..........................................................82
5.2.3 Discussion for Pilot Study 2...................................................................................89
5.3 Discussion: Representations Matter...............................................................................89
Chapter 6 A Toponymic Language Game..............................................................................91
6.1 Study 1A: Grid World....................................................................................................93
6.1.1 Experimental Setup ................................................................................................93
6.1.2 Results....................................................................................................................94
6.1.3 Discussion ............................................................................................................100
6.2 Study 1B: Simulation World........................................................................................101
6.2.1 Experimental Setup ..............................................................................................102
6.2.2 Results..................................................................................................................103
6.2.3 Discussion ............................................................................................................115
6.3 Study 1C: Real World ..................................................................................................116
6.3.1 Experimental Setup ..............................................................................................117
6.3.2 Results..................................................................................................................118
6.3.3 Discussion ............................................................................................................123
6.4 Discussion: A Toponymic Language...........................................................................124
Chapter 7 A Generative Spatial Language Game.................................................................127
7.1 Study 2A: Grid World..................................................................................................129
7.1.1 Study 2Ai: World Size .........................................................................................130
7.1.2 Study 2Aii: Obstacles...........................................................................................135
7.1.3 Study 2Aiii: Generations of Agents .....................................................................138
7.1.4 Discussion ............................................................................................................142
7.2 Study 2B: Simulation World........................................................................................142
7.2.1 Experimental Setup ..............................................................................................143
7.2.2 Results..................................................................................................................144
7.2.3 Discussion ............................................................................................................146
7.3 Study 2C: Real World ..................................................................................................153
7.3.1 Experimental Setup ..............................................................................................153
7.3.2 Results..................................................................................................................153
7.3.3 Discussion ............................................................................................................158
7.4 Discussion: A Generative Spatial Language................................................................158
Chapter 8 General Discussion ..............................................................................................159
8.1 Summary ......................................................................................................................159
8.2 Discussion ....................................................................................................................162
8.3 Contributions................................................................................................................163
8.4 Conclusions and Further Work ....................................................................................169
References ..................................................................................................................................171
List of Figures
Figure 2.1 Semiotic square...........................................................................................................11
Figure 2.2 Language transmission................................................................................................13
Figure 3.1 Preposition classification ............................................................................................19
Figure 3.2 Robot used in the RatSLAM and RatChat projects ....................................................25
Figure 3.3 Map of the real and simulated world ..........................................................................25
Figure 3.4 Visual input.................................................................................................................26
Figure 3.5 Local view cells ..........................................................................................................27
Figure 3.6 Pose cells.....................................................................................................................28
Figure 3.7 Experiences .................................................................................................................29
Figure 4.1 A location language game...........................................................................................33
Figure 4.2 Concept types ..............................................................................................................35
Figure 4.3 Simple neural network ................................................................................................36
Figure 4.4 Recurrent neural network............................................................................................38
Figure 4.5 Distributed lexicon table .............................................................................................42
Figure 4.6 Information value........................................................................................................43
Figure 4.7 Neighbourhood information value..............................................................................44
Figure 4.8 Relative neighbourhood information value ................................................................45
Figure 4.9 Grid world...................................................................................................................48
Figure 4.10 Simulation world with path of robot.........................................................................49
Figure 4.11 Map of the real world................................................................................................50
Figure 4.12 Language game utterances........................................................................................51
Figure 5.1 Word production and concept comprehension networks (Pilot 1Ai)..........................58
Figure 5.2 Word representation (Pilot 1Ai)..................................................................................59
Figure 5.3 Concept representations (Pilot 1Aii)...........................................................................61
Figure 5.4 Word representation (Pilot 1Aii) ................................................................................61
Figure 5.5 Training networks on evolved languages (Pilot 1Aii) ................................................63
Figure 5.6 Typical runs (Pilot 1Bi) ..............................................................................................66
Figure 5.7 Word creation and absorption results (Pilot 1Bii) ......................................................68
Figure 5.8 Robot route and target concepts (Pilot 2Ai) ...............................................................72
Figure 5.9 Production network (Pilot 2Ai)...................................................................................72
Figure 5.10 Language layout (Pilot 2Ai)......................................................................................73
Figure 5.11 Production and comprehension networks (Pilot 2Aii)..............................................75
Figure 5.12 Vision prototype and scenes (Pilot 2Aii) ..................................................................76
Figure 5.13 Scenes and their location (Pilot 2Aiii) ......................................................................78
Figure 5.14 Pose cell map and location of pose cell patterns (Pilot 2Aiii) ..................................79
Figure 5.15 Word production network (Pilot 2Bi) .......................................................................84
Figure 5.16 Floor plan, pose cells, and experience map (Pilot 2Bi) ............................................85
Figure 5.17 Conceptualisation using pose cells (Pilot 2Bi) .........................................................87
Figure 5.18 Conceptualisation using experiences (Pilot 2Bi) ......................................................88
Figure 6.1 Hearing area ................................................................................................................94
Figure 6.2 Coherence (1A: Basic) ................................................................................................95
Figure 6.3 Words used (1A: Basic) ..............................................................................................96
Figure 6.4 Specificity (1A: Basic)................................................................................................96
Figure 6.5 Shared language (1A: Basic) ......................................................................................97
Figure 6.6 Coherence (1A: Best)..................................................................................................98
Figure 6.7 Words used (1A: Best)................................................................................................99
Figure 6.8 Specificity (1A: Best) .................................................................................................99
Figure 6.9 Shared language (1A: Best) ......................................................................................100
Figure 6.10 Simulation World....................................................................................................101
Figure 6.11 Results of ‘go to’ games (1B) .................................................................................105
Figure 6.12 Shared language (1B: low temperature) .................................................................106
Figure 6.13 Interactions (1B: low temperature) .........................................................................107
Figure 6.14 Shared language (1B: medium temperature) ..........................................................108
Figure 6.15 Interactions (1B: medium temperature) ..................................................................109
Figure 6.16 Shared language (1B: high temperature) ................................................................110
Figure 6.17 Interactions (1B: high temperature) ........................................................................111
Figure 6.18 Shared language (1B: small neighbourhood)..........................................................112
Figure 6.19 Interactions (1B: small neighbourhood) .................................................................113
Figure 6.20 Shared language (1B: large neighbourhood) ..........................................................114
Figure 6.21 Interactions (1B: large neighbourhood) ..................................................................115
Figure 6.22 Real world...............................................................................................................116
Figure 6.23 Results of ‘go to’ games (1C) .................................................................................118
Figure 6.24 Interactions (1C: minimal)......................................................................................119
Figure 6.25 Shared language (1C: minimal) ..............................................................................120
Figure 6.26 Interactions (1C: checksum) ...................................................................................121
Figure 6.27 Shared language (1C: checksum)............................................................................122
Figure 6.28 Real world with human labels.................................................................................123
Figure 7.1 A generative language game.....................................................................................127
Figure 7.2 Match between templates..........................................................................................128
Figure 7.3 World size results (2Ai) ..........................................................................131
Figure 7.4 World size coherence (2Ai) ......................................................................................132
Figure 7.5 Example language (2Ai: 5 × 5) ................................................................................133
Figure 7.6 Example language (2Ai: 10 × 10) ............................................................................133
Figure 7.7 Example language (2Ai: 15 × 15) ............................................................................134
Figure 7.8 Example language (2Ai: 20 × 20) ............................................................................134
Figure 7.9 Grid world with obstacles .........................................................................................135
Figure 7.10 Obstacles results (2Aii)...........................................................................................136
Figure 7.11 Obstacles coherence (2Aii) .....................................................................................137
Figure 7.12 Example language (2Aii: Desks) ............................................................................137
Figure 7.13 Example language (2Aii: Perimeter) ......................................................................138
Figure 7.14 Toponym change throughout generations (2Aiii)...................................................139
Figure 7.15 Generations results (2Aiii) ......................................................................................140
Figure 7.16 Generations coherence (2Aiii) ................................................................................141
Figure 7.17 Results of ‘go to’ games (2B) .................................................................................145
Figure 7.18 Shared language (2B: Separate)..............................................................................147
Figure 7.19 Spatial lexicon (2B: Separate) ................................................................................148
Figure 7.20 Interactions for ‘where are we’ games (2B: Separate)............................................148
Figure 7.21 Interactions for ‘where is there’ games (2B: Separate) ..........................................149
Figure 7.22 Shared language (2B: Together) .............................................................................150
Figure 7.23 Spatial lexicon (2B: Together)................................................................................151
Figure 7.24 Interactions for ‘where are we’ games (2B: Together) ...........................................151
Figure 7.25 Interactions for ‘where is there’ games (2B: Together)..........................................152
Figure 7.26 Results of the ‘go to’ games (2C) ...........................................................................155
Figure 7.27 Interactions for ‘where are we’ games (2C) ...........................................................155
Figure 7.28 Example language (2C)...........................................................................................156
Figure 7.29 Spatial lexicon (2C) ................................................................................................157
Figure 7.30 Interactions for ‘where is there’ games (2C) ..........................................................157
List of Tables
Table 4.1 Parameters for a location language game.....................................................54
Table 5.1 Parameters for Pilot Study 1Ai.....................................................................................57
Table 5.2 Source of variability (Pilot 1Ai)...................................................................................59
Table 5.3 Word production (Pilot 1Ai) ........................................................................................60
Table 5.4 Concept comprehension (Pilot 1Ai).............................................................................60
Table 5.5 Parameters for Pilot Study 1Aii ...................................................................................62
Table 5.6 Generations to expressive languages (Pilot 1Aii) ........................................................62
Table 5.7 Parameters for Pilot Study 1Bi.....................................................................................65
Table 5.8 Results for different strategies (Pilot 1Bi)....................................................................65
Table 5.9 Parameters for Pilot Study 1Bii....................................................................................67
Table 5.10 Parameters for Pilot Study 2Ai...................................................................................71
Table 5.11 Parameters for Pilot Study 2Aii .................................................................................74
Table 5.12 Word production and concept comprehension (Pilot 2Aii) .......................................76
Table 5.13 Parameters for Pilot Study 2Aiii ................................................................................78
Table 5.14 Word production (Pilot 2Aiii) ....................................................................................80
Table 5.15 Patterns close to the prototype (Pilot 2Aiii) ...............................................................81
Table 5.16 Parameters for Pilot Study 2Bi...................................................................................82
Table 5.17 Correctly labelled patterns (Pilot 2Bi) .......................................................................86
Table 6.1 Parameters for Study 1A ..............................................................................................94
Table 6.2 Parameters for Study 1B ............................................................................................103
Table 6.3 Results for Study 1B ..................................................................................................103
Table 6.4 Parameters for Study 1C ............................................................................................117
Table 6.5 Results for Study 1C ..................................................................................................119
Table 7.1 Parameters for Study 2A ............................................................................................130
Table 7.2 Results for Study 2Ai .................................................................................................131
Table 7.3 Results for Study 2Aii ................................................................................................136
Table 7.4 Results for Study 2Aiii ...............................................................................................142
Table 7.5 Parameters for Study 2B ............................................................................................144
Table 7.6 Results for Study 2B ..................................................................................................145
Table 7.7 Parameters for Study 2C ............................................................................................154
Table 7.8 Results for Study 2C ..................................................................................................154
Chapter 1 Grounding Spatial Concepts
Spatial cognition is at the heart of our thinking
(Levinson, 2003a, p.xvii)
For robots to interact with each other and humans in a human environment, it is important for them
to be able to use language meaningfully in practical applications. In human language, many
concepts are grounded directly from experience, particularly from sensory perception. Grounding
connects words and sentences with their meanings and is a necessary foundation for the meaningful
usage of language. Language becomes intrinsically meaningful with grounding.
Combining simple concepts provides a way to label other simple concepts. The process of
forming a simple concept from a combination of concepts is termed generative grounding in this
thesis. Generative grounding allows language to bootstrap from simple words and concepts to a full
vocabulary of complex and abstract concepts.
To understand how language may be used meaningfully in practical applications, the nature of
language and the concepts on which language is built must be understood. Concepts of space and
time are among those that are directly experienced and grounded: “We have a sense of space
because we can move and of time because, as biological beings, we undergo recurrent phases of
tension and ease” (Tuan, 1977, p.118). In particular, space is used to help understand and form
other concepts, with “most of our fundamental concepts … organized in terms of one or more
spatialization metaphors” (Lakoff & Johnson, 1980, p.17). Spatial metaphors are used to describe
many non-spatial qualities including mood, consciousness, health, control, quantity, time, and social
status (Lakoff & Johnson, 1980). Understanding spatial language can improve our understanding of
spatial cognition as there is a link between the spatial concepts in cognition and in language. As
“spatial cognition is at the heart of our thinking” (Levinson, 2003a, p.xvii), understanding spatial
cognition may lead to a greater understanding of human thinking. The formation of simple spatial
concepts in computational models and agents may lead to a greater understanding of human spatial
language.
The symbol grounding problem was first identified by Harnad (1990), when he discussed
limitations in symbolic models of the mind. Unless symbols are grounded in an agent’s
representations, they are meaningless to the agent. To solve the symbol grounding problem the
meanings of symbols must be grounded so that agents can use symbols appropriately.
Harnad’s (1990) suggested solution to the symbol grounding problem was a hybrid
connectionist and symbolic system, in which sensory representations are used to form iconic and
categorical representations that can be linked to symbols with neural networks. Another solution to
the symbol grounding problem is embodiment. The interaction between the autonomous agent and
the world provides a means to ground concepts and representations and enables the agent to
understand and use grounded concepts effectively (Pfeifer & Scheier, 1999). Steels (2007) claims
that the symbol grounding problem has been solved, as studies now incorporate the features
necessary for symbol grounding to be addressed. The necessary features include using an embodied
autonomous agent that forms its own grounded meaning representations and associates symbols
with meanings through interactions with other agents. The solution to the symbol grounding
problem appears to be appropriate embodiment, representations (sensory, meaning, iconic, and
categorical), associations between representations and symbols, and interactions between agents.
Language simulation draws on and extends knowledge about language, including the origins and
evolution of language, concept representations, concept formation, and word acquisition. Spatial
language is interesting as it is directly grounded in experience, and influences how more abstract
concepts are formed and understood through metaphor. The next two sections review the current
research in the domains of language and space.
1.1 Learning and Evolving Language
Simulation can add to the debate on the origins and evolution of language by determining features
that are important for evolving communication systems. Language games are a framework for
language models in which agents engage in tasks requiring communication (for more information
about language games see Steels, 2001). Language games have been used to evolve lexicons
(Hutchins & Hazlehurst, 1995), categories (Cangelosi & Harnad, 2001), and grammars (Batali,
2002) in agent populations.
Addressing the challenge of understanding language evolution involves examining how it has
evolved in humans and how it could evolve in simulated agents. Investigations include brain
structure, culture, linguistics, interactions, and reasons for using language (for more information
about investigations into language evolution see Cangelosi, Smith, & Smith, 2006; and Smith,
Smith, & Ferrer i Cancho, 2008). In language models, agent interactions vary, with agents playing
as negotiators or as teacher and student. The games may involve following commands, interpreting
descriptions, finding resources in the world, or coordinating with other agents (Kirby, 2002; Nolfi,
2005; Steels, 2005; Vogt, 2007; Wagner, Reggia, Uriagereka, & Wilkinson, 2003).
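The core mechanics of such games can be illustrated with a minimal naming-game sketch in the style of Steels' work; the agent design, scores, and update rule here are invented for illustration rather than taken from any particular model in the literature:

```python
import random

class Agent:
    """Minimal naming-game agent: each meaning maps to scored candidate words."""

    def __init__(self):
        self.lexicon = {}  # meaning -> {word: score}

    def speak(self, meaning):
        words = self.lexicon.setdefault(meaning, {})
        if not words:  # no word yet for this meaning: invent one
            words["w%04d" % random.randrange(10000)] = 0.5
        return max(words, key=words.get)

    def hear(self, meaning, word):
        """Adopt or reinforce the heard word; return whether it was already known."""
        words = self.lexicon.setdefault(meaning, {})
        known = word in words
        words[word] = min(1.0, words.get(word, 0.0) + 0.1)
        if known:  # lateral inhibition: a successful word suppresses its rivals
            for rival in list(words):
                if rival != word:
                    words[rival] = max(0.0, words[rival] - 0.1)
        return known

# One game round: the speaker names a shared topic, the hearer updates.
speaker, hearer = Agent(), Agent()
word = speaker.speak("kitchen")         # speaker invents a word for 'kitchen'
hearer.hear("kitchen", word)            # first exposure: the hearer adopts it
assert hearer.speak("kitchen") == word  # the hearer now prefers that word
```

Repeated over a population with randomly paired speakers and hearers, this reinforce-and-inhibit dynamic drives the agents toward a shared lexicon; convergence of this kind is what measures such as coherence assess.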
Language ties internal meanings to external signals. One question to be answered is whether
concepts are innate, or whether language allows concepts to form. The Sapir-Whorf hypothesis
(Carroll, 1956) states that language constrains the way we think with concepts firmly linked to the
words associated with them. The Whorfian viewpoint is that speakers of different languages form
different concepts and ways of thinking constrained by the language they speak. In neo-
Whorfianism, language helps construct complex concepts and aids in cognitive development rather
than reflecting underlying innate concepts (Levinson, 2003a).
Language extends beyond simple concepts to complex concepts and metaphor. Language is
generative as it allows the formation of new concepts from existing concepts. New concepts may be
formed using a generative lexicon, with labels obtained either through invention or through the
reapplication of a relevant name from another domain.
1.2 Space in Language
Spatial language extends beyond describing relationships between objects and locations; spatial
metaphors are prevalent in natural language. Spatial concepts vary across languages (Levinson, 2001),
but the basic properties may be the same. Place names are basic spatial concepts from which other
spatial concepts can be formed. Understanding spatial concepts may inform our understanding of
concepts in general, especially through spatial metaphor.
When people describe spatial locations, landmarks are preferred, followed by spatial relations
(Tversky, 2003). In English, spatial relations are generally provided by spatial prepositions, with
directions and distances combined to form spatial terms such as ‘in front of’, ‘near’, and ‘at’. Other
languages and cultures express spatial relations in other ways (Levinson, 1996); for example, the
Mayan language Tzeltal has only one preposition while nouns and verbs provide spatial location
information (P. Brown, 2006).
Several spatial language models have been developed, including studies that involve terms
related to spatial locations (Bodik & Takac, 2003; Steels, 1995). The language games of the studies
involve concepts of direction and distance from the agent to an object in the world, where object
and agent locations are unambiguous and known by all agents.
There is a natural human tendency to assume that a spatial language entails descriptions of
objects at specific locations, or the use of objects as landmarks. However, the brain provides a
basic system for representing space that does not require an understanding of objects (O'Keefe &
Nadel, 1978). As demonstrated in Chapter 6, a set of concepts to describe space and a
corresponding toponymic language do not require knowledge of objects or descriptions of visual
scenes.
The most basic spatial concepts correspond to areas in space and are referred to by labels for
places, such as city or suburb names. Areas within an environment or along a path can often be
described by single words, such as corner, corridor, or intersection, or larger regions such as
kitchen, office, or backyard. In this thesis, names for specific places in an environment are called
toponyms (i.e. topographic names) and a set of such terms to comprehensively describe an agent’s
environment is a toponymic language.
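As an illustration of this definition, a toponymic language for a small environment can be represented as a mapping from toponyms to areas of space; the region bounds and names below are purely hypothetical:

```python
# A toponymic language sketched as a partition of the environment into named
# rectangular regions. The bounds and toponyms here are invented for
# illustration; regions are checked in insertion order.
toponyms = {
    "kitchen":  ((0.0, 0.0), (3.0, 4.0)),   # (x_min, y_min), (x_max, y_max)
    "corridor": ((3.0, 0.0), (9.0, 2.0)),
    "office":   ((3.0, 2.0), (9.0, 4.0)),
}

def name_of(x, y):
    """Return the toponym whose region contains the pose (x, y), if any."""
    for name, ((x0, y0), (x1, y1)) in toponyms.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

assert name_of(1.0, 1.0) == "kitchen"
```

Note that nothing in this representation refers to objects or visual scenes: a toponym names an area of space directly, which is the sense in which the term is used throughout the thesis.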
Language models that investigate language emergence are typically based on vision or simple
perception. For an autonomous agent, the representations that language could be based on include
behaviours and spatial representations. To form spatial representations, agents explore the world
and build up a representation of locations based on their perceptions and motor actions over time. A
spatial representation provides information that cannot be obtained from direct perception.
Mobile robots are currently able to build complex representations of the world, especially in
indoor environments (Milford, Wyeth, & Prasser, 2004; Thrun, 2002). Robot representations of the
world provide an ideal basis for computational models of language, where symbols referring to
location and spatial relations can be grounded in the interactions of the robots with the world.
1.3 Understanding Space and Language
The state of the art relevant to this thesis involves the symbol grounding problem, the ideas about
how concepts and words interact, and spatial cognition models used in robots and agents. To a large
extent, the symbol grounding problem for simple concepts is solved (Steels, 2007). Symbol
grounding requires working with embodied autonomous agents, a mechanism for generating
meanings, internal representations for grounded meanings, the ability to establish and negotiate
symbols, and coordination between members of the population. Regarding spatial concepts in
humans, the current research indicates that words and concepts for space do interact, with the
precise spatial concepts formed relating to the language learned and culture in which the language is
learned (Levinson, 2003a). A variety of spatial cognition models exist, including those that relate to
geographic navigation (Milford, 2008), human language models, where robots are given a method
for forming human spatial concepts (Dobnik, 2006; Skubic et al., 2004), and models with evolved
language, where agents refer to relationships between places in the world (Bodik & Takac, 2003;
Steels, 1995).
Open questions regarding understanding space and language include the grounding of spatial
languages in robots, the formation of a spatial language from a cognitive map representation of the
world, the design of communicative interactions between mobile agents, the design of interactions
to create a language for locations in the world, and determining what is required for the grounding
of concepts that cannot be directly experienced. The key question is: how can a robot form and label
complex concepts in an embodied spatial environment?
1.4 Thesis Overview
This thesis addresses the formation of spatial language in mobile robots using language games. The
overall goal was to ground a computational model of spatial language in mobile robots that could be
used meaningfully in practical applications. Developing the language model involved determining
the features that made the language easy for the robots to negotiate and learn, particularly agent
features, such as the concept and word representations, and the agent’s society, including how the
agents interact with each other to learn or negotiate a language. The specific aims were to run a
series of experiments that demonstrated language learning and formation in agents and robots. The
series of experiments involved an investigation into representations and methodology, an
investigation into the impact of interactions on spatial languages, and an investigation into going
beyond shared attention for ‘here’ to talking about ‘there’. In this project, robots played language
games with the concepts grounded in spatial representations. The concepts were those obtained
directly from the robot representations: locations and relationships between locations.
The main contributions of this thesis are:
• a series of studies to demonstrate that representations and methods matter,
• the development of a method for concept formation with a distributed representation,
• the development of a method for producing the word that provides the most information
about the chosen topic,
• the formation and grounding of spatial concepts based on a cognitive map representation,
• grounding locations: the design of language game interactions between mobile robots
that enable the formation and grounding of location concepts, and
• generative grounding: the design of generative interactions that enable agents to ground
concepts that are not directly experienced.
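As a hedged sketch of the word-production idea in the third contribution (the distributed lexicon and the exact information measure are defined in Chapter 4; the concepts, words, and association scores below are invented for illustration):

```python
# Distributed lexicon table: association scores between concepts and words.
# A speaker picks the word whose association mass is most concentrated on the
# chosen topic, i.e. the word that provides the most information about it.
lexicon = {
    # concept:  {word: association score}
    "kitchen":  {"bula": 6.0, "rona": 1.0},
    "corridor": {"rona": 5.0, "bula": 2.0},
}

def produce(topic):
    """Choose the word w maximising the share of w's score held by the topic."""
    def info(word):
        total = sum(lexicon[c].get(word, 0.0) for c in lexicon)
        return lexicon[topic].get(word, 0.0) / total if total else 0.0
    words = {w for scores in lexicon.values() for w in scores}
    return max(words, key=info)

# 'bula' is weakly associated with 'corridor' too, but most of its score sits
# on 'kitchen' (6/8), so it is the most informative word for that topic.
assert produce("kitchen") == "bula"
```

The point of scoring words by their information about the topic, rather than by raw association strength, is that a frequent word spread across many concepts tells a hearer little; Chapter 4 develops the measure actually used.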
Chapter 2 presents a review of the literature for grounding and language. Chapter 3 presents a
review of the literature for space and spatial language. Chapter 4 is a description of the
experimental design for the studies in this thesis. Chapter 5 presents initial studies that investigated
methods for language formation and concept representations. The main studies described in this
thesis involved toponymic and generative language games. The first game involved the formation of
toponyms and is presented in Chapter 6. The second game involved generative grounding with
spatial terms and is presented in Chapter 7. Chapter 8 has a general discussion, conclusions of the
thesis, and possible future work.
Chapter 2 Understanding Language
The language is perpetually in flux: it is a living stream, shifting, changing,
receiving new strength from a thousand tributaries, losing old forms in the
backwaters of time.
(Strunk & White, 2000, p.83)
Understanding the nature of language and how language can be used meaningfully in practical
applications has many different facets. Traditional linguistics deals with finding rules and ways of
describing language exactly. Modern linguistics addresses modern languages, and focuses on
linguistic form with respect to phonetics, phonology, syntax, semantics, and pragmatics (Hurford,
2007). Language evolution has, until recently, been dismissed by linguists as not being part of their
area, or as a task that needs to wait until language has been described in sufficient detail (Bickerton,
2003; Newmeyer, 2003). However, language evolution has now been investigated by researchers
from many fields, including “psycholinguistics, linguistics, psychology, primatology, philosophy,
anthropology, archaeology, biology, neuroscience, neuropsychology, neurophysiology, cognitive
science, and computational linguistics” (Christiansen & Kirby, 2003b, p.2).
Languages continually update with new concepts and words. To completely understand what
language is, ‘messy’ factors need to be considered. Methods such as playing language games are
able to inform about embodiment, learning, culture, evolution, grounding, and vocabulary.
Language models allow various features of agent populations to be investigated. This chapter
provides a review of the research areas relevant for computational models of language using agents.
2.1 The Importance of Grounding
Grounding has been defined as “the processes by which an agent relates beliefs to external physical
objects” (Roy, 2005, p.176), with language grounding referring to “processes specialized for
relating words and speech acts to a language user’s environment via grounded beliefs” (Roy, 2005,
p.176), and the grounding problem being “how to embed an artificial agent into its environment
such that its behaviour, as well as the mechanisms, representations, etc. underlying it, can be
intrinsic and meaningful to the agent itself, rather than dependent on an external designer or
observer” (Ziemke, 1999, p.87).
For humans, grounding involves the connections between meanings as represented in the mind
of the individual and words as agreed on by the population. Each person has a different grounded
semiotic network that matches enough to enable joint action and communication for speakers of the
same language (Steels, 2007).
For artificial agents, the symbol grounding problem (Harnad, 1990) considers how an agent is
connected with the world so that the representations and mechanisms defining behaviour are both
intrinsic and meaningful (Ziemke, 1999). Traditional AI has not been concerned with how symbols
are related to the world (Pfeifer & Scheier, 1999), using symbols to investigate reasoning, problem
solving, and communication (Vogt, 2003). However, robots need the symbols to be related to the
world in order to discover meaning for themselves, without a human interpreter (Pfeifer & Scheier,
1999).
Two examples of the symbol grounding problem are Searle’s (1980) ‘Chinese Room Argument’
and Harnad’s (1990) extension ‘The Chinese / Chinese Dictionary Go-Round’. In Searle’s example,
a computer attempts to pass the Turing Test in Chinese by performing manipulations on the
symbols received and responding with appropriate Chinese symbols. An external Chinese speaking
observer could interpret the computer’s behaviour as understanding Chinese. However, if a person
unable to speak Chinese replaced the computer and performed the same symbol manipulations, they
would not be considered to have comprehended the symbols.
In Harnad’s extension, a Chinese / Chinese Dictionary is used to learn Chinese as a second
language (hard version) or as a first language (impossible version). As symbols are only referred to
by other symbols in the dictionary, there is no way out of the ‘symbol / symbol merry-go-round’ to
ground the meaning of a symbol in something other than another symbol. These examples show
that symbol manipulation cannot be considered cognition due to the lack of intentionality and
comprehension of the agent performing symbol manipulation.
The grounding problem is not restricted to symbols, and as such has been referred to by different
names, including representation grounding, concept grounding, and the internalist trap (Ziemke,
1999). Another term related to symbol grounding is anchoring, the concrete aspect of the symbol
grounding problem (Coradeschi & Saffiotti, 2000). Anchoring is the creation and maintenance of
relationships between symbols and perceptual data (Coradeschi & Saffiotti, 2000; Vogt, 2003),
while symbol grounding also refers to relationships between symbols and abstractions (Vogt, 2003).
One possible solution to the symbol grounding problem is Harnad’s (1990) hybrid connectionist
and symbolic system for connecting symbols to the world. In Harnad’s proposed system, neural
networks are used to link connectionist representations with symbols. The non-symbolic
representations are categorical and iconic representations. Categorical representations, consisting of
the invariant features of objects and events, are used to identify objects and events as belonging to a
category. Iconic representations, consisting of particular features of objects and events, are used to
discriminate between objects and events belonging to the same category. The symbolic
representations are symbol strings describing category membership relationships. A set of
elementary symbols is arbitrarily associated with the iconic and categorical representations,
grounding them in the agent’s representations. Composition of the set of elementary symbols can
occur, resulting in more complex meaning structures.
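Harnad’s proposal can be illustrated with a minimal sketch. All names and feature values below are hypothetical, and the representations are deliberately reduced: categorical representations become tests on invariant features that decide category membership, iconic representations become full feature vectors used to tell apart members of the same category, and the elementary symbols are the arbitrary category names.

```python
# Minimal illustrative sketch of Harnad-style grounding (hypothetical data).
# Categorical representations: invariant features -> category membership.
# Iconic representations: full feature vectors -> within-category discrimination.

CATEGORIES = {
    # elementary symbol -> invariant feature required for membership
    "ball": {"shape": "round"},
    "box":  {"shape": "cuboid"},
}

def categorise(percept):
    """Assign a percept to a category via its invariant features."""
    for symbol, invariants in CATEGORIES.items():
        if all(percept.get(k) == v for k, v in invariants.items()):
            return symbol
    return None

def discriminate(percept_a, percept_b):
    """Iconic comparison: which features differ between two percepts?"""
    keys = set(percept_a) | set(percept_b)
    return [k for k in keys if percept_a.get(k) != percept_b.get(k)]

red_ball = {"shape": "round", "colour": "red"}
blue_ball = {"shape": "round", "colour": "blue"}
print(categorise(red_ball))               # both percepts map to the symbol "ball"
print(discriminate(red_ball, blue_ball))  # but differ iconically in colour
```

Composition of the elementary symbols (e.g. a symbol string asserting that a “ball” is “round”) would then build more complex meaning structures on top of these grounded names.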
Another solution to the symbol grounding problem is the use of situated and embodied agents,
in which symbols are grounded in the sub-symbolic activities and the interaction between an agent
and the world (Sun, 2000). Concepts are formed in relation to agents’ experiences, linked to goals
and actions. When cognition and intelligent behaviour are situated and embodied, behaviour is
grounded in the interaction between the agent and the environment rather than in symbol
manipulation (Ziemke, 1999).
Steels (2007) claims that the symbol grounding problem has been solved, as experiments have
been performed where agent populations coordinate a symbolic system that is grounded in
interactions with each other and the world. According to Steels (2007), the necessary features to
solve the symbol grounding problem are:
• working with embodied autonomous agents,
• a mechanism for the agent to generate meanings,
• a way for agents to internally represent and ground meanings,
• the ability to establish and negotiate symbols that refer to the meanings, and
• coordination between members of the population to allow the semiotic networks to
become sufficiently similar.
More can still be done to understand meaning, conceptualisation, symbolisation, neural
correlates for semiotic networks, and representation making and group dynamics in people. Steels
focuses on ‘groundable symbols’ that can be directly grounded through a perceptual process in
which sensori-motor data can be analysed to determine whether an object fits a concept. Generative
grounding, which may include concepts that cannot be directly grounded in sensori-motor
experience, is still an open question for symbol grounding.
2.2 Embodiment for Language Models
In embodied cognition, the physical and experiential structure of the human body is central to
human cognition (Varela, Thompson, & Rosch, 1991). Embodiment is also part of the solution to
the symbol grounding problem, and has been addressed when implementing language models in
robots. Robots and tasks have been used in robot language research to investigate different concepts
and tasks, perceptual abilities, and natural or artificial language learning or evolution.
Embodied robots have been used to form concepts of colours and shapes (Roy, 2001; Steels,
1999), simple objects (Roy, Hsiao, & Mavridis, 2003; Steels & Kaplan, 2001; Vogt, 2000a), spatial
commands and descriptions (Skubic et al., 2004; Vogt, 2000b), and food or poison (Floreano, Mitri,
Magnenat, & Keller, 2007). The majority of robot tasks have been to form concepts and agree on
labels for those concepts. Some of the robots also had a survival task; for example, to find food and
avoid poison (Floreano et al., 2007).
The robots’ physical implementation has been heads with cameras (Steels, 1999), the SONY
AIBO (Steels & Kaplan, 2001), LEGO vehicles (Vogt, 2000a, 2000b), and custom made robots
such as Toco (Roy, 2001). The perceptual abilities given to the robots have included vision
(Floreano et al., 2007; Steels, 1999), hearing (Roy, 2001), and touch (Roy et al., 2003).
Embodied robots have been taught human terms (Roy, 2001; Roy et al., 2003; Skubic et al.,
2004; Steels & Kaplan, 2001), or have evolved their own languages (Floreano et al., 2007; Vogt,
2000a, 2000b). The word representations have been based on text (Skubic et al., 2004; Steels,
1999), speech (Roy, 2001; Roy et al., 2003; Steels & Kaplan, 2001), or the robots’ perceptual
abilities, such as light sensing (Floreano et al., 2007).
Robot language research can be extended with the use of mobile robots that build up internal
world maps through interactions with a real world environment. The use of mobile autonomous
agents that move in a real environment enables spatial language formation.
2.3 Learning Language
Simulation can add to the debate on the origins and evolution of language by determining features
that are important for evolving communication systems. Language games are an established
framework for language models in which agents engage in tasks requiring communication. They
have been used to evolve lexicons, such as labelling phases of the moon (Hutchins & Hazlehurst,
1995), categories, such as poisonous or edible mushrooms (Cangelosi & Harnad, 2001), and
grammars (Batali, 2002) in populations of agents. In language games, the agents exchange words
referring to their world that may be related to what they perceive, what they are doing, or what
another agent is doing. The aim of language games is for a population of agents to reach a
consensus on terms for concepts in the world to successfully communicate about a task.
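A language game of this kind can be sketched in a few lines. The sketch below is in the spirit of the naming games in the literature, but the update rule (score increments with lateral inhibition of rival words) and all parameters are illustrative choices, not those of any particular study.

```python
import random

random.seed(0)

MEANINGS = ["full-moon", "half-moon", "new-moon"]  # illustrative concepts

class Agent:
    def __init__(self):
        # lexicon: meaning -> {word: association score}
        self.lexicon = {m: {} for m in MEANINGS}

    def word_for(self, meaning):
        entries = self.lexicon[meaning]
        if not entries:  # no word known yet: invent one
            entries["w%04d" % random.randrange(10000)] = 0.5
        return max(entries, key=entries.get)

    def meaning_for(self, word):
        for meaning, entries in self.lexicon.items():
            if word in entries:
                return meaning
        return None

    def reinforce(self, meaning, word):
        entries = self.lexicon[meaning]
        entries[word] = entries.get(word, 0.0) + 0.1
        for rival in [w for w in entries if w != word]:
            entries[rival] -= 0.1          # lateral inhibition of rival words
            if entries[rival] <= 0.0:
                del entries[rival]

def play_game(speaker, hearer):
    meaning = random.choice(MEANINGS)
    word = speaker.word_for(meaning)
    if hearer.meaning_for(word) == meaning:   # communicative success
        speaker.reinforce(meaning, word)
        hearer.reinforce(meaning, word)
        return True
    # failure: the speaker 'points', and the hearer stores the association
    hearer.lexicon[meaning][word] = hearer.lexicon[meaning].get(word, 0.0) + 0.1
    return False

population = [Agent() for _ in range(4)]
for _ in range(2000):
    speaker, hearer = random.sample(population, 2)
    play_game(speaker, hearer)

successes = sum(play_game(*random.sample(population, 2)) for _ in range(100))
print("success rate after negotiation: %d%%" % successes)
```

After many games the positive feedback loop drives the population toward a shared word for each meaning, which is the consensus the paragraph above describes.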
To create a computational model of language, the connections between input signals and
utterances must be found. The process from inputs to utterances may have several stages. The
semiotic square (Steels, 1999) shows a way to divide the language process from real world to words
(see Figure 2.1).
Figure 2.1 Semiotic square
The semiotic square is one way to divide the language process from the real world
to words in a language agent. The agent perceives the real world, forming internal
representations that are grouped into meanings associated with words. (Adapted
from Figure 2.1, p.27 of Steels, 1999)
The structure of the language agents described in the literature varies widely. To implement parts
of the process connecting input signals and utterances, computational models of language
have used various methods, including:
• simple neural networks (Cangelosi, 2001; Cangelosi & Parisi, 1998; Kirby & Hurford,
2002; Marocco, Cangelosi, & Nolfi, 2003),
• autoassociator networks, neural networks with outputs trained to be equal to the inputs
and hidden units used for signals (Hutchins & Hazlehurst, 1995),
• recurrent neural networks, neural networks in which recurrent links provide context
from one time step to the next (Batali, 1998; Elman, 1990; Tonkes, Blair, & Wiles,
2000),
• lexicon tables, a symbolic method that forms associations between categories and
utterances (Smith, 2001; Steels, 1999),
• definite clause grammars, a heuristic-driven incremental grammar induction method
(Kirby & Hurford, 2002),
• finite state unification transducers, in which a prefix tree transducer represents the
language, pairing each transmitted signal with the meaning associated with
transmitting it at that position (Brighton & Kirby, 2001),
• self organising maps, a connectionist method that can be used to form categories from
internal representations (Riga, Cangelosi, & Greco, 2004), and
• discrimination trees, a symbolic method that can be used to group internal
representations into categories (Smith, 2001; Steels, 1999).
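One of the symbolic methods above, the discrimination tree, can be sketched as follows. The sketch is simplified to a single scalar sensory channel whose range is recursively split in half until a region isolates a topic from the other objects in the context; the channel and the sensor values are hypothetical.

```python
def discriminate(topic, others, lo=0.0, hi=1.0, depth=0, max_depth=8):
    """Refine [lo, hi) on one sensory channel until only `topic` falls inside.

    Returns the (lo, hi) interval that acts as a category for `topic`,
    or None if the context cannot be discriminated at this depth.
    """
    inside = [x for x in others if lo <= x < hi]
    if not inside:
        return (lo, hi)           # the region contains the topic alone: a category
    if depth == max_depth:
        return None
    mid = (lo + hi) / 2.0
    if topic < mid:
        return discriminate(topic, inside, lo, mid, depth + 1, max_depth)
    return discriminate(topic, inside, mid, hi, depth + 1, max_depth)

# Hypothetical context: scalar sensor readings for three objects.
category = discriminate(topic=0.8, others=[0.2, 0.4])
print(category)  # (0.5, 1.0): the first split already isolates the topic
```

In a language game, such an interval would then be associated with a word, just as lexicon tables associate categories with utterances.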
The appropriate methods and representations must be chosen to match the concepts and agent
interactions.
2.4 How Could Language Have Evolved?
In a paper reviewing the consensus and controversies in research into language and evolution,
Christiansen and Kirby (2003a, p.300) state that the big questions about language and language
evolution are “Why is language the way it is? How did language come to be this way? And why is
our species alone in having complex language?” Understanding language evolution requires
evidence from different areas, including the structure and use of language, brain structure in modern
humans and our ancestors, brain areas used in language, and the differences between language and
animal communication. These diverse types of evidence have resulted in a multidisciplinary field,
including “anthropology, archaeology, artificial life, biology, cognitive science, computer science,
ethology, genetics, linguistics, neuroscience, palaeontology, primatology, psychology and statistical
physics” (Smith et al., 2008, p.v).
For language to evolve, a variety of abilities are required, including behaviours such as altruism,
larger social group sizes lacking hierarchical structure, the right environmental conditions, and a
sufficient intelligence level (Hurford, 2007). Language use “is inseparable from immersion in a
culture” (Dessalles, 2007, p.50) as languages emerge from interactions between people. The nature
of the environment and culture in which the speakers exist affects the languages that form.
Computational models of language can inform about how communication systems emerge,
investigating ontology, grounding, learnability, and generalisation in languages that evolve in
populations of agents (see Kirby, 2002; Steels, 1997b, 2005; and Wagner et al., 2003 for reviews of
computational models of the evolution of language).
One way that language evolution can be modelled is through iterated learning, by using the
Iterated Learning Model (ILM) in populations of agents (Kirby & Hurford, 2002). The basis for the
ILM is the process of language transmission with two representations of language: I-Language
(internal) and E-Language (external) (see Figure 2.2). I-Language is acquired by an agent
experiencing another agent’s E-Language. The I-Language is then used to produce the E-Language.
In iterated learning, agents learn from other agents in the population, adapting their lexicon to
improve the chance of successful games.
The ILM has a meaning space, a signal space, at least one language learning agent, and at least
one language using agent. There may be a turnover in the population with learners becoming users,
new learners entering the population, and old users leaving the population. The ILM is typically
used with one language learner and one language user (Brighton & Kirby, 2001; Kirby, 2001; Kirby
& Hurford, 2002), although larger populations can be used. The negotiation model can be seen as a
version of the ILM, with equal probabilities for agents being speakers and hearers and a single
generation of agents (Batali, 1998; Cangelosi, Riga, Giolito, & Marocco, 2004; Hutchins &
Hazlehurst, 1995; Smith, 2001). A variation may be used in which language is evolved in separate
generations of agent populations (Cangelosi, 2001; Cangelosi & Parisi, 1998; Nolfi & Marocco,
2002; Quinn, 2001). The ILM is a way of incorporating the agents’ built-in abilities, learning, and
culture into computational models of language evolution.
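The transmission loop of the ILM can be sketched minimally. The meaning space, the signal alphabet, and the holistic (non-compositional) lexicon below are all illustrative simplifications: the teacher's I-Language is expressed as E-Language utterances, from which the next generation acquires its own I-Language.

```python
import random

random.seed(1)

MEANINGS = ["left", "right", "above", "below"]  # toy meaning space

def produce(i_language, meaning):
    """Express a meaning as an E-Language signal; a random string is
    invented for any meaning the I-Language has no signal for."""
    if meaning not in i_language:
        i_language[meaning] = "".join(random.choice("ab") for _ in range(3))
    return i_language[meaning]

def acquire(observations):
    """Acquire an I-Language from observed (meaning, signal) pairs."""
    return {meaning: signal for meaning, signal in observations}

teacher = {}  # generation 0 starts with no language at all
for generation in range(10):
    # the teacher's I-Language is expressed as E-Language utterances...
    utterances = [(m, produce(teacher, m)) for m in MEANINGS]
    # ...from which the next generation acquires its own I-Language
    teacher = acquire(utterances)

print(teacher)
```

In Kirby and Hurford's experiments the learner observes only a subset of the meaning space (a transmission bottleneck), which pressures the language toward generalisable, compositional structure; the perfect transmission above is the degenerate case without a bottleneck.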
Figure 2.2 Language transmission
The language transmission process between the internal language (I-Language) and
the external language (E-Language). Language persists by means of transmission
between I-Language and E-Language through production and acquisition.
(Adapted from Figure 5, p.109 of Kirby, 2001)
Language evolution involves both the vertical transmission of language through generations of
agents and the horizontal transmission of language among peers. Varieties of the ILM correspond to
the vertical and horizontal transmission of language. In this thesis, the term “evolution” is used to
describe vertical transmission through generations of agents. Horizontal transmission is referred to
as negotiation, and corresponds to the emergence or formation of language.
2.5 Translating Meanings and Signals
“A language is a system for translating meanings into signals, and vice versa. Thus language is
anchored in non-language at two ends, the end of ‘meanings’ and the end of signals” (Hurford,
2007, p.3). The question is whether language is just a type of animal communication, or whether it
is something different entirely (Dessalles, 2007). The difference appears to be that human language
is learned while animal communication is innate. The alarm calls of vervet monkeys appear to be
fixed. The language of bees indicating distance and direction to food sources is genetically
programmed, with no need for individual bees to go through a learning period to associate the
signals with the meanings. The freedom of human communication comes from the arbitrary
association of signs with meanings, as can be seen by the multitude of languages.
The ability to categorise the world and to form concepts is central to human thought: “Without
the ability to categorize, we could not function at all, either in the physical world or in our social
and intellectual lives” (Lakoff, 1987, p.6). One open question is whether concepts form before or
together with words to refer to concepts. The key is to consider what we know about concepts in
animals and in pre-linguistic humans. Concepts or ‘proto-concepts’ can be attributed to at least
some animals; for example, vervet monkeys responding to alarm calls, swallows responding to
predators such as cats and hawks, but not dogs or pigeons, birds able to classify paintings as Monet
or Picasso, and Alex the African grey parrot who was able to categorise colour, shape, and matter
(Hurford, 2007). Studies of pre-linguistic knowledge of spatial relations indicate that infants are
able to distinguish between left, right, above, below, and between relations, have knowledge of
object permanence by 2.5-3.5 months, some understanding of causal relationships by 7 months,
some understanding of containment by 2.5 months, and begin to separate function from appearance
towards the end of the first year (Coventry & Garrod, 2004).
There are conflicting views on whether concepts exist prior to words and whether language
constrains thought. These views include the Sapir-Whorf hypothesis (Carroll, 1956), and neo-
Whorfianism (Levinson, 2003a). The two cardinal hypotheses of Whorf are: “that all higher levels
of thinking are dependent on language … (and) that the structure of the language one habitually
uses influences the manner in which one understands his environment” (Carroll, 1956, p.vi).
Different languages categorise the world in different ways. A single category in one language with a
single associated word may be divided into two or more categories and words in another language.
According to the Whorfian view, nature is divided up into concepts based on the language that is
used. Language speakers and listeners therefore only have the same experience of the same physical
evidence if their linguistic backgrounds are the same (Carroll, 1956).
Neo-Whorfianism encompasses a variety of views that allow for ‘Whorfian effects’ where
linguistic patterns have an effect on thinking. The neo-Whorfian view is based on studies that find
that linguistic difference between languages correlates with perceptual and cognitive differences
(Levinson, 2003a). The studies have mainly been undertaken in the domains of spatial language and
colour, and indicate that language facilitates cognitive development.
Language is not simply a one-to-one labelling of symbols to concepts. The concepts that we
form are varied, including things (people, animals, and objects), “events, actions, emotions, spatial
relationships, social relationships, and abstract entities” (Lakoff, 1987, p.6). New concepts can be
formed through a generative process by extending from existing concepts. In addition to concepts in
language formed through a generative process, much of human language and conceptual thinking is
metaphor. Spatial concepts are used metaphorically in many other parts of language, including
temporal (Spinney, 2005), interpersonal relationships (Tuan, 1975), emotion (Coventry & Garrod,
2004), kinship, social structure, music, and mathematics (Levinson, 2003a). The spatial concepts
used in the metaphor must be grounded in experience for the metaphor to be useful.
2.6 Summary
The key factors that are not yet completely understood, but that must be for language to be used
meaningfully by autonomous agents, include grounding, embodiment, learning, evolution, and the interaction
between concepts and words. Features necessary to solve the symbol grounding problem include the
use of embodied autonomous agents, a way for the agents to generate, internally represent, and
ground their own meanings, a way to negotiate symbols to refer to the meanings, and interaction
between agents in the population to coordinate the symbols and meanings (Steels, 2007). The issues
involved in language embodiment include concepts and tasks, perceptual abilities, and whether
natural languages are learned or artificial languages are evolved.
Existing language models employ a large variety of learning methods. There should be a match
between the learning method, representations, concepts, and interactions. In studying language
evolution, the built-in abilities and learning of the agents should be considered, as well as how the
cultural interactions affect how the language evolves through agent populations. The Iterated
Learning Model (Kirby & Hurford, 2002) is a way of integrating built-in abilities, learning, and
culture. Concept-word interactions and how concepts can be grounded when they are not directly
experienced are open questions.
Chapter 3 The Ubiquity of Space
Space is abstract. It lacks content; it is broad, open, and empty, inviting the
imagination to fill it with substance and illusion; it is possibility and
beckoning future. Place, by contrast, is the past and the present, stability and
achievement.
(Tuan, 1975, p.164-5)
In addition to understanding the nature of language, the concepts on which language is built must be
understood. Space and time are ubiquitous concepts: “Space and time are among the most
fundamental of notions. They provide a basis for ordering all modes of thought and belief”
(Peuquet, 2002, p.11). Spatial cognition is a general requirement for any mobile species, and the
prevalence of spatial metaphors in human language indicates the centrality of spatial cognition in
human thinking (Levinson & Wilkins, 2006a). There is variability in how different languages
conceptualise space, though there may be universals in the possible categories of spatial terms,
ways of describing locations or routes, and in the types of perspective that are possible.
Open research questions include how space is represented and used in humans, how to design
useful computational models of spatial languages, and how to design better robot navigation and
mapping systems. The project described in this thesis combines robot and language research by
designing methods for robots to be able to talk about space.
This chapter provides a review of spatial cognition, universals of spatial language, spatial
language models, and spatial representations in humans, animals, and robots.
3.1 Talking About Where
Spatial language includes descriptions of scenes, navigation, and descriptions of where something
is. Spatial concepts can provide information about the following: a single location (‘at’ specifies a
point), direction (‘north’ specifies any point on a line pointing in a northerly direction), distance
(‘near’ refers to anywhere that is close to a location), or multiple locations (‘beside’ combines ‘left’
and ‘right’). Some spatial words have multiple meanings. ‘To the right of’ may be considered to be
close on the right hand side or at any distance on the right hand side. Meanings can be altered with
different frames of reference. ‘Behind’ may mean that the object is beyond the reference from
‘here’, or that the object is located ‘in the opposite direction to where I am facing’.
The key elements involved in how people describe ‘where’ are points (landmarks), planes
(landmarks), paths (one-dimensional connectors), portions, directions (relative to landmark or
environment), and distances (standard units or approximate units of experience) (Tversky, 2003). In
expressing location, landmarks are preferred, followed by spatial relations (near rather than far),
direction (those with natural asymmetries), and distances (usually defined by landmarks). People
are better at using landmarks and paths than at using directions and distances (Tversky, 2003).
3.2 How Space Becomes Place
Spaces and places are known and constructed in the mind through the experiences of smell, taste,
touch, vision, and movement. Space is the general term for the world around us. Once we have
experiences in particular areas of space, they may become a place, or “a special kind of object”
(Tuan, 1975, p.12).
People describe where something is using landmarks of specific points, one-dimensional paths,
two-dimensional planes, and three-dimensional volumes (Tversky, 2003). Landmarks are examples
of the most basic spatial concepts that correspond to areas in space and are referred to using labels,
such as city or suburb names. Areas within an environment or along a path can often be described
by single words, such as corner, corridor, or intersection, or larger regions such as kitchen, office, or
backyard. As defined in Chapter 1, names for specific places in an environment are called toponyms
(i.e. topographic names), and a set of such terms to comprehensively describe an agent’s
environment is a toponymic language.
There are various ways in which toponyms are formed, including natural features (Dover =
water, Rotorua = two lakes), special sites (Doncaster = camp on the Don), religious significance
(Providence, Gadshill), royalty (Queensland), explorers (America, Cookstown), famous local
people (Baltimore, Washington, London), memorable incidents or famous events (Waterloo), and
other place names from immigrants’ homelands (Paris, Troy, London). Other less common ways
include explorers naming good or bad fortune on travels (Cape Tribulation), animal names (Beaver
City, Buffalo), descriptive names (North Sea), and the ‘new town’ (Newtown, Neuville, Naples,
Villanueva, Novgorod, Neustadt, Carthage) (Crystal, 1997, p.114).
Place can be constructed at different scales: “At one extreme a favourite armchair is a place, at
the other extreme the whole earth” (Tuan, 1975, p.12). We experience places through our senses.
Smaller areas may become more personal places, while larger ones are constructed from different
types of experience, such as travelling through a city. The different scales of place include personal,
home, city, neighbourhood and region, and nation-state.
Another way to differentiate between spatial scales is based on the actions that can be performed
which depend on the distances involved: within touch (personal space), within view and able to be
viewed from different perspectives (tabletop space), within walking or travelling distance
(geographic space), and beyond personal experience (astronomical space) (Peuquet, 2002).
3.3 Describing Relationships Between Places
In language, spatial information is distributed throughout sentences in a variety of word classes
(Levinson, 2003a). In English, spatial relations are often described using spatial prepositions.
Prepositions are hard to learn in a second language due to the differences in how they map onto
concepts (Coventry & Garrod, 2004). Prepositions can be divided into groups, including vertical
(below / above, down / up, under / over, beneath / on top of), distance (near, far), horizontal
(beyond, behind, beside, by), omni-directional (at, about, around, between, among, along, across,
opposite, against, from, to, via, through), and temporal (O'Keefe, 1996). Another way to classify
prepositions is shown in Figure 3.1. The preposition types that directly describe spatial relationships
are simple topological terms (in, on, under), proximity terms (next to, beside), projective /
dimensional terms (behind, in front of), and directional terms.
is fairly consistent across different languages, with simple topological terms acquired first, followed
by proximity terms, then projective terms (Coventry & Garrod, 2004).
Figure 3.1 Preposition classification
A classification of the different preposition types (adapted from Figure 1.1, p7 of
Coventry & Garrod, 2004). The spatial terms are those relevant to this thesis,
including ‘simple’ topological terms, proximity terms, projective / dimensional,
and directional.
Prepositions
    Grammatical uses
    Local uses
        Spatial uses
            Locative/relational
                Topological
                    “Simple” topological terms
                    Proximity terms
                Projective / dimensional
            Directional
        Temporal uses
Directions can be described in precise degrees and absolute cardinal directions, but are usually
described approximately, relative to a landmark or major feature of the environment. Directions
with asymmetries, such as front and back, are preferred over those without asymmetries, such as left
and right (Tversky, 2003). While distance can also be described in standard units, such as
kilometres, approximate units of experience are often used that refer to landmarks or time (Tversky,
2003).
3.4 Choosing Which Perspective To Use
When describing space, a variety of references can be used. Levinson (2003a) describes the three
distinct frames of reference as ‘intrinsic’, ‘relative’, and ‘absolute’. In the intrinsic frame of
reference, the coordinate system used is object-centred, determined by inherent features of the
object, which may be different features in different languages. The direction of the coordinate
system is determined by each language, relating to function in English and shape in Tzeltal
(Landau, 1996). The relative frame of reference has a coordinate system centred at the location of a
viewer, which may be the location of the speaker or an arbitrary location in the scene. In the
absolute frame of reference, the coordinate system may be fixed by gravity and cardinal directions,
varying with different languages and cultures. The absolute frame of reference may be fixed by a
feature of the environment such as a coastline or a mountain. Some languages only use one frame of
reference, while others use a combination. The relative frame of reference requires the intrinsic
frame of reference, but all other combinations are possible.
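The three frames can be illustrated with a toy two-dimensional scene; all objects, coordinates, and headings below are hypothetical. The same displacement between a figure and a ground object yields different spatial terms depending on the coordinate system used: cardinal directions in the absolute frame, and viewer- or object-anchored directions in the relative and intrinsic frames.

```python
import math

def bearing(dx, dy):
    """Angle of (dx, dy) in degrees, measured counterclockwise from east."""
    return math.degrees(math.atan2(dy, dx)) % 360

def absolute_term(figure, ground):
    """Absolute frame: cardinal direction from ground to figure."""
    ang = bearing(figure[0] - ground[0], figure[1] - ground[1])
    return ["east", "north", "west", "south"][int((ang + 45) % 360 // 90)]

def anchored_term(figure, ground, heading):
    """Relative/intrinsic frame: direction in a coordinate system rotated to
    `heading` (the viewer's heading gives the relative frame; the ground
    object's own facing gives the intrinsic frame)."""
    ang = (bearing(figure[0] - ground[0], figure[1] - ground[1])
           - heading) % 360
    return ["front", "left", "behind", "right"][int((ang + 45) % 360 // 90)]

# Hypothetical scene: a ball one unit north of a chair.
ball, chair = (0.0, 1.0), (0.0, 0.0)
print(absolute_term(ball, chair))           # north
print(anchored_term(ball, chair, 90.0))     # viewer faces north: 'front'
print(anchored_term(ball, chair, 180.0))    # chair faces west: 'right'
```

The sketch also shows why translation between frames is lossy: recovering the absolute term from a relative one requires the heading, which the relative description does not carry.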
Two examples of languages that require speakers to have a sense of location and direction are
the Australian language Guugu Yimithirr and the Tenejapan language Tzeltal (Levinson, 2003a). In
Guugu Yimithirr, only the absolute frame of reference is used, with spatial descriptions referring to
something similar to the cardinal directions of North, South, East, and West. In Tzeltal, both
intrinsic and absolute frames of reference can be used. In the Tzeltal absolute frame of reference,
directions are designated uphill, downhill, or across, corresponding to an inclined plane that has
been abstracted from the local environment.
While all languages can describe spatial representations, people speaking different languages
will prefer to use different frames of reference or may even switch between frames of reference
during conversation (Tversky, 1996). Frames of reference can be used to construct or describe
spatial relationships in the world. The use of different frames of reference indicates that language
may restructure the spatial representations of the language speaker, rather than the existence of
innate and universal spatial concepts (Majid, Bowerman, Kita, Haun, & Levinson, 2004).
Translation between frames of reference is not easy; for example, information required by the
absolute frame of reference is not provided by the others. Therefore, speakers will tend to remember
spatial experiences in the frame of reference that they habitually use (Levinson, 2003a).
3.5 Universals Across Different Languages
There is variation in spatial language across cultures (Levinson, 1996). Language may affect
conceptual categories, including cultural variation in spatial frames of reference. There are
differences in conceptual distinctions, with the spatial relationship concepts formed in different
languages overlapping or cutting across each other, with no one-to-one mapping cross-linguistically
(Levinson & Wilkins, 2006a). There are differences in the grammatical categories of words used to
describe spatial relations. In English, spatial relations are mainly described using spatial
prepositions. Other languages use verbs, local cases, special spatial nominals, or adverbials
(Levinson & Wilkins, 2006a).
The differences in spatial concepts between languages indicate that universal concepts are not
unanalysable wholes (Levinson & Wilkins, 2006b), but may involve elements that can be combined
in various ways. Children acquiring spatial language are not mapping forms onto innate concepts,
but are building the concepts as they are learning the language, developing spatial language for
about the first 8 years of life (Coventry & Garrod, 2004).
Based on a series of studies covering twelve different languages distributed over five continents
and a range of cultures, Levinson and Wilkins (2006b) concluded that there are constraints
on the diversity of dimensions for structuring spatial domains, as well as implicational constraints. One of
the constraints on the diversity of dimensions for structuring spatial domains is that there is a finite
set of possible frames of reference for languages to use: intrinsic, relative, and absolute. Also, if a
language has a relative frame of reference, it has an intrinsic one.
The universal elements of spatial concepts may include that all languages express spatial
relations, that universal spatial concepts may be elements that can be combined in different ways
(Levinson & Wilkins, 2006b), and that perspectives are used in the form of frames of reference with
three possible choices (relative, intrinsic, and absolute) (Levinson, 2003a).
3.6 Spatial Language Models
Some research has included a spatial dimension in language models (Bartlett & Kazakov, 2005;
Bodik & Takac, 2003; Regier, 1996; Steels, 1995; Steels & Loetzsch, 2007). In Bartlett and
Kazakov’s (2005) simulation world, agents must find food and water to survive and reproduce.
Food and water are located at set intervals in a grid world and can be found by self-exploration of
the agents, remembering where food and water have been found in the past, conversation with other
agents, and sharing with other agents. Locations and paths are remembered by the landmarks that
can be seen from that location. Agents use ‘songlines’ to store paths, listing the landmarks that can
be seen when following the route from a particular location to the nearest resource. The location
names are known by all agents prior to the study. The importance of language to the survival of the
agents is related to the structure of the environment concerning the distances between resources.
In language games studies incorporating a spatial dimension (Bodik & Takac, 2003; Steels,
1995), the games involve concepts of direction and distance from the location of the agent to an
object in the world. Objects in the world could be ‘pointed to’, allowing agents to build up a lexicon
of terms for objects in the world. In Steels’ study the objects were other agents, while in Bodik and
Takac’s study the objects were static objects in a playground. Naming games were followed by
spatial language games in which the agents describe distances and directions between objects. In
Steels’ (1995) study, the agents always face the same direction, and the concepts used are front,
side, behind, left, straight, and right. In Bodik and Takac’s (2003) study, the agents move around
the world, and a conceptualisation tree was used for dividing angles and distances into spatial
concepts which were then associated with words. In both studies, the agents utilise an absolute
frame of reference, and so have a shared perspective.
Regier’s (1996) constrained connectionism model categorised spatial relations, or prepositions,
from several languages including English, Russian, and Mixtec. The model was presented with
sequences of movie frames that displayed the spatial relation, and learned to categorise the terms
without explicit negative evidence.
Methods for aligning the agents’ perspective in language games have been considered. Situated
agents provided with a ‘language faculty’ and methods for aligning perspective are able to agree on
a lexicon describing spatial categories relevant to their environment which includes another robot
and an orange ball (Steels & Loetzsch, 2007).
3.7 Representing Space
Human spatial competence includes shape recognition, a sense of where body parts are, and
navigation (Levinson, 2003a). In terms of navigation, people's ability varies greatly, unlike the
specialised systems of other species, such as echolocation in bats, the detection of polarised light
in bees, and the sensing of the earth's magnetic field in migratory birds. Navigation in humans is a
culturally developed system, resulting in a large variance in navigation ability (Levinson, 2003a).
The hippocampus is implicated in spatial representation in humans. O'Keefe and Nadel (1978)
proposed the existence of a cognitive map in rats, with the hippocampus constructing an allocentric
map of the world. In the cognitive map theory for humans, the right
hippocampus has the spatial function of constructing an allocentric map, while the left hippocampus
is a linguistic or episodic memory system, or a semantic map (O'Keefe, 2003). Studies using
modern imaging techniques have confirmed that the right hippocampus is used in navigation tasks,
such as mental navigation along memorised routes (Berthoz, 1999) and topographic learning and
recall (Maguire, 1999).
Multiple brain areas are likely to be associated with spatial language comprehension and
production, including those associated with geometric and dynamic-kinematic routines, such as the
right hippocampus with non-linguistic spatial representations, the left hippocampus mapping the
spatial language onto a spatial representation, and the left parietal and frontal regions involved in
processing of space and motion (Coventry & Garrod, 2004).
By studying other forms of spatial representations, more may be understood about human spatial
representations. Other forms of representations being studied include cognitive maps in rats and
maps for mobile robots.
3.7.1 Cognitive Maps In Rats
Invasive experimental procedures have resulted in more detailed knowledge of spatial
representation in rodents than in humans. Rodents have place cells that correspond to locations in space and head
direction cells that indicate the rodent’s head orientation (O'Keefe, 1979). Place cell activity is
affected by the movement of the rodent and visual input. Place cells appear to provide an allocentric
map of the world, where individual place cells correspond to particular locations in the world.
Evidence of ‘grid cells’ has been shown in the entorhinal cortex (Hafting, Fyhn, Molden, Moser,
& Moser, 2005). Grid cells are activated when the rodent is at locations in the environment that
coincide with a regular grid of equilateral triangles. Grid cells have multiple discrete firing locations
corresponding to the regular grid.
Place cells, head direction cells, and grid cells are implicated in a neural map of the rodent’s
spatial environment. Grid cells may provide a scale of the environment, with head direction
providing orientation, and place cells responding to specific location cues (Hafting et al., 2005).
3.7.2 Maps for Mobile Robots
For mobile robots to be truly autonomous they must be able to build a map of their environment and
navigate using that map. An overview of research into robotic mapping is given in Thrun (2002). A
summary of the main points of this paper is given here.
Robotic mapping is one of the most important problems in building autonomous mobile robots.
With more than two decades of research in robotic mapping, there are many models able to map
structured, static, small scale environments in real-time. Simultaneous Localisation And Mapping
(SLAM) is the process in which robots build an internal map of the environment and use an
estimate of localisation in that map for navigation. Problems in robotic mapping include
measurement error (the sensors used by robots are subject to errors), the high dimensionality of the
environments, the correspondence problem (whether different measurements correspond to the
same physical object), dynamic environments (the real world is not a static environment), and
robotic exploration (how the robots choose their path during mapping). Issues for future research
into robotic mapping include mapping dynamic environments, integrating knowledge about
environments into the mapping problem (e.g. objects typical in indoor vs. outdoor environments),
multi-robot collaboration, and unstructured environments (e.g. outdoor, underwater, and planetary).
SLAM models have been based on grid representations, landmark representations, or
topological representations. An alternative is a biologically inspired approach to SLAM; for
example an approach based on the hippocampal complex in rodents. Computational models of the
rodent hippocampus used in mobile robots allow the robots to localise within an environment
(Arleo & Gerstner, 2000; Burgess, Donnett, Jeffery, & O'Keefe, 1999). RatSLAM, a method of
SLAM that has been developed at The University of Queensland, is based on the rodent
hippocampus (Milford et al., 2004).
3.7.3 RatSLAM
The robots used for the RatSLAM project are Pioneer 3 DX robots (see Figure 3.2), equipped with a
camera to provide the visual input for the perception system, and wireless communication
equipment for real time data collection and analysis. There is a high-fidelity simulator that captures
the sensing and motion capabilities of the robot and is able to generate the robot’s on-camera view
(Moylan, 2003, with additional work by Mark Wakabayashi) (see Figure 3.3). Earlier studies were
performed on Pioneer 2 DXE robots with a forward facing camera.
RatSLAM is a computational model inspired by the rodent hippocampal complex with
continuous attractor networks (Milford et al., 2004). Movement and visual senses modulate the
network dynamics. Elements near each other in the network are likely to be close in space.
RatSLAM keeps the sense of space inherent in grid-based and landmark-based systems, while
adding the robustness and adaptability of topological representations.
Robots using RatSLAM use the appearance of an image to aid localisation by learning to
associate the appearance of a scene and its position estimate (Prasser, Wyeth, & Milford, 2004).
The robot can perform goal-directed navigation to locations that have previously been visited based
on an internal map of the environment.
Figure 3.2 Robot used in the RatSLAM and RatChat projects
The Pioneer 3 DX robots have a camera (with a mirror for omni-directional
vision), sonar range finders, laser range finders, and an antenna for wireless
communication. For RatChat, the robots have a microphone and two speakers.
Figure 3.3 Map of the real and simulated world
The map of the robot’s world showing a) the halls and open plan offices of the real
world and b) the simulation world, a 3D virtual reality environment constructed
from digital photos of the real world.
Architecture
The inputs to the RatSLAM system include odometry and vision. The odometry inputs are the
velocity and rotation of the robot. The visual representation is a low resolution version of the input
from the camera (see Figure 3.4). The RatSLAM Architecture integrates the inputs by creating local
views from the visual scenes and performing path integration on the odometric inputs to form the
pose cell representation. The pose cells and local view cells are integrated into an experience map.
The local view, pose cells, and experiences are described in the following three sections.
Figure 3.4 Visual input
A corridor as seen by a) the camera of the robot, b) the robot in the simulation
world, and c) the low resolution vision obtained from the camera image and used
by RatSLAM for vision based SLAM.
Local View Cells
The visual processing method used in local view creation is a sum of absolute differences matcher
(see Figure 3.5). The current local view is compared with stored templates that have been seen
previously. If the current view is sufficiently similar it will be recognised, otherwise a new template
will be created.
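The template-matching step above can be sketched as follows. This is a minimal illustration of a sum of absolute differences matcher over flattened pixel intensities; the threshold value is a hypothetical placeholder, not the one used in RatSLAM.

```python
def match_local_view(view, templates, threshold=0.1):
    """Sum of absolute differences (SAD) matching for local view creation.

    `view` is a flat list of pixel intensities in [0, 1]; `templates` is
    the list of previously stored views. The threshold is illustrative.
    """
    best_index, best_score = None, float("inf")
    for i, template in enumerate(templates):
        # mean absolute difference between corresponding pixels
        score = sum(abs(v - t) for v, t in zip(view, template)) / len(view)
        if score < best_score:
            best_index, best_score = i, score
    if best_score < threshold:
        return best_index              # recognised an existing template
    templates.append(list(view))       # sufficiently different: new template
    return len(templates) - 1
```

Repeated presentation of the same view returns the same template index, while a sufficiently different view extends the template list.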
Pose Cells
The RatSLAM pose cell representation (see Figure 3.6) provides information about the robot’s
pose, combining place and orientation information. The correspondence with the rodent
hippocampal complex is with place cells that are active when the rodent is in a particular location
and head direction cells that are active when the rodent is oriented in a certain direction. In
RatSLAM, place and head direction cells have been combined into a pose representation in x, y, and
θ to allow the system to concurrently manage multiple pose beliefs. Wrapping occurs in each of the
x, y, and θ dimensions. As the robot moves, information from vision and odometry sensors is
processed using local view and path integration. The visual and odometric information is used to
update the pose cell activity, resulting in the movement of the pose cell activity packet. Multiple
activity packets can exist when the robot recognises scenes that are associated with multiple
locations in the environment.
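The wrapping path integration described above can be illustrated with a simplified update that moves only the centre of the activity packet; a full continuous attractor network would shift a packet of activity across many cells. The cell counts and motion values below are illustrative.

```python
import math

def update_pose(pose, speed, rotation, size_x, size_y, size_th):
    """Shift a pose estimate (x, y, theta index) by odometric input,
    wrapping in each of the x, y, and theta dimensions as the pose
    cell network does. Units are cell indices, not metres."""
    x, y, th = pose
    th = (th + rotation) % size_th          # wrap the heading dimension
    angle = 2 * math.pi * th / size_th      # heading implied by theta index
    x = (x + speed * math.cos(angle)) % size_x   # wrap in x
    y = (y + speed * math.sin(angle)) % size_y   # wrap in y
    return (x, y, th)
```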
Figure 3.5 Local view cells
The array of Local View (LV) Cells is created as the robot explores the world. The
current local view is compared with existing templates. When a scene is
sufficiently different from all stored scenes, a new template is created. Note that
the images shown are for a forward facing camera.
The total number of cells and the number of active cells is adjustable depending on the
environment size and the desired level of accuracy. An example setting, used in the pilot studies,
was 180 x by 68 y by 36 θ pose cells (440,640 pose cells) with between 100 and 200 pose cells
active at any time. In all studies described in this thesis, the pose cell map was large enough that the
x and y dimensions did not wrap.
In a pose cell map, where the most active pose cell at the current time is shown as the robot
moves through the world, there may be discontinuities where areas close in physical space are
represented by pose cells that are further apart, and there may be multiple representations where one
cluster of pose cells represents multiple physical locations. The pose cell representation is
topologically correct, consistent, stable, and able to be used for goal directed navigation in simple
environments (Milford, Wyeth, & Prasser, 2005). In more complex environments, the pose cells are
difficult for a human to interpret as a map of the environment, and difficult to use in goal directed
navigation. The experience mapping algorithm was developed to create a map that was easy for a
human to interpret and to use in goal directed navigation.
Figure 3.6 Pose cells
The three dimensional (x, y, and θ) continuous attractor network of pose cells used
in RatSLAM. The lines around the edge show the limiting area for the pose cells at
which wrapping occurs. The activity packet is the currently active cells, and moves
around the pose cells based on the motion and rotation of the robot, with additional
inputs from the current Local View. Note that the robot pictured here is a Pioneer 2
DXE with a forward facing camera.
Experiences
The experience map consists of a collection of experiences linked together by transitions. Each
experience is a representation of the pose cell and local view cell activity at a point in time (see
Figure 3.7). New experiences are created when the existing experiences do not sufficiently describe
the current pose cell and local view cell activity. Each experience is located within the (x, y, θ)
experience map coordinate space. The first experience is placed at (0, 0, 0), with subsequent
experiences placed based on the previous active experience and the robot’s movement. The
movement between experiences, in terms of space, time, and behaviour, is stored in
experience transitions. A map correction process takes into account the current location of the
experience in the map and the perceived distance between the experiences based on transitions.
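The map correction process can be sketched as a simple relaxation over the experience graph: each experience's position is nudged toward agreement with the offsets stored in its transitions. The data layout and correction rate here are illustrative, not the RatSLAM implementation.

```python
def correct_map(positions, transitions, alpha=0.5):
    """One iteration of experience map correction.

    `positions` is a list of (x, y) experience locations; `transitions`
    maps (i, j) -> (dx, dy), the movement recorded when travelling from
    experience i to experience j; `alpha` is an illustrative rate."""
    corrections = {i: [0.0, 0.0, 0] for i in range(len(positions))}
    for (i, j), (dx, dy) in transitions.items():
        # disagreement between where i says j should be and where j is
        ex = positions[i][0] + dx - positions[j][0]
        ey = positions[i][1] + dy - positions[j][1]
        corrections[j][0] += ex; corrections[j][1] += ey; corrections[j][2] += 1
        corrections[i][0] -= ex; corrections[i][1] -= ey; corrections[i][2] += 1
    new_positions = []
    for i, (x, y) in enumerate(positions):
        cx, cy, n = corrections[i]
        if n:
            x += alpha * cx / n
            y += alpha * cy / n
        new_positions.append((x, y))
    return new_positions
```

With two experiences whose map distance disagrees with the stored transition, one iteration moves both positions toward consistency.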
Figure 3.7 Experiences
The pose cells represent spatial information where active cells encode the robot’s
pose. The local view cells represent visual information where each cell encodes a
visual scene experienced by the robot. Each node in the experience map encodes a
specific spatial and visual experience of the robot and is located in the experience
map’s co-ordinate space. Transitions between experiences are used to form a map
representative of the environment. (Adapted from Milford & Wyeth, 2007)
Behaviours
The basic behaviour of the RatSLAM robots is to explore the world, forming pose cell, local view
cell, and experience map representations. Exploration involves wall following, where the left or
right wall may be selected. The robots can perform goal directed navigation: an experience may be
selected for the robot to use as a goal, causing the robot to plan a route to the location based on
knowledge of time taken to move between experiences.
Future Work
The RatSLAM system is under ongoing development, with possible future work including the
development of analysis techniques for biological models, formal analysis of the model, long term
experiments, and formalisation of the methods and the dynamic environment problem (Milford,
2008).
3.8 RatChat
The RatChat project extends the RatSLAM project, and aims to investigate language formation
using robots. The concrete aims are to provide robots with the ability to talk about their world and
experiences. The project builds on RatSLAM’s robotic platform, with the detailed representations
of space formed by robots exploring their world.
The studies of this thesis have been the core of the RatChat project, driving the robots’ abilities
to talk about locations in the world, and the relationships between locations. Other sections of the
project have considered naming paths, equivalent to verbs in the robots’ world, and ways for
humans to interact with the robots.
3.9 Summary
The different types of spatial concepts include landmarks or specific places, paths between places,
directions, and distances. Combinations are used to describe scenes, navigation, or locations.
Relationships between places are often described in English with prepositions. The relationships
that can be described with prepositions include topological, proximity, projective, and directional.
Space, the general world around us, becomes place through experience. Place names, or
toponyms, are often formed through a collective experience at a location, or are descriptive of
features of that location. There is a divergence of concepts, frames of reference, and grammatical
categories used for spatial language. There may be a set of universals from which specific
languages choose; for example, there are only three frames of reference for languages to choose
from: intrinsic, relative, and absolute. Language models have labelled objects located in the world,
relationships between the objects, and have recently considered methods for aligning perspective.
Human spatial competence includes shape recognition, a sense of where body parts are, and the
culturally developed navigation system. Specific brain areas are associated with spatial language
and spatial representation, including the hippocampus, parietal, and frontal regions. In rats, the
hippocampus and entorhinal cortex are implicated in navigation, with place cells, head direction cells,
and grid cells. Navigation in rats has inspired the development of robot models of Simultaneous
Localisation and Mapping, including RatSLAM. The RatSLAM modules are local view cells, pose
cells, and the experience map. The themes covered in this chapter are combined in the RatChat
project, of which the studies described in this thesis are a major part.
Chapter 4 A Location Language Game
The original word game is the operation of linguistic reference in first
language learning. … We play this game as long as we continue to extend
our vocabularies and that may be as long as we live.
(R. Brown, 1958, p.194)
To use language meaningfully in practical applications requires appropriate methodology. Many
methods and representations have been used in language models. Method refers to the structure of
the language agents and the strategies used by the agents to form concepts, produce utterances, and
comprehend utterances. Representation refers to how concepts and words are defined, where
concepts are formed from concept representations and utterances are formed from word
representations.
The purpose of Chapter 4 is to describe the methodology and representations for the RatChat
project, in particular for the studies presented in this thesis. The studies are presented in the three
chapters following this chapter, and include Pilot Study: Methods and Representations, Study 1: A
Toponymic Language Game, and Study 2: A Generative Toponymic Language Game.
Language games can be used to investigate features of language such as embodiment,
learning, culture, evolution, grounding, and vocabulary. They may be played in a static or dynamic
population of two or more agents. The nature of the population dynamics may affect the languages
resulting from interactions. A standard language game consists of interactions between two agents:
a speaker and a hearer. In a simple version of the guessing game, the steps of an interaction are:
• shared attention,
• speaker behaviour,
• hearer behaviour,
• feedback, and
• acquisition of a new conceptualisation (Steels, 2001).
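These five steps can be sketched as a single interaction between two minimal agents; the agent interface and the score-based lexicon below are illustrative assumptions, not the exact mechanisms used in the thesis.

```python
import random

class LexiconAgent:
    """Minimal agent with a lexicon of (meaning, word) -> score."""
    def __init__(self, words):
        self.words = words
        self.scores = {}   # association scores, default 0

    def produce(self, meaning):
        # speaker: the word most strongly associated with the meaning
        return max(self.words, key=lambda w: self.scores.get((meaning, w), 0))

    def comprehend(self, word, context):
        # hearer: the meaning in context most associated with the word
        return max(context, key=lambda m: self.scores.get((m, word), 0))

    def update(self, meaning, word, success):
        key = (meaning, word)
        self.scores[key] = self.scores.get(key, 0) + (1 if success else -1)

def play_guessing_game(speaker, hearer, context):
    """One interaction: shared attention, speaker behaviour, hearer
    behaviour, feedback, and acquisition (lexicon update)."""
    topic = random.choice(context)             # shared attention
    word = speaker.produce(topic)              # speaker behaviour
    guess = hearer.comprehend(word, context)   # hearer behaviour
    success = guess == topic                   # feedback
    speaker.update(topic, word, success)       # acquisition
    hearer.update(guess, word, success)
    return success
```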
The language games described in this thesis are location language games. The structure of
location language games is described in this section, with a flow chart for an agent playing location
language games shown in Figure 4.1. To play a location language game, the agents require a
representation of the world that they use to form concepts. To gain these concept representations,
the agents must first explore the world. Exploration is carried out independently of other agents in
the world and may be performed in advance of or concurrently with language game interactions.
The exact nature of the exploration depends on the agents’ environment.
Shared attention for location language games is obtained by agents being near each other, or
within hearing distance. The agents autonomously explore the world and play a game when they
are close to each other. To determine whether they are close to each other, the agents intermittently
send out a ‘Hello’ signal. If a ‘Hello’ signal is heard, the hearing agent then sends a ‘Hear’ signal
and the agents will play a game. The agent that said ‘Hello’ is the speaker and the agent that said
‘Hear’ is the hearer. In real robots, the signals may be in the form of sounds, while in simulated
agents, signals may be sent to agents within a set distance in the simulation world. The hearing
distance affects the distance at which signals will be received by other agents in the world.
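For simulated agents, the handshake above can be sketched as a proximity check: a signal is received by any agent within the hearing distance, the sender becomes the speaker, and the responder becomes the hearer. The function names and the first-responder rule are illustrative assumptions.

```python
import math

def within_hearing(a, b, hearing_distance):
    """True if agents at positions a and b (x, y world units) are close
    enough for a 'Hello' signal to be heard."""
    return math.dist(a, b) <= hearing_distance

def handshake(speaker_pos, others, hearing_distance):
    """The agent sending 'Hello' becomes the speaker; the first agent
    within hearing distance replies 'Hear' and becomes the hearer.
    Returns the hearer's index, or None if no agent heard the signal."""
    for i, pos in enumerate(others):
        if within_hearing(speaker_pos, pos, hearing_distance):
            return i
    return None
```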
Following the acquisition of shared attention is the speaker behaviour. The speaker chooses a
topic, which in a location language game relates to the current location of the agents or a location at
a distance from the agents, depending on the game being played. After the topic is determined, the
speaker uses their lexicon to determine which word should be used in the current situation. The
actual word representation depends on the abilities given to the agents, such as an ability to parse
speech, tones, text, or integers.
After the speaker produces an utterance, the hearer attempts to comprehend the utterance to
determine the topic. For comprehension, the hearer considers the shared attention of the agents and
their internal representations of the world.
In the feedback step, the success or failure of the game is determined with feedback provided to
the other agents. The feedback step may be skipped, as determining success or failure is difficult to
do without considering the internal states of the agents, or ‘mind-reading’, which is not desirable
(Smith, 2001). However, a variety of performance measures may enable each agent to keep track of
how the interactions are progressing.
The final step is the acquisition of a new conceptualisation, where agents update their representations
to increase the chance of success in future games. Specifically, the agents update their lexicon,
which enables a coherent language to form in the population of agents.
In the studies, three different types of language game are played by the agents: ‘where are we’,
‘go to’, and ‘where is there’. The type of location language game played determines how the agents
choose the topic, determine the word, and comprehend the word. The games are described in detail
in Chapters 6 and 7 where they are first used.
Figure 4.1 A location language game
A flow chart for an agent playing location language games, with the speaker
behaviour on the left and the hearer behaviour on the right. Agents explore the
world, checking if a ‘Hello?’ signal has been received, or if it is time to send out a
‘Hello?’ signal. When a ‘Hello?’ signal is received, the agent becomes the hearer.
When an agent that has sent a ‘Hello?’ signal receives a ‘Hear’ signal, they become
the speaker.
The parameters for the location language game are:
• game and
• hearing distance.
The features requiring more detailed information are:
• concept representations,
• word representations,
• lexicon,
• population dynamics,
• environment, and
• performance measures.
These features are described in the remainder of this chapter, which concludes with a summary
of the parameters for all features of the location language game.
4.1 Concept Representations
In many language models, the concept representations are arbitrary, and may not actually represent
meaningful concepts. For example, a representation might be a vector of 10 real numbers between
0.0 and 1.0 (Batali, 1998) or objects with a set of abstract features in the range 0.0 to 1.0 (Smith,
2003). In other studies, they are representative of real concepts or tasks. Agents have played
language games to describe visual scenes with concepts such as size and colour (Steels, 1999) and
have communicated to aid cooperation in agent populations (Floreano et al., 2007).
In a location language game, the concept representations build on the robot’s representations
formed while exploring the world. They are used by the speaker to choose the interaction topic, and
together with the word representations and the lexicon to determine the word for the topic. They are
used by the hearer, together with the word representations and the lexicon, to comprehend the word
spoken.
Throughout the studies described in this thesis, three types of concepts are formed by the agents:
locations in the world, distances between locations in the world, and directions in the form of the
angle between two locations at a distance (see Figure 4.2). The concepts are constructed from the
robot representations formed using RatSLAM, described in section 3.7.3, or simplified versions of
these representations. The representation types include vision, pose cells, and experiences.
The parameters for robot representations are:
• concept type (locations, distances, or directions) and
• concept representation (vision, pose cells, or experiences).
Figure 4.2 Concept types
The three concept types used by agents in this thesis: locations, distances, and
directions.
4.2 Word Representations
Word representations are used by the speaker when determining the word for the topic, and when
sending the word to the hearer. They are used by the hearer when comprehending the word.
Depending on the methods used to form concepts, and to associate concepts and words, the words
used for different concepts may be arbitrary, or they may be related according to the relationship of
the concepts that they refer to. For example, labels for nearby locations may have similar elements,
while labels for distant locations do not. Alternatively, distinct names may be used for different
locations, which may be easier for agents to learn (Gasser, 2004). The actual representation of the
words, in terms of physical transmission, could be an integer, a set of unit activations, text, or
sound.
The parameter for word representations is:
• word representation (integer, activations, text, or sound).
4.3 Lexicon
The lexicon is used together with concept and word representations when the speaker determines
the word for the topic and when the hearer comprehends the word. The associations between
concepts and words stored in the lexicon are updated at the end of each interaction. Throughout the
studies, four different techniques were employed to associate concepts and words. Each of the
lexicon techniques must allow the agents to produce words, comprehend concepts, learn
associations between concepts and words, and have a source of variability that allows new words to
be associated with new concepts.
The parameter for the lexicon is:
• technique (simple or recurrent neural networks, standard or distributed lexicon tables).
For each of these techniques, the features requiring more detailed information are:
• word production,
• concept comprehension,
• learning associations, and
• variability.
In the next four sections the lexicon techniques used in this thesis are described.
4.3.1 Simple Neural Networks
Simple neural networks provide a way to associate a set of input units with a set of output units (see
Figure 4.3). For neural networks, features that need to be considered are whether they will be fully
connected, the transfer function that will be used in the units, and the structure of the word
representations. The simple neural networks used in this thesis are fully connected. The Log-
Sigmoid transfer function (logsig) is used:
logsig(n) = 1 / (1 + exp(−n))    Equation 4.1
where n is the input to the unit. The logsig function takes any input and will output a value
between 0 and 1, limiting the network outputs to activations between 0 and 1. The structure of the
word representations is that each output unit is associated with a single word, and the word chosen
corresponds to the most active output unit.
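Equation 4.1 and the winner-take-all word choice can be expressed directly; `choose_word` is a hypothetical helper name for the rule that the word corresponds to the most active output unit.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function (Equation 4.1): maps any input to
    an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))

def choose_word(output_activations, words):
    """Each output unit is associated with one word; the word chosen
    corresponds to the most active output unit."""
    best = max(range(len(output_activations)),
               key=lambda i: output_activations[i])
    return words[best]
```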
Figure 4.3 Simple neural network
The pilot studies used simple linear networks with bias. In this figure, the
rectangles refer to a set of units, and the arrow indicates that the units are fully
connected.
Word Production
For the word production networks, the input unit activations correspond to the representations
underlying the concepts and the output units correspond to the words. In the studies using simple
neural networks each output unit corresponds to a single word.
Concept Comprehension
There are two alternatives for concept comprehension. The first is for the agent to have a separate
comprehension network, with the inputs being the words and the outputs being the concept
representations. Two networks are easy to implement and use, but there is no direct link between
the networks. The agents need to train both networks, one for production and the other for
comprehension.
The second alternative is for the agent to have a single network used both for word production
and concept comprehension (Batali, 1998). The network takes the words as the input and the output
of the network is the concept. When acting as the speaker, the word that has the closest concept to
the topic chosen is the word sent to the hearer. A single network means that word production and
concept comprehension are directly linked, but one of the directions is more difficult to obtain and
more computationally expensive.
The parameter for concept comprehension using simple neural networks is:
• networks (the same network used for production and comprehension or a separate
network for production and comprehension).
Learning Associations
The network weights are updated by training the network on a set of input and output patterns. In
this thesis, the networks are initialised with small random weights and biases (uniformly between –
0.1 and 0.1). The network is trained using gradient descent with momentum and an adaptive
learning rate. The change in weights and biases, dX, is given by:
dX = m·dXk-1 + (1 − m)·η·g    Equation 4.2
where m is the momentum constant, dXk-1 is the previous change in weights and biases, η is the
learning rate, and g is the gradient of the network’s performance with respect to its weight and bias
values. The learning rate may be increased or decreased by the increasing and decreasing ratios
depending on the network’s performance.
Concerning the network task, one option is to present patterns individually, updating the
network’s weights for each pattern. Another option is to present all of the patterns, updating the
network’s weights for the whole language. In this thesis, blocks of patterns were presented to the
networks, with the network’s weights updated for the block of patterns rather than for individual
patterns.
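A single update following Equation 4.2 might be sketched as below. The momentum constant and learning rate values, and the sign convention for the gradient, are illustrative; the adaptive increase and decrease of the learning rate over training is omitted here.

```python
def momentum_step(weights, gradient, prev_dX, m=0.9, lr=0.01):
    """One gradient descent with momentum step (Equation 4.2):
    dX = m * dX_{k-1} + (1 - m) * lr * g.

    `gradient` is taken as the direction that improves the network's
    performance, so dX is added to the weights. Values of m and lr
    are placeholders, not those used in the thesis."""
    dX = [m * d + (1 - m) * lr * g for d, g in zip(prev_dX, gradient)]
    new_weights = [w + d for w, d in zip(weights, dX)]
    return new_weights, dX
```

A second call with zero gradient shows the momentum term alone carrying the update forward.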
The parameters for learning associations with a simple neural network are:
• momentum constant,
• initial learning rate,
• increasing ratio, and
• decreasing ratio.
Variability
The source of variability for simple neural networks, resulting in different words being used for a
set of input patterns, may be evolution or training. In evolution, the weights connecting the input
and output units are mutated and different networks are selected based on a performance measure.
An example performance measure is expressivity, or producing many different words for the set of
input patterns. In training, the training set may be adjusted with a set of input patterns associated
with a set of output patterns. In this case, an external trainer sets the desired output patterns. It is
possible to link these two options, with the training set derived from an evolved network.
The parameter for variability using simple neural networks is:
• source of variability (evolving for expressivity or specifying the training set).
4.3.2 Recurrent Neural Networks
Recurrent neural networks are general purpose learners with the ability to learn temporal sequences,
making them ideal for computational models of language (see Figure 4.4). They have been used in
the language domain (Elman, 1991) and in language evolution (Batali, 1998). In the evolution of
languages, recurrent neural networks have been used to investigate compositionality and how
language can evolve to become more learnable (Tonkes et al., 2000). Given appropriate input and
output representations, compositional and generalisable languages can be formed.
Figure 4.4 Recurrent neural network
In a simple recurrent neural network, the hidden unit activations are copied to the
context units for the next time step (adapted from Figure 2, p. 184 of Elman, 1990).
In this figure, the rectangles refer to sets of units, the full lines indicate that the
units are fully connected, and the dotted line indicates that the activations are
copied one-for-one from the hidden to the context units.
Parameters that need to be considered are how many hidden units will be used and the units’
transfer function. For the recurrent neural networks in this thesis the hidden unit activations are
copied to the context units for the next time step. The Log-Sigmoid transfer function is used.
The parameter for recurrent neural network design is:
• hidden units.
Word Production
As for simple neural networks, the input unit activations for word production networks correspond
to the representations underlying the concepts and the output units correspond to the words. The
word may be the output of the network at the final time step, or the output of the network over a
sequence of time steps. Also, the word may be the raw output of activations, or may be a cleaned
version where one or more of the most activated outputs are set to 1 while the others are set to 0.
Concept Comprehension
As for simple neural networks, there are two alternatives for concept comprehension used in the
studies. The parameter for concept comprehension using recurrent neural networks is:
• networks (the same network used for production and comprehension, or a separate
network for production and comprehension).
Learning Associations
The network’s weights are updated by training the network on a set of input and output patterns.
The options for updating the weights of the neural network include various evolutionary strategies
and back propagation.
Evolution strategies are a type of evolutionary computation that were initially used for automatic
design and analysis (see Beyer & Schwefel, 2002 for an overview, including a history, the basic
algorithm, and variations). There are a number of ways to control the mutation rate in the evolution
strategy, including a mutation rate rule called the 1/5 success rule (Beyer & Schwefel, 2002). With
the 1/5 success rule, the mutation is tuned so that the success rate is 1/5, which from
experimentation has been found to be the optimal success probability. The 1/5 success rule
increases the mutation rate when success is lower than 1/5, and decreases the mutation rate when
success is higher than 1/5. A variation of the 1/5 success rule is the reverse 1/5 success rule that
decreases the mutation rate when success is lower than 1/5 and increases the mutation rate when
success is higher than 1/5. Another useful mutation rate rule is called self-adaptation (Beyer &
Schwefel, 2002). For self-adaptation, the mutation strength is mutated, with either a single mutation
operator or a vector of mutation operators referring to each weight of the neural network. For all of
the evolution strategies the weights are perturbed using the mutation rate multiplied by a random
amount determined using a normal distribution.
Back propagation is one of the most popular methods used to train neural networks (Rumelhart,
Widrow, & Lehr, 1994). Back propagation through time (BPTT) is a variation of back propagation
which is more suitable for temporal information (Tonkes, 2001). Using back propagation, the
network’s weights are updated corresponding to the contribution of each weight to the final error of
the output. For networks using BPTT, the contribution of each weight is calculated over multiple
time steps.
The parameter for learning associations using recurrent neural networks is:
• weight setting mechanism (a constant mutation rate, the 1/5 success mutation rate rule,
the reverse 1/5 success mutation rate rule, self-adaptation, or back propagation).
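A sketch of the 1/5 success rule as described above, together with the normally distributed weight perturbation. The adjustment factor, the function names, and applying the rule per batch of trials are assumptions for illustration.

```python
import random

def update_mutation_rate(rate, successes, trials, factor=1.22, reverse=False):
    # 1/5 success rule as described in the text: raise the mutation
    # rate when the success fraction is below 1/5, lower it when above;
    # the reverse rule swaps the two cases. `factor` is illustrative.
    success_rate = successes / trials
    if success_rate == 0.2:
        return rate
    increase = success_rate < 0.2
    if reverse:
        increase = not increase
    return rate * factor if increase else rate / factor

def mutate_weights(weights, rate):
    # Perturb each weight by the mutation rate times N(0, 1) noise.
    return [w + rate * random.gauss(0.0, 1.0) for w in weights]

rate = update_mutation_rate(0.1, successes=1, trials=10)  # rate is raised
```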
Variability
As for simple neural networks, a separate network may evolve the language, or a training set may
be created for the networks. The parameter for variability using recurrent neural networks is:
• source of variability (evolving for expressivity or specifying the training set).
4.3.3 Lexicon Table
A lexicon table is a symbolic representation that stores associations between concepts and words.
Word Production
The agents can either choose the word most associated with the category (the normal strategy), or
the word most likely to be understood as the category (the introspective obverter strategy; Smith,
2003). Choosing the word most likely to be understood is similar to the strategy for simple or
recurrent neural networks where a single network is used for production and comprehension.
The parameter for word production using lexicon tables is:
• strategy for choosing words (normal or introspective obverter).
Concept Comprehension
Words are comprehended as the concept that is most associated with the word.
Learning Associations
Different strategies can be used to associate categories with words. Steels (1999) assigns a score to
each concept-word pair that is increased when the pair is used successfully, and decreased when the
concept is used with another word, the word is used with another concept, or the concept-word pair
is used unsuccessfully. Smith (2003) does not differentiate between successful and unsuccessful
games. Instead, the usage of a concept-word pair is increased every time that pair is used in a
speaking or hearing game. A confidence probability is assigned to each concept-word pair: the
proportion of the word's total uses in which it has been associated with that concept.
In the score model, a game is either successful or unsuccessful. If a game is successful, the
association or score of the word-category pair used is increased by a small value (e.g. 0.1) and the
scores of other words associated with that category, and other categories associated with that word
are decreased by a small value (e.g. 0.1). If a game is unsuccessful, the score of the word-category
pair is decreased. The scores are set between a lower and upper limit (e.g. 0.0 and 1.0).
In the usage and confidence probability model, the usage of a word-category pair is increased
every time the word-category pair is used in a speaking or hearing game. The confidence
probability is the proportion of times the word has been used for that category compared to the
times it has been used for other categories.
The parameter for learning associations using lexicon tables is:
• strategy for associating (score or usage and confidence probability).
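The two association strategies can be sketched as follows. This is a minimal sketch: the dictionary layout, the example word strings, and the delta value are assumptions for illustration.

```python
def update_scores(lexicon, concept, word, success, delta=0.1, lo=0.0, hi=1.0):
    # Score model (after Steels, 1999): on success, reward the used
    # pair and punish pairs sharing its concept or word; on failure,
    # punish the used pair. `lexicon` maps (concept, word) -> score.
    key = (concept, word)
    if success:
        lexicon[key] = min(hi, lexicon.get(key, 0.0) + delta)
        for (c, w), s in list(lexicon.items()):
            if (c, w) != key and (c == concept or w == word):
                lexicon[(c, w)] = max(lo, s - delta)
    else:
        lexicon[key] = max(lo, lexicon.get(key, 0.0) - delta)

def confidence(usage, concept, word):
    # Usage model (after Smith, 2003): the fraction of the word's
    # total uses that were for this concept.
    total = sum(n for (c, w), n in usage.items() if w == word)
    return usage.get((concept, word), 0) / total if total else 0.0

lex = {("kitchen", "bofu"): 0.5, ("kitchen", "zagi"): 0.5}
update_scores(lex, "kitchen", "bofu", success=True)
conf = confidence({("kitchen", "bofu"): 3, ("hall", "bofu"): 1},
                  "kitchen", "bofu")
```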
Variability
When lexicon tables are used, words are usually invented when they are needed, with no connection
to other words that are already associated with concepts. In an interaction, the agent must decide
whether it already has an appropriate word, or whether a new word should be invented. One option
is to use a word invention rate with a threshold for the association between the concept and word,
under which a word may be invented. When the threshold is 0, a word may only be invented if there
are no associations between words and the chosen concept. A word absorption rate may also be
used by the hearer to determine whether to add words that they hear to their lexicon.
A second option is to invent words probabilistically, for example with probability, p, as follows:
p = 1 − e^{−(1 − S)T} Equation 4.3
where S is the association between the concept and word, and T is the temperature, which sets
the association value accepted by an agent. Varying the temperature alters the rate of word
invention, where a higher temperature increases the probability of inventing a new word.
The parameters for variability using lexicon tables are:
• strategy for word invention (threshold or temperature),
• word absorption rate,
• word invention rate,
• threshold, and
• temperature.
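Equation 4.3 can be sketched as follows; the function names are illustrative assumptions.

```python
import math, random

def invention_probability(association, temperature):
    # Equation 4.3: p = 1 - exp(-(1 - S) * T). A weak association or
    # a high temperature pushes p towards 1 (more invention); a
    # perfect association (S = 1) gives p = 0.
    return 1.0 - math.exp(-(1.0 - association) * temperature)

def maybe_invent(association, temperature):
    # Invent a new word with probability p.
    return random.random() < invention_probability(association, temperature)

p_weak = invention_probability(association=0.1, temperature=2.0)
p_strong = invention_probability(association=0.9, temperature=2.0)
```

As the text notes, raising the temperature raises the invention probability for any association value below 1.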
4.3.4 Distributed Lexicon Table
In Study 1 and Study 2, the associations between concept elements and words for locations are
stored in distributed lexicon tables, a method designed for these studies and inspired by the
distributed nature of inputs to neural networks combined with the lexicon table structure (see Figure
4.5). Forming concepts with a distributed lexicon table is quite different from most other
conceptualisation methods in that it is directly linked to the language formation, allowing concepts
and words to have boundaries that are never explicitly defined. In many language game studies,
concepts are formed using discrimination trees (Bodik & Takac, 2003; Smith, 2001; Steels, 1997a),
which allow the agents to form concepts with well-defined boundaries. The discrete concepts,
formed through a discrimination tree or similar categorisation method, may then be associated with
words through a lexicon table, as described in the previous section.
Figure 4.5 Distributed lexicon table
A distributed lexicon table is shown, which stores the associations between concept
elements and words. Associations are stored for each concept element – word pair.
A distributed lexicon table differs from a standard lexicon table in that concept
elements are associated with words rather than discrete concepts. The resulting
concept is distributed across the concept elements, which have defined
relationships with each other.
With a distributed lexicon table, concept formation and association with words occur
concurrently through increasing associations between concept elements and words. The concepts can be
made more or less specific with more or fewer elements used to cover the space of the underlying
representation for the concepts (for example, locations in the world, a set of distances, or a set of
directions). An association value of 0.0 or greater is stored for each concept element–word pair.
Concept elements are related to each other; for example, if they are locations, they are
related by how far apart they are in the world.
Word Production
Different strategies may be used for choosing which word should be produced for the current
concept element. With the most associated strategy for choosing words, the word chosen is the
word that has most often been associated with the current concept element. Another strategy, the
most informative strategy, was developed in which the most information is transferred about the
current concept element, based on mutual information (MacKay, 2003). Agents choose the word
that will transfer the maximum amount of information (see Figure 4.6) about the current concept
element. With the most informative strategy, a word that has not been used often, but has only been
used for a particular concept element may be chosen over a word that has been used often for many
concept elements.
Figure 4.6 Information value
The information value of word, w, at location, p, is Awp (the association between
the word, w, and the location, p), compared to the total usage of the word.
The most informative strategy is described in this section for concepts of locations in the world.
The implementation of the most informative strategy is to calculate the information value, Iwp, for
the word, w, in location, p, as follows:
I_wp = A_wp / Σ_{m=1}^{M} A_wm Equation 4.4
where Awp is the association between the word, w, and the location, p, and M is the total number
of locations. For each location the word with the highest information value is chosen.
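The most informative strategy of Equation 4.4 can be sketched as follows; the dictionary layout and the example values are assumptions for illustration.

```python
def information_value(assoc, word, location):
    # Equation 4.4: I_wp = A_wp / sum_m A_wm, the fraction of the
    # word's total usage that falls on this location.
    total = sum(assoc[word].values())
    return assoc[word].get(location, 0.0) / total if total else 0.0

def most_informative_word(assoc, location):
    # Choose the word whose usage is most concentrated at the location.
    return max(assoc, key=lambda w: information_value(assoc, w, location))

# A rarely used but location-specific word beats a common, diffuse one:
assoc = {
    "bofu": {0: 9.0, 1: 8.0, 2: 7.0},  # used often, everywhere
    "zagi": {1: 2.0},                  # used rarely, only at location 1
}
best = most_informative_word(assoc, 1)
```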
In addition to considering the current location to determine which word should be used, the
agent can consider the neighbourhood of locations. When the agents consider the neighbourhood,
the association for the word is summed over the neighbourhood of locations rather than over a
single location (see Figure 4.7).
Figure 4.7 Neighbourhood information value
The neighbourhood information value of a word, w, at location, p, is Awp(neighbourhood)
(the association between the word, w, and all locations in the neighbourhood of
location, p), compared to the total usage of the word.
The neighbourhood association, Awp(neighbourhood) for the word, w, in location, p, is calculated as
follows:
A_wp(neighbourhood) = Σ_{n=1}^{N} A_wn Equation 4.5
where Awn, the association between the word, w, and the location, n, is summed over all N
locations in the neighbourhood of location, p. The neighbourhood most informative strategy
calculates the neighbourhood information value, Iwp, for the word, w, in location, p, as follows:
I_wp(neighbourhood) = A_wp(neighbourhood) / Σ_{m=1}^{M} A_wm Equation 4.6
where Awp(neighbourhood) is the neighbourhood association calculated in Equation 4.5, Awm is the
association between the word, w, and the location, m, and M is the total number of locations. When
the neighbourhood is used, a stable language is reached more quickly than when only the current
concept is used.
The relative neighbourhood information value normalises the association strength of a location
in the neighbourhood by the distance from the location of interest (see Figure 4.8).
Figure 4.8 Relative neighbourhood information value
The relative neighbourhood information value of a word, w, at location, p, is
Awp(relativeNeighbourhood) (the relative association of the word within a neighbourhood,
D) compared to the total usage of the word.
The relative neighbourhood association, Awp(relativeNeighbourhood), for the word, w, in location, p, is
calculated as follows:
A_wp(relativeNeighbourhood) = Σ_{n=1}^{N} A_wn · (D − d_np) / D Equation 4.7
where Awn, the association between the word, w, and the location, n, is normalised by the
distance dnp from location, p, and summed over all N locations within a neighbourhood, D, of
location, p. The relative neighbourhood information value, Iwp(relativeNeighbourhood), for the word, w, in
location, p, is the relative association of the word within a neighbourhood, D, compared to the total
usage of the word, calculated as follows:
I_wp(relativeNeighbourhood) = A_wp(relativeNeighbourhood) / Σ_{m=1}^{M} A_wm Equation 4.8
where Awp(relativeNeighbourhood) is the relative neighbourhood association calculated in Equation 4.7,
Awm is the association between the word, w, and an experience, m, and M is the total number of
experiences in the robot’s experience map.
The parameters for word production using distributed lexicon tables are:
• strategy for choosing words (most associated or most informative strategy with a single
concept element, neighbourhood, or relative neighbourhood) and
• neighbourhood size.
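The neighbourhood variants (Equations 4.5 to 4.8) can be sketched as follows. This is a minimal sketch: the data layouts are assumptions, and the neighbourhood here is taken to include the current location.

```python
def neighbourhood_info(assoc, word, neighbours):
    # Equations 4.5 and 4.6: sum the word's association over the
    # neighbourhood, then divide by the word's total usage.
    total = sum(assoc[word].values())
    a_nbhd = sum(assoc[word].get(n, 0.0) for n in neighbours)
    return a_nbhd / total if total else 0.0

def relative_neighbourhood_info(assoc, word, distances, D):
    # Equations 4.7 and 4.8: each neighbour's association is weighted
    # by (D - d_np) / D, so closer neighbours count for more.
    # `distances` maps each neighbour to its distance from location p.
    total = sum(assoc[word].values())
    a_rel = sum(assoc[word].get(n, 0.0) * (D - d) / D
                for n, d in distances.items())
    return a_rel / total if total else 0.0

assoc = {"bofu": {0: 1.0, 1: 1.0, 2: 2.0}}
i_nbhd = neighbourhood_info(assoc, "bofu", [0, 1, 2])
i_rel = relative_neighbourhood_info(assoc, "bofu",
                                    {0: 1.0, 1: 0.0, 2: 1.0}, D=2.0)
```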
Concept Comprehension
For distributed lexicon tables, words are associated with multiple concept elements. A template can
be created based on these associations, resulting in a representation of the concept that is associated
with each word. Alternatively, the single concept element that is most associated with the word can
be chosen as representative of the word.
For concepts of locations in the world, toponyms are associated with multiple locations. A
template can be created based on the associations between a toponym and a set of locations in the
world. In some cases, a toponym should be interpreted as a single location. The location is the one
that is most representative of the toponym, calculated by determining the information value
provided by that toponym at each location in the world.
Learning Associations
The features to consider in updating the distributed lexicon include when the lexicon will be
updated and how associations between concepts and words will be strengthened and weakened. The
lexicon may be updated whenever a game is played, whether the agent is the speaker or the hearer.
Another option is for only the hearer to update their lexicon, as the speaker already uses the current
word in the current context.
The association between a concept element and a word is strengthened when they are used
together. If forgetting is used, the associations between the current word and other locations and the
current location and other words are also updated. The current association is strengthened while all
other associations with the current location and word are weakened. Forgetting may increase the
rate at which words are lost in the language, unless countered by another feature.
The parameters for learning associations using distributed lexicon tables include:
• forgetting and
• updating (only hearer or both agents update their lexicon).
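The update step with forgetting can be sketched as follows; the delta and forgetting values, and the dictionary layout, are illustrative assumptions.

```python
def update_associations(lex, element, word, delta=0.1, forget=0.01):
    # Strengthen the concept element-word pair just used; with
    # forgetting, weaken the competing associations that share the
    # same element or the same word. Associations stay >= 0.
    key = (element, word)
    lex[key] = lex.get(key, 0.0) + delta
    if forget:
        for (e, w), a in list(lex.items()):
            if (e, w) != key and (e == element or w == word):
                lex[(e, w)] = max(0.0, a - forget)

lex = {("loc3", "bofu"): 0.5, ("loc3", "zagi"): 0.2, ("loc7", "bofu"): 0.3}
update_associations(lex, "loc3", "bofu")
```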
Variability
As for lexicon tables, words are invented as they are needed.
The parameters for variability using distributed lexicon tables are:
• strategy for word invention (threshold or temperature),
• word absorption rate,
• word invention rate,
• threshold, and
• temperature.
4.4 Population Dynamics
The nature of the population dynamics may affect the languages that result from the interactions. A
basic framework for population dynamics is iterated learning, where agents learn language based on
the utterances of other agents in the population, as described in section 2.4. The Iterated Learning
Model provides a framework for investigating the vertical transmission of language through
generations of agents and the horizontal transmission of language among peers. Both vertical and
horizontal language transmission are important to language evolution with a different emphasis on
the interactions between agents. Standard implementations for population dynamics include the
Iterated Learning Model with two agents per generation (Kirby & Hurford, 2002) and negotiation
between agents in a single generation (Batali, 1998).
For languages to spread throughout the world, many games need to be played. For larger world
sizes and more agents, more games are needed to form a coherent language. The number of
generations may also affect the resulting languages, with the number of games per generation
affecting how well new agents learn the existing language.
At the end of each generation, which may occur after a set number of games, older agents are
removed from the population and new agents enter the population. New agents may have an initial
learning period in which they only take part in interactions as the hearer. The length of the learning
period influences how well the new agent learns the language before using the language.
The parameters for population dynamics are:
• generations,
• agents,
• interactions per generation, and
• initial learning period.
4.5 Environment
The environment used for the agents to play location language games affects the representations
underlying the concepts used and the exploration required by the agents to obtain these
representations. Three environments were used. The simplest environment, the grid world, was used
to investigate the design of the language games and agents. The next environment, the simulation
world, was used to investigate the language games and agents in simulated robots. The final
environment, the robots in the real world, was used to test the language games implementation with
real world inaccuracies.
The parameter for environment is:
• world (grid, simulation, or real).
4.5.1 Grid World
The first environment is a grid world (see Figure 4.9). The grid size may be altered and obstacles
may be placed in the world to represent walls and other features of the environment. The grid world
agents may occupy any square in the world that does not have an obstacle in it. The grid world used
is based on the worlds used in Steels’ (1995) and Bodik and Takac’s (2003) studies.
Figure 4.9 Grid world
The grid world is a grid of squares that may have obstacles. The world shown here
is a 10 × 10 grid with no obstacles. Agents may occupy any square in the world
that is not an obstacle, including squares occupied by other agents.
The parameters for the grid world are:
• size and
• obstacles.
4.5.2 Simulation World
A simulation world was built to mirror the real world, with images from the real world used in
constructing the views of the robot (Moylan, 2003, with additional work by Mark Wakabayashi).
The simulation world includes a room with several desks (see Figure 4.10).
Figure 4.10 Simulation world with path of robot
The simulation world of the robot, with the black lines indicating walls and the
black octagons desks, showing the path of the robot in a typical simulation run. The
robot’s behaviour is set to wall-following, where the robot chooses to follow either
the left or right wall. The room is the one on the right side of Figure 3.3b.
For the simulated robots to play language games with each other, they must first have
representations of the world with local views, pose cell representations, and an experience map. To
gain the local views, pose cell representations and experiences, the robots must explore the world.
Exploration is currently performed by left or right wall following, and is carried out independently
of other robots in the world. For the studies in the simulation world, the robots used a single
forward facing camera. The simulation world enables simulated robots to pass messages only to
other robots within a set distance of their current locations, allowing the hearing distance to be
explicitly set.
4.5.3 Real World
The robots used in the real world are Pioneer 3 DXs. The real world environment is the fifth floor of
the Axon building at The University of Queensland with halls and open plan offices. Experiments
were confined to the room shown in Figure 4.11. The robots used in the final experiments obtained
visual input from an omni-directional camera. An omni-directional camera means that a location
looks the same regardless of the direction that the robot is facing, while for a forward facing camera
opposite directions look completely different. The result is cleaner maps created using omni-directional
cameras, as agents are able to recognise locations more readily, particularly in longer
loops of the environment. As in the simulation world, the robots must first explore the world to
build up their representations.
Figure 4.11 Map of the real world
The robot's world of an open plan office. The real world experiments were
confined to the room to the bottom left of Figure 3.3a. A more detailed layout of
the obstacles in the room and the approximate path of the robots are shown.
Practical issues in implementing the location language games on real robots include the speakers,
whether error detection will be used, and the robots’ batteries. A tone
generator has been developed for the robots to use. The ‘Hello’ and ‘Hear’ signals, as well as the
syllables of the words, are converted to DTMF tones produced by the robots. Language games take
place when the robots are literally within hearing distance of each other. The volume of the
speakers influences the actual hearing distance of the robots. Obstacles in the room also have an
effect on the actual hearing distance, with robots more likely to hear each other when they are
within line of sight.
Error detection may be implemented for communication between the robots. The most minimal
form of error detection is to check that the utterances match the expected format given by the
grammar of the language and the structure of the language games (see Figure 4.12). The additional
error detection of a checksum can be included for the produced words. The syllables of the
utterance are added together and form the checksum which can be checked by the hearer. The
words are only accepted if the checksum is correct.
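A sketch of the checksum check described above. The syllable encoding as small integers and the modulus are assumptions; the thesis only specifies that the syllables are summed to form the checksum.

```python
def checksum(syllables):
    # Sum the syllables (encoded here as small integers, e.g. DTMF
    # digit values) to form the checksum.
    return sum(syllables) % 16

def encode_word(syllables):
    # Speaker side: append the checksum to the outgoing word.
    return list(syllables) + [checksum(syllables)]

def accept_word(message):
    # Hearer side: accept the word only if the checksum matches.
    *syllables, check = message
    return check == checksum(syllables)

msg = encode_word([3, 1, 4])
ok = accept_word(msg)               # intact message is accepted
bad = accept_word([3, 2, 4, 8])     # corrupted syllable is rejected
```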
The robots’ batteries last for about two hours before the robots need to recharge. The robots’
state at the end of each session can be saved, including the pose cell representation, the
experience map, and the lexicon, and reloaded for the next two-hour session. Battery life therefore
does not limit the real world experiments.
Figure 4.12 Language game utterances
A condensed version of the location language game structure (Figure 4.1), with the
sequence of utterances showing the basic grammar of the language game: The
speaker says ‘Hello?’, the hearer responds with ‘Hear’, the speaker sends the word
chosen, and the hearer responds with ‘Ok’. Messages that do not follow this format
are ignored.
The parameter for the real world is:
• error detection (minimal or checksum).
4.6 Performance Measures
Performance measures may be recorded by each agent following interactions. In deciding which
measures are necessary to monitor the agents’ languages, the elements that form a good language
for locations need to be considered. In general, a language should be easy to learn, with the agents
in a population forming consistent conceptualisations and labels. A language should also be
expressive: all agents will trivially agree on every word if only one word is used, but a single-word
language is not useful. Quantitative measures are needed to determine how good a language is and
to aid in the design of a language game.
The measures used throughout the studies include:
• coherence,
• specificity,
• language size,
• word coverage,
• language layout,
• word locations,
• most information templates, and
• toponym value.
4.6.1 Coherence
As defined by de Jong, “the coherence indicates to what extent agents use the same signal for a
certain concept” (de Jong, 1998, p. 31). In this thesis, coherence has been altered from de Jong’s
definition so that coherence ranges from 0 to 1 rather than from 1/n to 1 where n is the number of
agents. For each concept, c, the coherence, C, is calculated as follows:
C_c = (max_w(N_wc) − 1) / (n − 1) Equation 4.9
where N_wc is the number of times a word, w, has been used for that concept, c, and n is the number
of agents. The population coherence is the average of the concept values. When all agents in the
population agree on the same word for every concept, the coherence is 1. When all agents in the
population disagree on the word for every concept, the coherence is 0.
The coherence may be calculated when the agents have identical concept representations, as in
the grid world. In the simulation and real worlds, robots create their own concept representations,
which do not directly match. For small environments, it is possible to translate the robots’ concept
representations to obtain an imperfect match for the concepts, allowing an approximate coherence
value to be calculated for the agents in each world.
Coherence can provide an indication of how well the agents agree with each other on words for
each possible concept.
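Coherence for a single concept (Equation 4.9) can be sketched as follows; the dictionary of per-word counts is an assumed layout.

```python
def coherence(word_counts, n_agents):
    # Equation 4.9 for one concept: C = (max_w N_wc - 1) / (n - 1),
    # where N_wc is the number of agents using word w for the concept.
    # Full agreement gives 1.0; full disagreement gives 0.0.
    return (max(word_counts.values()) - 1) / (n_agents - 1)

# Four agents: three use "bofu", one uses "zagi" for the same concept.
c = coherence({"bofu": 3, "zagi": 1}, n_agents=4)
```

The population coherence is then the average of this value over all concepts.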
4.6.2 Specificity
Specificity is a measure of how many concepts each word is used for, which indicates the
descriptiveness of each word. For the case where each concept is distinguished by a unique symbol,
the specificity is 1. For the case when every concept is associated with a single word, the specificity
is 0.
The specificity of a language is calculated from a graph with concepts as nodes and edges
indicating that the same word is used for the linked concepts. Specificity, σ, is calculated from the
proportion of edges that are present in the graph (de Jong, 1998, p. 32):
σ = 1 − 2v / (n² − n) Equation 4.10
where v is the number of edges and n is the number of nodes in the graph. Alternatively,
specificity can be described as follows (de Jong, 1998, p. 32):
σ = 1 − (Σ_k f_k² − n) / (n² − n) Equation 4.11
where f_k is the frequency of word k, that is, the number of concepts for which that word is used.
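Equation 4.10 can be sketched directly from a concept-to-word mapping. This is a minimal sketch; the mapping layout is an assumption.

```python
def specificity(word_of):
    # Equation 4.10: sigma = 1 - 2v / (n^2 - n), where v counts pairs
    # of concepts linked by sharing a word and n is the number of
    # concepts. `word_of` maps each concept to its word.
    concepts = list(word_of)
    n = len(concepts)
    v = sum(1 for i in range(n) for j in range(i + 1, n)
            if word_of[concepts[i]] == word_of[concepts[j]])
    return 1.0 - 2.0 * v / (n * n - n)

s_unique = specificity({1: "a", 2: "b", 3: "c"})  # every concept distinct
s_single = specificity({1: "a", 2: "a", 3: "a"})  # one word for everything
```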
4.6.3 Language Size
With the language size, two values can be considered. The first value is the average size of the
agent’s lexicon, including all words currently in the agent’s lexicon. The words in the agent’s
lexicon include words that have either been heard or invented by the agent. The second value is the
average number of words used by the agents for the set of concepts. The number of words used is
less than or equal to the total number of words in the lexicon. When generations of agents are used
for the population, the language can be considered stable when the number of words used by the
agents equals the number of words in the agents’ lexicons.
4.6.4 Word Coverage
The word coverage of the language considers how words are spread through the underlying concept
representation and how many concepts each word is used for. In an expressive language each word
is used for a similar number of concepts. In an impoverished language a small number of words
may be used for most of the concepts while others are used for very few concepts. In impoverished
languages, words tend to disappear from the lexicon, resulting in languages with very few words. It
is generally desirable for languages to use words for similar numbers of concepts.
4.6.5 Language Layout
The language layout displays how the language covers the space. For a location language, each
toponym is given a different colour, and the areas in the world in which each toponym is used are
shown in the colour of the toponym. A comparison of the language layout between agents can show
if the language is shared between the agents, as an alternate measure of coherence.
4.6.6 Word Locations
Word locations are an extension of the language layout in which the best location for producing
each word is shown. The location for each word is the location at which the word is the most
informative, and the difference in information value between the most and the next most
informative word is greatest.
4.6.7 Most Information Templates
Most information templates are constructed similarly to the language layout, with each word shown
individually. They consider not only the locations for which the word is the most informative, but
also those for which it is more informative than other words. Most information templates show
where the word is in the top five most informative words, indicating the general area in which that
word will be understood.
Chapter 4 A Location Language Game
54
4.6.8 Toponym Value
The toponym value is the information value of the word–location combination for the word used at
the interaction location. The value of the toponym at the interaction location is an indication of how
appropriate that toponym is for the current location.
4.7 Summary
This chapter has presented the methodology used in the RatChat project, specifically for the studies
described in the following three chapters. The location language game structure was described, with
the concept representations, word representations, lexicons, population dynamics, environment, and
performance measures. The parameters for each feature of a location language game are presented
in Table 4.1.
Table 4.1 Parameters for a location language game
Feature                      Parameters
Location language game       Game; Hearing distance
Concept representations      Concept type; Concept representation
Word representations         Word representation
Lexicon                      Technique
Simple neural networks       Networks; Initial learning rate; Decreasing ratio; Momentum constant; Increasing ratio; Source of variability
Recurrent neural networks    Hidden units; Weight setting mechanism; Networks; Source of variability
Lexicon tables               Strategy for choosing words; Strategy for word invention; Word invention rate; Temperature; Strategy for associating; Word absorption rate; Threshold
Distributed lexicon tables   Strategy for choosing words; Forgetting; Strategy for word invention; Word invention rate; Temperature; Neighbourhood size; Updating; Word absorption rate; Threshold
Population dynamics          Generations; Interactions per generation; Agents; Initial learning period
Environment                  World; Grid world: Size, Obstacles; Real world: Error detection
The following three chapters describe the major studies of this thesis: Pilot Study: Methods and
Representations, Study 1: A Toponymic Language Game, and Study 2: A Generative Spatial
Language Game.
Chapter 5 Experimental Design
Someone had drawn a tree. … It was simple because something complex
had been rolled up small; as if someone had drawn trees, and started with
the normal green cloud on a stick, and refined it, and refined it some more,
and looked for those little twists in a line that said tree and refined those
until there was just one line that said TREE.
(Pratchett, 1998, p.345)
Key questions regarding the meaningful use of spatial languages are whether they can be formed,
what such languages are like, and how they can be learned. This chapter investigates concept
representations and methods for associating concepts with words.
In the RatChat project, the meaning representations are obtained from mobile robots exploring
their world, with possibilities including what they see, and the equivalent of a cognitive map of the
world, built from exploration. Names for places in the world are the most obvious concepts that
may be obtained from a map of the world. The series of studies described in this chapter
investigated methods for associating concepts and words, and a variety of representations that could
be used to form the spatial concepts used in a location language game.
The overall goal of the pilot studies was to determine the features necessary for agents to form a
spatial language. The specific features included the structure of the language game, the concept
representations, the word representations, the lexicon, the population dynamics, and the
environment (as outlined in the previous chapter). Two pilot studies investigated the features:
• Pilot Study 1: Methods – Recurrent Neural Networks and Lexicon Tables
• Pilot Study 2: Representations – Pose Cells, Vision, and Experiences
Following the description of the pilot studies is a discussion of the implications for this thesis.¹
5.1 Pilot Study 1: Methods – Recurrent Neural Networks and Lexicon Tables
In a location language game (described in Chapter 4: A Location Language Game), the lexicon
associates concepts with words and is used by the speaker to produce a word for the chosen topic
¹ This chapter describes pilot studies that set up the later work and are included for completeness. The reader
interested in studies with more significant outcomes should see Chapters 6 and 7.
and by the hearer to comprehend the word used by the speaker. The first pilot study investigated
two techniques for the lexicon to associate concepts with words:
• recurrent neural networks and
• lexicon tables.
The aim of Pilot Study 1 was to determine whether recurrent neural networks and lexicon tables
were appropriate for use in a location language game and to determine how these techniques could
be used. Following the description of the studies is a general discussion of the techniques for
lexicons.
5.1.1 Pilot Study 1A: Recurrent Neural Networks
For a recurrent neural network lexicon, the implementation choices to be made were the number of
networks, the weight setting mechanisms, and the source of variability (for more detail refer to
section 4.3.2). The other choices relevant to the lexicon were the concept representations and the
word representations. In Pilot Study 1A: Recurrent Neural Networks, the source of variability was
an evolved network. Following the evolution of a language, two learning networks were trained on
the language (one for production, the other for comprehension). The concept representations were
arrays of units. Due to the nature of the study, the actual concept types were not relevant, though
this concept representation could be interpreted as either specific locations or particular directions
or distances. The word representations were sets of activations comprising three syllables with a
number of units active for each syllable. The words were set to be multi-syllabic to allow for the
possibility of compositional languages. This pilot study comprised two sub-studies that investigated
the remaining choices, with various weight setting mechanisms, concept representations, and word
representations. The study was divided into:
• weight setting mechanisms and
• concept and word representation.
The aim of Pilot Study 1A: Recurrent Neural Networks was to investigate how an expressive
language can be evolved and learned using recurrent neural networks, and to investigate any
limitations on the languages produced.
Pilot Study 1Ai: Weight Setting Mechanisms
The selection strategies and weight setting mechanisms that work best for evolving expressivity are
very different from those that work best for learning in recurrent neural networks. In this study, the task of the
evolving networks was to find expressive languages, while the task of the learning networks was to
learn those expressive languages. The aim of the Weight Setting Mechanisms study was to
determine the best weight setting mechanism for each network task with respect to the ability for
evolved networks to find expressive languages and the ability of learning networks to learn
expressive languages.
The features to be defined for Pilot Study 1Ai included the concept representations, the word
representations, the number of hidden units, the connections to the context units, and the weight
setting mechanisms (see parameters in Table 5.1). The concept representations were an array of 25
units with one-hot encoding (each pattern had a single unit active, representing a single category).
The word representations were three syllables with two out of ten units active for each syllable (see
example word in Figure 5.2). The number of hidden units was set to 15. For the word production
networks, the output units as well as the hidden units were copied to the context layer for the next
time step. The eight conditions investigated were different weight setting mechanisms: Back
Propagation Through Time (BPTT) and seven evolution strategy mutation schemes: consistent
high (0.1), medium (0.01), and low (0.001) mutation rates, the 1/5 success rule, the reverse 1/5
success rule, mutating a single operator, and mutating a vector of operators (for more detail refer to
section 4.3.2). All conditions except for BPTT were tested for the evolved network providing the
source of variability (see Figure 5.1a). BPTT was not used for the evolved network as this weight
setting mechanism is not suitable for evolution. All conditions were tested for the two learning
networks providing word production and concept comprehension (see Figure 5.1b,c).
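As an illustrative sketch, the 1/5 success rule (and its reverse) adapts the mutation step size based on the recent success rate; the adaptation factor of 1.5 used here is an assumption, not the thesis's value:

```python
def adapt_sigma(sigma, successes, trials, factor=1.5, reverse=False):
    """1/5 success rule: grow the mutation step size when more than one
    fifth of recent mutations succeeded, and shrink it otherwise. The
    reverse 1/5 success rule inverts that decision."""
    grow = (successes / trials) > 0.2
    if reverse:
        grow = not grow
    return sigma * factor if grow else sigma / factor
```

Under the standard rule a high success rate widens the search, while under the reverse rule it narrows it.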
To investigate the source of variability for recurrent neural networks, networks were evolved for
5000 generations with 50 runs for the seven evolution weight setting conditions. A simple form of
evolutionary algorithm, the (1+1)-evolution strategy (Beyer & Schwefel, 2002), was used to evolve
the networks which were selected based on the expressivity of the languages produced. A measure
used for the evolved networks was expressivity or the average number of different words in the
language. Expressive languages were defined as those with the maximum possible number of words
(i.e. 25, corresponding to one word for each concept).
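The (1+1)-evolution strategy can be sketched as follows; the toy fitness function stands in for the expressivity measure used in the thesis, and all names and values here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(weights):
    # Toy stand-in for expressivity (in the thesis, the number of distinct
    # words the network produces); small weights are rewarded here simply
    # so the sketch runs end to end.
    return -float(np.sum(weights ** 2))

def one_plus_one_es(n_weights=10, sigma=0.1, generations=200):
    """(1+1)-evolution strategy (Beyer & Schwefel, 2002): one parent
    produces one mutated child per generation, and the child replaces
    the parent only if it is at least as fit."""
    parent = rng.normal(size=n_weights)
    f_parent = fitness(parent)
    for _ in range(generations):
        child = parent + sigma * rng.normal(size=n_weights)
        f_child = fitness(child)
        if f_child >= f_parent:
            parent, f_parent = child, f_child
    return parent, f_parent
```

Because a child is accepted only when at least as fit as its parent, fitness is non-decreasing over generations.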
Table 5.1 Parameters for Pilot Study 1Ai
Parameters                 Pilot Study 1Ai
Concept representation     25 units, one-hot encoding
Word representation        Activation of units: 3 syllables of 10 output units with 2 active for each syllable
Lexicon technique          Recurrent neural networks
Hidden units               15
Networks                   Production and comprehension
Weight setting mechanism   High, medium, low, 1/5 success, reverse 1/5 success, mutate single, mutate vector, BPTT
Source of variability      Evolved network
Figure 5.1 Word production and concept comprehension networks (Pilot 1Ai)
There were three types of network used in Pilot Study 1Ai: a) Word Production
(Evolving), b) Word Production (Learning), and c) Concept Comprehension
(Learning). The network structure for Word Production (Evolving) and (Learning)
are identical. The word production networks take the concept representation as
input, presented to the network at the first time step. The word produced was the
output of the network over the following three time steps, with the two most active
units set to 1. The concept comprehension network takes the word representation as
the input, presented over three time steps. The output at the final time step was the
concept comprehended.
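A minimal sketch of the word production forward pass described above (the weight shapes, the tanh/sigmoid choices, and the simplified context handling, which copies only the hidden units, are assumptions):

```python
import numpy as np

def produce_word(concept, W_in, W_ctx, W_out, k=2, n_syllables=3):
    """Present the concept at the first time step only; read one syllable
    per time step over the next three steps, setting the k most active
    output units of each syllable to 1."""
    hidden = np.zeros(W_ctx.shape[0])
    x = concept
    syllables = []
    for _ in range(n_syllables):
        hidden = np.tanh(W_in @ x + W_ctx @ hidden)    # recurrent step
        out = 1.0 / (1.0 + np.exp(-(W_out @ hidden)))  # output activations
        syl = np.zeros_like(out)
        syl[np.argsort(out)[-k:]] = 1.0                # k most active -> 1
        syllables.append(syl)
        x = np.zeros_like(concept)                     # input only at t = 1
    return np.array(syllables)
```

With a 25-unit concept, 15 hidden units, and 10 output units, the result is a 3-by-10 binary array with exactly two active units per syllable.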
Figure 5.2 Word representation (Pilot 1Ai)
Words consisted of a sequence of three syllables. Each syllable was represented by
a ten unit binary vector in which the two most active units were set to one.
Expressive languages resulted from all runs with the high consistent value of mutation and the
reverse 1/5 success rule (see Table 5.2). The time to high expressivity was much shorter for the high
mutation rate, with a minimum of 31 generations compared to 437 for the reverse 1/5 success rule.
A moderate proportion of runs resulted in expressive languages for the medium consistent value
(54%), the 1/5 success rule (84%), and mutating the operator (48% and 66%), while none of the
runs with low consistent value resulted in expressive languages.
Table 5.2 Source of variability (Pilot 1Ai)
Weight setting mechanisms: High, Medium, and Low apply a consistent mutation rate; the remaining mechanisms apply a variable mutation rate.

Measure                           High   Medium  Low   1/5 success  Reverse 1/5 success  Mutate single operator  Mutate vector of operators
% Runs with Expressive Languages  100%   54%     0%    84%          100%                 48%                     66%
Minimum Generations               31     233     N/A   398          437                  348                     265
Average Words                     25.0   24.1    15.6  24.6         25.0                 23.6                    24.1
Following the evolution of expressive languages, word production networks were trained to
produce the language and concept comprehension networks were trained to comprehend the
language. In each run, networks were trained for 5000 generations or epochs. One expressive
language was randomly chosen for each of the six conditions that produced expressive languages.
There were ten runs for each of these six languages. A measure used for word production was how
different the language produced was from the target language, or how many syllables were left for
the network to learn. A measure used for concept comprehension was the number of words out of
25 that were understood correctly.
For word production and concept comprehension, the 1/5 success rule, a low mutation rate, and
BPTT performed well (see Table 5.3 and Table 5.4). For word production, the 1/5 success rule
performed well (28.1 syllables left to learn), as did the low mutation rate (36.0 syllables left to
learn). All of the other evolutionary strategies had more than 70 syllables left to learn. The networks
using BPTT had only 15.1 syllables left to learn at the end of the training. For concept
comprehension, the low mutation rate and the 1/5 success rule performed well (17.8 and 15.7 words
understood). All concept comprehension networks using BPTT learnt to comprehend the language
correctly.
Table 5.3 Word production (Pilot 1Ai)
Weight setting mechanisms: High, Medium, and Low apply a consistent mutation rate; the 1/5 success through mutate vector mechanisms apply a variable mutation rate; BPTT is a learning mechanism.

Measure                          High   Medium  Low   1/5 success  Reverse 1/5 success  Mutate single operator  Mutate vector of operators  BPTT
% Runs with Correct Production   0%     0%      0%    0%           0%                   0%                      0%                          6.6%
Min. Generations or Epochs       N/A    N/A     N/A   N/A          N/A                  N/A                     N/A                         2707
Average Syllables Left to Learn  108.2  71.9    36.0  28.1         111.8                72.9                    89.6                        15.1
Table 5.4 Concept comprehension (Pilot 1Ai)
Weight setting mechanisms: High, Medium, and Low apply a consistent mutation rate; the 1/5 success through mutate vector mechanisms apply a variable mutation rate; BPTT is a learning mechanism.

Measure                            High  Medium  Low   1/5 success  Reverse 1/5 success  Mutate single operator  Mutate vector of operators  BPTT
% Runs with Correct Comprehension  0%    0%      0%    0%           0%                   0%                      0%                          100%
Min. Generations or Epochs         N/A   N/A     N/A   N/A          N/A                  N/A                     N/A                         357
Average Words Correct              4.7   10.2    17.8  15.7         5.5                  10.1                    7.6                         25
For evolving expressive languages, the most effective weight changing mechanisms were a high
mutation rate and the reverse 1/5 success rule. The only useful weight setting mechanism for word
production and concept comprehension was BPTT.
Pilot Study 1Aii: Concept and Word Representation
The form of the concept and word representations for the recurrent neural networks can affect the
agents’ ability to evolve and learn expressive languages. The aim of Pilot Study 1Aii was to
investigate possible forms of the concept and word representations and to determine their
effect on the agents’ ability to evolve and learn expressive languages.
Three different concept representations were investigated: a one-hot representation where a
single unit was active and two non-orthogonal representations which included a spread of activation
around the most active unit (see Figure 5.3). These representations were chosen to explore the
differences between orthogonal and non-orthogonal patterns, and to determine whether the size of
spread affected the evolved languages. Three word representations were compared in the
simulations. Each consisted of three syllables implemented as binary vectors of ten units with one,
two or five units active (see Figure 5.4). The network structure for each type of network (Word
Production (Evolving), Word Production (Learning), and Concept Comprehension (Learning)) was
the same as for Pilot Study 1Ai (see Figure 5.1). The other parameters were as for Pilot Study 1Ai
(see Table 5.5).
Figure 5.3 Concept representations (Pilot 1Aii)
Three different concept representations: The top line shows a one-hot encoding
with one out of 25 input units active. The second and third lines show non-
orthogonal representations with five and nine active units.
Figure 5.4 Word representation (Pilot 1Aii)
Words consist of a sequence of three syllables. Each syllable is represented by a
ten unit binary vector in which one, two or five most active units are set to one.
The top line shows the raw activations of the units. The second, third, and fourth
lines show syllables with the one, two, or five most active units set to 1.0.
Table 5.5 Parameters for Pilot Study 1Aii
Parameters                 Pilot Study 1Aii
Concept representation     25 units with one-hot encoding, 5-spread activation, or 9-spread activation
Word representation        Activation of units: 3 syllables of 10 output units with 1, 2, or 5 active for each syllable
Lexicon technique          Recurrent neural networks
Hidden units               15
Networks                   Production and comprehension
Weight setting mechanism   High for evolving, BPTT for learning
Source of variability      Evolved network
To investigate the different concept and word representations with the source of variability for
recurrent neural networks, networks were evolved until expressive languages were formed (one
word for each of the 25 concept patterns), or for 10,000 generations. Ten languages were evolved
for each concept and word representation combination. Languages with a single unit active for each
syllable of the word representation took a long time to evolve, with only five expressive languages
evolved within 10,000 generations. The single-unit output representation was therefore excluded
from further analysis.
The networks with their activation spread across the word representation evolved expressive
languages in less than 10,000 generations (see Table 5.6). For each concept representation
condition, the word representation with five units active took fewer generations to find an
expressive language than the word representation with two units active.
Table 5.6 Generations to expressive languages (Pilot 1Aii)
Concept representation   10 units with 2 active   10 units with 5 active
9-spread activation      5536.0                   870.6
5-spread activation      6991.4                   1702.0
One-hot encoding         1460.6                   716.5
To investigate the different concept and word representations with word production and concept
comprehension, networks were trained for 1000 epochs using the back propagation through time
algorithm (Rumelhart et al., 1994). There were five runs for each of the ten languages evolved for
each condition. Languages using a concept representation with a spread of activation were easier
for the concept comprehension networks to learn than languages using the orthogonal one-hot
representation (see Figure 5.5). In most of the languages produced, the words were very similar to
one another, with many elements shared across the words of the language.
Figure 5.5 Training networks on evolved languages (Pilot 1Aii)
Training networks on the languages evolved for a word representation of ten units
with a) two active and b) five active. The word production networks learned the
languages similarly, while the concept comprehension networks with a spread of
activation for the concept representation learned the languages more quickly than
with one hot encoding.
The language features affected by the concept and word representations were the speed of
evolution of expressive languages and how learnable the languages were for concept
comprehension. A spread of activation in the concept representation resulted in faster
comprehension learning. A spread of activation in the word representation enabled expressive
languages to be found more easily. For agents using a recurrent neural network lexicon, the
structure of the concept and word representations affected how quickly languages were evolved and
learned.
5.1.2 Pilot Study 1B: Lexicon Tables
For a lexicon table, strategies needed to be set for choosing words, associating
concepts, and inventing words (for more detail refer to section 4.3.3). Values needed to be set for
the relevant parameters of word absorption, word invention, threshold, and temperature. The other
choices relevant to the lexicon were the concept and word representations.
In Pilot Study 1B, the threshold strategy was used for word invention with a threshold of 0.0
(agents invented a word when a concept had no associated word). The concept representation was a
set of categories. The word representation was text, with an arbitrary string of syllables invented for
each new word. The agents using lexicon tables formed the language through negotiation games.
The task was a guessing game in which a category was chosen by the speaker, a word was found for
the category, and the hearer used the word to find the category. Measures included average success
and lexical coherence. In a successful game, the concept chosen by the speaker was understood by
the hearer. For average success, the success was averaged over every 25 games. Lexical coherence
represents the proportion of agents that use the same words for the same categories. The study was
divided into:
• strategies and
• word creation and absorption.
The aim of Pilot Study 1B: Lexicon Tables was to investigate how a language could be formed
and learned using lexicon tables, and to investigate any limitations of the languages produced, such
as how long agents take to form successful languages, and the coherence of the resulting languages.
Specifically, the strategies, word creation rates, and word absorption rates were compared.
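The lexical coherence measure used in these studies might be computed as sketched below (the dict-based lexicons and the use of the modal word per category are assumptions about the measure's exact form):

```python
from collections import Counter

def lexical_coherence(lexicons):
    """Average, over categories, of the proportion of agents whose word
    for a category matches the population's most common word for it.
    `lexicons` is a list of {category: word} dicts, one per agent."""
    categories = lexicons[0].keys()
    total = 0.0
    for cat in categories:
        words = [lex[cat] for lex in lexicons]
        modal = Counter(words).most_common(1)[0][1]  # count of modal word
        total += modal / len(lexicons)
    return total / len(categories)
```

For example, if two of three agents share a word for the only category, coherence is 2/3; if every agent agrees on every category, coherence is 1.0.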
Pilot Study 1Bi: Strategies
Pilot Study 1Bi tested the different types of strategies that can be used to construct the lexicon table.
Strategies were compared on how many interactions agents needed to reach high levels of
success, and the lexical coherence of agents at high levels of success. The study aimed to find the
most appropriate strategies for a language agent using a lexicon table to associate concepts and
words and to produce words. The speed to success and desired features of the language, such as
whether synonyms were included in the language, were considered.
Four conditions were tested with two strategies for the associations between the concepts and
the words and two strategies for word production. The strategies for associations were the score
model and the usage and confidence probability model. The strategies for word production were the
normal strategy and the introspective obverter strategy. A summary of the parameters used is given
in Table 5.7.
Table 5.7 Parameters for Pilot Study 1Bi
Parameters                   Pilot Study 1Bi
Concept representation       5 categories
Word representation          Text (syllables)
Lexicon technique            Lexicon table
Strategy for choosing words  Normal or introspective obverter
Strategy for associating     Score or usage and confidence probability
Strategy for word invention  Threshold
Word absorption rate         1.0
Word invention rate          1.0
Threshold                    0.0
Generations                  1
Agents                       5
Games                        1000
Five agents and five concepts were used. The word invention and absorption rates were both set
to 1. If the speaker did not have a word for the chosen concept, a new word was always invented. If
the hearer had not heard a word before then that word was always added to their lexicon. The
population continued playing language games until 90% average success was obtained. Fifty
populations for each condition evolved languages to 90% average success.
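A single guessing-game interaction under these settings could be sketched as follows (the dict-based one-word-per-category lexicons and the corrective feedback that lets the hearer absorb the word together with its category are simplifying assumptions, not the thesis's implementation):

```python
import random

def play_game(speaker, hearer, categories, invent_rate=1.0,
              absorb_rate=1.0, rng=None):
    """One guessing game: the speaker names a random topic and the hearer
    guesses the category from the word. Missing words may be invented by
    the speaker, and unknown words absorbed by the hearer, each with the
    given probability. Returns True when the hearer guesses correctly."""
    rng = rng or random.Random()
    topic = rng.choice(categories)
    word = speaker.get(topic)
    if word is None:
        if rng.random() >= invent_rate:
            return False
        word = f"w{topic}"        # arbitrary invented label (hypothetical)
        speaker[topic] = word
    guess = next((c for c, w in hearer.items() if w == word), None)
    if guess is None and rng.random() < absorb_rate:
        hearer[topic] = word      # absorb word via corrective feedback
    return guess == topic
```

With both rates at 1.0, a speaker-hearer pair converges on a shared lexicon once each category has been named at least once.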
For the agents using scores, the lexical coherence was much higher (both 0.92) than for those
using usage and confidence probability (0.79 and 0.78) (see Table 5.8). For the agents using scores,
the strategy used made a difference in the number of games to an average success of greater than
90%. The obverter strategy took longer than the normal strategy (183.5 games compared to 154
games). See Figure 5.6 for typical runs of each of the strategies.
Table 5.8 Results for different strategies (Pilot 1Bi)
                                           Usage and Confidence Probability    Score for each Word-Category Pair
Measure                                    Normal    Introspective Obverter    Normal    Introspective Obverter
Games to 90% Average Success               139.5     145.0                     154.0     183.5
Lexical Coherence at 90% Average Success   0.79      0.78                      0.92      0.92
Figure 5.6 Typical runs (Pilot 1Bi)
The average success and lexical coherence of typical runs with the different
strategies for five agents and five categories: a) usage and confidence probability
with normal, b) usage and confidence probability with introspective obverter, c)
score for concept-word pairs with normal, and d) score for concept-word pairs with
introspective obverter. The graphs show that the lexical coherence for the runs
using the usage and confidence probability was lower than those using a score for
each word-category pair. The introspective obverter performed similarly to the
normal strategy.
With the different strategies, the lexical coherence was higher for those using scores than for
those using usage and confidence probability. There were more synonyms in the languages created
using usage and confidence probability. Populations using the obverter strategy took more games to
reach consensus than populations using the normal strategy, although no other difference was
noticed between the strategies.
Pilot Study 1Bii: Word Creation and Absorption
During language game interactions, the speaker agent may not have a word for a particular concept
and the hearer agent may not have heard the word used by the speaker. In these situations the agents
may probabilistically create and absorb words. Changing the rates of creation and absorption will
alter the rate at which populations reach a consensus and may affect features of the language, such
as the lexical coherence. This study investigated how changing the rates alters the rate at which
populations reach high levels of success and causes different levels of lexical coherence. The study
aimed to find the most appropriate word creation and absorption rates for language agents using
lexicon tables to associate concepts and words.
Both small and large populations were investigated with varying word creation and absorption
rates. Twenty conditions were tested: for a small population (five agents and five categories) and a
large population (twenty agents and twenty categories), one of the rates was set to 1.0 while the
other varied between 0.2 and 1.0 at increments of 0.2. The score method was used to reduce the
existence of synonyms, and the introspective obverter strategy was used as there was little
difference observed between the normal and obverter strategies.
The population continued playing language games until 90% average success was reached.
Twenty populations for each condition evolved languages to 90% average success. The number of
games to 90% average success and the lexical coherence at 90% average success were recorded for
each run. A summary of the parameters used is given in Table 5.9.
Table 5.9 Parameters for Pilot Study 1Bii
Parameters                   Pilot Study 1Bii
Concept representation       5 categories, 20 categories
Word representation          Text (syllables)
Lexicon technique            Lexicon table
Strategy for choosing words  Introspective obverter
Strategy for associating     Score
Strategy for word invention  Threshold
Word absorption rate         0.2, 0.4, 0.6, 0.8, 1.0
Word invention rate          0.2, 0.4, 0.6, 0.8, 1.0
Threshold                    0.0
Generations                  1
Agents                       5, 20
Games                        To 90% average success
For five agents and five categories, the lexical coherence was between 0.91 and 0.95 when the
word absorption rate was 1.0, and dropped to 0.83 when the word absorption rate was 0.2 (see
Figure 5.7). A similar result was obtained for twenty agents and twenty categories, with the lexical
coherence between 0.82 and 0.94 when the word absorption rate was 1.0 and 0.72 when the word
absorption rate was 0.2.
Figure 5.7 Word creation and absorption results (Pilot 1Bii)
a) Games to 90% average success for five agents and categories, b) Lexical
Coherence at 90% average success for five agents and categories, c) Games to 90%
average success for twenty agents and categories, d) Lexical Coherence at 90%
average success for twenty agents and categories. For the smaller population, the
number of games to 90% average success was lowest with word creation and
absorption rates set to 1.0. For the larger population, the number of games to 90%
average success was lowest with word creation set to 0.2 and word absorption set
to 1.0.
A greater difference between conditions can be seen for the number of games to 90% average
success. For five agents and five categories, when the word absorption rate was 1.0, the average
games to 90% average success ranged between 167 and 204. When the word absorption rate was
0.2, the average games to success increased to 564. For twenty agents and twenty categories, the
average games to success were higher when the word absorption was 0.2 at 22,135 games than
when the word absorption rate was 1.0 at 7445 games. However, there was also a difference in the
average number of games to success as the word creation rate changed, with 5280 games at a rate of
0.2 compared to 7685 at 1.0.
Smaller populations reached high levels of success faster with high word creation and
absorption rates, while larger populations reached high levels of success faster with high word
absorption rates and low word creation rates. With lower absorption rates populations took longer to
reach a consensus of word-category pairings, as new words were not absorbed every time they were
encountered.
5.1.3 Discussion for Pilot Study 1
Pilot Study 1 investigated two methods for language agents to evolve and learn languages. The
simulations in Pilot Study 1A investigated features of the recurrent neural network language agent,
including the weight setting mechanisms, the concept representations, and the word representations.
The lessons learned from Pilot Study 1A were:
• a weight setting mechanism that allowed networks to evolve to find an expressive
language quickly was a high mutation rate,
• a weight setting mechanism that allowed networks to learn a target quickly was back
propagation through time,
• the word representations should provide at least as many words as concepts to enable
networks to associate unique words with each concept,
• a spread of activation in the word representation enabled expressive languages to be
found sooner, and
• a spread of activation in the concept representation resulted in faster comprehension.
The simulations in Pilot Study 1B investigated features of the lexicon table including strategies
to produce words, comprehend concepts, and update associations. The lessons learned from Pilot
Study 1B were:
• synonyms were common in languages formed with the usage and confidence strategy,
• lexical coherence was higher in languages formed with the score strategy,
• populations using the introspective obverter strategy took longer to reach high levels of
success than populations using a ‘normal’ strategy for word production,
• for small populations, a high word creation and word absorption rate resulted in a short
time to reach high levels of success and high lexical coherence, and
• for large populations, a high level of word absorption and a medium level of word
creation resulted in a short time to high levels of success and high lexical coherence.
Recurrent neural networks and lexicon tables were able to associate simple concepts with words.
Pilot Study 1 resulted in a clearer understanding of the features of the strategies used by the agents
and the concept and word representations.
5.2 Pilot Study 2: Representations – Pose Cells, Vision, and Experiences
Pilot Study 1 investigated two methods for the lexicon to be used in a location language game.
Another feature to investigate was the concept representations. The concept representations are used
by the speaker to choose the topic, together with the word representations and lexicon to produce
the word, and by the hearer for comprehension. The studies in Pilot Study 2 investigated one or
more of word production, concept comprehension, and the source of variability for words. Three
different concept representations available to robots using RatSLAM were investigated:
• pose cells,
• vision, and
• experiences (for more detail about the representations refer to section 3.7.3).
The aim of Pilot Study 2 was to determine whether pose cells, vision, and experiences are
appropriate concept representations for use in a location language game. Following the description
of the studies is a general discussion of the concept representations available.
5.2.1 Pilot Study 2A – Pose Cells and Vision
Pilot Study 2A² compared two of the robot representations: pose cells and vision. A series of studies
investigated how pose cells and vision can be used to learn, categorise, and generalise where words
refer to locations.
Pilot Study 2Ai: Learning Symbols for Locations
The aim of this study was to investigate word production and to determine whether agents could
learn labels for locations where the concept representations were pose cells and vision. The features
to be defined for Pilot Study 2Ai included the concept representations, the word representations, the lexicon technique with associated features, the source of variability, and the environment (see Table 5.10).

2 This section is based in part on work published in Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Towards a spatial language for mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298). Singapore: World Scientific Press; and in Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Generalization in languages evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L. Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems (pp. 486-492): MIT Press. Both of these papers were based on work done under the supervision of Janet Wiles, with design discussions and writing assistance from Paul Stockwell and Mark Wakabayashi.
Table 5.10 Parameters for Pilot Study 2Ai
  Concept type: Location
  Concept representation: Pose cells, vision
  Word representation: 18 units, one-hot encoding
  Lexicon technique: Simple neural network
  Networks: Production only
  Momentum constant: 0.9
  Initial learning rate: 0.01
  Increasing ratio: 1.05
  Decreasing ratio: 0.7
  Source of variability: Pre-set target concepts
  Environment: Simulation world
Simulated robots initially explored the simulation world until a stable pose cell representation
was obtained. The pose cell and vision representations used in the study were obtained as the robot
continued to explore the world, following the left wall (see Figure 5.8a).
The concept representations investigated were the raw pose cell representation, a reduced pose
cell representation, and vision. The visual representation was every 100th scene in a series of 10,000
visual scenes of 12 × 8 grey scale arrays obtained from a run of the robot wandering in the
simulated world. The pose cell representation was the corresponding 100th representation in a series
of 10,000 pose cell representations from the same run. In the reduced pose cell representation, each
group of 4 × 4 × 4 pose cells was averaged. The word representation was a one-hot encoding
with each pattern in the training set associated with one output unit. To obtain the target outputs, the
world was divided into 4m2 squares (see Figure 5.8b), resulting in 18 output units, with 18 of the
squares visited by the robot. Each of the 18 output units corresponds to a word describing a location.
Networks were trained to associate the concept representations with the word representations.
The networks were simple neural networks (see Figure 5.9, for more detail refer to section 4.3.1).
The network was trained on the targets for 1000 epochs using gradient descent with momentum and
adaptive learning rate. The training stopped early if the goal was reached or if the gradient had
reached the minimum gradient. The networks were tested on the full set of patterns.
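The discretisation of the robot's positions into word targets can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the 2 m cell size follows from the 4m2 squares in the text, but the grid origin and column count are assumptions.

```python
# Sketch: map robot (x, y) positions to one-hot word targets by dividing
# the world into 2 m x 2 m (4 m^2) squares. Grid layout is assumed.

def square_index(x, y, cell=2.0, cols=10):
    """Discrete square id for a position (cols is an assumed grid width)."""
    return int(y // cell) * cols + int(x // cell)

def one_hot_targets(positions, cell=2.0, cols=10):
    """Assign each visited square its own output unit; return the unit
    index (i.e. the hot unit of the one-hot target) for each position."""
    word_for_square = {}  # square id -> output unit index
    targets = []
    for x, y in positions:
        sq = square_index(x, y, cell, cols)
        unit = word_for_square.setdefault(sq, len(word_for_square))
        targets.append(unit)
    return targets, len(word_for_square)

# Example: three positions, the first two falling in the same square.
targets, n_words = one_hot_targets([(0.5, 0.5), (1.9, 0.1), (3.0, 0.5)])
```

In the study, 18 squares were visited, so the procedure would yield 18 output units, one per location word.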
The networks for all of the concept representations were able to learn the training set with 4m2
squares. When tested on the larger set the networks extrapolated between the learned patterns in
different ways (see Figure 5.10). The raw and reduced pose cell representations allowed the
network to generalise to patterns that shared the activations of the training set. However, many
patterns in the test set contained pose cells that were not active in any of the patterns in the training
set. In the pose cell runs, the network could not generalise effectively from the training to the test
set. For the vision representation there was no clear correlation between locations and words.
a)
b)
Figure 5.8 Robot route and target concepts (Pilot 2Ai)
a) Data was obtained from the robot completing a circuit of the world after a stable
pose cell representation was achieved. The robot started at the square moving
towards the right and following the left wall. The robot moved along the two
corridors, around the room to the right, back through one of the corridors, around
the room in the middle, then completed the loop with the corridor to the left and
finished at the circle. b) The target outputs are shown in different colours along the
route of the robot, with the 4m2 squares showing how the world was divided. Each
target output corresponds to a word for a location.
Figure 5.9 Production network (Pilot 2Ai)
The networks used in Pilot Study 2Ai were simple neural networks with concept
representations as inputs (in the form of raw pose cells, reduced pose cells or
vision), and the word representation as outputs (18 output units).
Raw and reduced pose cell representations allowed clusters of patterns of about 1m2 and 2m2.
Vision did not directly indicate position. Pilot Study 2Ai showed that differences in the languages
can result from different concept representations. For a language about location, vision was not an
ideal representation, while the different pose cell representations were able to cluster areas of
different sizes.
a)
b)
c)
Figure 5.10 Language layout (Pilot 2Ai)
For each of the concept representations: a) raw pose cell, b) reduced pose cell, and
c) vision, the language layout shows the word used for the pattern at each location
in the route of the robot with a different colour for each of the 18 words. Note that
for the pose cell inputs, there were clusters of locations where each word was used.
The clusters for the robots with vision for input were not correlated with location.
Pilot Study 2Aii: Categorisation
A further study was undertaken to investigate the ability of agents to invent and comprehend words
using vision and pose cell representations. Pilot Study 2Aii aimed to determine if agents could
effectively categorize vision and pose cell representations. The features to be defined for Pilot
Study 2Aii included the concept representations, the word representations, the lexicon technique
with associated features, the source of variability, and the environment (see Table 5.11).
Table 5.11 Parameters for Pilot Study 2Aii
  Concept type: Location
  Concept representation: Pose cells, vision
  Word representation: 10 units, 2 active, 3 syllables
  Lexicon technique: Recurrent neural network
  Hidden units: 50
  Networks: Production and comprehension
  Weight setting mechanism: BPTT
  Source of variability: Evolved network
  Environment: Simulation world
The concept representations used were vision, pose cells, and processed pose cells. The visual
representation was every 100th scene in a series of 10,000 visual scenes of 12 × 8 grey scale arrays
obtained from a run of the robot wandering in the simulated world. The pose cell representation was
the corresponding 100th representation in a series of 10,000 pose cell representations from the same
run, with the number of cells reduced from 440,640 to 610 by reducing the resolution of the pose
cells (4 × 4 × 4 pose cells to 1 pose cell), and disregarding cells that were inactive for the entire
run. As an alternate representation, the pose cells were pre-processed using a hybrid system based
on Self Organising Maps (SOMs) (Kohonen, 1995). In the pre-processing system, a SOM was
trained on the input series for 1000 epochs. The output of the SOM was a 12 × 8 set of competitive
units organised in a hexagonal pattern. To construct a distributed activation, the actual output values
of the units were converted to values between 0 and 1.
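The SOM pre-processing step can be sketched as below. This is a minimal illustrative SOM on a rectangular grid (the study used a 12 × 8 hexagonal map); the grid size, learning rate, neighbourhood schedule, and scaling rule are assumptions, not the thesis implementation.

```python
import math, random

def train_som(data, rows=3, cols=4, epochs=200, lr0=0.5, seed=0):
    """Minimal SOM sketch: competitive units on a rectangular grid with a
    shrinking Gaussian neighbourhood. Returns the trained unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        radius = max(1.0, (rows + cols) / 4 * (1 - epoch / epochs))
        for x in data:
            # best matching unit for this input
            bmu = min(range(len(units)),
                      key=lambda i: sum((u - v) ** 2
                                        for u, v in zip(units[i], x)))
            br, bc = divmod(bmu, cols)
            for i, w in enumerate(units):
                r, c = divmod(i, cols)
                d = math.hypot(r - br, c - bc)
                if d <= radius:
                    h = math.exp(-d * d / (2 * radius * radius))
                    for j in range(dim):
                        w[j] += lr * h * (x[j] - w[j])
    return units

def distributed_activation(units, x):
    """Convert unit responses to a [0, 1] activation pattern, with the
    closest unit at 1 (an assumed form of the 0-1 conversion in the text)."""
    dists = [math.sqrt(sum((u - v) ** 2 for u, v in zip(w, x)))
             for w in units]
    lo, hi = min(dists), max(dists)
    return [1.0 - (d - lo) / (hi - lo) if hi > lo else 1.0 for d in dists]
```

The distributed activation, rather than a single winning unit, is what gives the language agents a graded input pattern.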
Recurrent neural networks were used as the lexicon to associate the concept representations with
the word representations, with a separate network for production and comprehension (see Figure
5.11, for more detail refer to section 4.3.2). The word representation was a sequence of three
syllables. Each syllable was represented by a ten unit binary vector in which the two most active
units were set to 1, with all other units set to 0.
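Binarising a syllable vector in this way can be sketched as follows; the example activations are hypothetical.

```python
def syllable_code(activations):
    """Binarise a syllable vector: the two most active units are set to 1,
    all others to 0 (the per-syllable word encoding described in the text)."""
    top2 = sorted(range(len(activations)),
                  key=lambda i: activations[i], reverse=True)[:2]
    return [1 if i in top2 else 0 for i in range(len(activations))]

# Ten-unit syllable with units 1 and 3 most active.
code = syllable_code([0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.0, 0.7, 0.6])
```

A word is then a sequence of three such ten-bit codes.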
One way to measure understanding is to test how well an agent has categorised the world. The
representations of the world were presented to the word production network, resulting in words
associated with each of the patterns. Concept comprehension networks produced a prototype for
each of the unique utterances. If the original input pattern was closest to the prototype for the
utterance used, the pattern was correctly categorised.
Ten networks were evolved individually for 100 generations to produce languages to categorise
the world based on each set of inputs: vision, pose cell representations and processed pose cell
representations. A simple (1+1)-evolutionary strategy (Beyer & Schwefel, 2002) was used to evolve
the agent’s speaker, introducing variability in the language. In each generation, a comprehension
network was trained on the language for 500 epochs using the back propagation through time
algorithm (Rumelhart et al., 1994). The comprehension networks produced a prototype for each
unique word in the language, which could be compared to the original input pattern. The languages
were evaluated with a fitness function based on how well the world was categorised. If the mutant
language was better categorised than the current champion language, then the mutant became the
champion. The languages produced for each concept representation were compared for
expressiveness and categorisation.
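The (1+1)-evolutionary strategy loop can be sketched as follows. The toy fitness function stands in for the expensive step of training a comprehension network and counting correctly categorised patterns; mutation scale, seed, and target are assumptions.

```python
import random

def one_plus_one_es(init, fitness, generations=100, sigma=0.1, seed=0):
    """(1+1)-ES sketch: each generation a mutant of the champion speaker
    is evaluated and replaces the champion only if it scores better, as
    described for the champion/mutant languages in the text."""
    rng = random.Random(seed)
    champion = list(init)
    champ_fit = fitness(champion)
    for _ in range(generations):
        mutant = [w + rng.gauss(0.0, sigma) for w in champion]
        mut_fit = fitness(mutant)
        if mut_fit > champ_fit:  # mutant becomes the new champion
            champion, champ_fit = mutant, mut_fit
    return champion, champ_fit

# Toy fitness: negative squared distance of the "weights" to a target.
target = [0.5, -0.2, 0.8]
fit = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
best, best_fit = one_plus_one_es([0.0, 0.0, 0.0], fit, generations=500)
```

In the study, the fitness evaluation involved 500 epochs of BPTT training per candidate, making the simple hill-climbing structure of (1+1)-ES an economical choice.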
a)
b)
Figure 5.11 Production and comprehension networks (Pilot 2Aii)
There were two types of network used in Pilot Study 2Aii: a) Word Production
(Evolving) and b) Concept Comprehension (Learning). The network structures are
the same as for Pilot Study 1A (see Figure 5.1).
The vision languages output by the word production networks had an average of 24.2 words (see
Table 5.12). The average number of scenes correctly categorised by the concept comprehension
networks was 53.4 out of 100. One highly expressive language was evolved with 67 unique words
of which 47 were associated with single scenes. Words often appeared to group several different
types of images together, with the resulting prototype visual scene output by the concept
comprehension network being a combination of the scenes. One set of similar scenes was where the
robot faced a white wall with a strip of black next to the floor. All of the languages other than the
most expressive language grouped together some of these scenes (see Figure 5.12).
Table 5.12 Word production and concept comprehension (Pilot 2Aii)

  Measure, mean (σ)                          Vision        Pose Cells    Processed Pose Cells
  Number of Unique Words                     24.2 (17.3)   23.3 (12.4)   10.9 (6.4)
  Number of Patterns Correctly Categorised   53.4 (13.5)   22.6 (10.4)   58.7 (10.4)
Figure 5.12 Vision prototype and scenes (Pilot 2Aii)
The prototype output by the concept comprehension network for the word 'kufufu'
(top left) and the five scenes associated with ‘kufufu’ by the word production
network in a language with 27 unique words. Most of the scenes associated with
‘kufufu’ showed a white wall with a black strip, although the bottom middle scene
had different features. (Reproduced from Figure 3 in Schulz, Stockwell,
Wakabayashi, & Wiles, 2006b)
The pose cell languages output by the word production networks had an average of 23.2 words.
The average number of pose cell patterns correctly categorised by the concept comprehension
networks was 22.6 out of 100. The majority of the words were associated with single patterns or a
small number of patterns, scattered across the space. Some words grouped together patterns that
were close together in space, but were also generally associated with a small number of patterns
from other areas. The processed pose cells languages output by the word production networks had
an average of 10.9 words. The average number of processed pose cell patterns correctly categorised
by the concept comprehension networks was 58.7 out of 100. Processed pose cell languages tended
to have fewer words associated with single patterns and more words associated with many patterns
spread across the entire space. However, the larger languages had more words associated with
groups of patterns that were close together in space.
The number of unique words in a language indicates the expressivity of that language. The
vision and pose cell representations resulted in languages with an average of over 20 unique words
for the 100 patterns, while the processed pose cell representation resulted in languages with an
average of just over ten unique words. The reduction in expressivity indicated that the unique
information in some of the input patterns was lost during processing.
The agents using languages evolved with the vision and the processed pose cell representations
were able to correctly categorise over half of the patterns, while the pose cell representation
languages were only able to correctly categorise an average of 22.6 of the 100 patterns. The
processed pose cell languages were better at clustering patterns that were close together in space,
with more distinct clusters of patterns associated with single words. Some of the agents using
languages evolved with vision were able to group together similar images, however many of the
words grouped together images that were dissimilar and many of the words were associated with
single images. Pure vision as a concept representation may not extract enough information out of
each scene for a structured language to evolve.
Pilot Study 2Aiii: Generalisation
This study investigated word production, concept comprehension, and the source of variability for
words in agents using the concept representations of pose cells and vision, raw or processed with
self organising maps or principal component analysis. The aim of Pilot Study 2Aiii was to
determine if agents could generalise from the existing lexicon when using pose cells and vision as
concept representations. Specifically, the agents’ ability to generalise from the training set to the
test set was investigated. Generalisation may occur in the use of novel words for novel concepts,
and the ability to use the novel words in a way that allows the world to be categorised effectively.
The features to be defined for Pilot Study 2Aiii included the concept representations, the word
representations, the lexicon technique with associated features, the source of variability, and the
environment (see Table 5.13).
The visual representation was every tenth scene in a series of 10,000 visual scenes of 12 × 8
grey scale viewed by the robot exploring the simulation world. The series of 1000 scenes was
analysed using hierarchical clustering to determine 30 clusters of images. The image closest to the
mean for each of the 30 clusters was chosen for evolving and training the networks. The dissimilar
scenes (see Figure 5.13a) were spread throughout the robot’s world (see Figure 5.13b).
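Choosing the representative scene for each cluster can be sketched as below. The clustering itself is assumed done elsewhere (the study used hierarchical clustering); this sketch only shows the "image closest to the cluster mean" selection, with hypothetical data.

```python
def representatives(patterns, labels):
    """For each cluster label, return the pattern closest to that
    cluster's mean, as used to pick the 30 training scenes."""
    reps = {}
    for lab in set(labels):
        members = [p for p, l in zip(patterns, labels) if l == lab]
        dim = len(members[0])
        mean = [sum(p[j] for p in members) / len(members)
                for j in range(dim)]
        reps[lab] = min(members,
                        key=lambda p: sum((a - b) ** 2
                                          for a, b in zip(p, mean)))
    return reps

# Hypothetical 1-D "scenes" in two clusters.
reps = representatives([[0.0], [0.2], [0.7], [1.0]], [0, 0, 0, 1])
```

Using the member closest to the mean, rather than the mean itself, guarantees the training scene is an actual image the robot saw.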
Table 5.13 Parameters for Pilot Study 2Aiii
  Concept type: Location
  Concept representation: Pose cells, vision
  Word representation: 10 units, 2 active, 3 syllables
  Lexicon technique: Recurrent neural network
  Hidden units: 50
  Networks: Production and comprehension
  Weight setting mechanism: BPTT
  Source of variability: Evolved network
  Environment: Simulation world
a)
b)
Figure 5.13 Scenes and their location (Pilot 2Aiii)
Visual scenes for Pilot Study 2Aiii showing a) 30 dissimilar scenes as seen by the
simulated robot in 12 × 8 greyscale and b) the location of the robot for each of the
dissimilar scenes. The scenes were evenly spread throughout the world, with higher
concentrations in the corners, where the visual input of the robot changes more
quickly due to the rotation of the robot. (Extended from Figure 3 in Schulz,
Stockwell, Wakabayashi, & Wiles, 2006a)
The pose cell input was the corresponding tenth pattern in a series of 10,000 pose cell
representations, obtained from the same run of the robot (see Figure 5.14a). The number of pose
cells was reduced from 440,640 to 947 by reducing the resolution of the pose cells (180 × 68 × 36
cells to 45 × 17 × 9 cells) and by discarding cells that were inactive for the entire run (6885 cells to
947 cells). The pose cell inputs were analysed using hierarchical clustering to find 30 pose cell
patterns for presenting to the language agents. The position of the robot for each of the 30 pose cell
patterns was spread throughout the world (see Figure 5.14b).
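The resolution reduction and pruning of inactive cells can be sketched as follows; the block-averaging factor of 4 matches the 180 × 68 × 36 to 45 × 17 × 9 reduction in the text, while the toy dimensions in the example are illustrative.

```python
def downsample(cells, f=4):
    """Reduce a 3-D pose cell array by averaging each f x f x f block
    (a sketch of the resolution reduction described in the text)."""
    X, Y, Z = len(cells), len(cells[0]), len(cells[0][0])
    out = [[[0.0] * (Z // f) for _ in range(Y // f)] for _ in range(X // f)]
    for x in range(X // f):
        for y in range(Y // f):
            for z in range(Z // f):
                block = [cells[x * f + i][y * f + j][z * f + k]
                         for i in range(f)
                         for j in range(f)
                         for k in range(f)]
                out[x][y][z] = sum(block) / len(block)
    return out

def active_indices(patterns):
    """Indices of cells active in at least one pattern of the run, so
    always-inactive cells can be discarded (6885 to 947 in the study)."""
    return [i for i in range(len(patterns[0]))
            if any(p[i] > 0 for p in patterns)]
```

Discarding always-inactive cells shrinks the input layer without losing any information present in the run.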
a)
b)
Figure 5.14 Pose cell map and location of pose cell patterns (Pilot 2Aiii)
Pose cell representation for Pilot Study 2Aiii showing a) the projection into the x–y
plane of the pose cell map and b) the locations of 30 pose cell patterns, evenly
spread throughout the world. (Extended from Figure 3 in Schulz, Stockwell et al.,
2006a)
Three techniques were used for processing the visual and pose cell representations. The first
technique was using the raw representation. The second technique involved categorising the
representation with a self organising map (SOM) (Kohonen, 1995). A SOM was trained on the
patterns for 1000 epochs. The output of the SOM was an array of competitive units organised in a
hexagonal pattern. To give a distributed activation pattern for the language agents, the actual values
of the units were scaled to values between 0 and 1. The third technique used Principal Component
Analysis (PCA). The 1000 patterns were analysed for their principal components and the
component scores were scaled to values between 0 and 1.
The sizes of the processed inputs were set to the smallest size for which expressive languages
could evolve. For the raw image, a scene of 12 × 8 pixels was used, for the SOM-based
representation, a SOM of size 24 × 16 was used, and for the PCA-based representation, the first 48
components were used. For the pose cells, 947 units were used, for the SOM-based representation, a
SOM of size 12 × 8 was used, and for the PCA-based representation, the first 120 components were
used.
Two types of recurrent neural network were used in Pilot Study 2Aiii, which were the same as
those used in Study 2Aii (see Figure 5.11, for more detail refer to section 4.3.2) with the concept
representations being the processed visual and pose cell representations.
One way of testing whether a language captures the underlying structure of a set of input
patterns is to test how well the concepts are mapped to the language terms. Concept comprehension
networks produced a prototype for each unique word. If the original pattern was closest to the
prototype for the word used, the pattern was correctly categorised. The measure of similarity
between patterns and prototypes was sum squared error.
For each pre-processing technique, ten agents were evolved for 500 generations with a selection
strategy based on how well the agents categorised the world. The winner of the current champion
and mutant language was the one in which the trained networks were able to categorise the highest
number of patterns correctly.
In Pilot Study 2Aiii, the language agents produced novel words for novel scenes, which can be
seen as constructing new words by recombining known morphemes in different ways. Novel words
were produced in each type of processing for both vision and pose cell representations.
The agents produced between 17.2 and 22.8 unique words for the training set of 30 patterns (see
Table 5.14). When the agents were presented with the test set of 1000 patterns, they produced
between 34.4 and 111.2 unique words. Notably, a large number of new words were produced for the
novel patterns.
Table 5.14 Word production (Pilot 2Aiii)

  Unique Words,   Image         SOM-based     PCA           Pose Cells    SOM-based     PCA
  mean (σ)                      Image         Image                       Pose Cells    Pose Cells
  30 Patterns     22.4 (8.3)    17.2 (5.3)    22.8 (3.3)    18.9 (7.7)    18.5 (6.1)    21.7 (5.3)
  1000 Patterns   99.9 (65.2)   43.5 (17.5)   111.2 (46.8)  112.6 (86.7)  34.4 (19.2)   93.1 (46.5)
The number of patterns close to their prototypes was used to measure the performance of the
agents. The distance between the pattern and their prototype was determined by treating them as
vectors and calculating one minus the cosine of the included angle between them, which was
normalised by the standard deviation of the distances between each of the scenes. The number of
scenes within 0.25, 0.5, and 1.0 standard deviations of the prototype were calculated for each of the
techniques for the test set of 1000 patterns.
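The normalised distance measure can be sketched as follows; the example vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """One minus the cosine of the included angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def normalised_distances(patterns, prototypes):
    """Distance of each pattern to its word's prototype (one prototype
    per pattern), normalised by the standard deviation of the pairwise
    distances between the patterns themselves, as described in the text."""
    pairwise = [cosine_distance(p, q)
                for i, p in enumerate(patterns)
                for q in patterns[i + 1:]]
    mean = sum(pairwise) / len(pairwise)
    sd = math.sqrt(sum((d - mean) ** 2 for d in pairwise) / len(pairwise))
    return [cosine_distance(p, proto) / sd
            for p, proto in zip(patterns, prototypes)]
```

Counting how many of these normalised distances fall below 0.25, 0.5, and 1.0 reproduces the thresholds used in Table 5.15.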
When the concept comprehension networks were presented with the test set of 1000 patterns, the
SOM-based pose cell representation had the most patterns within 0.25 standard deviations of their
prototype with 558.4, followed by SOM-based vision with 334.9, raw vision with 26.1, PCA pose
cells with 16.6, PCA vision with 8.3, and raw pose cells with none (see Table 5.15). A similar order
resulted with the number of patterns within 1.0 standard deviations of their prototype with SOM-
based image (920.9), SOM-based pose cells (915.5), raw vision (399.0), PCA pose cells (131.6),
PCA vision (29.2), and raw pose cells (0.0).
Table 5.15 Patterns close to the prototype (Pilot 2Aiii)

  Number of patterns (out of 1000), mean (σ)

  Standard      Image          SOM-based      PCA          Pose Cells   SOM-based     PCA
  Deviations                   Image          Image                     Pose Cells    Pose Cells
  0.25          26.1 (15.8)    334.9 (123.5)  8.3 (8.5)    0.0 (0.0)    558.4 (40.6)  16.6 (38.5)
  0.5           81.2 (38.8)    689.6 (194.3)  13.2 (11.1)  0.0 (0.0)    767.0 (36.1)  42.8 (59.9)
  1.0           399.0 (111.0)  920.9 (205.0)  29.2 (19.7)  0.0 (0.0)    915.5 (41.4)  131.6 (74.6)
The vision agents produced between 2.5 (agents with SOM-based inputs) and 4.9 (agents with
PCA inputs) times the number of words for the training set of 30 images when presented with the
test set of 1000 images. The pose cell agents produced between 1.8 (agents with SOM-based inputs)
and 6.0 (agents with raw pose cell inputs) times the number of words for the training set of 30 patterns
when presented with the test set of 1000 patterns.
When generalising to 1000 patterns, the SOM-based agents performed well, with almost all
patterns within one standard deviation. The raw pose cell agents had no patterns within one standard
deviation of the prototype for the word associated with the pattern. The lack of similarity for the
pose cells was due to the sparseness of the patterns, meaning that the concept comprehension
networks did not learn to associate the words with the pose cell representations.
5.2.2 Pilot Study 2B: Pose Cells and Experiences
Pilot Study 2B was an extension of Pilot Study 2A that investigated an additional concept
representation: experiences³. The experience mapping algorithm was developed in the RatSLAM
project in parallel with this thesis, and was not available for the earlier pilot studies. Experiences
provided a representation of space that did not include the discontinuities and multiple activations
for a single location that exist in the pose cell representation. Pilot Study 2B investigated how pose
cells and experiences could be used for word production, when provided with a set of concepts.
Pilot Study 2Bi: Conceptualisation of Locations
This study aimed to test whether agents could use pose cell and experience map representations for
word production, forming concepts for rooms and corridors. The features to be defined for Pilot
Study 2Bi included the concept representations, the word representations, the lexicon technique
with associated features, the source of variability, and the environment. For a summary of the
parameters, see Table 5.16.
Table 5.16 Parameters for Pilot Study 2Bi
  Concept type: Location
  Concept representation: Pose cells, experiences
  Word representation: Activation of units, one-hot encoding
  Lexicon technique: Simple neural network
  Networks: Production only
  Weight setting mechanism: BPTT
  Momentum: 0.9
  Initial learning rate: 0.01
  Increasing ratio: 1.05
  Decreasing ratio: 0.7
  Source of variability: Pre-set target concepts
  Environment: Real world, with offline learning
A teacher-student system was designed and implemented, in which an agent attempted to
associate concepts provided by a human teacher with its internal representations using a single layer
neural network.

3 This section is based on work published in Schulz, R., Milford, M., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial concepts from RatSLAM representations. Paper presented at From sensors to human spatial concepts, a workshop at the International Conference on Intelligent Robots and Systems, Beijing, China; and Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial concepts from RatSLAM representations. Robotics and Autonomous Systems - From Sensors to Human Spatial Concepts, 55(5), 403-410. The journal paper was a refinement of the conference paper. The pose cell and experience map work was done by Michael Milford and David Prasser. The conceptualisation process was done under the supervision of Janet Wiles and Gordon Wyeth, with design discussions and writing assistance from Michael Milford and David Prasser.

Teacher-student conceptualisation involved interaction with a teacher, where the
different concepts that the agent was to learn were provided by that teacher. Agents used supervised
learning to associate input patterns with different concepts. In Pilot Study 2Bi, the agents used pose
cells or experiences as the concept representations. The concepts formed were labels for locations in
the world, including specific rooms. The word representation used was a single active unit. The
conceptualisation process for the agents was the association of the concept representations with the
word representations. The agents learned the association using a fully connected single layer neural
network (for more detail refer to section 4.3.1). The network was initialised with small random
weights and biases (uniformly between –0.1 and 0.1), and trained using gradient descent with
momentum and an adaptive learning rate.
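A single update step of gradient descent with momentum and an adaptive learning rate can be sketched as below, using the 0.9 / 1.05 / 0.7 settings from the parameter tables; the exact adaptation schedule is an illustrative assumption, not the thesis implementation.

```python
def adaptive_gd_step(w, grad, velocity, lr, prev_loss, loss,
                     momentum=0.9, inc=1.05, dec=0.7):
    """One weight update: grow the learning rate while the loss falls,
    shrink it when the loss rises, and carry momentum on the velocity."""
    lr = lr * inc if loss < prev_loss else lr * dec
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity, lr
```

Repeating this step over epochs yields the training procedure described for the production networks.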
During recall, the concept associated with an experience pattern was the concept related to the
most active output unit. The relative activation of the most active unit to the second most active unit
was a confidence value of the label. The agent was considered to be ‘uncertain’ if the activation of
the second most active unit was more than 2/3 the activation of the most active unit. Preliminary
experiments determined that 2/3 provided an appropriate balance between concept uncertainty and
incorrect guessing of concepts at room boundaries.
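The recall rule with the 2/3 uncertainty threshold can be sketched as follows; the example activations are hypothetical.

```python
def label_with_confidence(activations, threshold=2/3):
    """Pick the concept of the most active output unit; report 'uncertain'
    when the runner-up exceeds the given fraction of the winner (the 2/3
    rule described in the text). Returns (unit index, uncertain flag)."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    best, second = ranked[0], ranked[1]
    uncertain = activations[second] > threshold * activations[best]
    return best, uncertain
```

For example, activations of 0.9 and 0.5 give a confident label, while 0.9 and 0.7 are flagged uncertain, since 0.7 exceeds two thirds of 0.9.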
The experiments used a Pioneer 2 DXE mobile robot with a forward facing camera to explore a
test environment. The environment was level 5 of the Axon Building at The University of
Queensland, consisting mostly of open-plan offices and corridors (shown in Figure 5.16a). The
robot was manually driven along a repeated path through the environment. The robot visited every
place on its path at least twice, providing an opportunity for both learning and recognition. The
resulting dataset was processed by the RatSLAM model and experience mapping algorithm in order
to provide the input for the spatial conceptualisation method.
Experiments used a pose cell structure that was sufficiently large to avoid wrapping of the
activity across the structure boundaries. The pose cell structure measured 200 × 100 × 36 cells
(720,000 cells in total) in (x', y', θ ') . The pose cell representation contained both discontinuities
and multiple representations of the same place, as shown by Figure 5.16b. The discontinuities were
caused by visually driven re-localisation jumps after long periods of exploration where the robot
relied only on wheel odometry to remain localised. Odometric drift and delayed re-localisation
created multiple representations, where more than one group of pose cells represented the same
physical location. The experience mapping algorithm produced a spatially continuous map, with
multiple pose cell representations grouped into overlapping areas of the map (see Figure 5.16c).
During the experiment the robot learned 2384 experiences, which is significantly fewer than the
number of activated pose cells.
The spatial conceptualisation process was applied in an offline manner following the
construction of the RatSLAM representations. The environment was manually categorised by a
human teacher into four rooms and two corridors, as shown in Figure 5.16a. The route of the robot
was divided into two sections corresponding to the robot exploring and then revisiting one half of
the building floor. The two sections were further divided into learning and recognition phases. The
learning phase, in which the robot first visited an area, was used for the training set, while the
recognition phase, where the robot revisited an area, was used to test if the concepts had been
learned. The sequence of a recognition phase following a learning phase was equivalent to the areas
being labelled on the first circuit of the environment, and testing whether the robot had learned the
labels on later circuits.
The language agent’s fully connected single layer neural network used pose cells or experiences
as inputs and had six output units corresponding to the concepts of four rooms and two corridors
(see Figure 5.15). The activations of the experience inputs included the current experience and those
experiences within 1m, with activation relative to how close the experience was to the current
experience. Targets were created with a single active output unit corresponding to the current
location of the robot. Transitions between rooms and corridors occurred at doorways and turns. The
first learning phase comprised 403 time steps, with 233 in the second learning phase, 398 in the first
recognition phase, and 187 in the second recognition phase. Agents were initially trained on the first
learning phase and tested on the first recognition phase. Agents were then trained on both the first
and second learning phases and tested on the second recognition phase. For each training segment,
agents were trained for 2000 epochs. The performance of the agents was tested on the first and
second recognition phase by considering the concepts used by the agents for each location.
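The training scheme described above can be sketched as a small script. Only the six-way output, the proximity-weighted input activation within 1 m, and the fully connected single-layer architecture come from the text; the input size, learning rate, and toy data below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 50      # hypothetical number of experience units
N_CONCEPTS = 6     # four rooms and two corridors

def input_activation(distances_m, radius=1.0):
    """Activation of each experience unit, proportional to closeness to the
    current experience and zero beyond the 1 m radius."""
    return np.clip(1.0 - distances_m / radius, 0.0, 1.0)

# Fully connected single-layer network with softmax outputs.
W = rng.normal(scale=0.1, size=(N_CONCEPTS, N_INPUTS))
b = np.zeros(N_CONCEPTS)

def forward(x):
    z = W @ x + b
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(x, target_idx, lr=0.1):
    """One gradient step against a one-hot target for the current location."""
    global W, b
    p = forward(x)
    t = np.zeros(N_CONCEPTS); t[target_idx] = 1.0
    grad = p - t                      # softmax cross-entropy gradient
    W -= lr * np.outer(grad, x)
    b -= lr * grad

# Toy training set: each concept is tied to a band of nearby experience units.
X, y = [], []
for c in range(N_CONCEPTS):
    for _ in range(40):
        d = rng.uniform(0, 1.5, N_INPUTS)              # distances of all units
        d[c * 8:(c + 1) * 8] = rng.uniform(0, 0.5, 8)  # units near concept c
        X.append(input_activation(d)); y.append(c)

for epoch in range(200):
    for x, t in zip(X, y):
        train_step(x, t)

acc = np.mean([forward(x).argmax() == t for x, t in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```

On this toy data the linear network separates the concepts easily; the thesis trained for 2000 epochs on the real pose cell and experience inputs.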
Figure 5.15 Word production network (Pilot 2Bi)
The networks used in Pilot Study 2Bi were Production Networks, which were
simple neural networks with concept representations as inputs (in the form of pose
cells or experience maps), and the word representation as outputs (six output units
corresponding to four rooms and two corridors).
Concept representation (Pose Cell / Experience Map)
Word representation (6 Output Units)
Figure 5.16 Floor plan, pose cells, and experience map (Pilot 2Bi)
a) Floor plan of the area used for the study (approximately 43 by 13 metres) and
the approximate trajectory of the robot. Shaded areas were inaccessible to the
robot. b) Trajectory of the most activated pose cell during the experiment. Thick
dashed lines show re-localisation jumps driven by visual input. Each grid square
contains 4 × 4 pose cells in the (x', y') plane. c) The experience map was
continuous and had a high degree of correspondence to the spatial arrangement of
the environment shown in a. (Reproduced from Figures 5, 6, and 7 in Milford,
Schulz, Prasser, Wyeth, & Wiles, 2007)
In the first learning phase for the pose cell conceptualisation process, 96.77% of the instances
were labelled correctly, with 64.32% labelled correctly in the recognition phase (see Figure 5.17a,b
and Table 5.17). Errors in the training set were generally on the borders of the categories. Errors in
the recognition set were mainly in Room 1, and were due to the different trajectory used in the
learning and recognition phases. The part of the room incorrectly classified was not visited in the
learning phase. Most of the untrained areas were classified as Corridor 1, as the robot spent most of
the first learning phase there, and the language network was biased towards categorising patterns as
Corridor 1. In the second learning phase, 98.27% were labelled correctly, with 73.26% labelled
correctly in the recognition phase (Figure 5.17c,d). In the recognition phase, there were many
instances where the robot was uncertain of the label for the current location. While most of the
uncertainties were on the borders between concepts, there were other locations of uncertainty,
particularly in Rooms 3 and 4. Different pose cells were active in the uncertain locations during the
recognition and the learning phases.
Table 5.17 Correctly labelled patterns (Pilot 2Bi)
Phase           Pose Cells   Experiences
Learning 1      96.77%       98.26%
Recognition 1   64.32%       90.45%
Learning 2      98.27%       98.43%
Recognition 2   73.26%       89.84%
In the first learning phase of the conceptualisation process based on the experience map, 98.26%
of the instances were labelled correctly, with 90.45% labelled correctly in the recognition phase
(Figure 5.18a,b). The majority of the errors occurred in Room 1, where a different trajectory was
taken during the second pass. For the errors in Room 1, the agent was uncertain about the label,
rather than labelling the instances incorrectly. In the second learning phase, 98.43% were labelled
correctly, with 89.84% labelled correctly in the recognition phase (Figure 5.18c,d). The errors all
occurred at the boundaries between rooms and corridors. The agents were able to cluster the
experiences appropriately, with some uncertain errors on the borders between areas (see Figure
5.18d). At all locations, except for those on borders between areas, and those not visited during the
learning phase, the agents were able to appropriately label their current location.
The conceptualisation experiments tested both the extent to which the RatSLAM system’s maps
could be classified using spatial concepts, and the degree to which different representation types
were suitable. The spatial conceptualisation method was able to learn and then recognise both the
RatSLAM pose cell maps and the experience maps. During the learning phase both representation
types performed well. However, during the recognition phases, higher recognition rates were
achieved when using the experience maps than when using the pose cell maps. The results
demonstrate that phenomena in the pose cells such as multiple representations can impede the
conceptualisation process. The experience mapping algorithm, which was developed to create maps
from the pose cell representations that could be used for goal navigation, also appears to create
maps more suited to spatial conceptualisation.
Figure 5.17 Conceptualisation using pose cells (Pilot 2Bi)
a) The learning phase of section 1, b) the recognition phase of section 1, c) the
learning phase of section 2, and d) the recognition phase of section 2. In the
learning phases there were uncertain areas in the Room 1 ↔ Corridor 1, Room 2 ↔
Corridor 1, and Room 3 ↔ Corridor 2 borders. In the recognition phases there were
uncertain areas throughout, including in all of the rooms and borders between
rooms and corridors. In the first recognition phase, part of Room 1 was labelled as
Corridor 1. Each grid square contains 8 × 8 pose cells in the (x', y') plane.
(Reproduced from Figure 6 in Schulz, Milford, Prasser, Wyeth, & Wiles, 2006)
Figure 5.18 Conceptualisation using experiences (Pilot 2Bi)
a) The learning phase of section 1, b) the recognition phase of section 1, c) the
learning phase of section 2, and d) the recognition phase of section 2. In the
learning phases there were uncertain areas in the Room 1 ↔ Corridor 1, Room 2 ↔
Corridor 1, and Room 3 ↔ Corridor 2 borders. In the recognition phases there were
uncertain areas in Room 1, and in the Room 1 ↔ Corridor 1, Room 2 ↔ Corridor
1, and Room 3 ↔ Corridor 2 borders. (Reproduced from Figure 8 in Milford et al.,
2007)
5.2.3 Discussion for Pilot Study 2
Pilot Study 2 investigated three representations for concepts as well as a variety of processing
techniques. The simulations in Pilot Study 2A compared vision and pose cells, while the
simulations in Pilot Study 2B compared pose cells and experiences. The concept representations
were tested for use in word production, concept comprehension, and together with a source of
variability. The lessons learned from Pilot Study 2 were:
• for a language about locations, vision was not an ideal representation,
• pose cell representations cluster different sizes of areas in the world, depending on the
processing performed,
• there was a trade-off between expressivity and categorisation,
• SOM-based processing of representations provided a natural clustering of patterns, and
• experience maps provided a representation suitable for spatial conceptualisation.
Pose cell and experience representations can be used to form location concepts. Pilot Study 2
resulted in a clearer understanding of the nature of these representations and possible methods for
processing and using the representations for word production and concept comprehension.
5.3 Discussion: Representations Matter
The significance of this chapter was to show that for a location language game, representations
matter. The concept and word representations need to be considered; otherwise, results may be
misinterpreted due to representation artefacts. The method used to associate concepts with words
also affects the nature of the languages that can form.
Pilot Study 1 investigated two methods for language agents to evolve and learn language:
recurrent neural networks and lexicon tables. Two important features of lexicon techniques are
learning rate and generalisation. Lexicon tables (used by Smith, 2001; and Steels, 1999) provide
in-the-moment learning, with generalisation usually performed before the lexicon table is consulted,
through similarity to existing exemplars. Simple neural networks (used by Cangelosi, 2001; Cangelosi &
Parisi, 1998; Kirby & Hurford, 2002; and Marocco et al., 2003) and recurrent neural networks (used
by Batali, 1998; Elman, 1990; and Tonkes et al., 2000) learn associations over time, with words
partitioning concept space. As lexicon techniques, however, these methods were not ideal for
forming and learning location concepts: the neural networks took prohibitively long to learn the
associations, and the lexicon tables provided no mechanism for generalisation. Neither technique
could appropriately handle large concept representations such as the pose cells and experiences.
A method developed as a result of these investigations was used in Study 1 and 2: the distributed
lexicon table.
Pilot Study 2 investigated three types of robot representations: vision, pose cells, and
experiences. Vision was not appropriate for labelling unique locations in the world, as
some distant locations have visually similar scenes. Vision would be more appropriate for location
type concepts such as ‘corner’ or ‘corridor’ where similar visual scenes may occur. Pose cells were
found to be able to cluster locations in the world, but were less reliable at naming locations for
which words were not explicitly learned. Discontinuities and multiple representations within the
pose cell map can impede the conceptualisation process. Of these three types of robot
representations, experiences were found to be ideal for the concept representations underlying
location concepts. In an experience map, distant locations in the world correspond to distant
locations in co-ordinate space, allowing location concepts to be formed within local regions. Unlike concepts that
can be formed from direct perception, location concepts require a representation that is built over
time, such as the cognitive map representation of experiences.
The lessons learned from the studies presented in this chapter were considered when developing
the methods and representations for the major studies of this thesis, including the development of
the ‘where are we’ game, presented in the following chapter.
Chapter 6 A Toponymic Language Game
For some strange reason, no matter where I go, the place is always called
“here”.
(Attributed to Ashleigh Brilliant)
As languages are learned through agent interactions, a key question is what impact these
interactions have on the languages. Names for places, or toponyms, are the simplest spatial concepts
and can be formed from a map of the world. Place names in natural languages are formed through a
variety of strategies including natural features, special sites, religious significance, royalty,
explorers, famous local people, memorable incidents or famous events, other place names from
immigrants’ homelands, explorers naming good or bad fortune on travels, animal names, descriptive
names, and the ‘new town’ (Crystal, 1997, p114). Space becomes place through the experiences of
individuals and populations (Tuan, 1975, p12). Computational models of language have labelled
objects located in the world (Bodik & Takac, 2003; Steels, 1995; Vogt, 2000a), but have only
formed location concepts directly grounded in objects. Location concepts have not been formed
through the collective experience and interactions of an agent population.
Chapter 6 describes studies in which agents played location language games4. The purpose of
the studies was to investigate the features necessary to create shared toponymic languages and to
investigate the effect of agent interactions on the languages. The aim was to determine the methods
and parameters that resulted in the formation of toponymic languages that could be used effectively
by the agents. The agents’ aims were to individually build a world map and to collectively create a
shared lexicon of names for locations. The games played by the agents were ‘where are we’ and ‘go
to’ games.
A ‘where are we’ game is a location language game in which the topic is the
current location of the agents (for more detail about location language games, refer to Chapter 4, in
particular Figure 4.1). The speaker produces a word for the current location, and the hearer updates
the lexicon based on the speaker’s utterance. No feedback is given to either agent. Both agents may
update their lexicon based on the interaction.

4 This chapter covers in more detail the work presented in Study 1 of Schulz, R., Prasser, D., Stockwell, P., Wyeth,
G., & Wiles, J. (2008). The formation, generative power, and evolution of toponyms: Grounding a spatial vocabulary in
a cognitive map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language: Proceedings of
the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore: World Scientific Press. The work presented
in the paper was done under the supervision of Janet Wiles and Gordon Wyeth, and with design discussion and writing
assistance from PhD students David Prasser and Paul Stockwell.
In the grid world, the concept elements are obtained from the squares of the grid. Each square in
the world is a unique location concept element for the agent, to be used for the toponym lexicon. In
the simulation and real worlds, experiences are the concept elements. The words in the grid world
are integers, in the simulation world, words are strings of syllables, and in the real robots, words are
sequences of tones.
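The three word forms can be made concrete with small generators. The syllable inventory and tone set below are invented placeholders; the thesis does not specify them here.

```python
import itertools
import random

# Illustrative word generators for the three worlds. SYLLABLES and TONES_HZ
# are hypothetical; only the three word forms (integers, syllable strings,
# tone sequences) come from the text.
_grid_counter = itertools.count(1)

def grid_word():
    """Grid world: words are integers."""
    return next(_grid_counter)

SYLLABLES = ["ba", "di", "ko", "lu", "me", "no"]   # hypothetical inventory

def simulation_word(rng, n_syllables=3):
    """Simulation world: words are strings of syllables."""
    return "".join(rng.choice(SYLLABLES) for _ in range(n_syllables))

TONES_HZ = [440, 494, 523, 587]                    # hypothetical tone set

def robot_word(rng, n_tones=4):
    """Real robots: words are sequences of tones."""
    return tuple(rng.choice(TONES_HZ) for _ in range(n_tones))

rng = random.Random(0)
print(grid_word(), simulation_word(rng), robot_word(rng))
```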
A behaviour available to the robots in the simulation and real worlds is the ability to move to a
specified goal location, if that location is described in their experience map. While the robots play
location language games, they attempt intermittently to play ‘go to’ games. When successful, ‘go
to’ games provide a behaviour in which more location language games are played as the robots
follow similar routes to the goal location. ‘Go to’ games also provide a behavioural method to test
the coherence of the shared languages.
A ‘go to’ game begins when agents are near each other. The speaker decides on a goal and
produces the word for the goal location. The hearer comprehends the goal location and determines
whether the location can be found. If the goal location can be found, the hearer lets the speaker
know that they will try to reach the goal location. Both agents then move to the goal location. Once
at the goal location, the agents produce an ‘at goal’ signal.
The result of the ‘go to’ game is an indication of the coherence of the languages. If the agents
comprehend each word similarly, they will meet each other at the goal location specified in each
game. In ‘go to’ games there are a range of possible results:
• no word found by the speaker OR the hearer did not understand the word produced by
the speaker,
• the goal location was not found,
• the goal location was found, but the other robot was not met at the goal location, or
• the other robot was met at the goal location.
In the simulation world, the exact distance between agents can be measured, enabling a further
breakdown of the result when they meet at the goal location, from within 1m up to within 6m of
each other.
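The outcome taxonomy above can be expressed as a small classifier. The enum names and the metre-band breakdown are paraphrases of the listed results, not identifiers from the thesis.

```python
import math
from enum import Enum

class GoToResult(Enum):
    NO_WORD_OR_NOT_UNDERSTOOD = 1   # no word found, or hearer did not understand
    GOAL_NOT_FOUND = 2              # hearer could not locate the goal in its map
    GOAL_FOUND_NOT_MET = 3          # goal reached, but the other robot not met
    MET_AT_GOAL = 4                 # both robots met at the goal location

def classify_go_to(word_found, word_understood, goal_found,
                   distance_m=None, meet_radius_m=6.0):
    """Classify one 'go to' game. distance_m is the final separation between
    the robots, which is measurable exactly in the simulation world."""
    if not (word_found and word_understood):
        return GoToResult.NO_WORD_OR_NOT_UNDERSTOOD, None
    if not goal_found:
        return GoToResult.GOAL_NOT_FOUND, None
    if distance_m is None or distance_m > meet_radius_m:
        return GoToResult.GOAL_FOUND_NOT_MET, None
    # Simulation world only: break the meeting into 1 m bands,
    # "within 1m" up to "within 6m".
    return GoToResult.MET_AT_GOAL, max(1, math.ceil(distance_m))
```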
This chapter describes the design and implementation of a location language game in:
• Study 1A: Grid world,
• Study 1B: Simulation world, and
• Study 1C: Real world.
The final section is a general discussion of the location language game.
6.1 Study 1A: Grid World
Study 1: A Toponymic Language addressed the question of how interactions affect the formation
of location languages. The first step in answering this question was to investigate the
implementation of simple spatial agent interactions in a simple world. The aim of Study 1A: Grid
World was to investigate the effect of interactions on a toponymic language by implementing the
‘where are we’ game in a simple world. In particular, the study considered the effect of a variety of
parameters, including the population dynamics and the methods for producing words.
6.1.1 Experimental Setup
In Study 1A agents played ‘where are we’ games in an empty 10 × 10 grid world (for more detail
about the grid world refer to section 4.5.1). Agents used a distributed lexicon table to associate
concepts and words (for more detail refer to section 4.3.4). The representations used for location
concept elements were the squares of the grid. Words were represented as integers. For each game,
the speaker and hearer were chosen randomly. The speaker was placed in a random square, and the
hearer was placed in a square in the neighbourhood of the speaker. Words were invented when the
maximum toponym value for the interaction location was at the threshold of 0.0. When words and
concepts were used together their association was increased by 1.0. Forgetting was implemented
with 0.2 taken away from unused associations, and a minimum association value of 0.0.
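A minimal sketch of these lexicon updates (invention when the maximum toponym value is at the 0.0 threshold, reinforcement by 1.0, and forgetting by 0.2 with a 0.0 floor) might look like the following. The class and method names are illustrative, and the full distributed lexicon table of section 4.3.4 has more machinery than shown; forgetting here decays only the other words at the same square, a simplification of decaying unused associations.

```python
import itertools
from collections import defaultdict

class LexiconTable:
    """Simplified sketch of the distributed lexicon table updates."""

    def __init__(self):
        # assoc[square][word] -> association strength
        self.assoc = defaultdict(lambda: defaultdict(float))
        self._new_word = itertools.count()  # words are integers in the grid world

    def best_word(self, square, threshold=0.0):
        """Produce the most associated word, inventing one when the maximum
        toponym value for the square is at the threshold of 0.0."""
        words = self.assoc[square]
        if not words or max(words.values()) <= threshold:
            w = next(self._new_word)
            self.assoc[square][w] = 1.0
            return w
        return max(words, key=words.get)

    def hear(self, square, word):
        """Reinforce the used pair by 1.0; decay the square's unused words
        by 0.2, with a minimum association value of 0.0."""
        self.assoc[square][word] += 1.0
        for w in list(self.assoc[square]):
            if w != word:
                self.assoc[square][w] = max(0.0, self.assoc[square][w] - 0.2)
```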
In the grid world, the shape of the hearing area may be a single square, a square shape, or a
diamond shape (see Figure 6.1). The different shapes of hearing areas allow for different languages
regarding how words move through the world. If the hearing area is too large, it may be difficult for
the agents to reach a consensus on words for concepts. The shape of the hearing area affects how
likely it is that one word will take over from another word in any square. Words are more likely to
take over neighbouring squares with the square hearing area than with the diamond hearing area,
and with the larger size than with the smaller size.
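The three hearing-area shapes of Figure 6.1 correspond to standard grid distances: a square area is a Chebyshev ball and a diamond area a Manhattan ball. A sketch, where radius 1 reproduces the 9-square and 5-square areas used here:

```python
def can_hear(speaker, hearer, shape, radius=1):
    """Return True when the hearer is inside the speaker's hearing area.
    The hearer may share the speaker's square."""
    dx = abs(speaker[0] - hearer[0])
    dy = abs(speaker[1] - hearer[1])
    if shape == "single":
        return dx == 0 and dy == 0
    if shape == "square":          # Chebyshev ball: 9 squares for radius 1
        return max(dx, dy) <= radius
    if shape == "diamond":         # Manhattan ball: 5 squares for radius 1
        return dx + dy <= radius
    raise ValueError(shape)

# Count the cells each radius-1 area covers around the origin.
cells = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
square_n = sum(can_hear((0, 0), c, "square") for c in cells)
diamond_n = sum(can_hear((0, 0), c, "diamond") for c in cells)
print(square_n, diamond_n)  # 9 5
```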
Study 1A compared ‘basic’ and ‘best’ solutions. The basic solution implemented the simple
option for each of population dynamics, neighbourhood shape, strategy, hearing distance, and
forgetting. The best solution used the options that gave the desired language features as determined
by preliminary simulations. The parameters for Study 1A: Basic and Best are given in Table 6.1.
The basic solution consisted of 10 runs in which 100,000 games were played, and the best solution
consisted of 10 runs in which 500,000 games were played. There were two agents in each
population. Games were played when the agents were within a square or diamond neighbourhood of
each other. Agents used the most associated or neighbourhood most informative strategy to
determine which word was produced for the chosen topic.
Figure 6.1 Hearing area
The area in which the hearer can hear the speaker for a) single, b) square, and c)
diamond hearing area, where the location of the speaker is shown in black, and the
possible locations of the hearer are shown in grey. The hearer may be located in the
same square as the speaker.
Table 6.1 Parameters for Study 1A
Parameters                     Study 1A: Basic             Study 1A: Best
Game                           ‘where are we’              ‘where are we’
Hearing distance               Square (9 squares)          Diamond (5 squares)
Concept type                   Location                    Location
Concept representation         Squares of grid             Squares of grid
Word representation            Integer                     Integer
Lexicon technique              Distributed lexicon table   Distributed lexicon table
Strategy for choosing words    Most associated             Neighbourhood most informative
Neighbourhood size             Single square               Diamond (5 squares)
Forgetting                     Yes                         Yes
Updating                       Hearer only                 Hearer only
Strategy for word invention    Threshold                   Threshold
Word absorption rate           1.0                         1.0
Word invention rate            1.0                         1.0
Threshold                      0.0                         0.0
Generations                    1                           500
Agents                         2                           2
Interactions per generation    100,000                     1000
Initial learning period        0                           1000
World                          Grid                        Grid
Size of grid                   10 × 10                     10 × 10
Obstacles                      None                        None
6.1.2 Results
The resulting languages for each of the 10 runs of the ‘basic’ solution were similar in coherence,
specificity, size, and shape. A coherence of greater than 0.8 was obtained for all runs by 10,000
games (see Figure 6.2). At a coherence of 0.8 the two agents use the same word for 80% of the
squares.
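Coherence as used here, the fraction of squares for which the two agents produce the same word, can be computed directly. The dict-based language representation below is an assumption for illustration.

```python
def coherence(lang_a, lang_b):
    """Fraction of squares on which the two agents use the same word.
    lang_a, lang_b: dicts mapping each grid square to the word the agent
    would produce there."""
    squares = lang_a.keys() & lang_b.keys()
    if not squares:
        return 0.0
    agree = sum(lang_a[s] == lang_b[s] for s in squares)
    return agree / len(squares)

# A 10 x 10 world where the agents disagree on one square, as in one of the
# basic runs, gives a coherence of 0.995-like near-agreement; here 0.99.
a = {(x, y): "w1" for x in range(10) for y in range(10)}
b = dict(a)
b[(9, 9)] = "w2"   # one square labelled differently
print(coherence(a, b))  # 0.99
```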
Figure 6.2 Coherence (1A: Basic)
a) The coherence for each of the 10 populations in Study 1A: Basic over the
100,000 games and b) the coherence for each population at the end of 100,000
games. High coherence (>0.8) was reached by all populations by 10,000 games. At
the end of 100,000 games, eight of the ten populations had reached a coherence of
1.0, with one at 0.995 (one square labelled differently), and another at 0.99 (two
squares labelled differently).
The average number of words used by each agent was above 60 by the time 200 games were
played and decreased fairly rapidly to between 9 and 13 words by about 20,000 games (see Figure
6.3). The languages were mostly stable after 20,000 games, although in some runs, agents continued
to lose words. The specificity of the languages matched the size of the language, starting close to
1.0 when most squares were associated with a unique word and dropping to between 0.8 and 0.9 by
20,000 games, when there were between 9 and 13 words in the language (see Figure 6.4). If the
words were more evenly distributed between the squares, the specificity of the languages would be higher.
The example language in Figure 6.5 showed a wide range in the number of squares associated with
each word. Words used for very few squares disappeared from use as more games were played.
Figure 6.3 Words used (1A: Basic)
The average number of words used by one agent in each population for up to
100,000 games. Most populations had reached a stable language of between 9 and
13 words by 50,000 games.
Figure 6.4 Specificity (1A: Basic)
The specificity of the language for each of the 10 populations up to 100,000 games.
High specificity was maintained throughout the games.
Agent 1    Agent 2
a) Language Layout: each toponym was assigned a different shade, and each square in the world was shaded according to the toponym used there.
b) Word Coverage: the x-axis shows the words in order of area covered, and the y-axis shows the number of squares covered by each word.
Figure 6.5 Shared language (1A: Basic)
The resulting language for the agents of the least coherent run of the Study 1A:
Basic, showing a) the language layout and b) the word coverage. Two of the 100
squares were labelled differently between the two agents, both in the top right
corner of the world. There was a high variance between the number of squares used
for each word, with two words used for 22 squares, and one word used for one
square.
The resulting languages for each of the 10 runs of the best solution were similar in terms of
coherence, specificity, size, and shape. A coherence of greater than 0.8 was obtained for all runs by
4000 games (see Figure 6.6), which was much faster than for the basic simulation, for which 0.8
was obtained for all runs by 10,000 games. None of the best runs reached 1.0, as there were more
words in the languages, making high coherence harder to achieve.
The average number of words used by each agent reached between 30 and 40 by the end of the
first generation (1000 games), and decreased slowly to between 25 and 29 words by about 25
generations (250,000 games) (see Figure 6.7). The languages were mostly stable after 25
generations, although in most runs agents continued to lose words over the successive generations.
The specificity of the languages remained high throughout all of the runs, greater than 0.96 at all
times (see Figure 6.8). The specificity of the languages remained high even with the reduction in the
size of the language as the spread of words was fairly even across the squares. The example
language in Figure 6.9 showed between two and four squares associated with each word.
Figure 6.6 Coherence (1A: Best)
a) The coherence for each of the 10 populations in Study 1A: Best over the 500,000
games and b) the coherence for each population at the end of 500,000 games. A
coherence of greater than 0.8 was obtained by all runs by 4000 games. After
500,000 games, all runs reached a coherence of greater than 0.96 (eight squares
different), with the highest at 0.995 (one square different).
Figure 6.7 Words used (1A: Best)
The average number of words used by one agent in each population for up to
500,000 games. Most agents had a stable language between 25 and 29 words by
250,000 games.
Figure 6.8 Specificity (1A: Best)
The specificity of the language for each of the 10 populations up to 500,000 games.
A specificity of greater than 0.96 was maintained throughout the games.
Figure 6.9 Shared language (1A: Best)
The resulting language for both agents for the least coherent run of Study 1A: Best,
showing a) the language layout and b) the word coverage. Each word was used for
between two and five squares. There were minor differences between the
languages, with 8 of the 100 squares associated with different toponyms. In each
case, the border between toponyms had shifted.
6.1.3 Discussion
The best solution agents took longer to reach a stable language, but the languages formed had
higher specificity and more even word coverage. The basic and best solutions showed that very
different languages can result when different features are used with respect to the time taken to form
a stable coherent language, the specificity of the language, and the types of concepts that form. The
features included the nature of the population of agents, how the agents chose words, how the
lexicon was updated, and when games were played.
The shape and size of the concepts were affected by the hearing distance and the strategy and
neighbourhood size used when choosing words. The most informative strategy combined with a
diamond hearing distance (as in the best solution) resulted in words that were more even in
coverage than when the most associated strategy was used (as in the basic solution). With new
agents entering the population every 1000 interactions (as in the best solution), the coherence
remained lower, as the new agents had to learn the language before high coherence returned. Extra
features to be considered for the implementation in the simulation world include the internal
representations of the robots.
6.2 Study 1B: Simulation World
Study 1B involved the implementation of the ‘where are we’ and the ‘go to’ language games in the
simulation world of the robot (for more detail about the simulation world refer to section 4.5.2). An
investigation was undertaken into how the temperature used for probabilistic word invention and
the neighbourhood size used to produce words affected the resulting language.
The goal of Study 1B was to determine whether the representations of a simulated robot with an
experience map were appropriate for the formation of a toponymic language using the interactions
of the ‘where are we’ language game. The specific aims of Study 1B were to determine how
temperature and neighbourhood size affected the resulting toponymic languages with respect to
lexicon size, word coverage, and successful use of the language in ‘go to’ games. Study 1A showed
that a toponymic language may be formed through ‘where are we’ language game interactions when
the agents have simple and matching concept representations. Study 1B extended Study 1A with
simulated robots that explored the simulation world (see Figure 6.10) and used more complex
representations of space that differed between the robots.
Figure 6.10 Simulation World
The simulation world of the robot, with the black lines indicating walls and the
black octagons desks, showing the path of the robot in a typical simulation run.
6.2.1 Experimental Setup
The concept representations used were the experiences from RatSLAM with a forward facing
camera. Words were represented as text strings. The robots first built representations of the world,
in the form of an experience map, by exploring the world. The robots autonomously wandered
through the world and played a game when they were within hearing distance of each other. The
agents played the ‘where are we’ game and the ‘go to’ game. The ‘where are we’ game was played
in order to build up the lexicons of the robots. The ‘go to’ game was played in order to affect the
behaviour of the robots, and to test the coherence of the languages.
A distributed lexicon table was used to store the associations between experiences and words
(for more detail refer to section 4.3.4). The relative neighbourhood most informative strategy was
used to choose words, with probabilistic word invention calculated from the value of the toponym at
the location of the interaction. Both the speaker’s and the hearer’s lexicons were updated every game.
Forgetting was not directly implemented, though it occurred indirectly as the robots continued to
learn new experiences and removed old ones.
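The exact invention probability is defined in section 4.3.4 and is not reproduced here; the sketch below is only a plausible stand-in that mirrors the qualitative behaviour the results report, namely that invention is more likely where the local toponym value is low and more frequent at higher temperature. Both the functional form and the names are assumptions.

```python
import math
import random

# HYPOTHETICAL invention rule: not the thesis's formula. It only captures the
# qualitative behaviour: high temperature and a weak local toponym favour
# inventing a new word over producing the current best one.
def invention_probability(toponym_value, temperature):
    return temperature * math.exp(-max(toponym_value, 0.0))

def maybe_invent(rng, toponym_value, temperature):
    """Decide probabilistically whether the speaker invents a new word."""
    return rng.random() < invention_probability(toponym_value, temperature)
```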
The measures used were the final language layout, the language size, the language coherence,
how the words were spread throughout the world, and the results of the ‘where are we’ and ‘go to’
games (for more detail about the performance measures refer to section 4.6).
The hearing distance for Study 1B was 3m. Within a trial, the temperature for word invention
was fixed at a value T, the neighbourhood size was fixed at a distance D, and the
two robots negotiated a set of words. Three temperature conditions were tested based on low (T =
0.25), medium (T = 0.5), and high (T = 0.75) temperatures with the neighbourhood size set to
medium (D = 5m). Two other neighbourhood distance conditions were tested based on small (D =
3m) and large (D = 7m) neighbourhood sizes with the temperature set to medium (T = 0.5). Each
condition comprised five runs of 1000 interactions. Following the 1000 interactions, the agents
played 50 ‘go to’ games to test the shared language. For a summary of the parameters, see Table
6.2.
The conditions tested were:
• low temperature (T = 0.25, D = 5m),
• medium temperature and neighbourhood (T = 0.50, D = 5m),
• high temperature (T = 0.75, D = 5m),
• small neighbourhood (T = 0.5, D = 3m), and
• large neighbourhood (T = 0.5, D = 7m).
Table 6.2 Parameters for Study 1B
Parameters                     Study 1B
Game                           ‘where are we’ and ‘go to’
Hearing distance               3m
Concept type                   Location
Concept representation         Experiences (simulation world, forward facing camera)
Word representation            Text
Lexicon technique              Distributed lexicon table
Strategy for choosing words    Relative neighbourhood most informative
Neighbourhood size             3m, 5m, or 7m
Forgetting                     No
Updating                       Hearer and Speaker
Strategy for word invention    Temperature
Temperature                    0.25, 0.5, or 0.75
Generations                    1
Agents                         2
Interactions per generation    1000
Initial learning period        0
World                          Simulation World
6.2.2 Results
In all five temperature and neighbourhood size conditions the simulated robots developed a shared
set of toponyms, showing that toponyms can be formed at different levels of scale by using different
rates of word invention and word production. A higher temperature and a smaller neighbourhood
size resulted in a more specific toponymic language, with more words covering the world (see
Table 6.3). The toponymic language for all temperatures covered just over 300m2, which was the
size of the world extended by the neighbourhood function of 5m. The toponymic language for
different neighbourhood sizes increased proportionately with the neighbourhood size from 192.8m2
for a small neighbourhood up to 444.1m2 for a large neighbourhood. The increase in coverage was
due to the way words were chosen, with the larger neighbourhood meaning that locations in the
world further away from interactions may still be associated with the words used.
Table 6.3 Results for Study 1B
                                    Temperature                              Neighbourhood Size
Measure (mean (σ))                  Low          Medium        High          Small         Large
Number of Toponyms                  5.7 (2.0)    16.3 (2.7)    23.8 (1.2)    27.6 (3.4)    6.4 (1.4)
Area Covered per Toponym Used (m2)  53.7 (27.1)  21.5 (19.7)   13.8 (15.4)   7.4 (7.2)     69.4 (35.3)
Area Covered by Language (m2)       306.1 (9.8)  311.3 (4.8)   311.8 (3.5)   192.8 (4.6)   444.1 (14.9)
Coherence                           0.82 (0.09)  0.73 (0.07)   0.52 (0.12)   0.56 (0.10)   0.81 (0.10)
The coherence decreased with temperature and increased with neighbourhood size. The result of
the ‘go to’ games was similar across the low, medium, and high temperatures (see Figure 6.11a),
and across small, medium, and large neighbourhood sizes (see Figure 6.11b). Between 38.2%
(medium temperature) and 50.6% (low temperature) of the games resulted in the robots meeting
each other within 1m. Between 3.6% (small neighbourhood) and 10.6% (large neighbourhood) of
the games resulted in some type of failure: either the goal word was not found or not understood by
the hearer, or the goal location was not found, or was found but the robots did not meet. In the
remainder of the games, the robots met each other at
the goal location at a distance between 1m and 6m.
In the low temperature and large neighbourhood populations, most of the words were invented
in the first 100 interactions, while for the medium and high temperature and small neighbourhood
populations, words were invented throughout the whole run (see Figure 6.13a, Figure 6.15a, Figure
6.17a, Figure 6.19a, and Figure 6.21a). For each condition, the value of the toponym at the
interaction location increased through the run, with the value increasing with temperature (see
Figure 6.13b, Figure 6.15b, Figure 6.17b, Figure 6.19b, and Figure 6.21b).
Low Temperature
The average number of words used by the simulated robots after 1000 toponym language games
with a low word invention temperature was 5.7. In all of the runs, most of the words were invented
in the first 100 interactions, with the remainder added throughout the run. The area covered by
toponyms was 53.7m2 on average. The shared language for one of the runs is shown in Figure 6.12
with the interactions in Figure 6.13.
Medium Temperature and Neighbourhood
The average number of words used by the simulated robots after 1000 toponym language games
with a medium word invention temperature was 16.3. In all of the runs, words were invented
throughout the run. The area covered by toponyms was 21.5m2 on average. The shared language for
one of the runs is shown in Figure 6.14 with the interactions in Figure 6.15.
High Temperature
The average number of words used by the simulated robots after 1000 toponym language games
with a high word invention temperature was 23.8. In all of the runs, words were invented
throughout the run. The area covered by a toponym on average was 13.8m2. The shared language
for one of the runs is shown in Figure 6.16 with the interactions in Figure 6.17.
Small Neighbourhood
The average number of words used by the simulated robots after 1000 toponym language games
with a small neighbourhood was 27.6. In all of the runs, words were invented throughout the run.
The area covered by toponyms was 7.4m2 on average. The shared language for one of the runs is
shown in Figure 6.18 with the interactions in Figure 6.19.
Large Neighbourhood
The average number of words used by the simulated robots after 1000 toponym language games
with a large neighbourhood was 6.4. In all of the runs, words were invented throughout the run. The
area covered by toponyms was 69.4m2 on average. The shared language for one of the runs is shown
in Figure 6.20 with the interactions in Figure 6.21.
a)
b)
Figure 6.11 Results of ‘go to’ games (1B)
The results of the ‘go to’ games for the runs with different a) temperatures and b)
neighbourhood sizes. For all temperatures, the simulated robots met each other at
the goal location in more than 93% of the games (more than 38% within 1m). For
all neighbourhood sizes, the simulated robots met each other at the goal location in
more than 89% of the games (more than 38% within 1m).
Simulated Robot 1 Simulated Robot 2
a) Experience Map: The experience map of the agent, formed as the agent explores the simulation world by wall-following. Gaps in the map are either desks or open space (compare to Figure 4.10, the map of the simulation world).
b) Language Layout: The square represents the experience map space as shown in a). The language of the agent, with each toponym given a colour and each location in experience map space coloured with the toponym used in that location, to a resolution of 1/16 m2.
c) Word Locations: The square represents the experience map space as shown in a). The ‘best’ location for each word is shown.
d) Word Coverage: The x-axis shows the toponyms in order of invention. The y-axis shows the area covered by each toponym in m2. The coverage for each word in the language is shown.
Figure 6.12 Shared language (1B: low temperature)
An example language for a low temperature with Simulated Robot 1 on the left and
Simulated Robot 2 on the right with the a) experience map, b) language layout, c)
word locations, and d) word coverage. Note that a-c have been rotated to aid
comparison between the simulated robots.
a) Word Usage over the Simulated Robots’ Interactions: The x-axis shows the interactions of the robots. The y-axis shows the toponyms in order of invention. The usage of each word throughout the interactions is shown. Note the first usage of each word, and the use of the words throughout the interactions.
b) Toponym Value at the Interaction Location: The x-axis shows the interactions of the robots. The y-axis shows the value of the toponym used at the interaction location. The value of the toponym is the information value of the word-location combination, as defined in Equation 4.8. Note that the values which result are those considered ‘acceptable’ given the word invention temperature.
Figure 6.13 Interactions (1B: low temperature)
Four of the five words were invented early. The toponym value used in each
interaction generally remained between 0.4 and 0.7. The toponym value is zero
when no toponym has been associated with experiences in the neighbourhood of
the current experience, or when the agent is the hearer and has just created a new
experience that has not been placed in the map.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.14 Shared language (1B: medium temperature)
An example language for a medium temperature and neighbourhood. There was a
range of areas covered by the words, with some covering small areas, and others
covering large areas. Note that some of the words were outside the world of the
simulated robots due to the interactions between words and the neighbourhood size
used. See the Figure 6.12 caption for more detail about the elements of the figure.
a)
b)
Figure 6.15 Interactions (1B: medium temperature)
a) Word Usage over the Simulated Robots’ Interactions and b) Toponym Value at
the Interaction Location. Five of the fifteen words were invented early. The value
of the toponym used in each interaction generally remained between 0.5 and 0.8.
See the Figure 6.13 caption for more detail about the elements of the figure.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.16 Shared language (1B: high temperature)
An example language for a high temperature. There was a range of areas covered
by the words: not used, covering very small areas, and covering large areas. Note
that some of the words were outside the world of the simulated robots due to the
interactions between words and the neighbourhood size used. See the Figure 6.12
caption for more detail about the elements of the figure.
a)
b)
Figure 6.17 Interactions (1B: high temperature)
a) Word Usage over the Simulated Robots’ Interactions and b) Toponym Value at
the Interaction Location. Seven of the 24 words were invented early, with the
remainder invented consistently through the run. The value of the toponym used in
each interaction generally remained between 0.6 and 0.9. See Figure 6.13 caption
for more detail about the elements of the figure.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.18 Shared language (1B: small neighbourhood)
An example language for a small neighbourhood. There was a range of areas
covered by the words: not used, covering very small areas, and covering large
areas. See the Figure 6.12 caption for more detail about the elements of the figure.
a)
b)
Figure 6.19 Interactions (1B: small neighbourhood)
a) Word Usage over the Interactions of the Simulated Robots and b) Toponym
Value at the Interaction Location. Twelve of the 27 words were invented early,
with the remainder invented consistently through the run. The value of the
toponym used in each interaction generally remained between 0.5 and 0.8. See the
Figure 6.13 caption for more detail about the elements of the figure.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.20 Shared language (1B: large neighbourhood)
An example language for a large neighbourhood. The languages were very closely
matched between the two agents, with each word covering a large area in the
world. See the Figure 6.12 caption for more detail about the elements of the figure.
a)
b)
Figure 6.21 Interactions (1B: large neighbourhood)
a) Word Usage over the Simulated Robots’ Interactions and b) Toponym Value at
the Interaction Location. Four of the five words were invented early, with the final
word invented halfway through the run. The value of the toponym used in each
interaction generally remained between 0.5 and 0.8. See the Figure 6.13 caption for
more detail about the elements of the figure.
6.2.3 Discussion
Smaller languages resulted from lower temperatures and larger neighbourhood sizes, while larger
languages resulted from higher temperatures and smaller neighbourhood sizes. Higher coherence
resulted from lower temperatures and larger neighbourhood sizes. The results of the ‘go to’ games
were similar across the different temperatures and neighbourhood sizes, with most of the games
resulting in the simulated robots meeting each other at the goal location. Even though the area
covered by toponyms was larger for the lower temperatures and larger neighbourhood sizes, the
location interpreted as best representing each toponym tended to remain similar between the robots,
as shown by the large proportion of all games where the robots met each other at the goal location
within 1m.
Study 1B demonstrated how toponyms could be formed for all places in the world visited by
both simulated robots, where the robots built their own personal experience maps of the world and
played toponymic language games when within hearing distance of each other.
6.3 Study 1C: Real World
Study 1C involved the implementation of ‘where are we’ and ‘go to’ games in the real robots. The
challenges of the real world include the limited battery life of the robots, the difficulties of not
hearing or mishearing each other’s utterances, and changing features of the world. This study
investigated the influence of two error detection strategies, with a comparison made over three
languages for each condition.
The goal of Study 1C was to determine whether useful toponymic languages could be formed
through the interactions of real robots playing ‘where are we’ games. The specific aims of Study 1C
were to determine if the real world issue of noise in the perceptual data obtained by the robots about
odometry, vision, and hearing affected the languages that formed with respect to lexicon size, word
coverage, and the successful use of the language in ‘go to’ games. The hypothesis was that the
languages would be less coherent than those formed in the simulation world, but that the robots
would still be able to play ‘go to’ games successfully. Study 1C extended Study 1B into the real
world (see Figure 6.22, for more detail refer to section 4.5.3).
Figure 6.22 Real world
The robot's world comprises halls and open plan offices. A layout of the obstacles
in the room and the approximate path of the robots are shown.
Note that the room used in the real world is different to the room used in the simulation world
(compare to Figure 6.10). The real world room is smaller, with fewer obstacles, and smaller open
areas. The robots are able to find each other more readily due to the size of the rooms and the
placement of the obstacles, with fewer loops in the exploration of the environment.
6.3.1 Experimental Setup
The concept representations were experiences from RatSLAM formed using an omni-directional
camera. The word representations for the robots in the real world were DTMF tones. The robots
played ‘where are we’ and ‘go to’ games in the real world. In each session, only one type of game
was played, to reduce the potential problems of mishearing. A distributed lexicon table was used to
associate experiences and words (for more detail refer to section 4.3.4), with the relative
neighbourhood information strategy used to choose words. Inventing words was done
probabilistically with a temperature of 0.5. The lexicon was updated when words and concepts were
used together by increasing the association by 1.0. Forgetting was not implemented directly.
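The update rule stated here — adding 1.0 to the association whenever a word and an experience are used together — can be sketched as follows. The class and method names are illustrative, and `best_word` is a simplification: the thesis actually chooses words with the relative neighbourhood most informative strategy of section 4.3.4.

```python
from collections import defaultdict

class LexiconTable:
    """Minimal sketch of a distributed lexicon table: association
    strengths between concept elements (experience ids) and words."""

    def __init__(self):
        # assoc[experience_id][word] -> association strength
        self.assoc = defaultdict(lambda: defaultdict(float))

    def update(self, experience_id, word, amount=1.0):
        # Strengthen the association when a word and a concept element
        # are used together (the thesis uses an increment of 1.0).
        self.assoc[experience_id][word] += amount

    def best_word(self, experience_id):
        # Simplified lookup: the most strongly associated word, if any.
        words = self.assoc[experience_id]
        return max(words, key=words.get) if words else None

lex = LexiconTable()
lex.update("exp_7", "tilo")
lex.update("exp_7", "tilo")
lex.update("exp_7", "loto")
```

Because both hearer and speaker update their tables after each game, associations for the words actually used in a region keep growing, which is how a shared labelling stabilises.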
Each sequence comprised a series of two hour sessions followed by a final session. In the first
sessions, the robots played ‘where are we’ games to form their lexicons. In the final session, the
lexicons of the robots were tested with the robots playing ‘go to’ games. The final session was run
until the robots had played 25 ‘go to’ games. For a summary of the parameters, see Table 6.4.
Table 6.4 Parameters for Study 1C
Parameters                     Study 1C: Minimal                         Study 1C: Checksum
Game                           ‘where are we’ and ‘go to’                ‘where are we’ and ‘go to’
Hearing distance               ~5m                                       ~3m
Concept type                   Location                                  Location
Concept representation         Experiences (real world,                  Experiences (real world,
                               omni-directional camera)                  omni-directional camera)
Word representation            Tones                                     Tones
Lexicon technique              Distributed lexicon table                 Distributed lexicon table
Strategy for choosing words    Relative neighbourhood most informative   Relative neighbourhood most informative
Neighbourhood size             5m                                        2m
Forgetting                     No                                        No
Updating                       Hearer and Speaker                        Hearer and Speaker
Strategy for word invention    Temperature                               Temperature
Temperature                    0.5                                       0.5
Generations                    1                                         1
Agents                         2                                         2
Interactions per generation    4 hours                                   6 hours
Initial learning period        0                                         0
World                          Real World                                Real World
Error detection                Minimal                                   Checksum
Two conditions were tested comparing different levels of error detection:
• minimal and
• checksum.
In the first condition, the only error detection was whether the correct number of tones had been
received, and whether the structure of the grammar matched what was expected. The
neighbourhood size was set to 5m, and the volume of the robots set so that the hearing distance was
approximately 5m when within line of sight. The robots played ‘where are we’ games for two
sessions of two hours. In the second condition, a checksum was included in the transmission of
words. The checksum was a simple additive checksum. For the checksum error detection condition,
the neighbourhood size was set to 2m, and the volume of the robots set so that the hearing distance
was approximately 3m when within line of sight. To allow the robots to play a similar number of
games as in the first condition, the robots played ‘where are we’ games for three sessions of two
hours for each language. Three languages were formed and tested for each condition.
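The thesis states only that a simple additive checksum was appended to the transmitted words. The sketch below is one plausible instantiation; the modulus of 16, matching the number of DTMF symbols, is an assumption:

```python
def additive_checksum(tones, modulus=16):
    """Simple additive checksum over a sequence of tone digits.
    The modulus of 16 (the DTMF symbol count) is an assumption."""
    return sum(tones) % modulus

def encode(tones):
    # Append the checksum to the transmitted tone sequence.
    return tones + [additive_checksum(tones)]

def decode(received):
    # Accept the utterance only if the trailing checksum matches.
    tones, check = received[:-1], received[-1]
    if additive_checksum(tones) != check:
        return None  # discard a misheard utterance
    return tones

msg = encode([3, 7, 1])
assert decode(msg) == [3, 7, 1]
assert decode([3, 8, 1, msg[-1]]) is None  # a single misheard tone is caught
```

Note that an additive checksum catches most single-tone errors but can miss compensating errors (one tone heard too high and another too low), so it reduces rather than eliminates the mishearing problem.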
6.3.2 Results
In both error detection conditions the robots developed a shared set of toponyms. When checksum
error detection was implemented, a smaller neighbourhood size was required to form languages of
approximately the same size as those formed with minimal error detection (see Table 6.5). The area
covered by the languages differed between the two conditions due to the neighbourhood sizes used,
as a smaller neighbourhood size results in a language that covers a smaller area. The checksum
error detection condition resulted in languages that were more coherent, also shown by the robots’
performance in ‘go to’ games (see Figure 6.23).
Figure 6.23 Results of ‘go to’ games (1C)
For minimal error detection, 58.7% of the games resulted in the robots meeting
each other at the goal location. For checksum error detection 70.0% of the games
resulted in the robots meeting each other at the goal location. Compared to the
simulation world (see Figure 6.11) a greater percentage of the games resulted in
failure or not meeting at the goal location. The reduction in success was due to
difficulty in the robots hearing each other and to reduced language coherence.
Table 6.5 Results for Study 1C
Measure (x̄ (σ))                       Minimal        Checksum
Number of Toponyms                     10.7 (1.8)     8.8 (0.8)
Area Covered per Toponym Used (m2)     22.7 (17.2)    9.9 (4.5)
Area Covered by Language (m2)          216.1 (22.3)   84.0 (3.0)
Coherence                              0.22 (0.10)    0.30 (0.15)
Minimal Error Detection
In all three runs the robots developed a shared set of toponyms, with between eight and fourteen
words in the lexicon (average of 10.7, see Table 6.5). The toponymic language covered an average
of 216.1m2. 58.7% of the ‘go to’ games resulted in the robots meeting each other at the goal
location (see Figure 6.23).
Most of the words were invented in the first half of the interactions with the robots reusing
words already invented when they covered the same locations in the world (see Figure 6.24). The
value of the toponym for the current location increased through the interactions. The language
resulting for each agent in one of the runs is shown in Figure 6.25. In all runs, robot 2 had one or
two more words in its lexicon, due to its microphone picking up more beeps.
On investigation of the word invention behaviours of the robots, it was discovered that in most
cases, new words were learned by the robot as hearers, rather than invented by the robots as
speakers. In many of these cases the hearer robot had misheard the word sent by the speaker, and
added the misheard word to its lexicon.
a) b)
Figure 6.24 Interactions (1C: minimal)
a) Word Usage over the Interactions of the Robots b) Value of Toponym
at Interaction Location. Most of the words were invented in the first half of the
interactions. The value of the toponym at the interaction location increased to
between 0.6 and 0.9. See the Figure 6.13 caption for more detail about the elements
of the figure.
Robot 1 Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.25 Shared language (1C: minimal)
An example language for both robots of one run, with ten words used by at least
one robot. Robot 2 has concepts for three more words than Robot 1. Note that some
of the words were outside the world of the robots due to the interactions
between words and the neighbourhood size used. See the Figure 6.12 caption for
more detail about the elements of the figure.
Checksum Error Detection
In all three runs the robots developed a shared set of toponyms, with between eight and nine words in the
robots’ lexicon (average of 8.8, see Table 6.5). The toponymic language covered an average of
84.0m2. 70.0% of the ‘go to’ games resulted in the robots meeting each other at the goal location
(see Figure 6.23).
Most of the words were invented in the first half of the interactions with the robots reusing
words already invented when they covered the same locations in the world (see Figure 6.26). The
value of the toponym for the current location increased through the interactions. The language
resulting for each agent in one of the runs is shown in Figure 6.27. With checksum error detection,
new words acquired by the robots were restricted to those that were invented by the robots.
a) b)
Figure 6.26 Interactions (1C: checksum)
a) Word Usage over the Interactions of the Robots b) Value of Toponym
at Interaction Location. The words were invented in two stages as the robots
explored the world together. The second stage corresponded to the robots
interacting in different areas in the world. The value of the toponym increased to
between 0.2 and 0.8. The large spread of values was due to the influence of
neighbourhood size on the value of the toponym. See the Figure 6.13 caption for
more detail about the elements of the figure.
Robot 1 Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.27 Shared language (1C: checksum)
An example language for both robots of one run, with nine words used by each
robot. The area for which each word was used was very similar between the robots,
for example, the two words covering triangle shaped areas in the middle of the
world (‘tilo’ and ‘loto’). Note that all of the words were within the world of the
robots. See the Figure 6.12 caption for more detail about the elements of
the figure.
6.3.3 Discussion
Study 1C: Real World showed that ‘where are we’ games can be played with robots in the real
world and that the simulation world results were reproducible in the real robots. In both conditions
(minimal and checksum error detection) the robots developed a shared set of toponyms. The
languages formed by the real robots covered smaller areas than those formed by the simulated
robots due to the smaller size of the room in the real world. The coverage of each toponym was
similar between the simulated and real robots when comparing the minimal error detection to the
medium neighbourhood size and comparing the checksum error detection to the small
neighbourhood size.
The toponyms formed by the robots using checksum error detection were all situated in
locations in which the robots interacted, with most of the words situated in the large open area to
the top right of the room, where the robots interacted most often (see Figure 6.27c). In comparison,
human terms used to describe areas in the room are related to interactions between people, actions
that occur at locations, or ownership of the locations. Some examples of human terms for areas in
the world are the ‘entry’ in the top right, the discussion space in the top left, and Ruth’s desk in the
bottom right corner (see Figure 6.28). Open areas such as the one above the fridge do not have their
own name, as interactions between people tend not to occur in this location. As the only actions of
the robots involve building internal maps and interacting with other robots, interaction locations are
currently the only places that can be labelled by the robots.
Figure 6.28 Real world with human labels
Human terms referring to locations in an office environment relate to interactions
that take place (the discussion space), to actions that occur (the entry), or to
ownership (Ruth’s desk). Some areas do not have specific names, such
as the open space above the fridge where few interactions occur.
With minimal error detection, the robots misheard each other regularly, resulting in the hearer
adding a new word when the speaker had used an existing word. Despite the difficulties with real
robot implementation, the robots using minimal error detection were able to form useful toponymic
languages, meeting each other at a goal location in 58.7% of the ‘go to’ games played. Adding
checksum error detection resulted in more coherent languages with a higher success rate for the ‘go
to’ games with the robots meeting each other in 70% of the games played.
6.4 Discussion: A Toponymic Language
Study 1 addressed the question of how social interactions impact on toponymic languages, and has
shown how a structured toponymic language can be formed from simple social interactions. The
language game method (Steels, 2001) was extended to a location language game method where
mobile robots share attention by being located near each other. In playing ‘where are we’ games,
the actions of the robots were to interact when they could hear each other and to associate words
used with the current location. A toponymic language that described all the locations visited by the
robots was formed through these actions, resulting in the construction of specific place from general
space (Tuan, 1975), and in toponyms becoming landmarks used to describe ‘where’ (Tversky,
2003).
The toponymic languages allowed goal locations to be set simply by specifying the associated
label. A toponym is easier to communicate than either the visual information at a location, since
many locations in the world may have similar views, or exact co-ordinates, which require the same
detailed map to be shared between the robots.
The experience map used in this study for concept representation required the design of a
method for concept formation: The Distributed Lexicon Table. Rather than the formation of
categories prior to language learning (Bodik & Takac, 2003; Smith, 2001; Steels & Loetzsch,
2007), the distributed lexicon table, with methods for updating, producing, and comprehending
words, allowed concept and word formation to interact.
The distributed lexicon table method for concept formation and word usage combines the rapid
learning from exemplars of lexicon tables (Smith, 2001; Steels, 1999) with the generalisation of
neural networks (Batali, 1998; Cangelosi, 2001; Kirby & Hurford, 2002; Tonkes et al., 2000). Concepts
result from the associations between concept elements and words as well as the methods for
producing words and comprehending concepts.
In the simulation and real world studies, words were chosen using the relative neighbourhood
most informative strategy, which was developed for use with the distributed lexicon table. This
strategy allowed the words to be more evenly spread across concept space, but new words were
always adopted, becoming the most informative word for the concept they were first used for.
While the languages formed throughout Study 1 using the relative neighbourhood most informative
strategy could be used successfully, other methods may well provide a better balance between
specificity and stability of the lexicon.
The location concepts were formed while words were associated with the concept elements, and
the areas covered by the concepts were determined by the interactions of the robots in the world.
The studies described in this chapter demonstrated the co-development of concepts and words, and
the way in which words and interactions between agents can affect the concepts that form.
The inclusion of checksum error detection showed that in order to form a coherent language in
the real world, agents need to put additional effort into making sure that they can understand each
other.
In the following chapter, a study including the formation of the spatial concepts of direction and
distance is presented. Distance and direction concepts allow the agents to talk about locations in
space other than their current location.
Chapter 7 A Generative Spatial Language Game
Homer: What’s an e-mail?
Lenny: It’s a computer thing, like, er, an electric letter.
Carl: Or a quiet phone call.
(Groening, 2000)
A key challenge for embodied language games is for the agents to refer to locations other than those
they have visited. This challenge requires both relational terms and the ability to take into account
the agents’ different perspectives. The ‘where is there’ game, adapted from previous spatial
language games (Bodik & Takac, 2003; Steels, 1995), is based on naming three locations: Both
agents are located within hearing distance at the first (current) location, they are facing the second
(orientation) location, hence aligning their perspectives, and then they talk about a third (target)
location (see Figure 7.1). Given the three locations, agents can describe the target location with
spatial words of distance and direction. The ‘where is there’ game allows agents to talk about places
that they have never visited or can never visit.
Figure 7.1 A generative language game
The agent is at Current facing Orientation and talking about Target: toponyms are
selected for the current, orientation, and target locations, and spatial words are
selected for the direction, θ, and distance, d.
The ‘where is there’ game relies on agents having some toponyms to describe locations. The
minimum number of toponyms required when no spatial language exists is three, with one for each
of the current, orientation, and target locations. When there are spatial words to describe the
direction and distance, only two are needed, with one for each of the current and orientation
locations, and a word may be invented for the target location. Direction and distance are calculated
from the current, orientation, and target locations. For directions and distances, each concept
element is a range of directions or distances.
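Given the three locations, the distance d and direction θ of Figure 7.1 follow from plane geometry: θ is measured relative to the shared heading from the current location towards the orientation location. A minimal sketch, assuming 2-D coordinates (the function name is illustrative):

```python
import math

def spatial_relation(current, orientation, target):
    """Distance d and direction theta of the target, as in Figure 7.1.
    Theta is relative to the heading from current towards orientation,
    which is how the two agents align their perspectives."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    d = math.hypot(dx, dy)
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    theta = math.atan2(dy, dx) - heading
    # Normalise to (-pi, pi]
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return d, theta

# Facing along the x-axis, a target at (3, 3) lies 45 degrees to the
# left at distance sqrt(18).
d, theta = spatial_relation((0, 0), (1, 0), (3, 3))
```

Because both agents compute θ from the same current and orientation locations, they obtain the same direction regardless of where each robot is individually facing.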
Spatial words can be used to create a template, given the current location and orientation. The
spatial words template describes the locations in the world that are referred to when using the
combination of distance and direction words at the current location and orientation. When a specific
location is required, the ‘best’ location of the combination of spatial words can be determined. A
measure used for the ‘where is there’ games is the match between the toponym and spatial words
template, found by considering how well the toponym template matches the spatial words template
given the current location and orientation (see a two dimensional example in Figure 7.2). The match
between the toponym and spatial templates is an indication of how appropriate the spatial words are
for the current situation, calculated as follows:
match = Σ_{l=1}^{L} min(tt_l, ts_l)                                     Equation 7.1

where tt_l is the value of the toponym template at location l, ts_l is the value of the spatial
template at location l, and L is the number of locations over which the match is calculated.
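Equation 7.1 can be read as a discrete overlap between the two templates. A minimal sketch, assuming each template is represented as a list of values over the same discretised set of locations:

```python
def template_match(toponym_template, spatial_template):
    """Match between a toponym template and a spatial words template
    (Equation 7.1): the sum over locations of the minimum of the two
    template values at each location."""
    return sum(min(tt, ts)
               for tt, ts in zip(toponym_template, spatial_template))

# Overlapping templates give a positive match; disjoint templates give zero.
a = [0.0, 0.5, 1.0, 0.5]
b = [0.5, 1.0, 0.5, 0.0]
overlap = template_match(a, b)
```

A good match (largely overlapping templates) yields a value close to the templates' total mass; a bad match (templates peaked at different locations, as in Figure 7.2) yields a value near zero.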
Figure 7.2 Match between templates
A good and a bad match between toponym and spatial words templates in two
dimensions.
The purpose of Study 2 was to investigate the formation of spatial terms grounded in the spatial
representations of robots. The aims of Study 2 were to determine what was required for the
formation of spatial terms and to determine the effect of conceptualisation order on the resulting
languages. The robots’ task was to form a spatial language and to label locations. The games played
in Study 2 were the ‘where are we’, ‘go to’, and ‘where is there’ games. The ‘where are we’ games
were used in the initial formation of the toponym lexicon. The ‘go to’ games were used to influence
the behaviour of the robots, so that games were played more frequently, and to test the coherence of
the languages. The ‘where is there’ games allowed the agents to form distance and direction
lexicons and to use these to invent new target toponyms.
Chapter 7 deals with the design and implementation of a generative spatial language game,
conducted first in simulation and then on real robots, investigating the formation of spatial terms
grounded in experience and behaviour5. The studies described are:
• Study 2A: Grid world,
• Study 2B: Simulation world, and
• Study 2C: Real world.
The final section is a general discussion of the generative language game.
7.1 Study 2A: Grid World
The spatial language game was first implemented in the grid world to investigate how spatial
concepts could be formed, given a simple spatial representation of the world (for more detail about
the grid world refer to section 4.5.1). The aims of Study 2A were to determine the effect of the
following features of spatial language games on the resulting languages:
• the size of the grid world,
• obstacles in the world, and
• generations of agents.
In Study 2A agents played ‘where are we’ games in grid worlds of varying sizes, with various
obstacles. The representations used for location concept elements were the grid squares. Distance
and direction concepts were calculated from the current, orientation, and target squares. There were
50 direction elements and 50 distance elements. Words were represented as integers. For each game,
the speaker and hearer were chosen randomly. The speaker was placed in a random square, and the
hearer was placed in a square within the neighbourhood of the speaker. The hearing distance and
neighbourhood size were a small diamond of five squares. Words were invented probabilistically
with a temperature of 0.25. The distributed lexicon table was used to associate words and concepts
(for more detail refer to section 4.3.4). Associations between words and concepts were increased by
adding 1.0. Forgetting was implemented by subtracting 0.2 from unused associations, with a
minimum association value of 0.0. For each experiment the interactions, language size, and
language coherence are presented, together with an example
5 This chapter covers in more detail the work presented in Studies 2 and 3 of Schulz, R., Prasser, D., Stockwell, P.,
Wyeth, G., & Wiles, J. (2008). The formation, generative power, and evolution of toponyms: Grounding a spatial
vocabulary in a cognitive map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore: World Scientific Press. The
work presented in the paper was done under the supervision of Janet Wiles and Gordon Wyeth, and with design
discussion and writing assistance from David Prasser and Paul Stockwell.
language for each condition (for more detail about the performance measures refer to section 4.6).
For a summary of the parameters for Study 2Ai-iii, see Table 7.1.
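The association updates just described (reinforce a used pairing by 1.0, subtract 0.2 from unused associations, floor at 0.0) can be sketched as follows. This is an illustrative reading of the distributed lexicon table, not the thesis implementation; class and method names are invented for the example, and forgetting is applied here only to competing words for the same concept.

```python
from collections import defaultdict

class LexiconTable:
    """Minimal sketch of the distributed lexicon table updates in Study 2A.
    Values follow the text: +1.0 reinforcement, 0.2 forgetting, and a
    minimum association of 0.0. Names are illustrative."""

    def __init__(self):
        self.assoc = defaultdict(float)  # (word, concept) -> association

    def update(self, word, concept):
        # Reinforce the word-concept pair that was just used.
        self.assoc[(word, concept)] += 1.0
        # Forgetting: subtract 0.2 from this concept's unused associations,
        # never dropping below the minimum association value of 0.0.
        for (w, c) in list(self.assoc):
            if c == concept and w != word:
                self.assoc[(w, c)] = max(0.0, self.assoc[(w, c)] - 0.2)

    def best_word(self, concept):
        # The word most strongly associated with a concept, if any.
        candidates = {w: v for (w, c), v in self.assoc.items() if c == concept}
        return max(candidates, key=candidates.get) if candidates else None
```

Repeated use of one word for a concept strengthens that pairing while its competitors decay toward zero, which is how a shared mapping stabilises over many games.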
Table 7.1 Parameters for Study 2A
Parameters | 2Ai: World size | 2Aii: Obstacles | 2Aiii: Generations
Game | ‘where are we’ and ‘where is there’ | ‘where are we’ and ‘where is there’ | ‘where are we’ and ‘where is there’
Hearing distance | Diamond (5 squares) | Diamond (5 squares) | Diamond (5 squares)
Concept type | Location, distance, direction | Location, distance, direction | Location, distance, direction
Concept representation | Squares of grid | Squares of grid | Squares of grid
Word representation | Integers | Integers | Integers
Lexicon technique | Distributed lexicon table | Distributed lexicon table | Distributed lexicon table
Strategy for choosing words | Neighbourhood most informative | Neighbourhood most informative | Neighbourhood most informative
Neighbourhood size | Diamond (5 squares) | Diamond (5 squares) | Diamond (5 squares)
Forgetting | Yes | Yes | Yes
Updating | Hearer only | Hearer only | Hearer only
Strategy for word invention | Temperature | Temperature | Temperature
Temperature | 0.25 | 0.25 | 0.25
Generations | 1 | 1 | 50, 25, 10
Agents | 2 | 2 | 2
Interactions per generation | 10,000 | 10,000 | 1000, 2000, 5000
Initial learning period | 0 | 0 | 500, 1000, 2500
World | Grid World | Grid World | Grid World
Size | 5 × 5, 10 × 10, 15 × 15, 20 × 20 | 15 × 15 | 15 × 15
Obstacles | None | None, desks, perimeter | Desks
7.1.1 Study 2Ai: World Size
Study 2Ai: World Size investigated the influence of the size of the grid world. The aim of the study
was to determine whether world size affected the lexicon size, the coverage of words, and the
coherence of the resulting languages.
Experimental Setup: In Study 2Ai, two agents played location and spatial language games in
empty grid worlds of 5 × 5, 10 × 10, 15 × 15, and 20 × 20 squares. There were five runs of 10,000
interactions for each world size. The number of concepts formed, the agents’ coherence, and the
results of the language games were compared. For a summary of parameters, see Table 7.1.
Results: In each world size, a toponymic language formed, with toponyms covering the space
relatively uniformly. The number of toponyms used increased with world size, and the area covered
by each toponym increased with world size, with 3.8 squares per toponym on average for the 5 × 5
world, up to 8.4 squares per toponym for the 20 × 20 world (see Table 7.2). The number of spatial
words increased with the number of toponyms. With more toponyms, more distinctions can be
made about the distances and directions between toponyms. The value of the toponym, the match
between templates, and the coherence rose more quickly for smaller world sizes, with agents in
larger worlds taking longer to reach a successful and coherent language (see Figure 7.3 and Figure 7.4).
Table 7.2 Results for Study 2Ai
Measure (x̄ (σ)) | 5 × 5 | 10 × 10 | 15 × 15 | 20 × 20
Toponyms Invented | 7.0 (2.4) | 23.2 (2.0) | 36.0 (3.3) | 51.4 (4.2)
Toponyms Used | 6.6 (2.1) | 21.7 (2.1) | 34.1 (3.3) | 47.4 (5.1)
Squares per Toponym Used | 3.8 (2.0) | 4.6 (2.3) | 6.6 (2.9) | 8.4 (3.9)
Distance Words | 3.6 (1.7) | 6.0 (2.0) | 7.2 (2.0) | 8.8 (1.5)
Direction Words | 3.6 (1.7) | 6.0 (2.0) | 7.2 (2.0) | 8.8 (1.5)
Figure 7.3 World size results (2Ai)
The toponym value at the interaction location for the toponym language games and
the match between the toponym and spatial templates for the generative language
games for a) 5 × 5, b) 10 × 10, c) 15 × 15, and d) 20 × 20 grid worlds for the
hearer agent, averaged every 100 games over the five runs. The larger the world, the
longer it took to reach high levels for the toponym value and the match between
templates, and the lower the stable value of each.
Figure 7.4 World size coherence (2Ai)
The coherence of the toponyms, direction and distance words for a) 5 × 5, b) 10 ×
10, c) 15 × 15, and d) 20 × 20 grid worlds, averaged over each of the 5 runs,
recorded every 1000 games. The number of games required to reach toponym
coherence increased with world size. The dip in distance and direction coherence in
the 5 × 5 world at 8000 games was due to three of the five runs inventing spatial
words between 7000 and 8000 games. When a word was invented, the coherence
was reduced until the word had propagated through both agents’ lexicons.
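The exact coherence measure is defined in section 4.6; as an illustrative proxy only, coherence can be read as the fraction of concepts for which two agents prefer the same word, which also shows why a freshly invented word depresses coherence until it propagates:

```python
def coherence_proxy(lexicon_a, lexicon_b, concepts):
    """Illustrative proxy for language coherence (the thesis defines the
    actual measure in section 4.6): the fraction of concepts for which
    both agents' highest-associated word agrees. Each lexicon maps
    concept -> {word: association}."""
    agree = sum(
        1
        for c in concepts
        if max(lexicon_a[c], key=lexicon_a[c].get)
        == max(lexicon_b[c], key=lexicon_b[c].get)
    )
    return agree / len(concepts)
```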
In the example languages for each world size (see Figure 7.5, Figure 7.6, Figure 7.7, and Figure
7.8), the associations of each toponym, direction, and distance word are shown. Most toponyms
were distinct and covered single areas in the world. In larger world sizes, some toponyms covered
multiple areas. Most distance and direction words were specific, although a few covered larger
areas of concept space, particularly direction words covering the area behind the agent. Direction
words covering the area behind the agent were general due to the structure of the language game:
agents only faced areas within the world, which resulted in most targets being in front of the agent.
a) Toponym lexicon: Each square represents the grid world and refers to one toponym. For each toponym, the associations are shown, with the most associated square black.
b) Distance lexicon: Each row represents distances, with left being ‘close’ and right being ‘far’. For each distance word the associations are shown.
c) Direction lexicon: Each square represents the direction concepts, with straight ahead being at the top, and behind being at the bottom. For each direction word, the associations are shown.
Figure 7.5 Example language (2Ai: 5 × 5)
The language of one of the agents in the 5 × 5 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Each of the five toponyms covered a
different area in the world. Three of the four distances were similar, being ‘close’,
while one was ‘far’. The four directions can be interpreted as ‘right’, ‘front’, ‘left’,
and ‘general’.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.6 Example language (2Ai: 10 × 10)
The language of one of the agents in the 10 × 10 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Each of the 20 toponyms covered an
average of five squares. Each of the six distances covered a different spread of
possible distances. The six directions could be termed: ‘right’, ‘front’, ‘front right’,
‘left’, ‘front left’, and ‘behind’. See the Figure 7.5 caption for more detail about the
elements of the figure.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.7 Example language (2Ai: 15 × 15)
The language of one of the agents in the 15 × 15 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Each of the toponyms covered
approximately the same area in the world. See the Figure 7.5 caption for more
detail about the elements of the figure.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.8 Example language (2Ai: 20 × 20)
The language of one of the agents in the 20 × 20 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Note that some of the toponyms referred
to multiple locations in the world. See the Figure 7.5 caption for more detail about
the elements of the figure.
7.1.2 Study 2Aii: Obstacles
Study 2Aii: Obstacles investigated the influence of obstacles in the world. The aim of the study was
to determine if agents could form toponyms in locations covered by obstacles and to determine the
influence of the obstacles on the resulting lexicons for toponyms, directions, and distances.
Experimental Setup: In Study 2Aii, two agents played location and spatial language games in a
15 × 15 grid which either had a perimeter in which the agents could move or had ‘desks’ through
the world (see Figure 7.9). There were five runs of 10,000 interactions for each world. The number
of concepts formed, the coherence, and the results of the language games were compared. The
languages were also compared to the languages from the empty 15 × 15 grid world used in the
world size study. For a summary of parameters see Table 7.1.
Figure 7.9 Grid world with obstacles
The grid world with obstacles of a) desks and b) a perimeter. The agents may
occupy any square not covered by an obstacle.
Results: In the empty world, the world with desks, and the perimeter world, the rate of word
invention was highest for the first 100 interactions, and agents continued to invent words throughout
each trial. The toponyms invented and used by the agents in the empty world were all specific;
some of the toponyms used by agents in the world with desks were general; and about half of the
words in the perimeter world were general. The average final lexicon in the empty world had 36.0
toponyms, in the world with desks 41.0 toponyms, and in the perimeter world 42.0 toponyms
(see Table 7.3). There were more toponyms in the world with desks and in the perimeter world
because they included general toponyms, which cover similar areas.
Table 7.3 Results for Study 2Aii
Measure (x̄ (σ)) | Empty | Desks | Perimeter
Toponyms Invented | 36.0 (3.3) | 41.0 (3.5) | 42.0 (5.5)
Toponyms Used | 34.1 (3.3) | 24.9 (2.1) | 29.4 (3.7)
Squares per Toponym Used | 6.6 (2.9) | 9.0 (5.4) | 7.7 (4.8)
Distance Words | 7.2 (2.0) | 10.6 (2.0) | 17.0 (7.3)
Direction Words | 7.2 (2.0) | 10.6 (2.0) | 17.0 (7.3)
The toponym value at the interaction location was higher in the worlds with obstacles (see
Figure 7.10), as the interactions occurred at a subset of the possible locations (81 squares for desks
and 56 squares for perimeter compared to 225 for empty) and the words referring to the locations
visited by the agents were more specific. The match between the templates for the ‘where is there’
game increased to about 0.6, compared to about 0.55 for the empty world. There was only a
minimal change in the match between templates, as the size of the spatial templates changed
together with the size of the toponym templates (see example languages in Figure 7.12 and Figure 7.13).
Figure 7.10 Obstacles results (2Aii)
The toponym value at the interaction location for the toponym language games and
the match between the toponym and spatial templates for the generative language
games for a) the world with desks and b) the perimeter world. Compare to the
empty 15 × 15 world in the previous experiment (Figure 7.3c). The value of the
toponym was higher for both, as fewer squares were visited by the agents when
there were obstacles in the world.
Due to the increasing number of words and the general nature of some of the words, the agents
in the world with desks and the perimeter world obtained a lower level of coherence for each of the
types of words compared to the empty world (see Figure 7.11).
Figure 7.11 Obstacles coherence (2Aii)
The coherence of toponyms, distances, and directions in a) the world with desks
and b) the perimeter world. Compare to the empty 15 × 15 world in the previous
experiment (Figure 7.4c). The coherence was much lower for the perimeter world
as there was much greater uncertainty in this world.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.12 Example language (2Aii: Desks)
The language for one of the agents in the world with desks for a) toponym
lexicon, b) distance lexicon, and c) direction lexicon. Note the specific and
general toponyms in the lexicon, where general toponyms tended to be used in the
area of space covered by one desk. See the Figure 7.5 caption for more detail
about the elements of the figure.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.13 Example language (2Aii: Perimeter)
The language for one of the agents in the perimeter world for a) toponym lexicon,
b) distance lexicon, and c) direction lexicon. Note the specific and general
toponyms in the lexicon: specific words were confined to the perimeter of the
world, while general words were used in the interior. See the Figure 7.5 caption
for more detail about the elements of the figure.
7.1.3 Study 2Aiii: Generations of Agents
Study 2Aiii: Generations of Agents investigated the influence of generations. The aim of the study
was to determine how the lexicons for toponyms, distances, and directions changed through
generations of agents with respect to lexicon size, coherence, and the types of concepts.
Experimental Setup: In Study 2Aiii, agents played location and spatial language games in a 15
× 15 grid with desks. The number of concepts formed, the coherence of the agents, and the results
of the language games across the generations were compared. Each generation consisted of a fixed
number of interactions, g. In the initial population, two agents played negotiation games. In
subsequent generations, the older agent was replaced by a new agent, which initially played as
hearer only. After g/2 interactions, the agents played negotiation games. Three conditions were
tested, based on g = 1000, g = 2000, and g = 5000, each consisting of five trials of 50,000 interactions.
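The generational schedule can be sketched as follows. Only the interaction counts follow the text; the agent factory and the two game-playing callbacks are placeholders for the actual negotiation and hearer-only learning routines.

```python
def run_generations(num_generations, g, make_agent, play_learning, play_negotiation):
    """Sketch of the generational turnover in Study 2Aiii. Each generation
    lasts g interactions. The initial pair negotiates for the whole
    generation; in later generations, the new agent first plays as hearer
    only for g/2 interactions before negotiation resumes."""
    older, newer = make_agent(), make_agent()
    for gen in range(num_generations):
        learning = 0 if gen == 0 else g // 2
        for _ in range(learning):
            play_learning(speaker=older, hearer=newer)  # hearer-only phase
        for _ in range(g - learning):
            play_negotiation(older, newer)              # negotiation phase
        older, newer = newer, make_agent()              # replace the older agent
    return older, newer
```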
Results: The first generation in each trial formed its language through negotiation, in which
the value of the toponym at the interaction location and the match between toponym and spatial
templates increased as the languages were formed. Over generations, specific toponyms tended to
remain stable, as did the concepts for directions and distances, while the more general toponyms
shifted to become more specific (see Figure 7.14).
Figure 7.14 Toponym change throughout generations (2Aiii)
Each row (a-e) shows the change in meaning of a toponym, represented as most-informative
templates, through ten generations of agents with g = 2000. In a row, each square
represents the world of the agents and shows the locations in the world for which
the toponym of the row provides the most information. Each row shows a different
type of toponym: a) a specific toponym that did not alter much throughout the
generations, b) a toponym that initially referred to multiple specific locations, but
only referred to one location after several generations, c) a specific word that
became more general, d) a general word that remained general but shifted in
meaning, and e) a general word that became more specific. The areas in the world
associated with general words were more likely to change throughout generations
as they were reinterpreted by the new agents in the population.
For the ‘where are we’ games, the toponym value for the interaction location was just under 0.8
for g = 1000 and just over 0.8 for g = 5000. For the ‘where is there’ games, the match between
spatial templates was just over 0.5 for g = 1000 and just under 0.6 for g = 5000. As a new agent
entered the population, it began by learning from the older agent, which caused a drop in both
measures; each measure quickly returned to a high level as the new agent learned the language (see
Figure 7.15).
Figure 7.15 Generations results (2Aiii)
Toponym value at the interaction location for the toponym language game, and
match between the toponym and spatial templates for the generative language
game over generations for a) g = 1000, b) g = 2000, and c) g = 5000. The drop in
toponym value and match between templates occurred at the changeover of
generations as the new agent learned the older agent’s language. Over the first few
generations there was a gradual increase in the steady value of the toponym and
match between templates.
The language coherence was calculated for each of the lexicons (toponyms, distances,
directions) after every 1000 games (see Figure 7.16). The toponym coherence remained fairly stable
for all values of g, between about 0.8 and 0.9. The coherence of the distances and directions
decreased for all values of g, most notably for direction words with g = 1000. The coherence of a
language decreased when new words were invented and increased as the meanings for words were
agreed upon between the agents. When g = 1000, new distance and direction words were being
invented in each generation. When g = 2000 and g = 5000, agents had longer to learn the words
used in the previous generation, resulting in higher coherence.
Figure 7.16 Generations coherence (2Aiii)
Coherence of languages over generations for a) g = 1000, b) g = 2000, and c) g =
5000. For each condition, the coherence of the toponyms remained stable between
0.8 and 0.9. The coherence of the distances and directions dropped throughout the
interactions as more words were invented for which the meanings did not have
time to become coherent.
There was an increase in the number of toponyms invented and used as the agents had more
interactions per generation (see Table 7.4). The increase in the number of toponyms was
accompanied by a corresponding increase in the number of distance and direction words.
Table 7.4 Results for Study 2Aiii
Measure (x̄ (σ)) | g = 1000 | g = 2000 | g = 5000
Toponyms | 40.0 (2.1) | 44.8 (2.0) | 47.0 (4.1)
Toponyms Used | 26.6 (1.9) | 28.5 (1.4) | 28.7 (1.9)
Squares per Toponym Used | 8.5 (5.5) | 7.9 (5.3) | 7.8 (5.4)
Distance Words | 26.2 (3.1) | 29.7 (4.3) | 35.3 (2.2)
Direction Words | 32.5 (4.1) | 31.4 (4.7) | 35.8 (1.9)
7.1.4 Discussion
The purpose of Study 2A was to explain how spatial concepts can be formed through interactions
with other agents, and to investigate the impact of different world sizes, obstacles in the world, and
population dynamics on the languages formed.
The significance of the world size study was that toponyms could be formed in any world size,
and direction and distance words could refer to the spatial relations between them. The significance
of the obstacles study was that the agents could refer to places in the world that had never been
visited. The words for locations never visited tended to be more general, as they were only referred
to indirectly, never through direct interaction. Agents were able to refer to concepts that had not
been directly experienced. Generations allowed the population to forget words that were used less
often, and allowed words that were used more often to spread through the concept space. Words
referring to specific locations were transferred through the generations, while the meanings of
words referring to general locations shifted, compared to a single generation, in which words
referring to both specific and general locations remained fairly static after they had been formed
and used several times.
Studies 2Ai-iii showed how a generative toponymic language may form and evolve in a
population of agents. Agents were able to form concepts for locations, directions, and distances as
they interacted with each other and associated words with underlying values. Relationships between
existing concepts were used to expand the concept space to new locations. The following sections
extend the grid world study into the simulation and real worlds.
7.2 Study 2B: Simulation World
Study 2B involved the implementation of the ‘where are we’, ‘where is there’, and ‘go to’ games in
the simulation world (for more detail refer to section 4.5.2). The additional challenge of the
simulation world was that the representations were formed individually by each simulated robot. An
investigation was undertaken into how the conceptualisation order for toponyms, directions, and
distances affected the resulting language.
The aim of Study 2B was to determine whether the representations of a simulated robot with an
experience map were appropriate for the formation of a toponymic and spatial language using the
interactions of the ‘where is there’ language game. Additionally, the study investigated whether
conceptualisation order made a difference to the languages that formed. Study 1 showed that a
toponymic language may be formed through ‘where are we’ interactions and Study 2A showed that
a spatial language may be formed when the agents have simple and matching concept
representations. Study 2B extends Study 2A with simulated robots that formed more complex
representations of space that differed between the robots.
7.2.1 Experimental Setup
As in Study 1B, the concept representations used were the robots’ RatSLAM experiences, formed
using a forward-facing camera; words were represented as strings; the robots autonomously
wandered through the world and played a game when they were close to each other; the distributed
lexicon table was used to associate words and concepts (for more detail refer to section 4.3.4); the
relative neighbourhood most informative strategy was used to choose words; the temperature
strategy was used for word invention; both the speaker’s and the hearer’s lexicons were updated in
every game; and forgetting was not directly implemented.
In addition to the use of experiences as concept elements, pseudo-experiences were used. As
experiences do not cover all locations, the best location referred to by a distance and direction
concept may be in a part of the experience map co-ordinate space where no experiences are nearby.
When a location is referred to that does not have a nearby experience, a pseudo-experience is placed
in the map and linked to two experiences that are close to the location that was referred to, and the
pseudo-experience is associated with the target word. When map correction is performed on the
experience map, the pseudo-experiences are moved, based on the new locations of the experiences
they are linked to.
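The placement step above can be sketched as follows, under assumed representations: experiences are treated as dicts with x, y map coordinates, and the distance threshold and linking callback are illustrative stand-ins for the experience map machinery.

```python
import math

def place_pseudo_experience(target_xy, experiences, word, link, threshold=1.0):
    """Sketch of pseudo-experience placement in Study 2B. If no real
    experience lies within `threshold` of the referred-to location, a
    pseudo-experience is created there, linked to the two nearest real
    experiences (so map correction can later move it with them), and
    associated with the target word. Names and threshold are illustrative."""
    def dist(e):
        return math.hypot(e["x"] - target_xy[0], e["y"] - target_xy[1])

    nearest = sorted(experiences, key=dist)
    if nearest and dist(nearest[0]) <= threshold:
        return nearest[0]  # an existing experience already covers the location
    pseudo = {"x": target_xy[0], "y": target_xy[1], "word": word, "pseudo": True}
    for anchor in nearest[:2]:
        link(pseudo, anchor)  # link to the two closest real experiences
    return pseudo
```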
The simulated robots’ languages were monitored using the final language layout, the language
size, the spread of words throughout the world, the toponym value at the interaction location for
the ‘where are we’ games, and the match between the spatial and toponym templates for the ‘where
is there’ games (for more detail about the performance measures refer to section 4.6).
The hearing distance in Study 2B was 3 m, the neighbourhood size was 5 m, and the temperature
was 0.5 for toponyms and spatial words and 0.4 for target words. These temperatures kept the rate
of word invention moderate. Within a trial the conceptualisation order was fixed, and the two
simulated robots negotiated a set of words. Two conditions were tested:
• separate: robots played ‘where are we’ games to form a toponymic language first, followed by ‘where is there’ games, and
• together: robots played both ‘where are we’ and ‘where is there’ games from the start.
Each condition was run for five runs of 2000 interactions. In both cases, ‘go to’ games were
played to change the behaviour of the robots so that the interactions were completed more quickly.
For a summary of the parameters see Table 7.5.
Table 7.5 Parameters for Study 2B
Parameters | Study 2B: Separate | Study 2B: Together
Game | ‘where are we’, ‘where is there’, and ‘go to’ | ‘where are we’, ‘where is there’, and ‘go to’
Hearing distance | 3 m | 3 m
Concept type | Location, distance, direction | Location, distance, direction
Concept representation | Experiences (simulation world, forward facing camera) | Experiences (simulation world, forward facing camera)
Word representation | Text | Text
Lexicon technique | Distributed lexicon table | Distributed lexicon table
Strategy for choosing words | Relative neighbourhood most informative | Relative neighbourhood most informative
Neighbourhood size | 5 m | 5 m
Forgetting | No | No
Updating | Hearer and Speaker | Hearer and Speaker
Strategy for word invention | Temperature | Temperature
Temperature | Toponym = 0.5, Spatial = 0.5, Target = 0.4 | Toponym = 0.5, Spatial = 0.5, Target = 0.4
Generations | 1 | 1
Agents | 2 | 2
Interactions per generation | 1000 ‘where are we’ followed by 1000 ‘where are we’ and ‘where is there’ | 2000 ‘where are we’ and ‘where is there’
Initial learning period | 0 | 0
World | Simulation World | Simulation World
7.2.2 Results
In both conceptualisation order conditions, the simulated robots developed a shared set of
toponyms, directions, and distances. The distance and direction words covered the space of
directions and distances that could be referred to. The condition where both types of games were
played from the start resulted in larger lexicons for toponyms, directions, and distances (see Table
7.6). In both conditions, words were invented for targets when the value of existing words was low
for the chosen locations. In some cases, the targets were beyond the area that the simulated robots
were able to explore. The average area covered by the languages was 551.2 m² for separate and
418.3 m² for together. The area covered by the languages was much larger than when the simulated robots
only played ‘where are we’ games, where just over 300 m² was covered (compare to Table 6.3).
With the addition of the ‘where is there’ game, the simulated robots were able to form concepts for
areas beyond the walls of their world, resulting in languages covering larger areas. The ‘go to’
games were successful for both conditions, with the simulated robots meeting each other at the goal
location in 87.2% of the games for separate and 84.0% of the games for together (see Figure 7.17).
Table 7.6 Results for Study 2B
Measure (x̄ (σ)) | Separate | Together
Number of Toponyms | 44.9 (7.9) | 60.3 (8.4)
Toponyms Invented as Target | 25.2 (9.1) | 37.0 (8.9)
Area Covered per Toponym Used (m²) | 16.3 (18.4) | 12.3 (14.1)
Area Covered by Language (m²) | 551.2 (155.4) | 418.3 (114.3)
Toponym Coherence | 0.16 (0.08) | 0.04 (0.03)
Direction Words | 13.6 (2.2) | 22.2 (2.3)
Distance Words | 13.5 (2.2) | 22.2 (2.3)
Direction Coherence | 0.42 (0.15) | 0.10 (0.07)
Distance Coherence | 0.31 (0.21) | 0.05 (0.09)
Figure 7.17 Results of ‘go to’ games (2B)
The x-axis shows the possible results of the ‘go to’ games. The y-axis is the
percentage of the total games with that result. For ‘separate’, the simulated robots
met each other at the goal location in 87.2% of the games, with 33.2% within 1m.
For ‘together’, the simulated robots met each other at the goal location in 84% of
the games, with 20.4% within 1m.
Separate
For the simulated robots forming toponyms and spatial words separately, the average number of
words used after 2000 interactions was 44.9. Of these words, an average of 25.2 were invented for
the target location. An average of 13.5 distance words and 13.6 direction words were used. The
toponymic language resulting for both agents in one of the runs is shown in Figure 7.18, with the
spatial lexicon in Figure 7.19, and the results of the agents’ interactions in Figure 7.20 and Figure
7.21.
Together
For the simulated robots forming toponyms and spatial words together, the average number of
words used after 2000 language games was higher at 60.3. Of these words, an average of 37.0 were
invented for the target location. An average of 22.2 distance and direction words were used, much
higher than for the separate conceptualisation order. The toponymic language resulting for each
agent for one of the runs is shown in Figure 7.22, with the spatial lexicon in Figure 7.23, and the
results of the agents’ interactions in Figure 7.24 and Figure 7.25.
7.2.3 Discussion
Study 2B investigated the impact of conceptualisation order for toponyms, directions, and distances
on the resulting language. The study demonstrated that directions and distances can be formed
either when there was an existing toponymic language, or when a toponymic language was still
being formed. Words and concepts were formed for places that the simulated robots had not visited.
When a stable toponymic language was formed before the ‘where is there’ games were played, the
distance and direction concepts formed covered the range of distances and directions. When a
toponymic language was formed together with the distance and direction concepts, many of the
concepts were ‘close’ and ‘straight ahead’, although a few concepts formed to cover the range of
distances and directions. When the toponymic language was formed separately, the toponyms
formed were used more precisely, with agents meeting each other within 1 m of the goal location a
greater proportion of the time.
With the addition of the ‘where is there’ game to the ‘where are we’ game, the simulated robots
were able to refer to locations beyond the perimeter of their world. The larger coverage of the
language was indicated by the average area covered by all of the toponyms in the agent’s lexicon,
and can be seen in the language layout figures. Words beyond the perimeter of the world were more
general, as they were only ever updated when referred to indirectly, unlike words within the
perimeter of the world, which were updated through direct interaction.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 7.18 Shared language (2B: Separate)
The language for the simulated robots for one run, showing a) the experience map,
b) the language layout, c) the word locations, and d) the word coverage. Note that
the language was no longer restricted to within the walls of the world. The words
on the edge of the layout tended to be larger as there were fewer competing words
in their neighbourhood. See the Figure 6.12 caption for more detail about the
elements of the figure.
Figure 7.19 Spatial lexicon (2B: Separate)
[Panels for Simulated Robot 1 and Simulated Robot 2: a) distance lexicon, b) direction lexicon]
The spatial lexicon showing a) distance and b) direction lexicon. The distance and
direction lexicons had concepts throughout the possible space. See the Figure 7.5
caption for more detail about the elements of the figure.
Figure 7.20 Interactions for ‘where are we’ games (2B: Separate)
a) Value of toponym at interaction location and b) word usage over the ‘where are
we’ interactions of the simulated robots. A fairly stable toponymic language was
formed in the first 1000 interactions. With the addition of ‘where is there’
interactions, the word invention rate increased. See the Figure 6.13
caption for more detail about the elements of the figure.
Figure 7.21 Interactions for ‘where is there’ games (2B: Separate)
a) Match between templates, b) current word usage, c) orientation word usage, d)
target word usage over the ‘where is there’ interactions, e) distance word usage,
and f) direction word usage. Note that while all of the toponyms could be used as
orientations or targets, not all could be used as the current location. The toponyms
never used for the current location were those beyond the walls of the simulated
robots’ world. The words invented early were mostly within the walls of the
world, while those invented late were mostly beyond the walls of the world, and
occupy larger areas due to fewer competing words.
Figure 7.22 Shared language (2B: Together)
[Panels for Simulated Robot 1 and Simulated Robot 2: a) experience map, b) language layout, c) word locations, d) word coverage]
The language for the simulated robots for one run, showing a) the experience map,
b) the language layout, c) the word locations, and d) the word coverage. There
was a mixture of words mostly within the walls of the world and words beyond
the walls of the world that occupied larger areas. See the Figure
6.12 caption for more detail about the elements of the figure.
Figure 7.23 Spatial lexicon (2B: Together)
[Panels for Simulated Robot 1 and Simulated Robot 2: a) distance lexicon, b) direction lexicon]
The spatial lexicon showing a) distance and b) direction. Most of the distance
words referred to close locations, and most of the direction words referred to
straight ahead, due to the formation of the toponymic language together with the
distance and direction concepts. See the Figure 7.5 caption for more detail about
the elements of the figure.
Figure 7.24 Interactions for ‘where are we’ games (2B: Together)
a) Toponym value at the interaction location and b) word usage over the ‘where are
we’ interactions of the simulated robots. Toponyms continued to be invented
throughout the 2000 interactions. See the Figure 6.13 caption for more detail about
the elements of the figure.
Figure 7.25 Interactions for ‘where is there’ games (2B: Together)
a) Match between templates, b) current word usage, c) orientation word usage, d)
target word usage over the ‘where is there’ interactions, e) distance word usage,
and f) direction word usage. There is a gap in the toponyms between 20 and 25 for
current, orientation, and target words – the words in the gap may still be
understood by the agents, but were not used. Again, fewer words were used for the
current toponym than for the orientation and target toponyms. Unlike the separate
condition, there was no correlation between when the words were invented and
whether they were within or beyond the walls of the world.
7.3 Study 2C: Real World
Study 2C involved the implementation of the ‘where are we’ and ‘where is there’ games in the real
robots (for more detail about the real world refer to section 4.5.3). The first language formed in
Study 1C: A Toponymic Language in the Real World was used as the base toponymic language.
Study 2C aimed to determine if the real world issues of noise in the perceptual data of odometry,
vision, and hearing affected the languages that formed by comparing the spatial languages formed
in the real world with those formed in the simulation world. The hypothesis was that the distance
and direction words that formed would be more general than in the simulation world, as there was
more uncertainty in the location of toponyms due to the variation in hearing distance.
7.3.1 Experimental Setup
The concept representations were experiences from RatSLAM with an omni-directional camera.
The word representations for the robots in the real world were DTMF tones. Words and concepts
were associated using the distributed lexicon table (for more detail refer to section 4.3.4). The
robots explored the world, building up representations, and played games when they were within
hearing distance of each other. The robots played ‘where are we’ and ‘where is there’ games in the
real world. Inventing target words was done probabilistically with a temperature of 0.5. Inventing
spatial words was done probabilistically with a temperature of 0.75. The neighbourhood size was
set to 2m. Checksum error detection was implemented as in Study 1C. The lexicon was updated
when words and concepts were used together by increasing the association by 1.0. Forgetting was
not implemented directly. The speaker volume was increased because the larger amount of
information in the ‘where is there’ game was difficult to transmit: at the original lower
volume, games could not be played.
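The thesis does not give pseudocode for the temperature-controlled invention step, so the sketch below is one plausible reading: a new word is invented with probability equal to the temperature whenever no existing word adequately covers the topic. The function names and the syllable generator are hypothetical, added only for illustration.

```python
import random

def maybe_invent_word(best_association, temperature, rng=random):
    """Decide whether to invent a new word for a topic.

    Assumed reading of the 'temperature' strategy: invention happens with
    probability equal to the temperature, but only when no existing word
    is associated with the topic at all.
    """
    if best_association > 0.0:
        return False          # an existing word already covers the topic
    return rng.random() < temperature

def invent_syllables(rng=random, length=2):
    """Generate an arbitrary new word as random CV syllables (illustrative)."""
    consonants, vowels = "bdgkmnprst", "aeiou"
    return "".join(rng.choice(consonants) + rng.choice(vowels)
                   for _ in range(length))
```

Under this reading, the target temperature of 0.5 means a new toponym is coined at the target location about half the time it is unnamed, while the spatial temperature of 0.75 makes distance and direction words somewhat easier to coin.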
The first sequence of Study 1C was extended. Twelve hours of sessions were run in which the
robots played ‘where are we’ and ‘where is there’ games to build up their lexicons for distances and
directions, and to extend their lexicons for locations. Following the sessions, the languages were
tested with 25 ‘go to’ games played. For a summary of the parameters see Table 7.7.
7.3.2 Results
The robots developed a shared set of distance and direction concepts to build on the existing
toponymic language. With these concepts, they were able to invent additional toponyms, some of
which were situated beyond the walls of their world.
The number of toponyms increased from 9 to 24, with 10 invented at the target location (see
Table 7.8). The area covered by the language increased from 83.8m2 to 145.7m2. The results of the
‘go to’ games before and after the agents played ‘where is there’ games were similar (see Figure
7.26). There was an increase in games in which the agents found the goal but did not meet each
other at the goal location, with a corresponding decrease in games in which the agents met each
other at the goal location. The toponymic language resulting for both agents is shown in Figure
7.28, with the spatial lexicon shown in Figure 7.29, and the results of the interactions of the agents
in Figure 7.27 and Figure 7.30.
Table 7.7 Parameters for Study 2C
Parameter                       Study 2C
Game                            ‘where are we’, ‘where is there’, and ‘go to’
Hearing distance                ~5m
Concept type                    Location, distance, direction
Concept representation          Experiences (real world, omni-directional camera)
Word representation             Tones
Lexicon technique               Distributed lexicon table
Strategy for choosing words     Relative neighbourhood most informative
Neighbourhood size              2m
Forgetting                      No
Updating                        Hearer and Speaker
Strategy for word invention     Temperature
Temperature                     Toponym = 0.5, Spatial = 0.75, Target = 0.5
Generations                     1
Agents                          2
Interactions per generation     12 hours
Initial learning period         0
World                           Real World
Error detection                 Checksum
Table 7.8 Results for Study 2C
Measure (x̄ (σ))                          1C (Run 1)    2C
Number of Toponyms                       9             24
Toponyms Invented as Target              N/A           10
Area Covered per Toponym Used (m2)       9.3 (5.7)     6.1 (6.0)
Area Covered by Language (m2)            83.8 (3.5)    145.7 (29.6)
Toponym Coherence                        0.44          0.13
Direction Words                          N/A           7
Distance Words                           N/A           7
Direction Coherence                      N/A           0.43
Distance Coherence                       N/A           0.35
Figure 7.26 Results of the ‘go to’ games (2C)
The results of the ‘go to’ games following the ‘where are we’ games (1C)
compared to the results following the ‘where is there’ games (2C). Following the
‘where is there’ games, there was a reduction in games where the robots met each
other at the goal location (from 78% to 58%), and an increase in games where the
robots did not meet each other at the goal location (from 10% to 28%). A similar
number of games resulted in failure (10% and 12%) and where the goal was not
found (both 2%).
Figure 7.27 Interactions for ‘where are we’ games (2C)
a) Toponym value at the interaction location and b) word usage over the ‘where are
we’ interactions of the robots. See the Figure 6.13 caption for more
detail about the elements of the figure.
Figure 7.28 Example language (2C)
[Panels for Robot 1 and Robot 2: a) experience map, b) language layout, c) word locations, d) word coverage]
The language for the robots, showing a) the experience map, b) the language
layout, c) the word locations, and d) the word coverage. Note that the language
layout was not restricted to within the walls of the world. See the Figure 6.12
caption for more detail about the elements of the figure.
Figure 7.29 Spatial lexicon (2C)
[Panels for Robot 1 and Robot 2: a) distance lexicon, b) direction lexicon]
The spatial lexicon showing a) distance and b) direction. There was a range of
distance and direction concepts, and they matched each other between the robots.
See the Figure 7.5 caption for more detail about the elements of the figure.
Figure 7.30 Interactions for ‘where is there’ games (2C)
a) Match between templates, b) current word usage, c) orientation word usage, d)
target word usage over the ‘where is there’ interactions, e) distance word usage,
and f) direction word usage.
7.3.3 Discussion
Study 2C: Real World showed that ‘where is there’ games can be played with robots in the real
world and that the simulation world results were reproducible in the real robots. The robots
developed a shared set of distance and direction terms to build on the existing toponymic language.
The main difficulty encountered in the real robot implementation, in addition to those encountered
in Study 1C, was that many more syllables needed to be communicated between the robots. With
the increased volume, games were played and the spatial lexicon was formed. The final toponymic
language of the robots was less coherent than the language before the ‘where is there’
games were played. Updating the lexicon during ‘where is there’ games introduced more
noise while the distance and direction concepts were being formed, resulting in toponym
concepts that were less coherent. However, the final spatial language was consistent between the agents.
7.4 Discussion: A Generative Spatial Language
Study 2 addressed the challenge of how agents can refer to locations other than those already
visited. This challenge required relational terms and the use of perspectives. The key contribution of
Study 2 was the demonstration of grounding for both experienced and novel concepts using a
generative process, applied to spatial locations.
The method for generative grounding used in Study 2 (the ‘where is there’ game) enabled the
formation of the spatial relation concepts of directions and distances, which were combined to form
concepts equivalent to ‘simple’ topological, proximity, and projective prepositions (Coventry &
Garrod, 2004). The direction and distance concepts together with the ‘where is there’ game allowed
agents to form concepts for locations that neither agent had visited.
The design of the ‘where is there’ game involved a method for generative grounding and for
aligning perspective. In previous language game studies with a spatial dimension, agents utilised an
absolute frame of reference to share perspective (Bodik & Takac, 2003; Steels, 1995). In the ‘where
is there’ game, perspective alignment was gained with respect to known locations and was achieved
by naming the current location (specifying the location component of the perspective) and an
orientation location (specifying the direction component of the perspective).
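The perspective-alignment scheme can be illustrated geometrically: the current location fixes the origin, the orientation location fixes the heading, and a target is then described by a distance and a bearing relative to that heading. The sketch below assumes locations are x-y coordinates in the experience map; the function name and the degree convention are illustrative, not the thesis's implementation.

```python
import math

def relative_spatial_terms(current, orientation, target):
    """Distance and direction of a target from the perspective defined by a
    current location (origin) and an orientation location (heading).

    Returns (distance, bearing) with bearing in degrees, 0 = straight ahead
    toward the orientation location, normalised to [-180, 180).
    """
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    to_target = math.atan2(target[1] - current[1],
                           target[0] - current[0])
    distance = math.hypot(target[0] - current[0], target[1] - current[1])
    bearing = math.degrees(to_target - heading)
    bearing = (bearing + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    return distance, bearing
```

Because both agents name the same current and orientation toponyms, they recover the same origin and heading in their own maps, which is what lets the hearer place a never-visited target location.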
Study 2A-C showed that generative grounding can be achieved with an appropriate
representation of the concept space (with an approximate x-y representation of the world), a way to
form and label intrinsic concepts (with toponyms), and a generative process that created both the
concepts and the labels.
Chapter 8 General Discussion
‘Y’know,’ he said, ‘it’s very hard to talk quantum using a language
originally designed to tell other monkeys where the ripe fruit is...’
(Sweeper / Lu-Tze in Pratchett, 2002, p.100)
To understand the results of this thesis, it is necessary to review the studies and how they fit in the
context of the literature. This chapter presents a summary of the studies and discusses the impact of
the results on the aims. Also discussed are the contributions made, general conclusions, and
possible further work.
8.1 Summary
This thesis described studies in which grounded spatial languages were learned by simulated agents
and mobile robots. The studies showed that mobile robots can form languages describing locations,
directions, and distances. The interactions between agents were based on a location language game,
where agents achieved shared attention by being located near each other. The location language
game framework was made up of the game played, concept representations, word representations,
the lexicon, population dynamics, the environment, and performance measures.
In the location language game framework, agents formed concept representations of their world
through exploration, which was dependent on the agents’ world. Shared attention was determined
by agents being near each other. The word representation depended on the world and the method
used for word production, being either an integer, a set of unit activations, text, or sound. In each
interaction, the speaker used their lexicon to produce the best word for the chosen topic, the hearer
attempted to comprehend the concept intended by the speaker, and the agents updated their lexicons
based on the interaction. A variety of performance measures were used by each agent to keep track
of how the interactions progressed.
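The speaker-hearer-update cycle described above can be sketched as follows, assuming a lexicon held as nested association strengths from words to location concepts. The data structure and the unit update are illustrative stand-ins, not the distributed lexicon table itself.

```python
# One 'where are we' interaction: the speaker produces the best word for the
# topic, the hearer comprehends it, and both strengthen the used association.

def produce(lexicon, topic):
    """Speaker: pick the word most associated with the topic location."""
    best = max(lexicon, key=lambda w: lexicon[w].get(topic, 0.0), default=None)
    if best is None or lexicon[best].get(topic, 0.0) == 0.0:
        return None                      # no usable word yet: invent one
    return best

def comprehend(lexicon, word):
    """Hearer: pick the location most associated with the heard word."""
    meanings = lexicon.get(word, {})
    return max(meanings, key=meanings.get) if meanings else None

def update(lexicon, word, topic, amount=1.0):
    """Both agents: strengthen the used word-location association."""
    lexicon.setdefault(word, {}).setdefault(topic, 0.0)
    lexicon[word][topic] += amount
```

In the thesis the topic is implicit (both agents are at the shared attention location), so the hearer can update its own association between the heard word and its current experience without any explicit feedback channel.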
The pilot studies presented in Chapter 5 investigated some of the key features of the framework
including representations of concepts and words, and methods for the lexicon including word
production, concept comprehension, and the source of variability for the languages.
Pilot Study 1 investigated two techniques for the lexicon: recurrent neural networks and lexicon
tables. For each technique, a series of studies demonstrated how the techniques could be used to
associate spatial concepts with words. The lessons learned from the studies were details about how
each technique could best be used to associate spatial concepts with words. These included the
weight setting mechanisms, concept representations, and word representations for recurrent neural
networks, and the effect of different strategies for associating concepts and words, producing words,
and inventing words for lexicon tables.
Pilot Study 2 investigated the use of three different concept representations for word production,
concept comprehension, and the source of variability for words. The representations investigated
were those available to robots using RatSLAM: pose cells, vision, and experiences. A series of
studies compared how the agents could form concepts, learn concept–word associations, create their
own categories and concept–word associations, and generalise to unseen data with each of the
different representations. The lessons learned from the studies were whether each type of
representation was appropriate for a spatial language, and the ability of the different representations
to form categories that grouped together similar concepts.
The pilot studies showed that representations made a major difference to the ease of learning
and structure of concepts, words, and the associations between them. For concept representations,
experiences were found to be ideal for the representations underlying location concepts. For lexicon
techniques, neural networks and standard lexicon tables were found to not be ideal for forming and
learning location concepts. Neural networks take prohibitively long to learn arbitrary associations
between concepts and words, particularly when the input concept representations are large. For
lexicon tables, generalisation typically occurs with comparison prior to the lexicon, rather than at
the lexicon with word formation. Lexicon tables also do not deal well with large input concept
representations, unless pre-processed into categories. A new technique, the distributed lexicon table,
was designed for the major studies of this thesis that incorporated the useful features of neural
networks and lexicon tables.
A distributed lexicon table allowed rapid learning from exemplars while supporting
generalisation through the methods used to access the associations stored in the table. Concepts are
not formed explicitly, but result from the associations between concept elements and words, and
methods for producing words and comprehending concepts. Finding the most appropriate word for
each situation requires a word selection strategy. Two strategies were developed for use in the
studies: the most associated strategy, which was used in some of the grid world studies, and the
most informative strategy, which was used in the remainder of the studies. In the most associated
strategy, the word chosen was the one with the highest association with the topic. In the most
informative strategy, the word chosen was the one which provided the most information about the
topic.
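The two strategies can be contrasted in a small sketch over a lexicon table held as association strengths keyed by (word, concept). The scoring used for the most informative strategy below (association with the topic normalised by the word's total associations) is an assumed reading of the strategy, not the thesis's exact formula.

```python
# Two word-production strategies over a flat association table.

def most_associated(assoc, words, topic):
    """Return the word with the highest raw association with the topic."""
    return max(words, key=lambda w: assoc.get((w, topic), 0.0))

def most_informative(assoc, words, concepts, topic):
    """Return the word that says most about the topic: association with the
    topic relative to the word's total associations (assumed scoring)."""
    def score(w):
        total = sum(assoc.get((w, c), 0.0) for c in concepts)
        return assoc.get((w, topic), 0.0) / total if total else 0.0
    return max(words, key=score)
```

A general-purpose word with high associations everywhere wins under the most associated strategy, while a word used almost exclusively for the topic wins under the most informative strategy, which is what keeps language specificity higher.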
Study 1: A Toponymic Language Game, presented in Chapter 6, addressed the question of how
interactions impact on the formation of location languages. Study 1A, the grid world study,
investigated the implementation of simple spatial agent interactions in a simple world. The agents
played ‘where are we’ language games. Two solutions were compared: ‘basic’ and ‘best’. The
solutions varied with the features of the population dynamics, word production, updating, and
hearing distance. A comparison of the solutions showed that very different languages resulted when
different parameters were used, with respect to the time taken to form a stable, coherent language,
the specificity of the language, and the types of concepts that form.
In Study 1B, the simulation world study, the ‘where are we’ and ‘go to’ language games were
implemented in simulated robots. The study investigated how word invention rate and
neighbourhood size affected the resulting language. Smaller languages with higher coherence
resulted from lower word invention rates and larger neighbourhood sizes. The results of the ‘go to’
games were similar across the different conditions, with most of the games resulting in the
simulated robots meeting at the goal location. The simulation world study demonstrated how
toponyms could be formed for all locations in the world visited by both simulated robots where the
robots built their own personal maps of the world and interacted through location language games.
Study 1C involved the implementation of the ‘where are we’ and ‘go to’ language games in the
real robots. The goal of the study was to determine whether useful toponymic languages could be
formed through the real robots playing ‘where are we’ games. The study investigated the use of two
error detection strategies: minimal and checksum. In both conditions the robots developed a shared
set of toponyms. The minimal error detection was not enough to stop the robots mishearing each
other regularly: the hearer added a new word when the speaker had used an existing word on
numerous occasions. Additional error detection resulted in more coherent languages with a higher
success rate for the ‘go to’ games (70.0% compared to 58.7%).
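The checksum scheme itself is not specified in this section, so the sketch below uses a simple modular digit sum as a stand-in: the speaker appends a check digit to the tone sequence, and the hearer discards any message whose digits no longer match it rather than learning from a misheard word.

```python
# Illustrative checksum for a DTMF-style digit sequence (not the thesis's
# actual checksum, which is unspecified here).

def add_checksum(digits):
    """Append a single check digit (sum of digits mod 10) to a message."""
    return digits + [sum(digits) % 10]

def verify_checksum(received):
    """Return (ok, payload); ok is False if the message was corrupted."""
    payload, check = received[:-1], received[-1]
    return sum(payload) % 10 == check, payload
```

Rejecting corrupted messages trades a few lost interactions for a lexicon that is not polluted by mishearings, matching the higher coherence reported for the checksum condition.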
The toponym language game study showed how a toponymic language was formed through
simple social interactions. Agents with a shared toponymic language were able to direct each other
to goal locations by specifying the associated toponym. The distributed lexicon table, with methods
for updating the lexicon, producing words, and comprehending concepts, enabled concepts to form
together with words. The study demonstrated the co-development of concepts and words, and
showed how words and interactions influenced concept formation.
Study 2: A Generative Spatial Language Game, presented in Chapter 7, addressed the challenge
for embodied language games of how agents can refer to locations other than those they have
visited. Study 2A, the grid world study, implemented the ‘where are we’ and ‘where is there’
language games in the grid world, investigating the impact of changing the size of the world,
obstacles in the world, and the population dynamics. The studies found that toponyms were formed
in any world size, and direction and distance words were formed for the spatial relations. With
obstacles in the world, agents were still able to form a complete toponymic language, including
names for locations in the world that had never been visited. With multiple generations rather than a
single generation of agents, less common words were forgotten, while the meanings of the
remaining words shifted. Study 2A showed how a generative toponymic language formed in a
population of agents.
In Study 2B, the ‘where are we’, ‘where is there’, and ‘go to’ language games were
implemented in simulated robots. The study investigated the impact of conceptualisation order for
toponyms, directions, and distances on the resulting language. In both conditions, the simulated
robots developed a shared set of toponyms, directions, and distances, with words invented for
locations beyond the area able to be explored. The study demonstrated how directions and distances
were formed given an existing toponymic language, or when a toponymic language was still being
formed. However, the direction and distance terms covered the space more effectively when they
were formed following the formation of a toponymic language. With the addition of the ‘where is
there’ game to the ‘where are we’ game, the simulated robots were able to refer to locations beyond the
perimeter of their world.
Study 2C involved the implementation of the ‘where are we’, ‘where is there’, and ‘go to’
language games in the real robots. The robots developed a shared set of distance and direction
concepts to build on the existing toponymic language. They invented additional toponyms, some of
which were for locations beyond the walls of their world.
The generative spatial language game study demonstrated the grounding of directly experienced
and novel location concepts using a generative process. The method for generative grounding
presented enabled the formation of spatial relation concepts in the form of directions and distances.
Using this method, agents formed concepts for locations that neither agent had visited.
8.2 Discussion
The overall goal of the thesis was to ground a computational model of spatial language in mobile
robots, to be used meaningfully in practical applications. The mobile robots formed toponymic
languages with enough coherence to specify goal locations for goal directed navigation.
Additionally, the robots formed shared concepts of direction and distance.
The specific aim to run a series of experiments that demonstrated learned and evolved language
in agents and robots was achieved. The pilot studies investigated representations and methods, the
first study investigated the formation of toponymic languages, and the second study investigated
generative grounding.
The key question identified in the introduction was: how can a robot form and label complex
concepts in an embodied spatial environment? The studies of this thesis showed that the important
features for answering this question are the interactions between agents, concept representations,
lexicon techniques for associating concepts and words, word production, concept comprehension,
perspective alignment, and a method for generative grounding. The studies in this thesis
demonstrated a grounded spatial language in mobile robots, formed using a cognitive map of
experiences built during exploration. Communicative interactions between mobile robots were
designed that allowed robots to play games when ‘near’ each other and enabled the robots to build a
shared toponymic language. Methods for aligning perspective and generative grounding allowed the
agents to refer to locations other than ‘here’ and to ground new concepts for these locations.
8.3 Contributions
The contributions of this thesis focus on extending ideas for symbol grounding, exploring the
influence that words and concepts have on each other, and exploring the possibilities for models of
spatial cognition. The specific contributions are outlined and discussed in this section.
1. A series of studies to demonstrate that representations and methods matter
The pilot studies presented in Chapter 5 demonstrated that for the spatial languages investigated,
representations and methods influenced the size of the languages, the learning rate of the agents, the
categorisation of concepts, and the generalisations available.
The type of representation used to form categories or concepts not only affected the types of
categories or concepts that could form but also the ease of learning. In terms of language simulation
studies, it was found that it is important to have appropriate representations for the concepts to be
formed and the features of language that are being investigated. To investigate grounding in
embodied agents, arbitrary representations (such as those used in Batali, 1998; and Smith, 2003)
can make concept formation harder than necessary. The pilot studies found that vision, used in
language game studies for concepts of colour and shape (Roy, 2001; Steels, 1999), was not an
appropriate representation for location concepts, as some distant locations have visually similar
scenes. Vision may be more useful for location type concepts, such as ‘corner’ or ‘corridor’, where
the concepts share similar visual scenes. Pose cell representations could be used, but discontinuities
and multiple representations can impede the conceptualisation process. Experiences located in a
map were ideal representations for location concepts, as distant locations in the world have distant
locations in the experience map coordinate space. The location concepts used in this thesis were
useful within a local region, specified by hearing distance, which could be reproduced with a set
distance within an experience map.
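The shared-attention condition can be sketched as a simple range test in experience-map coordinates, assuming Euclidean distance and the ~5 m hearing distance of Study 2C; the function name and default are illustrative, not the thesis's implementation.

```python
import math

def within_hearing(pose_a, pose_b, hearing_distance=5.0):
    """Shared attention test: agents may play a game when the Euclidean
    distance between their experience-map coordinates is within hearing
    range (default mirrors the ~5 m hearing distance of Study 2C)."""
    return math.dist(pose_a, pose_b) <= hearing_distance
```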
Using the process described in the semiotic square (Steels, 1999) to structure the language
process, the agents perceived the real world and formed internal representations of experiences;
concepts were then formed from the experiences as they became associated with words.
concepts and words were many-to-many, with distinct concepts never implicitly formed. Previous
studies have used a range of representations from those supporting slower learning and
generalisation (such as neural networks) to those supporting rapid learning and minimal
generalisation (such as lexicon tables).
The initial symbolic methodology considered was lexicon tables (used by Smith, 2001; and
Steels, 1999), which provide in-the-moment learning, though generalisation is typically provided by
similarity to existing exemplars and performed prior to word production. The initial connectionist
methodologies considered were simple neural networks (used by Cangelosi, 2001; Cangelosi &
Parisi, 1998; Kirby & Hurford, 2002; and Marocco et al., 2003) and recurrent neural networks (used
by Batali, 1998; Elman, 1990; and Tonkes et al., 2000) in which learning occurs over time, with
words partitioning concept space. Neural networks are more appropriate for forming categories with
boundary conditions in terms of features than they are for forming categories that are arbitrarily
structured. Neural networks could be used effectively for spatial languages that are based on feature
description, for example the location type concepts of ‘corner’ and ‘corridor’.
In this thesis, toponymic languages were based on exemplars, with arbitrary word associations.
For robots learning language, in-the-moment learning was required for the robots to start using
words appropriately after the first instance, and to generalise to similar concepts. The appropriate
features of lexicon tables were in-the-moment learning from exemplars, while the appropriate
features of neural networks were the ability to generalise to similar concepts. The distributed
lexicon table, designed for the studies in this thesis, combined standard lexicon tables and neural
networks to provide the appropriate features for a toponymic language.
2. The development of a method for concept formation with a distributed representation
In the studies presented in this thesis, concept formation for toponymic languages was supported
by a cognitive map representation. For grounding language in a cognitive map, it was necessary to
design a method for concept formation. A typical approach for concept formation using various
representations is to form categories prior to learning the language and grounding terms (Bodik &
Takac, 2003; Smith, 2001; Steels & Loetzsch, 2007). However, there is evidence that language
assists in concept formation, rather than just building on it (Levinson, 2003b).
Following the pilot studies in Chapter 5, a distributed lexicon table was developed for use in the
studies presented in Chapters 6 and 7. The distributed lexicon table with methods for updating,
producing, and comprehending words demonstrated a way in which concept and word formation
may interact.
The innate spatial ability of agents may influence the constraints on spatial concepts by
providing elements that can be combined in various ways (Levinson & Wilkins, 2006a). In the
studies, the innate spatial ability of the robots was the construction of an experience map and the
use of a distributed lexicon table. These abilities allowed the agents to form toponyms in different
locations depending on social interactions. The agents formed different distance and direction
words depending on the toponymic language and later interactions. The shared social experiences of
the robots, rather than shared perceptual information, influenced the concepts formed (this point is
addressed further under contribution 5 of Section 8.3, Contributions: Grounding locations).
Concept formation occurred together with word formation, reflecting the neo-Whorfian idea that
language can have an effect on the concepts that are formed (Levinson, 2003a). The use of a
distributed lexicon meant that concepts were not formed explicitly, and many experiences
contributed to each concept. Each experience could contribute to multiple concepts, depending on
the agent interactions. Even without explicitly formed concepts, the usage of the concepts was crisp
and coherent as shown by the ‘go to’ games results. Agents often met each other within 1 m in the simulation world, even when the toponyms covered areas of up to 70 m² for the large neighbourhood
condition. The crispness of usage was not designed, but emerged from the interactions of the agents
and from the methods for creating and using concepts.
The distributed lexicon table method for concept formation and word usage differs from both
standard lexicon tables (Smith, 2001; Steels, 1999) and neural networks (Batali, 1998; Cangelosi,
2001; Kirby & Hurford, 2002; Tonkes et al., 2000), with the rapid learning from exemplars similar
to lexicon tables, and generalisation similar to neural networks. With a distributed lexicon table,
concepts are not formed explicitly, but are formed together with associations between concept
elements and words.
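The interplay between concept and word formation described above can be illustrated with a minimal sketch of a distributed lexicon table. The class and names below are hypothetical illustrations, not the thesis implementation: associations are stored between individual concept elements (map experiences) and words, so a "concept" exists only implicitly as the cluster of elements that come to share strong links to the same word.

```python
from collections import defaultdict

class DistributedLexiconTable:
    """Minimal sketch of a distributed lexicon: associations are kept
    between individual concept elements (e.g. map experiences) and
    words, and no concept is ever stored explicitly."""

    def __init__(self):
        # association[element][word] -> association strength
        self.association = defaultdict(lambda: defaultdict(float))

    def update(self, elements, word, delta=1.0):
        # Each experience active in an interaction strengthens its own
        # link to the word used; the same experience can thereby
        # contribute to several implicit concepts over time.
        for element in elements:
            self.association[element][word] += delta
```

Because each experience can carry associations to many words, and each word draws on many experiences, concept boundaries emerge from the pattern of interactions rather than being stored as explicit categories.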
3. The development of a method for producing the word that provides the most
information about the chosen topic
An additional implementation issue with a distributed lexicon table is finding the most appropriate word in any given situation. This issue does not apply to methods where single
concepts are associated with single words, but does apply when concepts can be associated with
multiple words.
Existing methods for word production with lexicon tables include the score strategy (Steels,
1999) and the confidence and probability strategy (Smith, 2003). Two methods were developed in this thesis for use with the distributed lexicon table. The first was the most associated
strategy, based on the score strategy. The word used for the concept was the one with the highest
association value. A neighbourhood could be used, with association values summed over the
concepts in the neighbourhood of the topic. In the most associated strategy, words were easily found, but one word tended to take over most of the concepts, resulting in lower language
specificity. The second method was the most informative strategy, which was developed to reduce
the chance of a small number of words taking over most of the concepts. In this strategy, the most
informative word for a concept was used, which tended to be specific rather than general. A
neighbourhood could also be used with the most informative strategy. The relative neighbourhood
most informative strategy, described in Chapter 4, was used for the simulated and real robots in
Chapters 6 and 7. Using the most informative strategy, words were more evenly spread across
concept space, but new words were always adopted, becoming the most informative word for the
concept they were first used for. When new words are always adopted, stable languages cannot
form unless the invention of new words is prevented.
While the most informative strategy was used successfully, there is likely more to be discovered about methods for word production. Word selection is a balance between the specificity and the stability of the lexicon, and any word selection strategy must address this balance.
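The contrast between the two strategies might be sketched as follows. This is an illustrative reading, not the thesis code: `assoc` maps each concept element to its word associations, and the informativeness measure here (local association normalised by the word's total association mass) is a stand-in assumption for the measure actually used in the thesis.

```python
from collections import defaultdict

def most_associated(assoc, neighbourhood):
    """Return the word with the largest association summed over the
    concept elements in the topic's neighbourhood."""
    totals = defaultdict(float)
    for element in neighbourhood:
        for word, strength in assoc.get(element, {}).items():
            totals[word] += strength
    return max(totals, key=totals.get) if totals else None

def most_informative(assoc, neighbourhood):
    """Favour specific words over general ones by normalising each
    word's local association by its total association mass across the
    whole table (an assumed proxy for informativeness)."""
    word_mass = defaultdict(float)
    for strengths in assoc.values():
        for word, strength in strengths.items():
            word_mass[word] += strength
    scores = defaultdict(float)
    for element in neighbourhood:
        for word, strength in assoc.get(element, {}).items():
            scores[word] += strength / word_mass[word]
    return max(scores, key=scores.get) if scores else None
```

A word used widely across the table accumulates a large total mass, so its normalised score at any one topic is low; a word used only at that topic keeps a score near 1, which reproduces the tendency of the most informative strategy to keep words evenly spread, and to adopt any newly invented word.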
4. The formation and grounding of spatial concepts based on a cognitive map
representation
The grounding literature is vast, with different researchers emphasising different aspects of the
grounding problem. Harnad’s (1990) suggested solution to the symbol grounding problem is a
hybrid connectionist and symbolic system, with non-symbolic iconic and categorical
representations formed which are associated with symbols or names that feed back to the formation
of icons and categories. Steels (2007) claims that the symbol grounding problem has already been solved for concepts that can be formed from directly perceivable inputs, arguing that addressing grounding requires working with embodied autonomous agents, a mechanism for generating meanings, internal representations for grounded meanings, the ability to establish and negotiate symbols, and coordination between members of the population.
A variety of embodied models have grounded concepts through direct perception such as vision
(Floreano et al., 2007; Steels, 1999). The final pilot study and the studies presented in Chapters 6
and 7 involved the formation and grounding of spatial concepts based on a cognitive map
representation. As discussed in earlier sections, concepts do not need to be learned prior to word
association. The cognitive map provided by RatSLAM gave an appropriate base representation for
location concepts, but the concepts were learned through language games. Unlike language games
in which concepts are formed from direct perceptions, location concepts require a representation
built over time from exploration, such as the cognitive map representation of experiences.
The studies presented used the experience map together with the distributed lexicon table, which
matched Harnad’s (1990) suggested solution to the symbol grounding problem with the distributed
representation of the experience map associated with symbolic words. The studies also address each
of the features that Steels (2007) claims are required for grounding.
The difference between grounding from direct perception and grounding from a cognitive map
representation is in the way that concept representations are formed, with a cognitive map formed
over time from a combination of direct perceptions. The formation of concept representations and
the grounding of words interact, with the concept representations and types of concepts to be
formed restricting the appropriate methods for grounding.
5. Grounding locations: the design of language game interactions between mobile robots
that enable the formation and grounding of location concepts
Language is not formed by an individual agent, but rather through the interactions of a
population of agents. For a language about locations, the important features of the interactions are
when and where the interactions take place, the content of the interactions, and population
dynamics.
A series of agent interactions was designed in the form of three language games, used in the studies described in Chapters 6 and 7, which were inspired by the guessing game of Steels (2001) and the spatial language games of Steels (1995) and Bodik and Takac (2003). The language game
method (Steels, 2001) was extended to a location language game method, for use by mobile robots
where shared attention was determined by being located near each other. Obtaining shared attention
through hearing was a simple way to decide if the agents were close to each other, but prone to
noise, particularly in the real world. Combining hearing with another perceptual ability, such as
vision, could result in less noise. Embodiment, seen as one way of solving the symbol grounding
problem (Pfeifer & Scheier, 1999), was addressed partially in the simulation world, and more fully
in the real world. To obtain shared attention based on proximity, embodiment was necessary. The
form of embodiment influenced the shapes of the resulting toponyms. The embodiment of a language for locations and spatial terms extended previous research by enabling robots to form their own spatial terms, rather than being given terms and ways of forming concepts (Dobnik, 2006; Skubic et al., 2004). The robots drew on the perceptual abilities of vision (Floreano et al., 2007; Steels, 1999) and hearing (Roy, 2001), together with odometry and the ability to form a map of the world.
The ‘where are we’ games, played in the toponym language game study, were interactions that
enabled agents to form location concepts. Following each interaction, agents updated their lexicons
by increasing associations between their current experience and the word used, enabling a shared
toponymic language to form. Geographers and architects have long recognised that interactions in
locations and experience of a particular area in space are ways to convert space to place (Tuan,
1975). The ‘where are we’ games used shared experience of space to construct labels for places in
the robot world, resulting in the construction of specific place from general space. Toponyms
describing specific places may become landmarks used to describe ‘where’ (Tversky, 2003).
Naming toponyms in the studies described in this thesis was performed by inventing new words that
were unrelated to current words. It would be possible to form toponyms in other ways, including
based on actions that may be performed at locations, similar to naming memorable events (e.g.
Waterloo), or describing features of the environment, similar to descriptive names (e.g. North Sea)
(Crystal, 1997).
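A single ‘where are we’ interaction might be sketched as below. This is a simplified reading under stated assumptions, not the thesis implementation: agents are bare mappings from experiences to word associations, production is a plain summed-association argmax, and invented toponyms are arbitrary strings.

```python
import random
from collections import defaultdict

def where_are_we_game(speaker, hearer, speaker_elems, hearer_elems,
                      invent_p=0.05, rng=random):
    """One simplified 'where are we' interaction.

    speaker/hearer map concept element -> {word: strength}; the
    *_elems arguments are each agent's currently active experiences."""
    # Speaker produces its strongest word for its current experiences,
    # inventing a fresh toponym when it has none (or occasionally).
    totals = defaultdict(float)
    for elem in speaker_elems:
        for word, strength in speaker.get(elem, {}).items():
            totals[word] += strength
    if not totals or rng.random() < invent_p:
        word = f"toponym-{rng.randrange(10**6)}"
    else:
        word = max(totals, key=totals.get)
    # Both agents then strengthen the link between the word used and
    # their own current experiences, so a shared label converges even
    # though the agents' internal maps differ.
    for agent, elems in ((speaker, speaker_elems), (hearer, hearer_elems)):
        for elem in elems:
            agent.setdefault(elem, defaultdict(float))[word] += 1.0
    return word
```

Note that each agent updates against its *own* experiences: it is the shared social episode, not shared perception, that binds the two lexicons together.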
Most of the studies described in this thesis involved a single generation of two agents exploring
the world and negotiating their shared language. They used the negotiation model (as in Batali, 1998; Cangelosi et al., 2004; Hutchins & Hazlehurst, 1995; and Smith, 2001), which amounts to a single generation of iterated learning (Kirby & Hurford, 2002). A single generation of two agents
interacting proved adequate for the formation of spatial languages of toponyms, directions, and
distances. Although additional generations might allow the languages to become more coherent over time, the time needed for new agents to explore the world and build concept representations, combined with the number of games required to learn a toponymic language, makes running many generations prohibitively long.
When and where interactions take place influences which experiences will be associated with
the words used in the ‘where are we’ games. Therefore the timing and location of interactions
influences how toponyms form within a population of agents. Multiple generations of agents are not
required for a coherent language to form. The agents’ social interactions build on the cognitive map
representations to form the final toponymic languages.
6. Generative grounding: the design of generative language game interactions that enable
agents to ground concepts that are not directly experienced
The studies presented in Chapters 5 and 6 only used concepts of ‘here’ and ‘now’. A key
challenge for embodied language games is for the agents to refer to locations other than ‘here’,
particularly those they have never visited. This challenge requires both relational terms and the
ability to take into account the agents’ different perspectives.
The ‘where is there’ games, played in the generative language game study, allowed the agents to
construct concepts and terms for describing distances and directions, which were combined,
forming concepts equivalent to ‘simple’ topological, proximity, and projective prepositions as
classified by Coventry and Garrod (2004). The spatial language games extended the games of Steels (1995) and Bodik and Takac (2003) by removing absolute shared knowledge of direction and locations in the world, and by adding the ability for agents to coordinate their perspectives and build representations of the world separately. The generative terms of distances and directions were
used by the agents in grounding new toponymic concepts that were not directly experienced.
The ‘where is there’ games required the development of a method for generative grounding, and
for aligning the perspective of the agents. The perspective alignment described in this thesis was
with respect to locations in the world (instead of giving the agents an absolute sense of direction).
Perspective alignment was achieved by naming three locations: both agents were located within hearing distance at the first (current) location; they faced the second (orientation) location, thereby aligning their perspectives; and they talked about a third (target) location. Given the three
locations, agents described the target location with spatial words of distance and direction. Given
perspective alignment and a spatial lexicon, generative grounding enabled labelling of places that
neither agent had visited.
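The three-location alignment reduces to simple geometry. As a hedged sketch (not the thesis code), with locations taken as x-y points, the target's distance and its direction relative to the current-to-orientation axis can be computed as:

```python
import math

def describe_target(current, orientation, target):
    """Given three (x, y) points, return the distance to the target
    and its bearing relative to the current->orientation axis
    (radians, positive anticlockwise): the aligned egocentric
    description both agents can share."""
    dist = math.hypot(target[0] - current[0], target[1] - current[1])
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    bearing = math.atan2(target[1] - current[1],
                         target[0] - current[0])
    # Wrap the difference into (-pi, pi] so 'left' and 'right' of the
    # shared axis are symmetric.
    relative = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
    return dist, relative
```

Because the description is relative to the shared current-orientation axis, neither agent needs a compass or an allocentric map to agree on it.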
These studies used a strategy for aligning perspectives with the egocentric view. The three
distinct frames of reference that can be used are ‘intrinsic’, ‘relative’, and ‘absolute’ (Levinson,
2003a). An alternative would be for the agents to have an allocentric view from the raw map with a
compass, as used in the spatial language games of Steels (1995) and Bodik and Takac (2003), or a
relative view from translations of the representations, as used by Steels and Loetzsch (2007). To use
an intrinsic frame of reference requires knowledge of the intrinsic frames of reference of objects in
the world. As long as the agents have access to one method for aligning perspective, they can
achieve shared attention at a distance. The robots in this thesis used only one frame of reference, so they did not need to remember a situation in one frame of reference and recall it in another (Levinson, 2003a), or to cope with the difficulty of switching frames without pointers to the
switch (Tversky, 1996).
Generative grounding provided a way for agents to ground words for concepts that are not
currently being experienced, either through direct perception or the current state of the agent. The
‘where is there’ game provided generative grounding for locations. Generative grounding could be
extended to domains that have concept representations in which concept elements have
relationships comparable to the x-y dimensions of location space.
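Generative grounding can then be read as the inverse operation: from the aligned perspective and grounded distance and direction terms, an agent can recover coordinates for a place it has never visited and attach a toponym to experiences near that point. The function below is an illustrative sketch under that reading, not the thesis implementation.

```python
import math

def ground_remote_location(current, orientation, distance, relative_angle):
    """Invert the 'where is there' description: from the aligned
    perspective (standing at `current`, facing `orientation`) and the
    grounded distance and relative direction, recover the target's
    x-y coordinates."""
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    angle = heading + relative_angle
    return (current[0] + distance * math.cos(angle),
            current[1] + distance * math.sin(angle))
```

Any domain whose concept elements stand in comparable metric relationships would admit the same inversion, which is what makes the grounding generative rather than tied to direct experience.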
8.4 Conclusions and Further Work
In summary, a computational model of language for mobile robots was successfully developed,
with the robots able to form grounded spatial concepts associated with words. The grounded spatial
language was used for the practical application of directing other robots to a goal location. With the
addition of generative interactions, the agents extended languages in which known locations were
labelled to languages where external locations were also labelled. The result was robots with nouns
(place names) and simple prepositions (direction and distance terms that were used in combination).
The studies presented showed robots that formed and labelled complex concepts in an embodied
spatial environment when they had:
• appropriate representations of spatial experiences,
• concept formation with crisp usage,
• the ability to perform perspective alignment, and
• a method for representing novel concepts by combining simple concepts of locations
with the generative terms of distance and direction.
A cognitive map allowed agents to have a rich representation of their world built from
experiences. Grounding language in a cognitive map meant that the language was grounded in a
rich representation of the world that provided a method for determining relationships between
concepts. The robots learned the map of their environment together with spatial concepts formed
through interactions.
It remains an open question how well these studies will generalise to other aspects of spatial
language. Different spatial scales afford different actions, depending on the distances involved:
within touch (personal space), within view and able to be viewed from different perspectives
(tabletop space), within walking or travelling distance (geographic space), and beyond personal
experience (astronomical space) (Peuquet, 2002). The methods of this thesis apply to geographic
space. Tabletop space, by contrast, requires an intrinsic representation of space as the innate ability, and is likely to need different methods for the construction of spatial concepts. The experience maps of
RatSLAM (Milford, 2008), based on ideas about a cognitive map in the hippocampus (O'Keefe &
Nadel, 1978), proved to be appropriate for representing geographic space. However, to go beyond
the concepts explored here requires greater knowledge of the world through visual information or
actions through motor control and intent. This work could be extended with robots that have richer
representations of their world and more interesting social interactions.
The major conclusions of this thesis are that generative grounding for spatial concepts is
possible and that representations, methods, and social interactions influence the languages that
form. The meaningful usage of language in practical applications therefore requires appropriate
representations, interactions, and methods for grounding. This thesis has shown that rather than the
directly perceivable world, it is interactions building on innate abilities that influence the final
structure of spatial languages.
References
Arleo, A., & Gerstner, W. (2000). Modeling rodent head-direction cells and place cells for spatial
learning in bio-mimetic robotics. In J. A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W.
Wilson (Eds.), Proceedings of the Sixth International Conference on the Simulation of Adaptive
Behavior, from Animals to Animats (pp. 236-245). Cambridge, Massachusetts: The MIT Press.
Bartlett, M., & Kazakov, D. (2005). The origins of syntax: from navigation to language. Connection
Science, 17(3-4), 271-288.
Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M.
Studdert-Kennedy & C. Knight (Eds.), Approaches to the Evolution of Language: Social and
Cognitive Bases (pp. 405-426). Cambridge, UK: Cambridge University Press.
Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition
among exemplars. In E. J. Briscoe (Ed.), Linguistic Evolution Through Language Acquisition:
Formal and Computational Models (pp. 111-172). Cambridge, UK: Cambridge University Press.
Berthoz, A. (1999). Hippocampal and parietal contribution to topokinetic and topographic memory.
In N. Burgess, K. J. Jeffery & J. O'Keefe (Eds.), The hippocampal and parietal foundations of
spatial cognition (pp. 381-403). New York: Oxford University Press Inc.
Beyer, H.-G., & Schwefel, H.-P. (2002). Evolution Strategies: A comprehensive introduction.
Natural Computing, 1, 3-52.
Bickerton, D. (2003). Symbol and Structure: A comprehensive framework for language evolution.
In M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 77-93). New York: Oxford
University Press Inc.
Bodik, P., & Takac, M. (2003). Formation of a common spatial lexicon and its change in a
community of moving agents. In B. Tessem, P. Ala-Siuru, P. Doherty & B. Mayoh (Eds.), Frontiers
in Artificial Intelligence and Applications: Eighth Scandinavian Conference on Artificial
Intelligence SCAI'03 (pp. 37-46). Amsterdam: IOS Press Inc.
Brighton, H., & Kirby, S. (2001). Meaning space structure determines the stability of culturally
evolved compositional language (Technical report). Edinburgh: Language Evolution and
Computation Research Unit, Department of Theoretical and Applied Linguistics, The University of
Edinburgh.
Brown, P. (2006). A sketch of the grammar of space in Tzeltal. In S. C. Levinson & D. P. Wilkins
(Eds.), Grammars of Space: Explorations in Cognitive Diversity (pp. 230-272). Cambridge, UK:
Cambridge University Press.
Brown, R. (1958). Words and Things. Glencoe, Illinois: The Free Press.
Burgess, N., Donnett, J. G., Jeffery, K. J., & O'Keefe, J. (1999). Robotic and neuronal simulation of
the hippocampus and rat navigation. In N. Burgess, K. J. Jeffery & J. O'Keefe (Eds.), The
hippocampal and parietal foundations of spatial cognition (pp. 149-166). New York: Oxford
University Press Inc.
Cangelosi, A. (2001). Evolution of communication and language using signals, symbols, and words.
IEEE Transactions on Evolutionary Computation, 5(2), 93-101.
Cangelosi, A., & Harnad, S. (2001). The adaptive advantage of symbolic theft over sensorimotor
toil: Grounding language in perceptual categories. Evolution of Communication, 4(1), 117-142.
Cangelosi, A., & Parisi, D. (1998). The emergence of a 'language' in an evolving population of
neural networks. Connection Science, 10(2), 83-97.
Cangelosi, A., Riga, T., Giolito, B., & Marocco, D. (2004). Language emergence and grounding in
sensorimotor agents and robots. Paper presented at the First International Workshop on Emergence
and Evolution of Linguistic Communication, May 31 - June 1 2004, Kanazawa, Japan.
Cangelosi, A., Smith, A. D. M., & Smith, K. (Eds.). (2006). The Evolution of Language:
Proceedings of the 6th International Conference (EVOLANG6). Singapore: World Scientific
Publishing Co. Pte. Ltd.
Carroll, J. B. (Ed.). (1956). Language, Thought, and Reality: Selected writings of Benjamin Lee
Whorf. Cambridge, Massachusetts: The MIT Press.
Christiansen, M. H., & Kirby, S. (2003a). Language evolution: consensus and controversies. Trends
in Cognitive Science, 7(7), 300-307.
Christiansen, M. H., & Kirby, S. (2003b). Language evolution: The hardest problem in science? In
M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 1-15). Oxford: Oxford University
Press.
Coradeschi, S., & Saffiotti, A. (2000). Anchoring symbols to sensor data: preliminary report. In
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth
Conference on Innovative Applications of Artificial Intelligence (pp. 129-135). Austin, Texas:
AAAI Press / The MIT Press.
Coventry, K. R., & Garrod, S. C. (2004). Saying, seeing, and acting: The psychological semantics
of spatial prepositions. Hove, East Sussex: Psychology Press.
Crystal, D. (1997). The Cambridge encyclopedia of language (2nd ed.). Cambridge: Cambridge
University Press.
de Jong, E. D. (1998). The development of a lexicon based on behavior. In H. La Poutré & J. van den Herik (Eds.), Proceedings of the Tenth Netherlands/Belgium Conference on Artificial Intelligence (NAIC'98) (pp. 27-36). Amsterdam, The Netherlands: CWI.
Dessalles, J.-L. (2007). Why we talk: the evolutionary origins of language. Oxford: Oxford
University Press.
Dobnik, S. (2006). Learning spatial referential words with mobile robots. In Proceedings of the 9th Annual CLUK Research Colloquium, 8-9 March 2006. The Open University, Milton Keynes, UK.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Elman, J. L. (1991). Distributed representations, simple recurrent networks and grammatical
structure. Machine Learning, 7, 195-224.
Floreano, D., Mitri, S., Magnenat, S., & Keller, L. (2007). Evolutionary conditions for the
emergence of communication in robots. Current Biology, 17, 514-519.
Gasser, M. (2004). The origins of arbitrariness in language. In Proceedings of the Cognitive Science
Society Conference (pp. 434-439). Hillsdale, NJ: LEA.
Groening, M. (Writer) (2000). The Computer Wore Menace Shoes [TV], The Simpsons. USA: Fox
Broadcasting Company.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial
map in the entorhinal cortex. Nature, 436, 801-806.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42, 335-
346.
Hurford, J. R. (2007). The origins of meaning. New York: Oxford University Press Inc.
Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon: The development of shared
symbols in interaction. In N. Gilbert & R. Conte (Eds.), Artificial Societies: The Computer
Simulation of Social Life. London: UCL Press.
Kirby, S. (2001). Spontaneous evolution of linguistic structure - an iterated learning model of the
emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2),
102-110.
Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2), 185-215.
Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic structure: an overview of the
iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language
(pp. 121-148). London: Springer Verlag.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer.
Lakoff, G. (1987). Women, fire, and dangerous things: what categories reveal about the mind.
Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: The University of Chicago
Press.
Landau, B. (1996). Multiple geometric representations of objects in languages and language
learners. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp.
317-363). Cambridge, Massachusetts: The MIT Press.
Levinson, S. C. (1996). Language and Space. Annual Review of Anthropology, 25, 353-382.
Levinson, S. C. (2001). Space: Linguistic expression. In N. J. Smelser & P. Baltes (Eds.),
International Encyclopedia of Social and Behavioral Sciences (Vol. 22, pp. 14749-14752).
Amsterdam/Oxford: Elsevier Science.
Levinson, S. C. (2003a). Space in language and cognition: Explorations in cognitive diversity.
Cambridge, UK: Cambridge University Press.
Levinson, S. C. (2003b). Spatial language. In L. Nadel (Ed.), Encyclopedia of cognitive science
(Vol. 4, pp. 131-137). London: Nature Publishing Group.
Levinson, S. C., & Wilkins, D. P. (2006a). The background to the study of the language of space. In
S. C. Levinson & D. P. Wilkins (Eds.), Grammars of Space: Explorations in Cognitive Diversity
(pp. 1-23). Cambridge, UK: Cambridge University Press.
Levinson, S. C., & Wilkins, D. P. (2006b). Patterns in the data: towards a semantic typology of
spatial description. In S. C. Levinson & D. P. Wilkins (Eds.), Grammars of Space: Explorations in
Cognitive Diversity (pp. 512-552). Cambridge, UK: Cambridge University Press.
MacKay, D. J. C. (2003). Information Theory, Inference & Learning Algorithms. Cambridge, UK:
Cambridge University Press.
Maguire, E. A. (1999). Hippocampal and parietal involvement in human topographical memory:
evidence from functional neuroimaging. In N. Burgess, K. J. Jeffery & J. O'Keefe (Eds.), The
hippocampal and parietal foundations of spatial cognition (pp. 404-415). New York: Oxford
University Press Inc.
Majid, A., Bowerman, M., Kita, S., Haun, D. B. M., & Levinson, S. C. (2004). Can language
restructure cognition? The case for space. Trends in Cognitive Science, 8(3), 108-114.
Marocco, D., Cangelosi, A., & Nolfi, S. (2003). The role of social and cognitive factors in the
emergence of communication: experiments in evolutionary robotics. Philosophical Transactions of
the Royal Society London - A, 361, 2397-2421.
Milford, M. J. (2008). Robot Navigation from Nature: Simultaneous Localisation, Mapping, and
Path Planning Based on Hippocampal Models. Berlin: Springer-Verlag.
Milford, M. J., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial concepts
from RatSLAM representations. Robotics and Autonomous Systems - From Sensors to Human
Spatial Concepts, 55(5), 403-410.
Milford, M. J., & Wyeth, G. (2007). Spatial mapping and map exploitation: A bio-inspired
engineering perspective. In Spatial Information Theory (pp. 203-221). Berlin: Springer.
Milford, M. J., Wyeth, G., & Prasser, D. (2005). Efficient goal directed navigation using RatSLAM.
In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, ICRA
2005, April 18-22, 2005 (pp. 1097-1102). Barcelona, Spain: IEEE Press.
Milford, M. J., Wyeth, G. F., & Prasser, D. (2004). RatSLAM: a hippocampal model for
simultaneous localization and mapping. In IEEE International Conference on Robotics and
Automation, ICRA 2004, April 26 - May 1, 2004. New Orleans, LA, USA: IEEE Press.
Moylan, D. (2003). Pioneer Robot Simulation. Unpublished Software Engineering Honours Thesis,
The University of Queensland.
Newmeyer, F. J. (2003). What can the field of linguistics tell us about the origins of language? In
M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 58-76). New York: Oxford
University Press Inc.
Nolfi, S. (2005). Emergence of communication in embodied agents: co-adapting communicative
and non-communicative behaviours. Connection Science, 17(3-4), 231-248.
Nolfi, S., & Marocco, D. (2002). Active perception: a sensorimotor account of object
categorization. In B. Hallam, D. Floreano, J. Hallam, G. Hayes & J. A. Meyer (Eds.), From Animals
to Animats 7. Proceedings of the 7th International Conference on Simulation of Adaptive Behavior.
Cambridge, Massachusetts: MIT Press.
O'Keefe, J. (1979). A review of the hippocampal place cells. Progress in Neurobiology, 13, 419-
439.
O'Keefe, J. (1996). The spatial prepositions in English, vector grammar, and the cognitive map
theory. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp.
277-316). Cambridge, Massachusetts: The MIT Press.
O'Keefe, J. (2003). Vector grammar, places, and the functional role of the spatial prepositions in
English. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space (pp. 69-
85). New York: Oxford University Press Inc.
O'Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. New York: Oxford
University Press Inc.
Peuquet, D. J. (2002). Representations of Space and Time. New York: The Guilford Press.
Pfeifer, R., & Scheier, C. (1999). Understanding Intelligence. Cambridge, Massachusetts: The MIT
Press.
Prasser, D., Wyeth, G. F., & Milford, M. J. (2004). Biologically inspired visual landmark
processing for simultaneous localization and mapping. In IEEE/RSJ International Conference on
Intelligent Robots and Systems (Vol. 1, pp. 730-735). Sendai, Japan: IEEE Press.
Pratchett, T. (1998). The Last Continent. London: Transworld Publishers Ltd.
Pratchett, T. (2002). Night Watch. London: Transworld Publishers Ltd.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In J.
Kelemen & P. Sosik (Eds.), ECAL01 (pp. 357-366). Prague: Springer.
Regier, T. (1996). The Human Semantic Potential: Spatial Language and Constrained
Connectionism. Cambridge, Massachusetts: The MIT Press.
Riga, T., Cangelosi, A., & Greco, A. (2004). Symbol grounding transfer with hybrid self-
organizing/supervised neural networks. In IJCNN04 International Joint Conference on Neural
Networks, July 25-29 2004 (Vol. 4, pp. 2865-2869). Budapest, Hungary: IEEE Press.
Roy, D. (2001). Learning visually grounded words and syntax of natural spoken language.
Evolution of Communication, 4(1), 33-56.
Roy, D. (2005). Semiotic Schemas: A framework for grounding language in action and perception.
Artificial Intelligence, 167(1-2), 170-205.
Roy, D., Hsiao, K.-Y., & Mavridis, N. (2003). Conversational robots: building blocks for grounding
word meaning. Proceedings of the HLT-NAACL03 Workshop on Learning Word Meaning from
Non-Linguistic Data.
Rumelhart, D. E., Widrow, B., & Lehr, M. A. (1994). The basic ideas in neural networks.
Communications of the ACM, 37(3), 87-92.
Schulz, R., Milford, M. J., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial concepts
from RatSLAM representations. Paper presented at the "From Sensors to Human Spatial Concepts"
Workshop at the International Conference on Intelligent Robots and Systems, 10 October 2006,
Beijing, China.
Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006a). Generalization in languages
evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L.
Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International Conference
on the Simulation and Synthesis of Living Systems (pp. 486-492). Cambridge, Massachusetts: The
MIT Press.
Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006b). Towards a spatial language for
mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of Language:
Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298). Singapore: World
Scientific Press.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417-457.
Skubic, M., Perzanowski, D., Blisard, S., Schultz, A., Adams, W., Bugajska, M., et al. (2004).
Spatial language for human-robot dialogs. IEEE Transactions on Systems, Man, and Cybernetics
Part C: Applications and Reviews, 34(2), 154-167.
Smith, A. D. M. (2001). Establishing communication systems without explicit meaning
transmission. In J. Kelemen & P. Sosik (Eds.), ECAL01 (pp. 381-390). Prague: Springer.
Smith, A. D. M. (2003). Semantic generalisation and the inference of meaning. In W. Banzhaf, T. Christaller, P. Dittrich, J. T. Kim & J. Ziegler (Eds.), Advances in Artificial Life - Proceedings of the 7th European Conference on Artificial Life (ECAL), Lecture Notes in Artificial Intelligence (Vol. 2801, pp. 499-506). Berlin, Heidelberg: Springer Verlag.
Smith, A. D. M., Smith, K., & Ferrer i Cancho, R. (Eds.). (2008). The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7). Singapore: World Scientific
Publishing Co. Pte. Ltd.
Spinney, L. (2005, February 24). How time flies. The Guardian.
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life, 2(3), 319-332.
Steels, L. (1997a). The origins of syntax in visually grounded robotic agents. In M. Pollack (Ed.),
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97)
(Vol. 2, pp. 1632-1641). San Francisco, California: Morgan Kaufmann Publishers.
Steels, L. (1997b). The synthetic modeling of language origins. In H. Gouzoules (Ed.), Evolution of
Communication (Vol. 1, pp. 1-34). Amsterdam: John Benjamins Publishing Company.
Steels, L. (1999). The Talking Heads Experiment (Vol. I. Words and Meanings). Brussels: Best of
Publishing.
Steels, L. (2001). Language games for autonomous robots. IEEE Intelligent Systems, 16(5), 16-22.
Steels, L. (2005). The emergence and evolution of linguistic structure: from lexical to grammatical
communication systems. Connection Science, 17(3-4), 213-230.
Steels, L. (2007). The symbol grounding problem has been solved. So what's next? In M. De Vega,
A. Glenberg & A. Graesser (Eds.), Symbols, Embodiment and Meaning. New Haven: Academic
Press.
Steels, L., & Kaplan, F. (2001). AIBO's first words. The social learning of language and meaning.
Evolution of Communication, 4(1), 3-32.
Steels, L., & Loetzsch, M. (2007). Perspective alignment in spatial language. In K. R. Coventry, T.
Tenbrink & J. A. Bateman (Eds.), Spatial Language and Dialogue. Oxford, UK: Oxford University
Press.
Strunk, W., & White, E. B. (2000). The Elements of Style (4th ed.). Needham Heights,
Massachusetts: A Pearson Education Company.
Sun, R. (2000). Symbol grounding: a new look at an old idea. Philosophical Psychology, 13(2),
149-172.
Thrun, S. (2002). Robotic mapping: a survey. In B. Nebel (Ed.), Exploring Artificial Intelligence in the New Millennium. San Francisco, California: Morgan Kaufmann.
Tonkes, B. (2001). On the origins of linguistic structure: computational models of the evolution of
language. Unpublished PhD dissertation, School of Information Technology and Electrical
Engineering, The University of Queensland, Brisbane.
Tonkes, B., Blair, A., & Wiles, J. (2000). Evolving learnable languages. In S. A. Solla, T. K. Leen
& K.-R. Muller (Eds.), Advances in Neural Information Processing Systems 12 (pp. 66-72).
Cambridge, Massachusetts: The MIT Press.
Tuan, Y.-F. (1975). Place: An experiential perspective. Geographical Review, 65(2), 151-165.
Tuan, Y.-F. (1977). Space and place: the perspective of experience. Minneapolis, MN: University
of Minnesota Press.
Tversky, B. (1996). Spatial perspective in descriptions. In P. Bloom, M. A. Peterson, L. Nadel & M.
F. Garrett (Eds.), Language and Space (pp. 463-491). Cambridge, Massachusetts: The MIT Press.
Tversky, B. (2003). Places: Points, planes, paths, and portions. In E. van der Zee & J. Slack (Eds.),
Representing direction in language and space (pp. 132-143). New York: Oxford University Press,
Inc.
Varela, F. J., Thompson, E., & Rosch, E. (1991). The Embodied Mind. Cambridge, Massachusetts:
The MIT Press.
Vogt, P. (2000a). Bootstrapping grounded symbols by minimal autonomous robots. Evolution of
Communication, 4(1), 87-116.
Vogt, P. (2000b). Grounding language about actions: Mobile robots playing follow me games. In J.
A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W. Wilson (Eds.), From Animals to Animats
6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior (SAB00).
Cambridge, Massachusetts: The MIT Press.
Vogt, P. (2003). Anchoring of semiotic symbols. Robotics and Autonomous Systems, 43(2), 109-
120.
Vogt, P. (2007). Language evolution and robotics: Issues in symbol grounding and language
acquisition. In A. Loula, R. Gudwin & J. Queiroz (Eds.), Artificial Cognition Systems (pp. 176-
209). Hershey, Pennsylvania: Idea Group Publishing.
Wagner, K., Reggia, J. A., Uriagereka, J., & Wilkinson, G. S. (2003). Progress in the simulation of emergent communication and language. Adaptive Behavior, 11(1), 37-69.
Ziemke, T. (1999). Rethinking Grounding. In A. Riegler, M. Peschl & A. von Stein (Eds.),
Understanding Representation in the Cognitive Sciences - Does Representation Need Reality? (pp.
177-190). New York: Plenum Press.