Spatial Language for Mobile Robots:
The Formation and Generative Grounding of Toponyms
Ruth Jennifer Schulz
B.E. (Hons), B.Sc.
A thesis submitted for the degree of Doctor of Philosophy at
The University of Queensland in November 2008
School of Information Technology and Electrical Engineering
Declaration
This thesis is composed of my original work, and contains no material previously published or
written by another person except where due reference has been made in the text. I have clearly
stated the contribution by others to jointly-authored works that I have included in my thesis.
I have clearly stated the contribution of others to my thesis as a whole, including statistical
assistance, survey design, data analysis, significant technical procedures, professional editorial
advice, and any other original research work used or reported in my thesis. The content of my thesis
is the result of work I have carried out since the commencement of my research higher degree
candidature and does not include a substantial part of work that has been submitted to qualify for
the award of any other degree or diploma in any university or other tertiary institution. I have
clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.
I acknowledge that an electronic copy of my thesis must be lodged with the University Library
and, subject to the General Award Rules of The University of Queensland, immediately made
available for research and study in accordance with the Copyright Act 1968.
I acknowledge that copyright of all material contained in my thesis resides with the copyright
holder(s) of that material.
Statement of Contributions to Jointly Authored Works Contained in the Thesis and Published
Works by the Author Incorporated into the Thesis
1. Schulz, R., Prasser, D., Stockwell, P., Wyeth, G., & Wiles, J. (2008). The formation,
generative power, and evolution of toponyms: Grounding a spatial vocabulary in a cognitive
map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore:
World Scientific Press.
RS was responsible for designing and conducting the three studies and for
the majority of the writing; DP and PS contributed to design discussions;
GW and JW lead the RatSLAM and RatChat projects on which this work is
based; all authors contributed to editing.
Incorporated with more detail as Study 1 in Chapter 6 and Study 2 in
Chapter 7
2. Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial
concepts from RatSLAM representations. Robotics and Autonomous Systems - From
Sensors to Human Spatial Concepts, 55(5), 403-410.
MM and DP were responsible for the pose cell and experience map work;
RS was responsible for the conceptualisation work; GW and JW lead the
RatSLAM and RatChat projects on which this work is based; MM was
responsible for updating the workshop paper (jointly authored work 3) to
this paper with assistance from all authors.
Incorporated as Pilot Study 2B in Chapter 5
3. Schulz, R., Prasser, D., Wakabayashi, M., & Wiles, J. (2007). Robots and the evolution of
spatial language. Unrefereed poster presentation at the 8th Asia-Pacific Complex Systems
Conference (Complex07).
RS was responsible for designing and conducting the studies and for the
majority of the writing; DP and MW contributed to design discussions; MW
provided software development for one of the studies; JW leads the RatChat
project on which this work is based; all authors contributed to editing.
Incorporated as part of Study 1B in Chapter 6
4. Schulz, R., Milford, M., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial
concepts from RatSLAM representations. Paper presented at From Sensors to Human
Spatial Concepts, a workshop at the International Conference on Intelligent Robots and
Systems, Beijing, China.
MM and DP were responsible for the pose cell and experience map work;
RS was responsible for the conceptualisation work; GW and JW lead the
RatSLAM and RatChat projects on which this work is based; all authors
contributed to the writing.
Incorporated as Pilot Study 2B in Chapter 5
5. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Generalization in languages
evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L.
Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International
Conference on the Simulation and Synthesis of Living Systems (pp. 486-492). Cambridge,
Massachusetts: MIT Press.
RS was responsible for designing and conducting the studies and for the
majority of the writing; PS and MW contributed to design discussions; JW
leads the RatChat project on which this work is based; all authors
contributed to editing.
Incorporated as part of Pilot Study 2A in Chapter 5
6. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Towards a spatial language
for mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of
Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298).
Singapore: World Scientific Press.
RS was responsible for designing and conducting the studies and for the
majority of the writing; PS and MW contributed to design discussions; JW
leads the RatChat project on which this work is based; all authors
contributed to editing.
Incorporated as part of Pilot Study 2A in Chapter 5
Statement of Contributions by Others to the Thesis as a Whole
Janet Wiles and Gordon Wyeth put together the initial proposal for funding that resulted in the
conception of the RatChat project as a whole, of which this thesis was a part.
The RatSLAM team provided the robot base, including the Pioneer robots, the pose cell and
experience mapping algorithms, and the simulation world of the robots. The team provided
assistance throughout this project regarding the functions of RatSLAM and keeping the robots
functional. During the course of this thesis the RatSLAM team included: Gordon Wyeth, Michael
Milford, David Prasser, and Shervin Emami.
The RatChat team provided a forum for discussing the progress of the project. During the course
of this thesis the RatChat team included: Janet Wiles, David Prasser, Paul Stockwell, Mark
Wakabayashi, Steven Livingston, Jacinta Fitzgerald, and Andrew Schrauf.
Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree
None
Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it
None
Keywords
language, representation, game, robot, agent, spatial, concepts, words, grounding, generative
Australian and New Zealand Standard Research Classifications (ANZSRC)
080101 Adaptive Agents and Intelligent Robotics 100%
Abstract
For robots to interact with each other and with humans in a human environment, it is important for them
to be able to use language meaningfully in practical applications. Grounding connects words and
sentences with their meanings and is a necessary foundation for the meaningful usage of language.
Combining simple concepts provides a way to label other simple concepts. The process of forming
a simple concept from a combination of concepts is termed generative grounding in this thesis.
To understand how language may be used meaningfully in practical applications, the nature of
language and the concepts on which language is built must be understood. Concepts of space and
time are among those that are directly experienced and directly grounded. In particular, space is
used to form other concepts, with spatial metaphors describing mood, energy, emotion,
personal attributes, and time.
This thesis addresses the question of how robots can form spatial languages. The literature
review in Chapters 2 and 3 covers the diverse fields on which this thesis builds: linguistics,
computer science, psychology, neurology, and robotics. It explores the language
features that need to be addressed if language evolution and acquisition are to be understood, and
the features of space and spatial language that make the domain appropriate for investigating the
grounding and generative grounding of concepts.
The studies presented investigated grounding spatial language in mobile robots that had
constructed maps to represent their world. The pilot studies investigated the underlying spatial
representations and methods to produce and comprehend language. The experience map, which is
similar to a cognitive map of the world, proved the most appropriate representation
for forming a spatial language.
A key question is what impact interactions between agents have on the languages that form. For the two
major studies described in this thesis, agents interacted through language games and the experience
map provided the base representation for concept formation. A new way to associate concept
elements and words was developed: the distributed lexicon table. Each study had three sections: an
investigation into the experimental design of the language games in a simulation world based on a
grid, the implementation of the language games in simulated robots, and the implementation of
language games in the richer and less predictable real world.
The first study investigated spatial concept formation through collective experience and agent
interactions as they explored their environment and played ‘where are we’ games. Games were
played when the agents were within hearing distance, with shared attention defined by being
near each other. After playing many language games, the agents successfully formed a toponymic
language that labelled all visited locations.
The second study addressed the question of whether robots can form concepts for spatial
relationships. Agents played ‘where is there’ language games in addition to ‘where are we’
language games. In ‘where is there’ games, they referred to locations other than the current location.
Shared attention was established by being near each other, and a shared perspective was achieved when an
orientation location was named. A third target location was specified by name, direction, and
distance. Agents formed a comprehensive spatial language of directions and distances that were
combined to specify other locations in their world.
In summary, a computational language model for mobile robots was successfully developed, in
which the robots formed spatial concepts that were associated with words through interactions with
other agents. The features that facilitated spatial concept formation included an appropriate concept
representation, the distributed lexicon table with methods to produce and comprehend words, and
simple interactions from which the language emerged. With the addition of generative interactions,
the agents extended languages in which known locations were labelled into languages in which
external locations were also labelled. The result was robots that formed nouns (place names) and simple
prepositions (direction and distance terms).
The major conclusions of this thesis are that generative grounding for spatial concepts is
possible and that representations, methods, and social interactions influence the languages that
form. The meaningful usage of language in practical applications therefore requires appropriate
representations, interactions, and methods for grounding. This thesis has shown that it is
interactions building on innate abilities, rather than the directly perceivable world, that shape
the final structure of spatial languages.
List of Publications
Publications related to this project:
1. Schulz, R., Prasser, D., Stockwell, P., Wyeth, G., & Wiles, J. (2008). The formation,
generative power, and evolution of toponyms: Grounding a spatial vocabulary in a cognitive
map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore:
World Scientific Press.
2. Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial
concepts from RatSLAM representations. Robotics and Autonomous Systems - From
Sensors to Human Spatial Concepts, 55(5), 403-410.
3. Schulz, R., Milford, M., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial
concepts from RatSLAM representations. Paper presented at From Sensors to Human Spatial
Concepts, a workshop at the International Conference on Intelligent Robots and Systems,
Beijing, China.
4. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Generalization in languages
evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L.
Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International
Conference on the Simulation and Synthesis of Living Systems (pp. 486-492). Cambridge, Massachusetts: MIT Press.
5. Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Towards a spatial language
for mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of
Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298).
Singapore: World Scientific Press.
Other publications:
6. Wiles, J., Schulz, R., Hallinan, J., Bolland, S., & Tonkes, B. (2001). Probing the persistent
question marks. In L. Spector, E. Goodman, A. Wu, W. B. Langdon, H.-M. Voigt, M. Gen,
S. Sen, M. Dorigo, S. Pezesk, M. Garzon & E. Burke (Eds.), Proceedings of the Genetic and
Evolutionary Computation Conference (GECCO-2001) (pp. 710-717). San Francisco, CA:
Morgan Kaufmann Publishers.
7. Wiles, J., Schulz, R., Bolland, S., Tonkes, B., & Hallinan, J. (2001). Selection procedures
for module discovery: Exploring evolutionary algorithms for cognitive science. In J. D.
Moore & K. Stenning (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive
Science Society (CogSci 2001) (pp. 1124-1129). Mahwah, NJ: Lawrence Erlbaum
Associates.
Acknowledgements
Firstly, thanks to my supervisor, Janet Wiles, for her encouragement, ideas, knowledge, and support.
We have had numerous meetings over the past three and a half years, from which I often left feeling
somewhat overwhelmed with information and ideas, but always more on track.
Thanks to my associate supervisor, Gordon Wyeth, for many suggestions that have helped to
improve both the studies and the final document, and for helping me to find the story in my thesis.
Thanks to the RatChat team, including Janet Wiles, David Prasser, Paul Stockwell, Mark
Wakabayashi, Steven Livingston, Jacinta Fitzgerald, and Andrew Schrauf, for providing a forum to
discuss the progress of the project; to the RatSLAM team, including Gordon Wyeth, Michael
Milford, David Prasser, and Shervin Emami, for providing the robot base for my experiments and
keeping the robots working; and to the Thinking Systems group for fascinating diversions from my
work and an environment for interesting discussions that were (mostly) relevant to my thesis.
Thanks to the people who were on level 5 of the Axon building, including the RatSLAM and
RatChat teams, Toby Smith, Chris Nolan, Peter Stratton, Daniel Angus, Damien Kee, David Ball,
Daniel Bradley, John Hawkins, James Watson, Nic Geard, Kai Willadsen, Stefan Maetschke, Jon
Witty, Mikael Boden, and Marcus Gallagher, for an enjoyable working environment, entertaining
lunch time discussions, and putting up with the beeping robots for the duration of my experiments.
Thanks to everyone who read drafts of this thesis and earlier papers. Your comments were
valuable, insightful, and have helped to improve this document in many ways.
Thanks to the ITIG for handling all of my technical requests quickly and competently.
Thanks to the School of Information Technology and Electrical Engineering at The University
of Queensland for the provision of computing facilities, office space, and an area in which the
robots could roam; to the Australian Government for support in the form of an Australian
Postgraduate Award; to the Australian Research Council Complex Open Systems Research
Network (COSNet) for funding which allowed me to travel internationally to a conference; and to
the Australian Research Council for the Discovery Grant which funded a top-up in my first year and
allowed me to travel internationally to a conference.
Thanks to everyone in Bai Rui Taekwon-Do, particularly Master Charles Birch, for providing an
environment where I could get away from my thesis for a while and come back feeling positive and
refreshed.
Finally, thanks to my family for always being there for me (even when they were all gallivanting
around overseas). And most importantly, thanks to David Ball for putting up with me, encouraging
me, supporting me, and helping me to maintain a positive outlook on life.
Table of Contents
Declaration ......................................................................................................................................i
Abstract ..........................................................................................................................................v
List of Publications.......................................................................................................................vii
Acknowledgements .......................................................................................................................ix
Table of Contents ..........................................................................................................................xi
List of Figures .............................................................................................................................xiv
List of Tables..............................................................................................................................xvii
Chapter 1 Grounding Spatial Concepts ....................................................................................1
1.1 Learning and Evolving Language ....................................................................................2
1.2 Space in Language ...........................................................................................................3
1.3 Understanding Space and Language ................................................................................4
1.4 Thesis Overview ..............................................................................................................5
Chapter 2 Understanding Language .........................................................................................7
2.1 The Importance of Grounding..........................................................................................7
2.2 Embodiment for Language Models..................................................................................9
2.3 Learning Language ........................................................................................................10
2.4 How Could Language Have Evolved?...........................................................................12
2.5 Translating Meanings and Signals .................................................................................13
2.6 Summary ........................................................................................................................15
Chapter 3 The Ubiquity of Space ...........................................................................................17
3.1 Talking About Where.....................................................................................................17
3.2 How Space Becomes Place ............................................................................................18
3.3 Describing Relationships Between Places .....................................................................19
3.4 Choosing Which Perspective To Use.............................................................................20
3.5 Universals Across Different Languages ........................................................................21
3.6 Spatial Language Models...............................................................................................21
3.7 Representing Space ........................................................................................................22
3.7.1 Cognitive Maps In Rats..........................................................................................23
3.7.2 Maps for Mobile Robots ........................................................................................23
3.7.3 RatSLAM...............................................................................................................24
3.8 RatChat...........................................................................................................................30
3.9 Summary ........................................................................................................................30
Chapter 4 A Location Language Game ..................................................................................31
4.1 Concept Representations................................................................................................34
4.2 Word Representations ....................................................................................................35
4.3 Lexicon...........................................................................................................................35
4.3.1 Simple Neural Networks........................................................................................36
4.3.2 Recurrent Neural Networks ...................................................................................38
4.3.3 Lexicon Table ........................................................................................................40
4.3.4 Distributed Lexicon Table .....................................................................................42
4.4 Population Dynamics .....................................................................................................47
4.5 Environment...................................................................................................................48
4.5.1 Grid World .............................................................................................................48
4.5.2 Simulation World...................................................................................................49
4.5.3 Real World .............................................................................................................49
4.6 Performance Measures ...................................................................................................51
4.6.1 Coherence...............................................................................................................52
4.6.2 Specificity ..............................................................................................................52
4.6.3 Language Size ........................................................................................................53
4.6.4 Word Coverage ......................................................................................................53
4.6.5 Language Layout....................................................................................................53
4.6.6 Word Locations ......................................................................................................53
4.6.7 Most Information Templates..................................................................................53
4.6.8 Toponym Value......................................................................................................54
4.7 Summary ........................................................................................................................54
Chapter 5 Experimental Design..............................................................................................55
5.1 Pilot Study 1: Methods – Recurrent Neural Networks and Lexicon Tables ..................55
5.1.1 Pilot Study 1A: Recurrent Neural Networks..........................................................56
5.1.2 Pilot Study 1B: Lexicon Tables .............................................................................64
5.1.3 Discussion for Pilot Study 1...................................................................................69
5.2 Pilot Study 2: Representations – Pose Cells, Vision, and Experiences .........................70
5.2.1 Pilot Study 2A – Pose Cells and Vision.................................................................70
5.2.2 Pilot Study 2B: Pose Cells and Experiences ..........................................................82
5.2.3 Discussion for Pilot Study 2...................................................................................89
5.3 Discussion: Representations Matter...............................................................................89
Chapter 6 A Toponymic Language Game..............................................................................91
6.1 Study 1A: Grid World....................................................................................................93
6.1.1 Experimental Setup ................................................................................................93
6.1.2 Results....................................................................................................................94
6.1.3 Discussion ............................................................................................................100
6.2 Study 1B: Simulation World........................................................................................101
6.2.1 Experimental Setup ..............................................................................................102
6.2.2 Results..................................................................................................................103
6.2.3 Discussion ............................................................................................................115
6.3 Study 1C: Real World ..................................................................................................116
6.3.1 Experimental Setup ..............................................................................................117
6.3.2 Results..................................................................................................................118
6.3.3 Discussion ............................................................................................................123
6.4 Discussion: A Toponymic Language...........................................................................124
Chapter 7 A Generative Spatial Language Game.................................................................127
7.1 Study 2A: Grid World..................................................................................................129
7.1.1 Study 2Ai: World Size .........................................................................................130
7.1.2 Study 2Aii: Obstacles...........................................................................................135
7.1.3 Study 2Aiii: Generations of Agents .....................................................................138
7.1.4 Discussion ............................................................................................................142
7.2 Study 2B: Simulation World........................................................................................142
7.2.1 Experimental Setup ..............................................................................................143
7.2.2 Results..................................................................................................................144
7.2.3 Discussion ............................................................................................................146
7.3 Study 2C: Real World ..................................................................................................153
7.3.1 Experimental Setup ..............................................................................................153
7.3.2 Results..................................................................................................................153
7.3.3 Discussion ............................................................................................................158
7.4 Discussion: A Generative Spatial Language................................................................158
Chapter 8 General Discussion ..............................................................................................159
8.1 Summary ......................................................................................................................159
8.2 Discussion ....................................................................................................................162
8.3 Contributions................................................................................................................163
8.4 Conclusions and Further Work ....................................................................................169
References ..................................................................................................................................171
List of Figures
Figure 2.1 Semiotic square...........................................................................................................11
Figure 2.2 Language transmission................................................................................................13
Figure 3.1 Preposition classification ............................................................................................19
Figure 3.2 Robot used in the RatSLAM and RatChat projects ....................................................25
Figure 3.3 Map of the real and simulated world ..........................................................................25
Figure 3.4 Visual input.................................................................................................................26
Figure 3.5 Local view cells ..........................................................................................................27
Figure 3.6 Pose cells.....................................................................................................................28
Figure 3.7 Experiences .................................................................................................................29
Figure 4.1 A location language game...........................................................................................33
Figure 4.2 Concept types ..............................................................................................................35
Figure 4.3 Simple neural network ................................................................................................36
Figure 4.4 Recurrent neural network............................................................................................38
Figure 4.5 Distributed lexicon table .............................................................................................42
Figure 4.6 Information value........................................................................................................43
Figure 4.7 Neighbourhood information value..............................................................................44
Figure 4.8 Relative neighbourhood information value ................................................................45
Figure 4.9 Grid world...................................................................................................................48
Figure 4.10 Simulation world with path of robot.........................................................................49
Figure 4.11 Map of the real world................................................................................................50
Figure 4.12 Language game utterances........................................................................................51
Figure 5.1 Word production and concept comprehension networks (Pilot 1Ai)..........................58
Figure 5.2 Word representation (Pilot 1Ai)..................................................................................59
Figure 5.3 Concept representations (Pilot 1Aii)...........................................................................61
Figure 5.4 Word representation (Pilot 1Aii) ................................................................................61
Figure 5.5 Training networks on evolved languages (Pilot 1Aii) ................................................63
Figure 5.6 Typical runs (Pilot 1Bi) ..............................................................................................66
Figure 5.7 Word creation and absorption results (Pilot 1Bii) ......................................................68
Figure 5.8 Robot route and target concepts (Pilot 2Ai) ...............................................................72
Figure 5.9 Production network (Pilot 2Ai)...................................................................................72
Figure 5.10 Language layout (Pilot 2Ai)......................................................................................73
Figure 5.11 Production and comprehension networks (Pilot 2Aii)..............................................75
Figure 5.12 Vision prototype and scenes (Pilot 2Aii) ..................................................................76
Figure 5.13 Scenes and their location (Pilot 2Aiii) ......................................................................78
Figure 5.14 Pose cell map and location of pose cell patterns (Pilot 2Aiii) ..................................79
Figure 5.15 Word production network (Pilot 2Bi) .......................................................................84
Figure 5.16 Floor plan, pose cells, and experience map (Pilot 2Bi) ............................................85
Figure 5.17 Conceptualisation using pose cells (Pilot 2Bi) .........................................................87
Figure 5.18 Conceptualisation using experiences (Pilot 2Bi) ......................................................88
Figure 6.1 Hearing area ................................................................................................................94
Figure 6.2 Coherence (1A: Basic) ................................................................................................95
Figure 6.3 Words used (1A: Basic) ..............................................................................................96
Figure 6.4 Specificity (1A: Basic)................................................................................................96
Figure 6.5 Shared language (1A: Basic) ......................................................................................97
Figure 6.6 Coherence (1A: Best)..................................................................................................98
Figure 6.7 Words used (1A: Best)................................................................................................99
Figure 6.8 Specificity (1A: Best) .................................................................................................99
Figure 6.9 Shared language (1A: Best) ......................................................................................100
Figure 6.10 Simulation World....................................................................................................101
Figure 6.11 Results of ‘go to’ games (1B) .................................................................................105
Figure 6.12 Shared language (1B: low temperature) .................................................................106
Figure 6.13 Interactions (1B: low temperature) .........................................................................107
Figure 6.14 Shared language (1B: medium temperature) ..........................................................108
Figure 6.15 Interactions (1B: medium temperature) ..................................................................109
Figure 6.16 Shared language (1B: high temperature) ................................................................110
Figure 6.17 Interactions (1B: high temperature) ........................................................................111
Figure 6.18 Shared language (1B: small neighbourhood)..........................................................112
Figure 6.19 Interactions (1B: small neighbourhood) .................................................................113
Figure 6.20 Shared language (1B: large neighbourhood) ..........................................................114
Figure 6.21 Interactions (1B: large neighbourhood) ..................................................................115
Figure 6.22 Real world...............................................................................................................116
Figure 6.23 Results of ‘go to’ games (1C) .................................................................................118
Figure 6.24 Interactions (1C: minimal)......................................................................................119
Figure 6.25 Shared language (1C: minimal) ..............................................................................120
Figure 6.26 Interactions (1C: checksum) ...................................................................................121
Figure 6.27 Shared language (1C: checksum)............................................................................122
Figure 6.28 Real world with human labels.................................................................................123
Figure 7.1 A generative language game.....................................................................................127
Figure 7.2 Match between templates..........................................................................................128
Figure 7.3 World size results (2Ai) ..........................................................................131
Figure 7.4 World size coherence (2Ai) ......................................................................................132
Figure 7.5 Example language (2Ai: 5 × 5) ................................................................................133
Figure 7.6 Example language (2Ai: 10 × 10) ............................................................................133
Figure 7.7 Example language (2Ai: 15 × 15) ............................................................................134
Figure 7.8 Example language (2Ai: 20 × 20) ............................................................................134
Figure 7.9 Grid world with obstacles .........................................................................................135
Figure 7.10 Obstacles results (2Aii)...........................................................................................136
Figure 7.11 Obstacles coherence (2Aii) .....................................................................................137
Figure 7.12 Example language (2Aii: Desks) ............................................................................137
Figure 7.13 Example language (2Aii: Perimeter) ......................................................................138
Figure 7.14 Toponym change throughout generations (2Aiii)...................................................139
Figure 7.15 Generations results (2Aiii) ......................................................................................140
Figure 7.16 Generations coherence (2Aiii) ................................................................................141
Figure 7.17 Results of ‘go to’ games (2B) .................................................................................145
Figure 7.18 Shared language (2B: Separate)..............................................................................147
Figure 7.19 Spatial lexicon (2B: Separate) ................................................................................148
Figure 7.20 Interactions for ‘where are we’ games (2B: Separate)............................................148
Figure 7.21 Interactions for ‘where is there’ games (2B: Separate) ..........................................149
Figure 7.22 Shared language (2B: Together) .............................................................................150
Figure 7.23 Spatial lexicon (2B: Together)................................................................................151
Figure 7.24 Interactions for ‘where are we’ games (2B: Together) ...........................................151
Figure 7.25 Interactions for ‘where is there’ games (2B: Together)..........................................152
Figure 7.26 Results of the ‘go to’ games (2C) ...........................................................................155
Figure 7.27 Interactions for ‘where are we’ games (2C) ...........................................................155
Figure 7.28 Example language (2C)...........................................................................................156
Figure 7.29 Spatial lexicon (2C) ................................................................................................157
Figure 7.30 Interactions for ‘where is there’ games (2C) ..........................................................157
List of Tables
Table 4.1 Parameters for a location language game.....................................................54
Table 5.1 Parameters for Pilot Study 1Ai.....................................................................................57
Table 5.2 Source of variability (Pilot 1Ai)...................................................................................59
Table 5.3 Word production (Pilot 1Ai) ........................................................................................60
Table 5.4 Concept comprehension (Pilot 1Ai).............................................................................60
Table 5.5 Parameters for Pilot Study 1Aii ...................................................................................62
Table 5.6 Generations to expressive languages (Pilot 1Aii) ........................................................62
Table 5.7 Parameters for Pilot Study 1Bi.....................................................................................65
Table 5.8 Results for different strategies (Pilot 1Bi)....................................................................65
Table 5.9 Parameters for Pilot Study 1Bii....................................................................................67
Table 5.10 Parameters for Pilot Study 2Ai...................................................................................71
Table 5.11 Parameters for Pilot Study 2Aii .................................................................................74
Table 5.12 Word production and concept comprehension (Pilot 2Aii) .......................................76
Table 5.13 Parameters for Pilot Study 2Aiii ................................................................................78
Table 5.14 Word production (Pilot 2Aiii) ....................................................................................80
Table 5.15 Patterns close to the prototype (Pilot 2Aiii) ...............................................................81
Table 5.16 Parameters for Pilot Study 2Bi...................................................................................82
Table 5.17 Correctly labelled patterns (Pilot 2Bi) .......................................................................86
Table 6.1 Parameters for Study 1A ..............................................................................................94
Table 6.2 Parameters for Study 1B ............................................................................................103
Table 6.3 Results for Study 1B ..................................................................................................103
Table 6.4 Parameters for Study 1C ............................................................................................117
Table 6.5 Results for Study 1C ..................................................................................................119
Table 7.1 Parameters for Study 2A ............................................................................................130
Table 7.2 Results for Study 2Ai .................................................................................................131
Table 7.3 Results for Study 2Aii ................................................................................................136
Table 7.4 Results for Study 2Aiii ...............................................................................................142
Table 7.5 Parameters for Study 2B ............................................................................................144
Table 7.6 Results for Study 2B ..................................................................................................145
Table 7.7 Parameters for Study 2C ............................................................................................154
Table 7.8 Results for Study 2C ..................................................................................................154
Chapter 1 Grounding Spatial Concepts
Spatial cognition is at the heart of our thinking
(Levinson, 2003a, p.xvii)
For robots to interact with each other and humans in a human environment, it is important for them
to be able to use language meaningfully in practical applications. In human language, many
concepts are grounded directly from experience, particularly from sensory perception. Grounding
connects words and sentences with their meanings and is a necessary foundation for the meaningful
usage of language. Language becomes intrinsically meaningful with grounding.
Combining simple concepts provides a way to label other simple concepts. The process of
forming a simple concept from a combination of concepts is termed generative grounding in this
thesis. Generative grounding allows language to bootstrap from simple words and concepts to a full
vocabulary of complex and abstract concepts.
To understand how language may be used meaningfully in practical applications, the nature of
language and the concepts on which language is built must be understood. Concepts of space and
time are among those that are directly experienced and grounded: “We have a sense of space
because we can move and of time because, as biological beings, we undergo recurrent phases of
tension and ease” (Tuan, 1977, p.118). In particular, space is used to help understand and form
other concepts, with “most of our fundamental concepts … organized in terms of one or more
spatialization metaphors” (Lakoff & Johnson, 1980, p.17). Spatial metaphors are used to describe
many non-spatial qualities including mood, consciousness, health, control, quantity, time, and social
status (Lakoff & Johnson, 1980). Understanding spatial language can improve our understanding of
spatial cognition as there is a link between the spatial concepts in cognition and in language. As
“spatial cognition is at the heart of our thinking” (Levinson, 2003a, p.xvii), understanding spatial
cognition may lead to a greater understanding of human thinking. The formation of simple spatial
concepts in computational models and agents may lead to a greater understanding of human spatial
language.
The symbol grounding problem was first identified by Harnad (1990), when he discussed
limitations in symbolic models of the mind. Unless symbols are grounded in an agent’s
representations, they are meaningless to the agent. To solve the symbol grounding problem the
meanings of symbols must be grounded so that agents can use symbols appropriately.
Harnad’s (1990) suggested solution to the symbol grounding problem was a hybrid
connectionist and symbolic system, in which sensory representations are used to form iconic and
categorical representations that can be linked to symbols with neural networks. Another solution to
the symbol grounding problem is embodiment. The interaction between the autonomous agent and
the world provides a means to ground concepts and representations and enables the agent to
understand and use grounded concepts effectively (Pfeifer & Scheier, 1999). Steels (2007) claims
that the symbol grounding problem has been solved, as studies now incorporate the features
necessary for symbol grounding to be addressed. The necessary features include using an embodied
autonomous agent that forms its own grounded meaning representations and associates symbols
with meanings through interactions with other agents. The solution to the symbol grounding
problem appears to be appropriate embodiment, representations (sensory, meaning, iconic, and
categorical), associations between representations and symbols, and interactions between agents.
Language simulation draws on and extends knowledge about language, including the origins and
evolution of language, concept representations, concept formation, and word acquisition. Spatial
language is interesting as it is directly grounded in experience, and influences how more abstract
concepts are formed and understood through metaphor. The next two sections review the current
research in the domains of language and space.
1.1 Learning and Evolving Language
Simulation can add to the debate on the origins and evolution of language by determining features
that are important for evolving communication systems. Language games are a framework for
language models in which agents engage in tasks requiring communication (for more information
about language games see Steels, 2001). Language games have been used to evolve lexicons
(Hutchins & Hazlehurst, 1995), categories (Cangelosi & Harnad, 2001), and grammars (Batali,
2002) in agent populations.
Addressing the challenge of understanding language evolution involves examining how it has
evolved in humans and how it could evolve in simulated agents. Investigations include brain
structure, culture, linguistics, interactions, and reasons for using language (for more information
about investigations into language evolution see Cangelosi, Smith, & Smith, 2006; and Smith,
Smith, & Ferrer i Cancho, 2008). In language models, agent interactions vary, with agents playing
as negotiators or as teacher and student. The games may involve following commands, interpreting
descriptions, finding resources in the world, or coordinating with other agents (Kirby, 2002; Nolfi,
2005; Steels, 2005; Vogt, 2007; Wagner, Reggia, Uriagereka, & Wilkinson, 2003).
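The core mechanics of such games can be illustrated with a minimal naming-game sketch in the style of Steels' work; the agent design, scores, and update rule here are invented for illustration rather than taken from any particular model in the literature:

```python
import random

class Agent:
    """Minimal naming-game agent: each meaning maps to scored candidate words."""

    def __init__(self):
        self.lexicon = {}  # meaning -> {word: score}

    def speak(self, meaning):
        words = self.lexicon.setdefault(meaning, {})
        if not words:  # no word yet for this meaning: invent one
            words["w%04d" % random.randrange(10000)] = 0.5
        return max(words, key=words.get)

    def hear(self, meaning, word):
        """Adopt or reinforce the heard word; return whether it was already known."""
        words = self.lexicon.setdefault(meaning, {})
        known = word in words
        words[word] = min(1.0, words.get(word, 0.0) + 0.1)
        if known:  # lateral inhibition: a successful word suppresses its rivals
            for rival in list(words):
                if rival != word:
                    words[rival] = max(0.0, words[rival] - 0.1)
        return known

# One game round: the speaker names a shared topic, the hearer updates.
speaker, hearer = Agent(), Agent()
word = speaker.speak("kitchen")         # speaker invents a word for 'kitchen'
hearer.hear("kitchen", word)            # first exposure: the hearer adopts it
assert hearer.speak("kitchen") == word  # the hearer now prefers that word
```

Repeated over a population with randomly paired speakers and hearers, this reinforce-and-inhibit dynamic drives the agents toward a shared lexicon; convergence of this kind is what measures such as coherence assess.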
Language ties internal meanings to external signals. One question to be answered is whether
concepts are innate, or whether language allows concepts to form. The Sapir-Whorf hypothesis
(Carroll, 1956) states that language constrains the way we think with concepts firmly linked to the
words associated with them. The Whorfian viewpoint is that speakers of different languages form
different concepts and ways of thinking constrained by the language they speak. In neo-
Whorfianism, language helps construct complex concepts and aids in cognitive development rather
than reflecting underlying innate concepts (Levinson, 2003a).
Language extends beyond simple concepts to complex concepts and metaphor. Language is
generative as it allows the formation of new concepts from existing concepts. New concepts may be
formed using a generative lexicon, with labels obtained either through invention or through the
reapplication of a relevant name from another domain.
1.2 Space in Language
Spatial language extends beyond describing relationships between objects and locations; spatial
metaphors are prevalent in natural language. Spatial concepts vary across languages (Levinson, 2001),
but the basic properties may be the same. Place names are basic spatial concepts from which other
spatial concepts can be formed. Understanding spatial concepts may inform our understanding of
concepts in general, especially through spatial metaphor.
When people describe spatial locations, landmarks are preferred, followed by spatial relations
(Tversky, 2003). In English, spatial relations are generally provided by spatial prepositions, with
directions and distances combined to form spatial terms such as ‘in front of’, ‘near’, and ‘at’. Other
languages and cultures express spatial relations in other ways (Levinson, 1996); for example, the
Mayan language Tzeltal has only one preposition while nouns and verbs provide spatial location
information (P. Brown, 2006).
Several spatial language models have been developed, including studies that involve terms
related to spatial locations (Bodik & Takac, 2003; Steels, 1995). The language games of the studies
involve concepts of direction and distance from the agent to an object in the world, where object
and agent locations are unambiguous and known by all agents.
There is a natural human tendency to assume that a spatial language entails descriptions of
objects at specific locations, or the use of objects as landmarks. However, the brain provides a
basic system for representing space that does not require an understanding of objects (O'Keefe &
Nadel, 1978). As demonstrated in Chapter 6, a set of concepts to describe space and a
corresponding toponymic language do not require knowledge of objects or descriptions of visual
scenes.
The most basic spatial concepts correspond to areas in space and are referred to by labels for
places, such as city or suburb names. Areas within an environment or along a path can often be
described by single words, such as corner, corridor, or intersection, or larger regions such as
kitchen, office, or backyard. In this thesis, names for specific places in an environment are called
toponyms (i.e. topographic names) and a set of such terms to comprehensively describe an agent’s
environment is a toponymic language.
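As an illustration of this definition, a toponymic language for a small environment can be represented as a mapping from toponyms to areas of space; the region bounds and names below are purely hypothetical:

```python
# A toponymic language sketched as a partition of the environment into named
# rectangular regions. The bounds and toponyms here are invented for
# illustration; regions are checked in insertion order.
toponyms = {
    "kitchen":  ((0.0, 0.0), (3.0, 4.0)),   # (x_min, y_min), (x_max, y_max)
    "corridor": ((3.0, 0.0), (9.0, 2.0)),
    "office":   ((3.0, 2.0), (9.0, 4.0)),
}

def name_of(x, y):
    """Return the toponym whose region contains the pose (x, y), if any."""
    for name, ((x0, y0), (x1, y1)) in toponyms.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

assert name_of(1.0, 1.0) == "kitchen"
```

Note that nothing in this representation refers to objects or visual scenes: a toponym names an area of space directly, which is the sense in which the term is used throughout the thesis.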
Language models that investigate language emergence are typically based on vision or simple
perception. For an autonomous agent, the representations that language could be based on include
behaviours and spatial representations. To form spatial representations, agents explore the world
and build up a representation of locations based on their perceptions and motor actions over time. A
spatial representation provides information that cannot be obtained from direct perception.
Mobile robots are currently able to build complex representations of the world, especially in
indoor environments (Milford, Wyeth, & Prasser, 2004; Thrun, 2002). Robot representations of the
world provide an ideal basis for computational models of language, where symbols referring to
location and spatial relations can be grounded in the interactions of the robots with the world.
1.3 Understanding Space and Language
The state of the art relevant to this thesis involves the symbol grounding problem, the ideas about
how concepts and words interact, and spatial cognition models used in robots and agents. To a large
extent, the symbol grounding problem for simple concepts is solved (Steels, 2007). Symbol
grounding requires working with embodied autonomous agents, a mechanism for generating
meanings, internal representations for grounded meanings, the ability to establish and negotiate
symbols, and coordination between members of the population. Regarding spatial concepts in
humans, the current research indicates that words and concepts for space do interact, with the
precise spatial concepts formed relating to the language learned and culture in which the language is
learned (Levinson, 2003a). A variety of spatial cognition models exist, including those that relate to
geographic navigation (Milford, 2008), human language models, where robots are given a method
for forming human spatial concepts (Dobnik, 2006; Skubic et al., 2004), and models with evolved
language, where agents refer to relationships between places in the world (Bodik & Takac, 2003;
Steels, 1995).
Open questions regarding understanding space and language include the grounding of spatial
languages in robots, the formation of a spatial language from a cognitive map representation of the
world, the design of communicative interactions between mobile agents, the design of interactions
to create a language for locations in the world, and determining what is required for the grounding
of concepts that cannot be directly experienced. The key question is: how can a robot form and label
complex concepts in an embodied spatial environment?
1.4 Thesis Overview
This thesis addresses the formation of spatial language in mobile robots using language games. The
overall goal was to ground a computational model of spatial language in mobile robots that could be
used meaningfully in practical applications. Developing the language model involved determining
the features that made the language easy for the robots to negotiate and learn, particularly agent
features, such as the concept and word representations, and the agent’s society, including how the
agents interact with each other to learn or negotiate a language. The specific aims were to run a
series of experiments that demonstrated language learning and formation in agents and robots. The
series of experiments involved an investigation into representations and methodology, an
investigation into the impact of interactions on spatial languages, and an investigation into going
beyond shared attention for ‘here’ to talking about ‘there’. In this project, robots played language
games with the concepts grounded in spatial representations. The concepts were those obtained
directly from the robot representations: locations and relationships between locations.
The main contributions of this thesis are:
• a series of studies to demonstrate that representations and methods matter,
• the development of a method for concept formation with a distributed representation,
• the development of a method for producing the word that provides the most information
about the chosen topic,
• the formation and grounding of spatial concepts based on a cognitive map representation,
• grounding locations: the design of language game interactions between mobile robots
that enable the formation and grounding of location concepts, and
• generative grounding: the design of generative interactions that enable agents to ground
concepts that are not directly experienced.
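As a hedged sketch of the word-production idea in the third contribution (the distributed lexicon and the exact information measure are defined in Chapter 4; the concepts, words, and association scores below are invented for illustration):

```python
# Distributed lexicon table: association scores between concepts and words.
# A speaker picks the word whose association mass is most concentrated on the
# chosen topic, i.e. the word that provides the most information about it.
lexicon = {
    # concept:  {word: association score}
    "kitchen":  {"bula": 6.0, "rona": 1.0},
    "corridor": {"rona": 5.0, "bula": 2.0},
}

def produce(topic):
    """Choose the word w maximising the share of w's score held by the topic."""
    def info(word):
        total = sum(lexicon[c].get(word, 0.0) for c in lexicon)
        return lexicon[topic].get(word, 0.0) / total if total else 0.0
    words = {w for scores in lexicon.values() for w in scores}
    return max(words, key=info)

# 'bula' is weakly associated with 'corridor' too, but most of its score sits
# on 'kitchen' (6/8), so it is the most informative word for that topic.
assert produce("kitchen") == "bula"
```

The point of scoring words by their information about the topic, rather than by raw association strength, is that a frequent word spread across many concepts tells a hearer little; Chapter 4 develops the measure actually used.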
Chapter 2 presents a review of the literature for grounding and language. Chapter 3 presents a
review of the literature for space and spatial language. Chapter 4 is a description of the
experimental design for the studies in this thesis. Chapter 5 presents initial studies that investigated
methods for language formation and concept representations. The main studies described in this
thesis involved toponymic and generative language games. The first game involved the formation of
toponyms and is presented in Chapter 6. The second game involved generative grounding with
spatial terms and is presented in Chapter 7. Chapter 8 has a general discussion, conclusions of the
thesis, and possible future work.
Chapter 2 Understanding Language
The language is perpetually in flux: it is a living stream, shifting, changing,
receiving new strength from a thousand tributaries, losing old forms in the
backwaters of time.
(Strunk & White, 2000, p.83)
Understanding the nature of language and how language can be used meaningfully in practical
applications has many different facets. Traditional linguistics deals with finding rules and ways of
describing language exactly. Modern linguistics addresses modern languages, and focuses on
linguistic form with respect to phonetics, phonology, syntax, semantics, and pragmatics (Hurford,
2007). Language evolution has, until recently, been dismissed by linguists as not being part of their
area, or as a task that needs to wait until language has been described in sufficient detail (Bickerton,
2003; Newmeyer, 2003). However, language evolution has now been investigated by researchers
from many fields, including “psycholinguistics, linguistics, psychology, primatology, philosophy,
anthropology, archaeology, biology, neuroscience, neuropsychology, neurophysiology, cognitive
science, and computational linguistics” (Christiansen & Kirby, 2003b, p.2).
Languages continually update with new concepts and words. To completely understand what
language is, ‘messy’ factors need to be considered. Methods such as playing language games are
able to inform about embodiment, learning, culture, evolution, grounding, and vocabulary.
Language models allow various features of agent populations to be investigated. This chapter
provides a review of the research areas relevant for computational models of language using agents.
2.1 The Importance of Grounding
Grounding has been defined as “the processes by which an agent relates beliefs to external physical
objects” (Roy, 2005, p.176), with language grounding referring to “processes specialized for
relating words and speech acts to a language user’s environment via grounded beliefs” (Roy, 2005,
p.176), and the grounding problem being “how to embed an artificial agent into its environment
such that its behaviour, as well as the mechanisms, representations, etc. underlying it, can be
intrinsic and meaningful to the agent itself, rather than dependent on an external designer or
observer” (Ziemke, 1999, p.87).
For humans, grounding involves the connections between meanings as represented in the mind
of the individual and words as agreed on by the population. Each person has a different grounded
semiotic network that matches enough to enable joint action and communication for speakers of the
same language (Steels, 2007).
For artificial agents, the symbol grounding problem (Harnad, 1990) considers how an agent is
connected with the world so that the representations and mechanisms defining behaviour are both
intrinsic and meaningful (Ziemke, 1999). Traditional AI has not been concerned with how symbols
are related to the world (Pfeifer & Scheier, 1999), using symbols to investigate reasoning, problem
solving, and communication (Vogt, 2003). However, robots need the symbols to be related to the
world in order to discover meaning for themselves, without a human interpreter (Pfeifer & Scheier,
1999).
Two examples of the symbol grounding problem are Searle’s (1980) ‘Chinese Room Argument’
and Harnad’s (1990) extension ‘The Chinese / Chinese Dictionary Go-Round’. In Searle’s example,
a computer attempts to pass the Turing Test in Chinese by performing manipulations on the
symbols received and responding with appropriate Chinese symbols. An external Chinese speaking
observer could interpret the computer’s behaviour as understanding Chinese. However, if a person
unable to speak Chinese replaced the computer and performed the same symbol manipulations, they
would not be considered to have comprehended the symbols.
In Harnad’s extension, a Chinese / Chinese Dictionary is used to learn Chinese as a second
language (hard version) or as a first language (impossible version). As symbols are only referred to
by other symbols in the dictionary, there is no way out of the ‘symbol / symbol merry-go-round’ to
ground the meaning of a symbol in something other than another symbol. These examples show
that symbol manipulation cannot be considered cognition due to the lack of intentionality and
comprehension of the agent performing symbol manipulation.
The grounding problem is not restricted to symbols, and as such has been referred to by different
names, including representation grounding, concept grounding, and the internalist trap (Ziemke,
1999). Another term related to symbol grounding is anchoring, the concrete aspect of the symbol
grounding problem (Coradeschi & Saffiotti, 2000). Anchoring is the creation and maintenance of
relationships between symbols and perceptual data (Coradeschi & Saffiotti, 2000; Vogt, 2003),
while symbol grounding also refers to relationships between symbols and abstractions (Vogt, 2003).
One possible solution to the symbol grounding problem is Harnad’s (1990) hybrid connectionist
and symbolic system for connecting symbols to the world. In Harnad’s proposed system, neural
networks are used to link connectionist representations with symbols. The non-symbolic
representations are categorical and iconic representations. Categorical representations, consisting of
the invariant features of objects and events, are used to identify objects and events as belonging to a
category. Iconic representations, consisting of particular features of objects and events, are used to
discriminate between objects and events belonging to the same category. The symbolic
representations are symbol strings describing category membership relationships. A set of
elementary symbols is arbitrarily associated with the iconic and categorical representations,
grounding them in the agent’s representations. Composition of the set of elementary symbols can
occur, resulting in more complex meaning structures.
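Harnad’s proposal can be illustrated with a minimal sketch. All names and feature values below are hypothetical, and the representations are deliberately reduced: categorical representations become tests on invariant features that decide category membership, iconic representations become full feature vectors used to tell apart members of the same category, and the elementary symbols are the arbitrary category names.

```python
# Minimal illustrative sketch of Harnad-style grounding (hypothetical data).
# Categorical representations: invariant features -> category membership.
# Iconic representations: full feature vectors -> within-category discrimination.

CATEGORIES = {
    # elementary symbol -> invariant feature required for membership
    "ball": {"shape": "round"},
    "box":  {"shape": "cuboid"},
}

def categorise(percept):
    """Assign a percept to a category via its invariant features."""
    for symbol, invariants in CATEGORIES.items():
        if all(percept.get(k) == v for k, v in invariants.items()):
            return symbol
    return None

def discriminate(percept_a, percept_b):
    """Iconic comparison: which features differ between two percepts?"""
    keys = set(percept_a) | set(percept_b)
    return [k for k in keys if percept_a.get(k) != percept_b.get(k)]

red_ball = {"shape": "round", "colour": "red"}
blue_ball = {"shape": "round", "colour": "blue"}
print(categorise(red_ball))               # both percepts map to the symbol "ball"
print(discriminate(red_ball, blue_ball))  # but differ iconically in colour
```

Composition of the elementary symbols (e.g. a symbol string asserting that a “ball” is “round”) would then build more complex meaning structures on top of these grounded names.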
Another solution to the symbol grounding problem is the use of situated and embodied agents,
in which symbols are grounded in the sub-symbolic activities and the interaction between an agent
and the world (Sun, 2000). Concepts are formed in relation to agents’ experiences, linked to goals
and actions. When cognition and intelligent behaviour are situated and embodied, behaviour is
grounded in the interaction between the agent and the environment rather than in symbol
manipulation (Ziemke, 1999).
Steels (2007) claims that the symbol grounding problem has been solved, as experiments have
been performed where agent populations coordinate a symbolic system that is grounded in
interactions with each other and the world. According to Steels (2007), the necessary features to
solve the symbol grounding problem are:
• working with embodied autonomous agents,
• a mechanism for the agent to generate meanings,
• a way for agents to internally represent and ground meanings,
• the ability to establish and negotiate symbols that refer to the meanings, and
• coordination between members of the population to allow the semiotic networks to
become sufficiently similar.
More can still be done to understand meaning, conceptualisation, symbolisation, neural
correlates for semiotic networks, and representation making and group dynamics in people. Steels
focuses on ‘groundable symbols’ that can be directly grounded through a perceptual process in
which sensori-motor data can be analysed to determine whether an object fits a concept. Generative
grounding, which may include concepts that cannot be directly grounded in sensori-motor
experience, is still an open question for symbol grounding.
2.2 Embodiment for Language Models
In embodied cognition, the physical and experiential structure of the human body is central to
human cognition (Varela, Thompson, & Rosch, 1991). Embodiment is also part of the solution to
the symbol grounding problem, and has been addressed when implementing language models in
robots. Robots and tasks have been used in robot language research to investigate different concepts
and tasks, perceptual abilities, and natural or artificial language learning or evolution.
Embodied robots have been used to form concepts of colours and shapes (Roy, 2001; Steels,
1999), simple objects (Roy, Hsiao, & Mavridis, 2003; Steels & Kaplan, 2001; Vogt, 2000a), spatial
commands and descriptions (Skubic et al., 2004; Vogt, 2000b), and food or poison (Floreano, Mitri,
Magnenat, & Keller, 2007). The majority of robot tasks have been to form concepts and agree on
labels for those concepts. Some of the robots also had a survival task; for example, to find food and
avoid poison (Floreano et al., 2007).
The robots’ physical implementation has been heads with cameras (Steels, 1999), the SONY
AIBO (Steels & Kaplan, 2001), LEGO vehicles (Vogt, 2000a, 2000b), and custom made robots
such as Toco (Roy, 2001). The perceptual abilities given to the robots have included vision
(Floreano et al., 2007; Steels, 1999), hearing (Roy, 2001), and touch (Roy et al., 2003).
Embodied robots have been taught human terms (Roy, 2001; Roy et al., 2003; Skubic et al.,
2004; Steels & Kaplan, 2001), or have evolved their own languages (Floreano et al., 2007; Vogt,
2000a, 2000b). The word representations have been based on text (Skubic et al., 2004; Steels,
1999), speech (Roy, 2001; Roy et al., 2003; Steels & Kaplan, 2001), or the robots’ perceptual
abilities, such as light sensing (Floreano et al., 2007).
Robot language research can be extended with the use of mobile robots that build up internal
world maps through interactions with a real world environment. The use of mobile autonomous
agents that move in a real environment enables spatial language formation.
2.3 Learning Language
Simulation can add to the debate on the origins and evolution of language by determining features
that are important for evolving communication systems. Language games are an established
framework for language models in which agents engage in tasks requiring communication. They
have been used to evolve lexicons, such as labelling phases of the moon (Hutchins & Hazlehurst,
1995), categories, such as poisonous or edible mushrooms (Cangelosi & Harnad, 2001), and
grammars (Batali, 2002) in populations of agents. In language games, the agents exchange words
referring to their world that may be related to what they perceive, what they are doing, or what
another agent is doing. The aim of language games is for a population of agents to reach a
consensus on terms for concepts in the world to successfully communicate about a task.
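A language game of this kind can be sketched in a few lines. The sketch below is in the spirit of the naming games in the literature, but the update rule (score increments with lateral inhibition of rival words) and all parameters are illustrative choices, not those of any particular study.

```python
import random

random.seed(0)

MEANINGS = ["full-moon", "half-moon", "new-moon"]  # illustrative concepts

class Agent:
    def __init__(self):
        # lexicon: meaning -> {word: association score}
        self.lexicon = {m: {} for m in MEANINGS}

    def word_for(self, meaning):
        entries = self.lexicon[meaning]
        if not entries:  # no word known yet: invent one
            entries["w%04d" % random.randrange(10000)] = 0.5
        return max(entries, key=entries.get)

    def meaning_for(self, word):
        for meaning, entries in self.lexicon.items():
            if word in entries:
                return meaning
        return None

    def reinforce(self, meaning, word):
        entries = self.lexicon[meaning]
        entries[word] = entries.get(word, 0.0) + 0.1
        for rival in [w for w in entries if w != word]:
            entries[rival] -= 0.1          # lateral inhibition of rival words
            if entries[rival] <= 0.0:
                del entries[rival]

def play_game(speaker, hearer):
    meaning = random.choice(MEANINGS)
    word = speaker.word_for(meaning)
    if hearer.meaning_for(word) == meaning:   # communicative success
        speaker.reinforce(meaning, word)
        hearer.reinforce(meaning, word)
        return True
    # failure: the speaker 'points', and the hearer stores the association
    hearer.lexicon[meaning][word] = hearer.lexicon[meaning].get(word, 0.0) + 0.1
    return False

population = [Agent() for _ in range(4)]
for _ in range(2000):
    speaker, hearer = random.sample(population, 2)
    play_game(speaker, hearer)

successes = sum(play_game(*random.sample(population, 2)) for _ in range(100))
print("success rate after negotiation: %d%%" % successes)
```

After many games the positive feedback loop drives the population toward a shared word for each meaning, which is the consensus the paragraph above describes.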
To create a computational model of language, the connections between input signals and
utterances must be found. The process from inputs to utterances may have several stages. The
semiotic square (Steels, 1999) shows a way to divide the language process from real world to words
(see Figure 2.1).
Figure 2.1 Semiotic square
The semiotic square is one way to divide the language process from the real world
to words in a language agent. The agent perceives the real world, forming internal
representations that are grouped into meanings associated with words. (Adapted
from Figure 2.1, p.27 of Steels, 1999)
The structure of the language agents described in the literature varies widely. To implement parts
of the process connecting input signals and utterances, computational models of language
have used various methods, including:
• simple neural networks (Cangelosi, 2001; Cangelosi & Parisi, 1998; Kirby & Hurford,
2002; Marocco, Cangelosi, & Nolfi, 2003),
• autoassociator networks, neural networks with outputs trained to be equal to the inputs
and hidden units used for signals (Hutchins & Hazlehurst, 1995),
• recurrent neural networks, neural networks in which recurrent links provide context
from one time step to the next (Batali, 1998; Elman, 1990; Tonkes, Blair, & Wiles,
2000),
• lexicon tables, a symbolic method that forms associations between categories and
utterances (Smith, 2001; Steels, 1999),
• definite clause grammars, a heuristic-driven incremental grammar induction method
(Kirby & Hurford, 2002),
• finite state unification transducers, in which a prefix tree transducer represents the
language, pairing each transmitted signal with the meaning associated with
transmitting it at that position (Brighton & Kirby, 2001),
• self organising maps, a connectionist method that can be used to form categories from
internal representations (Riga, Cangelosi, & Greco, 2004), and
• discrimination trees, a symbolic method that can be used to group internal
representations into categories (Smith, 2001; Steels, 1999).
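One of the symbolic methods above, the discrimination tree, can be sketched as follows. The sketch is simplified to a single scalar sensory channel whose range is recursively split in half until a region isolates a topic from the other objects in the context; the channel and the sensor values are hypothetical.

```python
def discriminate(topic, others, lo=0.0, hi=1.0, depth=0, max_depth=8):
    """Refine [lo, hi) on one sensory channel until only `topic` falls inside.

    Returns the (lo, hi) interval that acts as a category for `topic`,
    or None if the context cannot be discriminated at this depth.
    """
    inside = [x for x in others if lo <= x < hi]
    if not inside:
        return (lo, hi)           # the region contains the topic alone: a category
    if depth == max_depth:
        return None
    mid = (lo + hi) / 2.0
    if topic < mid:
        return discriminate(topic, inside, lo, mid, depth + 1, max_depth)
    return discriminate(topic, inside, mid, hi, depth + 1, max_depth)

# Hypothetical context: scalar sensor readings for three objects.
category = discriminate(topic=0.8, others=[0.2, 0.4])
print(category)  # (0.5, 1.0): the first split already isolates the topic
```

In a language game, such an interval would then be associated with a word, just as lexicon tables associate categories with utterances.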
The appropriate methods and representations must be chosen to match the concepts and agent
interactions.
2.4 How Could Language Have Evolved?
In a paper reviewing the consensus and controversies in research into language and evolution,
Christiansen and Kirby (2003a, p.300) state that the big questions about language and language
evolution are “Why is language the way it is? How did language come to be this way? And why is
our species alone in having complex language?” Understanding language evolution requires
evidence from different areas, including the structure and use of language, brain structure in modern
humans and our ancestors, brain areas used in language, and the differences between language and
animal communication. These diverse types of evidence have resulted in a multidisciplinary field,
including “anthropology, archaeology, artificial life, biology, cognitive science, computer science,
ethology, genetics, linguistics, neuroscience, palaeontology, primatology, psychology and statistical
physics” (Smith et al., 2008, p.v).
For language to evolve, a variety of abilities are required, including behaviours such as altruism,
larger social group sizes lacking hierarchical structure, the right environmental conditions, and a
sufficient intelligence level (Hurford, 2007). Language use “is inseparable from immersion in a
culture” (Dessalles, 2007, p.50) as languages emerge from interactions between people. The nature
of the environment and culture in which the speakers exist affects the languages that form.
Computational models of language can inform about how communication systems emerge,
investigating ontology, grounding, learnability, and generalisation in languages that evolve in
populations of agents (see Kirby, 2002; Steels, 1997b, 2005; and Wagner et al., 2003 for reviews of
computational models of the evolution of language).
One way that language evolution can be modelled is through iterated learning, by using the
Iterated Learning Model (ILM) in populations of agents (Kirby & Hurford, 2002). The basis for the
ILM is the process of language transmission with two representations of language: I-Language
(internal) and E-Language (external) (see Figure 2.2). I-Language is acquired by an agent
experiencing another agent’s E-Language. The I-Language is then used to produce the E-Language.
In iterated learning, agents learn from other agents in the population, adapting their lexicon to
improve the chance of successful games.
The ILM has a meaning space, a signal space, at least one language learning agent, and at least
one language using agent. There may be a turnover in the population with learners becoming users,
new learners entering the population, and old users leaving the population. The ILM is typically
used with one language learner and one language user (Brighton & Kirby, 2001; Kirby, 2001; Kirby
& Hurford, 2002), although larger populations can be used. The negotiation model can be seen as a
version of the ILM, with equal probabilities for agents being speakers and hearers and a single
generation of agents (Batali, 1998; Cangelosi, Riga, Giolito, & Marocco, 2004; Hutchins &
Hazlehurst, 1995; Smith, 2001). A variation may be used in which language is evolved in separate
generations of agent populations (Cangelosi, 2001; Cangelosi & Parisi, 1998; Nolfi & Marocco,
2002; Quinn, 2001). The ILM is a way of incorporating the agents’ built-in abilities, learning, and
culture into computational models of language evolution.
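The transmission loop of the ILM can be sketched minimally. The meaning space, the signal alphabet, and the holistic (non-compositional) lexicon below are all illustrative simplifications: the teacher's I-Language is expressed as E-Language utterances, from which the next generation acquires its own I-Language.

```python
import random

random.seed(1)

MEANINGS = ["left", "right", "above", "below"]  # toy meaning space

def produce(i_language, meaning):
    """Express a meaning as an E-Language signal; a random string is
    invented for any meaning the I-Language has no signal for."""
    if meaning not in i_language:
        i_language[meaning] = "".join(random.choice("ab") for _ in range(3))
    return i_language[meaning]

def acquire(observations):
    """Acquire an I-Language from observed (meaning, signal) pairs."""
    return {meaning: signal for meaning, signal in observations}

teacher = {}  # generation 0 starts with no language at all
for generation in range(10):
    # the teacher's I-Language is expressed as E-Language utterances...
    utterances = [(m, produce(teacher, m)) for m in MEANINGS]
    # ...from which the next generation acquires its own I-Language
    teacher = acquire(utterances)

print(teacher)
```

In Kirby and Hurford's experiments the learner observes only a subset of the meaning space (a transmission bottleneck), which pressures the language toward generalisable, compositional structure; the perfect transmission above is the degenerate case without a bottleneck.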
Figure 2.2 Language transmission
The language transmission process between the internal language (I-Language) and
the external language (E-Language). Language persists by means of transmission
between I-Language and E-Language through production and acquisition.
(Adapted from Figure 5, p.109 of Kirby, 2001)
Language evolution involves both the vertical transmission of language through generations of
agents and the horizontal transmission of language among peers. Varieties of the ILM correspond to
the vertical and horizontal transmission of language. In this thesis, the term “evolution” is used to
describe vertical transmission through generations of agents. Horizontal transmission is referred to
as negotiation, and corresponds to the emergence or formation of language.
2.5 Translating Meanings and Signals
“A language is a system for translating meanings into signals, and vice versa. Thus language is
anchored in non-language at two ends, the end of ‘meanings’ and the end of signals” (Hurford,
2007, p.3). The question is whether language is just a type of animal communication, or whether it
is something different entirely (Dessalles, 2007). The difference appears to be that human language
is learned while animal communication is innate. The alarm calls of vervet monkeys appear to be
fixed. The language of bees indicating distance and direction to food sources is genetically
programmed, with no need for individual bees to go through a learning period to associate the
signals with the meanings. The freedom of human communication comes from the arbitrary
association of signs with meanings, as can be seen by the multitude of languages.
The ability to categorise the world and to form concepts is central to human thought: “Without
the ability to categorize, we could not function at all, either in the physical world or in our social
and intellectual lives” (Lakoff, 1987, p.6). One open question is whether concepts form before or
together with words to refer to concepts. The key is to consider what we know about concepts in
animals and in pre-linguistic humans. Concepts or ‘proto-concepts’ can be attributed to at least
some animals; for example, vervet monkeys responding to alarm calls, swallows responding to
predators such as cats and hawks, but not dogs or pigeons, birds able to classify paintings as Monet
or Picasso, and Alex the African grey parrot who was able to categorise colour, shape, and matter
(Hurford, 2007). Studies of pre-linguistic knowledge of spatial relations indicate that infants are
able to distinguish between left, right, above, below, and between relations, have knowledge of
object permanence by 2.5-3.5 months, some understanding of causal relationships by 7 months,
some understanding of containment by 2.5 months, and begin to separate function from appearance
towards the end of the first year (Coventry & Garrod, 2004).
There are conflicting views on whether concepts exist prior to words and whether language
constrains thought. These views include the Sapir-Whorf hypothesis (Carroll, 1956), and neo-
Whorfianism (Levinson, 2003a). The two cardinal hypotheses of Whorf are: “that all higher levels
of thinking are dependent on language … (and) that the structure of the language one habitually
uses influences the manner in which one understands his environment” (Carroll, 1956, p.vi).
Different languages categorise the world in different ways. A single category in one language with a
single associated word may be divided into two or more categories and words in another language.
According to the Whorfian view, nature is divided up into concepts based on the language that is
used. Language speakers and listeners therefore only have the same experience of the same physical
evidence if their linguistic backgrounds are the same (Carroll, 1956).
Neo-Whorfianism encompasses a variety of views that allow for ‘Whorfian effects’ where
linguistic patterns have an effect on thinking. The neo-Whorfian view is based on studies that find
that linguistic difference between languages correlates with perceptual and cognitive differences
(Levinson, 2003a). The studies have mainly been undertaken in the domains of spatial language and
colour, and indicate that language facilitates cognitive development.
Language is not simply a one-to-one labelling of symbols to concepts. The concepts that we
form are varied, including things (people, animals, and objects), “events, actions, emotions, spatial
relationships, social relationships, and abstract entities” (Lakoff, 1987, p.6). New concepts can be
formed through a generative process by extending from existing concepts. In addition to concepts in
language formed through a generative process, much of human language and conceptual thinking is
metaphor. Spatial concepts are used metaphorically in many other parts of language, including
temporal (Spinney, 2005), interpersonal relationships (Tuan, 1975), emotion (Coventry & Garrod,
2004), kinship, social structure, music, and mathematics (Levinson, 2003a). The spatial concepts
used in the metaphor must be grounded in experience for the metaphor to be useful.
2.6 Summary
The key factors that are not yet completely understood, but that must be for language to be used
meaningfully by autonomous agents, include grounding, embodiment, learning, evolution, and the interaction
between concepts and words. Features necessary to solve the symbol grounding problem include the
use of embodied autonomous agents, a way for the agents to generate, internally represent, and
ground their own meanings, a way to negotiate symbols to refer to the meanings, and interaction
between agents in the population to coordinate the symbols and meanings (Steels, 2007). The issues
involved in language embodiment include concepts and tasks, perceptual abilities, and whether
natural languages are learned or artificial languages are evolved.
Existing language models employ a large variety of learning methods. There should be a match
between the learning method, representations, concepts, and interactions. In studying language
evolution, the built-in abilities and learning of the agents should be considered, as well as how the
cultural interactions affect how the language evolves through agent populations. The Iterated
Learning Model (Kirby & Hurford, 2002) is a way of integrating built-in abilities, learning, and
culture. Concept-word interactions and how concepts can be grounded when they are not directly
experienced are open questions.
Chapter 3 The Ubiquity of Space
Space is abstract. It lacks content; it is broad, open, and empty, inviting the
imagination to fill it with substance and illusion; it is possibility and
beckoning future. Place, by contrast, is the past and the present, stability and
achievement.
(Tuan, 1975, p.164-5)
In addition to understanding the nature of language, the concepts on which language is built must be
understood. Space and time are ubiquitous concepts: “Space and time are among the most
fundamental of notions. They provide a basis for ordering all modes of thought and belief”
(Peuquet, 2002, p.11). Spatial cognition is a general requirement for any mobile species, and the
prevalence of spatial metaphors in human language indicates the centrality of spatial cognition in
human thinking (Levinson & Wilkins, 2006a). There is variability in how different languages
conceptualise space, though there may be universals in the possible categories of spatial terms,
ways of describing locations or routes, and in the types of perspective that are possible.
Open research questions include how space is represented and used in humans, how to design
useful computational models of spatial languages, and how to design better robot navigation and
mapping systems. The project described in this thesis combines robot and language research by
designing methods for robots to be able to talk about space.
This chapter provides a review of spatial cognition, universals of spatial language, spatial
language models, and spatial representations in humans, animals, and robots.
3.1 Talking About Where
Spatial language includes descriptions of scenes, navigation, and descriptions of where something
is. Spatial concepts can provide information about the following: a single location (‘at’ specifies a
point), direction (‘north’ specifies any point on a line pointing in a northerly direction), distance
(‘near’ refers to anywhere that is close to a location), or multiple locations (‘beside’ combines ‘left’
and ‘right’). Some spatial words have multiple meanings. ‘To the right of’ may be considered to be
close on the right hand side or at any distance on the right hand side. Meanings can be altered with
different frames of reference. ‘Behind’ may mean that the object is beyond the reference from
‘here’, or that the object is located ‘in the opposite direction to where I am facing’.
The key elements involved in how people describe ‘where’ are points (landmarks), planes
(landmarks), paths (one-dimensional connectors), portions, directions (relative to landmark or
environment), and distances (standard units or approximate units of experience) (Tversky, 2003). In
expressing location, landmarks are preferred, followed by spatial relations (near rather than far),
direction (those with natural asymmetries), and distances (usually defined by landmarks). People
are better at using landmarks and paths than at using directions and distances (Tversky, 2003).
3.2 How Space Becomes Place
Spaces and places are known and constructed in the mind through the experiences of smell, taste,
touch, vision, and movement. Space is the general term for the world around us. Once we have
experiences in particular areas of space, they may become a place, or “a special kind of object”
(Tuan, 1975, p.12).
People describe where something is using landmarks of specific points, one-dimensional paths,
two-dimensional planes, and three-dimensional volumes (Tversky, 2003). Landmarks are examples
of the most basic spatial concepts that correspond to areas in space and are referred to using labels,
such as city or suburb names. Areas within an environment or along a path can often be described
by single words, such as corner, corridor, or intersection, or larger regions such as kitchen, office, or
backyard. As defined in Chapter 1, names for specific places in an environment are called toponyms
(i.e. topographic names), and a set of such terms to comprehensively describe an agent’s
environment is a toponymic language.
There are various ways in which toponyms are formed, including natural features (Dover =
water, Rotorua = two lakes), special sites (Doncaster = camp on the Don), religious significance
(Providence, Gadshill), royalty (Queensland), explorers (America, Cookstown), famous local
people (Baltimore, Washington, London), memorable incidents or famous events (Waterloo), and
other place names from immigrants’ homelands (Paris, Troy, London). Other less common ways
include explorers naming good or bad fortune on travels (Cape Tribulation), animal names (Beaver
City, Buffalo), descriptive names (North Sea), and the ‘new town’ (Newtown, Neuville, Naples,
Villanueva, Novgorod, Neustadt, Carthage) (Crystal, 1997, p.114).
Place can be constructed at different scales: “At one extreme a favourite armchair is a place, at
the other extreme the whole earth” (Tuan, 1975, p.12). We experience places through our senses.
Smaller areas may become more personal places, while larger ones are constructed from different
types of experience, such as travelling through a city. The different scales of place include personal,
home, city, neighbourhood and region, and nation-state.
Another way to differentiate between spatial scales is based on the actions that can be performed
which depend on the distances involved: within touch (personal space), within view and able to be
viewed from different perspectives (tabletop space), within walking or travelling distance
(geographic space), and beyond personal experience (astronomical space) (Peuquet, 2002).
3.3 Describing Relationships Between Places
In language, spatial information is distributed throughout sentences in a variety of word classes
(Levinson, 2003a). In English, spatial relations are often described using spatial prepositions.
Prepositions are hard to learn in a second language due to the differences in how they map onto
concepts (Coventry & Garrod, 2004). Prepositions can be divided into groups, including vertical
(below / above, down / up, under / over, beneath / on top of), distance (near, far), horizontal
(beyond, behind, beside, by), omni-directional (at, about, around, between, among, along, across,
opposite, against, from, to, via, through), and temporal (O'Keefe, 1996). Another way to classify
prepositions is shown in Figure 3.1. The preposition types that directly describe spatial relationships
are simple topological terms (in, on, under), proximity terms (next to, beside), projective /
dimensional terms (behind, in front of), and directional terms.
is fairly consistent across different languages, with simple topological terms acquired first, followed
by proximity terms, then projective terms (Coventry & Garrod, 2004).
Figure 3.1 Preposition classification
A classification of the different preposition types (adapted from Figure 1.1, p7 of
Coventry & Garrod, 2004). The spatial terms are those relevant to this thesis,
including ‘simple’ topological terms, proximity terms, projective / dimensional,
and directional.
Prepositions
    Grammatical uses
    Local uses
        Spatial uses
            Locative/relational
                Topological
                    “Simple” topological terms
                    Proximity terms
                Projective / dimensional
            Directional
        Temporal uses
Directions can be described in precise degrees and absolute cardinal directions, but are usually
described approximately, relative to a landmark or major feature of the environment. Directions
with asymmetries, such as front and back, are preferred over those without asymmetries, such as left
and right (Tversky, 2003). While distance can also be described in standard units, such as
kilometres, approximate units of experience are often used that refer to landmarks or time (Tversky,
2003).
3.4 Choosing Which Perspective To Use
When describing space, a variety of references can be used. Levinson (2003a) describes the three
distinct frames of reference as ‘intrinsic’, ‘relative’, and ‘absolute’. In the intrinsic frame of
reference, the coordinate system used is object-centred, determined by inherent features of the
object, which may be different features in different languages. The direction of the coordinate
system is determined by each language, relating to function in English and shape in Tzeltal
(Landau, 1996). The relative frame of reference has a coordinate system centred at the location of a
viewer, which may be the location of the speaker or an arbitrary location in the scene. In the
absolute frame of reference, the coordinate system may be fixed by gravity and cardinal directions,
varying with different languages and cultures. The absolute frame of reference may be fixed by a
feature of the environment such as a coastline or a mountain. Some languages only use one frame of
reference, while others use a combination. The relative frame of reference requires the intrinsic
frame of reference, but all other combinations are possible.
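The three frames can be illustrated with a toy two-dimensional scene; all objects, coordinates, and headings below are hypothetical. The same displacement between a figure and a ground object yields different spatial terms depending on the coordinate system used: cardinal directions in the absolute frame, and viewer- or object-anchored directions in the relative and intrinsic frames.

```python
import math

def bearing(dx, dy):
    """Angle of (dx, dy) in degrees, measured counterclockwise from east."""
    return math.degrees(math.atan2(dy, dx)) % 360

def absolute_term(figure, ground):
    """Absolute frame: cardinal direction from ground to figure."""
    ang = bearing(figure[0] - ground[0], figure[1] - ground[1])
    return ["east", "north", "west", "south"][int((ang + 45) % 360 // 90)]

def anchored_term(figure, ground, heading):
    """Relative/intrinsic frame: direction in a coordinate system rotated to
    `heading` (the viewer's heading gives the relative frame; the ground
    object's own facing gives the intrinsic frame)."""
    ang = (bearing(figure[0] - ground[0], figure[1] - ground[1])
           - heading) % 360
    return ["front", "left", "behind", "right"][int((ang + 45) % 360 // 90)]

# Hypothetical scene: a ball one unit north of a chair.
ball, chair = (0.0, 1.0), (0.0, 0.0)
print(absolute_term(ball, chair))           # north
print(anchored_term(ball, chair, 90.0))     # viewer faces north: 'front'
print(anchored_term(ball, chair, 180.0))    # chair faces west: 'right'
```

The sketch also shows why translation between frames is lossy: recovering the absolute term from a relative one requires the heading, which the relative description does not carry.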
Two examples of languages that require speakers to have a sense of location and direction are
the Australian language Guugu Yimithirr and the Tenejapan language Tzeltal (Levinson, 2003a). In
Guugu Yimithirr, only the absolute frame of reference is used, with spatial descriptions referring to
something similar to the cardinal directions of North, South, East, and West. In Tzeltal, both
intrinsic and absolute frames of reference can be used. In the Tzeltal absolute frame of reference,
directions are designated uphill, downhill, or across, corresponding to an inclined plane that has
been abstracted from the local environment.
While all languages can describe spatial representations, people speaking different languages
will prefer to use different frames of reference or may even switch between frames of reference
during conversation (Tversky, 1996). Frames of reference can be used to construct or describe
spatial relationships in the world. The use of different frames of reference indicates that language
may restructure the spatial representations of the language speaker, rather than the existence of
innate and universal spatial concepts (Majid, Bowerman, Kita, Haun, & Levinson, 2004).
Translation between frames of reference is not easy; for example, information required by the
absolute frame of reference is not provided by the others. Therefore, speakers will tend to remember
spatial experiences in the frame of reference that they habitually use (Levinson, 2003a).
3.5 Universals Across Different Languages
There is variation in spatial language across cultures (Levinson, 1996). Language may affect
conceptual categories, including cultural variation in spatial frames of reference. There are
differences in conceptual distinctions, with the spatial relationship concepts formed in different
languages overlapping or cutting across each other, with no one-to-one mapping cross-linguistically
(Levinson & Wilkins, 2006a). There are differences in the grammatical categories of words used to
describe spatial relations. In English, spatial relations are mainly described using spatial
prepositions. Other languages use verbs, local cases, special spatial nominals, or adverbials
(Levinson & Wilkins, 2006a).
The differences in spatial concepts between languages indicate that universal concepts are not
unanalysable wholes (Levinson & Wilkins, 2006b), but may involve elements that can be combined
in various ways. Children acquiring spatial language are not mapping forms onto innate concepts,
but are building the concepts as they are learning the language, developing spatial language for
about the first 8 years of life (Coventry & Garrod, 2004).
Based on a series of studies covering twelve different languages distributed over five continents
and a range of cultures, Levinson and Wilkins (2006b) concluded that there are constraints
on the diversity of dimensions for structuring spatial domains, as well as implicational constraints. One of
the constraints on the diversity of dimensions for structuring spatial domains is that there is a finite
set of possible frames of reference for languages to use: intrinsic, relative, and absolute. Also, if a
language has a relative frame of reference, it has an intrinsic one.
The universal elements of spatial concepts may include that all languages express spatial
relations, that universal spatial concepts may be elements that can be combined in different ways
(Levinson & Wilkins, 2006b), and that perspectives are used in the form of frames of reference with
three possible choices (relative, intrinsic, and absolute) (Levinson, 2003a).
3.6 Spatial Language Models
Some research has included a spatial dimension in language models (Bartlett & Kazakov, 2005;
Bodik & Takac, 2003; Regier, 1996; Steels, 1995; Steels & Loetzsch, 2007). In Bartlett and
Kazakov’s (2005) simulation world, agents must find food and water to survive and reproduce.
Food and water are located at set intervals in a grid world and can be found by self-exploration of
the agents, remembering where food and water have been found in the past, conversation with other
agents, and sharing with other agents. Locations and paths are remembered by the landmarks that
can be seen from that location. Agents use ‘songlines’ to store paths, listing the landmarks that can
be seen when following the route from a particular location to the nearest resource. The location
names are known by all agents prior to the study. The importance of language to the survival of the
agents is related to the structure of the environment concerning the distances between resources.
In language games studies incorporating a spatial dimension (Bodik & Takac, 2003; Steels,
1995), the games involve concepts of direction and distance from the location of the agent to an
object in the world. Objects in the world could be ‘pointed to’, allowing agents to build up a lexicon
of terms for objects in the world. In Steels’ study the objects were other agents, while in Bodik and
Takac’s study the objects were static objects in a playground. Naming games were followed by
spatial language games in which the agents describe distances and directions between objects. In
Steels’ (1995) study, the agents always face the same direction, and the concepts used are front,
side, behind, left, straight, and right. In Bodik and Takac’s (2003) study, the agents move around
the world, and a conceptualisation tree was used for dividing angles and distances into spatial
concepts which were then associated with words. In both studies, the agents utilise an absolute
frame of reference, and so have a shared perspective.
Regier’s (1996) constrained connectionism model categorised spatial relations, or prepositions,
from several languages including English, Russian, and Mixtec. The model was presented with
sequences of movie frames that displayed the spatial relation, and learned to categorise the terms
without explicit negative evidence.
Methods for aligning the agents’ perspective in language games have been considered. Situated
agents provided with a ‘language faculty’ and methods for aligning perspective are able to agree on
a lexicon describing spatial categories relevant to their environment which includes another robot
and an orange ball (Steels & Loetzsch, 2007).
3.7 Representing Space
Human spatial competence includes shape recognition, a sense of where body parts are, and
navigation (Levinson, 2003a). In terms of navigation, people's ability varies greatly, unlike the
specialised systems of other species, such as echolocation in bats, the detection of polarised light
in bees, and the sensing of the earth's magnetic field in migratory birds. Navigation in humans is a
culturally developed system, resulting in a large variance in navigation ability (Levinson, 2003a).
The hippocampus is implicated in spatial representation in humans. O'Keefe and Nadel (1978)
proposed the existence of a cognitive map in rats, with the hippocampus constructing an allocentric
map of the world. In the cognitive map theory for humans, the right
hippocampus has the spatial function of constructing an allocentric map, while the left hippocampus
is a linguistic or episodic memory system, or a semantic map (O'Keefe, 2003). Studies using
modern imaging techniques have confirmed that the right hippocampus is used in navigation tasks,
such as mental navigation along memorised routes (Berthoz, 1999) and topographic learning and
recall (Maguire, 1999).
Multiple brain areas are likely to be associated with spatial language comprehension and
production, including those associated with geometric and dynamic-kinematic routines, such as the
right hippocampus with non-linguistic spatial representations, the left hippocampus mapping the
spatial language onto a spatial representation, and the left parietal and frontal regions involved in
processing of space and motion (Coventry & Garrod, 2004).
By studying other forms of spatial representations, more may be understood about human spatial
representations. Other forms of representations being studied include cognitive maps in rats and
maps for mobile robots.
3.7.1 Cognitive Maps In Rats
Invasive experimental procedures have resulted in more detailed knowledge of spatial
representation in rodents than in humans. Rodents have place cells that correspond to locations in space and head
direction cells that indicate the rodent’s head orientation (O'Keefe, 1979). Place cell activity is
affected by the movement of the rodent and visual input. Place cells appear to provide an allocentric
map of the world, where individual place cells correspond to particular locations in the world.
Evidence of ‘grid cells’ has been shown in the entorhinal cortex (Hafting, Fyhn, Molden, Moser,
& Moser, 2005). Grid cells are activated when the rodent is at locations in the environment that
coincide with a regular grid of equilateral triangles. Grid cells have multiple discrete firing locations
corresponding to the regular grid.
Place cells, head direction cells, and grid cells are implicated in a neural map of the rodent’s
spatial environment. Grid cells may provide a scale of the environment, with head direction
providing orientation, and place cells responding to specific location cues (Hafting et al., 2005).
3.7.2 Maps for Mobile Robots
For mobile robots to be truly autonomous they must be able to build a map of their environment and
navigate using that map. An overview of research into robotic mapping is given in Thrun (2002). A
summary of the main points of this paper is given here.
Robotic mapping is one of the most important problems in building autonomous mobile robots.
With more than two decades of research in robotic mapping, there are many models able to map
structured, static, small scale environments in real-time. Simultaneous Localisation And Mapping
(SLAM) is the process in which robots build an internal map of the environment and use an
estimate of localisation in that map for navigation. Problems in robotic mapping include
measurement error (the sensors used by robots are subject to errors), the high dimensionality of the
environments, the correspondence problem (whether different measurements correspond to the
same physical object), dynamic environments (the real world is not a static environment), and
robotic exploration (how the robots choose their path during mapping). Issues for future research
into robotic mapping include mapping dynamic environments, integrating knowledge about
environments into the mapping problem (e.g. objects typical in indoor vs. outdoor environments),
multi-robot collaboration, and unstructured environments (e.g. outdoor, underwater, and planetary).
SLAM models have been based on grid representations, landmark representations, or
topological representations. An alternative is a biologically inspired approach to SLAM; for
example an approach based on the hippocampal complex in rodents. Computational models of the
rodent hippocampus used in mobile robots allow the robots to localise within an environment
(Arleo & Gerstner, 2000; Burgess, Donnett, Jeffery, & O'Keefe, 1999). RatSLAM, a method of
SLAM that has been developed at The University of Queensland, is based on the rodent
hippocampus (Milford et al., 2004).
3.7.3 RatSLAM
The robots used for the RatSLAM project are Pioneer 3 DX robots (see Figure 3.2), equipped with a
camera to provide the visual input for the perception system, and wireless communication
equipment for real time data collection and analysis. There is a high-fidelity simulator that captures
the sensing and motion capabilities of the robot and is able to generate the robot’s on-camera view
(Moylan, 2003, with additional work by Mark Wakabayashi) (see Figure 3.3). Earlier studies were
performed on Pioneer 2 DXE robots with a forward facing camera.
RatSLAM is a computational model inspired by the rodent hippocampal complex with
continuous attractor networks (Milford et al., 2004). Movement and visual senses modulate the
network dynamics. Elements near each other in the network are likely to be close in space.
RatSLAM keeps the sense of space inherent in grid-based and landmark-based systems, while
adding the robustness and adaptability of topological representations.
Robots using RatSLAM use the appearance of an image to aid localisation by learning to
associate the appearance of a scene and its position estimate (Prasser, Wyeth, & Milford, 2004).
The robot can perform goal-directed navigation to locations that have previously been visited based
on an internal map of the environment.
Figure 3.2 Robot used in the RatSLAM and RatChat projects
The Pioneer 3 DX robots have a camera (with a mirror for omni-directional
vision), sonar range finders, laser range finders, and an antenna for wireless
communication. For RatChat, the robots have a microphone and two speakers.
Figure 3.3 Map of the real and simulated world
The map of the robot’s world showing a) the halls and open plan offices of the real
world and b) the simulation world, a 3D virtual reality environment constructed
from digital photos of the real world.
Architecture
The inputs to the RatSLAM system include odometry and vision. The odometry inputs are the
velocity and rotation of the robot. The visual representation is a low resolution version of the input
from the camera (see Figure 3.4). The RatSLAM Architecture integrates the inputs by creating local
views from the visual scenes and performing path integration on the odometric inputs to form the
pose cell representation. The pose cells and local view cells are integrated into an experience map.
The local view, pose cells, and experiences are described in the following three sections.
Figure 3.4 Visual input
A corridor as seen by a) the camera of the robot, b) the robot in the simulation
world, and c) the low resolution vision obtained from the camera image and used
by RatSLAM for vision based SLAM.
Local View Cells
The visual processing method used in local view creation is a sum of absolute differences matcher
(see Figure 3.5). The current local view is compared with stored templates that have been seen
previously. If the current view is sufficiently similar it will be recognised, otherwise a new template
will be created.
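The template-matching step above can be sketched as follows. This is a minimal illustration of a sum of absolute differences matcher over flattened pixel intensities; the threshold value is a hypothetical placeholder, not the one used in RatSLAM.

```python
def match_local_view(view, templates, threshold=0.1):
    """Sum of absolute differences (SAD) matching for local view creation.

    `view` is a flat list of pixel intensities in [0, 1]; `templates` is
    the list of previously stored views. The threshold is illustrative.
    """
    best_index, best_score = None, float("inf")
    for i, template in enumerate(templates):
        # mean absolute difference between corresponding pixels
        score = sum(abs(v - t) for v, t in zip(view, template)) / len(view)
        if score < best_score:
            best_index, best_score = i, score
    if best_score < threshold:
        return best_index              # recognised an existing template
    templates.append(list(view))       # sufficiently different: new template
    return len(templates) - 1
```

Repeated presentation of the same view returns the same template index, while a sufficiently different view extends the template list.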
Pose Cells
The RatSLAM pose cell representation (see Figure 3.6) provides information about the robot’s
pose, combining place and orientation information. The correspondence with the rodent
hippocampal complex is with place cells that are active when the rodent is in a particular location
and head direction cells that are active when the rodent is oriented in a certain direction. In
RatSLAM, place and head direction cells have been combined into a pose representation in x, y, and
θ to allow the system to concurrently manage multiple pose beliefs. Wrapping occurs in each of the
x, y, and θ dimensions. As the robot moves, information from vision and odometry sensors is
processed using local view and path integration. The visual and odometric information is used to
update the pose cell activity, resulting in the movement of the pose cell activity packet. Multiple
activity packets can exist when the robot recognises scenes that are associated with multiple
locations in the environment.
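The wrapping path integration described above can be illustrated with a simplified update that moves only the centre of the activity packet; a full continuous attractor network would shift a packet of activity across many cells. The cell counts and motion values below are illustrative.

```python
import math

def update_pose(pose, speed, rotation, size_x, size_y, size_th):
    """Shift a pose estimate (x, y, theta index) by odometric input,
    wrapping in each of the x, y, and theta dimensions as the pose
    cell network does. Units are cell indices, not metres."""
    x, y, th = pose
    th = (th + rotation) % size_th          # wrap the heading dimension
    angle = 2 * math.pi * th / size_th      # heading implied by theta index
    x = (x + speed * math.cos(angle)) % size_x   # wrap in x
    y = (y + speed * math.sin(angle)) % size_y   # wrap in y
    return (x, y, th)
```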
Figure 3.5 Local view cells
The array of Local View (LV) Cells is created as the robot explores the world. The
current local view is compared with existing templates. When a scene is
sufficiently different from all stored scenes, a new template is created. Note that
the images shown are for a forward facing camera.
The total number of cells and the number of active cells is adjustable depending on the
environment size and the desired level of accuracy. An example setting, used in the pilot studies,
was 180 x by 68 y by 36 θ pose cells (440,640 pose cells) with between 100 and 200 pose cells
active at any time. In all studies described in this thesis, the pose cell map was large enough that the
x and y dimensions did not wrap.
In a pose cell map, where the most active pose cell at the current time is shown as the robot
moves through the world, there may be discontinuities where areas close in physical space are
represented by pose cells that are further apart, and there may be multiple representations where one
cluster of pose cells represents multiple physical locations. The pose cell representation is
topologically correct, consistent, stable, and able to be used for goal directed navigation in simple
environments (Milford, Wyeth, & Prasser, 2005). In more complex environments, the pose cells are
difficult for a human to interpret as a map of the environment, and difficult to use in goal directed
navigation. The experience mapping algorithm was developed to create a map that was easy for a
human to interpret and to use in goal directed navigation.
Figure 3.6 Pose cells
The three dimensional (x, y, and θ) continuous attractor network of pose cells used
in RatSLAM. The lines around the edge show the limiting area for the pose cells at
which wrapping occurs. The activity packet is the currently active cells, and moves
around the pose cells based on the motion and rotation of the robot, with additional
inputs from the current Local View. Note that the robot pictured here is a Pioneer 2
DXE with a forward facing camera.
Experiences
The experience map consists of a collection of experiences linked together by transitions. Each
experience is a representation of the pose cell and local view cell activity at a point in time (see
Figure 3.7). New experiences are created when the existing experiences do not sufficiently describe
the current pose cell and local view cell activity. Each experience is located within the (x, y, θ)
experience map coordinate space. The first experience is placed at (0, 0, 0), with subsequent
experiences placed based on the previous active experience and the robot’s movement. The
movement between experiences, in terms of space, time, and behaviour, is stored in
experience transitions. A map correction process takes into account the current location of the
experience in the map and the perceived distance between the experiences based on transitions.
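The map correction process can be sketched as a simple relaxation over the experience graph: each experience's position is nudged toward agreement with the offsets stored in its transitions. The data layout and correction rate here are illustrative, not the RatSLAM implementation.

```python
def correct_map(positions, transitions, alpha=0.5):
    """One iteration of experience map correction.

    `positions` is a list of (x, y) experience locations; `transitions`
    maps (i, j) -> (dx, dy), the movement recorded when travelling from
    experience i to experience j; `alpha` is an illustrative rate."""
    corrections = {i: [0.0, 0.0, 0] for i in range(len(positions))}
    for (i, j), (dx, dy) in transitions.items():
        # disagreement between where i says j should be and where j is
        ex = positions[i][0] + dx - positions[j][0]
        ey = positions[i][1] + dy - positions[j][1]
        corrections[j][0] += ex; corrections[j][1] += ey; corrections[j][2] += 1
        corrections[i][0] -= ex; corrections[i][1] -= ey; corrections[i][2] += 1
    new_positions = []
    for i, (x, y) in enumerate(positions):
        cx, cy, n = corrections[i]
        if n:
            x += alpha * cx / n
            y += alpha * cy / n
        new_positions.append((x, y))
    return new_positions
```

With two experiences whose map distance disagrees with the stored transition, one iteration moves both positions toward consistency.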
Figure 3.7 Experiences
The pose cells represent spatial information where active cells encode the robot’s
pose. The local view cells represent visual information where each cell encodes a
visual scene experienced by the robot. Each node in the experience map encodes a
specific spatial and visual experience of the robot and is located in the experience
map’s co-ordinate space. Transitions between experiences are used to form a map
representative of the environment. (Adapted from Milford & Wyeth, 2007)
Behaviours
The basic behaviour of the RatSLAM robots is to explore the world, forming pose cell, local view
cell, and experience map representations. Exploration involves wall following, where the left or
right wall may be selected. The robots can perform goal directed navigation: an experience may be
selected for the robot to use as a goal, causing the robot to plan a route to the location based on
knowledge of time taken to move between experiences.
Future Work
The RatSLAM system is under ongoing development, with possible future work including the
development of analysis techniques for biological models, formal analysis of the model, long term
experiments, and formalisation of the methods and the dynamic environment problem (Milford,
2008).
3.8 RatChat
The RatChat project extends the RatSLAM project, and aims to investigate language formation
using robots. The concrete aims are to provide robots with the ability to talk about their world and
experiences. The project builds on RatSLAM’s robotic platform, with the detailed representations
of space formed by robots exploring their world.
The studies of this thesis have been the core of the RatChat project, driving the robots’ abilities
to talk about locations in the world, and the relationships between locations. Other sections of the
project have considered naming paths, equivalent to verbs in the robots’ world, and ways for
humans to interact with the robots.
3.9 Summary
The different types of spatial concepts include landmarks or specific places, paths between places,
directions, and distances. Combinations are used to describe scenes, navigation, or locations.
Relationships between places are often described in English with prepositions. The relationships
that can be described with prepositions include topological, proximity, projective, and directional.
Space, the general world around us, becomes place through experience. Place names, or
toponyms, are often formed through a collective experience at a location, or are descriptive of
features of that location. There is a divergence of concepts, frames of reference, and grammatical
categories used for spatial language. There may be a set of universals from which specific
languages choose; for example, there are only three frames of reference for languages to choose
from: intrinsic, relative, and absolute. Language models have labelled objects located in the world,
relationships between the objects, and have recently considered methods for aligning perspective.
Human spatial competence includes shape recognition, a sense of where body parts are, and the
culturally developed navigation system. Specific brain areas are associated with spatial language
and spatial representation, including the hippocampus, parietal, and frontal regions. In rats, the
hippocampus and entorhinal cortex are implicated in navigation, with place cells, head direction cells,
and grid cells. Navigation in rats has inspired the development of robot models of Simultaneous
Localisation and Mapping, including RatSLAM. The RatSLAM modules are local view cells, pose
cells, and the experience map. The themes covered in this chapter are combined in the RatChat
project, of which the studies described in this thesis are a major part.
Chapter 4 A Location Language Game
The original word game is the operation of linguistic reference in first
language learning. … We play this game as long as we continue to extend
our vocabularies and that may be as long as we live.
(R. Brown, 1958, p.194)
To use language meaningfully in practical applications requires appropriate methodology. Many
methods and representations have been used in language models. Method refers to the structure of
the language agents and the strategies used by the agents to form concepts, produce utterances, and
comprehend utterances. Representation refers to how concepts and words are defined, where
concepts are formed from concept representations and utterances are formed from word
representations.
The purpose of Chapter 4 is to describe the methodology and representations for the RatChat
project, in particular for the studies presented in this thesis. The studies are presented in the three
chapters following this chapter, and include Pilot Study: Methods and Representations, Study 1: A
Toponymic Language Game, and Study 2: A Generative Toponymic Language Game.
Language games can be used to investigate features of language such as embodiment,
learning, culture, evolution, grounding, and vocabulary. They may be played in a static or dynamic
population of two or more agents. The nature of the population dynamics may affect the languages
resulting from interactions. A standard language game consists of interactions between two agents:
a speaker and a hearer. In a simple version of the guessing game, the steps of an interaction are:
• shared attention,
• speaker behaviour,
• hearer behaviour,
• feedback, and
• acquisition of a new conceptualisation (Steels, 2001).
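These five steps can be sketched as a single interaction between two minimal agents; the agent interface and the score-based lexicon below are illustrative assumptions, not the exact mechanisms used in the thesis.

```python
import random

class LexiconAgent:
    """Minimal agent with a lexicon of (meaning, word) -> score."""
    def __init__(self, words):
        self.words = words
        self.scores = {}   # association scores, default 0

    def produce(self, meaning):
        # speaker: the word most strongly associated with the meaning
        return max(self.words, key=lambda w: self.scores.get((meaning, w), 0))

    def comprehend(self, word, context):
        # hearer: the meaning in context most associated with the word
        return max(context, key=lambda m: self.scores.get((m, word), 0))

    def update(self, meaning, word, success):
        key = (meaning, word)
        self.scores[key] = self.scores.get(key, 0) + (1 if success else -1)

def play_guessing_game(speaker, hearer, context):
    """One interaction: shared attention, speaker behaviour, hearer
    behaviour, feedback, and acquisition (lexicon update)."""
    topic = random.choice(context)             # shared attention
    word = speaker.produce(topic)              # speaker behaviour
    guess = hearer.comprehend(word, context)   # hearer behaviour
    success = guess == topic                   # feedback
    speaker.update(topic, word, success)       # acquisition
    hearer.update(guess, word, success)
    return success
```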
The language games described in this thesis are location language games. The structure of
location language games is described in this section, with a flow chart for an agent playing location
language games shown in Figure 4.1. To play a location language game, the agents require a
representation of the world that they use to form concepts. To gain these concept representations,
the agents must first explore the world. Exploration is carried out independently of other agents in
the world and may be performed in advance of or concurrently with language game interactions.
The exact nature of the exploration depends on the agents’ environment.
Shared attention for location language games is obtained by agents being near each other, or
within hearing distance. The agents autonomously explore the world and play a game when they
are close to each other. To determine whether they are close to each other, the agents intermittently
send out a ‘Hello’ signal. If a ‘Hello’ signal is heard, the hearing agent then sends a ‘Hear’ signal
and the agents will play a game. The agent that said ‘Hello’ is the speaker and the agent that said
‘Hear’ is the hearer. In real robots, the signals may be in the form of sounds, while in simulated
agents, signals may be sent to agents within a set distance in the simulation world. The hearing
distance affects the distance at which signals will be received by other agents in the world.
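For simulated agents, the handshake above can be sketched as a proximity check: a signal is received by any agent within the hearing distance, the sender becomes the speaker, and the responder becomes the hearer. The function names and the first-responder rule are illustrative assumptions.

```python
import math

def within_hearing(a, b, hearing_distance):
    """True if agents at positions a and b (x, y world units) are close
    enough for a 'Hello' signal to be heard."""
    return math.dist(a, b) <= hearing_distance

def handshake(speaker_pos, others, hearing_distance):
    """The agent sending 'Hello' becomes the speaker; the first agent
    within hearing distance replies 'Hear' and becomes the hearer.
    Returns the hearer's index, or None if no agent heard the signal."""
    for i, pos in enumerate(others):
        if within_hearing(speaker_pos, pos, hearing_distance):
            return i
    return None
```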
Following the acquisition of shared attention is the speaker behaviour. The speaker chooses a
topic, which in a location language game relates to the current location of the agents or a location at
a distance from the agents, depending on the game being played. After the topic is determined, the
speaker uses their lexicon to determine which word should be used in the current situation. The
actual word representation depends on the abilities given to the agents, such as an ability to parse
speech, tones, text, or integers.
After the speaker produces an utterance, the hearer attempts to comprehend the utterance to
determine the topic. For comprehension, the hearer considers the shared attention of the agents and
their internal representations of the world.
In the feedback step, the success or failure of the game is determined with feedback provided to
the other agents. The feedback step may be skipped, as determining success or failure is difficult to
do without considering the internal states of the agents, or ‘mind-reading’, which is not desirable
(Smith, 2001). However, a variety of performance measures may enable each agent to keep track of
how the interactions are progressing.
The final step is the acquisition of a new conceptualisation, where agents update their representations
to increase the chance of success in future games. Specifically, the agents update their lexicon,
which enables a coherent language to form in the population of agents.
In the studies, three different types of language game are played by the agents: ‘where are we’,
‘go to’, and ‘where is there’. The type of location language game played determines how the agents
choose the topic, determine the word, and comprehend the word. The games are described in detail
in Chapters 6 and 7 where they are first used.
Figure 4.1 A location language game
A flow chart for an agent playing location language games, with the speaker
behaviour on the left and the hearer behaviour on the right. Agents explore the
world, checking if a ‘Hello?’ signal has been received, or if it is time to send out a
‘Hello?’ signal. When a ‘Hello?’ signal is received, the agent becomes the hearer.
When an agent that has sent a ‘Hello?’ signal receives a ‘Hear’ signal, they become
the speaker.
The parameters for the location language game are:
• game and
• hearing distance.
The features requiring more detailed information are:
• concept representations,
• word representations,
• lexicon,
• population dynamics,
• environment, and
• performance measures.
These features are described in the remainder of this chapter, which concludes with a summary
of the parameters for all features of the location language game.
4.1 Concept Representations
In many language models, the concept representations are arbitrary, and may not actually represent
meaningful concepts. For example, a representation might be a vector of 10 real numbers between
0.0 and 1.0 (Batali, 1998) or objects with a set of abstract features in the range 0.0 to 1.0 (Smith,
2003). In other studies, they are representative of real concepts or tasks. Agents have played
language games to describe visual scenes with concepts such as size and colour (Steels, 1999) and
have communicated to aid cooperation in agent populations (Floreano et al., 2007).
In a location language game, the concept representations build on the robot’s representations
formed while exploring the world. They are used by the speaker to choose the interaction topic, and
together with the word representations and the lexicon to determine the word for the topic. They are
used by the hearer, together with the word representations and the lexicon, to comprehend the word
spoken.
Throughout the studies described in this thesis, three types of concepts are formed by the agents:
locations in the world, distances between locations in the world, and directions in the form of the
angle between two locations at a distance (see Figure 4.2). The concepts are constructed from the
robot representations formed using RatSLAM, described in section 3.7.3, or simplified versions of
these representations. The representation types include vision, pose cells, and experiences.
The parameters for robot representations are:
• concept type (locations, distances, or directions) and
• concept representation (vision, pose cells, or experiences).
Figure 4.2 Concept types
The three concept types used by agents in this thesis: locations, distances, and
directions.
4.2 Word Representations
Word representations are used by the speaker when determining the word for the topic, and when
sending the word to the hearer. They are used by the hearer when comprehending the word.
Depending on the methods used to form concepts, and to associate concepts and words, the words
used for different concepts may be arbitrary, or they may be related according to the relationship of
the concepts that they refer to. For example, labels for nearby locations may have similar elements,
while labels for distant locations do not. Alternatively, distinct names may be used for different
locations, which may be easier for agents to learn (Gasser, 2004). The actual representation of the
words, in terms of physical transmission, could be an integer, a set of unit activations, text, or
sound.
The parameter for word representations is:
• word representation (integer, activations, text, or sound).
4.3 Lexicon
The lexicon is used together with concept and word representations when the speaker determines
the word for the topic and when the hearer comprehends the word. The associations between
concepts and words stored in the lexicon are updated at the end of each interaction. Throughout the
studies, four different techniques were employed to associate concepts and words. Each of the
lexicon techniques must allow the agents to produce words, comprehend concepts, learn
associations between concepts and words, and have a source of variability that allows new words to
be associated with new concepts.
The parameter for the lexicon is:
• technique (simple or recurrent neural networks, standard or distributed lexicon tables).
For each of these techniques, the features requiring more detailed information are:
• word production,
• concept comprehension,
• learning associations, and
• variability.
In the next four sections the lexicon techniques used in this thesis are described.
4.3.1 Simple Neural Networks
Simple neural networks provide a way to associate a set of input units with a set of output units (see
Figure 4.3). For neural networks, features that need to be considered are whether they will be fully
connected, the transfer function that will be used in the units, and the structure of the word
representations. The simple neural networks used in this thesis are fully connected. The Log-
Sigmoid transfer function (logsig) is used:
logsig(n) = 1 / (1 + exp(−n))    Equation 4.1
where n is the input to the unit. The logsig function takes any input and will output a value
between 0 and 1, limiting the network outputs to activations between 0 and 1. The structure of the
word representations is that each output unit is associated with a single word, and the word chosen
corresponds to the most active output unit.
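Equation 4.1 and the winner-take-all word choice can be expressed directly; `choose_word` is a hypothetical helper name for the rule that the word corresponds to the most active output unit.

```python
import math

def logsig(n):
    """Log-sigmoid transfer function (Equation 4.1): maps any input to
    an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))

def choose_word(output_activations, words):
    """Each output unit is associated with one word; the word chosen
    corresponds to the most active output unit."""
    best = max(range(len(output_activations)),
               key=lambda i: output_activations[i])
    return words[best]
```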
Figure 4.3 Simple neural network
The pilot studies used simple linear networks with bias. In this figure, the
rectangles refer to a set of units, and the arrow indicates that the units are fully
connected.
Word Production
For the word production networks, the input unit activations correspond to the representations
underlying the concepts and the output units correspond to the words. In the studies using simple
neural networks each output unit corresponds to a single word.
Concept Comprehension
There are two alternatives for concept comprehension. The first is for the agent to have a separate
comprehension network, with the inputs being the words and the outputs being the concept
representations. Two networks are easy to implement and use, but there is no direct link between
the networks. The agents need to train both networks, one for production and the other for
comprehension.
The second alternative is for the agent to have a single network used both for word production
and concept comprehension (Batali, 1998). The network takes the words as the input and the output
of the network is the concept. When acting as the speaker, the word that has the closest concept to
the topic chosen is the word sent to the hearer. A single network means that word production and
concept comprehension are directly linked, but one of the directions is more difficult to obtain and
more computationally expensive.
The parameter for concept comprehension using simple neural networks is:
• networks (the same network used for production and comprehension or a separate
network for production and comprehension).
Learning Associations
The network weights are updated by training the network on a set of input and output patterns. In
this thesis, the networks are initialised with small random weights and biases (uniformly between –
0.1 and 0.1). The network is trained using gradient descent with momentum and an adaptive
learning rate. The change in weights and biases, dX, is given by:
dX = m·dXk-1 + (1 − m)·η·g    Equation 4.2
where m is the momentum constant, dXk-1 is the previous change in weights and biases, η is the
learning rate, and g is the gradient of the network’s performance with respect to its weight and bias
values. The learning rate may be increased or decreased by the increasing and decreasing ratios
depending on the network’s performance.
Concerning the network task, one option is to present patterns individually, updating the
network’s weights for each pattern. Another option is to present all of the patterns, updating the
network’s weights for the whole language. In this thesis, blocks of patterns were presented to the
networks, with the network’s weights updated for the block of patterns rather than for individual
patterns.
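A single update following Equation 4.2 might be sketched as below. The momentum constant and learning rate values, and the sign convention for the gradient, are illustrative; the adaptive increase and decrease of the learning rate over training is omitted here.

```python
def momentum_step(weights, gradient, prev_dX, m=0.9, lr=0.01):
    """One gradient descent with momentum step (Equation 4.2):
    dX = m * dX_{k-1} + (1 - m) * lr * g.

    `gradient` is taken as the direction that improves the network's
    performance, so dX is added to the weights. Values of m and lr
    are placeholders, not those used in the thesis."""
    dX = [m * d + (1 - m) * lr * g for d, g in zip(prev_dX, gradient)]
    new_weights = [w + d for w, d in zip(weights, dX)]
    return new_weights, dX
```

A second call with zero gradient shows the momentum term alone carrying the update forward.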
The parameters for learning associations with a simple neural network are:
• momentum constant,
• initial learning rate,
• increasing ratio, and
• decreasing ratio.
Variability
The source of variability for simple neural networks, resulting in different words being used for a
set of input patterns, may be evolution or training. In evolution, the weights connecting the input
and output units are mutated and different networks are selected based on a performance measure.
An example performance measure is expressivity, or producing many different words for the set of
input patterns. In training, the training set may be adjusted with a set of input patterns associated
with a set of output patterns. In this case, an external trainer sets the desired output patterns. It is
possible to link these two options, with the training set derived from an evolved network.
The parameter for variability using simple neural networks is:
• source of variability (evolving for expressivity or specifying the training set).
4.3.2 Recurrent Neural Networks
Recurrent neural networks are general purpose learners with the ability to learn temporal sequences,
making them ideal for computational models of language (see Figure 4.4). They have been used in
the language domain (Elman, 1991) and in language evolution (Batali, 1998). In the evolution of
languages, recurrent neural networks have been used to investigate compositionality and how
language can evolve to become more learnable (Tonkes et al., 2000). Given appropriate input and
output representations, compositional and generalisable languages can be formed.
Figure 4.4 Recurrent neural network
In a simple recurrent neural network, the hidden unit activations are copied to the
context units for the next time step (adapted from Figure 2, p. 184 of Elman, 1990).
In this figure, the rectangles refer to sets of units, the full lines indicate that the
units are fully connected, and the dotted line indicates that the activations are
copied one-for-one from the hidden to the context units.
Parameters that need to be considered are how many hidden units will be used and the units’
transfer function. For the recurrent neural networks in this thesis the hidden unit activations are
copied to the context units for the next time step. The Log-Sigmoid transfer function is used.
The parameter for recurrent neural network design is:
• hidden units.
Word Production
As for simple neural networks, the input unit activations for word production networks correspond
to the representations underlying the concepts and the output units correspond to the words. The
word may be the output of the network at the final time step, or the output of the network over a
sequence of time steps. Also, the word may be the raw output of activations, or may be a cleaned
version where one or more of the most activated outputs are set to 1 while the others are set to 0.
Concept Comprehension
As for simple neural networks, there are two alternatives for concept comprehension used in the
studies. The parameter for concept comprehension using recurrent neural networks is:
• networks (the same network used for production and comprehension, or a separate
network for production and comprehension).
Learning Associations
The network’s weights are updated by training the network on a set of input and output patterns.
The options for updating the weights of the neural network include various evolutionary strategies
and back propagation.
Evolution strategies are a type of evolutionary computation that were initially used for automatic
design and analysis (see Beyer & Schwefel, 2002 for an overview, including a history, the basic
algorithm, and variations). There are a number of ways to control the mutation rate in the evolution
strategy, including a mutation rate rule called the 1/5 success rule (Beyer & Schwefel, 2002). With
the 1/5 success rule, the mutation is tuned so that the success rate is 1/5, which from
experimentation has been found to be the optimal success probability. The 1/5 success rule
increases the mutation rate when success is lower than 1/5, and decreases the mutation rate when
success is higher than 1/5. A variation of the 1/5 success rule is the reverse 1/5 success rule that
decreases the mutation rate when success is lower than 1/5 and increases the mutation rate when
success is higher than 1/5. Another useful mutation rate rule is called self-adaptation (Beyer &
Schwefel, 2002). For self-adaptation, the mutation strength is mutated, with either a single mutation
operator or a vector of mutation operators referring to each weight of the neural network. For all of
the evolution strategies the weights are perturbed using the mutation rate multiplied by a random
amount determined using a normal distribution.
Back propagation is one of the most popular methods used to train neural networks (Rumelhart,
Widrow, & Lehr, 1994). Back propagation through time (BPTT) is a variation of back propagation
which is more suitable for temporal information (Tonkes, 2001). Using back propagation, the
network’s weights are updated corresponding to the contribution of each weight to the final error of
the output. For networks using BPTT, the contribution of each weight is calculated over multiple
time steps.
The parameter for learning associations using recurrent neural networks is:
• weight setting mechanism (a constant mutation rate, the 1/5 success mutation rate rule,
the reverse 1/5 success mutation rate rule, self-adaptation, or back propagation).
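A sketch of the 1/5 success rule as described above, together with the normally distributed weight perturbation. The adjustment factor, the function names, and applying the rule per batch of trials are assumptions for illustration.

```python
import random

def update_mutation_rate(rate, successes, trials, factor=1.22, reverse=False):
    # 1/5 success rule as described in the text: raise the mutation
    # rate when the success fraction is below 1/5, lower it when above;
    # the reverse rule swaps the two cases. `factor` is illustrative.
    success_rate = successes / trials
    if success_rate == 0.2:
        return rate
    increase = success_rate < 0.2
    if reverse:
        increase = not increase
    return rate * factor if increase else rate / factor

def mutate_weights(weights, rate):
    # Perturb each weight by the mutation rate times N(0, 1) noise.
    return [w + rate * random.gauss(0.0, 1.0) for w in weights]

rate = update_mutation_rate(0.1, successes=1, trials=10)  # rate is raised
```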
Variability
As for simple neural networks, a separate network may evolve the language, or a training set may
be created for the networks. The parameter for variability using recurrent neural networks is:
• source of variability (evolving for expressivity or specifying the training set).
4.3.3 Lexicon Table
A lexicon table is a symbolic representation that stores associations between concepts and words.
Word Production
The agents can either choose the word most associated with the category (the normal strategy), or
the word most likely to be understood as the category (the introspective obverter strategy; Smith,
2003). Choosing the word most likely to be understood is similar to the strategy for simple or
recurrent neural networks where a single network is used for production and comprehension.
The parameter for word production using lexicon tables is:
• strategy for choosing words (normal or introspective obverter).
Concept Comprehension
Words are comprehended as the concept that is most associated with the word.
Learning Associations
Different strategies can be used to associate categories with words. Steels (1999) assigns a score to
each concept-word pair that is increased when the pair is used successfully, and decreased when the
concept is used with another word, the word is used with another concept, or the concept-word pair
is used unsuccessfully. Smith (2003) does not differentiate between successful and unsuccessful
games. Instead, the usage of a concept-word pair is increased every time that pair is used in a
speaking or hearing game. A confidence probability is assigned to each concept-word pair: the
proportion of the word's total uses in which it has been associated with that concept.
In the score model, a game is either successful or unsuccessful. If a game is successful, the
association or score of the word-category pair used is increased by a small value (e.g. 0.1) and the
scores of other words associated with that category, and other categories associated with that word
are decreased by a small value (e.g. 0.1). If a game is unsuccessful, the score of the word-category
pair is decreased. The scores are set between a lower and upper limit (e.g. 0.0 and 1.0).
In the usage and confidence probability model, the usage of a word-category pair is increased
every time the word-category pair is used in a speaking or hearing game. The confidence
probability is the proportion of times the word has been used for that category compared to the
times it has been used for other categories.
The parameter for learning associations using lexicon tables is:
• strategy for associating (score or usage and confidence probability).
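The two association strategies can be sketched as follows. This is a minimal sketch: the dictionary layout, the example word strings, and the delta value are assumptions for illustration.

```python
def update_scores(lexicon, concept, word, success, delta=0.1, lo=0.0, hi=1.0):
    # Score model (after Steels, 1999): on success, reward the used
    # pair and punish pairs sharing its concept or word; on failure,
    # punish the used pair. `lexicon` maps (concept, word) -> score.
    key = (concept, word)
    if success:
        lexicon[key] = min(hi, lexicon.get(key, 0.0) + delta)
        for (c, w), s in list(lexicon.items()):
            if (c, w) != key and (c == concept or w == word):
                lexicon[(c, w)] = max(lo, s - delta)
    else:
        lexicon[key] = max(lo, lexicon.get(key, 0.0) - delta)

def confidence(usage, concept, word):
    # Usage model (after Smith, 2003): the fraction of the word's
    # total uses that were for this concept.
    total = sum(n for (c, w), n in usage.items() if w == word)
    return usage.get((concept, word), 0) / total if total else 0.0

lex = {("kitchen", "bofu"): 0.5, ("kitchen", "zagi"): 0.5}
update_scores(lex, "kitchen", "bofu", success=True)
conf = confidence({("kitchen", "bofu"): 3, ("hall", "bofu"): 1},
                  "kitchen", "bofu")
```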
Variability
When lexicon tables are used, words are usually invented when they are needed, with no connection
to other words that are already associated with concepts. In an interaction, the agent must decide
whether it already has an appropriate word, or whether a new word should be invented. One option
is to use a word invention rate with a threshold for the association between the concept and word,
under which a word may be invented. When the threshold is 0, a word may only be invented if there
are no associations between words and the chosen concept. A word absorption rate may also be
used by the hearer to determine whether to add words that they hear to their lexicon.
A second option is to invent words probabilistically, for example with probability, p, as follows:
p = 1 − e^{−(1 − S)T} Equation 4.3
where S is the association between the concept and word, and T is the temperature, which sets
the association value accepted by an agent. Varying the temperature alters the rate of word
invention, where a higher temperature increases the probability of inventing a new word.
The parameters for variability using lexicon tables are:
• strategy for word invention (threshold or temperature),
• word absorption rate,
• word invention rate,
• threshold, and
• temperature.
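Equation 4.3 can be sketched as follows; the function names are illustrative assumptions.

```python
import math, random

def invention_probability(association, temperature):
    # Equation 4.3: p = 1 - exp(-(1 - S) * T). A weak association or
    # a high temperature pushes p towards 1 (more invention); a
    # perfect association (S = 1) gives p = 0.
    return 1.0 - math.exp(-(1.0 - association) * temperature)

def maybe_invent(association, temperature):
    # Invent a new word with probability p.
    return random.random() < invention_probability(association, temperature)

p_weak = invention_probability(association=0.1, temperature=2.0)
p_strong = invention_probability(association=0.9, temperature=2.0)
```

As the text notes, raising the temperature raises the invention probability for any association value below 1.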
4.3.4 Distributed Lexicon Table
In Study 1 and Study 2, the associations between concept elements and words for locations are
stored in distributed lexicon tables, a method designed for these studies and inspired by the
distributed nature of inputs to neural networks combined with the lexicon table structure (see Figure
4.5). Forming concepts with a distributed lexicon table is quite different from most other
conceptualisation methods in that it is directly linked to the language formation, allowing concepts
and words to have boundaries that are never explicitly defined. In many language game studies,
concepts are formed using discrimination trees (Bodik & Takac, 2003; Smith, 2001; Steels, 1997a),
which allow the agents to form concepts with well-defined boundaries. The discrete concepts,
formed through a discrimination tree or similar categorisation method, may then be associated with
words through a lexicon table, as described in the previous section.
Figure 4.5 Distributed lexicon table
A distributed lexicon table is shown, which stores the associations between concept
elements and words. Associations are stored for each concept element – word pair.
A distributed lexicon table differs from a standard lexicon table in that concept
elements are associated with words rather than discrete concepts. The resulting
concept is distributed across the concept elements, which have defined
relationships with each other.
With a distributed lexicon table, concept formation and association with words occur
concurrently through increasing associations between concept elements and words. The concepts can be
made more or less specific with more or fewer elements used to cover the space of the underlying
representation for the concepts (for example, locations in the world, a set of distances, or a set of
directions). An association value of 0.0 or greater is stored for each concept element–word pair.
Concept elements are related to each other; for example, if they are locations, they are
related by how far apart they are in the world.
Word Production
Different strategies may be used for choosing which word should be produced for the current
concept element. With the most associated strategy for choosing words, the word chosen is the
word that has most often been associated with the current concept element. Another strategy, the
most informative strategy, was developed in which the most information is transferred about the
current concept element, based on mutual information (MacKay, 2003). Agents choose the word
that will transfer the maximum amount of information (see Figure 4.6) about the current concept
element. With the most informative strategy, a word that has not been used often, but has only been
used for a particular concept element may be chosen over a word that has been used often for many
concept elements.
Figure 4.6 Information value
The information value of word, w, at location, p, is Awp (the association between
the word, w, and the location, p), compared to the total usage of the word.
The most informative strategy is described in this section for concepts of locations in the world.
The implementation of the most informative strategy is to calculate the information value, Iwp, for
the word, w, in location, p, as follows:
I_wp = A_wp / Σ_{m=1}^{M} A_wm Equation 4.4
where Awp is the association between the word, w, and the location, p, and M is the total number
of locations. For each location the word with the highest information value is chosen.
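The most informative strategy of Equation 4.4 can be sketched as follows; the dictionary layout and the example values are assumptions for illustration.

```python
def information_value(assoc, word, location):
    # Equation 4.4: I_wp = A_wp / sum_m A_wm, the fraction of the
    # word's total usage that falls on this location.
    total = sum(assoc[word].values())
    return assoc[word].get(location, 0.0) / total if total else 0.0

def most_informative_word(assoc, location):
    # Choose the word whose usage is most concentrated at the location.
    return max(assoc, key=lambda w: information_value(assoc, w, location))

# A rarely used but location-specific word beats a common, diffuse one:
assoc = {
    "bofu": {0: 9.0, 1: 8.0, 2: 7.0},  # used often, everywhere
    "zagi": {1: 2.0},                  # used rarely, only at location 1
}
best = most_informative_word(assoc, 1)
```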
In addition to considering the current location to determine which word should be used, the
agent can consider the neighbourhood of locations. When the agents consider the neighbourhood,
the association for the word is summed over the neighbourhood of locations rather than over a
single location (see Figure 4.7).
Figure 4.7 Neighbourhood information value
The neighbourhood information value of a word, w, at location, p, is Awp(neighbourhood)
(the association between the word, w, and all locations in the neighbourhood of
location, p), compared to the total usage of the word.
The neighbourhood association, Awp(neighbourhood) for the word, w, in location, p, is calculated as
follows:
A_wp(neighbourhood) = Σ_{n=1}^{N} A_wn Equation 4.5
where Awn, the association between the word, w, and the location, n, is summed over all N
locations in the neighbourhood of location, p. The neighbourhood most informative strategy
calculates the neighbourhood information value, Iwp, for the word, w, in location, p, as follows:
I_wp(neighbourhood) = A_wp(neighbourhood) / Σ_{m=1}^{M} A_wm Equation 4.6
where Awp(neighbourhood) is the neighbourhood association calculated in Equation 4.5, Awm is the
association between the word, w, and the location, m, and M is the total number of locations. When
the neighbourhood is used, a stable language is reached more quickly than when only the current
concept is used.
The relative neighbourhood information value normalises the association strength of a location
in the neighbourhood by the distance from the location of interest (see Figure 4.8).
Figure 4.8 Relative neighbourhood information value
The relative neighbourhood information value of a word, w, at location, p, is
Awp(relativeNeighbourhood) (the relative association of the word within a neighbourhood,
D) compared to the total usage of the word.
The relative neighbourhood association, Awp(relativeNeighbourhood), for the word, w, in location, p, is
calculated as follows:
A_wp(relativeNeighbourhood) = Σ_{n=1}^{N} A_wn · (D − d_np) / D Equation 4.7
where Awn, the association between the word, w, and the location, n, is normalised by the
distance dnp from location, p, and summed over all N locations within a neighbourhood, D, of
location, p. The relative neighbourhood information value, Iwp(relativeNeighbourhood), for the word, w, in
location, p, is the relative association of the word within a neighbourhood, D, compared to the total
usage of the word, calculated as follows:
I_wp(relativeNeighbourhood) = A_wp(relativeNeighbourhood) / Σ_{m=1}^{M} A_wm Equation 4.8
where Awp(relativeNeighbourhood) is the relative neighbourhood association calculated in Equation 4.7,
Awm is the association between the word, w, and an experience, m, and M is the total number of
experiences in the robot’s experience map.
The parameters for word production using distributed lexicon tables are:
• strategy for choosing words (most associated or most informative strategy with a single
concept element, neighbourhood, or relative neighbourhood) and
• neighbourhood size.
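The neighbourhood variants (Equations 4.5 to 4.8) can be sketched as follows. This is a minimal sketch: the data layouts are assumptions, and the neighbourhood here is taken to include the current location.

```python
def neighbourhood_info(assoc, word, neighbours):
    # Equations 4.5 and 4.6: sum the word's association over the
    # neighbourhood, then divide by the word's total usage.
    total = sum(assoc[word].values())
    a_nbhd = sum(assoc[word].get(n, 0.0) for n in neighbours)
    return a_nbhd / total if total else 0.0

def relative_neighbourhood_info(assoc, word, distances, D):
    # Equations 4.7 and 4.8: each neighbour's association is weighted
    # by (D - d_np) / D, so closer neighbours count for more.
    # `distances` maps each neighbour to its distance from location p.
    total = sum(assoc[word].values())
    a_rel = sum(assoc[word].get(n, 0.0) * (D - d) / D
                for n, d in distances.items())
    return a_rel / total if total else 0.0

assoc = {"bofu": {0: 1.0, 1: 1.0, 2: 2.0}}
i_nbhd = neighbourhood_info(assoc, "bofu", [0, 1, 2])
i_rel = relative_neighbourhood_info(assoc, "bofu",
                                    {0: 1.0, 1: 0.0, 2: 1.0}, D=2.0)
```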
Concept Comprehension
For distributed lexicon tables, words are associated with multiple concept elements. A template can
be created based on these associations, resulting in a representation of the concept that is associated
with each word. Alternatively, the single concept element that is most associated with the word can
be chosen as representative of the word.
For concepts of locations in the world, toponyms are associated with multiple locations. A
template can be created based on the associations between a toponym and a set of locations in the
world. In some cases, a toponym should be interpreted as a single location. The location is the one
that is most representative of the toponym, calculated by determining the information value
provided by that toponym at each location in the world.
Learning Associations
The features to consider in updating the distributed lexicon include when the lexicon will be
updated and how associations between concepts and words will be strengthened and weakened. The
lexicon may be updated whenever a game is played, whether the agent is the speaker or the hearer.
Another option is for only the hearer to update their lexicon, as the speaker already uses the current
word in the current context.
The association between a concept element and a word is strengthened when they are used
together. If forgetting is used, the associations between the current word and other locations and the
current location and other words are also updated. The current association is strengthened while all
other associations with the current location and word are weakened. Forgetting may increase the
rate at which words are lost in the language, unless countered by another feature.
The parameters for learning associations using distributed lexicon tables include:
• forgetting and
• updating (only hearer or both agents update their lexicon).
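The update step with forgetting can be sketched as follows; the delta and forgetting values, and the dictionary layout, are illustrative assumptions.

```python
def update_associations(lex, element, word, delta=0.1, forget=0.01):
    # Strengthen the concept element-word pair just used; with
    # forgetting, weaken the competing associations that share the
    # same element or the same word. Associations stay >= 0.
    key = (element, word)
    lex[key] = lex.get(key, 0.0) + delta
    if forget:
        for (e, w), a in list(lex.items()):
            if (e, w) != key and (e == element or w == word):
                lex[(e, w)] = max(0.0, a - forget)

lex = {("loc3", "bofu"): 0.5, ("loc3", "zagi"): 0.2, ("loc7", "bofu"): 0.3}
update_associations(lex, "loc3", "bofu")
```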
Variability
As for lexicon tables, words are invented as they are needed.
The parameters for variability using distributed lexicon tables are:
• strategy for word invention (threshold or temperature),
• word absorption rate,
• word invention rate,
• threshold, and
• temperature.
4.4 Population Dynamics
The nature of the population dynamics may affect the languages that result from the interactions. A
basic framework for population dynamics is iterated learning, where agents learn language based on
the utterances of other agents in the population, as described in section 2.4. The Iterated Learning
Model provides a framework for investigating the vertical transmission of language through
generations of agents and the horizontal transmission of language among peers. Both vertical and
horizontal language transmission are important to language evolution with a different emphasis on
the interactions between agents. Standard implementations for population dynamics include the
Iterated Learning Model with two agents per generation (Kirby & Hurford, 2002) and negotiation
between agents in a single generation (Batali, 1998).
For languages to spread throughout the world, many games need to be played. For larger world
sizes and more agents, more games are needed to form a coherent language. The number of
generations may also affect the resulting languages, with the number of games per generation
affecting how well new agents learn the existing language.
At the end of each generation, which may occur after a set number of games, older agents are
removed from the population and new agents enter the population. New agents may have an initial
learning period in which they only take part in interactions as the hearer. The length of the learning
period influences how well the new agent learns the language before using the language.
The parameters for population dynamics are:
• generations,
• agents,
• interactions per generation, and
• initial learning period.
4.5 Environment
The environment used for the agents to play location language games affects the representations
underlying the concepts used and the exploration required by the agents to obtain these
representations. Three environments were used. The simplest environment, the grid world, was used
to investigate the design of the language games and agents. The next environment, the simulation
world, was used to investigate the language games and agents in simulated robots. The final
environment, the robots in the real world, was used to test the language games implementation with
real world inaccuracies.
The parameter for environment is:
• world (grid, simulation, or real).
4.5.1 Grid World
The first environment is a grid world (see Figure 4.9). The grid size may be altered and obstacles
may be placed in the world to represent walls and other features of the environment. The grid world
agents may occupy any square in the world that does not have an obstacle in it. The grid world used
is based on the worlds used in Steels’ (1995) and Bodik and Takac’s (2003) studies.
Figure 4.9 Grid world
The grid world is a grid of squares that may have obstacles. The world shown here
is a 10 × 10 grid with no obstacles. Agents may occupy any square in the world
that is not an obstacle, including squares occupied by other agents.
The parameters for the grid world are:
• size and
• obstacles.
4.5.2 Simulation World
A simulation world was built to mirror the real world, with images from the real world used in
constructing the views of the robot (Moylan, 2003, with additional work by Mark Wakabayashi).
The simulation world includes a room with several desks (see Figure 4.10).
Figure 4.10 Simulation world with path of robot
The simulation world of the robot, with the black lines indicating walls and the
black octagons desks, showing the path of the robot in a typical simulation run. The
robot’s behaviour is set to wall-following, where the robot chooses to follow either
the left or right wall. The room is the one on the right side of Figure 3.3b.
For the simulated robots to play language games with each other, they must first have
representations of the world with local views, pose cell representations, and an experience map. To
gain the local views, pose cell representations and experiences, the robots must explore the world.
Exploration is currently performed by left or right wall following, and is carried out independently
of other robots in the world. For the studies in the simulation world, the robots used a single
forward facing camera. The simulation world enables simulated robots to pass messages only to
other robots within a set distance of their current locations, allowing the hearing distance to be
explicitly set.
4.5.3 Real World
The robots used in the real world are Pioneer 3 DXs. The real world environment is the fifth floor of
the Axon building at The University of Queensland with halls and open plan offices. Experiments
were confined to the room shown in Figure 4.11. The robots used in the final experiments obtained
visual input from an omni-directional camera. An omni-directional camera means that a location
looks the same regardless of the direction that the robot is facing, while for a forward facing camera
opposite directions look completely different. The result is cleaner maps created using omni-directional
cameras, as agents are able to recognise locations more readily, particularly in longer
loops of the environment. As in the simulation world, the robots must first explore the world to
build up their representations.
Figure 4.11 Map of the real world
The robot's world of an open plan office. The real world experiments were
confined to the room to the bottom left of Figure 3.3a. A more detailed layout of
the obstacles in the room and the approximate path of the robots are shown.
Practical issues in implementing the location language games on real robots include the speakers,
whether error detection will be used, and the robots’ batteries. A tone
generator has been developed for the robots to use. The ‘Hello’ and ‘Hear’ signals, as well as the
syllables of the words, are converted to DTMF tones produced by the robots. Language games take
place when the robots are literally within hearing distance of each other. The volume of the
speakers influences the actual hearing distance of the robots. Obstacles in the room also have an
effect on the actual hearing distance, with robots more likely to hear each other when they are
within line of sight.
Error detection may be implemented for communication between the robots. The most minimal
form of error detection is to check that the utterances match the expected format given by the
grammar of the language and the structure of the language games (see Figure 4.12). The additional
error detection of a checksum can be included for the produced words. The syllables of the
utterance are added together and form the checksum which can be checked by the hearer. The
words are only accepted if the checksum is correct.
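A sketch of the checksum check described above. The syllable encoding as small integers and the modulus are assumptions; the thesis only specifies that the syllables are summed to form the checksum.

```python
def checksum(syllables):
    # Sum the syllables (encoded here as small integers, e.g. DTMF
    # digit values) to form the checksum.
    return sum(syllables) % 16

def encode_word(syllables):
    # Speaker side: append the checksum to the outgoing word.
    return list(syllables) + [checksum(syllables)]

def accept_word(message):
    # Hearer side: accept the word only if the checksum matches.
    *syllables, check = message
    return check == checksum(syllables)

msg = encode_word([3, 1, 4])
ok = accept_word(msg)               # intact message is accepted
bad = accept_word([3, 2, 4, 8])     # corrupted syllable is rejected
```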
The robots’ batteries last for about two hours before the robots need to recharge. The robots’
state at the end of each session can be saved, including the pose cell representation, the
experience map, and the lexicon, and reloaded for the next two-hour session. Battery life therefore
does not limit the real world experiments.
Figure 4.12 Language game utterances
A condensed version of the location language game structure (Figure 4.1), with the
sequence of utterances showing the basic grammar of the language game: The
speaker says ‘Hello?’, the hearer responds with ‘Hear’, the speaker sends the word
chosen, and the hearer responds with ‘Ok’. Messages that do not follow this format
are ignored.
The parameter for the real world is:
• error detection (minimal or checksum).
4.6 Performance Measures
Performance measures may be recorded by each agent following interactions. In deciding which
measures are necessary to monitor the agents’ languages, the elements that form a good language
for locations need to be considered. In general, a language should be easy to learn, with the agents
in a population forming consistent conceptualisations and labels. A language should also be
expressive: all agents will trivially agree on every word if only one word is used, but a single-word
language is not useful. Quantitative measures are needed to determine how good a language is and
to aid in the design of a language game.
The measures used throughout the studies include:
• coherence,
• specificity,
• language size,
• word coverage,
• language layout,
• word locations,
• most information templates, and
• toponym value.
4.6.1 Coherence
As defined by de Jong, “the coherence indicates to what extent agents use the same signal for a
certain concept” (de Jong, 1998, p. 31). In this thesis, coherence has been altered from de Jong’s
definition so that coherence ranges from 0 to 1 rather than from 1/n to 1 where n is the number of
agents. For each concept, c, the coherence, C, is calculated as follows:
C_c = (max_w(N_wc) − 1) / (n − 1) Equation 4.9
where N_wc is the number of times a word, w, has been used for that concept, c, and n is the number
of agents. The population coherence is the average of the concept values. When all agents in the
population agree on the same word for every concept, the coherence is 1. When all agents in the
population disagree on the word for every concept, the coherence is 0.
The coherence may be calculated when the agents have identical concept representations, as in
the grid world. In the simulation and real worlds, robots create their own concept representations,
which do not directly match. For small environments, it is possible to translate the robots’ concept
representations to obtain an imperfect match for the concepts, allowing an approximate coherence
value to be calculated for the agents in each world.
Coherence can provide an indication of how well the agents agree with each other on words for
each possible concept.
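Coherence for a single concept (Equation 4.9) can be sketched as follows; the dictionary of per-word counts is an assumed layout.

```python
def coherence(word_counts, n_agents):
    # Equation 4.9 for one concept: C = (max_w N_wc - 1) / (n - 1),
    # where N_wc is the number of agents using word w for the concept.
    # Full agreement gives 1.0; full disagreement gives 0.0.
    return (max(word_counts.values()) - 1) / (n_agents - 1)

# Four agents: three use "bofu", one uses "zagi" for the same concept.
c = coherence({"bofu": 3, "zagi": 1}, n_agents=4)
```

The population coherence is then the average of this value over all concepts.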
4.6.2 Specificity
Specificity is a measure of how many concepts each word is used for, which indicates the
descriptiveness of each word. For the case where each concept is distinguished by a unique symbol,
the specificity is 1. For the case when every concept is associated with a single word, the specificity
is 0.
The specificity of a language is calculated from a graph with concepts as nodes and edges
indicating that the same word is used for the linked concepts. Specificity, σ, is calculated from the
proportion of edges that are present in the graph (de Jong, 1998, p. 32):
σ = 1 − 2v / (n² − n) Equation 4.10
where v is the number of edges and n is the number of nodes in the graph. Alternatively,
specificity can be described as follows (de Jong, 1998, p. 32):
σ = 1 − (Σ_k f_k² − n) / (n² − n) Equation 4.11
where f_k is the frequency of word k, that is, the number of concepts for which that word is used.
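Equation 4.10 can be sketched directly from a concept-to-word mapping. This is a minimal sketch; the mapping layout is an assumption.

```python
def specificity(word_of):
    # Equation 4.10: sigma = 1 - 2v / (n^2 - n), where v counts pairs
    # of concepts linked by sharing a word and n is the number of
    # concepts. `word_of` maps each concept to its word.
    concepts = list(word_of)
    n = len(concepts)
    v = sum(1 for i in range(n) for j in range(i + 1, n)
            if word_of[concepts[i]] == word_of[concepts[j]])
    return 1.0 - 2.0 * v / (n * n - n)

s_unique = specificity({1: "a", 2: "b", 3: "c"})  # every concept distinct
s_single = specificity({1: "a", 2: "a", 3: "a"})  # one word for everything
```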
4.6.3 Language Size
With the language size, two values can be considered. The first value is the average size of the
agent’s lexicon, including all words currently in the agent’s lexicon. The words in the agent’s
lexicon include words that have either been heard or invented by the agent. The second value is the
average number of words used by the agents for the set of concepts. The number of words used is
less than or equal to the total number of words in the lexicon. When generations of agents are used
for the population, the language can be considered stable when the number of words used by the
agents equals the number of words in the agents’ lexicons.
4.6.4 Word Coverage
The word coverage of the language considers how words are spread through the underlying concept
representation and how many concepts each word is used for. In an expressive language each word
is used for a similar number of concepts. In an impoverished language a small number of words
may be used for most of the concepts while others are used for very few concepts. In impoverished
languages, words tend to disappear from the lexicon, resulting in languages with very few words. It
is generally desirable for languages to use words for similar numbers of concepts.
4.6.5 Language Layout
The language layout displays how the language covers the space. For a location language, each
toponym is given a different colour, and the areas in the world in which each toponym is used are
shown in the colour of the toponym. A comparison of the language layout between agents can show
if the language is shared between the agents, as an alternate measure of coherence.
4.6.6 Word Locations
Word locations are an extension of the language layout in which the best location for producing
each word is shown. The location for each word is the location at which the word is the most
informative, and the difference in information value between the most and the next most
informative word is greatest.
4.6.7 Most Information Templates
Most information templates are constructed similarly to the language layout, with each word shown
individually. They consider not only the locations for which the word is the most informative, but
also those for which it is more informative than other words. Most information templates show
where the word is in the top five most informative words, indicating the general area in which that
word will be understood.
Chapter 4 A Location Language Game
54
4.6.8 Toponym Value
The toponym value is the information value of the word–location combination for the word used at
the interaction location. The value of the toponym at the interaction location is an indication of how
appropriate that toponym is for the current location.
4.7 Summary
This chapter has presented the methodology used in the RatChat project, specifically for the studies
described in the following three chapters. The location language game structure was described, with
the concept representations, word representations, lexicons, population dynamics, environment, and
performance measures. The parameters for each feature of a location language game are presented
in Table 4.1.
Table 4.1 Parameters for a location language game
Feature                      Parameters
Location language game       Game; Hearing distance
Concept representations      Concept type; Concept representation
Word representations         Word representation
Lexicon                      Technique
Simple neural networks       Networks; Initial learning rate; Decreasing ratio; Momentum constant; Increasing ratio; Source of variability
Recurrent neural networks    Hidden units; Weight setting mechanism; Networks; Source of variability
Lexicon tables               Strategy for choosing words; Strategy for word invention; Word invention rate; Temperature; Strategy for associating; Word absorption rate; Threshold
Distributed lexicon tables   Strategy for choosing words; Forgetting; Strategy for word invention; Word invention rate; Temperature; Neighbourhood size; Updating; Word absorption rate; Threshold
Population dynamics          Generations; Interactions per generation; Agents; Initial learning period
Environment                  World; Grid world: Size, Obstacles; Real world: Error detection
The following three chapters describe the major studies of this thesis: Pilot Study: Methods and
Representations, Study 1: A Toponymic Language Game, and Study 2: A Generative Spatial
Language Game.
Chapter 5 Experimental Design
Someone had drawn a tree. … It was simple because something complex
had been rolled up small; as if someone had drawn trees, and started with
the normal green cloud on a stick, and refined it, and refined it some more,
and looked for those little twists in a line that said tree and refined those
until there was just one line that said TREE.
(Pratchett, 1998, p.345)
Key questions regarding the meaningful use of spatial languages are whether they can be formed,
what such languages are like, and how they can be learned. This chapter investigates concept
representations and methods for associating concepts with words.
In the RatChat project, the meaning representations are obtained from mobile robots exploring
their world, with possibilities including what they see, and the equivalent of a cognitive map of the
world, built from exploration. Names for places in the world are the most obvious concepts that
may be obtained from a map of the world. The series of studies described in this chapter
investigated methods for associating concepts and words, and a variety of representations that could
be used to form the spatial concepts used in a location language game.
The overall goal of the pilot studies was to determine the features necessary for agents to form a
spatial language. The specific features included the structure of the language game, the concept
representations, the word representations, the lexicon, the population dynamics, and the
environment (as outlined in the previous chapter). Two pilot studies investigated the features:
• Pilot Study 1: Methods – Recurrent Neural Networks and Lexicon Tables
• Pilot Study 2: Representations – Pose Cells, Vision, and Experiences
Following the description of the pilot studies is a discussion of the implications for this thesis.¹
5.1 Pilot Study 1: Methods – Recurrent Neural Networks and Lexicon Tables
In a location language game (described in Chapter 4: A Location Language Game), the lexicon
associates concepts with words and is used by the speaker to produce a word for the chosen topic
¹ This chapter describes pilot studies that set up the later work and are included for completeness. The reader
interested in studies with more significant outcomes should see Chapters 6 and 7.
and by the hearer to comprehend the word used by the speaker. The first pilot study investigated
two techniques for the lexicon to associate concepts with words:
• recurrent neural networks and
• lexicon tables.
The aim of Pilot Study 1 was to determine whether recurrent neural networks and lexicon tables
were appropriate for use in a location language game and to determine how these techniques could
be used. Following the description of the studies is a general discussion of the techniques for
lexicons.
5.1.1 Pilot Study 1A: Recurrent Neural Networks
For a recurrent neural network lexicon, the implementation choices to be made were the number of
networks, the weight setting mechanisms, and the source of variability (for more detail refer to
section 4.3.2). The other choices relevant to the lexicon were the concept representations and the
word representations. In Pilot Study 1A: Recurrent Neural Networks, the source of variability was
an evolved network. Following the evolution of a language, two learning networks were trained on
the language (one for production, the other for comprehension). The concept representations were
arrays of units. Due to the nature of the study, the actual concept types were not relevant, though
this concept representation could be interpreted as either specific locations or particular directions
or distances. The word representations were sets of activations comprising three syllables with a
number of units active for each syllable. The words were set to be multi-syllabic to allow for the
possibility of compositional languages. This pilot study comprised two sub-studies that investigated
the remaining choices, with various weight setting mechanisms, concept representations, and word
representations. The study was divided into:
• weight setting mechanisms and
• concept and word representation.
The aim of Pilot Study 1A: Recurrent Neural Networks was to investigate how an expressive
language can be evolved and learned using recurrent neural networks, and to investigate any
limitations on the languages produced.
Pilot Study 1Ai: Weight Setting Mechanisms
The selection strategies and weight setting mechanisms that work best for evolving expressivity are
very different from those that work best for learning in recurrent neural networks. In this study, the task of the
evolving networks was to find expressive languages, while the task of the learning networks was to
learn those expressive languages. The aim of the Weight Setting Mechanisms study was to
determine the best weight setting mechanism for each network task with respect to the ability for
evolved networks to find expressive languages and the ability of learning networks to learn
expressive languages.
The features to be defined for Pilot Study 1Ai included the concept representations, the word
representations, the number of hidden units, the connections to the context units, and the weight
setting mechanisms (see parameters in Table 5.1). The concept representations were an array of 25
units with one-hot encoding (each pattern had a single unit active, representing a single category).
The word representations were three syllables with two out of ten units active for each syllable (see
example word in Figure 5.2). The number of hidden units was set to 15. For the word production
networks, the output units as well as the hidden units were copied to the context layer for the next
time step. The eight conditions investigated were different weight setting mechanisms: Back
Propagation Through Time (BPTT) and seven evolution strategy mutation schemes: consistent
high (0.1), medium (0.01), and low (0.001) mutation rates, the 1/5 success rule, the reverse 1/5
success rule, mutating a single operator, and mutating a vector of operators (for more detail refer to
section 4.3.2). All conditions except for BPTT were tested for the evolved network providing the
source of variability (see Figure 5.1a). BPTT was not used for the evolved network as this weight
setting mechanism is not suitable for evolution. All conditions were tested for the two learning
networks providing word production and concept comprehension (see Figure 5.1b,c).
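As an illustrative sketch, the 1/5 success rule (and its reverse) adapts the mutation step size based on the recent success rate; the adaptation factor of 1.5 used here is an assumption, not the thesis's value:

```python
def adapt_sigma(sigma, successes, trials, factor=1.5, reverse=False):
    """1/5 success rule: grow the mutation step size when more than one
    fifth of recent mutations succeeded, and shrink it otherwise. The
    reverse 1/5 success rule inverts that decision."""
    grow = (successes / trials) > 0.2
    if reverse:
        grow = not grow
    return sigma * factor if grow else sigma / factor
```

Under the standard rule a high success rate widens the search, while under the reverse rule it narrows it.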
To investigate the source of variability for recurrent neural networks, networks were evolved for
5000 generations with 50 runs for the seven evolution weight setting conditions. A simple form of
evolutionary algorithm, the (1+1)-evolution strategy (Beyer & Schwefel, 2002), was used to evolve
the networks which were selected based on the expressivity of the languages produced. A measure
used for the evolved networks was expressivity or the average number of different words in the
language. Expressive languages were defined as those with the maximum possible number of words
(i.e. 25, corresponding to one word for each concept).
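The (1+1)-evolution strategy can be sketched as follows; the toy fitness function stands in for the expressivity measure used in the thesis, and all names and values here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(weights):
    # Toy stand-in for expressivity (in the thesis, the number of distinct
    # words the network produces); small weights are rewarded here simply
    # so the sketch runs end to end.
    return -float(np.sum(weights ** 2))

def one_plus_one_es(n_weights=10, sigma=0.1, generations=200):
    """(1+1)-evolution strategy (Beyer & Schwefel, 2002): one parent
    produces one mutated child per generation, and the child replaces
    the parent only if it is at least as fit."""
    parent = rng.normal(size=n_weights)
    f_parent = fitness(parent)
    for _ in range(generations):
        child = parent + sigma * rng.normal(size=n_weights)
        f_child = fitness(child)
        if f_child >= f_parent:
            parent, f_parent = child, f_child
    return parent, f_parent
```

Because a child is accepted only when at least as fit as its parent, fitness is non-decreasing over generations.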
Table 5.1 Parameters for Pilot Study 1Ai
Parameters                 Pilot Study 1Ai
Concept representation     25 units, one-hot encoding
Word representation        Activation of units: 3 syllables of 10 output units with 2 active for each syllable
Lexicon technique          Recurrent neural networks
Hidden units               15
Networks                   Production and comprehension
Weight setting mechanism   High, medium, low, 1/5 success, reverse 1/5 success, mutate single, mutate vector, BPTT
Source of variability      Evolved network
Figure 5.1 Word production and concept comprehension networks (Pilot 1Ai)
There were three types of network used in Pilot Study 1Ai: a) Word Production
(Evolving), b) Word Production (Learning), and c) Concept Comprehension
(Learning). The network structure for Word Production (Evolving) and (Learning)
are identical. The word production networks take the concept representation as
input, presented to the network at the first time step. The word produced was the
output of the network over the following three time steps, with the two most active
units set to 1. The concept comprehension network takes the word representation as
the input, presented over three time steps. The output at the final time step was the
concept comprehended.
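A minimal sketch of the word production forward pass described above (the weight shapes, the tanh/sigmoid choices, and the simplified context handling, which copies only the hidden units, are assumptions):

```python
import numpy as np

def produce_word(concept, W_in, W_ctx, W_out, k=2, n_syllables=3):
    """Present the concept at the first time step only; read one syllable
    per time step over the next three steps, setting the k most active
    output units of each syllable to 1."""
    hidden = np.zeros(W_ctx.shape[0])
    x = concept
    syllables = []
    for _ in range(n_syllables):
        hidden = np.tanh(W_in @ x + W_ctx @ hidden)    # recurrent step
        out = 1.0 / (1.0 + np.exp(-(W_out @ hidden)))  # output activations
        syl = np.zeros_like(out)
        syl[np.argsort(out)[-k:]] = 1.0                # k most active -> 1
        syllables.append(syl)
        x = np.zeros_like(concept)                     # input only at t = 1
    return np.array(syllables)
```

With a 25-unit concept, 15 hidden units, and 10 output units, the result is a 3-by-10 binary array with exactly two active units per syllable.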
Figure 5.2 Word representation (Pilot 1Ai)
Words consisted of a sequence of three syllables. Each syllable was represented by
a ten unit binary vector in which the two most active units were set to one.
Expressive languages resulted from all runs with the high consistent value of mutation and the
reverse 1/5 success rule (see Table 5.2). The time to high expressivity was much shorter for the high
mutation rate, with a minimum of 31 generations compared to 437 for the reverse 1/5 success rule.
A moderate proportion of runs resulted in expressive languages for the medium consistent value
(54%), the 1/5 success rule (84%), and mutating the operator (48% and 66%), while none of the
runs with low consistent value resulted in expressive languages.
Table 5.2 Source of variability (Pilot 1Ai)
Weight setting mechanisms: High, Medium, and Low apply a consistent mutation rate; the remaining mechanisms apply a variable mutation rate.

Measure                           High   Medium  Low   1/5 success  Reverse 1/5 success  Mutate single operator  Mutate vector of operators
% Runs with Expressive Languages  100%   54%     0%    84%          100%                 48%                     66%
Minimum Generations               31     233     N/A   398          437                  348                     265
Average Words                     25.0   24.1    15.6  24.6         25.0                 23.6                    24.1
Following the evolution of expressive languages, word production networks were trained to
produce the language and concept comprehension networks were trained to comprehend the
language. In each run, networks were trained for 5000 generations or epochs. One expressive
language was randomly chosen for each of the six conditions that produced expressive languages.
There were ten runs for each of these six languages. A measure used for word production was how
different the language produced was from the target language, or how many syllables were left for
the network to learn. A measure used for concept comprehension was the number of words out of
25 that were understood correctly.
For word production and concept comprehension, the 1/5 success rule, a low mutation rate, and
BPTT performed well (see Table 5.3 and Table 5.4). For word production, the 1/5 success rule
performed well (28.1 syllables left to learn), as did the low mutation rate (36.0 syllables left to
learn). All of the other evolutionary strategies had more than 70 syllables left to learn. The networks
using BPTT had only 15.1 syllables left to learn at the end of the training. For concept
comprehension, the low mutation rate and the 1/5 success rule performed well (17.8 and 15.7 words
understood). All concept comprehension networks using BPTT learnt to comprehend the language
correctly.
Table 5.3 Word production (Pilot 1Ai)
Weight setting mechanisms: High, Medium, and Low apply a consistent mutation rate; the 1/5 success through mutate vector mechanisms apply a variable mutation rate; BPTT is a learning mechanism.

Measure                          High   Medium  Low   1/5 success  Reverse 1/5 success  Mutate single operator  Mutate vector of operators  BPTT
% Runs with Correct Production   0%     0%      0%    0%           0%                   0%                      0%                          6.6%
Min. Generations or Epochs       N/A    N/A     N/A   N/A          N/A                  N/A                     N/A                         2707
Average Syllables Left to Learn  108.2  71.9    36.0  28.1         111.8                72.9                    89.6                        15.1
Table 5.4 Concept comprehension (Pilot 1Ai)
Weight setting mechanisms: High, Medium, and Low apply a consistent mutation rate; the 1/5 success through mutate vector mechanisms apply a variable mutation rate; BPTT is a learning mechanism.

Measure                            High  Medium  Low   1/5 success  Reverse 1/5 success  Mutate single operator  Mutate vector of operators  BPTT
% Runs with Correct Comprehension  0%    0%      0%    0%           0%                   0%                      0%                          100%
Min. Generations or Epochs         N/A   N/A     N/A   N/A          N/A                  N/A                     N/A                         357
Average Words Correct              4.7   10.2    17.8  15.7         5.5                  10.1                    7.6                         25
For evolving expressive languages, the most effective weight changing mechanisms were a high
mutation rate and the reverse 1/5 success rule. The only useful weight setting mechanism for word
production and concept comprehension was BPTT.
Pilot Study 1Aii: Concept and Word Representation
The form of the concept and word representations for the recurrent neural networks can affect the
agents’ ability to evolve and learn expressive languages. The aim of Pilot Study 1Aii was to
investigate possible forms of the concept and word representations and to determine their
effect on the agents’ ability to evolve and learn expressive languages.
Three different concept representations were investigated: a one-hot representation where a
single unit was active and two non-orthogonal representations which included a spread of activation
around the most active unit (see Figure 5.3). These representations were chosen to explore the
differences between orthogonal and non-orthogonal patterns, and to determine whether the size of
spread affected the evolved languages. Three word representations were compared in the
simulations. Each consisted of three syllables implemented as binary vectors of ten units with one,
two or five units active (see Figure 5.4). The network structure for each type of network (Word
Production (Evolving), Word Production (Learning), and Concept Comprehension (Learning)) was
the same as for Pilot Study 1Ai (see Figure 5.1). The other parameters were as for Pilot Study 1Ai
(see Table 5.5).
Figure 5.3 Concept representations (Pilot 1Aii)
Three different concept representations: The top line shows a one-hot encoding
with one out of 25 input units active. The second and third lines show non-
orthogonal representations with five and nine active units.
Figure 5.4 Word representation (Pilot 1Aii)
Words consist of a sequence of three syllables. Each syllable is represented by a
ten unit binary vector in which one, two or five most active units are set to one.
The top line shows the raw activations of the units. The second, third, and fourth
lines show syllables with the one, two, or five most active units set to 1.0.
Table 5.5 Parameters for Pilot Study 1Aii
Parameters                 Pilot Study 1Aii
Concept representation     25 units with one-hot encoding, 5-spread activation, or 9-spread activation
Word representation        Activation of units: 3 syllables of 10 output units with 1, 2, or 5 active for each syllable
Lexicon technique          Recurrent neural networks
Hidden units               15
Networks                   Production and comprehension
Weight setting mechanism   High for evolving, BPTT for learning
Source of variability      Evolved network
To investigate the different concept and word representations with the source of variability for
recurrent neural networks, networks were evolved until expressive languages were formed (one
word for each of the 25 concept patterns), or for 10,000 generations. Ten languages were evolved
for each concept and word representation combination. Languages with a single unit active for each
syllable of the word representation took a long time to evolve, with only five expressive languages
evolved within 10,000 generations. The single-unit output representation was therefore excluded
from further analysis.
The networks with their activation spread across the word representation evolved expressive
languages in less than 10,000 generations (see Table 5.6). For each concept representation
condition, the word representation with five units active took fewer generations to find an
expressive language than the word representation with two units active.
Table 5.6 Generations to expressive languages (Pilot 1Aii)
Concept representation   10 units with 2 active   10 units with 5 active
9-spread activation      5536.0                   870.6
5-spread activation      6991.4                   1702.0
One-hot encoding         1460.6                   716.5
To investigate the different concept and word representations with word production and concept
comprehension, networks were trained for 1000 epochs using the back propagation through time
algorithm (Rumelhart et al., 1994). There were five runs for each of the ten languages evolved for
each condition. Languages using a concept representation with a spread of activation were easier
for the concept comprehension networks to learn than languages using the orthogonal one-hot
representation (see Figure 5.5). In most of the languages produced, the words were very similar to
one another, with many elements shared across the words of the language.
Figure 5.5 Training networks on evolved languages (Pilot 1Aii)
Training networks on the languages evolved for a word representation of ten units
with a) two active and b) five active. The word production networks learned the
languages similarly, while the concept comprehension networks with a spread of
activation for the concept representation learned the languages more quickly than
with one hot encoding.
The language features affected by the concept and word representations were the speed of
evolution of expressive languages and how learnable the languages were for concept
comprehension. A spread of activation in the concept representation resulted in faster
comprehension learning. A spread of activation in the word representation enabled expressive
languages to be found more easily. For agents using a recurrent neural network lexicon, the
structure of the concept and word representations affected how quickly languages were evolved and
learned.
5.1.2 Pilot Study 1B: Lexicon Tables
For a lexicon table, strategies needed to be set for choosing words, associating
concepts, and inventing words (for more detail refer to section 4.3.3). Values needed to be set for
the relevant parameters of word absorption, word invention, threshold, and temperature. The other
choices relevant to the lexicon were the concept and word representations.
In Pilot Study 1B, the threshold strategy was used for word invention with a threshold of 0.0
(agents invented a word when a concept had no associated word). The concept representation was a
set of categories. The word representation was text, with an arbitrary string of syllables invented for
each new word. The agents using lexicon tables formed the language through negotiation games.
The task was a guessing game in which a category was chosen by the speaker, a word was found for
the category, and the hearer used the word to find the category. Measures included average success
and lexical coherence. In a successful game, the concept chosen by the speaker was understood by
the hearer. For average success, the success was averaged over every 25 games. Lexical coherence
represents the proportion of agents that use the same words for the same categories. The study was
divided into:
• strategies and
• word creation and absorption.
The aim of Pilot Study 1B: Lexicon Tables was to investigate how a language could be formed
and learned using lexicon tables, and to investigate any limitations of the languages produced, such
as how long agents take to form successful languages, and the coherence of the resulting languages.
Specifically, the strategies, word creation rates, and word absorption rates were compared.
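The lexical coherence measure used in these studies might be computed as sketched below (the dict-based lexicons and the use of the modal word per category are assumptions about the measure's exact form):

```python
from collections import Counter

def lexical_coherence(lexicons):
    """Average, over categories, of the proportion of agents whose word
    for a category matches the population's most common word for it.
    `lexicons` is a list of {category: word} dicts, one per agent."""
    categories = lexicons[0].keys()
    total = 0.0
    for cat in categories:
        words = [lex[cat] for lex in lexicons]
        modal = Counter(words).most_common(1)[0][1]  # count of modal word
        total += modal / len(lexicons)
    return total / len(categories)
```

For example, if two of three agents share a word for the only category, coherence is 2/3; if every agent agrees on every category, coherence is 1.0.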
Pilot Study 1Bi: Strategies
Pilot Study 1Bi tested the different types of strategies that can be used to construct the lexicon table.
Strategies were compared on how many interactions agents needed to reach high levels of
success, and the lexical coherence of agents at high levels of success. The study aimed to find the
most appropriate strategies for a language agent using a lexicon table to associate concepts and
words and to produce words. The speed to success and desired features of the language, such as
whether synonyms were included in the language, were considered.
Four conditions were tested with two strategies for the associations between the concepts and
the words and two strategies for word production. The strategies for associations were the score
model and the usage and confidence probability model. The strategies for word production were the
normal strategy and the introspective obverter strategy. A summary of the parameters used is given
in Table 5.7.
Table 5.7 Parameters for Pilot Study 1Bi
Parameters                   Pilot Study 1Bi
Concept representation       5 categories
Word representation          Text (syllables)
Lexicon technique            Lexicon table
Strategy for choosing words  Normal or introspective obverter
Strategy for associating     Score or usage and confidence probability
Strategy for word invention  Threshold
Word absorption rate         1.0
Word invention rate          1.0
Threshold                    0.0
Generations                  1
Agents                       5
Games                        1000
Five agents and five concepts were used. The word invention and absorption rates were both set
to 1. If the speaker did not have a word for the chosen concept, a new word was always invented. If
the hearer had not heard a word before then that word was always added to their lexicon. The
population continued playing language games until 90% average success was obtained. Fifty
populations for each condition evolved languages to 90% average success.
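A single guessing-game interaction under these settings could be sketched as follows (the dict-based one-word-per-category lexicons and the corrective feedback that lets the hearer absorb the word together with its category are simplifying assumptions, not the thesis's implementation):

```python
import random

def play_game(speaker, hearer, categories, invent_rate=1.0,
              absorb_rate=1.0, rng=None):
    """One guessing game: the speaker names a random topic and the hearer
    guesses the category from the word. Missing words may be invented by
    the speaker, and unknown words absorbed by the hearer, each with the
    given probability. Returns True when the hearer guesses correctly."""
    rng = rng or random.Random()
    topic = rng.choice(categories)
    word = speaker.get(topic)
    if word is None:
        if rng.random() >= invent_rate:
            return False
        word = f"w{topic}"        # arbitrary invented label (hypothetical)
        speaker[topic] = word
    guess = next((c for c, w in hearer.items() if w == word), None)
    if guess is None and rng.random() < absorb_rate:
        hearer[topic] = word      # absorb word via corrective feedback
    return guess == topic
```

With both rates at 1.0, a speaker-hearer pair converges on a shared lexicon once each category has been named at least once.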
For the agents using scores, the lexical coherence was much higher (both 0.92) than for those
using usage and confidence probability (0.79 and 0.78) (see Table 5.8). For the agents using scores,
the strategy used made a difference in the number of games to an average success of greater than
90%. The obverter strategy took longer than the normal strategy (183.5 games compared to 154
games). See Figure 5.6 for typical runs of each of the strategies.
Table 5.8 Results for different strategies (Pilot 1Bi)
                                           Usage and Confidence Probability    Score for each Word-Category Pair
Measure                                    Normal    Introspective Obverter    Normal    Introspective Obverter
Games to 90% Average Success               139.5     145.0                     154.0     183.5
Lexical Coherence at 90% Average Success   0.79      0.78                      0.92      0.92
Figure 5.6 Typical runs (Pilot 1Bi)
The average success and lexical coherence of typical runs with the different
strategies for five agents and five categories: a) usage and confidence probability
with normal, b) usage and confidence probability with introspective obverter, c)
score for concept-word pairs with normal, and d) score for concept-word pairs with
introspective obverter. The graphs show that the lexical coherence for the runs
using the usage and confidence probability was lower than those using a score for
each word-category pair. The introspective obverter performed similarly to the
normal strategy.
With the different strategies, the lexical coherence was higher for those using scores than for
those using usage and confidence probability. There were more synonyms in the languages created
using usage and confidence probability. Populations using the obverter strategy took more games to
reach consensus than populations using the normal strategy, although no other difference was
noticed between the strategies.
Pilot Study 1Bii: Word Creation and Absorption
During language game interactions, the speaker agent may not have a word for a particular concept
and the hearer agent may not have heard the word used by the speaker. In these situations the agents
may probabilistically create and absorb words. Changing the rates of creation and absorption will
alter the rate at which populations reach a consensus and may affect features of the language, such
as the lexical coherence. This study investigated how changing the rates alters the rate at which
populations reach high levels of success and causes different levels of lexical coherence. The study
aimed to find the most appropriate word creation and absorption rates for language agents using
lexicon tables to associate concepts and words.
Both small and large populations were investigated with varying word creation and absorption
rates. Twenty conditions were tested: for a small population (five agents and five categories) and a
large population (twenty agents and twenty categories), one of the rates was set to 1.0 while the
other varied between 0.2 and 1.0 at increments of 0.2. The score method was used to reduce the
existence of synonyms, and the introspective obverter strategy was used as there was little
difference observed between the normal and obverter strategies.
The population continued playing language games until 90% average success was reached.
Twenty populations for each condition evolved languages to 90% average success. The number of
games to 90% average success and the lexical coherence at 90% average success were recorded for
each run. A summary of the parameters used is given in Table 5.9.
Table 5.9 Parameters for Pilot Study 1Bii
Parameters                   Pilot Study 1Bii
Concept representation       5 categories, 20 categories
Word representation          Text (syllables)
Lexicon technique            Lexicon table
Strategy for choosing words  Introspective obverter
Strategy for associating     Score
Strategy for word invention  Threshold
Word absorption rate         0.2, 0.4, 0.6, 0.8, 1.0
Word invention rate          0.2, 0.4, 0.6, 0.8, 1.0
Threshold                    0.0
Generations                  1
Agents                       5, 20
Games                        To 90% average success
For five agents and five categories, the lexical coherence was between 0.91 and 0.95 when the
word absorption rate was 1.0, and dropped to 0.83 when the word absorption rate was 0.2 (see
Figure 5.7). A similar result was obtained for twenty agents and twenty categories, with the lexical
coherence between 0.82 and 0.94 when the word absorption rate was 1.0 and 0.72 when the word
absorption rate was 0.2.
Figure 5.7 Word creation and absorption results (Pilot 1Bii)
a) Games to 90% average success for five agents and categories, b) Lexical
Coherence at 90% average success for five agents and categories, c) Games to 90%
average success for twenty agents and categories, d) Lexical Coherence at 90%
average success for twenty agents and categories. For the smaller population, the
number of games to 90% average success was lowest with word creation and
absorption rates set to 1.0. For the larger population, the number of games to 90%
average success was lowest with word creation set to 0.2 and word absorption set
to 1.0.
A greater difference between conditions can be seen for the number of games to 90% average
success. For five agents and five categories, when the word absorption rate was 1.0, the average
games to 90% average success ranged between 167 and 204. When the word absorption rate was
0.2, the average games to success increased to 564. For twenty agents and twenty categories, the
average games to success were higher when the word absorption was 0.2 at 22,135 games than
when the word absorption rate was 1.0 at 7445 games. However, there was also a difference in the
average number of games to success as the word creation rate changed, with 5280 games at a rate of
0.2 compared to 7685 at 1.0.
Smaller populations reached high levels of success faster with high word creation and
absorption rates, while larger populations reached high levels of success faster with high word
absorption rates and low word creation rates. With lower absorption rates populations took longer to
reach a consensus of word-category pairings, as new words were not absorbed every time they were
encountered.
5.1.3 Discussion for Pilot Study 1
Pilot Study 1 investigated two methods for language agents to evolve and learn languages. The
simulations in Pilot Study 1A investigated features of the recurrent neural network language agent,
including the weight setting mechanisms, the concept representations, and the word representations.
The lessons learned from Pilot Study 1A were:
• a weight setting mechanism that allowed networks to evolve to find an expressive
language quickly was a high mutation rate,
• a weight setting mechanism that allowed networks to learn a target quickly was back
propagation through time,
• the word representations should provide at least as many words as concepts to enable
networks to associate unique words with each concept,
• a spread of activation in the word representation enabled expressive languages to be
found sooner, and
• a spread of activation in the concept representation resulted in faster comprehension.
The simulations in Pilot Study 1B investigated features of the lexicon table including strategies
to produce words, comprehend concepts, and update associations. The lessons learned from Pilot
Study 1B were:
• synonyms were common in languages formed with the usage and confidence strategy,
• lexical coherence was higher in languages formed with the score strategy,
• populations using the introspective obverter strategy took longer to reach high levels of
success than populations using a ‘normal’ strategy for word production,
• for small populations, a high word creation and word absorption rate resulted in a short
time to reach high levels of success and high lexical coherence, and
• for large populations, a high level of word absorption and a medium level of word
creation resulted in a short time to high levels of success and high lexical coherence.
Recurrent neural networks and lexicon tables were able to associate simple concepts with words.
Pilot Study 1 resulted in a clearer understanding of the features of the strategies used by the agents
and the concept and word representations.
5.2 Pilot Study 2: Representations – Pose Cells, Vision, and Experiences
Pilot Study 1 investigated two methods for the lexicon to be used in a location language game.
Another feature to investigate was the concept representations. The concept representations are used
by the speaker to choose the topic, together with the word representations and lexicon to produce
the word, and by the hearer for comprehension. The studies in Pilot Study 2 investigated one or
more of word production, concept comprehension, and the source of variability for words. Three
different concept representations available to robots using RatSLAM were investigated:
• pose cells,
• vision, and
• experiences (for more detail about the representations refer to section 3.7.3).
The aim of Pilot Study 2 was to determine whether pose cells, vision, and experiences are
appropriate concept representations for use in a location language game. Following the description
of the studies is a general discussion of the concept representations available.
5.2.1 Pilot Study 2A – Pose Cells and Vision
Pilot Study 2A² compared two of the robot representations: pose cells and vision. A series of studies
investigated how pose cells and vision can be used to learn, categorise, and generalise where words
refer to locations.
Pilot Study 2Ai: Learning Symbols for Locations
The aim of this study was to investigate word production and to determine whether agents could
learn labels for locations where the concept representations were pose cells and vision. The features
to be defined for Pilot Study 2Ai included the concept representations, the word representations, the lexicon technique with associated features, the source of variability, and the environment (see Table 5.10).

2 This section is based in part on work published in Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Towards a spatial language for mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298). Singapore: World Scientific Press; and in Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006). Generalization in languages evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L. Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems (pp. 486-492): MIT Press. Both of these papers were based on work done under the supervision of Janet Wiles, with design discussions and writing assistance from Paul Stockwell and Mark Wakabayashi.
Table 5.10 Parameters for Pilot Study 2Ai
  Concept type: Location
  Concept representation: Pose cells, vision
  Word representation: 18 units, one-hot encoding
  Lexicon technique: Simple neural network
  Networks: Production only
  Momentum constant: 0.9
  Initial learning rate: 0.01
  Increasing ratio: 1.05
  Decreasing ratio: 0.7
  Source of variability: Pre-set target concepts
  Environment: Simulation world
Simulated robots initially explored the simulation world until a stable pose cell representation
was obtained. The pose cell and vision representations used in the study were obtained as the robot
continued to explore the world, following the left wall (see Figure 5.8a).
The concept representations investigated were the raw pose cell representation, a reduced pose
cell representation, and vision. The visual representation was every 100th scene in a series of 10,000
visual scenes of 12 × 8 grey scale arrays obtained from a run of the robot wandering in the
simulated world. The pose cell representation was the corresponding 100th representation in a series
of 10,000 pose cell representations from the same run. In the reduced pose cell representation, each
group of 4 × 4 × 4 pose cells was averaged. The word representation was a one-hot encoding
with each pattern in the training set associated with one output unit. To obtain the target outputs, the
world was divided into 4m2 squares (see Figure 5.8b), resulting in 18 output units, with 18 of the
squares visited by the robot. Each of the 18 output units corresponds to a word describing a location.
Networks were trained to associate the concept representations with the word representations.
The networks were simple neural networks (see Figure 5.9, for more detail refer to section 4.3.1).
The network was trained on the targets for 1000 epochs using gradient descent with momentum and
adaptive learning rate. The training stopped early if the goal was reached or if the gradient had
reached the minimum gradient. The networks were tested on the full set of patterns.
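The discretisation of the robot's positions into word targets can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the 2 m cell size follows from the 4m2 squares in the text, but the grid origin and column count are assumptions.

```python
# Sketch: map robot (x, y) positions to one-hot word targets by dividing
# the world into 2 m x 2 m (4 m^2) squares. Grid layout is assumed.

def square_index(x, y, cell=2.0, cols=10):
    """Discrete square id for a position (cols is an assumed grid width)."""
    return int(y // cell) * cols + int(x // cell)

def one_hot_targets(positions, cell=2.0, cols=10):
    """Assign each visited square its own output unit; return the unit
    index (i.e. the hot unit of the one-hot target) for each position."""
    word_for_square = {}  # square id -> output unit index
    targets = []
    for x, y in positions:
        sq = square_index(x, y, cell, cols)
        unit = word_for_square.setdefault(sq, len(word_for_square))
        targets.append(unit)
    return targets, len(word_for_square)

# Example: three positions, the first two falling in the same square.
targets, n_words = one_hot_targets([(0.5, 0.5), (1.9, 0.1), (3.0, 0.5)])
```

In the study, 18 squares were visited, so the procedure would yield 18 output units, one per location word.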
The networks for all of the concept representations were able to learn the training set with 4m2
squares. When tested on the larger set the networks extrapolated between the learned patterns in
different ways (see Figure 5.10). The raw and reduced pose cell representations allowed the
network to generalise to patterns that shared the activations of the training set. However, many
patterns in the test set contained pose cells that were not active in any of the patterns in the training
set. In the pose cell runs, the network could not generalise effectively from the training to the test
set. For the vision representation there was no clear correlation between locations and words.
a)
b)
Figure 5.8 Robot route and target concepts (Pilot 2Ai)
a) Data was obtained from the robot completing a circuit of the world after a stable
pose cell representation was achieved. The robot started at the square moving
towards the right and following the left wall. The robot moved along the two
corridors, around the room to the right, back through one of the corridors, around
the room in the middle, then completed the loop with the corridor to the left and
finished at the circle. b) The target outputs are shown in different colours along the
route of the robot, with the 4m2 squares showing how the world was divided. Each
target output corresponds to a word for a location.
Figure 5.9 Production network (Pilot 2Ai)
The networks used in Pilot Study 2Ai were simple neural networks with concept
representations as inputs (in the form of raw pose cells, reduced pose cells or
vision), and the word representation as outputs (18 output units).
Raw and reduced pose cell representations allowed clusters of patterns of about 1m2 and 2m2.
Vision did not directly indicate position. Pilot Study 2Ai showed that differences in the languages
can result from different concept representations. For a language about location, vision was not an
ideal representation, while the different pose cell representations were able to cluster areas of
different sizes.
a)
b)
c)
Figure 5.10 Language layout (Pilot 2Ai)
For each of the concept representations: a) raw pose cell, b) reduced pose cell, and
c) vision, the language layout shows the word used for the pattern at each location
in the route of the robot with a different colour for each of the 18 words. Note that
for the pose cell inputs, there were clusters of locations where each word was used.
The clusters for the robots with vision for input were not correlated with location.
Pilot Study 2Aii: Categorisation
A further study was undertaken to investigate the ability of agents to invent and comprehend words
using vision and pose cell representations. Pilot Study 2Aii aimed to determine if agents could
effectively categorize vision and pose cell representations. The features to be defined for Pilot
Study 2Aii included the concept representations, the word representations, the lexicon technique
with associated features, the source of variability, and the environment (see Table 5.11).
Table 5.11 Parameters for Pilot Study 2Aii
  Concept type: Location
  Concept representation: Pose cells, vision
  Word representation: 10 units, 2 active, 3 syllables
  Lexicon technique: Recurrent neural network
  Hidden units: 50
  Networks: Production and comprehension
  Weight setting mechanism: BPTT
  Source of variability: Evolved network
  Environment: Simulation world
The concept representations used were vision, pose cells, and processed pose cells. The visual
representation was every 100th scene in a series of 10,000 visual scenes of 12 × 8 grey scale arrays
obtained from a run of the robot wandering in the simulated world. The pose cell representation was
the corresponding 100th representation in a series of 10,000 pose cell representations from the same
run, with the number of cells reduced from 440,640 to 610 by reducing the resolution of the pose
cells (4 × 4 × 4 pose cells to 1 pose cell), and disregarding cells that were inactive for the entire
run. As an alternate representation, the pose cells were pre-processed using a hybrid system based
on Self Organising Maps (SOMs) (Kohonen, 1995). In the pre-processing system, a SOM was
trained on the input series for 1000 epochs. The output of the SOM was a 12 × 8 set of competitive
units organised in a hexagonal pattern. To construct a distributed activation, the actual output values
of the units were converted to values between 0 and 1.
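The SOM pre-processing step can be sketched as below. This is a minimal illustrative SOM on a rectangular grid (the study used a 12 × 8 hexagonal map); the grid size, learning rate, neighbourhood schedule, and scaling rule are assumptions, not the thesis implementation.

```python
import math, random

def train_som(data, rows=3, cols=4, epochs=200, lr0=0.5, seed=0):
    """Minimal SOM sketch: competitive units on a rectangular grid with a
    shrinking Gaussian neighbourhood. Returns the trained unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        radius = max(1.0, (rows + cols) / 4 * (1 - epoch / epochs))
        for x in data:
            # best matching unit for this input
            bmu = min(range(len(units)),
                      key=lambda i: sum((u - v) ** 2
                                        for u, v in zip(units[i], x)))
            br, bc = divmod(bmu, cols)
            for i, w in enumerate(units):
                r, c = divmod(i, cols)
                d = math.hypot(r - br, c - bc)
                if d <= radius:
                    h = math.exp(-d * d / (2 * radius * radius))
                    for j in range(dim):
                        w[j] += lr * h * (x[j] - w[j])
    return units

def distributed_activation(units, x):
    """Convert unit responses to a [0, 1] activation pattern, with the
    closest unit at 1 (an assumed form of the 0-1 conversion in the text)."""
    dists = [math.sqrt(sum((u - v) ** 2 for u, v in zip(w, x)))
             for w in units]
    lo, hi = min(dists), max(dists)
    return [1.0 - (d - lo) / (hi - lo) if hi > lo else 1.0 for d in dists]
```

The distributed activation, rather than a single winning unit, is what gives the language agents a graded input pattern.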
Recurrent neural networks were used as the lexicon to associate the concept representations with
the word representations, with a separate network for production and comprehension (see Figure
5.11, for more detail refer to section 4.3.2). The word representation was a sequence of three
syllables. Each syllable was represented by a ten unit binary vector in which the two most active
units were set to 1, with all other units set to 0.
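Binarising a syllable vector in this way can be sketched as follows; the example activations are hypothetical.

```python
def syllable_code(activations):
    """Binarise a syllable vector: the two most active units are set to 1,
    all others to 0 (the per-syllable word encoding described in the text)."""
    top2 = sorted(range(len(activations)),
                  key=lambda i: activations[i], reverse=True)[:2]
    return [1 if i in top2 else 0 for i in range(len(activations))]

# Ten-unit syllable with units 1 and 3 most active.
code = syllable_code([0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.0, 0.7, 0.6])
```

A word is then a sequence of three such ten-bit codes.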
One way to measure understanding is to test how well an agent has categorised the world. The
representations of the world were presented to the word production network, resulting in words
associated with each of the patterns. Concept comprehension networks produced a prototype for
each of the unique utterances. If the original input pattern was closest to the prototype for the
utterance used, the pattern was correctly categorised.
Ten networks were evolved individually for 100 generations to produce languages to categorise
the world based on each set of inputs: vision, pose cell representations and processed pose cell
representations. A simple (1+1)-evolutionary strategy (Beyer & Schwefel, 2002) was used to evolve
the agent’s speaker, introducing variability in the language. In each generation, a comprehension
network was trained on the language for 500 epochs using the back propagation through time
algorithm (Rumelhart et al., 1994). The comprehension networks produced a prototype for each
unique word in the language, which could be compared to the original input pattern. The languages
were evaluated with a fitness function based on how well the world was categorised. If the mutant
language was better categorised than the current champion language, then the mutant became the
champion. The languages produced for each concept representation were compared for
expressiveness and categorisation.
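The (1+1)-evolutionary strategy loop can be sketched as follows. The toy fitness function stands in for the expensive step of training a comprehension network and counting correctly categorised patterns; mutation scale, seed, and target are assumptions.

```python
import random

def one_plus_one_es(init, fitness, generations=100, sigma=0.1, seed=0):
    """(1+1)-ES sketch: each generation a mutant of the champion speaker
    is evaluated and replaces the champion only if it scores better, as
    described for the champion/mutant languages in the text."""
    rng = random.Random(seed)
    champion = list(init)
    champ_fit = fitness(champion)
    for _ in range(generations):
        mutant = [w + rng.gauss(0.0, sigma) for w in champion]
        mut_fit = fitness(mutant)
        if mut_fit > champ_fit:  # mutant becomes the new champion
            champion, champ_fit = mutant, mut_fit
    return champion, champ_fit

# Toy fitness: negative squared distance of the "weights" to a target.
target = [0.5, -0.2, 0.8]
fit = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
best, best_fit = one_plus_one_es([0.0, 0.0, 0.0], fit, generations=500)
```

In the study, the fitness evaluation involved 500 epochs of BPTT training per candidate, making the simple hill-climbing structure of (1+1)-ES an economical choice.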
a)
b)
Figure 5.11 Production and comprehension networks (Pilot 2Aii)
There were two types of network used in Pilot Study 2Aii: a) Word Production
(Evolving) and b) Concept Comprehension (Learning). The network structures are
the same as for Pilot Study 1A (see Figure 5.1).
The vision languages output by the word production networks had an average of 24.2 words (see
Table 5.12). The average number of scenes correctly categorised by the concept comprehension
networks was 53.4 out of 100. One highly expressive language was evolved with 67 unique words
of which 47 were associated with single scenes. Words often appeared to group several different
types of images together, with the resulting prototype visual scene output by the concept
comprehension network being a combination of the scenes. One set of similar scenes was where the
robot faced a white wall with a strip of black next to the floor. All of the languages other than the
most expressive language grouped together some of these scenes (see Figure 5.12).
Table 5.12 Word production and concept comprehension (Pilot 2Aii)

  Measure, mean (σ)                          Vision        Pose Cells    Processed Pose Cells
  Number of Unique Words                     24.2 (17.3)   23.3 (12.4)   10.9 (6.4)
  Number of Patterns Correctly Categorised   53.4 (13.5)   22.6 (10.4)   58.7 (10.4)
Figure 5.12 Vision prototype and scenes (Pilot 2Aii)
The prototype output by the concept comprehension network for the word 'kufufu'
(top left) and the five scenes associated with ‘kufufu’ by the word production
network in a language with 27 unique words. Most of the scenes associated with
‘kufufu’ showed a white wall with a black strip, although the bottom middle scene
had different features. (Reproduced from Figure 3 in Schulz, Stockwell,
Wakabayashi, & Wiles, 2006b)
The pose cell languages output by the word production networks had an average of 23.2 words.
The average number of pose cell patterns correctly categorised by the concept comprehension
networks was 22.6 out of 100. The majority of the words were associated with single patterns or a
small number of patterns, scattered across the space. Some words grouped together patterns that
were close together in space, but were also generally associated with a small number of patterns
from other areas. The processed pose cells languages output by the word production networks had
an average of 10.9 words. The average number of processed pose cell patterns correctly categorised
by the concept comprehension networks was 58.7 out of 100. Processed pose cell languages tended
to have fewer words associated with single patterns and more words associated with many patterns
spread across the entire space. However, the larger languages had more words associated with
groups of patterns that were close together in space.
The number of unique words in a language indicates the expressivity of that language. The
vision and pose cell representations resulted in languages with an average of over 20 unique words
for the 100 patterns, while the processed pose cell representation resulted in languages with an
average of just over ten unique words. The reduction in expressivity indicated that the unique
information in some of the input patterns was lost during processing.
The agents using languages evolved with the vision and the processed pose cell representations
were able to correctly categorise over half of the patterns, while the pose cell representation
languages were only able to correctly categorise an average of 22.6 of the 100 patterns. The
processed pose cell languages were better at clustering patterns that were close together in space,
with more distinct clusters of patterns associated with single words. Some of the agents using
languages evolved with vision were able to group together similar images, however many of the
words grouped together images that were dissimilar and many of the words were associated with
single images. Pure vision as a concept representation may not extract enough information out of
each scene for a structured language to evolve.
Pilot Study 2Aiii: Generalisation
This study investigated word production, concept comprehension, and the source of variability for
words in agents using the concept representations of pose cells and vision, raw or processed with
self organising maps or principal component analysis. The aim of Pilot Study 2Aiii was to
determine if agents could generalise from the existing lexicon when using pose cells and vision as
concept representations. Specifically, the agents’ ability to generalise from the training set to the
test set was investigated. Generalisation may occur in the use of novel words for novel concepts,
and the ability to use the novel words in a way that allows the world to be categorised effectively.
The features to be defined for Pilot Study 2Aiii included the concept representations, the word
representations, the lexicon technique with associated features, the source of variability, and the
environment (see Table 5.13).
The visual representation was every tenth scene in a series of 10,000 visual scenes of 12 × 8
grey scale viewed by the robot exploring the simulation world. The series of 1000 scenes was
analysed using hierarchical clustering to determine 30 clusters of images. The image closest to the
mean for each of the 30 clusters was chosen for evolving and training the networks. The dissimilar
scenes (see Figure 5.13a) were spread throughout the robot’s world (see Figure 5.13b).
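Choosing the representative scene for each cluster can be sketched as below. The clustering itself is assumed done elsewhere (the study used hierarchical clustering); this sketch only shows the "image closest to the cluster mean" selection, with hypothetical data.

```python
def representatives(patterns, labels):
    """For each cluster label, return the pattern closest to that
    cluster's mean, as used to pick the 30 training scenes."""
    reps = {}
    for lab in set(labels):
        members = [p for p, l in zip(patterns, labels) if l == lab]
        dim = len(members[0])
        mean = [sum(p[j] for p in members) / len(members)
                for j in range(dim)]
        reps[lab] = min(members,
                        key=lambda p: sum((a - b) ** 2
                                          for a, b in zip(p, mean)))
    return reps

# Hypothetical 1-D "scenes" in two clusters.
reps = representatives([[0.0], [0.2], [0.7], [1.0]], [0, 0, 0, 1])
```

Using the member closest to the mean, rather than the mean itself, guarantees the training scene is an actual image the robot saw.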
Table 5.13 Parameters for Pilot Study 2Aiii
  Concept type: Location
  Concept representation: Pose cells, vision
  Word representation: 10 units, 2 active, 3 syllables
  Lexicon technique: Recurrent neural network
  Hidden units: 50
  Networks: Production and comprehension
  Weight setting mechanism: BPTT
  Source of variability: Evolved network
  Environment: Simulation world
a)
b)
Figure 5.13 Scenes and their location (Pilot 2Aiii)
Visual scenes for Pilot Study 2Aiii showing a) 30 dissimilar scenes as seen by the
simulated robot in 12 × 8 greyscale and b) the location of the robot for each of the
dissimilar scenes. The scenes were evenly spread throughout the world, with higher
concentrations in the corners, where the visual input of the robot changes more
quickly due to the rotation of the robot. (Extended from Figure 3 in Schulz,
Stockwell, Wakabayashi, & Wiles, 2006a)
The pose cell input was the corresponding tenth pattern in a series of 10,000 pose cell
representations, obtained from the same run of the robot (see Figure 5.14a). The number of pose
cells was reduced from 440,640 to 947 by reducing the resolution of the pose cells (180 × 68 × 36
cells to 45 × 17 × 9 cells) and by discarding cells that were inactive for the entire run (6885 cells to
947 cells). The pose cell inputs were analysed using hierarchical clustering to find 30 pose cell
patterns for presenting to the language agents. The position of the robot for each of the 30 pose cell
patterns was spread throughout the world (see Figure 5.14b).
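The resolution reduction and pruning of inactive cells can be sketched as follows; the block-averaging factor of 4 matches the 180 × 68 × 36 to 45 × 17 × 9 reduction in the text, while the toy dimensions in the example are illustrative.

```python
def downsample(cells, f=4):
    """Reduce a 3-D pose cell array by averaging each f x f x f block
    (a sketch of the resolution reduction described in the text)."""
    X, Y, Z = len(cells), len(cells[0]), len(cells[0][0])
    out = [[[0.0] * (Z // f) for _ in range(Y // f)] for _ in range(X // f)]
    for x in range(X // f):
        for y in range(Y // f):
            for z in range(Z // f):
                block = [cells[x * f + i][y * f + j][z * f + k]
                         for i in range(f)
                         for j in range(f)
                         for k in range(f)]
                out[x][y][z] = sum(block) / len(block)
    return out

def active_indices(patterns):
    """Indices of cells active in at least one pattern of the run, so
    always-inactive cells can be discarded (6885 to 947 in the study)."""
    return [i for i in range(len(patterns[0]))
            if any(p[i] > 0 for p in patterns)]
```

Discarding always-inactive cells shrinks the input layer without losing any information present in the run.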
a)
b)
Figure 5.14 Pose cell map and location of pose cell patterns (Pilot 2Aiii)
Pose cell representation for Pilot Study 2Aiii showing a) the projection into the x–y
plane of the pose cell map and b) the locations of 30 pose cell patterns, evenly
spread throughout the world. (Extended from Figure 3 in Schulz, Stockwell et al.,
2006a)
Three techniques were used for processing the visual and pose cell representations. The first
technique was using the raw representation. The second technique involved categorising the
representation with a self organising map (SOM) (Kohonen, 1995). A SOM was trained on the
patterns for 1000 epochs. The output of the SOM was an array of competitive units organised in a
hexagonal pattern. To give a distributed activation pattern for the language agents, the actual values
of the units were scaled to values between 0 and 1. The third technique used Principal Component
Analysis (PCA). The 1000 patterns were analysed for their principal components and the
component scores were scaled to values between 0 and 1.
The sizes of the processed inputs were set to the smallest size for which expressive languages
could evolve. For the raw image, a scene of 12 × 8 pixels was used, for the SOM-based
representation, a SOM of size 24 × 16 was used, and for the PCA-based representation, the first 48
components were used. For the pose cells, 947 units were used, for the SOM-based representation, a
SOM of size 12 × 8 was used, and for the PCA-based representation, the first 120 components were
used.
Two types of recurrent neural network were used in Pilot Study 2Aiii, which were the same as
those used in Study 2Aii (see Figure 5.11, for more detail refer to section 4.3.2) with the concept
representations being the processed visual and pose cell representations.
One way of testing whether a language captures the underlying structure of a set of input
patterns is to test how well the concepts are mapped to the language terms. Concept comprehension
networks produced a prototype for each unique word. If the original pattern was closest to the
prototype for the word used, the pattern was correctly categorised. The measure of similarity
between patterns and prototypes was sum squared error.
For each pre-processing technique, ten agents were evolved for 500 generations with a selection
strategy based on how well the agents categorised the world. The winner of the current champion
and mutant language was the one in which the trained networks were able to categorise the highest
number of patterns correctly.
In Pilot Study 2Aiii, the language agents produced novel words for novel scenes, which can be
seen as constructing new words by recombining known morphemes in different ways. Novel words
were produced in each type of processing for both vision and pose cell representations.
The agents produced between 17.2 and 22.8 unique words for the training set of 30 patterns (see
Table 5.14). When the agents were presented with the test set of 1000 patterns, they produced
between 34.4 and 111.2 unique words. Notably, a large number of new words were produced for the
novel patterns.
Table 5.14 Word production (Pilot 2Aiii)

  Unique Words,   Image         SOM-based     PCA           Pose Cells    SOM-based     PCA
  mean (σ)                      Image         Image                       Pose Cells    Pose Cells
  30 Patterns     22.4 (8.3)    17.2 (5.3)    22.8 (3.3)    18.9 (7.7)    18.5 (6.1)    21.7 (5.3)
  1000 Patterns   99.9 (65.2)   43.5 (17.5)   111.2 (46.8)  112.6 (86.7)  34.4 (19.2)   93.1 (46.5)
The number of patterns close to their prototypes was used to measure the performance of the
agents. The distance between the pattern and their prototype was determined by treating them as
vectors and calculating one minus the cosine of the included angle between them, which was
normalised by the standard deviation of the distances between each of the scenes. The number of
scenes within 0.25, 0.5, and 1.0 standard deviations of the prototype were calculated for each of the
techniques for the test set of 1000 patterns.
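The normalised distance measure can be sketched as follows; the example vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """One minus the cosine of the included angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def normalised_distances(patterns, prototypes):
    """Distance of each pattern to its word's prototype (one prototype
    per pattern), normalised by the standard deviation of the pairwise
    distances between the patterns themselves, as described in the text."""
    pairwise = [cosine_distance(p, q)
                for i, p in enumerate(patterns)
                for q in patterns[i + 1:]]
    mean = sum(pairwise) / len(pairwise)
    sd = math.sqrt(sum((d - mean) ** 2 for d in pairwise) / len(pairwise))
    return [cosine_distance(p, proto) / sd
            for p, proto in zip(patterns, prototypes)]
```

Counting how many of these normalised distances fall below 0.25, 0.5, and 1.0 reproduces the thresholds used in Table 5.15.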
When the concept comprehension networks were presented with the test set of 1000 patterns, the
SOM-based pose cell representation had the most patterns within 0.25 standard deviations of their
prototype with 558.4, followed by SOM-based vision with 334.9, raw vision with 26.1, PCA pose
cells with 16.6, PCA vision with 8.3, and raw pose cells with none (see Table 5.15). A similar order
resulted with the number of patterns within 1.0 standard deviations of their prototype with SOM-
based image (920.9), SOM-based pose cells (915.5), raw vision (399.0), PCA pose cells (131.6),
PCA vision (29.2), and raw pose cells (0.0).
Table 5.15 Patterns close to the prototype (Pilot 2Aiii)

  Number of patterns (out of 1000), mean (σ)

  Standard      Image          SOM-based      PCA          Pose Cells   SOM-based     PCA
  Deviations                   Image          Image                     Pose Cells    Pose Cells
  0.25          26.1 (15.8)    334.9 (123.5)  8.3 (8.5)    0.0 (0.0)    558.4 (40.6)  16.6 (38.5)
  0.5           81.2 (38.8)    689.6 (194.3)  13.2 (11.1)  0.0 (0.0)    767.0 (36.1)  42.8 (59.9)
  1.0           399.0 (111.0)  920.9 (205.0)  29.2 (19.7)  0.0 (0.0)    915.5 (41.4)  131.6 (74.6)
The vision agents produced between 2.5 (agents with SOM-based inputs) and 4.9 (agents with
PCA inputs) times the number of words for the training set of 30 images when presented with the
test set of 1000 images. The pose cell agents produced between 1.8 (agents with SOM-based inputs)
and 6.0 (agents with raw pose cell inputs) times the number of words for the training set of 30 patterns
when presented with the test set of 1000 patterns.
When generalising to 1000 patterns, the SOM-based agents performed well, with almost all
patterns within one standard deviation. The raw pose cell agents had no patterns within one standard
deviation of the prototype for the word associated with the pattern. The lack of similarity for the
pose cells was due to the sparseness of the patterns, meaning that the concept comprehension
networks did not learn to associate the words with the pose cell representations.
5.2.2 Pilot Study 2B: Pose Cells and Experiences
Pilot Study 2B was an extension of Pilot Study 2A that investigated an additional concept
representation: experiences³. The experience mapping algorithm was developed in the RatSLAM
project in parallel with this thesis, and was not available for the earlier pilot studies. Experiences
provided a representation of space that did not include the discontinuities and multiple activations
for a single location that exist in the pose cell representation. Pilot Study 2B investigated how pose
cells and experiences could be used for word production, when provided with a set of concepts.
Pilot Study 2Bi: Conceptualisation of Locations
This study aimed to test whether agents could use pose cell and experience map representations for
word production, forming concepts for rooms and corridors. The features to be defined for Pilot
Study 2Bi included the concept representations, the word representations, the lexicon technique
with associated features, the source of variability, and the environment. For a summary of the
parameters, see Table 5.16.
Table 5.16 Parameters for Pilot Study 2Bi
  Concept type: Location
  Concept representation: Pose cells, experiences
  Word representation: Activation of units, one-hot encoding
  Lexicon technique: Simple neural network
  Networks: Production only
  Weight setting mechanism: BPTT
  Momentum: 0.9
  Initial learning rate: 0.01
  Increasing ratio: 1.05
  Decreasing ratio: 0.7
  Source of variability: Pre-set target concepts
  Environment: Real world, with offline learning
A teacher-student system was designed and implemented, in which an agent attempted to
associate concepts provided by a human teacher with its internal representations using a single layer
neural network.

3 This section is based on work published in Schulz, R., Milford, M., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial concepts from RatSLAM representations. Paper presented at From sensors to human spatial concepts, a workshop at the International Conference on Intelligent Robots and Systems, Beijing, China; and Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial concepts from RatSLAM representations. Robotics and Autonomous Systems - From Sensors to Human Spatial Concepts, 55(5), 403-410. The journal paper was a refinement of the conference paper. The pose cell and experience map work was done by Michael Milford and David Prasser. The conceptualisation process was done under the supervision of Janet Wiles and Gordon Wyeth, with design discussions and writing assistance from Michael Milford and David Prasser.

Teacher-student conceptualisation involved interaction with a teacher, where the
different concepts that the agent was to learn were provided by that teacher. Agents used supervised
learning to associate input patterns with different concepts. In Pilot Study 2Bi, the agents used pose
cells or experiences as the concept representations. The concepts formed were labels for locations in
the world, including specific rooms. The word representation used was a single active unit. The
conceptualisation process for the agents was the association of the concept representations with the
word representations. The agents learned the association using a fully connected single layer neural
network (for more detail refer to section 4.3.1). The network was initialised with small random
weights and biases (uniformly between –0.1 and 0.1), and trained using gradient descent with
momentum and an adaptive learning rate.
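A single update step of gradient descent with momentum and an adaptive learning rate can be sketched as below, using the 0.9 / 1.05 / 0.7 settings from the parameter tables; the exact adaptation schedule is an illustrative assumption, not the thesis implementation.

```python
def adaptive_gd_step(w, grad, velocity, lr, prev_loss, loss,
                     momentum=0.9, inc=1.05, dec=0.7):
    """One weight update: grow the learning rate while the loss falls,
    shrink it when the loss rises, and carry momentum on the velocity."""
    lr = lr * inc if loss < prev_loss else lr * dec
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity, lr
```

Repeating this step over epochs yields the training procedure described for the production networks.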
During recall, the concept associated with an experience pattern was the concept related to the
most active output unit. The relative activation of the most active unit to the second most active unit
was a confidence value of the label. The agent was considered to be ‘uncertain’ if the activation of
the second most active unit was more than 2/3 the activation of the most active unit. Preliminary
experiments determined that 2/3 provided an appropriate balance between concept uncertainty and
incorrect guessing of concepts at room boundaries.
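The recall rule with the 2/3 uncertainty threshold can be sketched as follows; the example activations are hypothetical.

```python
def label_with_confidence(activations, threshold=2/3):
    """Pick the concept of the most active output unit; report 'uncertain'
    when the runner-up exceeds the given fraction of the winner (the 2/3
    rule described in the text). Returns (unit index, uncertain flag)."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    best, second = ranked[0], ranked[1]
    uncertain = activations[second] > threshold * activations[best]
    return best, uncertain
```

For example, activations of 0.9 and 0.5 give a confident label, while 0.9 and 0.7 are flagged uncertain, since 0.7 exceeds two thirds of 0.9.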
The experiments used a Pioneer 2 DXE mobile robot with a forward facing camera to explore a
test environment. The environment was level 5 of the Axon Building at The University of
Queensland, consisting mostly of open-plan offices and corridors (shown in Figure 5.16a). The
robot was manually driven along a repeated path through the environment. The robot visited every
place on its path at least twice, providing an opportunity for both learning and recognition. The
resulting dataset was processed by the RatSLAM model and experience mapping algorithm in order
to provide the input for the spatial conceptualisation method.
Experiments used a pose cell structure that was sufficiently large to avoid wrapping of the
activity across the structure boundaries. The pose cell structure measured 200 × 100 × 36 cells
(720,000 cells in total) in (x', y', θ ') . The pose cell representation contained both discontinuities
and multiple representations of the same place, as shown by Figure 5.16b. The discontinuities were
caused by visually driven re-localisation jumps after long periods of exploration where the robot
relied only on wheel odometry to remain localised. Odometric drift and delayed re-localisation
created multiple representations, where more than one group of pose cells represented the same
physical location. The experience mapping algorithm produced a spatially continuous map, with
multiple pose cell representations grouped into overlapping areas of the map (see Figure 5.16c).
During the experiment the robot learned 2384 experiences, which is significantly fewer than the
number of activated pose cells.
The spatial conceptualisation process was applied in an offline manner following the
construction of the RatSLAM representations. The environment was manually categorised by a
human teacher into four rooms and two corridors, as shown in Figure 5.16a. The route of the robot
was divided into two sections corresponding to the robot exploring and then revisiting one half of
the building floor. The two sections were further divided into learning and recognition phases. The
learning phase, in which the robot first visited an area, was used for the training set, while the
recognition phase, where the robot revisited an area, was used to test if the concepts had been
learned. The sequence of a recognition phase following a learning phase was equivalent to the areas
being labelled on the first circuit of the environment, and testing whether the robot had learned the
labels on later circuits.
The language agent’s fully connected single layer neural network used pose cells or experiences
as inputs and had six output units corresponding to the concepts of four rooms and two corridors
(see Figure 5.15). The activations of the experience inputs included the current experience and those
experiences within 1m, with activation relative to how close the experience was to the current
experience. Targets were created with a single active output unit corresponding to the current
location of the robot. Transitions between rooms and corridors occurred at doorways and turns. The
first learning phase comprised 403 time steps, with 233 in the second learning phase, 398 in the first
recognition phase, and 187 in the second recognition phase. Agents were initially trained on the first
learning phase and tested on the first recognition phase. Agents were then trained on both the first
and second learning phases and tested on the second recognition phase. For each training segment,
agents were trained for 2000 epochs. The performance of the agents was tested on the first and
second recognition phase by considering the concepts used by the agents for each location.
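The training scheme described above can be sketched as a small script. Only the six-way output, the proximity-weighted input activation within 1 m, and the fully connected single-layer architecture come from the text; the input size, learning rate, and toy data below are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 50      # hypothetical number of experience units
N_CONCEPTS = 6     # four rooms and two corridors

def input_activation(distances_m, radius=1.0):
    """Activation of each experience unit, proportional to closeness to the
    current experience and zero beyond the 1 m radius."""
    return np.clip(1.0 - distances_m / radius, 0.0, 1.0)

# Fully connected single-layer network with softmax outputs.
W = rng.normal(scale=0.1, size=(N_CONCEPTS, N_INPUTS))
b = np.zeros(N_CONCEPTS)

def forward(x):
    z = W @ x + b
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(x, target_idx, lr=0.1):
    """One gradient step against a one-hot target for the current location."""
    global W, b
    p = forward(x)
    t = np.zeros(N_CONCEPTS); t[target_idx] = 1.0
    grad = p - t                      # softmax cross-entropy gradient
    W -= lr * np.outer(grad, x)
    b -= lr * grad

# Toy training set: each concept is tied to a band of nearby experience units.
X, y = [], []
for c in range(N_CONCEPTS):
    for _ in range(40):
        d = rng.uniform(0, 1.5, N_INPUTS)              # distances of all units
        d[c * 8:(c + 1) * 8] = rng.uniform(0, 0.5, 8)  # units near concept c
        X.append(input_activation(d)); y.append(c)

for epoch in range(200):
    for x, t in zip(X, y):
        train_step(x, t)

acc = np.mean([forward(x).argmax() == t for x, t in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```

On this toy data the linear network separates the concepts easily; the thesis trained for 2000 epochs on the real pose cell and experience inputs.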
Figure 5.15 Word production network (Pilot 2Bi)
The networks used in Pilot Study 2Bi were Production Networks, which were
simple neural networks with concept representations as inputs (in the form of pose
cells or experience maps), and the word representation as outputs (six output units
corresponding to four rooms and two corridors).
Concept representation (Pose Cell / Experience Map)
Word representation (6 Output Units)
Figure 5.16 Floor plan, pose cells, and experience map (Pilot 2Bi)
a) Floor plan of the area used for the study (approximately 43 by 13 metres) and
the approximate trajectory of the robot. Shaded areas were inaccessible to the
robot. b) Trajectory of the most activated pose cell during the experiment. Thick
dashed lines show re-localisation jumps driven by visual input. Each grid square
contains 4 × 4 pose cells in the (x', y') plane. c) The experience map was
continuous and had a high degree of correspondence to the spatial arrangement of
the environment shown in a. (Reproduced from Figures 5, 6, and 7 in Milford,
Schulz, Prasser, Wyeth, & Wiles, 2007)
In the first learning phase for the pose cell conceptualisation process, 96.77% of the instances
were labelled correctly, with 64.32% labelled correctly in the recognition phase (see Figure 5.17a,b
and Table 5.17). Errors in the training set were generally on the borders of the categories. Errors in
the recognition set were mainly in Room 1, and were due to the different trajectory used in the
learning and recognition phases. The part of the room incorrectly classified was not visited in the
learning phase. Most of the untrained areas were classified as Corridor 1, as the robot spent most of
the first learning phase there, and the language network was biased towards categorising patterns as
Corridor 1. In the second learning phase, 98.27% were labelled correctly, with 73.26% labelled
correctly in the recognition phase (Figure 5.17c,d). In the recognition phase, there were many
instances where the robot was uncertain of the label for the current location. While most of the
uncertainties were on the borders between concepts, there were other locations of uncertainty,
particularly in Rooms 3 and 4. Different pose cells were active in the uncertain locations during the
recognition and the learning phases.
Table 5.17 Correctly labelled patterns (Pilot 2Bi)
Phase           Pose Cells   Experiences
Learning 1      96.77%       98.26%
Recognition 1   64.32%       90.45%
Learning 2      98.27%       98.43%
Recognition 2   73.26%       89.84%
In the first learning phase of the conceptualisation process based on the experience map, 98.26%
of the instances were labelled correctly, with 90.45% labelled correctly in the recognition phase
(Figure 5.18a,b). The majority of the errors occurred in Room 1, where a different trajectory was
taken during the second pass. For the errors in Room 1, the agent was uncertain about the label,
rather than labelling the instances incorrectly. In the second learning phase, 98.43% were labelled
correctly, with 89.84% labelled correctly in the recognition phase (Figure 5.18c,d). The errors all
occurred at the boundaries between rooms and corridors. The agents were able to cluster the
experiences appropriately, with some uncertain errors on the borders between areas (see Figure
5.18d). At all locations, except for those on borders between areas, and those not visited during the
learning phase, the agents were able to appropriately label their current location.
The conceptualisation experiments tested both the extent to which the RatSLAM system’s maps
could be classified using spatial concepts, and the degree to which different representation types
were suitable. The spatial conceptualisation method was able to learn and then recognise both the
RatSLAM pose cell maps and the experience maps. During the learning phase both representation
types performed well. However, during the recognition phases, higher recognition rates were
achieved when using the experience maps than when using the pose cell maps. The results
demonstrate that phenomena in the pose cells such as multiple representations can impede the
conceptualisation process. The experience mapping algorithm, which was developed to create maps
from the pose cell representations that could be used for goal navigation, also appears to create
maps more suited to spatial conceptualisation.
Figure 5.17 Conceptualisation using pose cells (Pilot 2Bi)
a) The learning phase of section 1, b) the recognition phase of section 1, c) the
learning phase of section 2, and d) the recognition phase of section 2. In the
learning phases there were uncertain areas in the Room 1 ↔ Corridor 1, Room 2 ↔
Corridor 1, and Room 3 ↔ Corridor 2 borders. In the recognition phases there were
uncertain areas throughout, including in all of the rooms and borders between
rooms and corridors. In the first recognition phase, part of Room 1 was labelled as
Corridor 1. Each grid square contains 8 × 8 pose cells in the (x', y') plane.
(Reproduced from Figure 6 in Schulz, Milford, Prasser, Wyeth, & Wiles, 2006)
Figure 5.18 Conceptualisation using experiences (Pilot 2Bi)
a) The learning phase of section 1, b) the recognition phase of section 1, c) the
learning phase of section 2, and d) the recognition phase of section 2. In the
learning phases there were uncertain areas in the Room 1 ↔ Corridor 1, Room 2 ↔
Corridor 1, and Room 3 ↔ Corridor 2 borders. In the recognition phases there were
uncertain areas in Room 1, and in the Room 1 ↔ Corridor 1, Room 2 ↔ Corridor
1, and Room 3 ↔ Corridor 2 borders. (Reproduced from Figure 8 in Milford et al.,
2007)
5.2.3 Discussion for Pilot Study 2
Pilot Study 2 investigated three representations for concepts as well as a variety of processing
techniques. The simulations in Pilot Study 2A compared vision and pose cells, while the
simulations in Pilot Study 2B compared pose cells and experiences. The concept representations
were tested for use in word production, concept comprehension, and together with a source of
variability. The lessons learned from Pilot Study 2 were:
• for a language about locations, vision was not an ideal representation,
• pose cell representations cluster different sizes of areas in the world, depending on the
processing performed,
• there was a trade-off between expressivity and categorisation,
• SOM-based processing of representations provided a natural clustering of patterns, and
• experience maps provided a representation suitable for spatial conceptualisation.
Pose cell and experience representations can be used to form location concepts. Pilot Study 2
resulted in a clearer understanding of the nature of these representations and possible methods for
processing and using the representations for word production and concept comprehension.
5.3 Discussion: Representations Matter
The significance of this chapter was to show that for a location language game, representations
matter. The concept and word representations need to be considered; otherwise, results may be
misinterpreted due to representation artefacts. The method used to associate concepts with words
also affects the nature of the languages that can form.
Pilot Study 1 investigated two methods for language agents to evolve and learn language:
recurrent neural networks and lexicon tables. Two important features of lexicon techniques are
learning rate and generalisation. Lexicon tables (used by Smith, 2001; and Steels, 1999) provide
in-the-moment learning, with generalisation usually performed before the lexicon table is consulted,
through similarity to existing exemplars. Simple neural networks (used by Cangelosi, 2001; Cangelosi &
Parisi, 1998; Kirby & Hurford, 2002; and Marocco et al., 2003) and recurrent neural networks (used
by Batali, 1998; Elman, 1990; and Tonkes et al., 2000) learn associations over time, with words
partitioning concept space. As lexicon techniques, however, these methods were not ideal for
forming and learning location concepts: the neural networks took prohibitively long to learn the
associations, and the lexicon tables provided no mechanism for generalisation. Neither technique
could appropriately handle large concept representations such as the pose cells and experiences.
A method developed as a result of these investigations was used in Study 1 and 2: the distributed
lexicon table.
Pilot Study 2 investigated three types of robot representations: vision, pose cells, and
experiences. Vision was not appropriate for labelling unique locations in the world, as
some distant locations have visually similar scenes. Vision would be more appropriate for location
type concepts such as ‘corner’ or ‘corridor’ where similar visual scenes may occur. Pose cells were
found to be able to cluster locations in the world, but were less reliable at naming locations for
which words were not explicitly learned. Discontinuities and multiple representations within the
pose cell map can impede the conceptualisation process. Of these three types of robot
representations, experiences were found to be ideal for the concept representations underlying
location concepts. In an experience map, distant locations in the world correspond to distant
locations in co-ordinate space, allowing location concepts to be formed within local regions. Unlike concepts that
can be formed from direct perception, location concepts require a representation that is built over
time, such as the cognitive map representation of experiences.
The lessons learned from the studies presented in this chapter were considered when developing
the methods and representations for the major studies of this thesis, including the development of
the ‘where are we’ game, presented in the following chapter.
Chapter 6 A Toponymic Language Game
For some strange reason, no matter where I go, the place is always called
“here”.
(Attributed to Ashleigh Brilliant)
As languages are learned through agent interactions, a key question is what impact these
interactions have on the languages. Names for places, or toponyms, are the simplest spatial concepts
and can be formed from a map of the world. Place names in natural languages are formed through a
variety of strategies including natural features, special sites, religious significance, royalty,
explorers, famous local people, memorable incidents or famous events, other place names from
immigrants’ homelands, explorers naming good or bad fortune on travels, animal names, descriptive
names, and the ‘new town’ (Crystal, 1997, p114). Space becomes place through the experiences of
individuals and populations (Tuan, 1975, p12). Computational models of language have labelled
objects located in the world (Bodik & Takac, 2003; Steels, 1995; Vogt, 2000a), but have only
formed location concepts directly grounded in objects. Location concepts have not been formed
through the collective experience and interactions of an agent population.
Chapter 6 describes studies in which agents played location language games4. The purpose of
the studies was to investigate the features necessary to create shared toponymic languages and to
investigate the effect of agent interactions on the languages. The aim was to determine the methods
and parameters that resulted in the formation of toponymic languages that could be used effectively
by the agents. The agents’ aims were to individually build a world map and to collectively create a
shared lexicon of names for locations. The games played by the agents were ‘where are we’ and ‘go
to’ games.
A ‘where are we’ game is a location language game in which the topic is the
current location of the agents (for more detail about location language games, refer to Chapter 4, in
particular Figure 4.1). The speaker produces a word for the current location, and the hearer updates
the lexicon based on the speaker’s utterance. No feedback is given to either agent. Both agents may
update their lexicon based on the interaction.

4 This chapter covers in more detail the work presented in Study 1 of Schulz, R., Prasser, D., Stockwell, P., Wyeth,
G., & Wiles, J. (2008). The formation, generative power, and evolution of toponyms: Grounding a spatial vocabulary in
a cognitive map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language: Proceedings of
the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore: World Scientific Press. The work presented
in the paper was done under the supervision of Janet Wiles and Gordon Wyeth, and with design discussion and writing
assistance from PhD students David Prasser and Paul Stockwell.
In the grid world, the concept elements are obtained from the squares of the grid. Each square in
the world is a unique location concept element for the agent, to be used for the toponym lexicon. In
the simulation and real worlds, experiences are the concept elements. The words in the grid world
are integers, in the simulation world, words are strings of syllables, and in the real robots, words are
sequences of tones.
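The three word forms can be made concrete with small generators. The syllable inventory and tone set below are invented placeholders; the thesis does not specify them here.

```python
import itertools
import random

# Illustrative word generators for the three worlds. SYLLABLES and TONES_HZ
# are hypothetical; only the three word forms (integers, syllable strings,
# tone sequences) come from the text.
_grid_counter = itertools.count(1)

def grid_word():
    """Grid world: words are integers."""
    return next(_grid_counter)

SYLLABLES = ["ba", "di", "ko", "lu", "me", "no"]   # hypothetical inventory

def simulation_word(rng, n_syllables=3):
    """Simulation world: words are strings of syllables."""
    return "".join(rng.choice(SYLLABLES) for _ in range(n_syllables))

TONES_HZ = [440, 494, 523, 587]                    # hypothetical tone set

def robot_word(rng, n_tones=4):
    """Real robots: words are sequences of tones."""
    return tuple(rng.choice(TONES_HZ) for _ in range(n_tones))

rng = random.Random(0)
print(grid_word(), simulation_word(rng), robot_word(rng))
```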
A behaviour available to the robots in the simulation and real worlds is the ability to move to a
specified goal location, if that location is described in their experience map. While the robots play
location language games, they attempt intermittently to play ‘go to’ games. When successful, ‘go
to’ games provide a behaviour in which more location language games are played as the robots
follow similar routes to the goal location. ‘Go to’ games also provide a behavioural method to test
the coherence of the shared languages.
A ‘go to’ game begins when agents are near each other. The speaker decides on a goal and
produces the word for the goal location. The hearer comprehends the goal location and determines
whether the location can be found. If the goal location can be found, the hearer lets the speaker
know that they will try to reach the goal location. Both agents then move to the goal location. Once
at the goal location, the agents produce an ‘at goal’ signal.
The result of the ‘go to’ game is an indication of the coherence of the languages. If the agents
comprehend each word similarly, they will meet each other at the goal location specified in each
game. In ‘go to’ games there are a range of possible results:
• no word found by the speaker OR the hearer did not understand the word produced by
the speaker,
• the goal location was not found,
• the goal location was found, but the other robot was not met at the goal location, or
• the other robot was met at the goal location.
In the simulation world, the exact distance between agents can be measured, enabling a further
breakdown of the result when they meet at the goal location, from within 1m up to within 6m of
each other.
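The outcome taxonomy above can be expressed as a small classifier. The enum names and the metre-band breakdown are paraphrases of the listed results, not identifiers from the thesis.

```python
import math
from enum import Enum

class GoToResult(Enum):
    NO_WORD_OR_NOT_UNDERSTOOD = 1   # no word found, or hearer did not understand
    GOAL_NOT_FOUND = 2              # hearer could not locate the goal in its map
    GOAL_FOUND_NOT_MET = 3          # goal reached, but the other robot not met
    MET_AT_GOAL = 4                 # both robots met at the goal location

def classify_go_to(word_found, word_understood, goal_found,
                   distance_m=None, meet_radius_m=6.0):
    """Classify one 'go to' game. distance_m is the final separation between
    the robots, which is measurable exactly in the simulation world."""
    if not (word_found and word_understood):
        return GoToResult.NO_WORD_OR_NOT_UNDERSTOOD, None
    if not goal_found:
        return GoToResult.GOAL_NOT_FOUND, None
    if distance_m is None or distance_m > meet_radius_m:
        return GoToResult.GOAL_FOUND_NOT_MET, None
    # Simulation world only: break the meeting into 1 m bands,
    # "within 1m" up to "within 6m".
    return GoToResult.MET_AT_GOAL, max(1, math.ceil(distance_m))
```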
This chapter describes the design and implementation of a location language game in:
• Study 1A: Grid world,
• Study 1B: Simulation world, and
• Study 1C: Real world.
The final section is a general discussion of the location language game.
6.1 Study 1A: Grid World
Study 1: A Toponymic Language addressed the question of how interactions affect the formation
of location languages. The first step in answering this question was to investigate the
implementation of simple spatial agent interactions in a simple world. The aim of Study 1A: Grid
World was to investigate the effect of interactions on a toponymic language by implementing the
‘where are we’ game in a simple world. In particular, the study considered the effect of a variety of
parameters, including the population dynamics and the methods for producing words.
6.1.1 Experimental Setup
In Study 1A agents played ‘where are we’ games in an empty 10 × 10 grid world (for more detail
about the grid world refer to section 4.5.1). Agents used a distributed lexicon table to associate
concepts and words (for more detail refer to section 4.3.4). The representations used for location
concept elements were the squares of the grid. Words were represented as integers. For each game,
the speaker and hearer were chosen randomly. The speaker was placed in a random square, and the
hearer was placed in a square in the neighbourhood of the speaker. Words were invented when the
maximum toponym value for the interaction location was at the threshold of 0.0. When words and
concepts were used together their association was increased by 1.0. Forgetting was implemented
with 0.2 taken away from unused associations, and a minimum association value of 0.0.
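A minimal sketch of these lexicon updates (invention when the maximum toponym value is at the 0.0 threshold, reinforcement by 1.0, and forgetting by 0.2 with a 0.0 floor) might look like the following. The class and method names are illustrative, and the full distributed lexicon table of section 4.3.4 has more machinery than shown; forgetting here decays only the other words at the same square, a simplification of decaying unused associations.

```python
import itertools
from collections import defaultdict

class LexiconTable:
    """Simplified sketch of the distributed lexicon table updates."""

    def __init__(self):
        # assoc[square][word] -> association strength
        self.assoc = defaultdict(lambda: defaultdict(float))
        self._new_word = itertools.count()  # words are integers in the grid world

    def best_word(self, square, threshold=0.0):
        """Produce the most associated word, inventing one when the maximum
        toponym value for the square is at the threshold of 0.0."""
        words = self.assoc[square]
        if not words or max(words.values()) <= threshold:
            w = next(self._new_word)
            self.assoc[square][w] = 1.0
            return w
        return max(words, key=words.get)

    def hear(self, square, word):
        """Reinforce the used pair by 1.0; decay the square's unused words
        by 0.2, with a minimum association value of 0.0."""
        self.assoc[square][word] += 1.0
        for w in list(self.assoc[square]):
            if w != word:
                self.assoc[square][w] = max(0.0, self.assoc[square][w] - 0.2)
```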
In the grid world, the shape of the hearing area may be a single square, a square shape, or a
diamond shape (see Figure 6.1). The different shapes of hearing areas allow for different languages
regarding how words move through the world. If the hearing area is too large, it may be difficult for
the agents to reach a consensus on words for concepts. The shape of the hearing area affects how
likely it is that one word will take over from another word in any square. Words are more likely to
take over neighbouring squares with the square hearing area than with the diamond hearing area,
and with the larger size than with the smaller size.
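The three hearing-area shapes of Figure 6.1 correspond to standard grid distances: a square area is a Chebyshev ball and a diamond area a Manhattan ball. A sketch, where radius 1 reproduces the 9-square and 5-square areas used here:

```python
def can_hear(speaker, hearer, shape, radius=1):
    """Return True when the hearer is inside the speaker's hearing area.
    The hearer may share the speaker's square."""
    dx = abs(speaker[0] - hearer[0])
    dy = abs(speaker[1] - hearer[1])
    if shape == "single":
        return dx == 0 and dy == 0
    if shape == "square":          # Chebyshev ball: 9 squares for radius 1
        return max(dx, dy) <= radius
    if shape == "diamond":         # Manhattan ball: 5 squares for radius 1
        return dx + dy <= radius
    raise ValueError(shape)

# Count the cells each radius-1 area covers around the origin.
cells = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
square_n = sum(can_hear((0, 0), c, "square") for c in cells)
diamond_n = sum(can_hear((0, 0), c, "diamond") for c in cells)
print(square_n, diamond_n)  # 9 5
```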
Study 1A compared ‘basic’ and ‘best’ solutions. The basic solution implemented the simple
option for each of population dynamics, neighbourhood shape, strategy, hearing distance, and
forgetting. The best solution used the options that gave the desired language features as determined
by preliminary simulations. The parameters for Study 1A: Basic and Best are given in Table 6.1.
The basic solution consisted of 10 runs in which 100,000 games were played, and the best solution
consisted of 10 runs in which 500,000 games were played. There were two agents in each
population. Games were played when the agents were within a square or diamond neighbourhood of
each other. Agents used the most associated or neighbourhood most informative strategy to
determine which word was produced for the chosen topic.
Figure 6.1 Hearing area
The area in which the hearer can hear the speaker for a) single, b) square, and c)
diamond hearing area, where the location of the speaker is shown in black, and the
possible locations of the hearer are shown in grey. The hearer may be located in the
same square as the speaker.
Table 6.1 Parameters for Study 1A
Parameters                     Study 1A: Basic             Study 1A: Best
Game                           ‘where are we’              ‘where are we’
Hearing distance               Square (9 squares)          Diamond (5 squares)
Concept type                   Location                    Location
Concept representation         Squares of grid             Squares of grid
Word representation            Integer                     Integer
Lexicon technique              Distributed lexicon table   Distributed lexicon table
Strategy for choosing words    Most associated             Neighbourhood most informative
Neighbourhood size             Single square               Diamond (5 squares)
Forgetting                     Yes                         Yes
Updating                       Hearer only                 Hearer only
Strategy for word invention    Threshold                   Threshold
Word absorption rate           1.0                         1.0
Word invention rate            1.0                         1.0
Threshold                      0.0                         0.0
Generations                    1                           500
Agents                         2                           2
Interactions per generation    100,000                     1000
Initial learning period        0                           1000
World                          Grid                        Grid
Size of grid                   10 × 10                     10 × 10
Obstacles                      None                        None
6.1.2 Results
The resulting languages for each of the 10 runs of the ‘basic’ solution were similar in coherence,
specificity, size, and shape. A coherence of greater than 0.8 was obtained for all runs by 10,000
games (see Figure 6.2). At a coherence of 0.8 the two agents use the same word for 80% of the
squares.
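Coherence as used here, the fraction of squares for which the two agents produce the same word, can be computed directly. The dict-based language representation below is an assumption for illustration.

```python
def coherence(lang_a, lang_b):
    """Fraction of squares on which the two agents use the same word.
    lang_a, lang_b: dicts mapping each grid square to the word the agent
    would produce there."""
    squares = lang_a.keys() & lang_b.keys()
    if not squares:
        return 0.0
    agree = sum(lang_a[s] == lang_b[s] for s in squares)
    return agree / len(squares)

# A 10 x 10 world where the agents disagree on one square, as in one of the
# basic runs, gives a coherence of 0.995-like near-agreement; here 0.99.
a = {(x, y): "w1" for x in range(10) for y in range(10)}
b = dict(a)
b[(9, 9)] = "w2"   # one square labelled differently
print(coherence(a, b))  # 0.99
```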
Figure 6.2 Coherence (1A: Basic)
a) The coherence for each of the 10 populations in Study 1A: Basic over the
100,000 games and b) the coherence for each population at the end of 100,000
games. High coherence (>0.8) was reached by all populations by 10,000 games. At
the end of 100,000 games, eight of the ten populations had reached a coherence of
1.0, with one at 0.995 (one square labelled differently), and another at 0.99 (two
squares labelled differently).
The average number of words used by each agent was above 60 by the time 200 games were
played and decreased fairly rapidly to between 9 and 13 words by about 20,000 games (see Figure
6.3). The languages were mostly stable after 20,000 games, although in some runs, agents continued
to lose words. The specificity of the languages matched the size of the language, starting close to
1.0 when most squares were associated with a unique word and dropping to between 0.8 and 0.9 by
20,000 games, when there were between 9 and 13 words in the language (see Figure 6.4). If the
words were more evenly distributed between the squares, the specificity of the languages would be higher.
The example language in Figure 6.5 showed a wide range in the number of squares associated with
each word. Words used for very few squares disappeared from use as more games were played.
Figure 6.3 Words used (1A: Basic)
The average number of words used by one agent in each population for up to
100,000 games. Most populations had reached a stable language of between 9 and
13 words by 50,000 games.
Figure 6.4 Specificity (1A: Basic)
The specificity of the language for each of the 10 populations up to 100,000 games.
High specificity was maintained throughout the games.
Agent 1    Agent 2
a) Language Layout: each toponym was assigned a different shade, and each square in the world was shaded according to the toponym used there.
b) Word Coverage: the x-axis shows the words in order of area covered, and the y-axis shows the number of squares covered by each word.
Figure 6.5 Shared language (1A: Basic)
The resulting language for the agents of the least coherent run of the Study 1A:
Basic, showing a) the language layout and b) the word coverage. Two of the 100
squares were labelled differently between the two agents, both in the top right
corner of the world. There was a high variance between the number of squares used
for each word, with two words used for 22 squares, and one word used for one
square.
The resulting languages for each of the 10 runs of the best solution were similar in terms of
coherence, specificity, size, and shape. A coherence of greater than 0.8 was obtained for all runs by
4000 games (see Figure 6.6), which was much faster than for the basic simulation, for which 0.8
was obtained for all runs by 10,000 games. None of the best runs reached 1.0, as there were more
words in the languages, making high coherence harder to achieve.
The average number of words used by each agent reached between 30 and 40 by the end of the
first generation (1000 games), and decreased slowly to between 25 and 29 words by about 25
generations (250,000 games) (see Figure 6.7). The languages were mostly stable after 25
generations, although in most runs agents continued to lose words over the successive generations.
The specificity of the languages remained high throughout all of the runs, greater than 0.96 at all
times (see Figure 6.8). The specificity of the languages remained high even with the reduction in the
size of the language as the spread of words was fairly even across the squares. The example
language in Figure 6.9 showed between two and four squares associated with each word.
Figure 6.6 Coherence (1A: Best)
a) The coherence for each of the 10 populations in Study 1A: Best over the 500,000
games and b) the coherence for each population at the end of 500,000 games. A
coherence of greater than 0.8 was obtained by all runs by 4000 games. After
500,000 games, all runs reached a coherence of greater than 0.96 (eight squares
different), with the highest at 0.995 (one square different).
Figure 6.7 Words used (1A: Best)
The average number of words used by one agent in each population for up to
500,000 games. Most agents had a stable language between 25 and 29 words by
250,000 games.
Figure 6.8 Specificity (1A: Best)
The specificity of the language for each of the 10 populations up to 500,000 games.
A specificity of greater than 0.96 was maintained throughout the games.
Figure 6.9 Shared language (1A: Best)
The resulting language for both agents for the least coherent run of Study 1A: Best,
showing a) the language layout and b) the word coverage. Each word was used for
between two and five squares. There were minor differences between the
languages, with 8 of the 100 squares associated with different toponyms. In each
case, the border between toponyms had shifted.
6.1.3 Discussion
The best solution agents took longer to reach a stable language, but the languages formed had
higher specificity and more even word coverage. The basic and best solutions showed that very
different languages can result when different features are used with respect to the time taken to form
a stable coherent language, the specificity of the language, and the types of concepts that form. The
features included the nature of the population of agents, how the agents chose words, how the
lexicon was updated, and when games were played.
The shape and size of the concepts were affected by the hearing distance and the strategy and
neighbourhood size used when choosing words. The most informative strategy combined with a
diamond hearing distance (as in the best solution) resulted in words that were more even in
coverage than when the most associated strategy was used (as in the basic solution). With new
agents entering the population every 1000 interactions (as in the best solution), the coherence
remained lower, as the new agents had to learn the language before high coherence returned. Extra
features to be considered for the implementation in the simulation world include the internal
representations of the robots.
6.2 Study 1B: Simulation World
Study 1B involved the implementation of the ‘where are we’ and the ‘go to’ language games in the
simulation world of the robot (for more detail about the simulation world refer to section 4.5.2). An
investigation was undertaken into how the temperature used for probabilistic word invention and
the neighbourhood size used to produce words affected the resulting language.
The goal of Study 1B was to determine whether the representations of a simulated robot with an
experience map were appropriate for the formation of a toponymic language using the interactions
of the ‘where are we’ language game. The specific aims of Study 1B were to determine how
temperature and neighbourhood size affected the resulting toponymic languages with respect to
lexicon size, word coverage, and successful use of the language in ‘go to’ games. Study 1A showed
that a toponymic language may be formed through ‘where are we’ language game interactions when
the agents have simple and matching concept representations. Study 1B extended Study 1A with
simulated robots that explored the simulation world (see Figure 6.10) and used more complex
representations of space that differed between the robots.
Figure 6.10 Simulation World
The simulation world of the robot, with the black lines indicating walls and the
black octagons desks, showing the path of the robot in a typical simulation run.
6.2.1 Experimental Setup
The concept representations used were the experiences from RatSLAM with a forward facing
camera. Words were represented as text strings. The robots first built representations of the world,
in the form of an experience map, by exploring the world. The robots autonomously wandered
through the world and played a game when they were within hearing distance of each other. The
agents played the ‘where are we’ game and the ‘go to’ game. The ‘where are we’ game was played
in order to build up the lexicons of the robots. The ‘go to’ game was played in order to affect the
behaviour of the robots, and to test the coherence of the languages.
A distributed lexicon table was used to store the associations between experiences and words
(for more detail refer to section 4.3.4). The relative neighbourhood most informative strategy was
used to choose words, with probabilistic word invention calculated from the value of the toponym at
the location of the interaction. Both the speaker’s and the hearer’s lexicons were updated every game.
Forgetting was not directly implemented, though it occurred indirectly as the robots continued to
learn new experiences and removed old ones.
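The exact invention probability is defined in section 4.3.4 and is not reproduced here; the sketch below is only a plausible stand-in that mirrors the qualitative behaviour the results report, namely that invention is more likely where the local toponym value is low and more frequent at higher temperature. Both the functional form and the names are assumptions.

```python
import math
import random

# HYPOTHETICAL invention rule: not the thesis's formula. It only captures the
# qualitative behaviour: high temperature and a weak local toponym favour
# inventing a new word over producing the current best one.
def invention_probability(toponym_value, temperature):
    return temperature * math.exp(-max(toponym_value, 0.0))

def maybe_invent(rng, toponym_value, temperature):
    """Decide probabilistically whether the speaker invents a new word."""
    return rng.random() < invention_probability(toponym_value, temperature)
```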
The measures used were the final language layout, the language size, the language coherence,
how the words were spread throughout the world, and the results of the ‘where are we’ and ‘go to’
games (for more detail about the performance measures refer to section 4.6).
The hearing distance for Study 1B was 3m. Within a trial, the temperature for word invention
was fixed at a value T, the neighbourhood size was fixed at a distance D, and the
two robots negotiated a set of words. Three temperature conditions were tested based on low (T =
0.25), medium (T = 0.5), and high (T = 0.75) temperatures with the neighbourhood size set to
medium (D = 5m). Two other neighbourhood distance conditions were tested based on small (D =
3m) and large (D = 7m) neighbourhood sizes with the temperature set to medium (T = 0.5). Each
condition comprised five runs of 1000 interactions. Following the 1000 interactions, the agents
played 50 ‘go to’ games to test the shared language. For a summary of the parameters, see Table
6.2.
The conditions tested were:
• low temperature (T = 0.25, D = 5m),
• medium temperature and neighbourhood (T = 0.50, D = 5m),
• high temperature (T = 0.75, D = 5m),
• small neighbourhood (T = 0.5, D = 3m), and
• large neighbourhood (T = 0.5, D = 7m).
Table 6.2 Parameters for Study 1B
Parameters                     Study 1B
Game                           ‘where are we’ and ‘go to’
Hearing distance               3m
Concept type                   Location
Concept representation         Experiences (simulation world, forward facing camera)
Word representation            Text
Lexicon technique              Distributed lexicon table
Strategy for choosing words    Relative neighbourhood most informative
Neighbourhood size             3m, 5m, or 7m
Forgetting                     No
Updating                       Hearer and Speaker
Strategy for word invention    Temperature
Temperature                    0.25, 0.5, or 0.75
Generations                    1
Agents                         2
Interactions per generation    1000
Initial learning period        0
World                          Simulation World
6.2.2 Results
In all five temperature and neighbourhood size conditions the simulated robots developed a shared
set of toponyms, showing that toponyms can be formed at different levels of scale by using different
rates of word invention and word production. A higher temperature and a smaller neighbourhood
size resulted in a more specific toponymic language, with more words covering the world (see
Table 6.3). The toponymic language for all temperatures covered just over 300m2, which was the
size of the world extended by the neighbourhood function of 5m. The toponymic language for
different neighbourhood sizes increased proportionately with the neighbourhood size from 192.8m2
for a small neighbourhood up to 444.1m2 for a large neighbourhood. The increase in coverage was
due to the way words were chosen, with the larger neighbourhood meaning that locations in the
world further away from interactions may still be associated with the words used.
Table 6.3 Results for Study 1B
                                    Temperature                              Neighbourhood Size
Measure (mean (σ))                  Low          Medium        High          Small         Large
Number of Toponyms                  5.7 (2.0)    16.3 (2.7)    23.8 (1.2)    27.6 (3.4)    6.4 (1.4)
Area Covered per Toponym Used (m2)  53.7 (27.1)  21.5 (19.7)   13.8 (15.4)   7.4 (7.2)     69.4 (35.3)
Area Covered by Language (m2)       306.1 (9.8)  311.3 (4.8)   311.8 (3.5)   192.8 (4.6)   444.1 (14.9)
Coherence                           0.82 (0.09)  0.73 (0.07)   0.52 (0.12)   0.56 (0.10)   0.81 (0.10)
The coherence decreased with temperature and increased with neighbourhood size. The result of
the ‘go to’ games was similar across the low, medium, and high temperatures (see Figure 6.11a),
and across small, medium, and large neighbourhood sizes (see Figure 6.11b). Between 38.2%
(medium temperature) and 50.6% (low temperature) of the games resulted in the robots meeting
each other within 1m. Between 3.6% (small neighbourhood) and 10.6% (large neighbourhood) of
the games resulted in some type of failure: either the goal word was not found or not understood by
the hearer, or the goal location was not found, or was found but the robots did not meet. In the
remainder of the games, the robots met each other at
the goal location at a distance between 1m and 6m.
In the low temperature and large neighbourhood populations, most of the words were invented
in the first 100 interactions, while for the medium and high temperature and small neighbourhood
populations, words were invented throughout the whole run (see Figure 6.13a, Figure 6.15a, Figure
6.17a, Figure 6.19a, and Figure 6.21a). For each condition, the value of the toponym at the
interaction location increased through the run, with the value increasing with temperature (see
Figure 6.13b, Figure 6.15b, Figure 6.17b, Figure 6.19b, and Figure 6.21b).
Low Temperature
The average number of words used by the simulated robots after 1000 toponym language games
with a low word invention temperature was 5.7. In all of the runs, most of the words were invented
in the first 100 interactions, with the remainder added throughout the run. The area covered by
toponyms was 53.7m2 on average. The shared language for one of the runs is shown in Figure 6.12
with the interactions in Figure 6.13.
Medium Temperature and Neighbourhood
The average number of words used by the simulated robots after 1000 toponym language games
with a medium word invention temperature was 16.3. In all of the runs, words were invented
throughout the run. The area covered by toponyms was 21.5m2 on average. The shared language for
one of the runs is shown in Figure 6.14 with the interactions in Figure 6.15.
High Temperature
The average number of words used by the simulated robots after 1000 toponym language games
with a high word invention temperature was 23.8. In all of the runs, words were invented
throughout the run. The area covered by a toponym on average was 13.8m2. The shared language
for one of the runs is shown in Figure 6.16 with the interactions in Figure 6.17.
Small Neighbourhood
The average number of words used by the simulated robots after 1000 toponym language games
with a small neighbourhood was 27.6. In all of the runs, words were invented throughout the run.
The area covered by toponyms was 7.4m2 on average. The shared language for one of the runs is
shown in Figure 6.18 with the interactions in Figure 6.19.
Large Neighbourhood
The average number of words used by the simulated robots after 1000 toponym language games
with a large neighbourhood was 6.4. In all of the runs, words were invented throughout the run. The
area covered by toponyms was 69.4m2 on average. The shared language for one of the runs is shown
in Figure 6.20 with the interactions in Figure 6.21.
a)
b)
Figure 6.11 Results of ‘go to’ games (1B)
The results of the ‘go to’ games for the runs with different a) temperatures and b)
neighbourhood sizes. For all temperatures, the simulated robots met each other at
the goal location in more than 93% of the games (more than 38% within 1m). For
all neighbourhood sizes, the simulated robots met each other at the goal location in
more than 89% of the games (more than 38% within 1m).
Simulated Robot 1 Simulated Robot 2
a) Experience Map: The experience map of the agent, formed as the agent explores the simulation world by wall-following. Gaps in the map are either desks or open space (compare to Figure 4.10, the map of the simulation world).
b) Language Layout: The square represents the experience map space as shown in a). The language of the agent, with each toponym given a colour and each location in experience map space coloured with the toponym used in that location, to a resolution of 1/16 m2.
c) Word Locations: The square represents the experience map space as shown in a). The ‘best’ location for each word is shown.
d) Word Coverage: The x-axis shows the toponyms in order of invention. The y-axis shows the area covered by each toponym in m2. The coverage for each word in the language is shown.
Figure 6.12 Shared language (1B: low temperature)
An example language for a low temperature with Simulated Robot 1 on the left and
Simulated Robot 2 on the right with the a) experience map, b) language layout, c)
word locations, and d) word coverage. Note that a-c have been rotated to aid
comparison between the simulated robots.
a) Word Usage over the Simulated Robots’ Interactions: The x-axis shows the interactions of the robots. The y-axis shows the toponyms in order of invention. The usage of each word throughout the interactions is shown. Note the first usage of each word, and the use of the words throughout the interactions.
b) Toponym Value at the Interaction Location: The x-axis shows the interactions of the robots. The y-axis shows the value of the toponym used at the interaction location. The value of the toponym is the information value of the word-location combination, as defined in Equation 4.8. Note that the values which result are those considered ‘acceptable’ given the word invention temperature.
Figure 6.13 Interactions (1B: low temperature)
Four of the five words were invented early. The toponym value used in each
interaction generally remained between 0.4 and 0.7. The toponym value is zero
when no toponym has been associated with experiences in the neighbourhood of
the current experience, or when the agent is the hearer and has just created a new
experience that has not been placed in the map.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.14 Shared language (1B: medium temperature)
An example language for a medium temperature and neighbourhood. There was a
range of areas covered by the words, with some covering small areas, and others
covering large areas. Note that some of the words were outside the world of the
simulated robots due to the interactions between words and the neighbourhood size
used. See the Figure 6.12 caption for more detail about the elements of the figure.
a)
b)
Figure 6.15 Interactions (1B: medium temperature)
a) Word Usage over the Simulated Robots’ Interactions and b) Toponym Value at
the Interaction Location. Five of the fifteen words were invented early. The value
of the toponym used in each interaction generally remained between 0.5 and 0.8.
See the Figure 6.13 caption for more detail about the elements of the figure.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.16 Shared language (1B: high temperature)
An example language for a high temperature. There was a range of areas covered
by the words: not used, covering very small areas, and covering large areas. Note
that some of the words were outside the world of the simulated robots due to the
interactions between words and the neighbourhood size used. See the Figure 6.12
caption for more detail about the elements of the figure.
a)
b)
Figure 6.17 Interactions (1B: high temperature)
a) Word Usage over the Simulated Robots’ Interactions and b) Toponym Value at
the Interaction Location. Seven of the 24 words were invented early, with the
remainder invented consistently through the run. The value of the toponym used in
each interaction generally remained between 0.6 and 0.9. See Figure 6.13 caption
for more detail about the elements of the figure.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.18 Shared language (1B: small neighbourhood)
An example language for a small neighbourhood. There was a range of areas
covered by the words: not used, covering very small areas, and covering large
areas. See the Figure 6.12 caption for more detail about the elements of the figure.
a)
b)
Figure 6.19 Interactions (1B: small neighbourhood)
a) Word Usage over the Interactions of the Simulated Robots and b) Toponym
Value at the Interaction Location. Twelve of the 27 words were invented early,
with the remainder invented consistently through the run. The value of the
toponym used in each interaction generally remained between 0.5 and 0.8. See the
Figure 6.13 caption for more detail about the elements of the figure.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.20 Shared language (1B: large neighbourhood)
An example language for a large neighbourhood. The languages were very closely
matched between the two agents, with each word covering a large area in the
world. See the Figure 6.12 caption for more detail about the elements of the figure.
a)
b)
Figure 6.21 Interactions (1B: large neighbourhood)
a) Word Usage over the Simulated Robots’ Interactions and b) Toponym Value at
the Interaction Location. Four of the five words were invented early, with the final
word invented halfway through the run. The value of the toponym used in each
interaction generally remained between 0.5 and 0.8. See the Figure 6.13 caption for
more detail about the elements of the figure.
6.2.3 Discussion
Smaller languages resulted from lower temperatures and larger neighbourhood sizes, while larger
languages resulted from higher temperatures and smaller neighbourhood sizes. Higher coherence
resulted from lower temperatures and larger neighbourhood sizes. The results of the ‘go to’ games
were similar across the different temperatures and neighbourhood sizes, with most of the games
resulting in the simulated robots meeting each other at the goal location. Even though the area
covered by toponyms was larger for the lower temperatures and larger neighbourhood sizes, the
location interpreted as best representing each toponym tended to remain similar between the robots,
as shown by the large proportion of all games where the robots met each other at the goal location
within 1m.
Study 1B demonstrated how toponyms could be formed for all places in the world visited by
both simulated robots, where the robots built their own personal experience maps of the world and
played toponymic language games when within hearing distance of each other.
6.3 Study 1C: Real World
Study 1C involved the implementation of ‘where are we’ and ‘go to’ games in the real robots. The
challenges of the real world include the limited battery life of the robots, the difficulties of not
hearing or mishearing each other’s utterances, and changing features of the world. This study
investigated the influence of two error detection strategies, with a comparison made over three
languages for each condition.
The goal of Study 1C was to determine whether useful toponymic languages could be formed
through the interactions of real robots playing ‘where are we’ games. The specific aims of Study 1C
were to determine if the real world issue of noise in the perceptual data obtained by the robots about
odometry, vision, and hearing affected the languages that formed with respect to lexicon size, word
coverage, and the successful use of the language in ‘go to’ games. The hypothesis was that the
languages would be less coherent than those formed in the simulation world, but that the robots
would still be able to play ‘go to’ games successfully. Study 1C extended Study 1B into the real
world (see Figure 6.22, for more detail refer to section 4.5.3).
Figure 6.22 Real world
The robot's world comprises halls and open plan offices. A layout of the obstacles
in the room and the approximate path of the robots are shown.
Note that the room used in the real world is different to the room used in the simulation world
(compare to Figure 6.10). The real world room is smaller, with fewer obstacles, and smaller open
areas. The robots are able to find each other more readily due to the size of the rooms and the
placement of the obstacles, with fewer loops in the exploration of the environment.
6.3.1 Experimental Setup
The concept representations were experiences from RatSLAM formed using an omni-directional
camera. The word representations for the robots in the real world were DTMF tones. The robots
played ‘where are we’ and ‘go to’ games in the real world. In each session, only one type of game
was played, to reduce the potential problems of mishearing. A distributed lexicon table was used to
associate experiences and words (for more detail refer to section 4.3.4), with the relative
neighbourhood information strategy used to choose words. Inventing words was done
probabilistically with a temperature of 0.5. The lexicon was updated when words and concepts were
used together by increasing the association by 1.0. Forgetting was not implemented directly.
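The update rule stated here — adding 1.0 to the association whenever a word and an experience are used together — can be sketched as follows. The class and method names are illustrative, and `best_word` is a simplification: the thesis actually chooses words with the relative neighbourhood most informative strategy of section 4.3.4.

```python
from collections import defaultdict

class LexiconTable:
    """Minimal sketch of a distributed lexicon table: association
    strengths between concept elements (experience ids) and words."""

    def __init__(self):
        # assoc[experience_id][word] -> association strength
        self.assoc = defaultdict(lambda: defaultdict(float))

    def update(self, experience_id, word, amount=1.0):
        # Strengthen the association when a word and a concept element
        # are used together (the thesis uses an increment of 1.0).
        self.assoc[experience_id][word] += amount

    def best_word(self, experience_id):
        # Simplified lookup: the most strongly associated word, if any.
        words = self.assoc[experience_id]
        return max(words, key=words.get) if words else None

lex = LexiconTable()
lex.update("exp_7", "tilo")
lex.update("exp_7", "tilo")
lex.update("exp_7", "loto")
```

Because both hearer and speaker update their tables after each game, associations for the words actually used in a region keep growing, which is how a shared labelling stabilises.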
Each sequence comprised a series of two hour sessions followed by a final session. In the first
sessions, the robots played ‘where are we’ games to form their lexicons. In the final session, the
lexicons of the robots were tested with the robots playing ‘go to’ games. The final session was run
until the robots had played 25 ‘go to’ games. For a summary of the parameters, see Table 6.4.
Table 6.4 Parameters for Study 1C
Parameters                     Study 1C: Minimal                         Study 1C: Checksum
Game                           ‘where are we’ and ‘go to’                ‘where are we’ and ‘go to’
Hearing distance               ~5m                                       ~3m
Concept type                   Location                                  Location
Concept representation         Experiences (real world,                  Experiences (real world,
                               omni-directional camera)                  omni-directional camera)
Word representation            Tones                                     Tones
Lexicon technique              Distributed lexicon table                 Distributed lexicon table
Strategy for choosing words    Relative neighbourhood most informative   Relative neighbourhood most informative
Neighbourhood size             5m                                        2m
Forgetting                     No                                        No
Updating                       Hearer and Speaker                        Hearer and Speaker
Strategy for word invention    Temperature                               Temperature
Temperature                    0.5                                       0.5
Generations                    1                                         1
Agents                         2                                         2
Interactions per generation    4 hours                                   6 hours
Initial learning period        0                                         0
World                          Real World                                Real World
Error detection                Minimal                                   Checksum
Two conditions were tested comparing different levels of error detection:
• minimal and
• checksum.
In the first condition, the only error detection was whether the correct number of tones had been
received, and whether the structure of the grammar matched what was expected. The
neighbourhood size was set to 5m, and the volume of the robots set so that the hearing distance was
approximately 5m when within line of sight. The robots played ‘where are we’ games for two
sessions of two hours. In the second condition, a checksum was included in the transmission of
words. The checksum was a simple additive checksum. For the checksum error detection condition,
the neighbourhood size was set to 2m, and the volume of the robots set so that the hearing distance
was approximately 3m when within line of sight. To allow the robots to play a similar number of
games as in the first condition, the robots played ‘where are we’ games for three sessions of two
hours for each language. Three languages were formed and tested for each condition.
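The thesis states only that a simple additive checksum was appended to the transmitted words. The sketch below is one plausible instantiation; the modulus of 16, matching the number of DTMF symbols, is an assumption:

```python
def additive_checksum(tones, modulus=16):
    """Simple additive checksum over a sequence of tone digits.
    The modulus of 16 (the DTMF symbol count) is an assumption."""
    return sum(tones) % modulus

def encode(tones):
    # Append the checksum to the transmitted tone sequence.
    return tones + [additive_checksum(tones)]

def decode(received):
    # Accept the utterance only if the trailing checksum matches.
    tones, check = received[:-1], received[-1]
    if additive_checksum(tones) != check:
        return None  # discard a misheard utterance
    return tones

msg = encode([3, 7, 1])
assert decode(msg) == [3, 7, 1]
assert decode([3, 8, 1, msg[-1]]) is None  # a single misheard tone is caught
```

Note that an additive checksum catches most single-tone errors but can miss compensating errors (one tone heard too high and another too low), so it reduces rather than eliminates the mishearing problem.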
6.3.2 Results
In both error detection conditions the robots developed a shared set of toponyms. When checksum
error detection was implemented, a smaller neighbourhood size was required to form languages of
approximately the same size as those formed with minimal error detection (see Table 6.5). The area
covered by the languages differed between the two conditions due to the neighbourhood sizes used,
as a smaller neighbourhood size results in a language that covers a smaller area. The checksum
error detection condition resulted in languages that were more coherent, also shown by the robots’
performance in ‘go to’ games (see Figure 6.23).
Figure 6.23 Results of ‘go to’ games (1C)
For minimal error detection, 58.7% of the games resulted in the robots meeting
each other at the goal location. For checksum error detection 70.0% of the games
resulted in the robots meeting each other at the goal location. Compared to the
simulation world (see Figure 6.11) a greater percentage of the games resulted in
failure or not meeting at the goal location. The reduction in success was due to
difficulty in the robots hearing each other and to reduced language coherence.
Table 6.5 Results for Study 1C
Measure (x̄ (σ))                       Minimal        Checksum
Number of Toponyms                     10.7 (1.8)     8.8 (0.8)
Area Covered per Toponym Used (m2)     22.7 (17.2)    9.9 (4.5)
Area Covered by Language (m2)          216.1 (22.3)   84.0 (3.0)
Coherence                              0.22 (0.10)    0.30 (0.15)
Minimal Error Detection
In all three runs the robots developed a shared set of toponyms, with between eight and fourteen
words in the lexicon (average of 10.7, see Table 6.5). The toponymic language covered an average
of 216.1m2. 58.7% of the ‘go to’ games resulted in the robots meeting each other at the goal
location (see Figure 6.23).
Most of the words were invented in the first half of the interactions with the robots reusing
words already invented when they covered the same locations in the world (see Figure 6.24). The
value of the toponym for the current location increased through the interactions. The language
resulting for each agent in one of the runs is shown in Figure 6.25. In all runs, robot 2 had one or
two more words in its lexicon, due to its microphone picking up more beeps.
On investigation of the word invention behaviours of the robots, it was discovered that in most
cases, new words were learned by the robot as hearers, rather than invented by the robots as
speakers. In many of these cases the hearer robot had misheard the word sent by the speaker, and
added the misheard word to its lexicon.
a) b)
Figure 6.24 Interactions (1C: minimal)
a) Word Usage over the Interactions of the Robots b) Value of Toponym
at Interaction Location. Most of the words were invented in the first half of the
interactions. The value of the toponym at the interaction location increased to
between 0.6 and 0.9. See the Figure 6.13 caption for more detail about the elements
of the figure.
Robot 1 Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.25 Shared language (1C: minimal)
An example language for both robots of one run, with ten words used by at least
one robot. Robot 2 has concepts for three more words than Robot 1. Note that some
of the words were outside the world of the robots due to the interactions
between words and the neighbourhood size used. See the Figure 6.12 caption for
more detail about the elements of the figure.
Checksum Error Detection
In all three runs the robots developed a shared set of toponyms, with between eight and nine words in the
robots’ lexicon (average of 8.8, see Table 6.5). The toponymic language covered an average of
84.0m2. 70.0% of the ‘go to’ games resulted in the robots meeting each other at the goal location
(see Figure 6.23).
Most of the words were invented in the first half of the interactions with the robots reusing
words already invented when they covered the same locations in the world (see Figure 6.26). The
value of the toponym for the current location increased through the interactions. The language
resulting for each agent in one of the runs is shown in Figure 6.27. With checksum error detection,
new words acquired by the robots were restricted to those that were invented by the robots.
a) b)
Figure 6.26 Interactions (1C: checksum)
a) Word Usage over the Interactions of the Robots b) Value of Toponym
at Interaction Location. The words were invented in two stages as the robots
explored the world together. The second stage corresponded to the robots
interacting in different areas in the world. The value of the toponym increased to
between 0.2 and 0.8. The large spread of values was due to the influence of
neighbourhood size on the value of the toponym. See the Figure 6.13 caption for
more detail about the elements of the figure.
Robot 1 Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 6.27 Shared language (1C: checksum)
An example language for both robots of one run, with nine words used by each
robot. The area for which each word was used was very similar between the robots,
for example, the two words covering triangle shaped areas in the middle of the
world (‘tilo’ and ‘loto’). Note that all of the words were within the world of the
robots. See the Figure 6.12 caption for more detail about the elements of
the figure.
6.3.3 Discussion
Study 1C: Real World showed that ‘where are we’ games can be played with robots in the real
world and that the simulation world results were reproducible in the real robots. In both conditions
(minimal and checksum error detection) the robots developed a shared set of toponyms. The
languages formed by the real robots covered smaller areas than those formed by the simulated
robots due to the smaller size of the room in the real world. The coverage of each toponym was
similar between the simulated and real robots when comparing the minimal error detection to the
medium neighbourhood size and comparing the checksum error detection to the small
neighbourhood size.
The toponyms formed by the robots using checksum error detection were all situated in
locations in which the robots interacted, with most of the words situated in the large open area to
the top right of the room, where the robots interacted most often (see Figure 6.27c). In comparison,
human terms used to describe areas in the room are related to interactions between people, actions
that occur at locations, or ownership of the locations. Some examples of human terms for areas in
the world are the ‘entry’ in the top right, the discussion space in the top left, and Ruth’s desk in the
bottom right corner (see Figure 6.28). Open areas such as the one above the fridge do not have their
own name, as interactions between people tend not to occur in this location. As the only actions of
the robots involve building internal maps and interacting with other robots, interaction locations are
currently the only places that can be labelled by the robots.
Figure 6.28 Real world with human labels
Human terms referring to locations in an office environment relate to interactions
that take place (the discussion space), to actions that occur (the entry), or to
ownership (Ruth’s desk). Some areas do not have specific names, such
as the open space above the fridge where few interactions occur.
With minimal error detection, the robots misheard each other regularly, resulting in the hearer
adding a new word when the speaker had used an existing word. Despite the difficulties with real
robot implementation, the robots using minimal error detection were able to form useful toponymic
languages, meeting each other at a goal location in 58.7% of the ‘go to’ games played. Adding
checksum error detection resulted in more coherent languages with a higher success rate for the ‘go
to’ games with the robots meeting each other in 70% of the games played.
6.4 Discussion: A Toponymic Language
Study 1 addressed the question of how social interactions impact on toponymic languages, and has
shown how a structured toponymic language can be formed from simple social interactions. The
language game method (Steels, 2001) was extended to a location language game method where
mobile robots share attention by being located near each other. In playing ‘where are we’ games,
the actions of the robots were to interact when they could hear each other and to associate words
used with the current location. A toponymic language that described all the locations visited by the
robots was formed through these actions, resulting in the construction of specific place from general
space (Tuan, 1975), and in toponyms becoming landmarks used to describe ‘where’ (Tversky,
2003).
The toponymic languages allowed goal locations to be set simply by specifying the associated
label. A toponym is easier to communicate than either the visual information at a location, since
many locations in the world may have similar views, or exact co-ordinates, which require the same
detailed map to be shared between the robots.
The experience map used in this study for concept representation required the design of a
method for concept formation: The Distributed Lexicon Table. Rather than the formation of
categories prior to language learning (Bodik & Takac, 2003; Smith, 2001; Steels & Loetzsch,
2007), the distributed lexicon table, with methods for updating, producing, and comprehending
words, allowed concept and word formation to interact.
The distributed lexicon table method for concept formation and word usage combines the rapid
learning from exemplars of lexicon tables (Smith, 2001; Steels, 1999) with the generalisation of
neural networks (Batali, 1998; Cangelosi, 2001; Kirby & Hurford, 2002; Tonkes et al., 2000). Concepts
result from the associations between concept elements and words as well as the methods for
producing words and comprehending concepts.
In the simulation and real world studies, words were chosen using the relative neighbourhood
most informative strategy, which was developed for use with the distributed lexicon table. This
strategy allowed the words to be more evenly spread across concept space, but new words were
always adopted, becoming the most informative word for the concept they were first used for.
While the languages formed throughout Study 1 using the relative neighbourhood most informative
strategy could be used successfully, other methods may well provide a better balance between
specificity and stability of the lexicon.
The location concepts were formed while words were associated with the concept elements, and
the areas covered by the concepts were determined by the interactions of the robots in the world.
The studies described in this chapter demonstrated the co-development of concepts and words, and
the way in which words and interactions between agents can affect the concepts that form.
The inclusion of checksum error detection showed that in order to form a coherent language in
the real world, agents need to put additional effort into making sure that they can understand each
other.
In the following chapter, a study including the formation of the spatial concepts of direction and
distance is presented. Distance and direction concepts allow the agents to talk about locations in
space other than their current location.
Chapter 7 A Generative Spatial Language Game
Homer: What’s an e-mail?
Lenny: It’s a computer thing, like, er, an electric letter.
Carl: Or a quiet phone call.
(Groening, 2000)
A key challenge for embodied language games is for the agents to refer to locations other than those
they have visited. This challenge requires both relational terms and the ability to take into account
the agents’ different perspectives. The ‘where is there’ game, adapted from previous spatial
language games (Bodik & Takac, 2003; Steels, 1995), is based on naming three locations: Both
agents are located within hearing distance at the first (current) location, they are facing the second
(orientation) location, hence aligning their perspectives, and then they talk about a third (target)
location (see Figure 7.1). Given the three locations, agents can describe the target location with
spatial words of distance and direction. The ‘where is there’ game allows agents to talk about places
that they have never visited or can never visit.
Figure 7.1 A generative language game
The agent is at Current facing Orientation and talking about Target: toponyms are
selected for the current, orientation, and target locations, and spatial words are
selected for the direction, θ, and distance, d.
The ‘where is there’ game relies on agents having some toponyms to describe locations. The
minimum number of toponyms required when no spatial language exists is three, with one for each
of the current, orientation, and target locations. When there are spatial words to describe the
direction and distance, only two are needed, with one for each of the current and orientation
locations, and a word may be invented for the target location. Direction and distance are calculated
from the current, orientation, and target locations. For directions and distances, each concept
element is a range of directions or distances.
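Given the three locations, the distance d and direction θ of Figure 7.1 follow from plane geometry: θ is measured relative to the shared heading from the current location towards the orientation location. A minimal sketch, assuming 2-D coordinates (the function name is illustrative):

```python
import math

def spatial_relation(current, orientation, target):
    """Distance d and direction theta of the target, as in Figure 7.1.
    Theta is relative to the heading from current towards orientation,
    which is how the two agents align their perspectives."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    d = math.hypot(dx, dy)
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    theta = math.atan2(dy, dx) - heading
    # Normalise to (-pi, pi]
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return d, theta

# Facing along the x-axis, a target at (3, 3) lies 45 degrees to the
# left at distance sqrt(18).
d, theta = spatial_relation((0, 0), (1, 0), (3, 3))
```

Because both agents compute θ from the same current and orientation locations, they obtain the same direction regardless of where each robot is individually facing.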
Spatial words can be used to create a template, given the current location and orientation. The
spatial words template describes the locations in the world that are referred to when using the
combination of distance and direction words at the current location and orientation. When a specific
location is required, the ‘best’ location of the combination of spatial words can be determined. A
measure used for the ‘where is there’ games is the match between the toponym and spatial words
template, found by considering how well the toponym template matches the spatial words template
given the current location and orientation (see a two dimensional example in Figure 7.2). The match
between the toponym and spatial templates is an indication of how appropriate the spatial words are
for the current situation, calculated as follows:
match = Σ_{l=1}^{L} min(tt_l, ts_l)                                     Equation 7.1

where tt_l is the value of the toponym template at location l, ts_l is the value of the spatial
template at location l, and L is the number of locations over which the match is calculated.
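Equation 7.1 can be read as a discrete overlap between the two templates. A minimal sketch, assuming each template is represented as a list of values over the same discretised set of locations:

```python
def template_match(toponym_template, spatial_template):
    """Match between a toponym template and a spatial words template
    (Equation 7.1): the sum over locations of the minimum of the two
    template values at each location."""
    return sum(min(tt, ts)
               for tt, ts in zip(toponym_template, spatial_template))

# Overlapping templates give a positive match; disjoint templates give zero.
a = [0.0, 0.5, 1.0, 0.5]
b = [0.5, 1.0, 0.5, 0.0]
overlap = template_match(a, b)
```

A good match (largely overlapping templates) yields a value close to the templates' total mass; a bad match (templates peaked at different locations, as in Figure 7.2) yields a value near zero.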
Figure 7.2 Match between templates
A good and a bad match between toponym and spatial words templates in two
dimensions.
The purpose of Study 2 was to investigate the formation of spatial terms grounded in the spatial
representations of robots. The aims of Study 2 were to determine what was required for the
formation of spatial terms and to determine the effect of conceptualisation order on the resulting
languages. The robots’ task was to form a spatial language and to label locations. The games played
in Study 2 were the ‘where are we’, ‘go to’, and ‘where is there’ games. The ‘where are we’ games
were used in the initial formation of the toponym lexicon. The ‘go to’ games were used to influence
the behaviour of the robots, so that games were played more frequently, and to test the coherence of
the languages. The ‘where is there’ games allowed the agents to form distance and direction
lexicons and to use these to invent new target toponyms.
Chapter 7 deals with the design and implementation of a generative spatial language game,
conducted first in simulation and then on real robots, investigating the formation of spatial terms
grounded in experience and behaviour5. The studies described are:
• Study 2A: Grid world,
• Study 2B: Simulation world, and
• Study 2C: Real world.
The final section is a general discussion of the generative language game.
7.1 Study 2A: Grid World
The spatial language game was first implemented in the grid world to investigate how spatial
concepts could be formed, given a simple spatial representation of the world (for more detail about
the grid world refer to section 4.5.1). The aims of Study 2A were to determine the effect of the
following features of spatial language games on the resulting languages:
• the size of the grid world,
• obstacles in the world, and
• generations of agents.
In Study 2A agents played ‘where are we’ games in grid worlds of varying sizes, with various
obstacles. The representations used for location concept elements were the grid squares. Distance
and direction concepts were calculated from the current, orientation, and target squares. There were
50 direction elements and 50 distance elements. Words were represented as integers. For each game,
the speaker and hearer were chosen randomly. The speaker was placed in a random square, and the
hearer was placed in a square within the neighbourhood of the speaker. The hearing distance and
neighbourhood size were a small diamond of five squares. Words were invented probabilistically
with a temperature of 0.25. The distributed lexicon table was used to associate words and concepts
(for more detail refer to section 4.3.4). Associations between words and concepts were increased by
adding 1.0. Forgetting was implemented by subtracting 0.2 from unused associations, with a
minimum association value of 0.0. For each experiment the interactions, language size, and
language coherence are presented, together with an example
5 This chapter covers in more detail the work presented in Studies 2 and 3 of Schulz, R., Prasser, D., Stockwell, P.,
Wyeth, G., & Wiles, J. (2008). The formation, generative power, and evolution of toponyms: Grounding a spatial
vocabulary in a cognitive map. In A. D. M. Smith, K. Smith & R. Ferrer i Cancho (Eds.), The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7) (pp. 267-274). Singapore: World Scientific Press. The
work presented in the paper was done under the supervision of Janet Wiles and Gordon Wyeth, and with design
discussion and writing assistance from David Prasser and Paul Stockwell.
language for each condition (for more detail about the performance measures refer to section 4.6).
For a summary of the parameters for Study 2Ai-iii, see Table 7.1.
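The association updates just described (reinforce a used pairing by 1.0, subtract 0.2 from unused associations, floor at 0.0) can be sketched as follows. This is an illustrative reading of the distributed lexicon table, not the thesis implementation; class and method names are invented for the example, and forgetting is applied here only to competing words for the same concept.

```python
from collections import defaultdict

class LexiconTable:
    """Minimal sketch of the distributed lexicon table updates in Study 2A.
    Values follow the text: +1.0 reinforcement, 0.2 forgetting, and a
    minimum association of 0.0. Names are illustrative."""

    def __init__(self):
        self.assoc = defaultdict(float)  # (word, concept) -> association

    def update(self, word, concept):
        # Reinforce the word-concept pair that was just used.
        self.assoc[(word, concept)] += 1.0
        # Forgetting: subtract 0.2 from this concept's unused associations,
        # never dropping below the minimum association value of 0.0.
        for (w, c) in list(self.assoc):
            if c == concept and w != word:
                self.assoc[(w, c)] = max(0.0, self.assoc[(w, c)] - 0.2)

    def best_word(self, concept):
        # The word most strongly associated with a concept, if any.
        candidates = {w: v for (w, c), v in self.assoc.items() if c == concept}
        return max(candidates, key=candidates.get) if candidates else None
```

Repeated use of one word for a concept strengthens that pairing while its competitors decay toward zero, which is how a shared mapping stabilises over many games.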
Table 7.1 Parameters for Study 2A
Parameters | 2Ai: World size | 2Aii: Obstacles | 2Aiii: Generations
Game | ‘where are we’ and ‘where is there’ | ‘where are we’ and ‘where is there’ | ‘where are we’ and ‘where is there’
Hearing distance | Diamond (5 squares) | Diamond (5 squares) | Diamond (5 squares)
Concept type | Location, distance, direction | Location, distance, direction | Location, distance, direction
Concept representation | Squares of grid | Squares of grid | Squares of grid
Word representation | Integers | Integers | Integers
Lexicon technique | Distributed lexicon table | Distributed lexicon table | Distributed lexicon table
Strategy for choosing words | Neighbourhood most informative | Neighbourhood most informative | Neighbourhood most informative
Neighbourhood size | Diamond (5 squares) | Diamond (5 squares) | Diamond (5 squares)
Forgetting | Yes | Yes | Yes
Updating | Hearer only | Hearer only | Hearer only
Strategy for word invention | Temperature | Temperature | Temperature
Temperature | 0.25 | 0.25 | 0.25
Generations | 1 | 1 | 50, 25, 10
Agents | 2 | 2 | 2
Interactions per generation | 10,000 | 10,000 | 1000, 2000, 5000
Initial learning period | 0 | 0 | 500, 1000, 2500
World | Grid World | Grid World | Grid World
Size | 5 × 5, 10 × 10, 15 × 15, 20 × 20 | 15 × 15 | 15 × 15
Obstacles | None | None, desks, perimeter | Desks
7.1.1 Study 2Ai: World Size
Study 2Ai: World Size investigated the influence of the size of the grid world. The aim of the study
was to determine whether world size affected the lexicon size, the coverage of words, and the
coherence of the resulting languages.
Experimental Setup: In Study 2Ai, two agents played location and spatial language games in
empty grid worlds of 5 × 5, 10 × 10, 15 × 15, and 20 × 20 squares. There were five runs of 10,000
interactions for each world size. The number of concepts formed, the agents’ coherence, and the
results of the language games were compared. For a summary of parameters, see Table 7.1.
Results: In each world size, a toponymic language formed, with toponyms covering the space
relatively uniformly. The number of toponyms used increased with world size, and the area covered
by each toponym increased with world size, with 3.8 squares per toponym on average for the 5 × 5
world, up to 8.4 squares per toponym for the 20 × 20 world (see Table 7.2). The number of spatial
words increased with the number of toponyms. With more toponyms, more distinctions can be
made about the distances and directions between toponyms. The value of the toponym, the match
between templates, and the coherence rose more quickly for smaller world sizes, with agents in
larger worlds taking longer to reach a successful and coherent language (see Figure 7.3 and Figure 7.4).
Table 7.2 Results for Study 2Ai
Measure (x̄ (σ)) | 5 × 5 | 10 × 10 | 15 × 15 | 20 × 20
Toponyms Invented | 7.0 (2.4) | 23.2 (2.0) | 36.0 (3.3) | 51.4 (4.2)
Toponyms Used | 6.6 (2.1) | 21.7 (2.1) | 34.1 (3.3) | 47.4 (5.1)
Squares per Toponym Used | 3.8 (2.0) | 4.6 (2.3) | 6.6 (2.9) | 8.4 (3.9)
Distance Words | 3.6 (1.7) | 6.0 (2.0) | 7.2 (2.0) | 8.8 (1.5)
Direction Words | 3.6 (1.7) | 6.0 (2.0) | 7.2 (2.0) | 8.8 (1.5)
Figure 7.3 World size results (2Ai)
The toponym value at the interaction location for the toponym language games and
the match between the toponym and spatial templates for the generative language
games for a) 5 × 5, b) 10 × 10, c) 15 × 15, and d) 20 × 20 grid worlds for the
hearer agent, averaged every 100 games over the five runs. The larger the world, the
longer it took to reach high levels for the toponym value and the match between
templates, and the lower the stable value of each.
Figure 7.4 World size coherence (2Ai)
The coherence of the toponyms, direction and distance words for a) 5 × 5, b) 10 ×
10, c) 15 × 15, and d) 20 × 20 grid worlds, averaged over each of the 5 runs,
recorded every 1000 games. The number of games required to reach toponym
coherence increased with world size. The dip in distance and direction coherence in
the 5 × 5 world at 8000 games was due to three of the five runs inventing spatial
words between 7000 and 8000 games. When a word was invented, the coherence
was reduced until the word had propagated through both agents’ lexicons.
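The exact coherence measure is defined in section 4.6; as an illustrative proxy only, coherence can be read as the fraction of concepts for which two agents prefer the same word, which also shows why a freshly invented word depresses coherence until it propagates:

```python
def coherence_proxy(lexicon_a, lexicon_b, concepts):
    """Illustrative proxy for language coherence (the thesis defines the
    actual measure in section 4.6): the fraction of concepts for which
    both agents' highest-associated word agrees. Each lexicon maps
    concept -> {word: association}."""
    agree = sum(
        1
        for c in concepts
        if max(lexicon_a[c], key=lexicon_a[c].get)
        == max(lexicon_b[c], key=lexicon_b[c].get)
    )
    return agree / len(concepts)
```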
In the example languages for each world size (see Figure 7.5, Figure 7.6, Figure 7.7, and Figure
7.8), the associations of each toponym, direction, and distance word are shown. Most toponyms
were distinct and covered single areas in the world. In larger world sizes, some toponyms covered
multiple areas. Most distance and direction words were specific, although a few covered larger
areas of concept space, particularly direction words covering the area behind the agent. Direction
words covering the area behind the agent were general due to the structure of the language game:
agents only faced areas within the world, which resulted in most targets being in front of the agent.
a) Toponym lexicon: Each square represents the grid world and refers to one toponym. For each toponym, the associations are shown, with the most associated square black.
b) Distance lexicon: Each row represents distances, with left being ‘close’ and right being ‘far’. For each distance word the associations are shown.
c) Direction lexicon: Each square represents the direction concepts, with straight ahead being at the top, and behind being at the bottom. For each direction word, the associations are shown.
Figure 7.5 Example language (2Ai: 5 × 5)
The language of one of the agents in the 5 × 5 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Each of the five toponyms covered a
different area in the world. Three of the four distances were similar, being ‘close’,
while one was ‘far’. The four directions can be interpreted as ‘right’, ‘front’, ‘left’,
and ‘general’.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.6 Example language (2Ai: 10 × 10)
The language of one of the agents in the 10 × 10 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Each of the 20 toponyms covered an
average of five squares. Each of the six distances covered a different spread of
possible distances. The six directions could be termed: ‘right’, ‘front’, ‘front right’,
‘left’, ‘front left’, and ‘behind’. See the Figure 7.5 caption for more detail about the
elements of the figure.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.7 Example language (2Ai: 15 × 15)
The language of one of the agents in the 15 × 15 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Each of the toponyms covered
approximately the same area in the world. See the Figure 7.5 caption for more
detail about the elements of the figure.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.8 Example language (2Ai: 20 × 20)
The language of one of the agents in the 20 × 20 world for a) toponym lexicon, b)
distance lexicon, and c) direction lexicon. Note that some of the toponyms referred
to multiple locations in the world. See the Figure 7.5 caption for more detail about
the elements of the figure.
7.1.2 Study 2Aii: Obstacles
Study 2Aii: Obstacles investigated the influence of obstacles in the world. The aim of the study was
to determine if agents could form toponyms in locations covered by obstacles and to determine the
influence of the obstacles on the resulting lexicons for toponyms, directions, and distances.
Experimental Setup: In Study 2Aii, two agents played location and spatial language games in a
15 × 15 grid which either had a perimeter in which the agents could move or had ‘desks’ through
the world (see Figure 7.9). There were five runs of 10,000 interactions for each world. The number
of concepts formed, the coherence, and the results of the language games were compared. The
languages were also compared to the languages from the empty 15 × 15 grid world used in the
world size study. For a summary of parameters see Table 7.1.
Figure 7.9 Grid world with obstacles
The grid world with obstacles of a) desks and b) a perimeter. The agents may
occupy any square not covered by an obstacle.
Results: In the empty world, the world with desks, and the perimeter world, the rate of word
invention was highest for the first 100 interactions, and agents continued to invent words throughout
each trial. The toponyms invented and used by the agents in the empty world were all specific;
some of the toponyms used by agents in the world with desks were general; and about half of the
words in the perimeter world were general. The average final lexicon in the empty world had 36.0
toponyms, in the world with desks 41.0 toponyms, and in the perimeter world 42.0 toponyms
(see Table 7.3). There were more toponyms in the world with desks and in the perimeter world
because they included general toponyms, which cover similar areas.
Table 7.3 Results for Study 2Aii
Measure (x̄ (σ)) | Empty | Desks | Perimeter
Toponyms Invented | 36.0 (3.3) | 41.0 (3.5) | 42.0 (5.5)
Toponyms Used | 34.1 (3.3) | 24.9 (2.1) | 29.4 (3.7)
Squares per Toponym Used | 6.6 (2.9) | 9.0 (5.4) | 7.7 (4.8)
Distance Words | 7.2 (2.0) | 10.6 (2.0) | 17.0 (7.3)
Direction Words | 7.2 (2.0) | 10.6 (2.0) | 17.0 (7.3)
The toponym value at the interaction location was higher in the worlds with obstacles (see
Figure 7.10), as the interactions occurred at a subset of the possible locations (81 squares for desks
and 56 squares for perimeter compared to 225 for empty) and the words referring to the locations
visited by the agents were more specific. The match between the templates for the ‘where is there’
game increased to about 0.6, compared to about 0.55 for the empty world. There was only a
minimal change in the match between templates, as the size of the spatial templates changed
together with the size of the toponym templates (see example languages in Figure 7.12 and Figure 7.13).
Figure 7.10 Obstacles results (2Aii)
The toponym value at the interaction location for the toponym language games and
the match between the toponym and spatial templates for the generative language
games for a) the world with desks and b) the perimeter world. Compare to the
empty 15 × 15 world in the previous experiment (Figure 7.3c). The value of the
toponym was higher for both, as fewer squares were visited by the agents when
there were obstacles in the world.
Due to the increasing number of words and the general nature of some of the words, the agents
in the world with desks and the perimeter world obtained a lower level of coherence for each of the
types of words compared to the empty world (see Figure 7.11).
Figure 7.11 Obstacles coherence (2Aii)
The coherence of toponyms, distances, and directions in a) the world with desks
and b) the perimeter world. Compare to the empty 15 × 15 world in the previous
experiment (Figure 7.4c). The coherence was much lower for the perimeter world
as there was much greater uncertainty in this world.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.12 Example language (2Aii: Desks)
The language for one of the agents in the world with desks for a) toponym
lexicon, b) distance lexicon, and c) direction lexicon. Note the specific and
general toponyms in the lexicon, where general toponyms tended to be used in the
area of space covered by one desk. See the Figure 7.5 caption for more detail
about the elements of the figure.
a) Toponym lexicon
b) Distance lexicon
c) Direction lexicon
Figure 7.13 Example language (2Aii: Perimeter)
The language for one of the agents in the perimeter world for a) toponym lexicon,
b) distance lexicon, and c) direction lexicon. Note the specific and general
toponyms in the lexicon: specific words were confined to the perimeter of the
world, while general words were used in the interior. See the Figure 7.5 caption
for more detail about the elements of the figure.
7.1.3 Study 2Aiii: Generations of Agents
Study 2Aiii: Generations of Agents investigated the influence of generations. The aim of the study
was to determine how the lexicons for toponyms, distances, and directions changed through
generations of agents with respect to lexicon size, coherence, and the types of concepts.
Experimental Setup: In Study 2Aiii, agents played location and spatial language games in a 15
× 15 grid with desks. The number of concepts formed, the coherence of the agents, and the results
of the language games across the generations were compared. Each generation consisted of a fixed
number of interactions, g. In the initial population, two agents played negotiation games. In
subsequent generations, the older agent was replaced by a new agent, which initially played as
hearer only. After g/2 interactions, the agents played negotiation games. Three conditions were
tested, based on g = 1000, g = 2000, and g = 5000, each consisting of five trials of 50,000 interactions.
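The generational schedule can be sketched as follows. Only the interaction counts follow the text; the agent factory and the two game-playing callbacks are placeholders for the actual negotiation and hearer-only learning routines.

```python
def run_generations(num_generations, g, make_agent, play_learning, play_negotiation):
    """Sketch of the generational turnover in Study 2Aiii. Each generation
    lasts g interactions. The initial pair negotiates for the whole
    generation; in later generations, the new agent first plays as hearer
    only for g/2 interactions before negotiation resumes."""
    older, newer = make_agent(), make_agent()
    for gen in range(num_generations):
        learning = 0 if gen == 0 else g // 2
        for _ in range(learning):
            play_learning(speaker=older, hearer=newer)  # hearer-only phase
        for _ in range(g - learning):
            play_negotiation(older, newer)              # negotiation phase
        older, newer = newer, make_agent()              # replace the older agent
    return older, newer
```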
Results: The first generation in each trial formed its language through negotiation, in which
the value of the toponym at the interaction location and the match between toponym and spatial
templates increased as the languages were formed. Over generations, specific toponyms tended to
remain stable, as did the concepts for directions and distances, while the more general toponyms
shifted to become more specific (see Figure 7.14).
Figure 7.14 Toponym change throughout generations (2Aiii)
Each row (a-e) shows the change in meaning of a toponym, represented as most-informative
templates, through ten generations of agents with g = 2000. In a row, each square
represents the world of the agents and shows the locations in the world for which
the toponym of the row provides the most information. Each row shows a different
type of toponym: a) a specific toponym that did not alter much throughout the
generations, b) a toponym that initially referred to multiple specific locations, but
only referred to one location after several generations, c) a specific word that
became more general, d) a general word that remained general but shifted in
meaning, and e) a general word that became more specific. The areas in the world
associated with general words were more likely to change throughout generations
as they were reinterpreted by the new agents in the population.
For the ‘where are we’ games, the toponym value for the interaction location was just under 0.8
for g = 1000 and just over 0.8 for g = 5000. For the ‘where is there’ games, the match between
spatial templates was just over 0.5 for g = 1000 and just under 0.6 for g = 5000. As a new agent
entered the population, it began by learning from the older agent, which caused a drop in both
measures; each measure quickly returned to a high level as the new agent learned the language (see
Figure 7.15).
Figure 7.15 Generations results (2Aiii)
Toponym value at the interaction location for the toponym language game, and
match between the toponym and spatial templates for the generative language
game over generations for a) g = 1000, b) g = 2000, and c) g = 5000. The drop in
toponym value and match between templates occurred at the changeover of
generations as the new agent learned the older agent’s language. Over the first few
generations there was a gradual increase in the steady value of the toponym and
match between templates.
The language coherence was calculated for each of the lexicons (toponyms, distances,
directions) after every 1000 games (see Figure 7.16). The toponym coherence remained fairly stable
for all values of g, between about 0.8 and 0.9. The coherence of the distances and directions
decreased for all values of g, most notably for direction words with g = 1000. The coherence of a
language decreased when new words were invented and increased as the meanings for words were
agreed upon between the agents. When g = 1000, new distance and direction words were being
invented in each generation. When g = 2000 and g = 5000, agents had longer to learn the words
used in the previous generation, resulting in higher coherence.
Figure 7.16 Generations coherence (2Aiii)
Coherence of languages over generations for a) g = 1000, b) g = 2000, and c) g =
5000. For each condition, the coherence of the toponyms remained stable between
0.8 and 0.9. The coherence of the distances and directions dropped throughout the
interactions as more words were invented for which the meanings did not have
time to become coherent.
There was an increase in the number of toponyms invented and used as the agents had more
interactions per generation (see Table 7.4). The increase in the number of toponyms was
accompanied by a corresponding increase in the number of distance and direction words.
Table 7.4 Results for Study 2Aiii
Measure (x̄ (σ)) | g = 1000 | g = 2000 | g = 5000
Toponyms | 40.0 (2.1) | 44.8 (2.0) | 47.0 (4.1)
Toponyms Used | 26.6 (1.9) | 28.5 (1.4) | 28.7 (1.9)
Squares per Toponym Used | 8.5 (5.5) | 7.9 (5.3) | 7.8 (5.4)
Distance Words | 26.2 (3.1) | 29.7 (4.3) | 35.3 (2.2)
Direction Words | 32.5 (4.1) | 31.4 (4.7) | 35.8 (1.9)
7.1.4 Discussion
The purpose of Study 2A was to explain how spatial concepts can be formed through interactions
with other agents, and to investigate the impact of different world sizes, obstacles in the world, and
population dynamics on the languages formed.
The significance of the world size study was that toponyms could be formed in any world size,
and direction and distance words could refer to the spatial relations between them. The significance
of the obstacles study was that the agents could refer to places in the world that had never been
visited. The words for locations never visited tended to be more general, as they were only referred
to indirectly, never through direct interaction. Agents were able to refer to concepts that had not
been directly experienced. Generations allowed the population to forget words that were used less
often, and allowed words that were used more often to spread through the concept space. Words
referring to specific locations were transferred through the generations, while the meanings of
words referring to general locations shifted, compared to a single generation, in which words
referring to both specific and general locations remained fairly static after they had been formed
and used several times.
Studies 2Ai-iii showed how a generative toponymic language may form and evolve in a
population of agents. Agents were able to form concepts for locations, directions, and distances as
they interacted with each other and associated words with underlying values. Relationships between
existing concepts were used to expand the concept space to new locations. The following sections
extend the grid world study into the simulation and real worlds.
7.2 Study 2B: Simulation World
Study 2B involved the implementation of the ‘where are we’, ‘where is there’, and ‘go to’ games in
the simulation world (for more detail refer to section 4.5.2). The additional challenge of the
simulation world was that the representations were formed individually by each simulated robot. An
investigation was undertaken into how the conceptualisation order for toponyms, directions, and
distances affected the resulting language.
The aim of Study 2B was to determine whether the representations of a simulated robot with an
experience map were appropriate for the formation of a toponymic and spatial language using the
interactions of the ‘where is there’ language game. Additionally, the study investigated whether
conceptualisation order made a difference to the languages that formed. Study 1 showed that a
toponymic language may be formed through ‘where are we’ interactions and Study 2A showed that
a spatial language may be formed when the agents have simple and matching concept
representations. Study 2B extends Study 2A with simulated robots that formed more complex
representations of space that differed between the robots.
7.2.1 Experimental Setup
As in Study 1B, the concept representations used were the robots’ RatSLAM experiences, formed
using a forward-facing camera; words were represented as strings; the robots autonomously
wandered through the world and played a game when they were close to each other; the distributed
lexicon table was used to associate words and concepts (for more detail refer to section 4.3.4); the
relative neighbourhood most informative strategy was used to choose words; the temperature
strategy was used for word invention; both the speaker’s and the hearer’s lexicons were updated in
every game; and forgetting was not directly implemented.
In addition to the use of experiences as concept elements, pseudo-experiences were used. As
experiences do not cover all locations, the best location referred to by a distance and direction
concept may be in a part of the experience map co-ordinate space where no experiences are nearby.
When a location is referred to that does not have a nearby experience, a pseudo-experience is placed
in the map and linked to two experiences that are close to the location that was referred to, and the
pseudo-experience is associated with the target word. When map correction is performed on the
experience map, the pseudo-experiences are moved, based on the new locations of the experiences
they are linked to.
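The placement step above can be sketched as follows, under assumed representations: experiences are treated as dicts with x, y map coordinates, and the distance threshold and linking callback are illustrative stand-ins for the experience map machinery.

```python
import math

def place_pseudo_experience(target_xy, experiences, word, link, threshold=1.0):
    """Sketch of pseudo-experience placement in Study 2B. If no real
    experience lies within `threshold` of the referred-to location, a
    pseudo-experience is created there, linked to the two nearest real
    experiences (so map correction can later move it with them), and
    associated with the target word. Names and threshold are illustrative."""
    def dist(e):
        return math.hypot(e["x"] - target_xy[0], e["y"] - target_xy[1])

    nearest = sorted(experiences, key=dist)
    if nearest and dist(nearest[0]) <= threshold:
        return nearest[0]  # an existing experience already covers the location
    pseudo = {"x": target_xy[0], "y": target_xy[1], "word": word, "pseudo": True}
    for anchor in nearest[:2]:
        link(pseudo, anchor)  # link to the two closest real experiences
    return pseudo
```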
The simulated robots’ languages were monitored using the final language layout, the language
size, the spread of words throughout the world, the toponym value at the interaction location for
the ‘where are we’ games, and the match between the spatial and toponym templates for the ‘where
is there’ games (for more detail about the performance measures refer to section 4.6).
The hearing distance in Study 2B was 3 m, the neighbourhood size was 5 m, and the temperature
was 0.5 for toponyms and spatial words and 0.4 for target words. These temperatures kept the rate
of word invention moderate. Within a trial the conceptualisation order was fixed, and the two
simulated robots negotiated a set of words. Two conditions were tested:
• separate: robots played ‘where are we’ games to form a toponymic language first, followed by ‘where is there’ games, and
• together: robots played both ‘where are we’ and ‘where is there’ games from the start.
Each condition was run for five runs of 2000 interactions. In both cases, ‘go to’ games were
played to change the behaviour of the robots so that the interactions were completed more quickly.
For a summary of the parameters see Table 7.5.
Table 7.5 Parameters for Study 2B
Parameters | Study 2B: Separate | Study 2B: Together
Game | ‘where are we’, ‘where is there’, and ‘go to’ | ‘where are we’, ‘where is there’, and ‘go to’
Hearing distance | 3 m | 3 m
Concept type | Location, distance, direction | Location, distance, direction
Concept representation | Experiences (simulation world, forward facing camera) | Experiences (simulation world, forward facing camera)
Word representation | Text | Text
Lexicon technique | Distributed lexicon table | Distributed lexicon table
Strategy for choosing words | Relative neighbourhood most informative | Relative neighbourhood most informative
Neighbourhood size | 5 m | 5 m
Forgetting | No | No
Updating | Hearer and Speaker | Hearer and Speaker
Strategy for word invention | Temperature | Temperature
Temperature | Toponym = 0.5, Spatial = 0.5, Target = 0.4 | Toponym = 0.5, Spatial = 0.5, Target = 0.4
Generations | 1 | 1
Agents | 2 | 2
Interactions per generation | 1000 ‘where are we’ followed by 1000 ‘where are we’ and ‘where is there’ | 2000 ‘where are we’ and ‘where is there’
Initial learning period | 0 | 0
World | Simulation World | Simulation World
7.2.2 Results
In both conceptualisation order conditions, the simulated robots developed a shared set of
toponyms, directions, and distances. The distance and direction words covered the space of
directions and distances that could be referred to. The condition where both types of games were
played from the start resulted in larger lexicons for toponyms, directions, and distances (see Table
7.6). In both conditions, words were invented for targets when the value of existing words was low
for the chosen locations. In some cases, the targets were beyond the area that the simulated robots
were able to explore. The average area covered by the languages was 551.2 m² for separate and
418.3 m² for together. The area covered by the languages was much larger than when the simulated robots
only played ‘where are we’ games, where just over 300 m² was covered (compare to Table 6.3).
With the addition of the ‘where is there’ game, the simulated robots were able to form concepts for
areas beyond the walls of their world, resulting in languages covering larger areas. The ‘go to’
games were successful for both conditions, with the simulated robots meeting each other at the goal
location in 87.2% of the games for separate and 84.0% of the games for together (see Figure 7.17).
Table 7.6 Results for Study 2B
Measure (x̄ (σ)) | Separate | Together
Number of Toponyms | 44.9 (7.9) | 60.3 (8.4)
Toponyms Invented as Target | 25.2 (9.1) | 37.0 (8.9)
Area Covered per Toponym Used (m²) | 16.3 (18.4) | 12.3 (14.1)
Area Covered by Language (m²) | 551.2 (155.4) | 418.3 (114.3)
Toponym Coherence | 0.16 (0.08) | 0.04 (0.03)
Direction Words | 13.6 (2.2) | 22.2 (2.3)
Distance Words | 13.5 (2.2) | 22.2 (2.3)
Direction Coherence | 0.42 (0.15) | 0.10 (0.07)
Distance Coherence | 0.31 (0.21) | 0.05 (0.09)
Figure 7.17 Results of ‘go to’ games (2B)
The x-axis shows the possible results of the ‘go to’ games. The y-axis is the
percentage of the total games with that result. For ‘separate’, the simulated robots
met each other at the goal location in 87.2% of the games, with 33.2% within 1m.
For ‘together’, the simulated robots met each other at the goal location in 84% of
the games, with 20.4% within 1m.
Separate
For the simulated robots forming toponyms and spatial words separately, the average number of
words used after 2000 interactions was 44.9. Of these words, an average of 25.2 were invented for
the target location. An average of 13.5 distance words and 13.6 direction words were used. The
toponymic language resulting for both agents in one of the runs is shown in Figure 7.18, with the
spatial lexicon in Figure 7.19, and the results of the agents’ interactions in Figure 7.20 and Figure
7.21.
Together
For the simulated robots forming toponyms and spatial words together, the average number of
words used after 2000 language games was higher at 60.3. Of these words, an average of 37.0 were
invented for the target location. An average of 22.2 distance and direction words were used, much
higher than for the separate conceptualisation order. The toponymic language resulting for each
agent for one of the runs is shown in Figure 7.22, with the spatial lexicon in Figure 7.23, and the
results of the agents’ interactions in Figure 7.24 and Figure 7.25.
7.2.3 Discussion
Study 2B investigated the impact of conceptualisation order for toponyms, directions, and distances
on the resulting language. The study demonstrated that directions and distances can be formed
either when there was an existing toponymic language, or when a toponymic language was still
being formed. Words and concepts were formed for places that the simulated robots had not visited.
When a stable toponymic language was formed before the ‘where is there’ games were played, the
distance and direction concepts formed covered the range of distances and directions. When a
toponymic language was formed together with the distance and direction concepts, many of the
concepts were ‘close’ and ‘straight ahead’, although a few concepts formed to cover the range of
distances and directions. When the toponymic language was formed separately, the toponyms
formed were used more precisely, with agents meeting each other within 1 m of the goal location a
greater proportion of the time.
With the addition of the ‘where is there’ game to the ‘where are we’ game, the simulated robots
were able to refer to locations beyond the perimeter of their world. The larger coverage of the
language was indicated by the average area covered by all of the toponyms in the agent’s lexicon,
and can be seen in the language layout figures. Words beyond the perimeter of the world were more
general, as they were only ever updated when referred to indirectly, unlike words within the
perimeter of the world, which were updated through direct interaction.
Simulated Robot 1 Simulated Robot 2
a) Experience Map
b) Language Layout
c) Word Locations
d) Word Coverage
Figure 7.18 Shared language (2B: Separate)
The language for the simulated robots for one run, showing a) the experience map,
b) the language layout, c) the word locations, and d) the word coverage. Note that
the language was no longer restricted to within the walls of the world. The words
on the edge of the layout tended to be larger as there were fewer competing words
in their neighbourhood. See the Figure 6.12 caption for more detail about the
elements of the figure.
Figure 7.19 Spatial lexicon (2B: Separate)
[Panels for Simulated Robot 1 and Simulated Robot 2: a) distance lexicon, b) direction lexicon]
The spatial lexicon showing a) distance and b) direction lexicon. The distance and
direction lexicons had concepts throughout the possible space. See the Figure 7.5
caption for more detail about the elements of the figure.
Figure 7.20 Interactions for ‘where are we’ games (2B: Separate)
a) Value of toponym at interaction location and b) word usage over the ‘where are
we’ interactions of the simulated robots. A fairly stable toponymic language was
formed in the first 1000 interactions. With the addition of ‘where is there’
interactions, the word invention rate increased. See the Figure 6.13
caption for more detail about the elements of the figure.
Figure 7.21 Interactions for ‘where is there’ games (2B: Separate)
a) Match between templates, b) current word usage, c) orientation word usage, d)
target word usage over the ‘where is there’ interactions, e) distance word usage,
and f) direction word usage. Note that while all of the toponyms could be used as
orientations or targets, not all could be used as the current location. The toponyms
never used for the current location were those beyond the walls of the simulated
robots’ world. The words invented early were mostly within the walls of the
world, while those invented late were mostly beyond the walls of the world, and
occupy larger areas due to fewer competing words.
Figure 7.22 Shared language (2B: Together)
[Panels for Simulated Robot 1 and Simulated Robot 2: a) experience map, b) language layout, c) word locations, d) word coverage]
The language for the simulated robots for one run, showing a) the experience map,
b) the language layout, c) the word locations, and d) the word coverage. There
was a mixture of words mostly within the walls of the world and words beyond
the walls of the world that occupied larger areas. See the Figure
6.12 caption for more detail about the elements of the figure.
Figure 7.23 Spatial lexicon (2B: Together)
[Panels for Simulated Robot 1 and Simulated Robot 2: a) distance lexicon, b) direction lexicon]
The spatial lexicon showing a) distance and b) direction. Most of the distance
words referred to close locations, and most of the direction words referred to
straight ahead, due to the formation of the toponymic language together with the
distance and direction concepts. See the Figure 7.5 caption for more detail about
the elements of the figure.
Figure 7.24 Interactions for ‘where are we’ games (2B: Together)
a) Toponym value at the interaction location and b) word usage over the ‘where are
we’ interactions of the simulated robots. Toponyms continued to be invented
throughout the 2000 interactions. See the Figure 6.13 caption for more detail about
the elements of the figure.
Figure 7.25 Interactions for ‘where is there’ games (2B: Together)
a) Match between templates, b) current word usage, c) orientation word usage, d)
target word usage over the ‘where is there’ interactions, e) distance word usage,
and f) direction word usage. There is a gap in the toponyms between 20 and 25 for
current, orientation, and target words – the words in the gap may still be
understood by the agents, but were not used. Again, fewer words were used for the
current toponym than for the orientation and target toponyms. Unlike the separate
condition, there was no correlation between when the words were invented and
whether they were within or beyond the walls of the world.
7.3 Study 2C: Real World
Study 2C involved the implementation of the ‘where are we’ and ‘where is there’ games in the real
robots (for more detail about the real world refer to section 4.5.3). The first language formed in
Study 1C: A Toponymic Language in the Real World was used as the base toponymic language.
Study 2C aimed to determine if the real world issues of noise in the perceptual data of odometry,
vision, and hearing affected the languages that formed by comparing the spatial languages formed
in the real world with those formed in the simulation world. The hypothesis was that the distance
and direction words that formed would be more general than in the simulation world, as there was
more uncertainty in the location of toponyms due to the variation in hearing distance.
7.3.1 Experimental Setup
The concept representations were experiences from RatSLAM with an omni-directional camera.
The word representations for the robots in the real world were DTMF tones. Words and concepts
were associated using the distributed lexicon table (for more detail refer to section 4.3.4). The
robots explored the world, building up representations, and played games when they were within
hearing distance of each other. The robots played ‘where are we’ and ‘where is there’ games in the
real world. Inventing target words was done probabilistically with a temperature of 0.5. Inventing
spatial words was done probabilistically with a temperature of 0.75. The neighbourhood size was
set to 2m. Checksum error detection was implemented as in Study 1C. The lexicon was updated
when words and concepts were used together by increasing the association by 1.0. Forgetting was
not implemented directly. The speaker volume was increased because the larger amount of
information in the ‘where is there’ game was difficult to transmit: at the original lower
volume, games could not be played.
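The thesis does not give pseudocode for the temperature-controlled invention step, so the sketch below is one plausible reading: a new word is invented with probability equal to the temperature whenever no existing word adequately covers the topic. The function names and the syllable generator are hypothetical, added only for illustration.

```python
import random

def maybe_invent_word(best_association, temperature, rng=random):
    """Decide whether to invent a new word for a topic.

    Assumed reading of the 'temperature' strategy: invention happens with
    probability equal to the temperature, but only when no existing word
    is associated with the topic at all.
    """
    if best_association > 0.0:
        return False          # an existing word already covers the topic
    return rng.random() < temperature

def invent_syllables(rng=random, length=2):
    """Generate an arbitrary new word as random CV syllables (illustrative)."""
    consonants, vowels = "bdgkmnprst", "aeiou"
    return "".join(rng.choice(consonants) + rng.choice(vowels)
                   for _ in range(length))
```

Under this reading, the target temperature of 0.5 means a new toponym is coined at the target location about half the time it is unnamed, while the spatial temperature of 0.75 makes distance and direction words somewhat easier to coin.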
The first sequence of Study 1C was extended. Twelve hours of sessions were run in which the
robots played ‘where are we’ and ‘where is there’ games to build up their lexicons for distances and
directions, and to extend their lexicons for locations. Following the sessions, the languages were
tested with 25 ‘go to’ games played. For a summary of the parameters see Table 7.7.
7.3.2 Results
The robots developed a shared set of distance and direction concepts to build on the existing
toponymic language. With these concepts, they were able to invent additional toponyms, some of
which were situated beyond the walls of their world.
The number of toponyms increased from 9 to 24, with 10 invented at the target location (see
Table 7.8). The area covered by the language increased from 83.8m2 to 145.7m2. The results of the
‘go to’ games before and after the agents played ‘where is there’ games were similar (see Figure
7.26). There was an increase in games in which the agents found the goal but did not meet each
other at the goal location, with a corresponding decrease in games in which the agents met each
other at the goal location. The toponymic language resulting for both agents is shown in Figure
7.28, with the spatial lexicon shown in Figure 7.29, and the results of the interactions of the agents
in Figure 7.27 and Figure 7.30.
Table 7.7 Parameters for Study 2C
Parameter                       Study 2C
Game                            ‘where are we’, ‘where is there’, and ‘go to’
Hearing distance                ~5m
Concept type                    Location, distance, direction
Concept representation          Experiences (real world, omni-directional camera)
Word representation             Tones
Lexicon technique               Distributed lexicon table
Strategy for choosing words     Relative neighbourhood most informative
Neighbourhood size              2m
Forgetting                      No
Updating                        Hearer and Speaker
Strategy for word invention     Temperature
Temperature                     Toponym = 0.5, Spatial = 0.75, Target = 0.5
Generations                     1
Agents                          2
Interactions per generation     12 hours
Initial learning period         0
World                           Real World
Error detection                 Checksum
Table 7.8 Results for Study 2C
Measure (x̄ (σ))                          1C (Run 1)    2C
Number of Toponyms                       9             24
Toponyms Invented as Target              N/A           10
Area Covered per Toponym Used (m2)       9.3 (5.7)     6.1 (6.0)
Area Covered by Language (m2)            83.8 (3.5)    145.7 (29.6)
Toponym Coherence                        0.44          0.13
Direction Words                          N/A           7
Distance Words                           N/A           7
Direction Coherence                      N/A           0.43
Distance Coherence                       N/A           0.35
Figure 7.26 Results of the ‘go to’ games (2C)
The results of the ‘go to’ games following the ‘where are we’ games (1C)
compared to the results following the ‘where is there’ games (2C). Following the
‘where is there’ games, there was a reduction in games where the robots met each
other at the goal location (from 78% to 58%), and an increase in games where the
robots did not meet each other at the goal location (from 10% to 28%). A similar
number of games resulted in failure (10% and 12%) and where the goal was not
found (both 2%).
Figure 7.27 Interactions for ‘where are we’ games (2C)
a) Toponym value at the interaction location and b) word usage over the ‘where are
we’ interactions of the robots. See the Figure 6.13 caption for more
detail about the elements of the figure.
Figure 7.28 Example language (2C)
[Panels for Robot 1 and Robot 2: a) experience map, b) language layout, c) word locations, d) word coverage]
The language for the robots, showing a) the experience map, b) the language
layout, c) the word locations, and d) the word coverage. Note that the language
layout was not restricted to within the walls of the world. See the Figure 6.12
caption for more detail about the elements of the figure.
Figure 7.29 Spatial lexicon (2C)
[Panels for Robot 1 and Robot 2: a) distance lexicon, b) direction lexicon]
The spatial lexicon showing a) distance and b) direction. There was a range of
distance and direction concepts, and they matched each other between the robots.
See the Figure 7.5 caption for more detail about the elements of the figure.
Figure 7.30 Interactions for ‘where is there’ games (2C)
a) Match between templates, b) current word usage, c) orientation word usage, d)
target word usage over the ‘where is there’ interactions, e) distance word usage,
and f) direction word usage.
7.3.3 Discussion
Study 2C: Real World showed that ‘where is there’ games can be played with robots in the real
world and that the simulation world results were reproducible in the real robots. The robots
developed a shared set of distance and direction terms to build on the existing toponymic language.
The main difficulty encountered in the real robot implementation, in addition to those encountered
in Study 1C, was that many more syllables needed to be communicated between the robots. With
the increased volume, games were played and the spatial lexicon was formed. The final toponymic
language of the robots was less coherent than the language before the ‘where is there’
games were played. Updating the lexicon during ‘where is there’ games introduced more
noise while the distance and direction concepts were being formed, resulting in toponym
concepts that were less coherent. However, the final spatial language was consistent between the agents.
7.4 Discussion: A Generative Spatial Language
Study 2 addressed the challenge of how agents can refer to locations other than those already
visited. This challenge required relational terms and the use of perspectives. The key contribution of
Study 2 was the demonstration of grounding for both experienced and novel concepts using a
generative process, applied to spatial locations.
The method for generative grounding used in Study 2 (the ‘where is there’ game) enabled the
formation of the spatial relation concepts of directions and distances, which were combined to form
concepts equivalent to ‘simple’ topological, proximity, and projective prepositions (Coventry &
Garrod, 2004). The direction and distance concepts together with the ‘where is there’ game allowed
agents to form concepts for locations that neither agent had visited.
The design of the ‘where is there’ game involved a method for generative grounding and for
aligning perspective. In previous language game studies with a spatial dimension, agents utilised an
absolute frame of reference to share perspective (Bodik & Takac, 2003; Steels, 1995). In the ‘where
is there’ game, perspective alignment was gained with respect to known locations and was achieved
by naming the current location (specifying the location component of the perspective) and an
orientation location (specifying the direction component of the perspective).
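The perspective-alignment scheme can be illustrated geometrically: the current location fixes the origin, the orientation location fixes the heading, and a target is then described by a distance and a bearing relative to that heading. The sketch below assumes locations are x-y coordinates in the experience map; the function name and the degree convention are illustrative, not the thesis's implementation.

```python
import math

def relative_spatial_terms(current, orientation, target):
    """Distance and direction of a target from the perspective defined by a
    current location (origin) and an orientation location (heading).

    Returns (distance, bearing) with bearing in degrees, 0 = straight ahead
    toward the orientation location, normalised to [-180, 180).
    """
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    to_target = math.atan2(target[1] - current[1],
                           target[0] - current[0])
    distance = math.hypot(target[0] - current[0], target[1] - current[1])
    bearing = math.degrees(to_target - heading)
    bearing = (bearing + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    return distance, bearing
```

Because both agents name the same current and orientation toponyms, they recover the same origin and heading in their own maps, which is what lets the hearer place a never-visited target location.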
Study 2A-C showed that generative grounding can be achieved with an appropriate
representation of the concept space (with an approximate x-y representation of the world), a way to
form and label intrinsic concepts (with toponyms), and a generative process that created both the
concepts and the labels.
Chapter 8 General Discussion
‘Y’know,’ he said, ‘it’s very hard to talk quantum using a language
originally designed to tell other monkeys where the ripe fruit is...’
(Sweeper / Lu-Tze in Pratchett, 2002, p.100)
To understand the results of this thesis, it is necessary to review the studies and how they fit in the
context of the literature. This chapter presents a summary of the studies and discusses the impact of
the results on the aims. Also discussed are the contributions made, general conclusions, and
possible further work.
8.1 Summary
This thesis described studies in which grounded spatial languages were learned by simulated agents
and mobile robots. The studies showed that mobile robots can form languages describing locations,
directions, and distances. The interactions between agents were based on a location language game,
where agents achieved shared attention by being located near each other. The location language
game framework was made up of the game played, concept representations, word representations,
the lexicon, population dynamics, the environment, and performance measures.
In the location language game framework, agents formed concept representations of their world
through exploration, which was dependent on the agents’ world. Shared attention was determined
by agents being near each other. The word representation depended on the world and the method
used for word production, being either an integer, a set of unit activations, text, or sound. In each
interaction, the speaker used their lexicon to produce the best word for the chosen topic, the hearer
attempted to comprehend the concept intended by the speaker, and the agents updated their lexicons
based on the interaction. A variety of performance measures were used by each agent to keep track
of how the interactions progressed.
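The speaker-hearer-update cycle described above can be sketched as follows, assuming a lexicon held as nested association strengths from words to location concepts. The data structure and the unit update are illustrative stand-ins, not the distributed lexicon table itself.

```python
# One 'where are we' interaction: the speaker produces the best word for the
# topic, the hearer comprehends it, and both strengthen the used association.

def produce(lexicon, topic):
    """Speaker: pick the word most associated with the topic location."""
    best = max(lexicon, key=lambda w: lexicon[w].get(topic, 0.0), default=None)
    if best is None or lexicon[best].get(topic, 0.0) == 0.0:
        return None                      # no usable word yet: invent one
    return best

def comprehend(lexicon, word):
    """Hearer: pick the location most associated with the heard word."""
    meanings = lexicon.get(word, {})
    return max(meanings, key=meanings.get) if meanings else None

def update(lexicon, word, topic, amount=1.0):
    """Both agents: strengthen the used word-location association."""
    lexicon.setdefault(word, {}).setdefault(topic, 0.0)
    lexicon[word][topic] += amount
```

In the thesis the topic is implicit (both agents are at the shared attention location), so the hearer can update its own association between the heard word and its current experience without any explicit feedback channel.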
The pilot studies presented in Chapter 5 investigated some of the key features of the framework
including representations of concepts and words, and methods for the lexicon including word
production, concept comprehension, and the source of variability for the languages.
Pilot Study 1 investigated two techniques for the lexicon: recurrent neural networks and lexicon
tables. For each technique, a series of studies demonstrated how the techniques could be used to
associate spatial concepts with words. The lessons learned from the studies were details about how
each technique could best be used to associate spatial concepts with words. These included the
weight setting mechanisms, concept representations, and word representations for recurrent neural
networks, and the effect of different strategies for associating concepts and words, producing words,
and inventing words for lexicon tables.
Pilot Study 2 investigated the use of three different concept representations for word production,
concept comprehension, and the source of variability for words. The representations investigated
were those available to robots using RatSLAM: pose cells, vision, and experiences. A series of
studies compared how the agents could form concepts, learn concept–word associations, create their
own categories and concept–word associations, and generalise to unseen data with each of the
different representations. The lessons learned from the studies were whether each type of
representation was appropriate for a spatial language, and the ability of the different representations
to form categories that grouped together similar concepts.
The pilot studies showed that representations made a major difference to the ease of learning
and structure of concepts, words, and the associations between them. For concept representations,
experiences were found to be ideal for the representations underlying location concepts. For lexicon
techniques, neural networks and standard lexicon tables were found to not be ideal for forming and
learning location concepts. Neural networks take prohibitively long to learn arbitrary associations
between concepts and words, particularly when the input concept representations are large. For
lexicon tables, generalisation typically occurs with comparison prior to the lexicon, rather than at
the lexicon with word formation. Lexicon tables also do not deal well with large input concept
representations, unless pre-processed into categories. A new technique, the distributed lexicon table,
was designed for the major studies of this thesis that incorporated the useful features of neural
networks and lexicon tables.
A distributed lexicon table allowed rapid learning from exemplars while supporting
generalisation through the methods used to access the associations stored in the table. Concepts are
not formed explicitly, but result from the associations between concept elements and words, and
methods for producing words and comprehending concepts. Finding the most appropriate word for
each situation requires a word selection strategy. Two strategies were developed for use in the
studies: the most associated strategy, which was used in some of the grid world studies, and the
most informative strategy, which was used in the remainder of the studies. In the most associated
strategy, the word chosen was the one with the highest association with the topic. In the most
informative strategy, the word chosen was the one which provided the most information about the
topic.
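The two strategies can be contrasted in a small sketch over a lexicon table held as association strengths keyed by (word, concept). The scoring used for the most informative strategy below (association with the topic normalised by the word's total associations) is an assumed reading of the strategy, not the thesis's exact formula.

```python
# Two word-production strategies over a flat association table.

def most_associated(assoc, words, topic):
    """Return the word with the highest raw association with the topic."""
    return max(words, key=lambda w: assoc.get((w, topic), 0.0))

def most_informative(assoc, words, concepts, topic):
    """Return the word that says most about the topic: association with the
    topic relative to the word's total associations (assumed scoring)."""
    def score(w):
        total = sum(assoc.get((w, c), 0.0) for c in concepts)
        return assoc.get((w, topic), 0.0) / total if total else 0.0
    return max(words, key=score)
```

A general-purpose word with high associations everywhere wins under the most associated strategy, while a word used almost exclusively for the topic wins under the most informative strategy, which is what keeps language specificity higher.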
Study 1: A Toponymic Language Game, presented in Chapter 6, addressed the question of how
interactions impact on the formation of location languages. Study 1A, the grid world study,
investigated the implementation of simple spatial agent interactions in a simple world. The agents
played ‘where are we’ language games. Two solutions were compared: ‘basic’ and ‘best’. The
solutions varied with the features of the population dynamics, word production, updating, and
hearing distance. A comparison of the solutions showed that very different languages resulted when
different parameters were used, with respect to the time taken to form a stable, coherent language,
the specificity of the language, and the types of concepts that form.
In Study 1B, the simulation world study, the ‘where are we’ and ‘go to’ language games were
implemented in simulated robots. The study investigated how word invention rate and
neighbourhood size affected the resulting language. Smaller languages with higher coherence
resulted from lower word invention rates and larger neighbourhood sizes. The results of the ‘go to’
games were similar across the different conditions, with most of the games resulting in the
simulated robots meeting at the goal location. The simulation world study demonstrated how
toponyms could be formed for all locations in the world visited by both simulated robots where the
robots built their own personal maps of the world and interacted through location language games.
Study 1C involved the implementation of the ‘where are we’ and ‘go to’ language games in the
real robots. The goal of the study was to determine whether useful toponymic languages could be
formed through the real robots playing ‘where are we’ games. The study investigated the use of two
error detection strategies: minimal and checksum. In both conditions the robots developed a shared
set of toponyms. The minimal error detection was not enough to stop the robots mishearing each
other regularly: the hearer added a new word when the speaker had used an existing word on
numerous occasions. Additional error detection resulted in more coherent languages with a higher
success rate for the ‘go to’ games (70.0% compared to 58.7%).
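The checksum scheme itself is not specified in this section, so the sketch below uses a simple modular digit sum as a stand-in: the speaker appends a check digit to the tone sequence, and the hearer discards any message whose digits no longer match it rather than learning from a misheard word.

```python
# Illustrative checksum for a DTMF-style digit sequence (not the thesis's
# actual checksum, which is unspecified here).

def add_checksum(digits):
    """Append a single check digit (sum of digits mod 10) to a message."""
    return digits + [sum(digits) % 10]

def verify_checksum(received):
    """Return (ok, payload); ok is False if the message was corrupted."""
    payload, check = received[:-1], received[-1]
    return sum(payload) % 10 == check, payload
```

Rejecting corrupted messages trades a few lost interactions for a lexicon that is not polluted by mishearings, matching the higher coherence reported for the checksum condition.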
The toponym language game study showed how a toponymic language was formed through
simple social interactions. Agents with a shared toponymic language were able to direct each other
to goal locations by specifying the associated toponym. The distributed lexicon table, with methods
for updating the lexicon, producing words, and comprehending concepts, enabled concepts to form
together with words. The study demonstrated the co-development of concepts and words, and
showed how words and interactions influenced concept formation.
Study 2: A Generative Spatial Language Game, presented in Chapter 7, addressed the challenge
for embodied language games of how agents can refer to locations other than those they have
visited. Study 2A, the grid world study, implemented the ‘where are we’ and ‘where is there’
language games in the grid world, investigating the impact of changing the size of the world,
obstacles in the world, and the population dynamics. The studies found that toponyms were formed
in any world size, and direction and distance words were formed for the spatial relations. With
obstacles in the world, agents were still able to form a complete toponymic language, including
names for locations in the world that had never been visited. With multiple generations rather than a
single generation of agents, less common words were forgotten, while the meanings of the
remaining words shifted. Study 2A showed how a generative toponymic language formed in a
population of agents.
In Study 2B, the ‘where are we’, ‘where is there’, and ‘go to’ language games were
implemented in simulated robots. The study investigated the impact of conceptualisation order for
toponyms, directions, and distances on the resulting language. In both conditions, the simulated
robots developed a shared set of toponyms, directions, and distances, with words invented for
locations beyond the area able to be explored. The study demonstrated how directions and distances
were formed given an existing toponymic language, or when a toponymic language was still being
formed. However, the direction and distance terms covered the space more effectively when they
were formed following the formation of a toponymic language. With the addition of the ‘where is
there’ game to the ‘where are we’ game, the simulated robots were able to refer to locations beyond the
perimeter of their world.
Study 2C involved the implementation of the ‘where are we’, ‘where is there’, and ‘go to’
language games in the real robots. The robots developed a shared set of distance and direction
concepts to build on the existing toponymic language. They invented additional toponyms, some of
which were for locations beyond the walls of their world.
The generative spatial language game study demonstrated the grounding of directly experienced
and novel location concepts using a generative process. The method for generative grounding
presented enabled the formation of spatial relation concepts in the form of directions and distances.
Using this method, agents formed concepts for locations that neither agent had visited.
8.2 Discussion
The overall goal of the thesis was to ground a computational model of spatial language in mobile
robots, to be used meaningfully in practical applications. The mobile robots formed toponymic
languages with enough coherence to specify goal locations for goal directed navigation.
Additionally, the robots formed shared concepts of direction and distance.
The specific aim to run a series of experiments that demonstrated learned and evolved language
in agents and robots was achieved. The pilot studies investigated representations and methods, the
first study investigated the formation of toponymic languages, and the second study investigated
generative grounding.
The key question identified in the introduction was: how can a robot form and label complex
concepts in an embodied spatial environment? The studies of this thesis showed that the important
features for answering this question are the interactions between agents, concept representations,
lexicon techniques for associating concepts and words, word production, concept comprehension,
perspective alignment, and a method for generative grounding. The studies in this thesis
demonstrated a grounded spatial language in mobile robots, formed using a cognitive map of
experiences built during exploration. Communicative interactions between mobile robots were
designed that allowed robots to play games when ‘near’ each other and enabled the robots to build a
shared toponymic language. Methods for aligning perspective and generative grounding allowed the
agents to refer to locations other than ‘here’ and to ground new concepts for these locations.
8.3 Contributions
The contributions of this thesis focus on extending ideas for symbol grounding, exploring the
influence that words and concepts have on each other, and exploring the possibilities for models of
spatial cognition. The specific contributions are outlined and discussed in this section.
1. A series of studies to demonstrate that representations and methods matter
The pilot studies presented in Chapter 5 demonstrated that for the spatial languages investigated,
representations and methods influenced the size of the languages, the learning rate of the agents, the
categorisation of concepts, and the generalisations available.
The type of representation used to form categories or concepts not only affected the types of
categories or concepts that could form but also the ease of learning. In terms of language simulation
studies, it was found that it is important to have appropriate representations for the concepts to be
formed and the features of language that are being investigated. To investigate grounding in
embodied agents, arbitrary representations (such as those used in Batali, 1998; and Smith, 2003)
can make concept formation harder than necessary. The pilot studies found that vision, used in
language game studies for concepts of colour and shape (Roy, 2001; Steels, 1999), was not an
appropriate representation for location concepts, as some distant locations have visually similar
scenes. Vision may be more useful for location type concepts, such as ‘corner’ or ‘corridor’, where
the concepts share similar visual scenes. Pose cell representations could be used, but discontinuities
and multiple representations can impede the conceptualisation process. Experiences located in a
map were ideal representations for location concepts, as distant locations in the world have distant
locations in the experience map coordinate space. The location concepts used in this thesis were
useful within a local region, specified by hearing distance, which could be reproduced with a set
distance within an experience map.
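The shared-attention condition can be sketched as a simple range test in experience-map coordinates, assuming Euclidean distance and the ~5 m hearing distance of Study 2C; the function name and default are illustrative, not the thesis's implementation.

```python
import math

def within_hearing(pose_a, pose_b, hearing_distance=5.0):
    """Shared attention test: agents may play a game when the Euclidean
    distance between their experience-map coordinates is within hearing
    range (default mirrors the ~5 m hearing distance of Study 2C)."""
    return math.dist(pose_a, pose_b) <= hearing_distance
```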
Using the process described in the semiotic square (Steels, 1999) to structure the language
process, the agents perceived the real world and formed internal representations of experiences;
concepts were then formed from the experiences as they became associated with words.
concepts and words were many-to-many, with distinct concepts never implicitly formed. Previous
studies have used a range of representations from those supporting slower learning and
generalisation (such as neural networks) to those supporting rapid learning and minimal
generalisation (such as lexicon tables).
The initial symbolic methodology considered was lexicon tables (used by Smith, 2001; and
Steels, 1999), which provide in-the-moment learning, though generalisation is typically provided by
similarity to existing exemplars and performed prior to word production. The initial connectionist
methodologies considered were simple neural networks (used by Cangelosi, 2001; Cangelosi &
Parisi, 1998; Kirby & Hurford, 2002; and Marocco et al., 2003) and recurrent neural networks (used
by Batali, 1998; Elman, 1990; and Tonkes et al., 2000) in which learning occurs over time, with
words partitioning concept space. Neural networks are more appropriate for forming categories with
boundary conditions in terms of features than they are for forming categories that are arbitrarily
structured. Neural networks could be used effectively for spatial languages that are based on feature
description, for example the location type concepts of ‘corner’ and ‘corridor’.
In this thesis, toponymic languages were based on exemplars, with arbitrary word associations.
For robots learning language, in-the-moment learning was required for the robots to start using
words appropriately after the first instance, and to generalise to similar concepts. The appropriate
features of lexicon tables were in-the-moment learning from exemplars, while the appropriate
features of neural networks were the ability to generalise to similar concepts. The distributed
lexicon table, designed for the studies in this thesis, combined standard lexicon tables and neural
networks to provide the appropriate features for a toponymic language.
2. The development of a method for concept formation with a distributed representation
In the studies presented in this thesis, concept formation for toponymic languages was supported
by a cognitive map representation. For grounding language in a cognitive map, it was necessary to
design a method for concept formation. A typical approach for concept formation using various
representations is to form categories prior to learning the language and grounding terms (Bodik &
Takac, 2003; Smith, 2001; Steels & Loetzsch, 2007). However, there is evidence that language
assists in concept formation, rather than just building on it (Levinson, 2003b).
Following the pilot studies in Chapter 5, a distributed lexicon table was developed for use in the
studies presented in Chapters 6 and 7. The distributed lexicon table with methods for updating,
producing, and comprehending words demonstrated a way in which concept and word formation
may interact.
The innate spatial ability of agents may influence the constraints on spatial concepts by
providing elements that can be combined in various ways (Levinson & Wilkins, 2006a). In the
studies, the innate spatial ability of the robots was the construction of an experience map and the
use of a distributed lexicon table. These abilities allowed the agents to form toponyms in different
locations depending on social interactions. The agents formed different distance and direction
words depending on the toponymic language and later interactions. The shared social experiences of
the robots, rather than shared perceptual information, influenced the concepts formed (this point is
addressed further under contribution 5 of Section 8.3, Contributions: Grounding locations).
Concept formation occurred together with word formation, reflecting the neo-Whorfian idea that
language can have an effect on the concepts that are formed (Levinson, 2003a). The use of a
distributed lexicon meant that concepts were not formed explicitly, and many experiences
contributed to each concept. Each experience could contribute to multiple concepts, depending on
the agent interactions. Even without explicitly formed concepts, the usage of the concepts was crisp
and coherent as shown by the ‘go to’ games results. Agents often met each other within 1 m in the simulation world, even when the toponyms covered areas of up to 70 m² for the large neighbourhood
condition. The crispness of usage was not designed, but emerged from the interactions of the agents
and from the methods for creating and using concepts.
The distributed lexicon table method for concept formation and word usage differs from both
standard lexicon tables (Smith, 2001; Steels, 1999) and neural networks (Batali, 1998; Cangelosi,
2001; Kirby & Hurford, 2002; Tonkes et al., 2000), with the rapid learning from exemplars similar
to lexicon tables, and generalisation similar to neural networks. With a distributed lexicon table,
concepts are not formed explicitly, but are formed together with associations between concept
elements and words.
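The interplay between concept and word formation described above can be illustrated with a minimal sketch of a distributed lexicon table. The class and names below are hypothetical illustrations, not the thesis implementation: associations are stored between individual concept elements (map experiences) and words, so a "concept" exists only implicitly as the cluster of elements that come to share strong links to the same word.

```python
from collections import defaultdict

class DistributedLexiconTable:
    """Minimal sketch of a distributed lexicon: associations are kept
    between individual concept elements (e.g. map experiences) and
    words, and no concept is ever stored explicitly."""

    def __init__(self):
        # association[element][word] -> association strength
        self.association = defaultdict(lambda: defaultdict(float))

    def update(self, elements, word, delta=1.0):
        # Each experience active in an interaction strengthens its own
        # link to the word used; the same experience can thereby
        # contribute to several implicit concepts over time.
        for element in elements:
            self.association[element][word] += delta
```

Because each experience can carry associations to many words, and each word draws on many experiences, concept boundaries emerge from the pattern of interactions rather than being stored as explicit categories.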
3. The development of a method for producing the word that provides the most
information about the chosen topic
An additional implementation issue with a distributed lexicon table is finding the most appropriate word in any given situation. This issue does not apply to methods where single
concepts are associated with single words, but does apply when concepts can be associated with
multiple words.
Existing methods for word production with lexicon tables include the score strategy (Steels,
1999) and the confidence and probability strategy (Smith, 2003). Two methods were developed in this thesis for use with the distributed lexicon table. The first was the most associated
strategy, based on the score strategy. The word used for the concept was the one with the highest
association value. A neighbourhood could be used, with association values summed over the
concepts in the neighbourhood of the topic. In the most associated strategy, words were easily found, but one word tended to take over most of the concepts, resulting in lower language
specificity. The second method was the most informative strategy, which was developed to reduce
the chance of a small number of words taking over most of the concepts. In this strategy, the most
informative word for a concept was used, which tended to be specific rather than general. A
neighbourhood could also be used with the most informative strategy. The relative neighbourhood
most informative strategy, described in Chapter 4, was used for the simulated and real robots in
Chapters 6 and 7. Using the most informative strategy, words were more evenly spread across
concept space, but new words were always adopted, becoming the most informative word for the
concept they were first used for. When new words are always adopted, stable languages cannot
form unless the invention of new words is prevented.
While the most informative strategy was used successfully, there is likely more to be discovered about methods for word production. Word selection is a balance between the specificity and the stability of the lexicon, and any word selection strategy must address this balance.
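The contrast between the two strategies might be sketched as follows. This is an illustrative reading, not the thesis code: `assoc` maps each concept element to its word associations, and the informativeness measure here (local association normalised by the word's total association mass) is a stand-in assumption for the measure actually used in the thesis.

```python
from collections import defaultdict

def most_associated(assoc, neighbourhood):
    """Return the word with the largest association summed over the
    concept elements in the topic's neighbourhood."""
    totals = defaultdict(float)
    for element in neighbourhood:
        for word, strength in assoc.get(element, {}).items():
            totals[word] += strength
    return max(totals, key=totals.get) if totals else None

def most_informative(assoc, neighbourhood):
    """Favour specific words over general ones by normalising each
    word's local association by its total association mass across the
    whole table (an assumed proxy for informativeness)."""
    word_mass = defaultdict(float)
    for strengths in assoc.values():
        for word, strength in strengths.items():
            word_mass[word] += strength
    scores = defaultdict(float)
    for element in neighbourhood:
        for word, strength in assoc.get(element, {}).items():
            scores[word] += strength / word_mass[word]
    return max(scores, key=scores.get) if scores else None
```

A word used widely across the table accumulates a large total mass, so its normalised score at any one topic is low; a word used only at that topic keeps a score near 1, which reproduces the tendency of the most informative strategy to keep words evenly spread, and to adopt any newly invented word.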
4. The formation and grounding of spatial concepts based on a cognitive map
representation
The grounding literature is vast, with different researchers emphasising different aspects of the
grounding problem. Harnad’s (1990) suggested solution to the symbol grounding problem is a
hybrid connectionist and symbolic system, with non-symbolic iconic and categorical
representations formed which are associated with symbols or names that feed back to the formation
of icons and categories. Steels (2007) claims that the symbol grounding problem has already been solved for concepts that can be formed from directly perceivable inputs, arguing that addressing grounding requires working with embodied autonomous agents, a mechanism for generating meanings, internal representations for grounded meanings, the ability to establish and negotiate symbols, and coordination between members of the population.
A variety of embodied models have grounded concepts through direct perception such as vision
(Floreano et al., 2007; Steels, 1999). The final pilot study and the studies presented in Chapters 6
and 7 involved the formation and grounding of spatial concepts based on a cognitive map
representation. As discussed in earlier sections, concepts do not need to be learned prior to word
association. The cognitive map provided by RatSLAM gave an appropriate base representation for
location concepts, but the concepts were learned through language games. Unlike language games
in which concepts are formed from direct perceptions, location concepts require a representation
built over time from exploration, such as the cognitive map representation of experiences.
The studies presented used the experience map together with the distributed lexicon table, which
matched Harnad’s (1990) suggested solution to the symbol grounding problem with the distributed
representation of the experience map associated with symbolic words. The studies also address each
of the features that Steels (2007) claims are required for grounding.
The difference between grounding from direct perception and grounding from a cognitive map
representation is in the way that concept representations are formed, with a cognitive map formed
over time from a combination of direct perceptions. The formation of concept representations and
the grounding of words interact, with the concept representations and types of concepts to be
formed restricting the appropriate methods for grounding.
5. Grounding locations: the design of language game interactions between mobile robots
that enable the formation and grounding of location concepts
Language is not formed by an individual agent, but rather through the interactions of a
population of agents. For a language about locations, the important features of the interactions are
when and where the interactions take place, the content of the interactions, and population
dynamics.
A series of agent interactions was designed in the form of three language games, used in the studies described in Chapters 6 and 7, which were inspired by the guessing game of Steels (2001) and the spatial language games of Steels (1995) and Bodik and Takac (2003). The language game
method (Steels, 2001) was extended to a location language game method, for use by mobile robots
where shared attention was determined by being located near each other. Obtaining shared attention
through hearing was a simple way to decide if the agents were close to each other, but prone to
noise, particularly in the real world. Combining hearing with another perceptual ability, such as
vision, could result in less noise. Embodiment, seen as one way of solving the symbol grounding
problem (Pfeifer & Scheier, 1999), was addressed partially in the simulation world, and more fully
in the real world. To obtain shared attention based on proximity, embodiment was necessary. The
form of embodiment influenced the shapes of the resulting toponyms. The embodiment of a language for locations and spatial terms extended previous research by enabling robots to form their own spatial terms, rather than being given terms and ways of forming concepts (Dobnik, 2006; Skubic et al., 2004). The robots drew on the perceptual abilities of vision (Floreano et al., 2007; Steels, 1999) and hearing (Roy, 2001), together with odometry and the ability to form a map of the world.
The ‘where are we’ games, played in the toponym language game study, were interactions that
enabled agents to form location concepts. Following each interaction, agents updated their lexicons
by increasing associations between their current experience and the word used, enabling a shared
toponymic language to form. Geographers and architects have long recognised that interactions in
locations and experience of a particular area in space are ways to convert space to place (Tuan,
1975). The ‘where are we’ games used shared experience of space to construct labels for places in
the robot world, resulting in the construction of specific place from general space. Toponyms
describing specific places may become landmarks used to describe ‘where’ (Tversky, 2003).
Naming toponyms in the studies described in this thesis was performed by inventing new words that
were unrelated to current words. It would be possible to form toponyms in other ways, including
based on actions that may be performed at locations, similar to naming memorable events (e.g.
Waterloo), or describing features of the environment, similar to descriptive names (e.g. North Sea)
(Crystal, 1997).
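A single ‘where are we’ interaction might be sketched as below. This is a simplified reading under stated assumptions, not the thesis implementation: agents are bare mappings from experiences to word associations, production is a plain summed-association argmax, and invented toponyms are arbitrary strings.

```python
import random
from collections import defaultdict

def where_are_we_game(speaker, hearer, speaker_elems, hearer_elems,
                      invent_p=0.05, rng=random):
    """One simplified 'where are we' interaction.

    speaker/hearer map concept element -> {word: strength}; the
    *_elems arguments are each agent's currently active experiences."""
    # Speaker produces its strongest word for its current experiences,
    # inventing a fresh toponym when it has none (or occasionally).
    totals = defaultdict(float)
    for elem in speaker_elems:
        for word, strength in speaker.get(elem, {}).items():
            totals[word] += strength
    if not totals or rng.random() < invent_p:
        word = f"toponym-{rng.randrange(10**6)}"
    else:
        word = max(totals, key=totals.get)
    # Both agents then strengthen the link between the word used and
    # their own current experiences, so a shared label converges even
    # though the agents' internal maps differ.
    for agent, elems in ((speaker, speaker_elems), (hearer, hearer_elems)):
        for elem in elems:
            agent.setdefault(elem, defaultdict(float))[word] += 1.0
    return word
```

Note that each agent updates against its *own* experiences: it is the shared social episode, not shared perception, that binds the two lexicons together.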
Most of the studies described in this thesis involved a single generation of two agents exploring
the world and negotiating their shared language. They used the negotiation model (as in Batali, 1998; Cangelosi et al., 2004; Hutchins & Hazlehurst, 1995; and Smith, 2001), which amounts to a single generation of iterated learning (Kirby & Hurford, 2002). A single generation of two agents
interacting proved adequate for the formation of spatial languages of toponyms, directions, and
distances. Although additional generations might allow the languages to become more coherent over time, the time needed for new agents to explore the world and build concept representations, combined with the number of games required to learn a toponymic language, makes running many generations prohibitively long.
When and where interactions take place influences which experiences will be associated with
the words used in the ‘where are we’ games. Therefore the timing and location of interactions
influences how toponyms form within a population of agents. Multiple generations of agents are not
required for a coherent language to form. The agents’ social interactions build on the cognitive map
representations to form the final toponymic languages.
6. Generative grounding: the design of generative language game interactions that enable
agents to ground concepts that are not directly experienced
The studies presented in Chapters 5 and 6 only used concepts of ‘here’ and ‘now’. A key
challenge for embodied language games is for the agents to refer to locations other than ‘here’,
particularly those they have never visited. This challenge requires both relational terms and the
ability to take into account the agents’ different perspectives.
The ‘where is there’ games, played in the generative language game study, allowed the agents to
construct concepts and terms for describing distances and directions, which were combined,
forming concepts equivalent to ‘simple’ topological, proximity, and projective prepositions as
classified by Coventry and Garrod (2004). The spatial language games extended the games of Steels (1995) and Bodik and Takac (2003) by removing absolute shared knowledge of direction and locations in the world, and by adding the ability for agents to coordinate their perspectives and build representations of the world separately. The generative terms of distances and directions were
used by the agents in grounding new toponymic concepts that were not directly experienced.
The ‘where is there’ games required the development of a method for generative grounding, and
for aligning the perspective of the agents. The perspective alignment described in this thesis was
with respect to locations in the world (instead of giving the agents an absolute sense of direction).
Perspective alignment was achieved by naming three locations: both agents were located within hearing distance at the first (current) location; they faced the second (orientation) location, thereby aligning their perspectives; and they talked about a third (target) location. Given the three
locations, agents described the target location with spatial words of distance and direction. Given
perspective alignment and a spatial lexicon, generative grounding enabled labelling of places that
neither agent had visited.
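The three-location alignment reduces to simple geometry. As a hedged sketch (not the thesis code), with locations taken as x-y points, the target's distance and its direction relative to the current-to-orientation axis can be computed as:

```python
import math

def describe_target(current, orientation, target):
    """Given three (x, y) points, return the distance to the target
    and its bearing relative to the current->orientation axis
    (radians, positive anticlockwise): the aligned egocentric
    description both agents can share."""
    dist = math.hypot(target[0] - current[0], target[1] - current[1])
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    bearing = math.atan2(target[1] - current[1],
                         target[0] - current[0])
    # Wrap the difference into (-pi, pi] so 'left' and 'right' of the
    # shared axis are symmetric.
    relative = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
    return dist, relative
```

Because the description is relative to the shared current-orientation axis, neither agent needs a compass or an allocentric map to agree on it.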
These studies used a strategy for aligning perspectives with the egocentric view. The three
distinct frames of reference that can be used are ‘intrinsic’, ‘relative’, and ‘absolute’ (Levinson,
2003a). An alternative would be for the agents to have an allocentric view from the raw map with a
compass, as used in the spatial language games of Steels (1995) and Bodik and Takac (2003), or a
relative view from translations of the representations, as used by Steels and Loetzsch (2007). To use
an intrinsic frame of reference requires knowledge of the intrinsic frames of reference of objects in
the world. As long as the agents have access to one method for aligning perspective, they can
achieve shared attention at a distance. The robots in this thesis used only one frame of reference, so they did not need to remember a situation in one frame of reference and recall it in another (Levinson, 2003a), or to cope with the difficulty of switching frames without pointers to the
switch (Tversky, 1996).
Generative grounding provided a way for agents to ground words for concepts that are not
currently being experienced, either through direct perception or the current state of the agent. The
‘where is there’ game provided generative grounding for locations. Generative grounding could be
extended to domains that have concept representations in which concept elements have
relationships comparable to the x-y dimensions of location space.
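Generative grounding can then be read as the inverse operation: from the aligned perspective and grounded distance and direction terms, an agent can recover coordinates for a place it has never visited and attach a toponym to experiences near that point. The function below is an illustrative sketch under that reading, not the thesis implementation.

```python
import math

def ground_remote_location(current, orientation, distance, relative_angle):
    """Invert the 'where is there' description: from the aligned
    perspective (standing at `current`, facing `orientation`) and the
    grounded distance and relative direction, recover the target's
    x-y coordinates."""
    heading = math.atan2(orientation[1] - current[1],
                         orientation[0] - current[0])
    angle = heading + relative_angle
    return (current[0] + distance * math.cos(angle),
            current[1] + distance * math.sin(angle))
```

Any domain whose concept elements stand in comparable metric relationships would admit the same inversion, which is what makes the grounding generative rather than tied to direct experience.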
8.4 Conclusions and Further Work
In summary, a computational model of language for mobile robots was successfully developed,
with the robots able to form grounded spatial concepts associated with words. The grounded spatial
language was used for the practical application of directing other robots to a goal location. With the
addition of generative interactions, the agents extended languages in which known locations were
labelled to languages where external locations were also labelled. The result was robots with nouns
(place names) and simple prepositions (direction and distance terms that were used in combination).
The studies presented showed robots that formed and labelled complex concepts in an embodied
spatial environment when they had:
• appropriate representations of spatial experiences,
• concept formation with crisp usage,
• the ability to perform perspective alignment, and
• a method for representing novel concepts by combining simple concepts of locations
with the generative terms of distance and direction.
A cognitive map allowed agents to have a rich representation of their world built from
experiences. Grounding language in a cognitive map meant that the language was grounded in a
rich representation of the world that provided a method for determining relationships between
concepts. The robots learned the map of their environment together with spatial concepts formed
through interactions.
It remains an open question how well these studies will generalise to other aspects of spatial
language. Different spatial scales afford different actions, depending on the distances involved:
within touch (personal space), within view and able to be viewed from different perspectives
(tabletop space), within walking or travelling distance (geographic space), and beyond personal
experience (astronomical space) (Peuquet, 2002). The methods of this thesis apply to geographic
space. Tabletop space, by contrast, requires an intrinsic representation of space as the innate ability, and is likely to need different methods for the construction of spatial concepts. The experience maps of
RatSLAM (Milford, 2008), based on ideas about a cognitive map in the hippocampus (O'Keefe &
Nadel, 1978), proved to be appropriate for representing geographic space. However, to go beyond
the concepts explored here requires greater knowledge of the world through visual information or
actions through motor control and intent. This work could be extended with robots that have richer
representations of their world and more interesting social interactions.
The major conclusions of this thesis are that generative grounding for spatial concepts is
possible and that representations, methods, and social interactions influence the languages that
form. The meaningful usage of language in practical applications therefore requires appropriate
representations, interactions, and methods for grounding. This thesis has shown that rather than the
directly perceivable world, it is interactions building on innate abilities that influence the final
structure of spatial languages.
References
Arleo, A., & Gerstner, W. (2000). Modeling rodent head-direction cells and place cells for spatial
learning in bio-mimetic robotics. In J. A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W.
Wilson (Eds.), Proceedings of the Sixth International Conference on the Simulation of Adaptive
Behavior, from Animals to Animats (pp. 236-245). Cambridge, Massachusetts: The MIT Press.
Bartlett, M., & Kazakov, D. (2005). The origins of syntax: from navigation to language. Connection
Science, 17(3-4), 271-288.
Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M.
Studdert-Kennedy & C. Knight (Eds.), Approaches to the Evolution of Language: Social and
Cognitive Bases (pp. 405-426). Cambridge, UK: Cambridge University Press.
Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition
among exemplars. In E. J. Briscoe (Ed.), Linguistic Evolution Through Language Acquisition:
Formal and Computational Models (pp. 111-172). Cambridge, UK: Cambridge University Press.
Berthoz, A. (1999). Hippocampal and parietal contribution to topokinetic and topographic memory.
In N. Burgess, K. J. Jeffery & J. O'Keefe (Eds.), The hippocampal and parietal foundations of
spatial cognition (pp. 381-403). New York: Oxford University Press Inc.
Beyer, H.-G., & Schwefel, H.-P. (2002). Evolution Strategies: A comprehensive introduction.
Natural Computing, 1, 3-52.
Bickerton, D. (2003). Symbol and Structure: A comprehensive framework for language evolution.
In M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 77-93). New York: Oxford
University Press Inc.
Bodik, P., & Takac, M. (2003). Formation of a common spatial lexicon and its change in a
community of moving agents. In B. Tessem, P. Ala-Siuru, P. Doherty & B. Mayoh (Eds.), Frontiers
in Artificial Intelligence and Applications: Eighth Scandinavian Conference on Artificial
Intelligence SCAI'03 (pp. 37-46). Amsterdam: IOS Press Inc.
Brighton, H., & Kirby, S. (2001). Meaning space structure determines the stability of culturally
evolved compositional language (Technical report). Edinburgh: Language Evolution and
Computation Research Unit, Department of Theoretical and Applied Linguistics, The University of
Edinburgh.
Brown, P. (2006). A sketch of the grammar of space in Tzeltal. In S. C. Levinson & D. P. Wilkins
(Eds.), Grammars of Space: Explorations in Cognitive Diversity (pp. 230-272). Cambridge, UK:
Cambridge University Press.
Brown, R. (1958). Words and Things. Glencoe, Illinois: The Free Press.
Burgess, N., Donnett, J. G., Jeffery, K. J., & O'Keefe, J. (1999). Robotic and neuronal simulation of
the hippocampus and rat navigation. In N. Burgess, K. J. Jeffery & J. O'Keefe (Eds.), The
hippocampal and parietal foundations of spatial cognition (pp. 149-166). New York: Oxford
University Press Inc.
Cangelosi, A. (2001). Evolution of communication and language using signals, symbols, and words.
IEEE Transactions on Evolutionary Computation, 5(2), 93-101.
Cangelosi, A., & Harnad, S. (2001). The adaptive advantage of symbolic theft over sensorimotor
toil: Grounding language in perceptual categories. Evolution of Communication, 4(1), 117-142.
Cangelosi, A., & Parisi, D. (1998). The emergence of a 'language' in an evolving population of
neural networks. Connection Science, 10(2), 83-97.
Cangelosi, A., Riga, T., Giolito, B., & Marocco, D. (2004). Language emergence and grounding in
sensorimotor agents and robots. Paper presented at the First International Workshop on Emergence
and Evolution of Linguistic Communication, May 31 - June 1 2004, Kanazawa, Japan.
Cangelosi, A., Smith, A. D. M., & Smith, K. (Eds.). (2006). The Evolution of Language:
Proceedings of the 6th International Conference (EVOLANG6). Singapore: World Scientific
Publishing Co. Pte. Ltd.
Carroll, J. B. (Ed.). (1956). Language, Thought, and Reality: Selected writings of Benjamin Lee
Whorf. Cambridge, Massachusetts: The MIT Press.
Christiansen, M. H., & Kirby, S. (2003a). Language evolution: consensus and controversies. Trends
in Cognitive Science, 7(7), 300-307.
Christiansen, M. H., & Kirby, S. (2003b). Language evolution: The hardest problem in science? In
M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 1-15). Oxford: Oxford University
Press.
Coradeschi, S., & Saffiotti, A. (2000). Anchoring symbols to sensor data: preliminary report. In
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth
Conference on Innovative Applications of Artificial Intelligence (pp. 129-135). Austin, Texas:
AAAI Press / The MIT Press.
Coventry, K. R., & Garrod, S. C. (2004). Saying, seeing, and acting: The psychological semantics
of spatial prepositions. Hove, East Sussex: Psychology Press.
Crystal, D. (1997). The Cambridge encyclopedia of language (2nd ed.). Cambridge: Cambridge
University Press.
de Jong, E. D. (1998). The development of a lexicon based on behavior. In H. La Poutré & J. van den Herik (Eds.), Proceedings of the Tenth Netherlands/Belgium Conference on Artificial Intelligence (NAIC'98) (pp. 27-36). Amsterdam, The Netherlands: CWI.
Dessalles, J.-L. (2007). Why we talk: the evolutionary origins of language. Oxford: Oxford
University Press.
Dobnik, S. (2006). Learning spatial referential words with mobile robots. In Proceedings of the 9th Annual CLUK Research Colloquium, 8-9 March 2006. The Open University, Milton Keynes, UK.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Elman, J. L. (1991). Distributed representations, simple recurrent networks and grammatical
structure. Machine Learning, 7, 195-224.
Floreano, D., Mitri, S., Magnenat, S., & Keller, L. (2007). Evolutionary conditions for the
emergence of communication in robots. Current Biology, 17, 514-519.
Gasser, M. (2004). The origins of arbitrariness in language. In Proceedings of the Cognitive Science
Society Conference (pp. 434-439). Hillsdale, NJ: LEA.
Groening, M. (Writer) (2000). The Computer Wore Menace Shoes [TV], The Simpsons. USA: Fox
Broadcasting Company.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial
map in the entorhinal cortex. Nature, 436, 801-806.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42, 335-
346.
Hurford, J. R. (2007). The origins of meaning. New York: Oxford University Press Inc.
Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon: The development of shared
symbols in interaction. In N. Gilbert & R. Conte (Eds.), Artificial Societies: The Computer
Simulation of Social Life. London: UCL Press.
Kirby, S. (2001). Spontaneous evolution of linguistic structure - an iterated learning model of the
emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2),
102-110.
Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2), 185-215.
Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic structure: an overview of the
iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language
(pp. 121-148). London: Springer Verlag.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer.
Lakoff, G. (1987). Women, fire, and dangerous things: what categories reveal about the mind.
Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: The University of Chicago
Press.
Landau, B. (1996). Multiple geometric representations of objects in languages and language
learners. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp.
317-363). Cambridge, Massachusetts: The MIT Press.
Levinson, S. C. (1996). Language and Space. Annual Review of Anthropology, 25, 353-382.
Levinson, S. C. (2001). Space: Linguistic expression. In N. J. Smelser & P. Baltes (Eds.),
International Encyclopedia of Social and Behavioral Sciences (Vol. 22, pp. 14749-14752).
Amsterdam/Oxford: Elsevier Science.
Levinson, S. C. (2003a). Space in language and cognition: Explorations in cognitive diversity.
Cambridge, UK: Cambridge University Press.
Levinson, S. C. (2003b). Spatial language. In L. Nadel (Ed.), Encyclopedia of cognitive science
(Vol. 4, pp. 131-137). London: Nature Publishing Group.
Levinson, S. C., & Wilkins, D. P. (2006a). The background to the study of the language of space. In
S. C. Levinson & D. P. Wilkins (Eds.), Grammars of Space: Explorations in Cognitive Diversity
(pp. 1-23). Cambridge, UK: Cambridge University Press.
Levinson, S. C., & Wilkins, D. P. (2006b). Patterns in the data: towards a semantic typology of
spatial description. In S. C. Levinson & D. P. Wilkins (Eds.), Grammars of Space: Explorations in
Cognitive Diversity (pp. 512-552). Cambridge, UK: Cambridge University Press.
MacKay, D. J. C. (2003). Information Theory, Inference & Learning Algorithms. Cambridge, UK:
Cambridge University Press.
Maguire, E. A. (1999). Hippocampal and parietal involvement in human topographical memory:
evidence from functional neuroimaging. In N. Burgess, K. J. Jeffery & J. O'Keefe (Eds.), The
hippocampal and parietal foundations of spatial cognition (pp. 404-415). New York: Oxford
University Press Inc.
Majid, A., Bowerman, M., Kita, S., Haun, D. B. M., & Levinson, S. C. (2004). Can language
restructure cognition? The case for space. Trends in Cognitive Science, 8(3), 108-114.
Marocco, D., Cangelosi, A., & Nolfi, S. (2003). The role of social and cognitive factors in the
emergence of communication: experiments in evolutionary robotics. Philosophical Transactions of
the Royal Society London - A, 361, 2397-2421.
Milford, M. J. (2008). Robot Navigation from Nature: Simultaneous Localisation, Mapping, and
Path Planning Based on Hippocampal Models. Berlin: Springer-Verlag.
Milford, M. J., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial concepts
from RatSLAM representations. Robotics and Autonomous Systems - From Sensors to Human
Spatial Concepts, 55(5), 403-410.
Milford, M. J., & Wyeth, G. (2007). Spatial mapping and map exploitation: A bio-inspired
engineering perspective. In Spatial Information Theory (pp. 203-221). Berlin: Springer.
Milford, M. J., Wyeth, G., & Prasser, D. (2005). Efficient goal directed navigation using RatSLAM.
In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, ICRA
2005, April 18-22, 2005 (pp. 1097-1102). Barcelona, Spain: IEEE Press.
Milford, M. J., Wyeth, G. F., & Prasser, D. (2004). RatSLAM: a hippocampal model for
simultaneous localization and mapping. In IEEE International Conference on Robotics and
Automation, ICRA 2004, April 26 - May 1, 2004. New Orleans, LA, USA: IEEE Press.
Moylan, D. (2003). Pioneer Robot Simulation. Unpublished Software Engineering Honours Thesis,
The University of Queensland.
Newmeyer, F. J. (2003). What can the field of linguistics tell us about the origins of language? In
M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 58-76). New York: Oxford
University Press Inc.
Nolfi, S. (2005). Emergence of communication in embodied agents: co-adapting communicative
and non-communicative behaviours. Connection Science, 17(3-4), 231-248.
Nolfi, S., & Marocco, D. (2002). Active perception: a sensorimotor account of object
categorization. In B. Hallam, D. Floreano, J. Hallam, G. Hayes & J. A. Meyer (Eds.), From Animals
to Animats 7. Proceedings of the 7th International Conference on Simulation of Adaptive Behavior.
Cambridge, Massachusetts: MIT Press.
O'Keefe, J. (1979). A review of the hippocampal place cells. Progress in Neurobiology, 13, 419-
439.
O'Keefe, J. (1996). The spatial prepositions in English, vector grammar, and the cognitive map
theory. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp.
277-316). Cambridge, Massachusetts: The MIT Press.
O'Keefe, J. (2003). Vector grammar, places, and the functional role of the spatial prepositions in
English. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space (pp. 69-
85). New York: Oxford University Press Inc.
O'Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. New York: Oxford
University Press Inc.
Peuquet, D. J. (2002). Representations of Space and Time. New York: The Guilford Press.
Pfeifer, R., & Scheier, C. (1999). Understanding Intelligence. Cambridge, Massachusetts: The MIT
Press.
Prasser, D., Wyeth, G. F., & Milford, M. J. (2004). Biologically inspired visual landmark
processing for simultaneous localization and mapping. In IEEE/RSJ International Conference on
Intelligent Robots and Systems (Vol. 1, pp. 730-735). Sendai, Japan: IEEE Press.
Pratchett, T. (1998). The Last Continent. London: Transworld Publishers Ltd.
Pratchett, T. (2002). Night Watch. London: Transworld Publishers Ltd.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In J.
Kelemen & P. Sosik (Eds.), ECAL01 (pp. 357-366). Prague: Springer.
Regier, T. (1996). The Human Semantic Potential: Spatial Language and Constrained
Connectionism. Cambridge, Massachusetts: The MIT Press.
Riga, T., Cangelosi, A., & Greco, A. (2004). Symbol grounding transfer with hybrid self-
organizing/supervised neural networks. In IJCNN04 International Joint Conference on Neural
Networks, July 25-29 2004 (Vol. 4, pp. 2865-2869). Budapest, Hungary: IEEE Press.
Roy, D. (2001). Learning visually grounded words and syntax of natural spoken language.
Evolution of Communication, 4(1), 33-56.
Roy, D. (2005). Semiotic Schemas: A framework for grounding language in action and perception.
Artificial Intelligence, 167(1-2), 170-205.
Roy, D., Hsiao, K.-Y., & Mavridis, N. (2003). Conversational robots: building blocks for grounding
word meaning. Proceedings of the HLT-NAACL03 Workshop on Learning Word Meaning from
Non-Linguistic Data.
Rumelhart, D. E., Widrow, B., & Lehr, M. A. (1994). The basic ideas in neural networks.
Communications of the ACM, 37(3), 87-92.
Schulz, R., Milford, M. J., Prasser, D., Wyeth, G., & Wiles, J. (2006). Learning spatial concepts
from RatSLAM representations. Paper presented at the "From Sensors to Human Spatial Concepts"
Workshop at the International Conference on Intelligent Robots and Systems, 10 October 2006,
Beijing, China.
Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006a). Generalization in languages
evolved for mobile robots. In L. M. Rocha, L. S. Yaeger, M. A. Bedau, D. Floreano, R. L.
Goldstone & A. Vespignani (Eds.), ALIFE X: Proceedings of the Tenth International Conference
on the Simulation and Synthesis of Living Systems (pp. 486-492). Cambridge, Massachusetts: The
MIT Press.
Schulz, R., Stockwell, P., Wakabayashi, M., & Wiles, J. (2006b). Towards a spatial language for
mobile robots. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of Language:
Proceedings of the 6th International Conference (EVOLANG6) (pp. 291-298). Singapore: World
Scientific Press.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417-457.
Skubic, M., Perzanowski, D., Blisard, S., Schultz, A., Adams, W., Bugajska, M., et al. (2004).
Spatial language for human-robot dialogs. IEEE Transactions on Systems, Man, and Cybernetics
Part C: Applications and Reviews, 34(2), 154-167.
Smith, A. D. M. (2001). Establishing communication systems without explicit meaning
transmission. In J. Kelemen & P. Sosik (Eds.), ECAL01 (pp. 381-390). Prague: Springer.
Smith, A. D. M. (2003). Semantic generalisation and the inference of meaning. In W. Banzhaf, T. Christaller, P. Dittrich, J. T. Kim & J. Ziegler (Eds.), Advances in Artificial Life - Proceedings of the 7th European Conference on Artificial Life (ECAL), Lecture Notes in Artificial Intelligence (Vol. 2801, pp. 499-506). Berlin, Heidelberg: Springer Verlag.
Smith, A. D. M., Smith, K., & Ferrer i Cancho, R. (Eds.). (2008). The Evolution of Language:
Proceedings of the 7th International Conference (EVOLANG7). Singapore: World Scientific
Publishing Co. Pte. Ltd.
Spinney, L. (2005, February 24). How time flies. The Guardian.
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life, 2(3), 319-332.
Steels, L. (1997a). The origins of syntax in visually grounded robotic agents. In M. Pollack (Ed.),
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97)
(Vol. 2, pp. 1632-1641). San Francisco, California: Morgan Kaufmann Publishers.
Steels, L. (1997b). The synthetic modeling of language origins. In H. Gouzoules (Ed.), Evolution of
Communication (Vol. 1, pp. 1-34). Amsterdam: John Benjamins Publishing Company.
Steels, L. (1999). The Talking Heads Experiment (Vol. I. Words and Meanings). Brussels: Best of
Publishing.
Steels, L. (2001). Language games for autonomous robots. IEEE Intelligent Systems, 16(5), 16-22.
Steels, L. (2005). The emergence and evolution of linguistic structure: from lexical to grammatical
communication systems. Connection Science, 17(3-4), 213-230.
Steels, L. (2007). The symbol grounding problem has been solved. So what's next? In M. De Vega,
A. Glenberg & A. Graesser (Eds.), Symbols, Embodiment and Meaning. New Haven: Academic
Press.
Steels, L., & Kaplan, F. (2001). AIBO's first words. The social learning of language and meaning.
Evolution of Communication, 4(1), 3-32.
Steels, L., & Loetzsch, M. (2007). Perspective alignment in spatial language. In K. R. Coventry, T.
Tenbrink & J. A. Bateman (Eds.), Spatial Language and Dialogue. Oxford, UK: Oxford University
Press.
Strunk, W., & White, E. B. (2000). The Elements of Style (4th ed.). Needham Heights,
Massachusetts: A Pearson Education Company.
Sun, R. (2000). Symbol grounding: a new look at an old idea. Philosophical Psychology, 13(2),
149-172.
Thrun, S. (2002). Robotic mapping: a survey. In B. Nebel (Ed.), Exploring Artificial Intelligence in the New Millennium. San Francisco, California: Morgan Kaufmann.
Tonkes, B. (2001). On the origins of linguistic structure: computational models of the evolution of
language. Unpublished PhD dissertation, School of Information Technology and Electrical
Engineering, The University of Queensland, Brisbane.
Tonkes, B., Blair, A., & Wiles, J. (2000). Evolving learnable languages. In S. A. Solla, T. K. Leen
& K.-R. Muller (Eds.), Advances in Neural Information Processing Systems 12 (pp. 66-72).
Cambridge, Massachusetts: The MIT Press.
Tuan, Y.-F. (1975). Place: An experiential perspective. Geographical Review, 65(2), 151-165.
Tuan, Y.-F. (1977). Space and place: the perspective of experience. Minneapolis, MN: University
of Minnesota Press.
Tversky, B. (1996). Spatial perspective in descriptions. In P. Bloom, M. A. Peterson, L. Nadel & M.
F. Garrett (Eds.), Language and Space (pp. 463-491). Cambridge, Massachusetts: The MIT Press.
Tversky, B. (2003). Places: Points, planes, paths, and portions. In E. van der Zee & J. Slack (Eds.),
Representing direction in language and space (pp. 132-143). New York: Oxford University Press,
Inc.
Varela, F. J., Thompson, E., & Rosch, E. (1991). The Embodied Mind. Cambridge, Massachusetts:
The MIT Press.
Vogt, P. (2000a). Bootstrapping grounded symbols by minimal autonomous robots. Evolution of
Communication, 4(1), 87-116.
Vogt, P. (2000b). Grounding language about actions: Mobile robots playing follow me games. In J.
A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W. Wilson (Eds.), From Animals to Animats
6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior (SAB00).
Cambridge, Massachusetts: The MIT Press.
Vogt, P. (2003). Anchoring of semiotic symbols. Robotics and Autonomous Systems, 43(2), 109-
120.
Vogt, P. (2007). Language evolution and robotics: Issues in symbol grounding and language
acquisition. In A. Loula, R. Gudwin & J. Queiroz (Eds.), Artificial Cognition Systems (pp. 176-
209). Hershey, Pennsylvania: Idea Group Publishing.
Wagner, K., Reggia, J. A., Uriagereka, J., & Wilkinson, G. S. (2003). Progress in the simulation of emergent communication and language. Adaptive Behavior, 11(1), 37-69.
Ziemke, T. (1999). Rethinking Grounding. In A. Riegler, M. Peschl & A. von Stein (Eds.),
Understanding Representation in the Cognitive Sciences - Does Representation Need Reality? (pp.
177-190). New York: Plenum Press.