Learning object affordances by imitation


The University of Birmingham, School of Computer Science

September 12, 2005

Learning object affordances by imitation
Research Progress Report 3

Marek Kopicki
[email protected]

Supervisors: Prof. Aaron Sloman, Dr Jeremy Wyatt

Other Thesis Group Members: Dr Richard Dearden

Abstract

The report explores the role of imitation in learning object affordances. Since I intend to experiment with a robotic arm capable only of pushing objects, the properties of objects will be the main factor affecting the overall complexity of this “toy world”. Learning in the toy world is a process of discovering objects’ affordances, as well as the kinds of actions and goals related to them.

I intend to create an algorithm in which imitation can be thought of as a teacher-guided search, which will help a robot to constrain its own exploration to relevant “behavioural patterns”, accentuating either action plans or action goals. In effect, a robot which begins with very little knowledge about actions and objects will be able not only to recognise and understand novel object affordances and goals, but also to learn how to generalise the acquired knowledge. Thus, a robot will demonstrate a high level of manipulation competence playing with entirely new objects - “it will master the toy world”.

Keywords

Affordances, imitation, action understanding, robotic manipulation, motor learning, causality.


Contents

Abstract

Keywords

1 Introduction
  1.1 Motivation
  1.2 An experiment

2 Frameworks for sensorimotor learning
  2.1 Motor control
  2.2 Motor learning
  2.3 Multiple pairs of forward and inverse models
  2.4 Dynamical movement primitives
  2.5 Affordance learning

3 Affordances and properties of objects
  3.1 Structure, primitives, learning

4 Social learning in psychology
  4.1 Action understanding
  4.2 Social interaction
  4.3 Knowledge transfer and the learning problem
  4.4 Models and functions of imitation
  4.5 Action understanding and imitation in neurobiology

5 An artificial toy world - scenarios of experiments
  5.1 Moving arm (no objects)
  5.2 Poking objects (first contact)
  5.3 Poking objects (understanding shape and kinematics of objects, understanding consequences of actions)
  5.4 Pushing objects (understanding dynamics of objects, action planning)

6 Conclusions
  6.1 Hypothesis
  6.2 My future work
  6.3 Evaluation of my work
  6.4 My timetable

Bibliography


1 Introduction

1.1 Motivation

One of the most intriguing, but still unanswered, questions in cognitive science concerns the kinds of mechanisms underlying human development. At birth human infants are completely helpless, but as they become adults, they develop into the most versatile and powerful species on this planet. In contrast, modern robots are competent only in very narrow domains, and they fail if changes of an environment cannot be accommodated by domain-specific and pre-programmed routines. This situation is somewhat analogous to precocial species perfectly adjusted to the niches they live in, but in general unable to adapt to a highly variable environment. Unfortunately, the extraordinary versatility of altricial species (including humans, of course) is paid for by a frequently long and risky learning process [45]. There have been several hypotheses proposed to explain these mechanisms¹, but they are all much too general to use in robotic applications.

I believe that the imitation phenomenon found in many altricial species² can cast more light on the aforementioned mechanisms. In particular, imitation as an element of social learning requires the existence of a common representation layer in both a teacher and a learner, in order to be able to transfer any knowledge between them [47, 20]. Imitation allows a learner to experience the teacher’s actions and goals, even without understanding them yet. After many learning sessions, by trial-and-error practice with various actions and action contexts, there appears a behavioural correspondence between a learner and a teacher (infants develop a kind of forward models of actions and goals). This correspondence ultimately leads to the ability to emulate actions and their results, i.e. to generalise across the whole action domain.

Humans learn objects’ affordances during their entire life span, and it seems that only subsets of them are learnt autonomously. Unfortunately, it is rather difficult to estimate which affordances are not likely to be learnt in this way, since human development is a long, complex and gradual process. However, affordances are an important aspect of the toy world, where interactions of a robotic arm with the environment will strongly depend on the properties of objects. Input provided by a teacher can introduce novel behavioural patterns, which together with self-learned ones can significantly facilitate understanding the properties of objects, as well as the laws of physics (causality). A good example here could be a child trying to connect together two toy carriages, but without yet understanding the function of a stitch and a hook³. A teacher may re-direct the attention of a learner to a particular spatial region, some object features and to a certain sub-class of actions. This reduces a learner’s search space, so that she/he may find a solution after only a few trials. One of the most interesting phenomena, present in many altricial species, is their remarkable capability to generalise from very few examples⁴. How many examples will a robot need to understand the function of a stitch and a hook?

1.2 An experiment

Let us imagine an experiment scenario in which an actor (e.g. a human) is pushing an object, as shown in figure 1.

While a human would have no difficulty repeating this action, for a robot it can be a very hard problem. I will focus here only on a few aspects of this problem, but the key ones for this report. The scenario displays are an example of a qualitative description of what is happening

¹ For example, it is suggested [45] that altricial species are capable of acquiring knowledge in re-usable discrete chunks, to further combine them in novel ways.

² Chimpanzees can reproduce human movements; many birds can imitate foraging behaviour [20].
³ Aaron Sloman’s example.
⁴ There are biological reasons for that, not only because of a limited capacity of a brain, but because of the advantages due to adaptability and robustness in unfamiliar situations.


Figure 1: The above scenario displays illustrate an actor moving a box.

during action execution. However, the descriptions do not involve one important thing - the control mechanisms which allow an actor to carry out the experiment.

Let us assume that a learner (a robot) shares with a teacher (a human) a similar kind of building blocks for a qualitative description of actions. Naturally, they can be different from the displays presented here. These description patterns, in conjunction with corresponding proprioceptive patterns (e.g. tactile sensations), comprise a feeling which an actor experiences when carrying out an action. A level of description is sufficient if an actor understands how to control its arm so that an experiment (almost) always proceeds according to this description. In particular, the above scenario requires an actor to know:

1. The goal of an action, i.e. pushing a box to a new location.

2. How to approach an object so that it will not rebound from the end-effector.

3. How to push an object to a new location so that it will not deviate from the movement direction.

In order to achieve manipulation competence, a learner must learn how to perceive, plan and predict actions on a qualitative level, along with developing manipulation skills (the means) to maintain an ongoing action within the selected qualitative plan. This requires a kind of double-layer architecture⁵ which, thanks to low-level reactive mechanisms, hides the complexity and stochastic nature of the real world, at least on a certain description level. There can be many levels of description of a particular experiment, and in general they depend on the knowledge of a learner (object affordances, kinds of actions related to them, etc.), and on the quality of perception (in hidden conditions).

A learner, by observing a teacher, activates its own description patterns on various levels. While trying to repeat an action, three things may happen:

1. Success - a learner understands a goal and an action plan

2. Surprise or accident - a learner encounters an unexpected problem (known or unknown)

3. Failure - a learner does not know either a goal or an action plan

The role of a teacher is to present the same scenario in various contexts (by changing spatial locations, approach directions, object types, etc.), which accentuates the missing knowledge of a learner. A learner imitates, i.e. consciously repeats observed patterns down to the lowest details it can perceive, however without understanding either the link to the reactive layer (lack of skills), or the structure of patterns (missing goal), or both⁶. After a certain number of trials, a learner is able to reliably

⁵ I am not considering here deliberative and meta layers.
⁶ In this report I am not dealing with the social/communicative aspect of imitation, where the meaning of actions/gestures lies “beyond” the current scene context.


repeat an observed action, as well as understand the qualitative correspondence between the observed action and a self-generated copy. This is possible due to the following reasons:

• Perception precedes manipulation skills, even though skills bootstrap perception (particularly through generalisation). For instance, to understand that certain objects roll, a learner must have some kind of indicator of “roundness” in advance.

• Only small subsets of description patterns (including proprioceptive patterns) comprise feasible actions performed upon known objects. This allows a learner to reliably predict the course of an action, and in consequence, to suitably react to any deviations from the previously selected action plan.

• There are similarities (and differences) between whole patterns and their parts. They enable robust recognition and generalisation, even with imperfect perception. For example, a learner can push an object relying only on force sensors, without any visual feedback.

The report begins with an overview of a few frameworks for learning from imitation and for learning object affordances. Section 2 also includes a short introduction to motor learning. In section 3, I introduce the notion of affordance, and I discuss the mechanisms and requirements which allow a robot to perceive and act upon object affordances. In section 4, I analyse social learning from the psychological viewpoint, concentrating on the role of imitation and action understanding. The section is supported with some interesting findings from neurobiology. Next, section 5 describes the robotic experiments I am going to carry out. I analyse in detail a robot’s understanding of actions, plans, and goals. Finally, in section 6, I summarise the problems raised in the report. I also present an “action understanding” hypothesis and plans concerning my future research.

2 Frameworks for sensorimotor learning

In the following section I present the most successful frameworks for sensorimotor learning and for learning object affordances which are relevant to my work. Motor control is an extensive and complex subject; therefore I will focus only on the most important issues related to the presented frameworks.

2.1 Motor control

We assume that a motor system can be modelled as a system with [24, 25, 26]:

• Inputs u determining a control signal: spatial coordinates, velocities or accelerations of the arm.

• Outputs y determining a system behaviour: joint coordinates, velocities or torques of the arm.

• Internal state x, e.g. joint coordinates or velocities of the arm.

The functional relationship between joint coordinates and spatial coordinates is many-to-one, and is described by a forward model:

y[n + 1] = h(x[n], u[n])    (1)


Figure 2: Relationship between a forward model and an inverse model at a time step n [24].

Figure 3: A feedforward controller is simply an inverse model of a system (plant). y* denotes a desired output of a system.

A forward model predicts the behaviour of a system y[n + 1] at the next time step n + 1, given the current state x[n] and the motor command u[n]. Conversely, the one-to-many relation between spatial coordinates and joint coordinates is described by an inverse model:

u[n] = h⁻¹(x[n], y[n + 1])    (2)

An inverse model estimates the motor command u[n] required to achieve the desired behaviour of a system y[n + 1] at the next time step n + 1, given the current state x[n]. The relationship between a forward model and an inverse model at a time step n is shown in figure 2.
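To make equations (1) and (2) concrete, the sketch below (my own illustration, not part of the cited frameworks) pairs a forward and an inverse model for a one-dimensional point mass, where the inverse exists in closed form; `DT`, `forward_model` and `inverse_model` are names assumed for the sketch.

```python
import numpy as np

# Toy plant: a unit point mass in one dimension, discretised with step DT.
# State x = (position, velocity); control u = applied force (unit mass).
DT = 0.01

def forward_model(x, u):
    """Eq. (1): predict the next output y[n+1] = h(x[n], u[n])."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u])

def inverse_model(x, y_star):
    """Eq. (2): the command u[n] achieving the desired next velocity."""
    return (y_star[1] - x[1]) / DT

x = np.array([0.0, 0.0])          # current state
y_star = np.array([0.0, 0.5])     # desired behaviour at the next step
u = inverse_model(x, y_star)
# The inverse model inverts the forward model exactly (for the velocity,
# the only state component a force can change within a single step):
print(forward_model(x, u)[1])     # 0.5
```

Note that only the velocity is invertible in a single step here; for a real arm the inverse map is far less benign, which is exactly what the later sections address.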

The problem of controlling a system is the problem of achieving a desired behaviour at its output. In general, there are two classes of controllers:

1. An open-loop feedforward controller, which relies entirely on an inverse model of a system (figure 3)

2. A feedback controller, which relies entirely on feedback from the output of a system (figure 4)

System control based on inverse models is referred to as predictive control. Feedforward controllers guarantee stability of control regardless of system noise. However, feedforward controllers will not maintain the desired output if a controlled system diverges from the internal estimate of the state (e.g. because of external disturbances), or if the controller itself is an inaccurate inverse model of the plant.


Figure 4: A feedback controller uses feedback from the output of a system to correct the control signal u.

Figure 5: A composite control system consists of both a feedforward controller and a feedback controller [25].

The simplest feedback controller uses the output of a system to correct the control signal. Error-correcting feedback control can be expressed as:

u[n] = K(y*[n] − y[n])    (3)

where K is referred to as a gain. It can be shown that for high values of K, an error-correcting feedback controller is equivalent to an open-loop feedforward controller [24], i.e. a feedback controller utilises an implicit inverse of the plant. In contrast to feedforward controllers, feedback controllers guarantee maintaining the desired output of a system; however, for high values of K, due to delays and high nonlinearities in the feedback loop, a system may become unstable.

A composite control system consists of both a feedforward controller and a feedback controller, so that it combines the best of both worlds (figure 5). The control signal in the composite system is a sum of signals from a feedforward controller and a feedback controller:

u[n] = u_ff[n] + u_fb[n]    (4)
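A minimal numerical illustration of equations (2)-(4) (my own sketch; the plant, its parameters and all variable names are assumptions for illustration): a scalar linear plant is driven by a feedforward controller built from an inaccurate inverse model, first alone and then with an error-correcting feedback term added. With only proportional feedback the steady-state bias shrinks but does not vanish entirely.

```python
# Scalar linear plant y[n+1] = a*y[n] + b*u[n]; the feedforward controller
# uses an inaccurate estimate b_hat of b (an imperfect inverse model).
a, b = 0.9, 0.5          # true plant parameters
b_hat = 0.4              # feedforward controller's wrong estimate of b
K = 0.8                  # feedback gain of eq. (3)
y_star = 1.0             # constant desired output

y_ff = y_comp = 0.0
for _ in range(500):
    # Pure feedforward control (inverse of the *estimated* plant):
    y_ff = a * y_ff + b * ((y_star - a * y_ff) / b_hat)
    # Composite control, eq. (4): feedforward plus error-correcting feedback.
    u = (y_star - a * y_comp) / b_hat + K * (y_star - y_comp)
    y_comp = a * y_comp + b * u

err_ff, err_comp = abs(y_star - y_ff), abs(y_star - y_comp)
print(err_ff, err_comp)   # the feedback term reduces the steady-state error
```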


Figure 6: A generic supervised learning system [25].

2.2 Motor learning

There are well-known classical methods for computing the inverse transformation, using either inverse kinematics (global methods) or the inverse manipulator Jacobian (local methods) [31]. Unfortunately, the calculations may quickly become very complex for more than a few degrees of freedom; moreover, they require very precise parameters of the arm and do not allow any flexibility.

Alternatively, inverse models can be determined through supervised learning. If we define a suitable cost function J, the learning can be reduced to an optimisation problem:

J = ½ ‖y* − y‖²    (5)

where y* is a desired output. A generic supervised learning system is shown in figure 6.

The simplest method of acquiring an inverse model of a system (the plant) is called direct inverse modelling. The idea is to present to a learner various test inputs, observe outputs, and provide the input-output pairs as training data [25], as shown in figure 7.

The cost function is defined as

J = ½ ‖u[n] − û[n]‖²    (6)

where û[n] denotes the estimated controller output at time n. Unfortunately, direct inverse modelling does not converge to correct solutions for nonlinear systems such as robotic arms. The problem is referred to as a convexity problem, because of the non-convexity of areas in the joint space which correspond to the same values in the coordinate space. The one-degree-of-freedom case of the convexity problem is called the “archery problem” [25] - there can be two angles at which a projectile reaches a target.
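The convexity problem can be demonstrated in a few lines (a sketch of mine; link lengths and configurations are arbitrary): for a planar two-link arm, two distinct joint configurations reach the same end-effector position, but their average - the kind of compromise that averaging training pairs in direct inverse modelling produces - does not.

```python
import numpy as np

L1 = L2 = 1.0   # link lengths of a planar two-link arm

def fk(q):
    """Forward kinematics: the many-to-one map from joint angles to position."""
    q1, q2 = q
    return np.array([L1 * np.cos(q1) + L2 * np.cos(q1 + q2),
                     L1 * np.sin(q1) + L2 * np.sin(q1 + q2)])

# An elbow-down configuration and its elbow-up mirror reach the same point.
q_down = np.array([0.3, 1.0])
gamma = np.arctan2(L2 * np.sin(q_down[1]), L1 + L2 * np.cos(q_down[1]))
q_up = np.array([q_down[0] + 2.0 * gamma, -q_down[1]])

p = fk(q_down)
print(np.allclose(fk(q_up), p))          # True: both solve the same target
q_avg = 0.5 * (q_down + q_up)            # averaging the two valid solutions...
print(np.linalg.norm(fk(q_avg) - p))     # ...misses the target (non-convexity)
```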

Feedback error learning is a method of learning a composite control system (figure 8). The error signal used for learning a feedforward controller is now simply the signal from a feedback controller, u_fb[n]. More importantly, the learning error defined in this way allows the controller to avoid the convexity problem.

In contrast to direct inverse modelling, feedback error learning can be used online, since a control signal from a feedback controller will always compensate for the difference between the actual output and the desired output. Additionally, feedback error learning is goal-directed.


Figure 7: The direct inverse modelling approach [25]. Symbols D refer to a one-time-step delay. x̂[n] is the estimated state of the plant at time n.

Figure 8: Feedback error learning uses a signal from a feedback controller to learn a feedforward controller [25].


Instead of just randomly sampling the task space, as is the case in direct inverse modelling, feedback error learning can sample the space with the input goal signal y*[n + 1].
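Feedback error learning can itself be sketched in a few lines (my own toy setting, not taken from [25]: a static scalar plant y = b·u, with all names assumed). The feedback signal is used directly as the training error for the feedforward parameter w, which converges to the true inverse gain 1/b while targets are sampled in a goal-directed way.

```python
import random

random.seed(0)
b = 2.0                  # unknown plant gain: y = b * u
K, eta = 0.5, 0.1        # feedback gain and learning rate
w = 0.0                  # feedforward (inverse model) parameter; ideal: 1/b

for _ in range(2000):
    y_star = random.uniform(-1.0, 1.0)       # goal-directed target sampling
    y_ff_only = b * (w * y_star)             # plant response to feedforward alone
    u_fb = K * (y_star - y_ff_only)          # error-correcting feedback, eq. (3)
    # Feedback error learning: u_fb is the training signal for the
    # feedforward controller, so no separate teacher signal is needed.
    w += eta * u_fb * y_star

print(w)   # close to 1/b = 0.5
```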

As an alternative to supervised learning, reinforcement learning algorithms can be used as well [25]. Reinforcement learning algorithms do not need a performance vector for each point in the task space, but only a scalar evaluation. The common way is to define, for each point in the goal space, a set of possible responses together with an associated probability of selecting each response. If the reinforcement is high, the selection probability is increased; otherwise the probability is decreased. While reinforcement learning algorithms allow delayed evaluation, they are usually slower than supervised methods.
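The scalar-reinforcement scheme just described can be sketched as follows (an illustrative toy of mine: one goal point, three candidate responses, hypothetical reward values): the selection probabilities of well-reinforced responses are raised and the others lowered.

```python
import numpy as np

rng = np.random.default_rng(1)
prefs = np.zeros(3)                       # preferences -> selection probabilities
mean_reward = np.array([0.1, 0.3, 0.9])   # hypothetical scalar evaluations
alpha, baseline = 0.1, 0.0

def probs(p):
    e = np.exp(p - p.max())               # softmax over preferences
    return e / e.sum()

for _ in range(3000):
    pi = probs(prefs)
    a = rng.choice(3, p=pi)                            # sample a response
    r = mean_reward[a] + 0.05 * rng.standard_normal()  # noisy reinforcement
    grad = -pi                                         # d log pi(a) / d prefs ...
    grad[a] += 1.0                                     # ... for the chosen a
    prefs += alpha * (r - baseline) * grad             # raise/lower probabilities
    baseline += 0.05 * (r - baseline)                  # running-average baseline

print(probs(prefs))   # probability mass concentrates on the best response
```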

Learning algorithms

The algorithms introduced so far are very general, and do not restrict a learner to any particular choice. The motor learning problem can be defined as a classification problem, which involves labelling input patterns, or as a regression problem, which involves finding a functional relationship between inputs and outputs - in our case a nonlinear function approximation with high-dimensional input data.

Although neural networks are a common choice, much better results can be obtained by using dedicated function approximators. In particular, locally weighted projection regression (LWPR) [48, 40] is an incremental function approximation algorithm which calculates the prediction y for a point x (mapping f : x → y) as a weighted sum of linear models:

y = ( Σ_{m=1}^{M} w_m y_m ) / ( Σ_{m=1}^{M} w_m )    (7)

where each linear model y_m is centred at a point c_m ∈ ℝⁿ:

y_m = (x − c_m)ᵀ b_m + b_{0,m} = x̃ᵀ β_m    (8)

where x̃ = ((x − c_m)ᵀ, 1)ᵀ and β_m are the parameters of the locally linear model. The region of validity of each local linear model, called a receptive field, is determined by the weights w_m computed from a Gaussian kernel:

w_m = exp( −½ (x − c_m)ᵀ D_m (x − c_m) )    (9)

where D_m is a distance metric that determines the size and shape of region m. In contrast to other methods, LWPR allows for a convenient allocation of resources while dealing with the bias-variance dilemma - a tradeoff between overfitting and oversmoothing. More importantly, since each local model is learnt independently, LWPR directly addresses the negative interference problem - an incremental learning problem of forgetting useful knowledge while learning from new data [40].
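The prediction side of equations (7)-(9) is easy to sketch (my own illustration, not the LWPR implementation: fixed receptive-field centres, a scalar distance metric, and hand-set local models approximating f(u) = u² by its tangent lines; full LWPR learns centres, metrics and parameters incrementally).

```python
import numpy as np

centers = np.array([0.0, 1.0, 2.0])     # receptive-field centres c_m
D = 25.0                                 # scalar distance metric D_m (1-D input)

def predict(x):
    # Eq. (9): Gaussian receptive-field weights.
    w = np.exp(-0.5 * D * (x - centers) ** 2)
    # Local linear models, eq. (8): tangents of f(u) = u*u at each centre.
    y_m = centers ** 2 + 2.0 * centers * (x - centers)
    # Eq. (7): normalised weighted sum of the local predictions.
    return (w * y_m).sum() / w.sum()

print(predict(1.1))   # close to f(1.1) = 1.21
```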

2.3 Multiple pairs of forward and inverse models

Forward and inverse models introduced so far are able to capture only a one-to-one relation between visually perceived and actual end-effector locations⁷ - i.e. a single controller can be trained to achieve all points in space, but only from a single location. This problem can be overcome by introducing multiple parallel models and splitting the input space into regions where particular models can specialise (figure 9). After simultaneous training of all models, the mixture of experts architecture can generalise learnt trajectories onto regions not covered by single experts [16].

⁷ I assume that there is no ambiguity in spatial perception.


Figure 9: The mixture of experts architecture [25]. Each expert network is specialised in a particular region of the input state x. The overall output µ is computed as a normalised sum of outputs µ_n, weighted by mixing coefficients g_n.

The central idea of the MOSAIC model [52, 19] is to split up experience into multiple internal models - pairs of forward and inverse models - so that in each pair a forward model is learnt to predict the behaviour of an associated inverse model⁸ (figure 10).

The prediction x̂^i_{t+1} of the i-th forward model at the next time step t + 1 is given by

x̂^i_{t+1} = φ(w^i_t, x_t, u_t)    (10)

where w^i_t are the parameters of the model (e.g. the weights of a neural network). Denoting the system state by x_t, and assuming that the system dynamics is disturbed by Gaussian noise with standard deviation σ, the likelihood l^i_t of the i-th forward model can be written as [19]:

l^i_t = P(x_t | w^i_t, u_t, i) = (1 / √(2πσ²)) exp( −|x_t − x̂^i_t|² / (2σ²) )    (11)

The responsibility predictor activates suitable modules before any movement is generated. Each predictor η is parametrised by δ^i_t, and produces a prior probability π^i_t based on the contextual signal y_t:

π^i_t = η(δ^i_t, y_t)    (12)

Finally, the responsibility estimator activates modules according to the normalised responsibilities λ^i_t:

λ^i_t = π^i_t l^i_t / Σ_{j=1}^{n} π^j_t l^j_t    (13)
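Equations (11) and (13) reduce to a few lines of arithmetic; the following is my own numerical illustration with scalar states and made-up predictions and priors.

```python
import numpy as np

sigma = 0.1
x_obs = 0.52                             # observed state x_t
x_pred = np.array([0.5, 0.9, 0.1])       # forward-model predictions for x_t
prior = np.array([0.3, 0.4, 0.3])        # priors from eq. (12)

# Eq. (11): Gaussian likelihood of each module given its prediction error.
lik = (np.exp(-(x_obs - x_pred) ** 2 / (2.0 * sigma ** 2))
       / np.sqrt(2.0 * np.pi * sigma ** 2))
# Eq. (13): normalised responsibilities - a posterior over the modules.
lam = prior * lik / (prior * lik).sum()
print(lam)   # module 0, whose prediction is closest, takes almost all the mass
```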

The MOSAIC can operate in the following modes:

⁸ Since the MOSAIC model uses multiple forward models, it does not need a gating network with a single control error.


Figure 10: The MOSAIC model consists of n modules [19], where each inverse model is associated with a corresponding forward model. Modules are trained according to their ability to predict the current state of a system (responsibility estimator).

Action production and learning: Given the contextual signal y_t (determined by a target location, an estimate of an object mass, etc.), the responsibility predictor initiates movement by generating the responsibility predictions π_t for t ≥ 0. Later, the responsibility predictor can be learnt by comparing the responsibility predictions π_t with the responsibility estimates λ_t. At time t > 0, forward models receive an efference copy of motor commands and produce the predicted state x̂_t. The predicted state is then compared to the desired state x_t, yielding the prediction error (x_t − x̂_t). In the learning mode, the prediction error is used to learn forward models, and together with the feedback error u_fb, to learn inverse models (e.g. using feedback error learning, figure 8).

Action observation: During recognition, inverse models produce motor commands which correspond to the observed action. Although in fact all motor commands are inhibited, their efference copy is passed to forward models. At time t − 1, forward models produce the predicted state x̂_t, which is then compared with the observed state x_t at the next time step t, yielding the prediction error (x_t − x̂_t). The prediction error pattern defines the level of certainty that a particular action is being demonstrated.

Imitation: Imitation combines action observation followed by action production.


The HMOSAIC consists of several layers of MOSAIC [51]⁹. Due to the bidirectional interaction of the lower- and higher-level modules during learning and control, the HMOSAIC can learn how to chunk actions into elementary movements (low level), their sequences (mid level) and a symbolic representation (high level). The inputs of the higher-level modules are the responsibilities of the lower-level modules, i.e. the higher-level forward models learn how to predict the posterior probabilities of the lower-level models. On the other hand, the higher-level control models learn the prior probabilities of the lower-level control models, i.e. the higher-level inverse models generate actions for further elaboration on the lower levels.

The HMOSAIC was shown [51] to be capable of progressively learning both the basic movements at the lowest level and their hierarchical temporal order at the higher levels. Because of the tree-like structure, the HMOSAIC can also learn multiple ways to achieve a single target goal [51]. This is a kind of generalisation which allows it to recognise the target goal of an observed action, even though the responsibilities of the lower-level modules may differ significantly.

2.4 Dynamical movement primitives

While a system of pairs of forward and inverse models can potentially learn an arbitrary movement, restricting the possible classes of movements can reduce the search space during learning and action recognition. Dynamic movement primitives (DMP) [22, 42] are defined by systems of nonlinear differential equations, whose time evolution naturally smooths generated trajectories in the kinematic space.

Within the reinforcement learning paradigm, the motor learning problem can be reformulated as a problem of finding a task-specific policy [39, 41]:

u = π(x, α, t) (14)

where x is a desired system state, u is a motor command, and α is an adjustable parameter specific to the policy π (e.g. the weights of a neural network). Learning π quickly becomes intractable for even a small number of dimensions of the state-action space. However, introducing prior information about the policy, e.g. in terms of a desired trajectory, can significantly simplify learning.

To increase flexibility, it seems reasonable to provide a set of trajectory primitives rather than a single desired one. Thus, instead of learning a single policy, a robot should learn a combination of policy primitives [39, 41]:

u = π(x, α, t) = Σ_{k=1}^{K} π_k(x, α_k, t)    (15)

On the other hand, if we rewrite the control policy (14) as a differential equation

ẋ = f(x, α, t)    (16)

the problem of motor control is then entirely “shifted” to the kinematic space, independently of the complex dynamical properties of the entire system. Thus, a motor controller (figure 11, see also figure 5) translates a desired kinematic state of an arm, i.e. locations, velocities and accelerations, into desired torques, using e.g. the computed torque law [31]. Consequently, complex trajectories in the kinematic space can be generalised over the entire space (invariance), since all the nonlinearities due to the dynamics of the system are accommodated by the controller.

Without delving into unnecessary details, dynamical systems can have two types of attractors¹⁰: a fixed point and a limit cycle. The appropriate sets of differential equations [22] generate, respectively:

⁹ A similar architecture was proposed in [23].
¹⁰ I am not dealing here with strange attractors [35] (which have a fractal structure).


Figure 11: The motor controller for dynamic movement primitives [39, 41]. Each DMP can generate trajectories in the kinematic space - i.e. the joint space, or alternatively the task space (with higher nonlinearities for an inverse model to accommodate).

discrete trajectories (discrete DMPs) and rhythmic trajectories (rhythmic DMPs). The shape of the trajectories is controlled by a nonlinear function f, which can be conveniently approximated by the locally linear parametric model (7) introduced in section 2.2. In addition, rhythmic DMPs can be parametrised by an energy level. DMPs can be learnt in two modes [41]:

Imitation learning. Given the spatiotemporal characteristics of a sample trajectory y, locally weighted learning (section 2.2) finds the weights of a function approximator of the nonlinear function f.

Reinforcement learning. A DMP learnt by imitation can be further improved with respect to an optimisation criterion. The Natural Actor-Critic (NAC) [36] is a special stochastic gradient method which injects noise into the control policy in order to avoid suboptimal solutions. A careful design of control policies can allow a humanoid robot to play drums or even walk [41].

The invariance property of DMPs also facilitates their recognition and classification, using the parameters of a function approximator together with, e.g., a nearest-neighbour classifier.
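A minimal discrete DMP can be sketched as follows (my own reduction of the formulation in [22, 42]; the gains, the canonical-system constant and the zero forcing term are assumptions - a forcing term learnt from a demonstration would shape the transient): a critically damped point attractor driven by a phase variable s that decays from 1 to 0, so the trajectory always converges to the goal g.

```python
# Discrete DMP (point attractor), integrated with explicit Euler:
#   tau * v' = K*(g - y) - D*v + (g - y0) * f(s)
#   tau * y' = v
#   tau * s' = -alpha_s * s        (canonical system: phase s goes 1 -> 0)
K, D, alpha_s, tau, dt = 100.0, 20.0, 4.0, 1.0, 0.001  # D = 2*sqrt(K): critical damping
y0, g = 0.0, 1.0

def f(s):
    # Forcing term, normally the locally weighted model of eq. (7) fitted to a
    # demonstrated trajectory; set to zero here for brevity (pure attractor).
    return 0.0

y, v, s = y0, 0.0, 1.0
for _ in range(3000):                    # 3 seconds of simulated time
    v += dt * (K * (g - y) - D * v + (g - y0) * f(s)) / tau
    y += dt * v / tau
    s += dt * (-alpha_s * s) / tau

print(round(y, 3))   # converges smoothly to the goal g = 1.0
```

Because the forcing term is gated by s and by (g − y0), rescaling the goal or the duration rescales the trajectory - the invariance property mentioned above.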

2.5 Affordance learning

Affordance learning is a relatively unexplored subject in robotics. I have chosen the work of Fitzpatrick and Metta [13, 12], which was originally one of my inspirations for the experiments described in this report.

The work was implemented on the humanoid robot Cog, whose arms have six degrees of freedom each [5], controlled by elastic actuators [50]. The experiments described in [13] were carried out in a way which allows "a very natural developmental progression of visual competence" [13].

Perceiving the body in action. The robot learns the motor commands needed to reach a current fixation point [30]. Using optical flow and proprioceptive feedback, the robot learns the appearance and the spatial location of its own arm, waving it in a still environment.

Perceiving actions and actors on objects. The robot learns that there are objects and obstacles in its environment. It experimentally determines the extent of objects, correlating the


motion of an arm and the effects of collisions. All collisions are detected only visually, and objects are segmented using the maximum-flow algorithm [4]. The robot also learns how to perceive a human arm.

Developing mirror neurons. Four actions were used, labelled as: pull in, side tap, push away, and back slap [13]. To simplify recognition, each of the four types of objects has a different colour. The robot interacts with objects and learns their average direction of movement with respect to their principal axis11. Furthermore, the robot is able to repeat an observed action, which requires recognising the action with respect to the robot's own action repertoire.
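The direction-learning step can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual implementation of [13]; the action names, objects and angle samples are invented:

```python
import math

# For each (action, object) pair, accumulate the object's observed
# displacement direction relative to its principal axis (degrees),
# then keep the mean as a crude affordance model.
observations = {
    ("side_tap",  "toy_car"): [85.0, 92.0, 88.0],  # rolls sideways
    ("push_away", "ball"):    [2.0, -3.0, 1.0],    # rolls along the push
}

def mean_direction(angles_deg):
    """Circular mean, so that e.g. 359 and 1 average to 0, not 180."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

affordance_model = {k: mean_direction(v) for k, v in observations.items()}
```

Recognising an observed action then reduces to finding the (action, object) pair whose stored mean direction best matches the observed displacement.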

Comments

I have identified several issues not addressed in the above work, but relevant to this report:

- There is no real development in the understanding of actions and object properties. All actions, object properties and affordances are pre-defined. There are four actions and only one affordance - there is nothing to discover from interactions with objects apart from improving reaching skills and matching objects' colours to movement directions. In effect, there is no real development in perception either12.

- Action understanding is defined in statistical terms, i.e. as an estimated probability of occurrence of movement directions (bottom-up), not as goal-directed control of objects (top-down). In effect, there is no qualitative description of interactions with objects (including goals) which would allow the robot to control its environment.

- The experiments do not go very far, i.e. there is an enormous complexity which can emerge just by playing with the same objects and using the same actions as in [13]. First of all, the behaviour of objects depends on the strength of poking and on the point of contact with an object. It also depends on the type of interaction with an object13. The objects used in the experiments can roll, slide, rotate, etc. Some objects can have more than one affordance. I will try to reveal some of this complexity in section 5.

- Because there is no action understanding, a robot cannot learn from imitation - it will never understand the goal of an action. Consequently, a robot cannot plan actions either. However, a robot can mimic a human's action by matching actions, as was shown in one of the experiments.

3 Affordances and properties of objects

In general, an affordance of an object can be defined as an inherent property of the object which can potentially be exploited by an observer or actor in order to do something. Thus, an affordance depends on who the actor is, or in other words, what sort of actions the actor can perform upon an object. For example, a human can grasp a cup, while a dog cannot. Furthermore, a human can grasp a cup in many different ways, e.g. using the handle. Thus, from the actor's point of view, an object can possess many affordances, depending on the actor's knowledge about the object or simply on the actions associated with it. Furthermore, these associated actions depend on the actor's goal, so that when pursuing the goal an actor will choose actions relevant to it.

11 However, a robot cannot find, e.g., the difference between rolling a ball and sliding a cube (different affordances).
12 In particular, segmentation of objects achieved by poking does not help at all in further object recognition and categorisation (e.g. by creating a model of their boundaries, shape, etc.), as one would expect from learning.
13 Short with no reactive control - poking; long with feedback control - pushing.


The question is: do we remember all the actions which we can perform upon all objects? Of course, we do not. In order to keep our brains at a reasonable size, as well as to increase our robustness, it seems that nature uses at least two tricks. Simplifying:

Top-down mechanism. All objects are classified according to those of their properties which are relevant to our repertoire of actions. This is a generalisation which allows us to plan actions efficiently. For instance, to vacuum a carpet one needs a vacuum cleaner, while completely ignoring most of its features.

Bottom-up mechanism. We tend to perceive only those object properties which are relevant to our current goal. The kinds of objects and properties we perceive are mediated by our attention mechanism, which keeps a balance between our ability to complete the task and our ability to notice potentially relevant but unexpected events (including object affordances).

Let us now consider a variation of the toy world, where the life of all living species comes down to acting upon objects' properties. Knowing all actions and properties of objects, learning could then be reduced to filling out a matrix representing the mapping between object properties and actions. In order to build an artificial creature one also needs to consider (expected) action results, provide a mechanism for goal generation, etc. Unfortunately, the above simple scenario is not that optimistic:

1. Whole vs. part. It is difficult to find all "basic" object properties and "primitive" actions, provided, of course, that any exist.

2. Quality. Even if we knew all object properties and actions, they usually depend on each other in a very complex way. Their combination can both introduce qualitatively new properties/actions and/or cancel existing properties/actions. For example, a handle of a mug (including a hole, a rim, etc.) is created by adding a "bent joint"; a computer consists of millions of semiconductors, wires, etc.

3. Quantity. Object properties and actions14 are not "discrete entities". They all depend on each other in a structural or spatial way. For example, a handle of a mug makes sense only if it is attached to the side of the mug; a net is a rope properly tied together; Van Gogh's Sunflower paintings were created as (unique) effects of combinations of paint.

4. Time. Object properties and actions may change in time. Their mutual dependency may introduce (or suppress) novel (inherent) behaviours. For example, thanks to the angular momentum of the wheels (gyroscopic effect) one can ride a bike, and aeroplanes can fly because of specific wing profiles.

5. Finally, points 2) to 4) can also be applied to different properties in different objects. For instance, a composition of stones in a mountain stream may allow us to cross the stream without getting wet.
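The naive matrix picture introduced above - a mapping from object properties and actions to expected results - can be sketched as a simple lookup structure. All property, action and result names below are invented for illustration:

```python
# A naive "affordance matrix": rows are object properties, columns are
# actions, entries are expected results, filled in by playing with
# objects. The complications 1-5 above are exactly what this structure
# fails to capture (interactions between properties, time, etc.).
expected_result = {
    "rounded":    {"push": "rolls",  "poke": "rolls",   "drag": "rolls"},
    "flat_side":  {"push": "slides", "poke": "tips",    "drag": "slides"},
    "has_handle": {"push": "slides", "poke": "rotates", "drag": "slides"},
}

def predict(object_properties, action):
    """Predict the outcomes of an action from an object's known properties."""
    return {p: expected_result[p][action]
            for p in object_properties if p in expected_result}

# A cylinder lying on its side is rounded, so pushing should roll it.
outcome = predict({"rounded"}, "push")  # {'rounded': 'rolls'}
```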

Naturally, the above "constructive" classification of object properties and actions is only one of many possible abstractions15. There is still (at least) one more complication. I have deliberately not used the word affordance in place of property, since objects can have properties which are sometimes hardly related to any affordances, though they can be very useful in object identification and/or categorisation. An example here can be colour or texture. However, for instance, a number written on a weight can also indicate its "hidden" affordance. Alternatively,

14 So, in consequence, affordances.
15 It is just a matter of usefulness.


one can think of affordances as more abstract entities, like those found in mathematics, games (e.g. chess) or even in social life. Regarding the experiments I am going to carry out, I will be identifying object affordances with rather shape-like properties.

Affordances of a single object clearly form a hierarchy, usually according to their salience - for example, a newspaper can also be used as a mat, a parasol, or to kindle a fire. However, infants during development discover only a small subset of them, which corresponds roughly to their repertoire of actions. Affordances together with actions (and their expectations) form an "affordance space" which constantly grows during learning. Observed objects can then be described as regions in this space. This is a generalisation procedure which reduces each object to a collection of affordances.

The process of discovering affordances can be quite complex. For example, a child playing with a toy carriage may not have understood how a hook really works, though she/he knows roughly what it looks like and what the effect of using it is. Considering temporal aspects of affordances, e.g. the expected result of exploiting any particular affordance, it is also easier to describe their multiple effects. For example, the function of the handle of a cup might be quite difficult to recognise for a small child, until she/he tries to grasp a cup with hot tea inside.

An interesting emerging issue is the identity of objects16. As I mentioned before, many properties of objects need not be related to any known affordance; nevertheless, this may change during further learning. For instance, a robot may have noticed before that certain objects are rounded, but did not realise the affordance related to this (they roll). Keeping some (salient and similar) object properties in short-term memory may significantly speed up the learning process, especially generalisation.

3.1 Structure, primitives, learning

The spatial structure of an object (including its components) is a major factor determining its affordances. Because humans (like other animals) can perceive only 2D projections of 3D objects, they must have developed a way to reconstruct, at least partially, their 3D structure. There have been many attempts in computer vision to understand this process - most notably low-level structure from motion [18], and various high-level geometric models like generalised cylinders or ribbons [14]. Without delving into details, neither of them links a spatial model of an object (or its part) with the kinds of actions which are possible to perform upon it and which are not. There are, e.g., aspect graphs, which describe qualitative changes of an object's appearance caused by its rotation or, equivalently, by viewpoint variation [14]. Unfortunately, even the simplest objects can have extremely complicated aspect graphs. In the toy world, a cylinder, a cone and a ball can all be rolled, but a pen cannot (because of its clip). These abstraction levels are also necessary to categorise objects with respect to action results - for instance, a cup, a jar, a bowl and a bottle can all be used as water containers, which is not true of a flowerpot. Therefore, especially from a developmental learning viewpoint, it is important to recognise object features and their configurations which make certain actions possible, while others do not.
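The rolling example above can be made concrete with a tiny sketch: it is a configuration of features, not any single feature, that decides the affordance. The feature names and shape entries are invented for illustration:

```python
# Toy-world shapes described by two invented features: a round cross
# section enables rolling, a protrusion (e.g. a pen's clip) cancels it.
shapes = {
    "cylinder": {"round_cross_section": True,  "protrusion": False},
    "cone":     {"round_cross_section": True,  "protrusion": False},
    "ball":     {"round_cross_section": True,  "protrusion": False},
    "pen":      {"round_cross_section": True,  "protrusion": True},
}

def can_roll(features):
    """An affordance predicted from a configuration of shape features."""
    return features["round_cross_section"] and not features["protrusion"]

rollable = [name for name, f in shapes.items() if can_roll(f)]
# rollable == ['cylinder', 'cone', 'ball'] - the pen is excluded
```

The hard developmental problem, of course, is discovering which features and which combinations matter, rather than having them listed in advance.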

The above considerations lead to the idea of basic object properties comprising object affordances, similar to primitive actions or movement primitives in neuroscience [3]. Support comes also from the fact that affordances are usually spatially entangled, so that in general it is impossible to formulate any useful "language" limited only to affordances as its vocabulary (including any spatial parameters). While affordances and object properties are learnt and discovered in a rather top-down manner, a kind of reverse-engineering procedure must be carried out to find these properties. This is an extremely difficult process, since the number of possible shape

16 The influence of uniqueness on cultural development is an interesting philosophical problem, naturally beyond the scope of this report.


primitives should be as small and as expressive as possible, while being compatible with the body of a robot. It seems rather unclear how to find the right balance here, i.e. how to be neither too under-expressive nor too over-expressive. For instance, it makes no sense to create a specific set of properties describing screws and nuts if a robot has too few degrees of freedom to apply a suitable torque to rotate a screw.

The spatial structure of objects and their affordances are probably among the most complex aspects of our life. It seems that humans possess an extraordinary capability to perceive these structures, but only up to the level which affects their current goals, i.e. at different abstraction levels. This perception changes depending on age and culture - i.e. on manual skills and knowledge. Moreover, while our civilisation has made enormous progress over the last few tens of thousands of years, our brains have not changed much (if at all). One cannot resist the impression that we all must have some kind of universal and powerful learning mechanism, perfectly tuneable by social learning.

4 Social learning in psychology

I refer to emulation as the action performed by a learner which reproduces the observed (by the learner) object movements, while imitation also involves copying the (for the learner) meaningless body movements. The above definitions are slightly different from those found in the literature (e.g. [8, 20, 47]). I explicitly refer to the learner's and the teacher's viewpoints, and I also extend the definitions to more complex movements, so that they are still fairly complete descriptions of the body and object movements when copying actions. In particular, I consider a complex action to be imitation if any part of the action involves body movements that are meaningless for the learner.

Following [47], intention is the action plan which a teacher chooses and commits to in pursuit of a goal, so that intention includes both the plan of the action (means) and the goal of the teacher. The problem is that the goal chosen by a teacher is an abstract entity which need not be understood by a learner. Moreover, there is a discrepancy between the intended and the actual outcome of the action. For example, from the teacher's point of view, opening a box can result in success, failure or accident, and each of them can be understood by a learner as the actual goal of the teacher. Unfortunately, the intention of a teacher can also be more abstract - not just opening a box, but, for instance, turning someone's attention to it.

To distinguish between the goal of an action and the whole plan of an action (plus means), psychologists introduced desire [47] - the goal of an action without the involvement of a plan. Much recent research in psychology focuses on infants' ability to differentiate between desires and intentions [47].

4.1 Action understanding

In general, action understanding can be defined as a learner's ability to correctly recognise the intention of a teacher. Thus, for instance, even supposing a teacher fails to open a box, a learner will still try to open it. There is yet another complication here - why should a learner (e.g. an infant) try to open the box instead of just imitating the teacher's failure to open it? One explanation could be that a teacher never fails and her/his actions are perfectly transparent, but this does not seem to be sufficient. It is known that infants react from birth to various other means of communication, like gaze direction or emotional reactions. Indeed, it has been shown that 14-18 month-old infants are more likely to copy intentional actions rather than accidental ones when they were marked vocally as intentional ("there!") or accidental ("woops!") [6]. Interestingly, infants can not only discriminate between success and failure of an action (trying), as was shown in experiments with habituation to a "jumping dot" [9], but also between success and accident of an


action (accidents). In addition, 9 month-old infants can discriminate between "unwilling" and "unable" reasons for an action accident [47].

According to Tomasello, three levels of action understanding can be distinguished in children [47]:

Acting animately. A learner perceives the difference between animate, self-produced motion of a teacher and inanimate, non-autonomous motion. There is neither goal nor plan understanding. Nonetheless, infants are able to recognise a human face from birth, and soon after they tend to look in the same direction as other persons.

Pursuing goals. A learner understands the goal of a teacher, as well as the result of an action - success, failure or accident. 10 month-old infants look at an adult's face when she/he teases them with a toy [47]. Infants recognise that an adult sees things which are related to her/his goal.

Choosing plans. A learner understands not only the goal and the plan of a teacher, but also the possibility that a teacher may have multiple plans. At this stage infants know which objects an adult attends to in order to pursue her/his goal, i.e. they are able to predict what an adult will do when observing her/his activity.

It also seems that infants learn causality by acquiring first a kind of "spatial understanding" at level 2 (where the goal is), and only later a "temporal understanding" at level 3 (what to expect next - prediction).

4.2 Social interaction

How is any interaction between two individuals then possible? This is not a trivial problem, since adults already possess a well-developed attention mechanism [32] which filters our perception in both bottom-up (salient features, motion) and top-down (task-relevant features) ways. Furthermore, according to the "world as an outside memory" hypothesis [34, 33], which claims that we do not have picture-like memory, the attention mechanism might then serve as a kind of associative memory controller17. The process of development of this mechanism can be better understood from a two-person perspective, i.e. how shared intentionality develops through collaborative interaction.

Tomasello distinguishes three participation levels [47], which correspond directly to the three levels of action understanding presented in the last section:

Dyadic engagement. Sharing behaviour and emotions. Infants engage in protoconversations, sharing with adults not only looks and smiles (vision) but also touch and voice. It is argued [46] that protoconversation may involve the exchange of emotions between different modalities: e.g. if an adult expresses happiness facially, an infant responds vocally. A shift of the infants' focus of attention is visible, from themselves only (at birth) to the outside world later on, which however does not involve objects but other humans (dyadic engagement).

Triadic engagement. Sharing goals and perception. The infants' focus of attention can now be shifted towards objects in the outside world. Interestingly, human infants demonstrate much greater interest in playing the joint game than nonhuman ones, handing objects to an adult or even suggesting new goals [47].

Collaborative engagement. Joint intentions and attention. Infants know what objects adults choose to attend to (their plans), and, as was shown in [38], from about 14 months infants

17Stabilising the dynamically changing attractor through eye saccades.


understand their own role in the joint social game, prompting a reluctant adult to reengage and even taking a turn for her/him. Moreover, they understand pointing gestures18 - they even try to actively establish joint attention through declarative pointing (see [47]).

4.3 Knowledge transfer and the learning problem

Even though it is not yet entirely clear what the critical elements are which make humans different from chimpanzees (in psychology)19, we are all adapted to receive and transfer knowledge. The pedagogy hypothesis [8] states that the ability to teach is a primary one - independent of, and even earlier than, the ability to attribute mental states. The main evidence comes from experiments which show that certain innate infant behaviours are clearly either suboptimal or unnecessary (but they do exist):

1. Neonates can recognise faces, but only upright ones - they are looking not just for faces but for potential teachers (emotional contact).

2. Infants can follow the gaze direction of adults, but they are very inaccurate in localising the target object.

3. Infants imitate even though they know simpler means to achieve the same goal. Infants imitated the behaviour of an adult by touching a box with their heads, although they knew simpler means to light up the box [29].

This suggests that the function of these behaviours is to "bootstrap" the learning process, i.e. infants are sensitive to certain stimuli, have certain tendencies, and are able to extract only "compatible" information.

Pedagogy is a general framework for social learning which explicitly assumes the manifestation of knowledge by a teacher, and the interpretation of this manifestation, in terms of the contained knowledge, by a learner. Its minimum requirements are:

1. An ostension, which signals not only that a teacher is about to transfer knowledge, but also the very fact of knowledge manifestation - the knowledge transfer is costly.

2. A reference, which specifies, through assigning the referent, the scope of the knowledge to be transferred - the knowledge can be outside the current context. The simplest form of assigning a referent can be a gaze-shift or pointing.

3. A relevance, which requires a teacher to address the problem of ambiguity of the transferred knowledge, i.e. the knowledge should be relevant and novel to the learner.

From the systems theory point of view, we can describe an agent as a system which has an input (sensors), an output (actuators) and an internal state. The forward problem is the problem of estimating the output knowing the internal state. Conversely, the inverse problem is the problem of estimating the internal state given the output of a system. Thus, a learner has to solve the inverse problem, which can be very difficult - the behaviour of a teacher can be explained in a potentially infinite number of ways. A teacher, however, has to solve not only the inverse and the forward problems, but also has to analyse the current knowledge of a learner in order to prepare suitable content for the transferred knowledge (e.g. maximising the overall learner's fitness). Therefore, it is claimed [27] that teaching requires metacognitive access to one's own knowledge - the teacher needs to create a metarepresentation of self-knowledge20.
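The forward/inverse asymmetry can be illustrated with a toy sketch (all states and outputs below are invented): the teacher's goal is part of the hidden internal state, so several different intentions can produce the same observable behaviour, which is exactly why the learner's inverse problem is ill-posed.

```python
def forward(internal_state):
    """Forward problem: map a hidden (goal, plan) state to its output.

    Only the body movement (the plan) is observable; the goal is not.
    """
    goal, plan = internal_state
    return plan

def inverse(observed_output, candidate_states):
    """Inverse problem: find all internal states consistent with the
    observed output. In general this returns more than one state."""
    return [s for s in candidate_states if forward(s) == observed_output]

candidates = [
    ("open_box",      "lift_lid"),
    ("get_attention", "lift_lid"),    # same movement, different goal
    ("open_box",      "pry_corner"),
]
explanations = inverse("lift_lid", candidates)
# Two distinct intentions explain the same observation.
```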

18 They can be thought of as "remnants" of unfinished actions, which requires the capacity to predict their imaginary consequences.

19 It is pointed out in [47] that humankind is the only species which has developed different roles for acting in a group, e.g. when defending against predators or during hunting.

20In our case a robot is only a learner.


Figure 12: The imitation mechanism as described in associative sequence learning theory (ASL) consists of a horizontal sequence of sensory-motor associations.

4.4 Models and functions of imitation

While gaze direction or pointing gestures are examples of a teacher's cues, imitation is related inherently to a learner. Imitation requires the capability to perform perceptual-motor translation, i.e. a learner must be able to match exteroceptive stimuli with her/his corresponding motor activities. However, experiments have shown that infants are able to match an adult's facial expressions, e.g. tongue protrusion (an opaque movement), which give only proprioceptive feedback. A few theories have been proposed to explain the problem of transparent and opaque movements. In particular, associative sequence learning theory (ASL) [20] addresses the fact that the imitation mechanism is highly experience-dependent, and explains the "chameleon effect" - an example of unintentional imitation21.

Vertical sensory-motor associations can encode innate movements (smiling, yawning, etc.), as well as novel ones arising from learning sensory-motor correlations. According to ASL, sensory-motor associations can be activated by both sensory-dependent and sensory-independent stimuli (e.g. the chameleon effect). Furthermore, the activation caused by action observation can always be inhibited by intentional actions. More importantly, ASL is similar to many state-of-the-art robotic imitation frameworks.

The imitation deficit in autism [21] suggests the importance of imitation in human development, but there is still no common agreement as to its function. For instance, according to [2], during imitation of transitive actions (supposedly meaningless), an infant can simultaneously acquire the first- and the third-person experience, which is necessary to distinguish between its own and the third-person perspectives.

Imitation allows a learner to self-experience the state of a teacher, even though the goal of an action may not be clear at the time of performance. Moreover, there may be no goal at all, as in the experiment with lighting up a box, or the goal can be clear while the plan (means) is unknown - e.g. a child trying to connect two toy carriages22. In this context, performance of remembered "imitation patterns" (or their various combinations) can contribute to further understanding of the performed actions (intentions)23. Furthermore, imitation performed by a learner is an important feedback signal, which enables a teacher to further refine the transferred knowledge.

21 The chameleon effect is the unconscious imitation of facial expressions, gestures and mannerisms, which frequently occurs in human adults.

22 See introduction.
23 By slightly self-modifying them in order to match the observed result.


4.5 Action understanding and imitation in neurobiology

The discovery of mirror neurons in the monkey's brain renewed the discussion about the role of imitation in action understanding. In neurobiology, action understanding is defined as the capacity of a brain to internally describe an action in a way allowing its later correct reproduction [37]. Additionally, an action can have a goal only as a physical target - the trajectory endpoint. Two hypotheses were proposed to explain how action understanding occurs [20]. The visual hypothesis says that action understanding is achieved by analysing the visual stimuli with no motor involvement, while the direct-matching hypothesis claims that there is a direct mapping between the observed visual stimuli and the corresponding motor representation of the action.

Mirror neurons were originally found in the ventral premotor cortex of monkeys, in the so-called area F524. Mirror neurons code goal-directed motor acts and discharge (activate) both when the monkey performs a particular action and when it observes another individual (including humans) performing it. The response of the neurons varies and can sometimes be very strict, and sometimes less congruent, which can be interpreted as a kind of action generalisation [37]. Mirror neurons do not discharge when the experimenter uses pliers to grasp an object [37]. This implies a strict correspondence between the end-effectors of the monkey and the experimenter. Furthermore, more than half of mirror neurons also discharge during experiments where the target of a movement is hidden [37]. This may suggest that the monkey knew the meaning of the action despite the hidden conditions.

There is also a group of STSa neurons [37] which respond only to passive action observation but not during active movement. Similarly to mirror neurons, there seems to be a qualitative difference depending on whether the action is goal-directed or not. Interestingly, STSa neurons do not respond if the experimenter performs a reaching action while looking away from the intended target [37]. The gaze direction is a strong indicator of the intended target, and plays a crucial role in social interaction (shared attention [20, 47]). This is strong support for the visual hypothesis. On the other hand, it is absolutely not obvious how the complex response patterns of STSa neurons might have emerged without any "validation", as is the case with mirror neurons [37]25.

Though the areas F5 and STSa are not directly connected, both are connected to the inferior parietal lobule - area PF. PF contains neurons which respond selectively to observed actions, including reaching, placing, grasping, holding and bimanual interaction [37]. Some of them respond to single actions, others to pairs of actions like grasping and releasing. More importantly, about two-thirds of them also discharge during action execution, and for this reason they are called PF mirror neurons.

Furthermore, STSa is part of a neuronal circuit that contains the amygdala and the orbitofrontal cortex. This circuit is probably involved in social interactions - e.g. neurons in the amygdala are strongly responsive to threatening facial expressions [37].

To summarise, if an action requires action understanding, the motor neurons coding the action discharge (also in the PF area); otherwise other regions become active. If the stimuli contain an emotional context, the amygdala gets activated. It was shown [37] that if the action was known, the activation in all premotor areas is significantly stronger than when the subjects merely imitate it without understanding.

Another interesting finding [37] suggests that goal-directed actions (motor acts) have their natural "atomic" representation, whilst their sequences (motor actions) - such as reaching for food, grasping it, bringing it to the mouth - do not.

24 F5 also contains so-called motor neurons and neurons which discharge only in response to visual stimuli, like the presentation of 3D structures [37].

25“Motor feedback”.


5 An artificial toy world - scenarios of experiments

An artificial toy-world is a closed world, limited to a certain set of actions, objects and communication signals for both a robot and a teacher. Despite these limitations, the toy world is hopefully capable of capturing the most important and basic issues of the complexity of object manipulation in the real, unconstrained world. Although there is a continuum of possible time-changing configurations of a robotic arm and objects on the scene, there exists only a finite number of different types of interactions which we perceive as qualitatively different. They can be conveniently depicted by displays - sequences of action snapshots used to catalogue perceptual causality phenomena [43]. While displays certainly do not form an exhaustive description of what can actually happen in the real world, they may suggest a few interesting things.

Toy-world conjectures

1. There is only a limited number of qualitatively different26 scenarios in the toy-world so defined.

2. They can be sorted according to the level of difficulty, or the competence a robot should have in order to understand them.

3. The toy-world is complete in the sense of a limited number of possible types of scenarios.

The types of scenarios of experiments, and especially their organisation and hierarchy, are not only linked to the cognition problem, but are also closely related to the representations and architecture of a robot (and vice versa). Therefore, to avoid the chicken-and-egg problem, I have assumed several constraints concerning the algorithm as well.

General experiment prerequisites

1. All movements of objects are practically limited to 2D space, as a consequence of the robot's ability only to push/drag and poke on a flat surface. In the basic version, the robotic arm can move only in 2D, although I am planning to relax this assumption27.

2. Although the arm control is limited to kinematics, an elastic finger equipped with force/torque sensors offers sufficient manipulation flexibility. This is an approximation of manipulation with light objects, which do not influence the arm's trajectories in a direct way. It seems to be a reasonable compromise, given the main goal of this research28.

3. A robot playground is a flat and smooth surface, with constant, uniform and non-zero coefficients of friction (static and kinetic), enabling a suitably rich and diverse interaction between objects and a robotic arm. I am also considering introducing some colour landmarks (fields).

4. Objects are the main factor regulating the overall complexity of the toy-world. In general, I do not assume any particular restrictions here, apart from the following: objects have to be manipulable rigid bodies (not too big, small, heavy, etc.), must have uniform colours, and all arising interaction complexity must be uniquely mappable onto their (shape) structure and/or their colour29.

26 In the sense of possible displays.
27 This step may require providing stereo vision.
28 There is an ongoing discussion about the role of dynamical properties of a robot's body in understanding the spatial structure and causality [44].
29 Even this assumption can be slightly relaxed during experiments.


General algorithm prerequisites

1. It is a learning algorithm, i.e. it implies that manipulation competence can be achieved only by trial-and-error practice. This requires a robot to play with objects in a way depending on its current manipulation skills and knowledge about objects.

2. It is an incremental algorithm, i.e. it requires a robot to learn in an incremental fashion, so that each next chunk of knowledge must be "learnable". As a result, the learning process can be controlled by the amount of a robot's innate knowledge, so it can begin at a different level of competence, i.e. not necessarily from scratch. For instance, the simplest scenario can deal only with simple straight movements, or object poking. This is also a way of incrementally building and testing the entire system.

3. Exteroceptive input involves vision. I assume idealised conditions - no shadows, no reflections, an unambiguous environment and objects, etc. In the basic version, I am considering only a single, fixed, wide-angle, colour camera (simulated fovea). Alternatively, I will use two cameras (wide and narrow angle) mounted on an additional robotic arm. This step may turn out to be necessary to learn space invariance.

4. Proprioceptive input involves force/torque sensors.

5. Optionally, for evaluation feedback and for diagnostic purposes, I am considering an additional communication channel between a robot and a human teacher.

Equipment

1. Katana 6M robotic arm [1] with integrated PID controller. In the basic version the end-effector is equipped with an elastic finger with regulated stiffness (e.g. with elastic tendons) and force/torque sensors, as well as with an LED to disambiguate the location of the finger.

2. A human teacher manually controls an end-effector similar to the one mounted on the arm (with an LED in a different colour). Alternatively, the robotic arm is controlled with a joystick.

3. In the basic version, a FireWire camera with a frame rate of at least 30 fps (allowing the system to capture some subtleties of real-world interaction) and a resolution of at least 1024 by 768 pixels (to cover the whole scene with reasonable resolution). Alternatively, two cameras (wide and narrow angle) with a frame rate of 30 fps and a resolution of 640 by 480 pixels, mounted on an additional robotic arm (e.g. Lynx).

Although I am planning to use many different objects with various shapes and colours, in the basic version there are four objects in the same uniform colour. The objects' proportions are chosen in such a way that a robot is not able to knock them over, i.e. it will always be possible to return to the initial configuration of the objects on the playground30.

Types of objects

Regarding the above conjectures and prerequisites, I have selected three general classes of scenarios, sorted according to the level of understanding of the laws of dynamics. The scenario ordering (especially of sub-scenarios) and the ways a robot acquires new knowledge are only hypothetical31, and depend on various factors, in particular on the learning algorithm itself.

30 All actions are reversible.
31 I do not mean to follow any psychological findings here.


Figure 13: Basic set of objects used in experiments: cuboid, cylinder, ball and truncated cone.

5.1 Moving arm (no objects)

Scenario sketch

This scenario involves a robot showing the ability to imitate a teacher's movements. A robot is able to roughly replicate the trajectory location, shape and orientation, but without any understanding (imitation).

Scenario ontology

Actors: a robot and the human teacher.

Trajectories: straight lines and curves with a bell-shaped velocity profile.

Actions: the trajectory in a free space.

Goals: the trajectory itself.

Preferences: an overall activity location (the equilibrium point).

Basic relations: distance between the end-effector and the playground boundary (a robot's reach) or the equilibrium point (the preferable area on the playground).

Robot learning

• Inverse kinematics (mapping from the workspace to the joint space).

• To improve competence in imitation/self-modification/self-generation of trajectories in the workspace (through self-exploration).

• A correspondence between a teacher's and a robot's frames of reference (location and orientation).

• To localise the playground boundaries and the equilibrium point.

• Communication signals between a robot and a teacher (ostension and reference), necessary to recognise the beginning and the end of a teacher's demonstration.
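For an idealised two-link planar arm, the inverse-kinematics mapping mentioned above has a closed-form solution. The sketch below is only an illustration of the workspace-to-joint-space mapping a robot would have to learn; the link lengths and target coordinates are hypothetical, not the Katana's actual geometry.

```python
import math

def ik_2link(x, y, l1=0.25, l2=0.20):
    """Closed-form inverse kinematics for a 2-link planar arm.

    Maps a workspace point (x, y) to joint angles (theta1, theta2),
    choosing the 'elbow-up' branch. Raises ValueError if the target
    is out of reach.
    """
    r2 = x * x + y * y
    # Law of cosines gives the elbow angle.
    c2 = (r2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target out of reach")
    theta2 = math.acos(c2)                           # elbow
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)   # shoulder
    return theta1, theta2

def fk_2link(theta1, theta2, l1=0.25, l2=0.20):
    """Forward kinematics, used here only to verify the inverse."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

t1, t2 = ik_2link(0.3, 0.2)
print(fk_2link(t1, t2))  # approximately (0.3, 0.2)
```

In the report's setting this mapping would of course be learnt through self-exploration rather than given analytically; the closed form merely shows what the learnt mapping has to approximate.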


Teacher responsibilities

• Presenting various types of arm trajectories (modifying their shape, orientation and location in space) - straight lines and curves.

Scenario displays

The scenario display 1: A robot is required to imitate a movement of a teacher - a straight trajectory with a bell-shaped velocity profile. A robot guided by curiosity tries to modify learnt trajectories, as well as to create new ones. The interestingness [7] mechanism redirects a robot to explore trajectories which may seem to be e.g. very different, or difficult to generate because of the playground boundaries.
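A common model of such bell-shaped profiles is the minimum-jerk trajectory from the motor-control literature; the sketch below assumes a straight, one-dimensional point-to-point movement, and the start, goal and duration values are purely illustrative.

```python
def min_jerk(x0, xf, T, t):
    """Minimum-jerk position and velocity at time t for a straight
    point-to-point movement of duration T (Flash & Hogan, 1985)."""
    tau = t / T
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5          # normalised position
    v = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T   # normalised velocity
    return x0 + (xf - x0) * s, (xf - x0) * v

# The speed is zero at both ends and peaks mid-movement: a bell shape.
speeds = [min_jerk(0.0, 0.4, 1.0, 0.1 * i)[1] for i in range(11)]
print(speeds[0], max(speeds), speeds[-1])  # zero at the ends, peak at t = 0.5
```

The same profile works for straight lines and, applied per coordinate, for curves, which matches the trajectory ontology above.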

5.2 Poking objects (first contact)

Scenario sketch

This scenario involves a robot showing the ability to poke/prod objects. A robot understands the goal of the reaching action as the target object itself. When copying an observed reaching action, a robot is able to roughly replicate the location, shape and orientation of a trajectory by comparing its own and a teacher's frames of reference. A robot understands the fact of poking (and not poking) a target object, and can successfully replicate it from a demonstration. A robot understands that objects can be out of reach, but does not relate this to its own actions.

A robot also knows how to reach an object from any direction, also in proximity of the playground boundaries. This requires trajectory planning, however without considering any possible collisions.

Scenario ontology

Actors: a robot and the human teacher.

Objects: the default objects' list (a robot cannot recognise particular objects yet).

Locations of objects related to: the arm (relative), the playground boundaries/the equilibrium point (absolute).

Trajectories: straight lines and curves with a bell-shaped velocity profile.

Actions: the trajectory in a free space, the reaching action (with a target object).

Goals: the trajectory itself (goal-undirected actions), the target object (proximal and target-directed actions, pointing actions), the playground boundary, the equilibrium point.


Preferences: a trajectory type (location, shape, orientation), an overall activity location (the equilibrium point).

Basic relations: distance between the end-effector and an object/playground boundary/equilibrium point, object hit (binary).

Robot learning

• To understand that objects have boundaries: persistence of objects in space, tactile sensations.

• To improve competence in imitation/self-modification/self-generation of object-related trajectories, especially in the proximity of the playground boundaries.

• To understand that some objects can be out of reach.

Teacher responsibilities

• Presenting various types of the reaching action by changing the types of trajectories (shape and orientation) and the types of objects involved.

• Using suitable feedback to highlight the cases which were not understood or explored properly by a robot.

Scenario displays

The scenario display 1: The scenario for the reaching movements is the same for all object types, since a robot does not yet understand their properties/affordances. To avoid e.g. premature convergence of the learning algorithm, a teacher can choose objects randomly while performing different kinds of reaching actions. As a result, a robot perceives objects' properties as a random pattern, and focuses mostly on trajectory imitation. A robot also learns how to generate alternative goal-directed trajectories in cases where an object is close to the playground boundaries, or simply to prefer trajectories close to the equilibrium point.

The negative scenario display 2: Certain trajectories which miss "a target object", in conjunction with a suitable feedback signal from a teacher, allow a robot to elaborate on the notion of goal-directed actions. Namely, there exist three classes of actions in the presence of an object: the target-directed actions, the target-undirected actions (just trajectories in a free workspace), and the "proximal actions", which can later be extended to pointing actions.


5.3 Poking objects (understanding shape and kinematics of objects, understanding consequences of actions)

Scenario sketch

This scenario involves a robot showing the ability to poke/prod objects exploiting their (salient) affordances. A robot understands that each object exhibits different behaviours - it knows that an object can have multiple affordances. For instance, a cylinder can be rolled, rotated and moved (without rolling) along its long axis. A robot knows that some objects' affordances are similar but not identical. For instance, a cylinder and a cuboid can both be moved along their long axes (but rolling is a salient affordance of a cylinder), and a cylinder and a truncated cone can both be rolled (but the results are different). In order to represent these cases, a robot classifies affordances according to their salience and interestingness.

A robot begins to generalise: it knows that some object features (their spatial structure) make an object behave in a specific way (i.e. rounded-like objects tend to roll, or long objects can rotate or can be poked in certain ways). A robot tries to create the mapping between an object's shape structure and its corresponding behaviours, but only in a statistical sense, i.e. a robot knows that by poking an object in some places it can achieve statistically reproducible behaviour, and knows how to enhance it. The robot's ability to understand object affordances comes together with understanding the consequences of its own actions. Although the first "symptoms" of action understanding might have appeared in the previous scenario32, at this point a robot knows that poked objects tend to move in certain directions, as well as that they can be pushed out of the playground boundaries. This means that the goal of an action may no longer be associated only with an object or its affordance, but also with a desired location - a "virtual entity" which lies beyond the object itself. Unfortunately, a robot does not yet know a reliable way to control the movement direction, even though while playing with a ball a robot discovers the rule (of inversion) that to move an object in any desired direction it is enough to poke it in this direction.

A robot understands that contact points may not be accessible in a straight line from the current arm location, so it knows how to plan a suitable trajectory. However, a robot does not yet understand planning on an action level, e.g. how to plan a sequence of actions to move an object to an arbitrary location.

Scenario ontology

Actors: a robot and the human teacher.

Objects: the default objects’ list.

Locations of objects: the arm (relative), the playground boundaries/the equilibrium point (absolute).

32 “Something is happening after my action”.


Locations of contact points along with their affordances (relative to the object reference frame).

Trajectories: curves with a bell-shaped velocity profile.

Actions: the trajectory in a free space, the reaching action (with a target object as a whole or a particular contact point).

Goals: the trajectory itself (goal-undirected actions), the target object (proximal actions, pointing actions, target-directed actions), the playground boundary, the equilibrium point, exploitation of any particular object affordance, obtaining any particular object behaviour (e.g. rotation, translation).

Preferences: the trajectory type (shape, orientation, length), an overall activity location (the equilibrium point), an enhancement level of any exploited affordance (controlled by a hit strength and by deviation from a contact point).

Basic relations: distance between the end-effector and the object/playground boundary/equilibrium point/contact point.

Robot learning

• To understand that objects have boundaries: persistence of objects in space and time (motion), tactile sensations.

• To understand the frame of reference of the object (location and orientation) and the map of contact points.

• To discover the fact that when trying to reach an object, the arm can hit another object on the way.

• To improve competence in reaching the object contact points using various self-modified/self-generated trajectories.

• A relation between objects' shape features and corresponding affordances (generalisation).

• To understand the consequences of its own actions after exploiting any particular object affordance, but without the capability to project a resulting object pose back into an action (no planning).

Teacher responsibilities

• Presenting various ways of exploiting object affordances by changing the types of trajectories (shape, orientation, object hit strength) and the types of objects involved.

• Using suitable feedback to highlight the cases which were not understood or explored properly by a robot.


Scenario displays

The scenario display 1: A cuboid is poked along its long axis, which results in a statistically large value of the ratio of the travelled distance to the rotation. Although without playing with other objects a robot cannot generalise yet, it finds that this affordance can be enhanced if an object is hit centrally, along a trajectory parallel to the object's long axis. On the other hand, if an object is hit from the side at a large angle (to the long axis), the ratio takes the smallest values.

The scenario display 2: A cuboid is poked from the side, perpendicular to its long axis, which results in a statistically small value of the ratio of the travelled distance to the rotation. Regarding the previous scenario, and taking into account that an object can be fully characterised by its length and width, a robot can hypothesize that:

1. To obtain a maximum rotation, an object should be hit from its longer side, perpendicular to its long axis.

2. To obtain a maximum translation, an object should be hit centrally, either on its shorter side parallel to its long axis or on its longer side perpendicular to its long axis33.

During the above experiments a robot learns the most reliable reference frame of the object (the long axis). In addition, it learns that a cuboid consists of more than just two types of sides (long and short) - the edges. Unfortunately, it is more difficult to find any clear pattern of behaviour by hitting edges34.

33 However, by playing with various similar objects a robot might soon reject the second case as unreliable.
34 Edges also yield a different tactile response.
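The travelled-distance-to-rotation ratio used in the displays above can be estimated from a sequence of tracked object poses. A minimal sketch, assuming the vision system delivers hypothetical (x, y, angle) triples:

```python
import math

def distance_rotation_ratio(poses):
    """Ratio of total distance travelled to total rotation for a
    sequence of object poses (x, y, theta). A large ratio suggests
    translation-like behaviour; a small one, rotation-like behaviour."""
    dist = 0.0
    rot = 0.0
    for (x0, y0, a0), (x1, y1, a1) in zip(poses, poses[1:]):
        dist += math.hypot(x1 - x0, y1 - y0)
        # Wrap the angle difference into (-pi, pi].
        da = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
        rot += abs(da)
    return dist / rot if rot > 0 else float("inf")

# A cuboid poked along its long axis: much translation, little rotation.
along = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.01), (0.10, 0.0, 0.02)]
# The same cuboid poked from the side: little translation, much rotation.
side = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.5), (0.02, 0.0, 1.0)]
print(distance_rotation_ratio(along) > distance_rotation_ratio(side))  # True
```

Averaged over repeated pokes, such a statistic is one concrete way to make the "statistically large/small value" of the displays measurable.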


The scenario display 3: A cylinder is poked centrally on its longer side (perpendicularly to its long axis), which results in a surprisingly strong response (rolling). After further self-experimentation a robot discovers that, apart from its strong response, the object is very similar to the previous one (a cuboid). A robot may want to play again with both objects to compare them again and to refine its hypotheses35. A robot may discover that the strong response - rolling36 - can be characterised as a parallel movement. A robot confirms the previous hypotheses as to rotation and translation, adding the new one that rolling is connected to the rounded-like shape of a cylinder (in the perspective view of the camera).

The scenario display 4: A ball is poked centrally, which again results in a strong response (rolling). The rounded-like shape of the object confirms the rolling hypothesis. After further self-experimentation a robot discovers that the object always rolls, and that if it is hit centrally it always moves in the direction of the movement of the end-effector. In effect, a robot discovers the rule that to move a ball in any desired direction it is enough to poke it in this direction.

5.4 Pushing objects (understanding dynamics of objects, action planning)

Scenario sketch

This scenario involves a robot showing the ability to push objects, i.e. it is able to maintain a long and stable tactile contact with objects. To push an object instead of just poking it, a robot needs to understand a relation between the response from the force/torque sensors and the end-effector velocity and stiffness (while touching an object). In theory, if we knew a Lagrangian/Hamiltonian of the system (the robotic arm + an object), it would be enough to solve the Lagrange/Hamilton equations to determine the system behaviour, without using any sensors. For instance, one can calculate a desired torque when moving along a surface of a known permanent object [31]. Unfortunately, this is feasible only in very few idealised cases. A robot is equipped with force sensors and the elastic finger with regulated stiffness. The force acting on an object (measured by the sensors) can be controlled by the stiffness and velocity of the elastic finger. The elasticity of the finger allows ignoring small fluctuations of the distance and speed between the arm and the object37.
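The dependence of the transmitted force on the finger's stiffness and velocity can be illustrated with a simple linear spring-damper model of the elastic finger; the stiffness and damping values below are purely illustrative, not measured properties of the actual finger.

```python
def finger_force(arm_pos, tip_pos, arm_vel, tip_vel, k, b=0.0):
    """Force transmitted through an elastic finger modelled as a
    linear spring (stiffness k) with optional damping b.

    The force grows with the compression of the finger (how far the
    arm has advanced past the fingertip) and with their relative
    velocity, so a stiffer or faster finger pushes harder.
    """
    compression = arm_pos - tip_pos
    return k * compression + b * (arm_vel - tip_vel)

# Doubling the stiffness doubles the spring part of the force.
f1 = finger_force(0.02, 0.0, 0.0, 0.0, k=100.0)   # 2 N
f2 = finger_force(0.02, 0.0, 0.0, 0.0, k=200.0)   # 4 N
print(f1, f2)
```

As footnote 37 notes, a real (human-like) arm is nonlinear, so such a spring-like model is only a first approximation; its virtue here is that it makes the stiffness-velocity-force relation explicit enough to be learnt against.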

A robot is capable of generating a large variety of trajectories by changing their location, shape, orientation and also velocity profile. While the location-directed bell-shaped trajectories are optimal for target-directed actions [17], the interactive and orientation-directed trajectories are more suitable for pushing actions. They are a product of the interaction between the elastic finger and objects.

A robot can recognise some object properties not only from vision but purely from proprioceptive feedback. For instance, to apply a constant force on a rolling-like object, a robot needs to speed up the end-effector with almost constant acceleration38.

A robot is able to enhance affordances in a more sophisticated way by staying in longer tactile contact with objects. For instance, rolling a cylinder no longer requires a precise estimate of the midpoint of a cylinder's longer side.

35 A request to a teacher to give a cuboid-like object again.
36 Occasionally occurring throughout the whole experimentation.
37 A human's arm is a nonlinear device, and it is difficult to model it with simple spring-like devices, which have similar linear characteristics for different stiffness levels.
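The constant-acceleration observation, together with footnote 38's remark that a mass can then be calculated, can be sketched as a least-squares estimate from paired force-sensor and acceleration samples. The sample values below are invented for illustration.

```python
def estimate_mass(forces, accels):
    """Least-squares estimate of mass from paired samples of pushing
    force (N) and resulting object acceleration (m/s^2), assuming no
    slippage and negligible friction. From F = m * a the best fit is
    m = sum(F * a) / sum(a * a)."""
    num = sum(f * a for f, a in zip(forces, accels))
    den = sum(a * a for a in accels)
    return num / den

# Noisy samples consistent with a hypothetical 0.5 kg object.
forces = [0.51, 1.02, 1.49, 2.01]
accels = [1.0, 2.0, 3.0, 4.0]
print(estimate_mass(forces, accels))  # close to 0.5
```

The same regression idea extends to the moment of inertia if torques and angular accelerations are sensed instead.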

A robot understands that to be able to push an object, it has to combine two actions:

1. A reaching action (proximal action) directed towards a particular contact point (in order to exploit its affordance).

2. A pushing action itself.

As a result, a robot learns strategies for poking and pushing objects from all directions, which requires planning on both a trajectory level and an action level. Furthermore, a robot discovers that there are more efficient ways of pushing objects if one suitably reacts to deviations from the movement direction.

A robot knows how to push an object (not only a ball) in order to move it in a desired direction, i.e. having the current reference frame of an object, a robot is able to project a desired move back into an action. Since this manoeuvre rarely makes use of "reliable" object affordances, a robot learns that an object can be moved to a new location by applying a sequence of separate pushing or poking actions - for instance by combining only two actions: the first to roughly align an object towards a target location (which requires choosing an affordance to be exploited during the pushing action), and the second to push an object towards the target location.

Scenario ontology

Actors: a robot and the human teacher.

Objects: the default objects’ list.

Locations of objects related to: the arm (relative), the playground boundaries/the equilibrium point (absolute).

Locations of contact points along with their affordances (relative to the object reference frame).

Trajectories: curves with a bell-shaped velocity profile, interactive trajectories with a velocity profile determined by the required sensor response.

Actions: the trajectory in a free space, the reaching action (with a target object as a whole or a particular contact point), pushing actions.

Goals: the trajectory itself (goal-undirected actions), the target object (proximal actions, pointing actions, target-directed actions), the playground boundary, the equilibrium point, exploitation of any particular object affordance, obtaining any particular object behaviour (e.g. rotation, translation).

Preferences: the trajectory type (shape, orientation, length - e.g. object hit strength), the overall activity location (the equilibrium point), the enhancement level of any exploited affordance (controlled by the applied force and by deviation from a contact point).

Basic relations: distance between the end-effector and the object/playground boundary/equilibrium point/contact point, applied force/torque.

38 Assuming no slippage, it is easy to calculate (using Newtonian physics) the mass of an object and its moment of inertia.


Robot learning

• To understand a relation between the response from the force/torque sensors, the end-effector velocity and its stiffness.

• To generalise the above relation with respect to various objects and their affordances, and to create corresponding classes of interactive trajectories.

• To refine the recognition of objects' affordances using vision and proprioceptive feedback.

• To refine the ways of exploiting object affordances.

• To improve competence in pushing objects using complex self-modified/self-generated trajectories.

• To combine two or more trajectories, avoiding collisions with goal-unrelated objects.

• To plan how to move an object to a new location using a sequence of poking or pushing actions.

Teacher responsibilities

• Presenting a new way of exploiting object affordances by pushing.

• Using suitable feedback to highlight the cases which were not understood or explored properly by a robot.

Scenario displays

The scenario display 1: A cuboid is pushed perpendicularly to its long axis. A robot discovers (perhaps through imitation) that if the initial velocity and stiffness of the end-effector are small enough, the object does not rebound from the end-effector and starts to move. Furthermore, a robot finds that it is possible to have a longer tactile contact by simply continuing its own movement with a constant-like velocity. The force sensed on the end-effector quickly gets smaller once the object starts rotating, just before losing tactile contact with it. In order to move an object over a larger distance, it needs to be pushed close to the midpoint of its side, possibly the shorter one (this is an affordance)39.

The scenario display 2: A robot tries to move a cuboid perpendicularly to its long axis over the largest possible distance, and learns that instead of trying to find the midpoint of the side of the cuboid, it is much better to react to each deviation of the object with a suitable response directed along the side of the object. Later, a robot finds that by applying this technique a cuboid can be moved in any direction, regardless of its initial orientation.

39 As was the case with object poking.
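The deviation-reaction strategy of display 2 can be sketched as a proportional correction of the contact point along the object's side; the gain and step bound below are hypothetical tuning values, not parameters derived from the actual system.

```python
def push_correction(lateral_dev, gain=2.0, max_step=0.01):
    """Proportional correction of the pushing contact point.

    lateral_dev is the object's deviation (m) from the desired
    movement line; the returned value is a bounded sideways shift of
    the end-effector along the object's side, so the contact point
    slides back towards the middle of the side instead of the robot
    having to locate the midpoint precisely in advance.
    """
    step = -gain * lateral_dev
    return max(-max_step, min(max_step, step))

# The object drifts one way (negative deviation): the finger shifts the other.
print(push_correction(-0.002))  # 0.004
print(push_correction(0.05))    # large drift: clipped to -0.01
```

Applied at every control step, such feedback is what makes the strategy work for any initial orientation, as the display describes.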


The scenario display 3: A cylinder is pushed perpendicularly to its long axis. A robot finds that if the initial velocity and stiffness of the end-effector are small enough, the cylinder does not rotate but starts to roll in a direction exactly perpendicular to its long axis, while still staying in permanent contact with the end-effector. A robot also finds that it is possible to have a longer tactile contact by simply following the escaping cylinder. This results in high speeds of a rolling cylinder, and a robot may like it!

6 Conclusions

In this report I have tried to outline the most important issues related to my PhD topic, rather than choosing any promising framework40. Consequently, at this stage of research, I have focused on psychological findings, general problems of motor control, and experiments.

I have found that the key to understanding the role of imitation in learning (not only of object affordances) is understanding the process of learning, which ultimately leads to learning by emulation. Imitation learning is an incremental process, which gradually reduces unknown "behavioural patterns", in particular those related to understanding the goals of actions. Imitation learning is still present long after our birth, but is then more related to the acquisition of skills. Good examples are playing tennis, juggling or riding a bike.

I am looking for a new sensorimotor framework, which will allow a robot to achieve a high manipulation competence by gradual learning from self-experience, as well as from imitation. This manipulation competence can be achieved only by understanding interactions with the toy world, i.e. action understanding41.

6.1 Hypothesis

A robot understands a certain subset of its own actions (of all kinds, including passive observation of objects) if it has such qualitative descriptions of actions and objects that the information they contain is sufficient for a robot to perform these actions reliably (with a probability of success close to one). A robot is able to create control policies based on these descriptions, as well as on some innate knowledge.

40 There are very few, if any.
41 This includes understanding objects' properties and scene configuration.


A robot can learn entirely new skills (development); therefore these qualitative descriptions have to be "redundant" to some extent42. This means that there is a trade-off between manipulation reliability (ignoring some unnecessary descriptions) and the ability to perceive novelty (new configurations of emerging descriptions). Novelty can be introduced by testing some of the descriptions in novel control policies - for instance from imitation, or from self-exploration using interestingness. In effect, the whole process can be seen as mutual bootstrapping of descriptions and control policies, controlled by the complexity of the robot's environment.

Generalisation allows a robot to apply similar actions in similar situations in order to achieve the same goals. This ultimately leads to the ability to recognise whether the current situation is different qualitatively or just quantitatively, and to the ability to respond to these changes. Thus, generalisation requires a notion of similarity among descriptions and policies on various levels of abstraction. For example, poking actions can be thought of as similar to each other, as long as a target object is touched (action); a cylinder, a truncated cone and a ball can roll, however a truncated cone cannot be rolled straight, a ball can roll in all directions, a cylinder and a truncated cone can rotate in a different way than a ball, etc. (affordances)43.

6.2 My future work

I have collected a few areas which I am going to explore in my further research. I am interested in:

• Building blocks for qualitative descriptions of actions and objects in the toy-world domain. Building blocks can be based on displays used in modelling of causality [43], trajectories (e.g. similar to DMPs [22]), geometric primitives/object features (e.g. derived from SIFT features [28]), etc.

• Similarities and symmetries among actions and objects' properties (continuous and discontinuous); constraints arising due to the laws of physics (also causality); invariance of descriptions under orientation and location changes.

• Reinforcement learning in continuous time and space [10], which has recently been used in the HMOSAIC architecture [11]. A problem of constructive building of control policies and reward models [11].

6.3 Evaluation of my work

I consider the following criteria as relevant to the evaluation of my research work.

Framework

A framework should allow one to:

Perception and action ontology. Establish a link between action understanding (as an equivalence of reliability of action execution) and its qualitative description.

Causality. Learn (to some extent) and generalise the laws of physics.

Learning. Learn novel properties of objects, actions (means), and goals from self-exploration and observation.

42 Vide Vygotsky's Zone of Proximal Development [49].
43 The idea is closely related to Gärdenfors' idea of conceptual spaces, defined as a collection of domains [15]. Each domain (e.g. representing colour or shape) must have a defined distance measure. An object and a class of objects can be suitably characterised as a point and as a region of a space.


Generalisation. Generalise these properties into models or generic patterns in a hierarchical manner.

Planning. Plan motor actions, given a goal and preferences (also through qualitative simulation).

Execution. Robustly execute so prepared plans.

Experiments

There are several possible ways of carrying out the experiments I have described in the report44. There is also a trade-off between the abilities to achieve higher stages of development and to manipulate a larger number of different objects. In general, a robot should be able to (reliably):

Manipulation competence. Carry out most of the actions from its repertoire with most of the familiar objects.

Fallback. Carry out some of the actions from its repertoire with novel objects, as well as carry out novel actions with some of the familiar objects.

Intentions of a teacher. Recognise from a teacher's demonstration most of the actions and goals from its repertoire.

Novel actions and goals. Learn (not only by imitation) novel actions and goals from a teacher's demonstration significantly faster than from self-experimentation.

6.4 My timetable

September 2005 - December 2005. First robotic experiments. Implementation of some basic vision and manipulation algorithms. Testing the ideas introduced in the report.

January 2006 - May 2006. Further literature review. Narrowing down the research topic.

June 2006 - December 2006. Designing a robotic framework. Implementation of the framework. Experiments.

January 2007 - March 2007. Analysis of the problems which have emerged during implementation and experiments. Further literature review.

April 2007 - August 2007. Final design and implementation of the framework. Experiments.

August 2007 - December 2007. Writing the thesis.

References

[1] Katana User Manual and Technical Description, 2004. http://www.neuronics.ch.

[2] J. Barresi and C. Moore. Intentional relations and social understanding. Behavioral and Brain Sciences, 19(1):107–154, 1996.

[3] E. Bizzi, A. d'Avella, P. Saltiel, and M. Tresch. Modular organization of spinal motor systems. The Neuroscientist, 8(5):437–442, 2002.

44 They will also depend on the robot’s architecture.


[4] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. In EMMCVPR ’01: Proceedings of the Third International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 359–374, London, UK, 2001. Springer-Verlag.

[5] R. Brooks, C. Breazeal, M. Marjanović, B. Scassellati, and M. Williamson. The Cog project: Building a humanoid robot. In Springer-Verlag Lecture Notes in Computer Science, 1999.

[6] M. Carpenter, N. Akhtar, and M. Tomasello. Fourteen- through 18-month-old infants differentially imitate intentional and accidental actions. Infant Behavior and Development, 21(2):315–330, 1998.

[7] S. Colton, A. Bundy, and T. Walsh. On the notion of interestingness in automated mathematical discovery. Int. J. Hum.-Comput. Stud., 53(3):351–375, 2000.

[8] G. Csibra and G. Gergely. Social learning and social cognition: The case for pedagogy. In Processes of Change in Brain and Cognitive Development: Attention and Performance XXI, 2005.

[9] G. Csibra, G. Gergely, S. Bíró, O. Koós, and M. Brockbank. Goal attribution without agency cues: the perception of ‘pure reason’ in infancy. Cognition, 72:237–267, 1999.

[10] K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12:219–245, 2000.

[11] K. Doya, N. Sugimoto, D. Wolpert, and M. Kawato. Selecting optimal behaviors based on contexts. In Symposium on Emergent Mechanisms of Communication, pages 19–23, 2003.

[12] P. Fitzpatrick. From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot. PhD thesis, MIT, 2003.

[13] P. Fitzpatrick and G. Metta. Grounding vision through experimental manipulation. Philosophical Transactions of the Royal Society: Mathematical, Physical, and Engineering Sciences, 2003.

[14] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice Hall, first edition, 2003.

[15] P. Gärdenfors. Conceptual Spaces: The Geometry of Thought. MIT Press, first edition, 2000.

[16] Z. Ghahramani and D. Wolpert. Modular decomposition in visuomotor learning. Nature, 386:392–395, 1997.

[17] C. Harris and D. Wolpert. Signal-dependent noise determines motor planning. Nature, 394:780–784, 1998.

[18] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2003.

[19] M. Haruno, D. Wolpert, and M. Kawato. MOSAIC model for sensorimotor learning and control. Neural Computation, 13:2201–2220, 2001.

[20] C. Heyes. Causes and consequences of imitation. Trends in Cognitive Science, 5(6):253–261, 2001.


[21] R. Hobson and A. Lee. Imitation and identification in autism. Journal of Child Psychology and Psychiatry, 40:649–659, 1999.

[22] A. Ijspeert, J. Nakanishi, and S. Schaal. Trajectory formation for imitation with nonlinear dynamical systems. In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), pages 752–757, 2001.

[23] M. Johnson and Y. Demiris. Hierarchies of coupled inverse and forward models for abstraction in robot action planning, recognition and imitation. In Proceedings of the AISB 2005 Symposium on Imitation in Animals and Artifacts, pages 69–76, 2005.

[24] M. Jordan. Handbook of Perception and Action: Motor Skills. Academic Press, New York, 1996.

[25] M. Jordan and D. Wolpert. Computational motor control. In The Cognitive Neurosciences. MIT Press, Cambridge, 1998.

[26] M. I. Jordan and D. E. Rumelhart. Forward models: Supervised learning with a distal teacher. Cognitive Science, 16:307–354, 1992.

[27] A. Karmiloff-Smith. From meta-processes to conscious access: Evidence from children’s metalinguistic and repair data. Cognition, 23:95–147, 1986.

[28] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[29] A. Meltzoff. Infant imitation after a 1-week delay: Long-term memory for novel acts and multiple stimuli. Developmental Psychology, 24(4):470–476, 1988.

[30] G. Metta, G. Sandini, and J. Konczak. A developmental approach to visually-guided reaching in artificial systems. Neural Networks, 1999.

[31] R. Murray, Z. Li, and S. Sastry. A Mathematical Introduction to Robotic Manipulation. CRC Press, 1994.

[32] V. Navalpakkam and L. Itti. Modeling the influence of task on attention. Vision Research, 45:205–231, 2005.

[33] J. O’Regan. Solving the “real” mysteries of visual perception: The world as an outside memory. Canadian Journal of Psychology, 46:461–488, 1992.

[34] J. O’Regan and A. Noë. A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5):939–1011, 2001.

[35] H.-O. Peitgen, H. Jürgens, and D. Saupe. Chaos and Fractals: New Frontiers of Science. Springer-Verlag, New York, 1992.

[36] J. Peters, S. Vijayakumar, and S. Schaal. Reinforcement learning for humanoid robotics. In Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, 2003.

[37] G. Rizzolatti, L. Fogassi, and V. Gallese. Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2:661–670, 2001.

[38] H. Ross and S. Lollis. Communication within infant social games. Developmental Psychology, 23:241–248, 1987.


[39] S. Schaal. The Handbook of Brain Theory and Neural Networks, chapter Learning robot control, pages 983–987. MIT Press, 2002.

[40] S. Schaal and C. Atkeson. Constructive incremental learning from only local information. Neural Computation, 10(8):2047–2084, 1998.

[41] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert. Control, planning, learning, and imitation with dynamic movement primitives. In Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE International Conference on Intelligent Robots and Systems (IROS 2003), 2003.

[42] S. Schaal, J. Peters, J. Nakanishi, and A. Ijspeert. Learning movement primitives. In International Symposium on Robotics Research (ISRR2003), 2004.

[43] B. Scholl and P. Tremoulet. Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8):299–309, 2000.

[44] A. Sloman. The Priority of Dynamical Systems Hypothesis, 2005. http://www.cs.bham.ac.uk/axs/pds.

[45] A. Sloman and J. Chappell. The altricial-precocial spectrum for robots. In Proceedings IJCAI’05, pages 1187–1192, Edinburgh, 2005.

[46] D. Stern. The Interpersonal World of the Infant. Basic Books, 1985.

[47] M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll. Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 2004.

[48] S. Vijayakumar and S. Schaal. Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In Proc. of Seventeenth International Conference on Machine Learning (ICML2000), pages 1079–1086, 2000.

[49] L. Vygotsky. Mind in Society: Development of Higher Psychological Processes. Harvard University Press, 1980.

[50] M. Williamson. Robot arm control exploiting natural dynamics. PhD thesis, MIT, 1999.

[51] D. M. Wolpert, K. Doya, and M. Kawato. A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 358:593–602, 2003.

[52] D. M. Wolpert and M. Kawato. Multiple paired forward and inverse models for motor control. Neural Networks, 11(7-8):1317–1329, 1998.
