
Visual Perception and Reproduction for Imitative Learning of a Partner Robot

NAOYUKI KUBOTA

Dept. of System Design, Tokyo Metropolitan University
1-1 Minami-Osawa, Hachioji, Tokyo 192-0397, Japan
SORST, Japan Science and Technology Agency
http://www.eng.metro-u.ac.jp/prec/SEKKEI/eng/index.html

Abstract: - This paper proposes visual perception and model reproduction based on imitation for a partner robot interacting with a human. First, we discuss the role of imitation and propose a method for imitative behavior generation. After the robot searches for a human using a CCD camera, human hand positions are extracted from a series of images taken by the camera. Next, the position sequence of the extracted human hand is used as input to a fuzzy spiking neural network, which recognizes it as a motion pattern. The trajectory for the robot behavior is generated and updated by a steady-state genetic algorithm based on the human motion pattern. Furthermore, a self-organizing map is used for clustering human hand motion patterns. Finally, we show experimental results of imitative behavior generation through interaction with a human.

Key-words: - Visual Perception, Partner Robots, Spiking Neural Network, Genetic Algorithm

Proceedings of the 5th WSEAS International Conference on Signal Processing, Istanbul, Turkey, May 27-29, 2006 (pp. 76-81)

1 Introduction
Natural communication with a human is required for various types of human-friendly robots, such as humanoid robots, personal robots, pet robots, and entertainment robots [1-9]. The main methods for natural communication are natural language and gestures. The theory of relevance emphasizes the importance of mutual cognitive environments for communication [10]; this indicates that a high level of perceptual capability is required for natural communication. One of the important roles of perception is to specify or extract an object from its background, a problem deeply related to the figure-ground problem [12]. According to the theory of relevance, an object can be specified by natural language and gestures. Zadeh also discusses the relationship between perception and natural language in his work on computing with words [11]. The meaning of an object depends on the environment as its background and on the physical embodiment handling the object, i.e., on how it can be used, considered as possible actions. Therefore, we must take natural language, gestures, perception, and action into account to realize natural communication. In this paper, we focus on gesture imitation as a basic level of human-friendly communication and propose a total mechanism of behavior acquisition and behavior accumulation through interaction with a human, from the viewpoint of constructivism.

Imitation is a powerful tool for gestural interaction between children [6] and for parents teaching behaviors to children. Imitation is defined as the ability to recognize and reproduce the actions of others, and it has also been discussed in research on social learning theory. In general, social learning is classified into two levels: observational learning and imitative learning [12]. The concept of imitative learning has been applied to robotics [1-6]. In traditional research on learning by observation, the motion trajectory of a human arm assembling or handling objects is measured, and the obtained data are analyzed and transformed for the motion control of a robotic manipulator. Furthermore, various biologically inspired neural systems have been applied to imitative learning for robots [1-3]. In particular, the discovery of mirror neurons is very


important [1]. A mirror neuron activates not only when the subject performs a task, but also when the subject observes somebody else performing the same task. In this way, imitation has been applied to learning robotic behaviors. Rao and Meltzoff classified imitative abilities into a four-stage progression: (i) body babbling, (ii) imitation of body movements, (iii) imitation of actions on objects, and (iv) imitation based on inferring the intentions of others [6]. The third stage of imitation was realized in our previous research. If the robot could perform the fourth stage of imitation, it might develop in the same way as humans do. Before discussing the fourth stage of imitation, however, we should discuss how to reproduce the behaviors the robot has acquired. Therefore, we focus on behavior coordination based on imitation. For this, the robot needs at least three modes: human search, interaction with the human, and imitative learning of motions. First, the robot detects a human and extracts his or her hand motion by image processing. Next, the hand motion is recognized as a gesture by using a spiking neural network [13] and a self-organizing map [14]. Furthermore, a steady-state genetic algorithm (SSGA) [15] is used for generating a trajectory similar to the motion of the human hand. Finally, the acquired motion patterns are incorporated into the behavior coordination of the robot according to the sensory inputs. We discuss the interactive learning between a human and a partner robot based on the proposed method through experimental results.

This paper is organized as follows. Section 2 explains the imitative behavior generation and the behavior coordination of a partner robot. Section 3 shows several experimental results of the partner robot based on the imitative learning.

2 Imitation and Behavior Coordination

2.1 A Partner Robot and Visual Perception
We developed a human-like partner robot, Hubot [9], in order to realize natural communication with a human (Fig.1). The robot is composed of a mobile base, a body, two arms with grippers, and a head with a pan-tilt structure. The robot has various sensors, such as two CCD cameras, four line sensors (infrared sensors), a microphone, ultrasonic sensors, and touch sensors, in order to perceive its environment and internal states. Each CCD camera can capture an image within the range of -30° to 30° in front of the robot. Furthermore, the robot is equipped with many encoders. Two CPUs are used for sensing, motion control, and communication. Therefore, the robot can perform various behaviors like a human. In previous research, we proposed a human detection method using a series of images from the CCD camera and an interactive trajectory planning method for a hand-to-hand behavior [8,9].

The robot takes an image from the CCD camera and extracts a human (Fig.2). In this paper, a long-term memory based on differential extraction is used to detect a human, considered as a moving object. Figure 3 shows an example of visual tracking based on this human extraction. If the robot detects a human, it extracts the motion of the human hand; the sequence of human hand positions is the input to the robot. The detailed procedure is as follows. To simplify the problem, the human wears a blue glove while performing a gesture displayed to the robot. After the captured image is transformed into the HSV color space, the color corresponding to the blue glove is extracted by using thresholds. Next, the blue glove is detected by template matching based on a steady-state genetic algorithm (SSGA). The SSGA simulates a continuous generational model, which eliminates and generates only a few individuals per generation (iteration) [15]. The sequence of hand positions is represented by G(t) = (G_x(t), G_y(t)), where t = 1, 2, ..., T and T is the maximal number of images.
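The color-extraction step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the HSV thresholds are hypothetical, and the SSGA-based template matching is replaced here by a simple centroid of the thresholded pixels.

```python
# Sketch of blue-glove extraction via HSV thresholding.
# The thresholds (h_range, s_min, v_min) are illustrative assumptions.
import colorsys

def extract_hand_position(image, h_range=(0.55, 0.72), s_min=0.4, v_min=0.2):
    """Return the centroid G(t) = (Gx, Gy) of blue-glove pixels, or None.

    `image` is a 2-D list of (r, g, b) tuples with channels in [0, 1].
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            if h_range[0] <= h <= h_range[1] and s >= s_min and v >= v_min:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Tiny synthetic 3x3 image with one strongly blue pixel at (x=2, y=1).
img = [[(0.9, 0.9, 0.9)] * 3 for _ in range(3)]
img[1][2] = (0.1, 0.2, 0.9)
print(extract_hand_position(img))   # -> (2.0, 1.0)
```

In the paper the glove is then localized by SSGA-based template matching; the centroid above only stands in for that step.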

2.2 A Fuzzy Spiking Neural Network for Human Motion Extraction
We apply a fuzzy spiking neural network (FSNN) for memorizing several motion patterns of a human hand, because human hand motion has specific dynamics. An SNN [13] is often called a pulsed neural

Fig.1: A human-like partner robot, Hubot (labeled components: CCD cameras, 4-DOF arms with grippers, line sensors, ultrasonic sensors, wireless LAN)



network, and is considered one of the artificial NNs that imitate the dynamics of the ignition phenomenon of a cell together with the mechanism of pulse propagation between cells. In this paper, we use a modified simple spike response model to reduce the computational cost. First of all, the action potential h_i(t), used as an internal state, is calculated as follows:

h_i(t) = tanh( h_i^syn(t) + h_i^ext(t) + h_i^ref(t) )    (1)

Here h_i^syn(t) includes the output pulses from other neurons, and h_i^ext(t) is the input to the i-th neuron from the external environment. Furthermore, h_i^ref(t) represents the refractoriness of the neuron. When the internal state of the i-th neuron is larger than a predefined threshold, a spike (impulse) is output as follows:

p_i(t) = 1 if h_i(t) ≥ q_i, and 0 otherwise    (2)

where q_i is the threshold predefined for firing. The spiking neurons are arranged on a planar grid (Fig.4), and the number of neurons N is set to 25. Using the human hand position, the input to the i-th neuron is calculated by a Gaussian membership function:

h_i^ext(t) = exp( -||c_i - G(t)||^2 / σ^2 )    (3)

where c_i = (c_{x,i}, c_{y,i}) is the position of the i-th neuron on the image and σ is a standard deviation. The sequence of spike outputs p_i(t) is thus obtained from the human hand positions G(t). The weight parameters between spiking neurons are trained by a Hebbian learning algorithm based on the temporally sequential spikes:

w_{j,i} ← tanh( γ_wgt · w_{j,i} + ξ_wgt · p_i(t) · p_j(t-1) )    (4)

where γ_wgt is a discount rate and ξ_wgt is a learning rate. Because adjacent neurons along the trajectory of the human hand position are easily fired through the Hebbian learning, the FSNN can memorize the temporal spike patterns of various gestures. Next, the sequence of fired spiking neurons is used as input for clustering by a self-organizing map (SOM) based on competitive learning, in order to memorize and detect the spatial pattern of a human gesture.
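As a concrete sketch, Eqs. (1)-(4) can be simulated as below. The forms of the synaptic term (a weighted sum of the previous spikes) and of the refractory term are assumptions, since the text does not specify them; the σ, threshold, and learning-rate values are illustrative only.

```python
# Minimal sketch of the FSNN of Eqs. (1)-(4). The h_syn and h_ref update
# forms and all parameter values here are assumptions, not the paper's.
import math

N_SIDE, SIGMA, Q = 5, 20.0, 0.7          # 5x5 grid, Eq.(3) sigma, firing threshold
GAMMA_WGT, XI_WGT = 0.9, 0.5             # Eq.(4) discount and learning rates
X, Y = 160, 120                          # image size used in the experiments

# Neuron centers c_i spread over the image plane.
centers = [(x * X / (N_SIDE - 1), y * Y / (N_SIDE - 1))
           for y in range(N_SIDE) for x in range(N_SIDE)]
N = len(centers)                          # N = 25 neurons

w = [[0.0] * N for _ in range(N)]         # weights w[j][i]
h_ref = [0.0] * N
p_prev = [0] * N

def step(G):
    """One FSNN update for hand position G(t); returns the spike vector p(t)."""
    global p_prev
    p = [0] * N
    for i, (cx, cy) in enumerate(centers):
        d2 = (cx - G[0]) ** 2 + (cy - G[1]) ** 2
        h_ext = math.exp(-d2 / SIGMA ** 2)                     # Eq.(3)
        h_syn = sum(w[j][i] * p_prev[j] for j in range(N))     # assumed form
        h = math.tanh(h_syn + h_ext + h_ref[i])                # Eq.(1)
        p[i] = 1 if h >= Q else 0                              # Eq.(2)
        h_ref[i] = -1.0 if p[i] else 0.5 * h_ref[i]            # assumed refractoriness
    for i in range(N):                                         # Hebbian rule, Eq.(4)
        for j in range(N):
            w[j][i] = math.tanh(GAMMA_WGT * w[j][i]
                                + XI_WGT * p[i] * p_prev[j])
    p_prev = p
    return p

# Feed a circular hand trajectory; neurons near the path fire in sequence.
T = 30
for t in range(T):
    G = (80 + 40 * math.cos(2 * math.pi * t / T),
         60 + 40 * math.sin(2 * math.pi * t / T))
    spikes = step(G)
print(sum(spikes))   # number of neurons firing on the last frame
```

The sequence of fired neurons produced this way is what the SOM clusters in the next step.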

2.3 Imitative Behavior Generation
The essence of imitative learning in this method is to acquire a behavior according to a human physical motion. After behavior acquisition, a behavior similar to the human motion can be used as a communication signal, i.e., a gesture. The robot can acquire a behavior by incorporating action segments from the human gesture. The trajectory planning problem for a behavior reduces to a path planning problem from an initial configuration to a final configuration corresponding to the motion of the detected human hand. Here a configuration θ is expressed by a set of joint angles, because all joints are revolute:

θ = (θ_1, θ_2, ..., θ_n)^T ∈ R^n    (5)

where n denotes the number of degrees of freedom (DOF) of the robot arm. The number of DOF of the partner robot shown in Fig.1

Fig.2: The visual system of the robot in imitation

Fig.3: An example of visual tracking: (A) original image, (B) reduced-color image, (C) difference between the two images, and (D) result of the differential filter

Fig.4: Fuzzy spiking neurons arranged on the image, with neuron positions spanning the grid from (0, 0) to (X, Y)



is 4 (n = 4). In addition, the position of the end-effector (robot hand or gripper) is P = (p_x, p_y, p_z)^T on the base frame. Because a trajectory can be represented by a series of m intermediate configurations, the trajectory planning problem is to generate a trajectory combining several intermediate configurations corresponding to G(t). SSGA is applied to generate a trajectory for an imitative behavior corresponding to a human hand motion. Here the SSGA for detecting a human hand is called SSGA-1, while the SSGA for generating a trajectory is called SSGA-2.

Figure 5 shows the total architecture for generating a trajectory for a robot behavior. First, the robot detects the human hand position by SSGA-1; then the SOM selects a node according to the human hand motion given as input, and the corresponding trajectory is selected by referring to the stored knowledge database. This trajectory is used for generating initial trajectory candidates (θ^INIT) as the initial population of SSGA-2. Next, SSGA-2 outputs the best trajectory, and the robot displays it to the human.
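The SOM node-selection step can be sketched as follows. This is a hedged illustration: the paper specifies only the node count (10), so the feature encoding (per-neuron spike counts from the FSNN), the learning rate, and the 1-D neighborhood are assumptions.

```python
# Hedged sketch of SOM node selection over gesture feature vectors.
# Encoding, learning rate, and neighborhood shape are illustrative choices.
import random

N_NODES, DIM = 10, 25                     # 10 SOM nodes, 25 FSNN neurons
random.seed(0)
nodes = [[random.random() for _ in range(DIM)] for _ in range(N_NODES)]

def best_matching_node(x):
    """Index of the node closest (squared Euclidean) to feature vector x."""
    return min(range(N_NODES),
               key=lambda k: sum((nodes[k][d] - x[d]) ** 2 for d in range(DIM)))

def train(x, alpha=0.3, radius=1):
    """Competitive learning: move the winner and its 1-D neighbors toward x."""
    win = best_matching_node(x)
    for k in range(max(0, win - radius), min(N_NODES, win + radius + 1)):
        for d in range(DIM):
            nodes[k][d] += alpha * (x[d] - nodes[k][d])
    return win

# Two repeated "gestures" encoded as fake per-neuron spike counts.
gesture_a = [1.0 if d < 12 else 0.0 for d in range(DIM)]
gesture_b = [0.0 if d < 12 else 1.0 for d in range(DIM)]
for _ in range(20):
    train(gesture_a)
    train(gesture_b)
# After training, each gesture consistently selects its own node.
print(best_matching_node(gesture_a), best_matching_node(gesture_b))
```

In the full system, each SOM node would be linked to a stored best trajectory, which seeds the initial population of SSGA-2.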

A trajectory candidate is composed of all joint variables of the intermediate configurations (Fig.6). Initialization generates an initial population based on the previous best trajectory stored in the knowledge database linked with the SOM. The j-th joint angle of the k-th intermediate configuration in the i-th trajectory candidate, θ_{i,j,k}, represented as a real number, is generated as follows (i = 1, 2, ..., g):

θ_{i,j,k} ← θ_{j,k}^INIT + β_j^I · N(0, 1)    (6)

where θ_{j,k}^INIT is the previous best trajectory referred from the knowledge database and β_j^I is a coefficient for the j-th joint angle. A fitness value is assigned to each trajectory candidate. The objective is to generate a trajectory realizing the shortest possible path from the initial configuration to the final configuration while achieving a good evaluation. To achieve this, we use the following fitness function:

f_i = f_p + η_T · f_d    (7)

where η_T is a weight coefficient. The first term, f_p, denotes the distance between the hand position and the target point. The second term, f_d, denotes the sum of squared differences between corresponding joint angles of the configurations at t and t-1. Therefore, this trajectory planning problem reduces to a minimization problem. Selection removes the worst individual from the current population. Next, elite crossover is performed: it generates an individual by incorporating several genes from the best individual in the population, and the worst individual is replaced with the generated one. Furthermore, we use adaptive mutation based on the fitness value. The search process using the internal simulator is repeated until the termination condition is satisfied; here we use a maximal number of internal evaluations as the termination condition.
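The SSGA-2 loop above can be sketched under stated assumptions: a 4-DOF arm with a toy planar forward kinematics, m = 5 intermediate configurations, a fixed target standing in for the human hand motion, and guessed parameter values. Eq. (6) initialization, the fitness of Eq. (7), and worst-individual replacement via elite crossover follow the text; the adaptive mutation is simplified to fixed-scale noise.

```python
# Hedged sketch of SSGA-2. Forward kinematics, target, and all parameter
# values are illustrative assumptions, not the paper's implementation.
import math
import random

random.seed(1)
N_DOF, M, POP, ETA_T, BETA = 4, 5, 20, 0.1, 0.2
TARGET = (0.5, 0.5)                       # stand-in for the human hand motion

def hand_xy(cfg):
    """Toy planar forward kinematics for one configuration (4 links of 0.25)."""
    x = y = a = 0.0
    for th in cfg:
        a += th
        x += 0.25 * math.cos(a)
        y += 0.25 * math.sin(a)
    return x, y

def fitness(traj):
    """Eq.(7): f = f_p + eta_T * f_d (smaller is better)."""
    gx, gy = hand_xy(traj[-1])
    f_p = math.hypot(gx - TARGET[0], gy - TARGET[1])
    f_d = sum((traj[t][j] - traj[t - 1][j]) ** 2
              for t in range(1, M) for j in range(N_DOF))
    return f_p + ETA_T * f_d

# Eq.(6): perturb a previous best trajectory (here: all zeros) with N(0,1) noise.
theta_init = [[0.0] * N_DOF for _ in range(M)]
pop = [[[theta_init[k][j] + BETA * random.gauss(0, 1) for j in range(N_DOF)]
        for k in range(M)] for _ in range(POP)]

for _ in range(2000):                     # steady-state loop
    scores = [fitness(ind) for ind in pop]
    worst = max(range(POP), key=scores.__getitem__)
    best = min(range(POP), key=scores.__getitem__)
    # Elite crossover: copy several genes from the best individual,
    # then apply a small mutation (adaptive scaling omitted for brevity).
    child = [row[:] for row in pop[worst]]
    for k in range(M):
        for j in range(N_DOF):
            if random.random() < 0.5:
                child[k][j] = pop[best][k][j]
            child[k][j] += 0.05 * random.gauss(0, 1)
    pop[worst] = child                    # replace the worst individual

best_traj = min(pop, key=fitness)
print(round(fitness(best_traj), 3))
```

Because only the worst individual is ever replaced, the best candidate is never lost, which is the defining property of the steady-state scheme.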

The robot can simply extend the acquired motions into duplication in the same phase, duplication in a different phase, and combinations of different motions, in order to realize motions using both arms. This motion reproduction is performed in the mode of interaction with the human.

Fig.6: Representation of the i-th trajectory candidate, composed of m intermediate configurations

Fig.5: Total architecture of the imitative learning (images → SSGA-1 → G(t) → FSNN → p_i(t) → SOM → θ^INIT → SSGA-2 → θ*)


3 Experiments
This section shows experimental results of the partner robot Hubot. First, we show the imitative learning. The size (X, Y) of an image is (160, 120). Here a trial is defined as one cycle of human hand detection by SSGA-1, spatial and temporal pattern learning of the human hand motion by the FSNN, gesture clustering by the SOM, and behavior generation by SSGA-2. The number of spiking neurons N is 25, and the number of nodes in the SOM is 10. The population sizes of SSGA-1 and SSGA-2 are 120 and 200, respectively. The numbers of evaluations in SSGA-1 and SSGA-2 are 300 and 5000, respectively. Furthermore, local hill-climbing search is used in SSGA-2.

Figure 7 shows an example of imitative learning, and Fig.8 shows the history of the node selected in the SOM during learning. The person tried to display various motions in order to learn the reactive motion patterns of the robot. Therefore, several different nodes are selected at first, but gradually similar nodes come to be selected repeatedly. The aim of the human trial was to teach a circular hand motion: the person moved his right hand in a circle (Fig.7 (1)-(4)), and then the robot moved its right hand in the same way (Fig.7 (5)-(8)). In this way, the robot memorizes various human hand motion patterns.

Figures 9 and 10 show more complicated gestures and their corresponding motion patterns generated by SSGA-2. The circles indicate the detected hand positions corresponding to the motion patterns. The robot extracts human motion patterns and reproduces them in the internal representation of the robot's configuration space. Figure 11 shows the history of the best fitness value of SSGA-2 in the imitation shown in Fig.9.

4 Summary
This paper proposed imitative learning for a partner robot. We must realize a high level of signal processing and system integration in order to deal with human factors. We applied a fuzzy spiking neural network for extracting spatial and temporal patterns of human gestures, a self-organizing map for clustering gestures, and a steady-state genetic algorithm for generating a trajectory to perform a behavior similar to the human motion pattern. Experimental results show that the robot acquires various motion patterns by imitating human hand motions. However, voice recognition is also required for natural communication.

In our other research, we integrated voice recognition and gesture recognition for mobile partner robots [16], and we also used a multilayer perceptron for behavior learning. Therefore, as future work, we intend to incorporate that behavior learning method in place of the knowledge database used with the SOM.

Fig.8: The change of the node selected in the SOM according to human hand motion patterns (node number, 0-20, plotted over 60 trials)

Fig.7: The experimental result of imitative learning


References:
[1] M. Arbib, A. Billard, M. Iacoboni, E. Oztop, Synthetic Brain Imaging: Grasping, Mirror Neurons and Imitation, Neural Networks, pp. 975-997, 2000.
[2] A. Billard, M. J. Mataric, Learning Human Arm Movements by Imitation: Evaluation of a Biologically Inspired Connectionist Architecture, Robotics and Autonomous Systems, 37, 2-3, pp. 145-160, 2003.
[3] B. Elser, Imitative Learning of Movement-effect Contingencies around the 1st Birthday: Perspectives on Imitation from Cognitive Neuroscience to Social Science, France, 2002.
[4] T. Jebara, A. Pentland, Statistical Imitative Learning from Perceptual Data, Proc. of 2nd International Conference on Development and Learning, pp. 191-196, 2002.
[5] G. Rizzolatti, M. A. Arbib, Language within our grasp, Trends in Neurosciences, 21, pp. 188-194, 1998.
[6] R. P. N. Rao, A. N. Meltzoff, Imitation Learning in Infants and Robots: Towards Probabilistic Computational Models, Proc. of Artificial Intelligence and Simulation of Behaviour, 2003.
[7] T. Fukuda, N. Kubota, An intelligent robotic system based on a fuzzy approach, Proceedings of the IEEE, 87, 9, pp. 1448-1470, 1999.
[8] N. Kubota, Computational Intelligence for Human Detection of a Partner Robot, Proc. of World Automation Congress 2004 / International Conference on Multimedia and Image Processing 2004, 2004.
[9] N. Kubota, K. Tomoda, Behavior Coordination of a Partner Robot based on Imitation, Proc. of 2nd International Conference on Autonomous Robots and Agents, pp. 164-169, 2004.
[10] D. Sperber, D. Wilson, Relevance: Communication and Cognition, Blackwell Publishing Ltd., 1995.
[11] G. J. Klir, B. Yuan (eds.), Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems, World Scientific Publishing, 1996.
[12] M. Eysenck (ed.), Psychology, Prentice Hall, 1998.
[13] W. Maass, C. M. Bishop, Pulsed Neural Networks, The MIT Press, 1999.
[14] T. Kohonen, Self-Organizing Maps, 3rd Edition, Springer-Verlag, Berlin, Heidelberg, 2001.
[15] G. Syswerda, A Study of Reproduction in Generational and Steady-State Genetic Algorithms, Foundations of Genetic Algorithms, Morgan Kaufmann Publishers, 1991.
[16] N. Kubota, Y. Ito, M. Abe, H. Kojima, Computational Intelligence for Natural Communication of Partner Robots, Proc. (CD-ROM) of the 5th Asian Symposium on Applied Electromagnetics and Mechanics, 2005.

Fig.11: History of the fitness value of SSGA-2 (average, maximum, and minimum fitness over 5000 iterations)

Fig.9: Experimental result 1 of imitative learning: (A) front view, (B) oblique view

Fig.10: Experimental result 2 of imitative learning: (A) front view, (B) oblique view
