Learning Algorithm Design for Human-Robot Skill Transfer

117
Swansea University School of Engineering Zienkiewicz Centre for Computational Engineering Learning Algorithm Design for Human-Robot Skill Transfer Author: Chunxu Li Supervisor: Prof. Chenguang Y ang Submitted to Swansea University in fulfilment of the requirements for the Degree of Doctor of Philosophy May 30, 2019

Transcript of Learning Algorithm Design for Human-Robot Skill Transfer

Page 1: Learning Algorithm Design for Human-Robot Skill Transfer

Swansea University

School of Engineering

Zienkiewicz Centre for Computational Engineering

Learning Algorithm Design for Human-RobotSkill Transfer

Author:Chunxu Li

Supervisor:Prof. Chenguang Yang

Submitted to Swansea University in fulfilment of the requirements forthe Degree of Doctor of Philosophy

May 30, 2019

Page 2: Learning Algorithm Design for Human-Robot Skill Transfer

AbstractIn this research, we develop an intelligent learning scheme for performing human-robotskills transfer. Techniques adopted in the scheme include the Dynamic Movement Prim-itive (DMP) method with Dynamic Time Warping (DTW), Gaussian Mixture Model (G-MM) with Gaussian Mixture Regression (GMR) and the Radical Basis Function NeuralNetworks (RBFNNs). A series of experiments are conducted on a Baxter robot, a NAOrobot and a KUKA iiwa robot to verify the effectiveness of the proposed design.

During the design of the intelligent learning scheme, an online tracking system is de-veloped to control the arm and head movement of the NAO robot using a Kinect sensor.The NAO robot is a humanoid robot with 5 degrees of freedom (DOF) for each arm. Thejoint motions of the operator’s head and arm are captured by a Kinect V2 sensor, and thisinformation is then transferred into the workspace via the forward and inverse kinematics.

In addition, to improve the tracking performance, a Kalman filter is further employed tofuse motion signals from the operator sensed by the Kinect V2 sensor and a pair of MYOarmbands, so as to teleoperate the Baxter robot. In this regard, a new strategy is developedusing the vector approach to accomplish a specific motion capture task. For instance,the arm motion of the operator is captured by a Kinect sensor and programmed througha processing software. Two MYO armbands with embedded inertial measurement unitsare worn by the operator to aid the robots in detecting and replicating the operator’s armmovements. For this purpose, the armbands help to recognize and calculate the precisevelocity of motion of the operator’s arm. Additionally, a neural network based adaptivecontroller is designed and implemented on the Baxter robot to illustrate the validation forthe teleoperation of the Baxter robot.

Subsequently, an enhanced teaching interface has been developed for the robot usingDMP and GMR. Motion signals are collected from a human demonstrator via the Kinectv2 sensor, and the data is sent to a remote PC for teleoperating the Baxter robot. At thisstage, the DMP is utilized to model and generalize the movements. In order to learn frommultiple demonstrations, DTW is used for the preprocessing of the data recorded on therobot platform, and GMM is employed for the evaluation of DMP to generate multiplepatterns after the completion of the teaching process. Next, we apply the GMR algorithmto generate a synthesized trajectory to minimize position errors in the three dimensional(3D) space. This approach has been tested by performing tasks on a KUKA iiwa and aBaxter robot, respectively.

Finally, an optimized DMP is added to the teaching interface. A character recombi-nation technology based on DMP segmentation that uses verbal command has also beendeveloped and incorporated in a Baxter robot platform. To imitate the recorded motionsignals produced by the demonstrator, the operator trains the Baxter robot by physicallyguiding it to complete the given task. This is repeated five times, and the generated train-ing data set is utilized via the playback system. Subsequently, the DTW is employed topre-process the experimental data. For modelling and overall movement control, DMP ischosen. The GMM is used to generate multiple patterns after implementing the teachingprocess. Next, we employ the GMR algorithm to reduce position errors in the 3D spaceafter a synthesized trajectory has been generated. The Baxter robot, remotely controlledby the user datagram protocol (UDP) in a PC, records and reproduces every trajectory.Additionally, Dragon Natural Speaking software is adopted to transcribe the voice data.This proposed approach has been verified by enabling the Baxter robot to perform a writ-ing task of drawing different Chinese characters after the robot has been taught to writeonly one character.

i

Page 3: Learning Algorithm Design for Human-Robot Skill Transfer

List of Abbreviations/Nomenclatures

Table 1: Abbreviations

Abbreviation Description

DOF Degrees of freedomDTW Dynamic Time WarpingGMM Gaussian Mixture ModelGMR Gaussian Mixture RegressionDMP Dynamic Movement PrimitiveNN Neural NetworksUDP User Datagram ProtocolDH Denavit & Hartenberg’s methodKF Kalman Filter methodTbD Teaching by DemonstrationROS Robot Operation SystemUAV Unmanned Aerial VehicleRBF Radial Basis FunctionEM Expectation - Maximization algorithmBIC Bayesian Information CriterionPID Proportional Integral Derivative3D Space Three Dimensional Space

ii

Page 4: Learning Algorithm Design for Human-Robot Skill Transfer

Table 2: Nomenclatures

Symbol Description

n Number of jointsτ Torqued, c Spring and Damping coefficientsx, v Cartesian position and velocity of the end-effectors Canonical system statesk Number of GMMspk, µk,

∑k Weight, mean and variance of GMM

z Input vectorS (z) Gaussian regressor vectorq, q, q ∈ Rn×1 Vector of joint position, angular velocity and

acceleration respectivelya, d, α, θ Variables denoting the Denavit-Hartenberg parametersiT j ∈ R4×4 Homogenous transform from link i to jI(q) ∈ R5×5 Inertia matrixC(q, q) ∈ R5×5 Coriolis matrixG(q) ∈ R5×1 Gravity termsU(q) ∈ R5×1 Unmodeled elementsK Control gains(t), r(t) System noise and measured noise variance intensitiesω(t), v(t) Noise vectors of system and measurementx(t), u(t) System state variablesy(t) Measurement vectors

iii

Page 5: Learning Algorithm Design for Human-Robot Skill Transfer

Acknowledgements

First, I would like to express my sincere thanks to my supervisor Prof. Chenguang Yang.During the course of the past three years, there were moments of joy and bitterness. It is hisnever-ending support and encouragements that provided me with the confidence to pursuedirections to my heart’s desire. Without his guidance, I could never have hoped to achievewhat I have today. I would also like to convey my thanks to all the lovely colleagues at theZCCE where I worked, lived and laughed in the past three years, including the veterans,the past members and the new comers.

It is a great pleasure and honor to work among such a fun and ingenious bunch ofpeople. It is a place filled with fresh ideas. Finally, thanks to my wife Nana Ji, for her loveand support, especially when the writing of this thesis coincided with such an importantmoment of our lives. Without her help and understanding, I may never have glimpsed thelight at the end of this tunnel. Thanks also goes to my mother, who travelled all the wayfrom the other side of the globe to provide help during the last days of my writing when Ineeded it the most. Also, it is the long-term love, support and understanding of my familyduring all these years behind the scenes that have given me the chance to devote myself tomy passion. Thanks to you all.

iv

Page 6: Learning Algorithm Design for Human-Robot Skill Transfer

Statement of Authorship

This thesis is submitted to the Department of Engineering, Swansea University, in fulfil-ment of the requirements for the degree of Doctor of Philosophy. This thesis is entirelymy own work, and except where otherwise stated, describes my own research.

v

Page 7: Learning Algorithm Design for Human-Robot Skill Transfer

List of Figures

2.1 The mainstream industrial robots worldwide [1] . . . . . . . . . . . . . . 82.2 The FlexPendant of the Kawasaki robot [2] . . . . . . . . . . . . . . . . 112.3 The FlexPendant of the KUKA robot . . . . . . . . . . . . . . . . . . . . 112.4 Flowchart of the PID control architecture [3] . . . . . . . . . . . . . . . . 162.5 Flowchart of the computed torque control architecture [4] . . . . . . . . . 183.1 Illustration of NAO Robot and coordinate system . . . . . . . . . . . . . 193.2 Image of the Baxter robot . . . . . . . . . . . . . . . . . . . . . . . . . . 203.3 Image of KUKA LBR robot . . . . . . . . . . . . . . . . . . . . . . . . 213.4 Illustration of the origin of the Kinect v2’s camera space. It is the same as

its depth sensor origin which is a modified form of [5] . . . . . . . . . . . 223.5 Image of MYO Armband . . . . . . . . . . . . . . . . . . . . . . . . . . 223.6 Warping example between two time series, modified from [6] . . . . . . . 243.7 Working principle of continuous-time KF . . . . . . . . . . . . . . . . . 264.1 Flowchart of the complete experimental process . . . . . . . . . . . . . . 294.2 Illustration of the relationship between NAO Robot and Kinect coordinate

system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.3 Illustration of the four selected positions under both NAO coordinate sys-

tem and Kinect coordinate system . . . . . . . . . . . . . . . . . . . . . 304.4 Illustration of the conception for human head orientation angles . . . . . 324.5 Image of body skeleton captured by Kinect . . . . . . . . . . . . . . . . 334.6 Angle of rotation of the ShoulderPitch . . . . . . . . . . . . . . . . . . . 344.7 Angle of rotation of the ShoulderRoll . . . . . . . . . . . . . . . . . . . 344.8 Angle of rotation of the ElbowRoll . . . . . . . . . . . . . . . . . . . . . 354.9 Illustration of operator’s head for different postures . . . . . . . . . . . . 364.10 Experimental setup of head and limb following tests . . . . . . . . . . . . 374.11 Result of the head following experiment for both human and NAO robot . 374.12 Result of the arm following experiment for both human and NAO robot . 385.1 The complete experimental teleoperation system . . . . . . . . . . . . . . 425.2 Diagram for the rule of research in total . . . . . . . . . . . . . . . . . . 425.3 Image of Kinect sensor: 1. Depth sensors, 2. RGB camera, 3. Motorized

base [7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.4 Image of depth and RGB sensor data collected from Kinect V1 modified

on [7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.5 Mathematical Principle description . . . . . . . . . . . . . . . . . . . . 465.6 Demonstration of all related angles in Vector Approach: ShoulderPitch,

ShoulderYaw and ShoulderRoll, ElbowPitch and ElbowRoll . . . . . . . 465.7 The principle of vector approach in mathematical computing . . . . . . . 475.8 Error of Vector Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 49

vi

Page 8: Learning Algorithm Design for Human-Robot Skill Transfer

5.9 The orientation of the MYO in the initial pose and the current pose . . . 495.10 Demonstration of the experiment at the different positions . . . . . . . . 555.11 Image of designed control system . . . . . . . . . . . . . . . . . . . . . 555.12 Graphical result after the KF based sensor fusion (ShoulderPitch, Shoul-

derRoll, ShoulderYaw, ElbowPitch, ElbowRoll) . . . . . . . . . . . . . 565.13 Torque inputs of the control system with NN. (a) Torque inputs of the

control system without NN. (b) Torque inputs of the control system withNN. (c) Tracking performance of the designed system for both withoutNN and with NN. (d) NN learning weighs of every single joint. . . . . . . 57

6.1 Human arm model and its DH coordinate frames [8] . . . . . . . . . . . . 616.2 Screenshot of the skeleton tracking system and the geometry model for

human arm in joint space [8] . . . . . . . . . . . . . . . . . . . . . . . . 626.3 Illustration of the obstacle avoidance experiment . . . . . . . . . . . . . 686.4 The setup of the trajectory generalizing experiment . . . . . . . . . . . . 696.5 Illustration of the alignment using DTW . . . . . . . . . . . . . . . . . . 706.6 The learning and generalization result using the proposed DMP in an ob-

stacle passing task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.7 The demonstrated trajectories for the sine wave with GMM and the result 716.8 Curve on a vertical surface obtained after spatial generalization using the

modified DMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717.1 Graphical representation of the overview of the proposed technology, mod-

ified from [9] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747.2 The experimental setup for the Chinese character writing task. Step 1:

across stroke; Step 2: vertical stroke; Step 3: left-falling stroke; Step 4:right-falling stroke . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

7.3 The demonstrated and reconstructed trajectories of the “Mu” characterstrokes, the x axis represents x direction and the y axis represents y di-rection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

7.4 The initial and generalized Chinese character . . . . . . . . . . . . . . . 837.5 The comparison of the joints angles for the vertical stroke of the “Mu”

character with or without DMP, the x axis represents time (in seconds)and the y axis represents joint angles (in radians) . . . . . . . . . . . . . 85

vii

Page 9: Learning Algorithm Design for Human-Robot Skill Transfer

List of Tables

1 Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii2 Nomenclatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii5.1 Table for efficiency improvement of different angular positions . . . . . . 566.1 Model characters table using Denavit & Hartenberg’s Method [10] . . . . 61

viii

Page 10: Learning Algorithm Design for Human-Robot Skill Transfer

Contents

1 Chapter One: Introduction 11.1 Research Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Research Innovations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Chapter Two: Review of the Related Work 62.1 Research Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.2 Development of Industrial Robots . . . . . . . . . . . . . . . . . . . . . 72.3 Review of Human-Robot Interaction . . . . . . . . . . . . . . . . . . . . 92.4 Review of Machine Learning in Robotics . . . . . . . . . . . . . . . . . 122.5 Robot Controller Design . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5.1 Robot Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . 142.5.2 Kinematic Control . . . . . . . . . . . . . . . . . . . . . . . . . 152.5.3 Dynamic Control . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Chapter Three: Preliminary 193.1 NAO Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.2 Baxter Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.3 KUKA LBR iiwa robot . . . . . . . . . . . . . . . . . . . . . . . . . . . 203.4 Kinect v2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.5 MYO Armband . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.6 Dragon NaturallySpeaking . . . . . . . . . . . . . . . . . . . . . . . . . 223.7 Dynamic Movement Primitive . . . . . . . . . . . . . . . . . . . . . . . 233.8 Gaussian Mixture Model . . . . . . . . . . . . . . . . . . . . . . . . . . 233.9 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.10 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.11 RBF Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.12 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4 Chapter Four: Development of Kinect based Teleoperation of the NAO Robot 284.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.2 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.3 Kinematics Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.3.1 Acquisition of Orientation Angles of Head . . . . . . . . . . . . 314.3.2 Acquisition of Joint Angles of Arm . . . . . . . . . . . . . . . . 33

4.4 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.4.1 Head Following Experiment . . . . . . . . . . . . . . . . . . . . 354.4.2 Limb Following Experiment . . . . . . . . . . . . . . . . . . . . 36

ix

Page 11: Learning Algorithm Design for Human-Robot Skill Transfer

4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 Chapter Five: Teleoperation Control of the Baxter Robot using Kalman Filterand Neural Networks 405.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405.2 Environmental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.2.1 System Configuration . . . . . . . . . . . . . . . . . . . . . . . 415.2.2 Development Workstation . . . . . . . . . . . . . . . . . . . . . 435.2.3 Robot Operating System and RosPy . . . . . . . . . . . . . . . . 445.2.4 User Datagram Protocol . . . . . . . . . . . . . . . . . . . . . . 45

5.3 Motion Capture by Kinect . . . . . . . . . . . . . . . . . . . . . . . . . 455.3.1 General Calculation . . . . . . . . . . . . . . . . . . . . . . . . 455.3.2 Vector Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 45

5.4 Measurement of Angular Velocity by MYO Armband . . . . . . . . . . . 495.5 Kalman Filtering based Sensor Fusion . . . . . . . . . . . . . . . . . . . 505.6 Neural Networks Based Control System . . . . . . . . . . . . . . . . . . 515.7 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.7.1 The KF Based Limb Following Experiment . . . . . . . . . . . . 545.7.2 The NN Learning Based Limb Following Experiment . . . . . . . 545.7.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . 54

5.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

6 Chapter Six: An Enhanced Teaching Interface for a Robot using DMP andGMR 596.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596.2 The Enhanced Teaching Interface . . . . . . . . . . . . . . . . . . . . . . 61

6.2.1 Calculation of Arm Joint Angles . . . . . . . . . . . . . . . . . . 616.2.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 646.2.3 Trajectory Generation . . . . . . . . . . . . . . . . . . . . . . . 65

6.3 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676.3.1 Obstacle Avoidance Experiment . . . . . . . . . . . . . . . . . . 676.3.2 Trajectory Generalizing Experiment . . . . . . . . . . . . . . . . 686.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7 Chapter Seven: Development of Writing Task Recombination TechnologyBased on DMP Segmentation via Verbal Command for Baxter Robot 737.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

7.2.1 Data Preprocessing using DTW . . . . . . . . . . . . . . . . . . 76

x

Page 12: Learning Algorithm Design for Human-Robot Skill Transfer

7.2.2 Trajectory Generation . . . . . . . . . . . . . . . . . . . . . . . 777.3 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

7.3.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . 797.3.2 Results and Analyses . . . . . . . . . . . . . . . . . . . . . . . . 80

7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

8 Chapter Eight: Conclusions 878.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 878.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

xi

Page 13: Learning Algorithm Design for Human-Robot Skill Transfer

1 Chapter One: Introduction

1.1 Research Motivations

Rapid development in robot technologies have necessitated the development of the robot’slocomotive capacities. In the light of the myriad of uses, robots can serve in modern-ized hospitals, schools, business and other fields, the research regarding human-robot skilltransfer has flourished during the last decades. Currently, robots are already being utilizedwith greater efficiency in the industrial field and it has been made possible to transfer hu-man motor skills to robots after the motion signals of the operator have been captured bya computer.

Generally, the objective of human-robot skill transfer is to enable robots to manipulatetheir movements with better dexterity and versatility as human beings [11]. To this end,two approaches have been adopted. One is by modelling the human motor control whichcan be adapted and implemented on a robot. The other is to facilitate robot learningthrough demonstration in which the human operator plays the role of a tutor, and therobot, a tutee [12]. Development in this field started with the fixed trajectory based skillstransfer from humans to robots. Thereafter, a hybrid mode was developed combiningboth position and force, which was further upgraded into an interactive mode by involvingonline feedback [9]. A recent progress is the enhanced skill transfer via the use of bio-signals and physiological signals of the human body. These signals can provide us with aricher and deeper account of the mechanism for human muscle activities compared withphysical signals [13].

Nowadays, with the advance of technology for human-robot skill transfer, demand forhuman operators has increased [14]. Meanwhile, robot’s training processes have also sig-nificantly reduced. In [15], an approach was developed for transferring skills betweentutor and tutee by capturing the movements of the tutor on a sensory-based computationalmodel. This information was then used to produce online master commands for devel-oping the learning process of the less-skilled tutees. In another research [16], a researchgroup employed a neural network to record human skills on a computer, and then trieddifferent ways to minimize the time of transferring skills from a human tutor to a tutee.

In recent years, motion capture technology has been developed and utilized in a widerange of areas, including HRI (human-robot interaction), computer animation and 3D filmproduction [17]. It serves to accurately record the 3D motion trajectory of each part of amoving object [18]. Based on this information, accurate modelling of moving objects andconcurrent semantic analyses of the movements can be achieved which is of great helpin technological fields, such as animation [19]. However, motion capture is an extremelytime-consuming process, due to the fact that the captured movement data require a hugeamount of both the pre-processing and manual segmentation of the action sequences andalso requires the accurate identification of semantics for each segment [20]. To overcome

1

Page 14: Learning Algorithm Design for Human-Robot Skill Transfer

these limitations, DTW has been proposed and employed to speed up the initial processes,for instance, in speech recognition, it assists in measuring the similarity between the two-time series to identify whether the two words represent the same term [21].

Except for the application in a structured environment, such as fixed-point operationin an industrial environment, research on robots application in the unstructured one hasalso gained the increasing attention, such as deep sea resources exploration [22], disastersite search and rescue operations [23], operation in radiation environment [24] [25]. Theresearch regarding the autonomous work of robots in the dangerous natural environmentshas helped in reducing the risks of human lives for doing dangerous tasks in an unreach-able environment. To start with, the robots have an obligation to autonomously work inan unstructured environment. However, traditional research methods are insufficient inmeeting the demands, hence, inspiration has been drawn from the constantly evolvingcreatures that thrive in complex environments. By studying their shape, structure, cogni-tive abilities, motion mechanisms and behavioural adaptations, various bio-inspired ideashave been proposed for the development of robots. According to the speech of experts de-livered at the 2004 edition of the IEEE International Conference on Robotics and Bionics,“Bionics that mimic the function and structure of living organisms with biological char-acteristics will gradually replace industrial operators and become the focus of research inthe field of robotics” [26].

1.2 Research Innovations

With the advancement in artificial intelligence, research and application in robotics havealso seen a great development [14]. At the same time, the research on human-robot skilltransfer has attracted widespread public attention in the past decade. Human-robot skilltransfer methodologies have been widely used in the industrial field [1]. As this tech-nology helps to directly transfer information on the human motion to the robot, it helpsto effectively replace humans with robots in high-risk operations and also benefits fromwork efficiency [27]. At present, regarding technological innovations and the frequent re-placement of humans by robots in the industrial facilities, human operators are requiredto constantly learn new mechanical skills in order to operate the equipment efficiently.Therefore, the development of scalable intelligent human-robot interaction technology isinnovating the traditional production line that involved repetitive movements, and is thusof great practical significance to industrial development.

This research focuses on the design of artificial intelligence algorithms which enablerobots to mimic human actions, as well as to facilitate them in adapting and adjusting incomplex environments, i.e. replacing or cooperating with human operators in performingvarious tasks. In the traditional manufacturing industry, the production line was lengthyand repetitive with no chance of skill expansion or skill adjustments [28]. In the latestresearch & development tasks, it is often necessary to re-analyse the point-to-point trajec-

2

Page 15: Learning Algorithm Design for Human-Robot Skill Transfer

tory so as to allow the machine equipment to re-read the work cycle [29]. The challengein this research thinks this process is usually incredibly time-consuming and financiallycosting.

Inspired by the above-stated demands and challenges, the following topics are investi-gated in my research work:

a) Through the designing of the robot learning methodologies by combining the DMP,GMM, GMR and DTW algorithms, the robot is able to generalize the motion trajectoriesspatially/temporally, which greatly reduces the training/teaching time. This will, in turn,improve working efficiency. For example, when the manufacturing line/type has beenchanged, there is no need to re-program and re-teach the robots for the new productionline [30];

b) Designing different teaching methods for different work conditions, such as the pro-duction of hazardous/toxic items, the real-time remote teleoperation based human-robotskill transfer can be employed. For productions that required high-precision operations,such as binding, and cutting, the physical teaching by demonstration method can be ap-plied [31];

c) The voice interaction has been combined with human-robot skill transfer to achieveverbal commands control for robots while carrying out specific tasks [32];

d) Applying the KF to reduce the noise of the multiple sensors and NN to overcome thedynamic uncertainties of the robot learning system during the teleoperation process [33].

1.3 Publications

The research during my PhD study has led to a number of publications as detailed below:1. Chunxu Li, Chenguang Yang, & Cinzia Giannetti. Segmentation and Generalization

for Writing Skills Transfer from Humans to Robots. Cognitive Computation and Systems,1(1):20-25, 2019.

2. Chunxu Li, Chenguang Yang, Zhaojie Ju, & Andy SK Annamalai. An enhancedteaching interface for a robot using DMP and GMR. International journal of intelligentrobotics and applications, 2(1):110-121, 2018.

3. Chunxu Li, Chenguang Yang, Jian Wan, Andy SK Annamalai, & Angelo Cangelosi.Teleoperation control of baxter robot using kalman filter-based sensor fusion. SystemsScience & Control Engineering, 5(1):156-167, 2017.

4. Chunxu Li, Chenguang Yang, Andy SK Annamalai, Qingsong Xu, & ShaoxiangLi. Development of Writing Task Recombination Technology Based on DMP Segmen-tation via Verbal Command for Baxter Robot. Systems Science & Control Engineering,6(1):350-359, 2018.

5. Chunxu Li, Chenguang Yang, Jian Wan, Andy SK Annamalai, & Angelo Cangelosi.Neural learning and Kalman filtering enhanced teaching by demonstration for a Baxter

3

Page 16: Learning Algorithm Design for Human-Robot Skill Transfer

robot. In Automation and Computing (ICAC), 2017 23rd International Conference on(pp. 1-6). IEEE, 2017.

6. Chunxu Li, Chenguang Yang, Peidong Liang, Angelo Cangelosi, & Jian Wan. De-velopment of Kinect based teleoperation of NAO robot. In Advanced Robotics and Mecha-tronics (ICARM), International Conference on (pp. 133-138). IEEE, 2016.

7. Junshen Chen, Mark Glover, Chunxu Li, & Chenguang Yang. Development of a userexperience enhanced teleoperation approach. In Advanced Robotics and Mechatronics(ICARM), International Conference on (pp. 171-177). IEEE, 2016.

8. Alex Smith, Chenguang Yang, Chunxu Li, Hongbin Ma, & Lijun Zhao. Develop-ment of a dynamics model for the Baxter robot. In IEEE International Conference onMechatronics and Automation (ICMA), International Conference on (pp. 1244-1249).IEEE, 2016.

9. Rijin Raju, Chenguang Yang, Chunxu Li, & Angelo Cangelosi. A video game de-sign based on Emotiv Neuroheadset. In Advanced Robotics and Mechatronics (ICARM),International Conference on (pp. 14-19). IEEE, 2016.

10. Junshen Chen, Mark Glover, Chenguang Yang, Chunxu Li, Zhijun Li, & AngeloCangelosi. Development of an immersive interface for robot teleoperation. In ConferenceTowards Autonomous Robotic Systems (pp. 1-15). Springer, Cham, 2017.

1.4 Organization of the Thesis

This thesis is structured in eight chapters. In the introduction (first chapter), the motivationand innovation of the research have been presented, discussing the differences between thetraditional human-robot skill transfer methods with ours. Chapter 2 presents a detailed de-scription, review of the academic products of researchers and the general strategies used inhuman-robot skill transfer technologies worldwide as well as analysing their novelties andchallenges. In addition, we especially arrange a chapter (Chapter 3) to introduce the de-vices and basic algorithms. In Chapter 4, we develop an online tracking system to controlthe arm and head of the NAO robot using Kinect sensor for the teleoperation (representingthe 6th paper in the publication list above), however, experimental results show large er-rors using Kinect alone to track the human motion. Therefore, in Chapter 5 we add the KFtechnique to fuse the signals from multiple sensors in order to reduce the noise and the NNlearning based control system is also developed to compensate the dynamic uncertainties,where the fused date are collected from Kinect sensor and a pair of MYO armbands (referto my 5th published paper from the list). In Chapter 6, an enhanced teaching interface fora robot using DMP and GMR has been designed and this proposed approach is tested ona KUKA iiwa and a Baxter robot by performing two tasks, which are passing through theobstacle of different heights and curve drawings with generalization, respectively, whichrefers to the 2nd publication. Chapter 7 focuses on the development of the verbal com-mands based writing task recombination technology based on the proposed segmentation

4

Page 17: Learning Algorithm Design for Human-Robot Skill Transfer

mechanism. The technology is validated through performing a Chinese character writingtask with the Baxter robot. During the task, different Chinese characters are written byteaching only one character representing the 4th paper in the publication list above. Chap-ter 8 summarizes the contribution of our work, and presents some suggestions for futurework.

5

Page 18: Learning Algorithm Design for Human-Robot Skill Transfer

2 Chapter Two: Review of the Related Work

2.1 Research Background

In recent years, there has been a rapid advancement in robotics, with its applications grow-ing extensively in numerous fields: from entertainment to the medical field and many oth-ers. Development in robotics has greatly facilitated people’s lives and is one of the mostrapidly emerging industries in this day and age. Most countries, especially developedcountries, are paying much attention to the development of robots, such as the NationalRobot Program of the United States, the Seventh Framework Program of the EuropeanUnion and the National Natural Science Foundation of China’s 863 Program [34]. Alongwith its unprecedented growth, research on robotics also faces a number of challenges.

From the beginning of the Industrial 4.0, the role of robots has become increasinglysignificant with the advance of science and technology. However, this role remains largelylimited to traditional industrial applications such as pre-set and repeated tasks. At present,it remains a challenge to create robots that are capable of performing an assortment of tasksin unknown, complex and dynamic environments. Besides, service-oriented, medical aswell as industrial robots are required to work in complex external environments. Hence,control of robots in unknown and dynamic environments has been widely studied and ahigher demand for advanced intelligence in robots has been recognized. It is noteworthythat in the working environment, robots are inevitably affected by unpredictable and un-certain external disturbances. In this regard, traditional control methods are not sufficientfor qualifying them in terms of intellectuality. Recent research has been actively focusedon the interaction and control between robots and external forces. Interactive control tech-nology cannot only assure compliance of robot behaviour, but also provide a relatively safeoperating environment for operators, which are impossible to achieve through traditionalmethods of robot control. The interaction between robots and complex environments canlead to potential instability, inaccuracy, mechanical nonlinearity and saturation, thus mayhave a negative impact on the overall working system. At present, there are two mainmethods of interactive control between robot and environment external forces: hybridforce/position control and impedance/admittance control [35]. Hybrid force/position con-trol has the potential of achieving good control performance and anti-disturbance ability.However, in case of a strong interaction between the robot and the environment, the hy-brid force/position control method often leads to system instability and security problems[36]. As compared to hybrid force/position control, impedance/admittance control aimsto establish the correct balance in the relationship between the robot and interactive forcereceived from the external environment. Through impedance/admittance control method,robot’s behaviour can be adapted to the environment and the overall stability of the controlsystem can be ensured [37]. Therefore, further study and development of robot interactivecontrol technology for dealing with various uncertainties in a real-world situation is neces-

6

Page 19: Learning Algorithm Design for Human-Robot Skill Transfer

sary so as to ensure that robots can work in a dynamic and unknown environment. Hence,the development of robot interactive control technology is of great practical significance.

In most nonlinear physical and mechanical systems, there exist wide-scale model dy-namic uncertainties. These uncertainties are usually due to changes in the external envi-ronment, loss of mechanical devices, and model errors, etc. Currently, there is no generalmethod for the control of nonlinear systems, due to the limitations in existing detectiontechnology. In this regard, many experts and scholars have conducted the extensive andin-depth research to explore different control methods. So far, many control methods havealready been applied in practical systems. However, different control methods have theirown unique advantages and disadvantages; hence, research on relevant control theoriesand methods for nonlinear systems is still an open problem. At present, a feasible com-mon approach is employed to combine control methods of various nonlinear systems forutilizing their advantages and compensating for their shortcomings, which achieves thedesired control performance.

In addition, due to physical constraints, nonlinearities such as saturation, dead zone andhysteresis are prevalent in physical actuators which greatly limit the actual control effectand may potentially lead to instability of the system. For instance, saturation is one of theinevitable problems in most actuators. Therefore, it is of great importance to consider andsolve the saturation nonlinearities in control design. In summary, this research focuses ontransferring human motion skills for designing the smart control of industrial robots. Forthis purpose, the research studies interactive control between robots and external forces,and realizes the adaptability of robots towards external forces with the aim of guaranteeingcontrol accuracy. Thus, this research not only has abundant value as a formative researchin robotics, but also has a huge number of prospective applications in a variety of fields.

2.2 Development of Industrial Robots

As the highest integrated application of mechatronics technology, industrial robots are atestament to the great development automation that has achieved in the industrial field. Inorder to introduce innovative advancement in the industry, the design and manufacture ofindustrial robots have become increasingly important. In robotics, a robust robot controlsystem is the core of the entire robot system, and plays an important role in improving thedynamic performance of robots, helping in reducing costs and improving work efficiency[27]. A new trend is the development of lightweight robots, such as the ABB industrialrobots which have undergone a marked reduction in the load-to-weight ratio by a factor ofthree since the 1990s [1]. Lightweight design reduces incurred costs and overall energyconsumption, while also reducing the mechanical stiffness of the system [38]. Howev-er, such a robot design results in complex vibration modes, which pose a challenge forrobot control. Therefore, it is imperative to establish a dynamic robot model that effi-ciently incorporates flexible characteristics in robot design to promote the optimization

7

Page 20: Learning Algorithm Design for Human-Robot Skill Transfer

and effective control. Generally, industrial robots are highly versatile. Though some robotapplications require a high control performance, most industrial robots normally still havesome versatility in controlling performance requirements.

(d) : GSK arc welding robot (e) : COMAU arc welding robot (f) : SIASUN spot welding robot

Figure 2.1: The mainstream industrial robots worldwide [1]

Followings are some of the main requirements for control performance of industrialrobots [39]:(1) assuring accuracy in trajectory tracking during continuous motions (such as laser weld-ing, laser cutting or water flow cutting);(2) maintaining velocity and accuracy during continuous motions (such as spraying andgluing);(3) managing high velocity and acceleration during manipulations (for instance materialhandling);(4) sustaining small overshoot and short settling time during tasks (such as spot weldingand palletizing) [29].

To improve the performance of robots, mainstream robot manufacturers (see Fig.2.1)have invested significantly in hiring qualified personnel and in control system research. Asa result, numerous control system structures and algorithms have been proposed which ef-fectively meet the practical engineering requirements. One of these is the method proposed

8

Page 21: Learning Algorithm Design for Human-Robot Skill Transfer

by Fanuc Robotics to reduce vibrations in robots [40]. The method designs a position esti-mation observer, which is later combined with the internal model control (IMC) structure.In this way, the controller design transforms the original system model into a single outputand dual inertia one. However, this method has its limitations – it is complicated and lack-s self-adaptivity in terms of generalization. Their invention, the Fanuc controller RJ3iC,features the enhanced vibration suppression, enabling accurate control. Another investiga-tion, known as ABB Robotics’ motion control system, uses a model-based control mecha-nism with continuous and path-tracking accuracy called the TrueMove mode [41]. In thismode, the velocity and accuracy of the manipulation can be closely monitored, however, itdoes not minimize vibration to a great extent. Additionally, Motoman Robotics designs ahigh-performance, accurate trajectory-tracking and vibration suppression control methodbased on an advanced robot motion control concept. Furthermore, the industrial robot con-trol system developed by B&R and Beckhoff adopts a model-based control approach andcombines torque feed-forward control method to compensate for the rapid change in non-linear inertia and joint flexural deformations. Thus, it guarantees stability in controllingthe robot as it moves at a high velocity [42].

2.3 Review of Human-Robot Interaction

According to [43], the human-robot interaction (HRI) is a sub-area of the human-computerinteraction (HCI), which studies the interaction between humans and robots, and focus-es on developing more intelligent and anthropomorphic robots. HRI is widely used inresearch and implementation of robot systems in hazardous zones where human involve-ment needs to be minimized and remote operation of robots is required. These inventionscan also apply to care for the elderly and the disabled as well as to entertainment purpos-es [44]. There are various methods of human-robot interaction, amongst which, two areconsidered the main methods of — physical interaction and teleoperation interaction.

The physical human-robot interaction (pHRI) studies the design, control, and plan-ning involved in the close physical interaction between humans and robots in a sharedworkspace. Previous research in pHRI leads to the development of a safe and responsivecontrol method to coordinate and control the physical reactions that occur when a robotperforms a task. Based on the research of Hogan et. al., impedance control is one of themost widely used methods for commanding a robot tracking a given orbit when there is ahuman partner in the workspace [45]. In this control method, the novelty is that the robotcan act like a spring: it allows to be propelled by people and bounces back to its originalposition when not in use; the disadvantages are the control precision and non-adaptivespeed, which limits its applications.

For specialised tasks, such as medical operations, the safest method for human-robotinteraction is teleoperation [46]. In this method, the user interfaces with the robot througha haptic signal for coordinated feedback. The research [47] carries out experiments to

9

Page 22: Learning Algorithm Design for Human-Robot Skill Transfer

characterize this problem, and derives several methods to provide haptic feedback in orderto improve a surgeon’s performance. However, they do not take the enough requiredcontrol room for the surgeon into consideration. As we all know, the surgeon’s activitieshave high requirements for cleaning, and thus the remote control is a better choice. In[48], a research team introduces a visual impedance scheme for vision-based control ofthe robot so as to realize task-level dynamical control. In the control scheme, the imagefeatures are applied to the impedance equation so that integration of a visual servo and aconventional servo system could be accomplished. However, this research is also limitedin the aspects of the self-adaptive decision.

In the research and application of robot technology, western countries gained a head-start before the eastern ones. Robot teaching and learning devices play an important partof the industrial robot control system and have thus been heavily investigated through-out the development of robot technology. To pursue development in robot application,numerous robot teaching devices have been developed by a number of industrial robotmanufacturers and teams through independent scientific research. So far, industrial robotmanufacturers and various scientific research institutions have developed a series of stan-dards that can help promote the wide-scale application of robot technology in industrialproduction. Teaching devices design varies with different underlying concepts. Neverthe-less, interface designs can be categorized into two categories: touch screen operation andkeyboard operation [49].

(a) Human-robot interaction interface represented by keyboard operationA FlexPendant consists of a display screen and the physical control buttons. The dis-play screen is adopted to display information about robot’s motion, and some may equipwith touch screen functions for simple utilities. The robot motion function informs thework status of the robot. Physical control buttons provided under the display screen allowfunctional control of robot’s motion. It benefits from efficiently maneuvering a robot’smovements in a fast-paced industrial work environment wherein a skilled operator canquickly control an industrial robot work’s file with the press of a key or a combination ofkeys [50].

An example of robot FlexPendant utilizing keyboard operation on the Kawasaki robotis shown in Fig.2.2. Buttons for controlling physical motions of the robots are providedalongside an operation manual. The information about task parameters and robot’s stateis displayed on the screen above the FlexPendant. The FlexPendant operates faster atprocessing information than the touch-screen, as it can collect messages faster through theuse of physical buttons. However, too many physical buttons increase robot FlexPendantsize thereby causing inconvenience for users [51].

(b) Human-robot interaction interface represented by touch screen operationIn this type of FlexPendant, the operator does not have a great variety in physical buttonsfor operation, in fact, sometimes there are not any physical buttons at all. Hence, FlexPen-dants are relatively small, and are thus convenient for personnel operation and simplified

10

Page 23: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 2.2: The FlexPendant of the Kawasaki robot [2]

for operator tasks. However, as functions are mainly implemented by using a softwareprogram, control performance is somehow affected.

An example of touch-screen operation is the FlexPendant designed by KUKA Roboticsas shown in Fig.2.3. In the corresponding human-robot interaction interface, various func-tions for controlling robot motion are provided, including monitoring various informationstates, creating robot motion work files, and reproducing the robot teaching function. Inaddition, the KUKA robot’s control system has an automatic operation mode which en-sures that the robot continues to operate even without an external higher-level controller[52]. In this thesis, the HRI interface is chosen as the latter in a KUKA iiwa robot platformbecause of its convenience and simple, easy-to-understand control system of the FlexPen-dant.

Figure 2.3: The FlexPendant of the KUKA robot

11

Page 24: Learning Algorithm Design for Human-Robot Skill Transfer

2.4 Review of Machine Learning in Robotics

With the development of science and technology, the demand for higher robotic intelli-gence and the minimization of dependent on human labour is increasing. Additionally,with the advancement in artificial intelligence, it has become possible to improve robots’intelligence to a level that it can accomplish some tasks independently in a specific scenari-o. A growing body of research is focused on exploring the interrelationship between therobots and the artificial intelligence [53] [54] [55]. It has come to be known that artificialintelligence consists of a series of machine learning algorithms [56]. Machine learningalgorithms are divided into the following categories: supervised learning, unsupervisedlearning, semi-supervised learning and reinforcement learning, each of which containsmany more specific algorithms [57] [58]. The RBF neural network algorithm used in thisresearch is one of the representative algorithms of supervised learning. Although machinelearning has helped to achieve excellent results in image detection and recognition, speechrecognition, games and many other applications, the research is ongoing for combining itwith robot control to improve the intelligent behaviour of robots.

At present, the supervised learning algorithm and reinforcement learning algorithm areused in robotics. In 2016, the University of Berkeley trains robots with convolutional neu-ral networks, which enables robots to complete tasks such as screwing bottle caps [59].Giusti et al. of Zurich University mount a monocular camera on the Unmanned Aeri-al Vehicle (UAV), and realize the real-time planning and control of UAV motion on realforest roads by using deep neural networks [60]. In the same year, Deep Mind collectsa large amount of data on 14 robot arms and trained these to grab objects by means ofenhanced learning [61]. After nearly 3,000 hours of training and more than 800,000 grab-bing attempts, the level of intelligence of robot arms for grabbing random objects havebeen evidently improved. In addition, Deep Mind and Google have worked on trainingrobots to open doors by using deep reinforcement learning [53]. In 2016, Mark Pfeiffer etal. propose an end-to-end learning algorithm based on a convolution neural network forground mobile robots [54]. Using 2D laser sensor to sense information about the surround-ing environment and by using data-driven methods, the path planning of indoor robots isrealized, where the planned path is compared with the one planned by the ROS on theactual robot.

In [62], an approach to overcome the shortcomings of Wiener filtering is proposedcalled the KF, and it is used to estimate past, current, and future state signals regardlessof the knowledge about the exact nature of the model. Filtering is a signal processing andtransformation technique. There are dynamic uncertainties in a robot’s functioning whichhave consequences in its teaching and learning experience. These uncertainties can becontrolled with the development of controls using Fuzzy Logic or Radial Basis Function(RBF) Networks.

RBF is a three-layer forward network with a single hidden layer [63]. Its first layer is

12

Page 25: Learning Algorithm Design for Human-Robot Skill Transfer

an input layer and consists of signal source nodes, and its second layer is a hidden layer.The total number of hidden layer nodes depends on the complexity and requirements ofthe problem described. The transformation function of the neurons in the hidden layer, forexample the radial basis function, is a non-negative linear function that is radially symmet-ric and attenuated to the center point [63]. It is a local-response function, and the specificlocal-response is reflected in the transformation of its visible layer to the hidden layer,a phenomenon which differs it from other networks. The third layer is the output layerwhich responds to the input mode, and therein the input layer only acts as a transmissionsignal [64]. Previous forward network transformation functions are used for a global re-sponse. Functionally, the input layer and the hidden layer can be connected with a weightfactor of 1. The output layer and the hidden layer perform different tasks, hence learningstrategies used for them will also be different [65]. The output layer adjusts to the linearweight using a linear optimization strategy, so the learning speed becomes faster. How-ever, the hidden layer adjusts parameters of the activation function (Green’s function andGaussian function, more commonly the latter) through nonlinear optimization strategieswhich make the learning process slower [66].

According to [67], research about the mutual relationship of machine learning androbot control is ongoing. Through the above examples of supervised learning, enhancedlearning and enhanced control, it is clear that extensive research is still required to createautonomous and truly intelligent robots. Despite its progress in image and voice recogni-tion, combining machine learning with control technology still remains challenges. Thisthesis summarizes three reasons for combining machine learning with control technology:

a) Machine learning that solves traditional problems is “independent”, implying that allinputs are assumed to be independent and identically distributed by default. Taking imageclassification as an example, different individual images do not affect the output of eachother. However, for robot control, specific time-sequence and correlation between frontand back behaviour of the robot are required.

b) Most of the problems that have been solved before machine learning are “no sub-ject”. “No subject” means that in the process of the machine learning algorithm, the onlyinput image and output results are needed. However, for effective control of the robot,information about the state of the robot also needs to be considered apart from the inputof environmental information.

c) Most of the problems solved in machine learning are “static”, while robot control is adynamic process as robot’s behaviour and environment constantly interact with each other.Information comes from the environment may affect the decision-making of the robot.After the robot executes its decision, the observed environment also changes. Hence, arobot’s environment and its state are always in a dynamic state of change.

Based on the requirement of human-robot skill transfer in terms of intelligence andautonomy, this thesis studies the key issues involved the process of controlling robot be-haviour (such as remote teleoperation, target/position detection, decisive motion planning

13

Page 26: Learning Algorithm Design for Human-Robot Skill Transfer

and human-action imitation), and the design and implementation of an advanced robotcontrol system based on machine learning techniques. In this process, we have changed“independence” into the “association”, “no subject” into “with subject” and “static” into“dynamic” to implement an optimum combination of machine learning with robot control.Therefore, the intellectualization is actualized, and the combination of machine learningwith robot control is validated.

2.5 Robot Controller Design

In robotics, control methods are derived to control both the position and the velocity of thejoint motor. There are two main categories in the robot control broadly: kinematic controland dynamic control. In the kinematic control, the control input is often the data related tothe joint positions or velocities, while in the most cases of dynamic control methods, theinput data are relevant to the joint torque [68].

2.5.1 Robot Modelling

Establishing a dynamic robot model is the foundation of robotics and controller designresearch. Currently, researchers endeavour to improve the dynamic performance of robotswhile reducing incurred cost alongside achieving high speed and minimizing the heavyload. As robots’ load-to-weight ratio decreases, robots become more flexible in design.With the enhanced flexibility, a dynamic model which accurately describes the character-istics of a robot is built, and the feed-forward torque is calculated based on the dynamicmodel, which improves the response speed of a robot.

The accurate dynamic robot has wide application in a number of fields – manipulatormotion simulation, motion control system design, mechanical design analysis, etc. Somecontrol schemes, such as predictive control [69], sliding mode control [70] and computedtorque control [71], require an accurate model of robot dynamics. The robot dynamicsmodels are commonly presented as:

M(q)q + C(q, q)q + G(q) + τext = τ (1)

where q denotes the vector of joint angles, M(q) ∈ Rn×n represents the definite inertiamatrix, and n denotes the degree of freedom (DoF); C(q, q)q ∈ Rn denotes the Coriolisand Centrifugal torques. Moreover, G(q) ∈ Rn represents torques due to the gravitationalforce, τ ∈ Rn is the control input vector and τext is the external disturbance. Here, we definethe kinetic energy of the robot as: M(q)q + C(q, q)q, and the potential energy in terms ofgravity as G(q). This calculation can be used to further calculate forward dynamics for thesimulation to measure manipulator motion based on the applied control input, or inversedynamics for control of robots by obtaining torque for a given set of joints.

14

Page 27: Learning Algorithm Design for Human-Robot Skill Transfer

Based on a robot’s specific geometry and inertia parameters, there are two commonlyused methods for developing the dynamics as given in (1): the Lagrangian-Eulerian (L-E) formula and the Recursive Newton-Eulerian (RN-E) method [72]. Both types of theresearch detail methods to describe the dynamic execution of robot motion.

The L-E method is based on a simple, systematic approach to obtaining values of thekinetic and potential energy of a rigid body system. In this regard, Bajeczy’s work illus-trates dynamic equations of motion for a robot. This robot is highly nonlinear, consistingof terms for inertia and gravity, and depending on physical parameters and configurationof the relationship among position, angular velocity and acceleration [73, 74]. Accordingto [75], the L-E method provides a closed form of robot dynamics, and is therefore suit-able for analytical calculations. It can also be used to design joint space (or task space, ifthe robot Jacobian matrix from the base to the end-effector is available).

Additionally, the L-E equation can be used for calculating forward and inverse dynam-ics, however this involves calculating a wide variety of coefficients for M(q) and C(q, q)as outlined in the equation (1), which proves to be time-consuming. Hence, this techniqueis relatively unsuitable for computing on-line dynamic calculations, since alternative s-trategies, especially when the RN-E or Lee’s Generalised d’Alembert Equations (GAE)calculate with fewer derivations and with higher speed [76]. In the research [77], an algo-rithmic for L-E technique is developed which greatly reduces the computational burden ofthe L-E formulation and aligns it with RN-E strategies.

The N-E formula is based on balancing out forces acting on the general link of arobot. This involves a series of equations with recursive solutions [78]. The forwardrecursion propagates link speed and acceleration, and then it recursively propagates forcesand torque along the chain of robots. This method is more efficient than L-E since it uti-lizes a serial chain as a manipulator; when a force is applied to a link, it can also producemotion in the connected link. Considering the considerable computational duplications,the algorithm is expressed in a recursive form [79]. The reduction in computational loadgreatly reduces overall computation time, allowing real-time forward and reverse dynamiccalculations, which assists in implementing real-time torque control methods.

2.5.2 Kinematic Control

Kinematics control is responsible for solving the inverse kinematics, i.e. generating thejoint position or velocity trajectory when the task trajectory in the Cartesian space and theinitial posture of the robot are given. One fundamental example of kinematics control isthe proportional integral derivative (PID) control; the principle of PID control is illustratedin Fig.2.4. The input r(t) and output c(t) of the system denote the velocity of the system,respectively. The velocity response of the system can be adjusted via KP. Moreover, theposition response can be manipulated by tuning KI , and the acceleration can be adjustedby tuning KD.

15

Page 28: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 2.4: Flowchart of the PID control architecture [3]

In [80], the authors propose a fuzzy partition controller mechanism based on error par-titioning method. This method combines fuzzy control technology with error partitioningPID controller. In this research, the design of the controller and improvement of systemcontrol performance is explained. Finally, the experiment proved that the proposed methodcan greatly improve the overall performance of the control system. In another study [81],the authors design a nonlinear PD controller for a manipulator with six degrees of free-dom and four-bar linkage trajectory tracking system to overcome a number of issues. Foronline trajectory planning and kinematics control of modular three-legged parallel robots,[82] designs an inverse kinematics method for directly solving the issue of joint angulardisplacement based on local exponential product formula. Moreover, according to [83],the author designs a PD controller plus gravity compensation control law for a four-barlinkage mechanism and achieved good control performance on the experimental platform.Unfortunately, all of the above-mentioned designs have limitations in terms of variabilityand dexterity.

The study [84] improves upon the design of the one [81] by developing a nonlinear PIDcontroller. Through comparison, it is inferred that the designed controller performance thatthey achieved is better than PD. Based on the kinematics model of a six-degree-freedomparallel robot [85], a nonlinear PID controller has been designed to overcome problemsof system interference and noise measurement. Experimental results demonstrate that thecontroller is easy to implement and has good control performance. In the research [3],the trajectory tracking control is studied based on the kinematics model of the robot. Thetrajectory tracking control of the mobile robot is divided into two categories – trajectorytracking controller (based on back-stepping and time-varying state feedback) and robotvelocity PID controller. In another study [86], back-stepping method is used to constructa mobile robot trajectory tracking controller, which during the design disintegrates mo-bile robot system into low-order subsystems. To this end, intermediate virtual controland partial Lyapunov function are used to simplify controller design such that accurate

16

Page 29: Learning Algorithm Design for Human-Robot Skill Transfer

characteristics for tracking error convergence are obtained.A robot trajectory tracking control system based on kinematics mode offers a number of

advantages including simple structure, less computational burden, and ease of implemen-tation. However, the ignorance of the dynamic characteristics limits the high performanceof the mobile robot, since actual control designs of robotics need to coordinate all jointsfor optimal efficacy.

2.5.3 Dynamic Control

A dynamic controller solves the inverse dynamics problem and calculates the torque re-quired by each joint of the robot. This is executed after the kinematic model has plannedout the desired trajectory for each joint to drive joint motion. In the study [87], the coeffi-cient matrix of the Stewart platform dynamic equation is assumed to be a constant, and aPID inverse dynamics controller is designed. In the process, the error is approximated asinterference, and a compensation controller is developed. The experimental results showa better performance of the designed controller than that obtained by using an uncompen-sated PID inverse dynamics controller, and it is able to obtain the faster reference signals.In another study [88], kinematic controller and dynamic PD controller are designed forredundantly driven parallel robots. Subsequently, a comparison of control performancecomparison is conducted on the experimental platform for both controllers, which showsa better performance of the dynamic controller. On the basis of the study [88], a researchteam design a PD controller, an augmented PD controller and torque controller based onthe dynamic learning model. They then compare the performance of the three controllers[89].

In addition, computed torque control, is also a commonly used control scheme forrobot manipulator, where the nonlinear terms of the dynamics model are compensateddirectly. This information is sent to a robot which provides feedback regarding positionand velocity which are used for control. Command torque τ is calculated using the aboveequation (1). Fig.2.5 is an illustration of the diagram of computed torque control, where s,v, a denote the position, velocity and acceleration of the reference trajectory, respectively,and q, q represent the position and velocity feedback of the robot. The gains Kp and Kv

can be modified to alter the stiffness and damping of the system.According to the research [4], a computational torque controller based on neural net-

work optimization is designed for achieving high-speed motion on 2-DOFs parallel ma-nipulators. In another study [90], the authors propose a modeling method of differentialgeometry for a multi-joint robot with a position/force hybrid control algorithm, and the ex-periments are carried out on a series of multi-joint robot prototypes with different degreesof freedom. Moreover, in [91], a dynamic analysis method based on linear projection map-ping is established and there are some controllers that consider performance indicators forrobot motion control. This strategy solves the point-to-point motion control problem of

17

Page 30: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 2.5: Flowchart of the computed torque control architecture [4]

parallel robots, and also proves that the control method can effectively improve systemperformance. The advantage of dynamic control is its capacity to consider external forceinterference such as uncertainty of kinetic parameters and friction. Furthermore, the study[92] develops a robust nonlinear controller for Stewart platform motion control, introduc-ing Friedland-Park friction mechanism, estimating frictional force to improve the system’srobustness.

18

Page 31: Learning Algorithm Design for Human-Robot Skill Transfer

3 Chapter Three: Preliminary

3.1 NAO Robot

In this research, we chose the NAO robot, a 5 DOF humanoid robot produced by Aldebaran-Robotics in France (see Fig.3.1) [93]. It supports multiple sensors and controllers, includ-ing head and jaw cameras, chest sonar sensors, movement motors on neck, hands and feet,three colour LEDs (red, green and blue) on the eyes, and head and feet tactile sensors [94].The robot has a body mass index (BMI) of about 13.5 kg/m2, relatively light as comparedto other robots of the same height [94]. According to the research [95], the NAO robothas 25 degrees of freedom (DOF) in total, wherein its 11 joints are found in the legs andpelvis and the rest are located in the trunk region, arms and head. In addition, each armconsists of a 2 DOF shoulder, a 2 DOF elbow, 1 DOF wrist and 1 DOF hand-gripper. Thehead is also able to rotate on both yaw and pitch axes [95].

Figure 3.1: Illustration of NAO Robot and coordinate system

3.2 Baxter Robot

The Baxter robot was developed by Rethink Robotics in the United States. It is an inno-vative, intelligent and collaborative robot (Fig.3.2). It is an ideal alternative to manpoweroutsourcing and fixed work automation [96]. With its unique features and benefits, theBaxter robot enables manufacturers to create cost-effective solutions for handling smallbatches and a variety of producing tasks as well as minimizing the requirement of tech-nical staff. There are many leading industrial companies that have gained a significantcompetitive advantage by incorporating the use of the Baxter robot [97].

The Baxter robot comprises of the following parts: one torso, one 2-DOF head andtwo 7-DOF arms, which are shoulder joint: s0, s1, elbow joint: e0, e1 and wrist joint:

19

Page 32: Learning Algorithm Design for Human-Robot Skill Transfer

w0, w1, w2, respectively [98]. It also consists of coordinated cameras, torque sensors,encoders, and sonar. Researchers can directly program the Baxter robot using open-sourcesuch as a standard ROS interface. Seven Serial Elastic Actuators (SEAs) drive the jointsin the Baxter robot arm which effectively regularizes the robot’s movements and helpsit to overcome the effect produced by surrounding obstacles [98]. Commonly, peopleteleoperate and program the Baxter robot using ROS (Robot Operating System) throughthe Baxter SDK running on Ubuntu Operation System. The ROS is an open-source systemand comprises of libraries, module devices and correspondence [99]. The use of thissystem improves the task in terms of displaying and programming of the robot on varioustypes of automated platforms [99].

Figure 3.2: Image of the Baxter robot

3.3 KUKA LBR iiwa robot

KUKA LBR iiwa robot (a robot with human-robot collaboration capabilities) is the firstcommercial robot that is approved for human-robot collaboration (HRC). The KUKA L-BR iiwa robot aids human-robot collaboration for the completion of highly sensitive andprecise tasks [100]. To ensure high control accuracy, the robot has an advanced design of7 degrees of freedom (DOFs) robot arm [100]. The arms are programmed via Workbench,which is a standard KUKA modifying platform employing KUKA robot language (KRL)and Java [100]. The KUKA LBR is controlled by the KUKA SmartPad (Fig. 3.3).

20

Page 33: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 3.3: Image of KUKA LBR robot

3.4 Kinect v2

The Kinect v2, produced by Microsoft, is an RGB-D device, and it can be used to capturedepth, colour, and IR images (as well as sound) using the Kinect 2.0 SDK (Software De-velopment Kit) [101]. The depth, IR and colour image resolutions are significantly betterthan the ones obtained using the first generation Kinect v1. Using the SDK, the obtainedinformation about colour and depth can be consolidated (transformed) into real-world co-ordinates, which are called camera space. These co-ordinates are directed towards thecentre of the depth sensor [102]. Skeletal tracking can also be achieved with the use of theKinect 2.0 SDK and this function has been used in this project. Consequently, this tech-nique is used to track the position of an operator in front of a Kinect device. When a pointis selected by the operator on the Kinect colour image, its depth information is acquiredin the Kinect depth image. In the frame that is constructed, both the colour image, anddepth image are put into the same frame, and its origin is located at the centre of the depthcamera. In this regard, the coordination system of the camera space follows a right-handconvention (see Fig.3.4).

3.5 MYO Armband

The MYO armband (shown in Fig. 3.5) is a wearable device produced by Thalmic Labs.With the MYO armband, an operator is able to communicate with the system via Blue-

21

Page 34: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 3.4: Illustration of the origin of the Kinect v2’s camera space. It is the same as its depth sensor origin which is amodified form of [5]

tooth. In addition, it has 8 built-in EMG sensors and one IMU sensor with 9 axes. Theseassure that the hand posture and arm motion can be efficiently detected when people movetheir arm muscles as the EMG sensors can distinguish the difference in hand gestures.Since every user has distinctive muscle size, skin type and other key differences, the sen-sors create a catalogue of information via electrical driving forces. For this purpose, acalibrating process is essential for MYO armbands to identify the wearer’s motions andgestures.

Figure 3.5: Image of MYO Armband

3.6 Dragon NaturallySpeaking

Dragon NaturallySpeaking is a speech recognition software created by Dragon Systems,a company located in Newton, Massachusetts [103]. Dragon NaturallySpeaking helpsoperators in creating documents, reports, emails, and fill-in forms as well as workflowsheets through verbal commands [103]. The words appear as text in Microsoft Office

22

Page 35: Learning Algorithm Design for Human-Robot Skill Transfer

Suite, Corel WordPerfect, and all Windows-based applications as they are spoken to thecomputer. Moreover, operators are able to create voice commands to run applications onthe computer in a multiple-step process for convenience.

3.7 Dynamic Movement Primitive

The biomimetic robot expert Ijspeert proposed a nonlinear dynamic system control methodin [104]. This method uses a series of linear differentiated equations to model the overallmotion of the robot into a nonlinear dynamic attractor model by adding an automaticlearning term. This method can model discrete motions (such as writing tasks) as well asrhythmic movement such as drumming [104]. DMP utilizes a comprehensive and dynamicsystem to express the diagram of motion trajectories. The DMP model is stated as follows[105]:

τsv = d(g − x) − cv + (g − x0) ftarget (2)τs x = v (3)

where x ∈ R is the Cartesian position, x0 is the initial position, v ∈ R is the velocity of therobot end-effector, τs is the scale factor which affects the speed of the generated motion,g donates the target position, ftarget is a forcing term, which is a nonlinear function, d andc are the coefficients for spring and damping, respectively. Additionally, x, v, v representthe position, velocity, and acceleration, respectively.

DMP has been developed through investigations and studies that helped in learningabout primitive movements to generate a sophisticated model. The concept of DMP canbe divided into two categories: the states that use unique formulations based on dynamicalstructures; and generated trajectories that are constructed via interpolating of factors [106].DMP consists of 2 components – a converted system, r, and a canonical system, h. Theformula for it is given as follows:

s = h(s) (4)x = r(x, s,w) (5)

where x is the transformed system states, s is the canonical system states and w is thetransforming parameters of the canonical system output.

3.8 Gaussian Mixture Model

The GMM actualizes the estimation of the probability density distribution in samples. Theestimated model is the weighted sum of finite Gaussian models [107], in which each Gaus-sian model represents a class. The data in the sample are clustered into several Gaussianmodels and the probability of each model is then obtained [107]. Subsequently, the largest

23

Page 36: Learning Algorithm Design for Human-Robot Skill Transfer

probability is selected by the GMM based on the results. The significance of GMM isconstructing a series of GMMs to denote joint-density, and then obtaining the probabilitydensity and regression function from each GMM.

The GMM formula is defined as,

p(xexp) =

k∑k=1

pk p(xexp | k) (6)

where k denotes the number of the model, pk represents the weight of the kth Gaussian,and it is also the kth Gaussian probability density function, p(xexp | k) is the conditionalprobability distribution, which includes parameters such as the average value of GMM µk

and the variance∑

k. To estimate the probability density, pk, µk and∑

k variables shouldbe available. When the values in the expression are learned, the result of the summation isthe probability of the sample xexp belonging to each model.

3.9 Dynamic Time Warping

The DTW is a typical optimization method used to denote the time difference between thetest time series and the reference one utilizing the time regular function W(n). It solvesthe regular function corresponding to the minimum distance of the two templates [6]. Thisalgorithm is based on dynamic programming (DP), and it effectively matches the timeseries with different lengths [21]. It follows a classic algorithm for speech recognition.The DTW calculates the similarity between two time series by extending and shortening

Figure 3.6: Warping example between two time series, modified from [6]

them. As shown in Fig.3.6, the upper and lower solid lines represent two time series,and the dashed lines located between the time series represent similar points. The DTWuses the sum of the distances between all similar points, termed the warp path distance, tomeasure the similarity between the two time series.

24

Page 37: Learning Algorithm Design for Human-Robot Skill Transfer

3.10 Kalman Filter

The KF method is used to optimally fuse the real-time, dynamic and low-level extra sen-sor data. It evaluates the statistical significance for the optimal fusion of data combinationby using statistical characteristics of the measurement model. If the system has a lineardynamic model, the system noise and sensor noise can be represented using white noisemodels that obey the Gaussian distribution. In this regard, KF would provide the uniquestatistically optimal estimate for the fusion data. The recursive KF reduces the computa-tional burden, and it is divided into two types – continuous-time KF and discrete KF.

The actual physical system is usually continuous, as a consequence, the descriptionof discrete systems often cannot completely replace the continuous-time system. Thetemporal mathematical model of the system is produced below [108]:

x(t) = A(t)x(t) + B(t)u(t) + G(t)ω(t)y(t) = H(t)x(t) + v(t)

(7)

where x and u are n-dimensional state variables; y is m-dimension measurement vector; Ais n × n-dimensional system matrix; G and B are n × r-dimensional system matrix; H ism × n-dimensional measurement matrix; ω is the zero-mean white noise vector of the r-dimensional continuous system; v is an m dimensional vector denoting the potential whitenoise among the measured data.

The continuous-time KF status equation, based on [109] is stated as follows,

˙x(t) = A(t)x(t) + B(t)u(t) + K(t)[y(t) − H(t)x

]K(t) = P(t)HT (T )r−1(t)

P(t) = P(t)HT (t) + A(t)P(t) − P(t)Ht(t)r−1H(t)P(t) + G(t)s(t)GT (t)

(8)

where K represents the filter gain matrix, x is the estimated value of x, and P is the es-timated covariance matrix. continuous-time KF is obtained from the measured values ofthe continuous-time process, and this method is used to estimate the time continuous-timevalue of the system state variable, which solves the differential equation of the matrix.Moreover, the continuous-time KF does not require complex recursive calculations. Theworking principle of KF is illustrated in Fig. 3.7.

3.11 RBF Neural Networks

In adaptive control, a neural network is a commonly used as function approximation tool.The unknown function term in the control system is estimated by the neural network. Aneural network can be grouped into linear parameterization and nonlinear parameteriza-tion, corresponding to RBFNNs and multi-layer neural network (MNNs). RBF neuralnetwork consists of an input layer, a hidden layer and an output layer. The input layer tothe hidden layer is a nonlinear transformation, and the output layer is a linear combination

25

Page 38: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 3.7: Working principle of continuous-time KF

of the output of the hidden layer. RBFNN is not only simple in structure and fast in learn-ing, but also avoids the problems of many nerve layers. As a result, the RBFNN meets thereal-time requirements of the control system. Therefore, in this thesis, the RBFNN is usedto approximate the unknown terms of the control system.

A continuous function can be approximated using linear parameter RBFNNs, such asF(z) : Rm → R, over a minimized set Ωz ⊂ Rm. This can be formulated as, shown belowin [110]:

F(z) = W∗TS (z) + εz ∀z ∈ Ωz (9)

where W∗ = [w∗1,w∗2, · · · ,w

∗l ]T ∈ Rl is the weight vector, z ∈ Ωz is the input vector with

Ωz ⊂ Rm being a minimal set, l is the NN node number, and εz is the estimation error.S (z) = [S 1(‖z−µ1‖), · · · , S l(‖z−µl‖)]T , is the regressor vector, with a radial basis functionS i(·), and µi (i = 1, · · · , l) a central inside S i(·). The Gaussian functions are presented asfollows:

S i(‖z − µi‖) = exp[−(z − µi)T (z − µi)

ς2

](10)

where ς is the variance and µi = [µi1, µi2, · · · , µim]T ∈ Rm represents the center of eachreceptive area. In case that if there is a large number of neurons, based on [110], the radialbasis function (10) can approximate any continuous function with arbitrary precision. Theevaluated weight W utilized to calculate F(z) = WS (z), where W is the estimated neuralweight, will be detailed determined in the later Chapter.

3.12 Concluding Remarks

In this chapter, the relevant preparatory knowledge of mathematical theory is introducedin detail which lays down a foundation for the following chapters. At the same time, this

26

Page 39: Learning Algorithm Design for Human-Robot Skill Transfer

thesis introduces the RBFNN, which is used to approximate the uncertain dynamics of thecontrol system. To minimize the undesired effect of sensor noise, we introduce signalsfusion from multi-sensors based on KF algorithm. The external devices or sensors andexperimental platforms used in this thesis are described in detail.

27

Page 40: Learning Algorithm Design for Human-Robot Skill Transfer

4 Chapter Four: Development of Kinect based Teleoperation of theNAO Robot

4.1 Introduction

Robotic technologies that deals with the teleoperation capabilities of robots are designed toperform specified tasks. Research on motion capture, human-based tracking systems etc.has attracted great attention in the last decade [14]. In research [15], the author developeda platform of multiple RGBD cameras for the Bayesian object tracking system using aMicrosoft Kinect sensor. The kinematic validity of a Kinect-based skeletal tracking systemis guaranteed with an upper limb virtual reality rehabilitation system in [19]. In anotherresearch [18], a model for tracking fingertips and the centre of the palm is developed basedon the Kinect sensor. However, it is also imperative to focus on the research that exploreshead motion.

Head-tracking technology has a wide range of application, especially in face recogni-tion and authentication, security, surveillance and human-robot interaction. However, itfaces challenges such as the low resolution of the human face in videos, changes in illu-mination and motion blur, amongst other factors [111]. According to [112], a multilayerneural network controller for limb following is developed to overcome those challenges.In [113], a research team establishes a model based on their study results for analysinghuman motion in order to create a robot arm that mimics the movements of a human arm.

The Kinect sensor, with its in-depth information, has an increasing number of appli-cations in special areas like gaming, entertainment, health and fitness. In early 2014, Mi-crosoft developed a new generation of sensors: Kinect v2. This new advancement comeswith a Kinect development kit SDK, which helps Kinect v2 to collect accurate color im-age information with depth, and thus became a more viable option for the head-followingproject [20]. Robot visual based detection and tracking systems have been successfully ap-plied in many fields, such as manufacturing industry, military and medical areas. However,because of the complexity of vision, the extraction of visual information needs complexalgorithms to support, and the processing process of these image algorithms usually istime-consuming, which makes the real-time performance of the system difficult to meetthe satisfaction.

NAO robots are employed in diverse fields such as computer science, mathematics,physics and artificial intelligence [114]. It has also been found useful in schools due to theease of use [115]. With the popularity of its utilization in the education sector, NAO robotis programmed for various practical applications, for example, one Swedish universityintroduces voice recognition and visual feedback for students by establishing a laboratoryof NAO robots [116]. Commercially, NAO robots have been used in large productionlaunches to assure the tech-compliance. This section includes a discussion of the Kinectv2 camera and NAO robot. Fig.4.1 shows the control block of the system which has been

28

Page 41: Learning Algorithm Design for Human-Robot Skill Transfer

implemented to verify the effectiveness of our approach.

Figure 4.1: Flowchart of the complete experimental process

4.2 Calibration

Before performing the tracking task, the transformation between the NAO robot’s coordi-nation and the Kinect’s is needed, as shown in (11):

Xi = T X′

i (11)

where T represents the transformation matrix. Hence, mathematically Xi=[xiyizi1]T andX′

i =[x′

iy′

iz′

i1]T are the coordinates under both the robot and Kinect systems.We can identify the transformation matrix T by calculating four points under both the

robot and Kinect coordinates systems. Based on the assumption that there are four points– p1, p2, p3 and p4, and that these coordinates lie in the robot coordinate system andthe Kinect coordinate system, respectively. Fig.4.2 shows one of the four points in therelationship between the NAO robot and the Kinect’s coordinate system. (12) illustratesthe equation to calculate the transfer matrix T from the Kinect coordinates to the robotcoordinates.

T =

x1 x2 x3 x4

y1 y2 y3 y4

z1 z2 z3 z4

1 1 1 1

x′

1 x′

2 x′

3 x′

4y′

1 y′

2 y′

3 y′

4z′

1 z′

2 z′

3 z′

41 1 1 1

−1

(12)

29

Page 42: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 4.2: Illustration of the relationship between NAO Robot and Kinect coordinate system

In this research, four points were chosen as shown in Fig.4.3. To acquire the positionsunder the Kinect coordinate system, a person stood in front of a Kinect sensor and heldhis right hand in the same position as that of the NAO robot. The selected four positionsunder NAO robot coordinate system are: (1, 1, 1), (2, -0.5, 0.5), (1, 0.5, 1.5), (1, -2, -3),which corresponding to the positions under the Kinect coordination: (-0.0790, 0.3053,0.3908), (0.0600, 0.2180, 0.5385), (-0.0012, 0.4000, -0.0703), (0.2364, -0.1443, 0.3090),respectively.

(a) : The first and second position in NAO and Kinectcoordinate frame

(b) : The third and fourth position in NAO and Kinectcoordinate frame

Figure 4.3: Illustration of the four selected positions under both NAO coordinate system and Kinect coordinate system

Substituting all the values on both the NAO and Kinect system into (12), we can calcu-late the transformation matrix of Kinect underground-based coordinate system T k

0 , based

30

Page 43: Learning Algorithm Design for Human-Robot Skill Transfer

on the assumption that the NAO robot remains stationary.

T k0 =

1 2 1 11 −0.5 0.5 −21 0.5 1.5 −31 1 1 1

−0.079 0.06 −0.0012 0.23640.3053 0.218 0.4 −0.14430.3908 0.5385 −0.0703 0.309

1 1 1 1

−1

(13)

Then, we would obtain:

T k0 =

7.87 5.09 2.37 6.58−10.27 −0.4 −0.73 −1.26

1.25 9.58 1.09 −0.39−0 0 0.0000 1.00

(14)

Furthermore, the rotation matrix of the Kinect underground-based coordinate system,Rk

0, is obtained on its transformation matrix, which is provided as follows:

Rk0 =

7.87 5.09 2.37−10.27 −0.4 −0.73

1.25 9.58 1.09

(15)

4.3 Kinematics Methodology

An image stream of the workspace, is captured by the Kinect v2 colour camera. In orderto select a point in the colour image, the operator requires to transform the pose of the endeffector of NAO from the Kinect colour space to the NAO coordinate space. The pixels ofthe colour image are thus transformed into the camera space as follows:

[xc yc 1]T = R · [xk yk 1]T + TTrans (16)

where xc and yc find the position of a transformed colour frame pixel (in camera space),xk and yk denote the position of the colour frame pixel, R is the rotational matrix, andTTrans is the translation matrix, between the colour frame and camera space. However,the Kinect v2 depth camera faces inherent camera issues, such as radial distortion (due tothe curvature of the lens). Fortunately, the intrinsic parameters are provided in one of theKinect SDK examples, and can be used to calculate the distortion transformation matrix.At this point, it becomes possible to transform any point in the camera space to NAO’sworkspace, where the NAO’s origin acts as its reference point.

4.3.1 Acquisition of Orientation Angles of Head

By capturing the pose of a user’s head, the orientation angles αkf , β

kf , γ

kf are acquired

by the Kinect v2. In order to transmit the data of orientation angles for user’s head, the

31

Page 44: Learning Algorithm Design for Human-Robot Skill Transfer

angles need to be transformed into those under the same coordination, here defined as thereference coordinate, as shown in Fig.4.4. In this section, the flat surface of the Kinectcamera is kept parallel to the user’s head, such that the value of βk

f equals zero. Thus, wehave the following rotation matrix:

Figure 4.4: Illustration of the conception for human head orientation angles

R fk (αk

f , βkf , γ

kf ) =

cαk

f cβkf cαk

f sβkf sγk

f − sαkf cγ

kf

sαkf cβ

kf sαk

f sβkf sγk

f + cαkf cγ

kf

−sβkf cβk

f sγkf

cαkf sβk

f cγkf + sαk

f sγkf

sαkf sβk

f cγkf − cαk

f sγkf

cβkf cγ

kf

(17)

where R f0 = RK

0 × R fK , and the value of RK

0 has already been obtained through calibration,as shown in (15), the rotation matrix R f

0 for the user’s head under the reference coordinatesystem can be found through (18), where we denote the abbreviations sαk

f for sa, sβkf for

sb, sγkf for sg, cαk

f for ca, cβkf for cb, cγk

f for cg.

R f0 =

7.87cacb − 10.27(casbsg + sacg) + 1.25(casbcg + sasg)7.87sacb − 10.27(sasbsg − cacg) + 1.25(sasbcg − casg)

−7.87sb − 10.27cbsg + 1.25cbcg

5.09cacb − 0.4(casbsg + sacg) + 9.58(casbcg + sasg)5.09sacb − 0.4(sasbsg + cacg) + 9.58(sasbcg − casg)

−5.09sb − 0.4cbsg + 9.58cbcg

2.37cacb − 0.73(casbsg − sacg) + 1.09(casbcg + sasg)2.37sacb − 0.73(sasbsg + cacg) + 1.09(sasbcg − casg)

−2.37sb − 0.73cbsg + 1.09cbcg

(18)

Finally, the orientation angles of user’s head under the basic ground coordinate can be

32

Page 45: Learning Algorithm Design for Human-Robot Skill Transfer

calculated, which are shown in (19)-(21), where we define R f0=

r11 r12 r13

r21 r22 r23

r31 r32 r33

.α0

f = arctanr21

r11(19)

β0f = arctan ± (

√r2

11 + r221

r31) (20)

γ0f = arctan

r32

r33(21)

4.3.2 Acquisition of Joint Angles of Arm

Kinect sensor emits infrared rays and detects their reflections so that the depth values ofeach pixel in the field of view can be calculated. In this process, first, information aboutthe object body and shape are extracted. Then with this information, the position of eachjoint can be obtained as shown in Fig.4.5.

Figure 4.5: Image of body skeleton captured by Kinect

To obtain the angle of rotation of ShoulderPitch, we can use joints – RightShoulder andCentreShoulder in 3D coordinates and the two skeletal nodes in the three-dimensionalspace form a straight line (l1), assuming that the 3D coordinates of the right shoulderare (x1, y1, z1) and of centre shoulder, (x2, y2, z2). As the ShoulderPitch joint remains un-changed during the rotation in the y-coordinate, only the xoz plane will be considered.The formula thus comes out to be:

z = k1x + b1 (22)

33

Page 46: Learning Algorithm Design for Human-Robot Skill Transfer

where k1=tanθ1 = |z2−z1 |

x2−x1(x1 , x2), b1 is not given for calculating the angle of rotation,

so the formula is not given here. The angular value between l1 and the horizontal axis isdefined as θ1 (ShoulderPitch ) as shown in Fig.4.6, which can be computed by:

θ1 = arctank1 = arctan(

z2 − z1

x2 − x1

)(23)

Figure 4.6: Angle of rotation of the ShoulderPitch

To obtain the angle of rotation of the ShoulderRoll, the 3D coordinates (x, y, z) of joints– ShoulderRight and ElbowRight will be used to calculate it, assuming that the three-dimensional coordinates of ElbowRight are (x3, y3, z3), and the two skeletal nodes in thethree-dimensional space form a straight line (l2). As the ShoulderRoll joint remains un-changed during the rotation in the z-coordinate, only the xoy plane will be considered.The linear equation is given below:

y = k2x + b2 (24)

where k2=tanθ2 =|y3−y2 |

x3−x2(x3 , x2), b2 is not given for calculating the angle of rotation, so

its formula is not given here. Assuming that the angle between l2 and the vertical axis is θ1,as shown in Fig.4.7, θ2 is the ShoulderRoll rotation angle, the formula will be computedas:

θ2 = arctank2 = arctan(

y3 − y2

x3 − x2

)(25)

Figure 4.7: Angle of rotation of the ShoulderRoll

To obtain the angle of rotation of the ElbowRoll, we can use joints of ElbowRight andWristRight in 3D coordinates. Assuming the 3D coordinates of WristRight is (x4, y4, z4),

34

Page 47: Learning Algorithm Design for Human-Robot Skill Transfer

similarly, there is a straight line constituted by ElbowRight and WristRight (l3), the linearequation is given as:

y = k3x + b3 (26)

where k3=tanθ3 =|y4−y3 |

x4−x3(x4 , x3), b3 is not given in the calculation of the angle of rotation,

so the formula is not given here. Assuming that the angle between l2 and l3 is θ3, as shownin Fig.4.8, θ3 is the ElbowRoll rotation angle, and the formula will be:

θ3 = arctank3 = arctan(

k2 − k3

1 + k2k3

)(27)

Figure 4.8: Angle of rotation of the ElbowRoll

4.4 Experimental Studies

In this section, some experiments have been designed to test the performance of the track-ing system, with the help of the software: Kinect SDK for windows, Visual Studio 2013and OpenCV library. The experimental environment is indoor and well-illuminated. Oneperson stands in front of the Kinect sensor, at a distance of two metres. Because the NAOrobot’s arm has only 5 DOFs, the ShoulderPitch, ShoulderRoll, ElbowRoll, ElbowYawand WristYaw joints of the human arm are chosen for the experiment.

4.4.1 Head Following Experiment

The purpose of the experiment was to test the stability of the head-following system. Forthis purpose, a command from the program developed by the Kinetic SDK 2.0 was sentto an additional computer which controls the NAO robot to imitate the movement of thehuman head. As the evidence shown in Fig.4.9, the experiment was conducted four timeswith different postures in order to get a total performance check. The postures included30 rotation to the left, 30 rotation to the right, pitching up by 30 and pitching down by30. Fig.4.10(a) shown below illustrates the first experiment. The collated data show thathead movements were accurately followed.

35

Page 48: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 4.9: Illustration of operator’s head for different postures

From the Fig.4.11, we can find that the average tracking error of HeadPitch angle (atabout 10%) is a bit more than the one of HeadYaw angle (at about 6%). Generally, thehead motion of the NAO robot is able to track the one of the human operators. Thus wecan conclude that the NAO robot imitates the movement of the human head well in bothtwo orientations.

4.4.2 Limb Following Experiment

In the second experiment, the operator moved his limbs while keeping the rest of hisbody still. The limbs were moved forward at 90, raised up at 180 and stretched side-ways at 90 at a low speed from the original position to the final position and for bothright and left arms. As in the case of the first experiment, after collecting the data, acommand was sent to the additional computer which controls the NAO robot to imitate themovements. Fig.4.10(b) shows the process of the second experiment. Fig.4.12 representsthe graph illustrating the data collected about the arm movement by both the operator andNAO robot. It can be concluded from the results that the robot can completely follow themovements of the 5 selected joints.

From the plotted graphs, according to the (28), we can conclude that, angles of theselected joints including both the head and arm have the average error at about 10% withthe maximum error at about 22%. This means using the Kinect camera sensor alone totrack the motion of a human operator shows commonly large errors, hence in the nextchapter, the KF based sensor fusion will be employed to minimize the undesired effectdue to the noise from the sensors.

re =pNAO − pKinect

pKinect(28)

36

Page 49: Learning Algorithm Design for Human-Robot Skill Transfer

(a) : Process of imitation for NAO robot head following: Rotating30 to left and right directions and pitching up and down with 30

(b) : Process of imitation for NAO robot limb following: Puttingforwards at 90, raising up at 180 and stretching sidewards at 90

for both arms

Figure 4.10: Experimental setup of head and limb following tests

Figure 4.11: Result of the head following experiment for both human and NAO robot

37

Page 50: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 4.12: Result of the arm following experiment for both human and NAO robot

38

Page 51: Learning Algorithm Design for Human-Robot Skill Transfer

4.5 Conclusion

A system for tracking a user’s head and arm movement while interacting with the NAOrobot via a Kinect v2 sensor is developed. The Kinect v2 sensor is used to track the headand arm motion of the human operator which are moved in a natural way. In addition,the kinematics equations adopted in this research are used to transfer the Euler anglesof the user’s head from the Kinect coordinate frame to the reference coordinate frame inorder to calculate all the joints angles of the human arm. Furthermore, we also conductedexperiments and drew the curve based on the test data. The results shows that even if theNAO robot can imitate the human user’s movements, there are still errors existing. Thusin the following sections, KF algorithm based sensor fusion will be employed to reducethe noise and RBFNNs will also be applied to overcome the unknown dynamics, whichwill increase the tracking performance of the teleoperation system.

39

Page 52: Learning Algorithm Design for Human-Robot Skill Transfer

5 Chapter Five: Teleoperation Control of the Baxter Robot usingKalman Filter and Neural Networks

5.1 Introduction

With the rapid advancements in sensor and actuator techniques, the use of robot tech-nology has been incorporated into areas such as control system design, data sensor inno-vation, bionics and artificial intelligence. According to the research [117], following theaccomplishment of modern robots, industrial robots currently attract considerable interest.HRI plays an increasingly important role in the industrial robot application. The robot isthought to have the capacity to adjust to the modern needs. Humans can adapt to varioussurrounding environments, hence, if robots demonstrate this adaptive ability through tele-operation then they would be able to carry out more complicated/challenging tasks. Withrobot teleoperation systems, an operator is able to remotely control a robot through theinternet [118].

However, there are some potential challenges, such as ground-truth information miss-ing, signal transmission delay due to the restrictions of transfer speed and data transmis-sion protocol. Despite these issues, it is still a critical and helpful instrument in the fieldof HRI research. Hence, numerous research studies on teleoperation have been done inthe works of literature. In [119], a mutual control strategy for the Baxter robot controlleris developed, where a strategy for emulating human writings links to a Baxter robot con-troller. Researchers are able to teleoperate a robot by utilizing electromyography (EMG)signals and a haptic device. In [120], surface electromyography (sEMG) signals have beenupgraded to perform teleoperation. The human operators can detect the situation in a hap-tic way and adjust muscle compression subliminally. In [121], the authors have improvedthe way for robot teleoperation by hand gesture recognition based on the visual data.

In addition, a direct approach has been developed for the robot to imitate human mo-tion. This approach, coined motion capture technology, is an ideal strategy for skill trans-fer. To capture human motion, the human body itself has to be followed. There are variousmethods to achieve human motion capture. The most widely used method is to detect themarkers from the body of a human operator. However, it is inconvenient for operation.Another method is to utilize image processing from typical cameras. However, this s-trategy is not reliable owing to the unstable body location capacities during the imagingprocess. Other methods include stereo-vision cameras that are applied to motion capturefor depth data analysis. Unfortunately, it needs a long processing time, thus it is proved tobe ineffective for real-time applications.

As one of the enabling techniques for teleoperation, motion capture primarily incor-porates two interfaces, which are remotely wearable device input interface and detectinginterface based on a vision system. Several sensors can be utilized for the visual system,such as Leap Motion and Kinect. In [7], human motions are obtained by using the Kinec-

40

Page 53: Learning Algorithm Design for Human-Robot Skill Transfer

t sensor, and through the utilization of the vector approach, the joint points of a Baxterrobot can be ascertained. In [122], motions of welder-related work are captured by theLeap Motion sensor, which is estimated by a soldering robot with teleoperation networks.Moreover, the wearable devices, for example, exoskeleton [123] or joystick, or omnihapticdevice [124] are normally used. In this thesis, we investigate a control scheme combiningthe wearable device MYO armband together with the motion capture system that uses aKinect sensor, to teleoperate a Baxter robot. The performance of the proposed scheme willbe optimized by KF based sensor fusion. In [125], a robust finite-horizon KF is designedfor discrete time-varying uncertain systems with both the additive and the multiplicativenoise. KF is widely used as it can estimate the past, current and future signal state, even ifthere are uncertainties in the model.

Filtering is a procedure of signal processing and transformation (removing or reduc-ing undesirable components and enhancing the required characteristics). The above stat-ed steps can be implemented either in software or hardware. KF is a software filteringmethod minimizing the mean square noise, which is one of the best estimation criterion.Furthermore, KF uses the state space model of signal and noise to estimate the value ofthe previous time, after that the observed values of the current time are updated. Based onthe established equations and observation systems, the algorithm estimates the minimummean square noise of the signal to be processed. There are methods that are able to ap-proximate and compensate for the uncertainties of the robot dynamics, for example, FuzzyLogic and NN. The NN control continues to be widely examined in the discrete-time con-trol system [126], [127] as well as the continuous-time control system [128].

Some researchers proposed strategies using Kinect and MYO armbands to obtain thejoint angles of human arms and then teleoperate with robots as in [7] and [110]. However,the problem raises that the accurate calculation of the angular data is unavailable owing tothe measured noise. Thus, this research develops an optimum strategy to reduce the influ-ence of the noise using KF based sensor fusion. The vector approach is used to calculatethe five required joints angles, then the KF algorithm is employed to output a series ofmore accurate data. MYO armbands worn on the operator’s lower and upper arm are uti-lized to identify and measure the angular velocity of human arm motion. A Kinect basedbody tracking system is also utilized. Furthermore, the control system is successfully con-nected to NN by UDP for the teaching and learning processes of the Baxter robot, whichpromotes the tracking performance.

5.2 Environmental Setup

5.2.1 System Configuration

To delineate the teleoperation of robots utilizing motion capture, an illustrative system isassembled. It consists of the body tracking system, Baxter robot and MYO armbands, as

41

Page 54: Learning Algorithm Design for Human-Robot Skill Transfer

shown in Fig. 5.1.

Figure 5.1: The complete experimental teleoperation system

Motion capture is conducted by the Kinect sensor. Kinect is utilized due to its low costand ability to provide the data required for this research. The Kinect device is connectedto a remote computer where processing programming software is used to get the positioninformation from the Kinect sensor.

The experiment is performed on a Baxter robot. The overall experimental system wasassociated with and controlled by the development workstation, a remote computer withan Ethernet link, as well as a pair of MYO armbands. The principle of the teleoperationsystem is represented in Fig. 5.2.

Figure 5.2: Diagram for the rule of research in total

42

Page 55: Learning Algorithm Design for Human-Robot Skill Transfer

5.2.2 Development Workstation

The first generation of the Kinect sensor, a part of the proposed teleoperation system, is anarrangement of sensors created as a fringe device with the Xbox video game device to trackthe human motion in 3D space. An RGB camera and a double infrared profundity sensorare located in the front of the Kinect sensor [129]. Utilizing image and depth sensors,Kinect V1 can distinguish the movement of an object/operator. It does not require thehuman user to wear any additional devices, as shown in Fig.5.3.

As can be seen in Fig.5.3, the depth detector of the Kinect V1 contains two units:the monochrome CMOS sensor and the infrared projector (label 1, Fig.5.3). They worktogether in the motion capture process. The image depth and RGB data collected via theKinect sensor is illustrated in Fig. 5.4. The human body can be identified by arrangingseveral straight lines together to show positions and poses in 3D space. Kinect collectsdata about human joint positions and velocities during teleoperation processes, and thensends them to the robot. In this way, human-robot cooperation is achieved. Compared tothe traditional motion tracking device with complex programming, which has high costand complex setup process, Kinect can be integrated into the control system working withopen source codes [130].

Numerous programming projects on Kinect are accessible, including OpenKinect, Open-

Figure 5.3: Image of Kinect sensor: 1. Depth sensors, 2. RGB camera, 3. Motorized base [7]

Figure 5.4: Image of depth and RGB sensor data collected from Kinect V1 modified on [7]

43

Page 56: Learning Algorithm Design for Human-Robot Skill Transfer

NI, Microsoft Kinect for windows SDK [131]. OpenKinect is free and designed to equipthe Kinect with computers and other devices in [132]. OpenNI is able to support a largeamount of different devices apart from Kinect in [132], which utilizes NITE to get theskeleton information of the operator, according to [133]. Microsoft Kinect for windowsSDK is another commonly used platform produced by Microsoft in [133]. SimpleOpenNIis an OpenNI and NITE wrapper for Processing in [134].

The utilization of programming software is imperative and relies on:(1) capacity to separate skeletal information;(2) similarity with different operating systems, for example, Windows and Linux;(3) clear documentation;(4) clear-cut and direct approach for quick confirmation of calculations.

After appropriate examination, a processing programming software which fulfils allthe prerequisites is utilized. According to [135], operators can program via Kinect withSimpleOpenNI wrapper for OpenNI and NITE, and skeleton information can be collectedon both Windows and Linux platforms. Processing is based on Java, hence fundamentallythe same syntax can be used. All the functions utilized in this thesis are given below:

PVector: A class to depict a few dimensional vectors, particularly the Geometric vec-tor. The magnitude and direction of each pixel can be obtained using the methods mag ()and heading (), according to [136].

pushMatrix() and popMatrix(): They can transfer the present transformation ma-trix into the matrix stack. The pushMatrix() can record the current coordinate systeminformation to the stack and popMatrix() restores them. The pushMatrix() capacity andpopMatrix() capacity are utilized in conjunction with other transformation functions andcan be used for the extent of changes [136].

5.2.3 Robot Operating System and RosPy

Robot Operating System (ROS) is an adaptable operating system for programming robots.According to [137], it contains the tools, conventions and libraries needed for the compli-cated task in order to simplify robot activities. ROS can be set up under multiple platforms.

RosPy is a related Python customer library for ROS. The RosPy customer API em-powers Python software engineers to rapidly interact with parameters, services and topicsof ROS. RosPy requires usage speed (i.e. designer time) over runtime execution, hencecalculations can be immediately prototyped and examined inside ROS. It is also benefitialfor codes which have no critical path such as codes for initialization and configuration. Alarge amount of ROS instruments are composed in RosPy script to develop introspectionabilities. A large number of ROS devices - Rostopic and Rosservice - are based on RosPy[99].

44

Page 57: Learning Algorithm Design for Human-Robot Skill Transfer

5.2.4 User Datagram Protocol

User Datagram Protocol (UDP) is one of the core members of the Internet protocol suite(the set of network protocols utilized for the Internet). Through this, a PC software cansend information, using datagrams, to different PCs with an Internet Protocol (IP). WithUDP, unique transmission channels or data paths can be established without prior com-munications. UDP is suitable for purposes where error checking and correction are eithernot necessary or performed in the application, avoiding the overhead of such processingat the network interface level. According to [138], UDP is frequently utilized for time-sensitive applications owing to the fact that dropping packets are preferred to waiting fordelay packets, where it is difficult to find an alternative for the real time system.

5.3 Motion Capture by Kinect

5.3.1 General Calculation

The motion capture calculations for the upper limb depends on information includingdistances, locations and joint angles. The distance between two specified points in 2D and3D space can be calculated by equation (29) and (30).

d2D =√

(x2 − x1)2 + (y2 − y1)2 (29)

d3D =√

(x2 − x1)2 + (y2 − y1)2 + (z2 − z1)2 (30)

where (x1, y1) and (x2, y2) are points in 2D space, d2D is the distance between these twopoints, (x1, y1, z1) and (x2, y2, z2) are points in 3D space, d3D is the length between thesetwo points.

The angles at all the joints are obtained by the law of cosines. The maximal angle is180 degrees. While computing the angles among the joints, an extra point is required todefine at 180∼360 degrees. After capturing the motion, a triangle is drawn by utilizing anytwo joint points. From the other two points, the third point of the triangle can be obtained.In this scenario, the coordinated statistics for every point of the triangle is known, thus weare able to find out the length of every side, instead of the value of each angle, which isstill unknown. As shown in Fig. 5.5, the magnitude of any coveted point can be calculatedby applying the law of cosines. Computations for the joint points illustrate the length ofsides a, b, c. Similarly, we can also calculate each angle of the triangle using the law ofcosines.

5.3.2 Vector Approach

The Kinect sensor can identify every single joint of the human body and supply feedbackconcurrently. All these directions are transformed into vectors and the particular angles

45

Page 58: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 5.5: Mathematical Principle description

of the joints. The coordinates of the human body joints collected via Kinect are underthe Cartesian space. The particular angles from arms are computed. After the mappingprocess by the Kinect sensor is completed, the data are sent to teleoperate the Baxter.

The five points ShoulderPitch, ShoulderYaw and ShoulderRoll as well as ElbowPitchand ElbowRoll, shown in Fig. 5.6, are computed from the arm positions data extractedfrom the Kinect.

Figure 5.6: Demonstration of all related angles in Vector Approach: ShoulderPitch, ShoulderYaw and ShoulderRoll,ElbowPitch and ElbowRoll

The computation of vectors is illustrated in Fig. 5.7. According to [7], the intense linesCO and CD represent the upper left and ahead part of the arm. The line BO represents the

46

Page 59: Learning Algorithm Design for Human-Robot Skill Transfer

distance from the left hip to the left shoulder, and AO represents the length between theright and left shoulder. Directions with coordinated data BX+, BY+ and BZ+ shows the axissystem of the Kinect sensor in Cartesian space, where point B is the origin.

Figure 5.7: The principle of vector approach in mathematical computing

Methodology for Computing ShoulderPitch and ElbowPitch: As illustrated in Fig. 5.7,according to [7], the angle ∠BOC (ShoulderPitch ) is obtained by the distance of two pointsfrom vectors OB to OC. The pitch angles of the shoulder and elbow are calculated uti-lizing the three specified joints’ position, which are shoulder (point O), elbow (point C)and hip (point B). Delivering these three points using the angle Of() function gives feed-back on the value for angles, which are directly sent to Baxter. The ∠OCD (ElbowPitch ),which is the angle among CD and OC, can be computed through sending hand, elbow andshoulder values into the angle Of() for working as well [7]. In this methodology, we canuse the angle Of() command in the Processing software to calculate any angles betweentwo vectors.

Methodology for Computing ShoulderYaw: As we can see from Fig. 5.7, accordingto [7], the angle ∠EBF (ShoulderYaw) is obtained by a similar method as utilizing boththe shoulder point and elbow point, which are point A, O and C, respectively, where thevectors OC and OA are grouped together. However, the above mentioned vectors OC andOA need to be anticipated into the plane XZ. In this way, we are able to obtain the vectors

47

Page 60: Learning Algorithm Design for Human-Robot Skill Transfer

BF and BE. Angle ∠EBF (ShoulderYaw) is the value of the angle between BF and BE,which can be computed by utilizing angle Of() command in Processing.

Methodology for Computing ElbowRoll and ShoulderRoll: The ElbowRoll is theangle between plane OCD and CDI, which can be calculated by the angle Of() functions.The ShoulderRoll is the most difficult to calculate due to the fact that the computing is notstraightforward and all the points are given in a 3D plane, the computing method utilizedabove will not work here [7].

To calculate the required angles, the vector points from elbow to hand, where its plane isopposite to the one from shoulder to elbow (through the shoulder joint too), are noted. Thisreference vector must be stable in relation to the rest of the body. Thereafter, the referencevector can be computed by intersecting the vector points from shoulder to shoulder, andshoulder to elbow.

According to [7], the normal line from that intersecting point of two vectors is usedfor continuous calculation. The vector OM can be obtained by verifying the intersectingpoint between vectors OC and OA. The vector OM is vertical to the plane expanded bythe vectors OA and OC. Clearly, vector OM and vector OC are vertical from each other.

The normal line vector CG can be decided via intersecting vector points CD and OC,which is additionally vertical to vector OC. At this point, we can obtain the vector OHby deciphering vector CG along vector CO to point O. The angle ∠MOH between vectorsOH and OM is defined as the ShoulderRoll [7].

The orientation angles sent by the Kinect can be separated by utilizing PMatrix3D withProcessing software. The PMatrix3D outputs the required rotation matrix as well, wherethe current coordination framework is well backup into the stack. It is then deliveredto the shoulder joint, additionally, the rotation matrix is utilized to transform into thecoordination system data. Every single computation in this capacity will be decided withinthe obtained coordination framework.

After the computation of ShoulderRoll and ElbowRoll angles, the rotation matrix fromthe stack can be recovered to obtain the initial coordination framework. The right Shoul-derRoll is additionally computed with the similar method. Furthermore, a small changehas been applied to the vectors coordination system.

Due to the inaccuracy of the function used to calculate roll angles, the error needsto be corrected; every value change of the ShoulderRoll keeps along with those of theShoulderYaw. After that, the statistics are plotted into the MATLAB, as seen in Fig. 5.8.After several trials, the error is mostly revised by the following equations:

γs = −γs − βs/2 − 0.6 (31)

where we define the angle of left ShoulderRoll as γs and the angle of left ShoulderYaw asβs. These returned data are sent to Baxter development platform for further work utilizingUDP protocol. The data packets created by the server are sent through the function intro-duced above. So far, every single angular value is sent to teleoperate the Baxter robot with

48

Page 61: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 5.8: Error of Vector Approach

the Python script.

5.4 Measurement of Angular Velocity by MYO Armband

The joint angles are obtained by integrating angle velocity. Any position of the humanoperator’s arms can set to be the initial position, if the joint angles are assumed to be zero,according to [110]. When the operator moves his arm to a new pose P, the rotation angles(joint angles) are pose P with respect to the initial pose in [110].

As shown in Fig. 5.9, the frame (X1, Y1, Z1) represents the orientation of the MYOarmband in the initial position. The frame (X2, Y2, Z2) represents the current orientationof the MYO. From the first MYO armband worn on the upper arm, we can obtain threeangular velocity v1x, v1y, v1z, which represent ShoulderRoll, Pitch and Yaw, respectively.From the second MYO armband worn on the forearm, we can obtain the angular velocity

Figure 5.9: The orientation of the MYO in the initial pose and the current pose

v2x, v2y, which represent ElbowRoll and ElbowPitch.

49

Page 62: Learning Algorithm Design for Human-Robot Skill Transfer

There would be errors in the joint angular velocity measured by the joint angle. More-over, the errors will be superimposed and coupled. The shoulder joint error will be super-imposed on the elbow joint, resulting in a greater elbow error. Errors will also accumulateas time evolutes. Although the sampling frequency IMU in the MYO is 50 Hz, the re-sulting angle will have a large difference in value when the joint angle is calculated fromthe angular velocity integral in the program [110]. Here, the method for the angular ve-locity is extended from the method of measurement for the angles using MYO armbandsmentioned in previous research [110]. In summary, MYO armband is used to measure theangular velocity of each joint, and the Kinect is used to get the angles of each joint.

5.5 Kalman Filtering based Sensor Fusion

The KF method has two basic assumptions:(1) the sufficiently accurate model of the information process is a linear (or time-

varying) dynamic system excited by white noise; (2) the measurement signal containsadditional white noise components for each session.

Only if the above assumptions are satisfied, can a KF method be applied. In this thesis,all the data collected from the Kinect sensor and MYO armbands fulfilled the above statedrequirements, hence the use of continuous-time KF to fuse the data from different sensorswere incorporated.

The equation (mentioned in the previous section) is for the continuous-time KF, where-in, x(0), ω and v are not related to each other, according to [108] and [109], the expectedvalues of the continuous-time KF satisfy the following statistical characteristics,

E x(0) = mx(0); E ω(t) = 0; E v(t) = 0

E [x(0) − mx(0)] [x(0) − mx(0)]T = P(0)

Ex(0)ωT (t)

= 0; E

x(0)vT (t)

= 0

Eω(t)vT (t)

= 0; E

ω(t)ωT (τKF)

= s(t)δ(t − τKF)

Ev(t)vT (τKF)

= r(t)δ(t − τKF)

(32)

where s is the system noise variance intensity matrix of continuous system; r is the arrayof measured noise variance intensity; mx(0) and P(0) are the initial mean value of x andthe initial covariance matrix, respectively; δ(t − τKF) is the Dirac δ function.

We assume that every single joint in the human arms is taken into account separatelyfor research, which shows that all the KF factors are the first order, hence, here A=1, B=1,G=1 and H=1. Then the KF equations are simplified as below,

xi = xi + ui + ω

yi = xi + v(33)

50

Page 63: Learning Algorithm Design for Human-Robot Skill Transfer

where in this special case, yi is the angular position of the number of i joint collected fromthe Kinect sensor. And ui is the angular velocity of the number of i joint of the operator'sarm motion.

˙xi = xi + ui + k (yi − xi)

k = pr−1

p = p − pr−1 p + s

(34)

where k is the filter gain matrix, p is the estimated covariance matrix, xi is the required(satisfied) data obtained from KF based sensor fusion, which is also the statistic that needsto be sent to the development workstation via UDP.

5.6 Neural Networks Based Control System

The notation of the data output from the result of KF based sensor fusion qdi, in additionto the desired joint space trajectory qd is defined

qdi = xi

qd = [qd1, qd2, qd3, qd4, qd5]T ∈ R5.(35)

Here, we apply the NN control procedure to achieve the accurate tracking of the referencejoint trajectory. Hence, the controller should be quick, steady and accurate. The referencejoint trajectory qd ∈ R5 consists of a time series of the angle values that are generatedfrom the motion data filtered by the KF. The angular matrix q ∈ R5 denotes the actual jointpositions during the teleoperation. At that point, as indicated by [110], the dynamics ofthe robot are given as shown in (36).

τ = M(q)q + C(q, q)q + G(q) + τext (36)

where M(q) ∈ R5×5 is the inertia matrix, C(q, q) ∈ R5×5 is the Coriolis matrix, G(q) ∈ R5×1

are the gravity terms and τext is unmodeled elements owing to the external disturbance andsystem uncertainties.

Define z = eq + Λeq, qr = qd −Λeq, where eq = q − qd, Λ = diag(λ1, λ2, . . . , λn), λn is apositive constant. Here the dynamic equation (36) can be rewritten as (37) from [110].

τ − τext = M(q)z + C(q, q)z + G(q) + M(q)qr + C(q, q)qr (37)

In [110], an adaptive controller is designed as (38),

τ = H(q) + M(q)qr + C(q, q)qr − Kz (38)

where K = diag(k1, k2...ki), ki >12 , H(q), M(q) and C(q, q) are the RBFNN estimates of

G(q) + τext, M(q) and C(q, q), respectively.

51

Page 64: Learning Algorithm Design for Human-Robot Skill Transfer

Then the closed circling framework dynamic equation is given below as (39) [110].

Mz + Cz + Kz = (M − M)qr + (C −C)qr + (H − H) (39)

The RBFNN based approximation approach is applied as follows [110].

M(q) = W∗TM S M(q) + εM

C(q, q) = W∗TC S C(q, q) + εC

H(q) = W∗TH S H(q) + εH

(40)

where W∗M, W∗

C and W∗H are the constant ideal weight matrix; S M(q), S C(q, q) and S H(q)

are the basis function matrix, and εM, εC and εH are the mismatch uncertainties due to thefact that the number of the hidden neuron is limited.

Then we can write the equation for basis function matrix as follow [110],

S M(q) = diag(S q, . . . , S q

)S C(q, q) = diag

([S q

S q

], . . . ,

[S q

S q

])S H(q) =

[S T

q . . . S Tq

]T

(41)

whereS q =

[φ (‖q − q1‖) φ (‖q − q2‖) . . . φ (‖q − qn‖)

]T

S q =[φ (‖q − q1‖) φ (‖q − q2‖) . . . φ (‖q − qn‖)

]T (42)

The estimates of M(q), C(q, q) and H(q) can be obtained as (43).

M(q) = WTMS M(q)

C(q, q) = WTCS C(q, q)

H(q) = WTHS H(q)

(43)

Now, substituting (43) into (39), then the previous equations are simplified as

Mz + Cz + Kz = WTMS M(q)qr

+ WTCS C(q, q)qr + WT

HS H(q)(44)

where WTM = WT

M −W∗TM , WT

C = WTC −W∗T

C and WTH = WT

H −W∗TH .

For a matrix of n × n, the sum of the elements on the main diagonal of the matrix (thediagonal from the upper left to the lower right) is called the trace of matrix A, denotedtr(A); if A, B are m × n and n × m matrices, then tr(AB) = tr(BA) [139]. Here, theLyapunov candidate function is chosen as,

V =12

zTMz

+12

tr(WT

MQMWM + WTC QCWC + WT

HQHWH

),

(45)

52

Page 65: Learning Algorithm Design for Human-Robot Skill Transfer

where QM, QC and QH are positive fixed weight matrix. Then, the derivative of V , whichshows

V = zTMz +12

zTMz + tr(WT

MQM˙WM + WT

C QC˙WC + WT

HQH˙WH

)(46)

Utilizing the property: M(q)-2C(q, q) are the skew symmetric matrix, and equation(46) becomes:

V = zTMz + zTCz + tr(WT

MQM˙WM + WT

C QC˙WC + WT

HQH˙WH

)(47)

The ideal weight matrix WM, WC and WH are constant matrices, we have the followingrelationship

˙W M = ˙WM

˙WC = ˙WC

˙WH = ˙WH,

(48)

substituting (44) into the equation (47), then we have:

V = − zTKz−

tr[WT

M

(S M(q)qrzT + QM

˙WM

)]+

tr[WT

C

(S C(q, q)qrzT + QC

˙WC

)]+

tr[WT

H

(S H(q)zT + QH

˙WH

)] (49)

According to [110], the upgraded principle is given as follow,˙WM = −Q−1

M(S M(q)qrzT + σMWM

)˙WC = −Q−1

C(S C(q, q)qrzT + σCWC

)˙WH = −Q−1

H(S H(q)zT + σHWH

) (50)

where σM, σC, σH are pre-designed positive constants.Combining (50) and (49), then we obtain

V = − zTKz − σMtr(WT

MWM

)− σCtr

(WT

CWC

)− σHtr

(WT

HWH

) (51)

Applying Young's inequality, (51) can be extended to

V = − zTKz +σMtr(W∗T

M W∗M)

2−σMtr(WT

MWM)2

+σCtr(W∗T

C W∗C)

2−σCtr(WT

CWC)2

+σHtr(W∗T

H W∗H)

2−σHtr(WT

HWH)2

(52)

53

Page 66: Learning Algorithm Design for Human-Robot Skill Transfer

Finally we obtainV ≤ −ηV + κ (53)

where η = min[2K, σM/(λmax(QM)

), σC/

(λmax(QC)

), σH/

(λmax(QH)

)], κ = 1

2 tr(σMW∗TM W∗

M+

σCW∗TC W∗

C + σGW∗TG W∗

G). Since V > 0 and κ is the result of the pre-designed constantsand weight matrix that we give, the length of κ ≤ η, we can have V ≤ 0. As indicated byLyapunov stability theorem, the closed-loop stability is guaranteed.

5.7 Experimental Studies

The experiment was conducted in a sufficiently illuminated indoor environment with oneoperator standing two metres in front of the Kinect sensor. As in the previous section, wechose to test the ShoulderPitch, ShoulderYaw, ShoulderRoll, ElbowPitch and ElbowRoll.After the collation of data, a Baxter robot was simulated in MATLAB and teleoperated byoperators.

5.7.1 The KF Based Limb Following Experiment

An operator wearing a pair of MYO armbands faced the Kinect sensor (as seen in Fig. 5.10)doing different arm movements. The operator wore one MYO armband near the centre ofthe upper arm and the other near the centre of the forearm. The former measured the ori-entation and angular velocity of the ShoulderPitch, ShoulderYaw and ShoulderRoll. Thelatter predicted the orientation and the angular velocity of the ElbowPitch and ElbowRoll.Before the experiments, the MYO armband was calibrated and EMG sensors were warmedup so the MYO armbands could better recognize different hand postures. During the ex-periment, the operator made no movements except with his arms at a reduced and stablevelocity.

5.7.2 The NN Learning Based Limb Following Experiment

In the second experiment, a test experiment was set up to examine the accuracy of thedesigned control system as shown in Fig.5.11, using NN learning. A heavy object wasattached to the end-effector of the Baxter. Meanwhile, the operator, wearing calibratedMYO armbands, stood in front of the Kinect sensor and remotely controlled the end-effector’s position.

5.7.3 Experimental Results

Fig. 5.12 illustrates the experimental results of the five selected DOFs with their differ-ent trajectories under the KF-based sensor fusion between Baxter and the operator. TheKinect sensor provided the position difference between the robot’s real trajectory and the

54

Page 67: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 5.10: Demonstration of the experiment at the different positions

Figure 5.11: Image of designed control system

base point. The MYO armbands provided the angular velocity of those five angles ac-cordingly before they were fused together via KF. The experimental data of the operator’sarm motion from Kinect and MYO, and the optimum output from KF based sensor fusionand the angular statistics of the simulated Baxter robot were recorded, respectively, for thetest. From the comparison results shown in Fig. 5.12, conclusion is drawn that the totalperformance of the motion capture system is improved with the KF based sensor fusion.

The use of a Kinect and MYO armband after KF based sensor fusion to teleoperatea Baxter robot was developed and verified. The results shown in Tab.5.1 demonstrate aseries of ratios, which are the different values of the 5 angles between those obtained byKF and those directly collected by Kinect. Because the values filtered by KF are optimal,and the noise during the teleoperation process is suppressed. Hence, that ratio is definedas the efficient improvement, which is denoted as re as formulated in (54). The ratios

55

Page 68: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 5.12: Graphical result after the KF based sensor fusion (ShoulderPitch, ShoulderRoll, ShoulderYaw, ElbowPitch,ElbowRoll)

shown in Tab.5.1 are averagely at 3.158%, 4.086%, 3.442%, 3.269% and 3.673%, standingfor the angular positions of ShoulderPitch, ShoulderRoll, ShoulderYaw, ElbowPitch andElbowRoll, respectively.

Table 5.1: Table for efficiency improvement of different angular positions

Data ShoulderPitch ShoulderRoll ShoulderYaw ElbowPitch ElbowRoll

Ratio 0.67% 0.1021% 0.348% 0.29% 0.3025%

re =pKF − pKinect

pKinect(54)

where pKF and pKinect are the experimental data of different angular positions obtainedfrom the KF based sensor fusion and directly collected from the Kinect, respectively.

Fig.5.13(c) shows the time series of updating joint parameters in the function approx-imation method. It can be seen that the adaptive parameters of the trained five joints

56

Page 69: Learning Algorithm Design for Human-Robot Skill Transfer

converge boundedly with time. Fig.5.13(a) and Fig.5.13(b) give the joint compensationtorque curves. The output torque of the NN is mainly to compensate for the uncertaintiesin the system to achieve the desired control performance. It shows that the uncertainty ofthe control system appears at a certain moment after the application of the neural network,and the control system can be well compensated by the input torque. The NN learningweighs can be seen in Fig.5.13(d).

Figure 5.13: Torque inputs of the control system with NN. (a) Torque inputs of the control system without NN. (b)Torque inputs of the control system with NN. (c) Tracking performance of the designed system for both without NN andwith NN. (d) NN learning weighs of every single joint.

It can be deduced from the above results that NN learning is a good choice as it has anacceptable error margin during the teaching and learning process, thereby satisfying theproposed design goal.

5.8 Conclusion

Following the last chapter, a KF based sensor fusion has been used to improve the trackingperformance. This involved using a Kinect sensor to capture the motion of an operator’sarm using the vector approach. The vector approach can accurately calculate the angulardata of human arm joints, by selecting 5 out of 7 joints on each arm. In addition, theangular velocity of human operator’s arm is measured by a pair of MYO armbands wornon the operator’s arm. The data collected is thereafter sent to the Baxter robot for tele-operation. The use of the MYO in this research is due to its portability and capacity for

57

Page 70: Learning Algorithm Design for Human-Robot Skill Transfer

accurate computation of the values for the angular velocity of the shoulder and elbow mo-tion. It works well with the Kinect sensor, and the application of KF based sensor fusionhas greatly enhanced user-accuracy during teleoperation processes.

However, after usage of KF based sensor fusion, the robot has not completely donethe motion along with the desired trajectory qd owing to the existence of dynamic uncer-tainties. Hence a second experiment was conducted to compare the effect of the unknownpayload under NN learning with that without NN learning. The results illustrate that theNN-learning applied control system achieves Baxter teaching with the smallest number oferrors.

58

Page 71: Learning Algorithm Design for Human-Robot Skill Transfer

6 Chapter Six: An Enhanced Teaching Interface for a Robot usingDMP and GMR

6.1 Introduction

Robot technology is developing fast and its effective applicability for use in traditionalmanual tasks has increased. The domains in which use have been found for robots continueto increase exponentially which prompt a need for robots with the ability to move assmoothly as a human [140]. In recent years, with the integration of robots into humandaily life, the need for robots to acquire human skills, and teaching by demonstration arethe underlying purpose of many research works in both academic and non-academic fields[17]. In [121], a slave-master controller based TbD is developed to retain the kinematicconstraint by using the tracking system. A trajectory learning approach for multi-robotinteraction for welding tasks is developed in [141]. The ability to transfer human skills,via demonstration, to robots has a huge influence on building robot intelligence, and is animportant way to promote robot learning. It helps to successfully overcome the need tomanually control the robot to replicate complex human movements.

During the teaching process, the human operator transfers motor skills to the imitator(robot) through recordings of the motions being taught and generalized [142]. In [143],a research team develops a problem-based learning (PBL) method on an autonomousvacuum robot with mechatronics systems. An actuated dynamics technology based onproportion-integral-derivative (PID) controller is proposed in [144], where the unactuateddynamics have been shown to be globally bounded. The acquisition of teaching data is re-alized through body sensations which convey human-robot dialogue effectively. This hasa wide range of usage in intelligent identification and control systems as a natural way forhuman-robot interaction. Common somatosensory devices including wear systems, suchas 3DSuit, data gloves, and optical movement capture systems such as Microsoft’s Kinectsomatosensory camera, have been widely employed to the HRI research studies.

The GMM is a commonly used clustering algorithm. It approximates the complex dis-tribution. During the process, it first extracts a feature element of each unit in the sample,then uses the GMM to cluster these features as an object, finally obtaining the segmen-tation result. In [145], authors use continuous myoelectric signals to employ GMMs formultiple limb motion classification. Early in the 19th century, scientists use a finite GM-M whose parameters are estimated through the EM algorithm by estimating a probabilitydensity function of human skin color [146]. The playback process of a robot includesmovement control and movement trajectory reproduction. Trajectory reproduction is em-ployed when the encoded data are sent to regression techniques, such as GMR. Movementcontrol is the result of mapping robot motion trajectories. It is also the playback processobtained from a demonstrator after the learning process. In addition, human-robot skilltransfer can be achieved optimally by programming with special characteristic methods,

59

Page 72: Learning Algorithm Design for Human-Robot Skill Transfer

such as DMP [104].With the objective of designing robots that can complete operational tasks, such as

grabbing objects in dynamic and complicated environments, robots require the ability toovercome and avoid obstacles. For this purpose, the DMP method is used to model andgeneralize the motion trajectory inside the environment with obstacles via combining thespecific planning algorithm with its generalization and stability. In [147], the DMP modelcombined with the haptic based kinesthetic method is used to comprehend the trajectorygeneration for the spherical impediment. In [148], the authors argue that the DMP canbe represented by defining a function approximator. The method calls for unique obstaclerecords to calculate the repulsive pressure within the vicinity of the impediment. A GMMis broadly used in pattern recognition and facts evaluation, whereas GMR is a widely-used quantitative analysis approach. The GMM can clean the probability distribution ofarbitrary shapes. In [106] and [149], the authors develop a proposed GMM to representthe nonlinear term, which is not required to manually specify the parameters of the basicfunctions. However, most probability estimation methods are often not able to attain thecomplete information. For example, though the sample would be known, the Gaussiandistribution is unknown. Hence, it is necessary to employ the Expectation - Maximizationalgorithm (EM) solution.

Machine vision based detection and tracking have attracted wide attention. In [14], aresearch team develops a method, wherein a Bayesian-based object tracking system withthe special focus on Microsoft Kinect devices intelligently schedules a network of multi-ple RGBD sensors. A kinematics based skeletal tracking system using a Microsoft Kinectsensor with an upper limb virtual reality rehabilitation system is investigated [19]. In [18],a task tracking the fingertips and palm centre are developed using a Kinect sensor. Howev-er, due to the unavoidable influence of camera resolution, lens distortion, the illuminationintensity of the surrounding environment and data transmission accuracy, the image itselfcontains a lot of noise, which increases the complexity of image processing and featurematching. In addition, the robot object is a complex time-varying dynamic system, andthere is strong coupling between their degrees of freedom. All these indicate that thereare still many unsolved problems in the research of robot visual areas, which need to befurther studied and discussed, and it is a very valuable subject.

DTW is a measurement of the similarity between two time series. Its usage in speechrecognition is to discover if two words constitute the same phrase. In the time collection,the period of the two time series may not be identical, and the DTW calculates the simi-larity between the two time collections by extending and shortening the time series [150].In this thesis, the Baxter robot is used to test the proposed teaching method by being com-manded to perform obstacle avoidance tasks after the DMP was applied. Then the KUKALBR robot is used to prove the achievement of our designed teaching method by drawingcurves in a horizontal flat paper programmed by recording a sequence of actions taught bya human demonstrator. After that, the DTW and GMR are used to analyse and generalize

60

Page 73: Learning Algorithm Design for Human-Robot Skill Transfer

Table 6.1: Model characters table using Denavit & Hartenberg’s Method [10]

LinkNumber θi di(m) ai(m) αi(rad)

1 θ1 0 0 π/22 θ2 0 0 π/23 θ3 d3 0 π/24 θ4 0 0 π/25 θ5 d5 0 π/26 θ6 0 0 π/27 θ7 0 l7 0

the recorded movements. This allows the robot to function as proposed and the validationof our developed teaching interface has been illustrated.

6.2 The Enhanced Teaching Interface

In this section, we investigate the algorithms for teaching process, playback and the gen-eralizing task, including the DMP, the GMR and the DTW.

6.2.1 Calculation of Arm Joint Angles

According to [8], a Denavit-Hartenberg (DH) featured system chart, as can be seen inFig.6.1, was created to represent the 7-DOF model of our human arm. The DH kinematicparameters of the human arm are indexed in Tab.6.1. In step with the DH approach,the outline of the coordinate frames transformation from body i to border i − 1 can becalculated.

Figure 6.1: Human arm model and its DH coordinate frames [8]

The skeleton data of a human in 3-D positions can be obtained by using the Kinectsensor, which consist of 25 joints and this is shown in the left side of Fig.6.2. The armmodel was made visible with a geometry model, which is shown on the right side of Fig.6.2

61

Page 74: Learning Algorithm Design for Human-Robot Skill Transfer

of our previous work [8]. Next, the point Hip-Left is selected as the origin, at the sametime as x-axis is within the identical path of vector

−−→AO and y-axis is in conjunction with

vector−−→OC [8]. It is easy to align the regular vector of every axis to the base coordinate,

−→X0,

−→Y0 and

−→Z0 [8]:

−→X0 =

−−→AO

|−−→AO|

; (55)

−→Y0 =

−−→OC

|−−→OC|

; (56)

−→Z0 =

−→X0 ×

−→Y0 (57)

Figure 6.2: Screenshot of the skeleton tracking system and the geometry model for human arm in joint space [8]

In our previous work [8], it was found that the plane COD and plane xOy form thesupplement angle θ1.

θ1 = π− <−−→CO ×

−−→CD,−−→CB ×

−−→CO > (58)

θ2 is the angle formed by vector−−→CD and y-axis, which is shown as follows [8]:

θ2 =<−−→CO,−−→CD > (59)

Similarly, the plane BCD and plane CDE form the angle θ3 as defined below [8]:

θ3 =<−−→CB ×

−−→CD,−−→CD ×

−−→CE > (60)

θ4 =<−−→DC,−−→DE > (61)

62

Page 75: Learning Algorithm Design for Human-Robot Skill Transfer

θ5 is the angle between plane CDE and DEH [8].

θ5 =<−−→EC ×

−−→ED,−−→ED ×

−−→EH > (62)

Angle θ6 is formed by vector−−→ED and plane EFH [8].

θ6 = Π/2+ <−−→EH ×

−−→EG,−−→ED > (63)

However, the angle θ7 is difficult to calculate using the above method, which is theyaw angle of the wrist [8]. Here we define a vector

−→V which is in the plane of EFH and

perpendicular to−−→DE. θ7 can thus be measured by forming

−→V and

−→Y7 [8]. While,

−→X7 =

−−→EF

|−−→EF|

; (64)

−→Z7 =

−−→EF ×

−−→EH

|−−→EF ×

−−→EH|

; (65)

−→Y7 =

−→Z7 ×

−→X7 (66)

So now the problem becomes to solve−→V [8]. We know that

−→V is in the plane of EFH

and−→V is perpendicular to

−−→DE. Supposing that:

−→V = k1

−−→EF + k2

−−→EH (67)

There are:

(k1−−→EF + k2

−−→EH) ×

−−→DE = 0; |k1

−−→EF + k2

−−→EH| = 1 (68)

Therefore,

θ7 =<−→V ,−→Y7 > (69)

By using the calculation above, every single joint angle can be obtained. However, twothings have to be adhered to ensure that the calculations are accurate:

(1) the left thumb must be in the same plane as the palm; (2) when the angular vectorschange from zero to π, then further problems need to be addressed [8].

Those above seven angles and their preliminary positions are shown in Fig.6.1 in thejoint space. The proposed geometry vector technique is developed based on the precept ofcosine cost of two vectors proven in (62). Furthermore, the posture between two planesmay be calculated by giving them regular vector [8].

63

Page 76: Learning Algorithm Design for Human-Robot Skill Transfer

6.2.2 Data Preprocessing

The DTW algorithm is based totally on the concept of dynamic programming, and it aimsto locate the shortest distance and highest quality matching path between two distinctcheck samples and reference templates. Let us define the reference time collection as T =t1, t2, t3, · · · , ti, · · · , tL1

and the test sample as R =

r1, r2, r3, · · · , r j, · · · , rL2

, wherein ti

and r j denote the joint posture values of the time factors, L1 and L2 denote the vectorlengths. The space matrix D(i, j) has been illustrated from (87) in Section 5.2.1, when thevectors T and R are nonlinearly matched.

In our research, the DTW approach was used to align the recorded patterns by giving awarped characteristic, W =

w1,w2, · · · ,wp, · · · ,w(p)

, where w(p) = (ip, jp) is the match

factor [6]. Here the warped characteristic W is needed to decrease the gap between thecheck sample vector and the reference template vector. Therefore, the equation is definedas:

D = minK∑

k=1

d[w(p)] (70)

where d[w(p)] = d[Ti(p),R j(p)] describes the distance measure between the i(p)th featureof the test sample vector and the j(p)th feature of the reference template vector, which isusually characterized by a square measure defined as follows:

d[w(p)] = [Ti(p) − R j(p)]2 (71)

In order to align the two samples, we need to construct a matrix grid of m × n, thematrix element (i, j) represents the distance d(Ti,R j) between the two points Ti and R j andeach matrix element (i, j) represents the alignment of points Ti and R j. DTW is aimingto finding a path through several grid points that is representing the aligned points for thetwo samples to be calculated. Here we firstly outline the minimum cumulative distanceamong the 2 factors as DAcc(i, j), then we find that [6]:

DAcc(i, j) = d(Ti,R j) + min(qi,q j)[DAcc(qi, q j)] (72)

where (qi, q j) belongs to the set of all points within a certain path that exists betweenpoints (1, 1) and (i, j). It can be seen from the above components that the minimal cumu-lative distance of the factor (i, j) is related to not only the local distance d(Ti,R j) of theeigenvalues Ti, R j, but also the minimal cumulative distance earlier than this point in thecoordinate system [6].

Hence we conclude that (i, j − 1), (i − 1, j) and (i − 1, j − 1) for any point c(p) = (i, j)inside the coordinate system can reach the preceding point of c(p), so the selection ofthe preceding factor only needs to align with the three above factors. According to theequation below, we can calculate the equal DTW distance among the check pattern vector

64

Page 77: Learning Algorithm Design for Human-Robot Skill Transfer

and the reference template vector, which is shown as follows:

D′ = DAcc(L1, L2). (73)

6.2.3 Trajectory Generation

The canonical system of DMP is an exponential differential equation given by:

τs s = −αhs (74)

where τs > 0 is the temporal scaling factor, αh > 0 is the stability parameter and s is thephase value varying from 0 to 1.

The transformation system is made up of two contents in Cartesian space: a springdamping system and a nonlinear term (see Section 7 of Chapter 3). The transforma-tion function f presents the complex nonlinear system, and it transforms the result ofthe canonical system, which is given by:

f (s) =

N∑i=1

wsl(s) (75)

where N is the number of GMM, ws ∈ R is the weight of l(s), l(s) are the variable valuesof the normalized radial basis functions, which can be given as follows:

l(s) =exp(−hi(s − ci)2)∑N

m=1 exp(−hm(s − cm)2)(76)

where ci > 0 are the centers and hi > 0 are the widths of the Gaussian basis functions. Nis the number of the Gaussian functions.

After selecting the start line x0 and goal g of the canonical system s=0, and integratingthe canonical system, we are able to generate motion by the usage of the weight parameter[151]. DMP is used here to obtain the nonlinear transformation characteristic f (s) throughskill transfer from the demonstrator. However, there is an issue in creating the transformedsystem through the usage of more than one verified path. We, thereafter applied the GMMto overcome the above problems.

The GMM is the estimation of the probability density distribution of the samples [152].The estimated value is the weighted sum of numerous Gaussian models and every Gaus-sian version represents a class [152]. In this thesis, the joint probability of the nonlinearsystem, which is the teaching data encoded through GMM, and the records were recon-structed via GMR to generalize the movement trajectory. For any degree of freedom, givenj teaching data factors ξ j =

f j, s j

, j ∈ R, where s j and f j had been defined in the DMP

segment, N is the number of records points contained in a single training, every teaching

65

Page 78: Learning Algorithm Design for Human-Robot Skill Transfer

data ξ j follows the subsequent probability distribution:

p(ξ j) =

K∑k=1

p(k)p(ξ j | k) (77)

where p(k) is the prior probability, p(ξ j | k) is the conditional probability distribution,which follows the Gaussian distribution, and K is the number of Gaussian model distri-bution. Thus, the whole set of teaching data can be expressed by the GMM as follows[153]:

p(k) = πk (78)

p(ξ j | k) = N(ξ j, µk,∑

k) =

1√(2π)D |

∑k|∗ e−0.5(ξ j−µk)T ∑−1

k (ξ j−µk) (79)

where D is the dimension of the GMM encoding the teaching data. Here we use theBayesian Information Criterion (BIC) method to obtain the value of K [154].

S BIC = −L(ξ j) +n(K)

2lgN

L(ξ j) =

N∑j=1

lg(p(ξ j))

n(K) = K − 1 + K(D +1

2D(D + 1))

(80)

where L(ξ j) measures the model’s characterization of data, n(K) is the number of freeparameters of the model, which is a measure of the complexity of the model.

The GMM parameters need to be determined and are denoted as πk, µk,∑

k. That isthe kth component of prior probability, expectations and variance, respectively. The EMalgorithm is used to estimate the GMM parameters, which are obtained by giving themaximum similarity estimation of the parameters in the probability model, expectationsand variance, respectively.

The teaching data s j is used as the query point, and the corresponding spatial value f ′jis estimated by GMR. It is known that p(ξ j | k) satisfies the Gaussian distribution,

(f j,ks j,k

)∼

N(µk,∑

k), where µk=µ f ,k, µs,k

,∑

k=

∑f ,k

∑f s,k∑

s f ,k∑

s,k

, and the conditional probability f j,k

satisfies the Gaussian distribution as given s j,k [153].Then we have:

f j,k | s j,k ∼ N(µ′f ,k,∑′

f ,k) (81)

µ′f ,k = µ f ,k +∑

f s,k

∑−1

s,k(s j,k − µs,k) (82)∑′

f ,k=

∑f ,k−

∑f s,k

∑−1

f ,k

∑s f ,k

(83)

66

Page 79: Learning Algorithm Design for Human-Robot Skill Transfer

where we are able to calculate the variance∑′

f and the average µ′f of the kth GMM com-ponent, which is shown as follows [153]:

µ′f =

K∑k=1

ηkµ′f ,k (84)

∑′

f=

K∑k=1

η2k

∑′

f ,k(85)

ηk =p(s j | k)∑Ki=1 p(s j | i)

(86)

where µ f′ is the estimation acquired through the distribution of the expected conditions,

and f j is similar to the reconstruction of area values, known as µ f′= f j

′. The generalizeddata of points ( f j

′,s j) are not included in the teaching data, but encapsulates all the essentialfeatures of the teaching behaviour. Furthermore, under the relevant constraints

∑f′, it can

generate a smooth and reliable motion trajectory, which results in the effective control ofthe robot.

6.3 Experimental Studies

A Baxter robot and a KUKA LBR robot were used in our experiments to verify the ef-fectiveness of the proposed method. As for the experimental platform, the PC operationsystem was Windows 10. There was also Kinect SDK for windows, Visual Studio 2013and OpenCV library. The KUKA robot was programmed via Workbench, which is a com-mon modifying platform combined with KUKA robot language (KRL) and Java. Theexperiments were conducted in a well-lighted environment.

6.3.1 Obstacle Avoidance Experiment

Several tests were designed to test the performance of our designed system by controllingthe Baxter robot to navigate a high obstacle. One person, a demonstrator, stood two metresin front of the Kinect sensor. The operator guided the Baxter robot to carry out its tasksvia teleoperation, as shown in Fig.6.3.

At the same time, those data of each joint of operator’s arm is recorded, which is usedfor the playback of the Baxter robot. After that, we increase the height of the obstacle, it isnoticed that the Baxter is not able to pass the obstacle successfully. Hence, the DMP hasbeen employed to generalize the trajectory of the Baxter robot. By doing this, the Baxteris finally able to pass through the obstacle successfully with the increasing height.

67

Page 80: Learning Algorithm Design for Human-Robot Skill Transfer

(a) Obstacle avoidance by teleoperation with Baxter (b) Obstacle avoidance by playback

(c) Failed to pass the obstacle with increasing height (d) Succeeded to pass the obstacle with increasing height afterapplying DMP

Figure 6.3: Illustration of the obstacle avoidance experiment

6.3.2 Trajectory Generalizing Experiment

For this experiment, we tested a KUKA LBR robot, using recorded data of movementsof a demonstrator saved locally and processed using MATLAB. The data generated bythe trajectories were sent to a separate computer and used to control the robot arm. Amarker pen was connected to draw patterns on a horizontal flat surface. A sine wavewas chosen as the reference trajectory for the demonstration, wherein the ability of thedesigned technique is tested with complex shapes. During the experiment, we used a pre-revealed template shaped on a sheet of A4 paper. Then, a human operator guided theKUKA robot as shown in Fig.6.4. The movement of the robot endpoint was recordedduring the demonstration.

The five recorded trajectories in Cartesian space were saved and analysed using theK-means method and EM algorithm to obtain the GMMs. The experimental trajectorieswere plotted through MATLAB. After that, the DTW was used to align the 5 trajectories,with the first curve selected as the reference to align all the other patterns. GMM was usedto encode the trajectories. In the end, the KUKA robot was able to reproduce a generalized

68

Page 81: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 6.4: The setup of the trajectory generalizing experiment

curve on the vertical surface.

6.3.3 Results

The first group of experiments aimed to verify the performance of the proposed DMP, in-cluding the ability to generalize, i.e., spatial scaling, and the learning performance whenthe demonstration is defective. In this experiment, the demonstration process used jointss1, e0 and w2 fixed, and the angles of the joints s0, e1, w0 and w1 which are recordedduring the demonstration. Then the demonstration data are used for the training of themodified DMP. The training result is shown in Fig.6.6. It can be seen from the graph thatthe maximum and the minimum values of ShoulderPitch between the data from demon-stration and playback, in some specific time point, differ by about 0.4 radians. This impliesthat the range of the arm motion of the Baxter robot increased at about 0.4, leading to anaccurate motion. Here the theoretic explanation is that the ShoulderPitch joint angle de-cides the maximum high that a robot can touch and we use the DMP generalization toincrease the moving range of this angle. The motion of joint ShoulderPitch is regeneratedfrom the demonstration, which synthesized the features of the demonstration and enabledthe robot to perform the obstacle passing task successfully as shown in Fig.6.3(d).

As shown on the left side of Fig.6.7, for the second experiment, five distinctly separatecurves were drawn horizontally. It can be concluded that the number of GMMs affectedthe result trajectories. Hence, to achieve good performance of the generated trajectories,the number of GMM components was pegged at 20. In addition, the optimal result tra-jectory is shown on the right side of Fig.6.7. By using the GMR method, we transform

69

Page 82: Learning Algorithm Design for Human-Robot Skill Transfer

(a) Alignment result between the first and the second curves (b) Alignment result between the first and the third curves

(c) Alignment result between the first and the forth curves (d) Alignment result between the first and the fifth curves

Figure 6.5: Illustration of the alignment using DTW

Figure 6.6: The learning and generalization result using the proposed DMP in an obstacle passing task

70

Page 83: Learning Algorithm Design for Human-Robot Skill Transfer

the data retrieval problem of TbD into a joint distribution estimation problem, which isapproximated by a mixture of Gaussians. During the calculation, the key point of thelearning process was correlative to the number of points in the sample set of data linearly.Here the prediction process relied on this number.

Figure 6.7: The demonstrated trajectories for the sine wave with GMM and the result

After that, we modified the DMP code to apply the spatial and temporal generalization.The generalized curve was then drawn on a vertical flip chart pad by the playback processof the KUKA robot, as shown in Fig.6.8. A smooth curve was retrieved from multipledemonstrations using the modified DMP, where the playback process was achieved at 5times the speed, proving the proposed temporal generalization. Future works would lookinto using DMP segmentation for teaching by demonstration-based tasks.

Figure 6.8: Curve on a vertical surface obtained after spatial generalization using the modified DMP

71

Page 84: Learning Algorithm Design for Human-Robot Skill Transfer

6.4 Conclusion

A GMR and DMP combined with DTW based TbD technology was developed in this the-sis, which seeks an effective and superior method for humans to interact with robots. TheKinect V2 sensor was used to teleoperate the Baxter robot for the accurate generation ofthe motions. For the motion generation, the discrete DMP was selected as the basic mo-tion model, which can achieve the generalization of the motions. To improve the learningperformance of the DMP model, the GMM and GMR were employed for the estimationof the unknown function of the motion model. The DMP model was enabled to retrieve abetter motion from multiple demonstrations of a specific task. Two experiments have beenemployed to test the performance of our designed teaching interface. The results verifiedthe effective generalization of the proposed methods. Compared to standard teaching ap-proaches, the proposed teaching interface evaluated the DMP for multiple demonstrationsby combining with GMR after all the experimental data had been initially pre-processedby DTW.

72

Page 85: Learning Algorithm Design for Human-Robot Skill Transfer

7 Chapter Seven: Development of Writing Task Recombination Tech-nology Based on DMP Segmentation via Verbal Command for Bax-ter Robot

7.1 Introduction

Teaching by demonstration (TbD) technology is fast gaining ground in the field of robotic-s. This is due to the advancement in the quick and efficient transfer of skills. With the aidof human guides, robots are able to acquire the dexterity required to carry out tasks. Thismode of human-robot skill transfer has a number of advantages which include:(1) TbD does not require a human instructor with expert skills and knowledge;(2) human-robot skill transfer is achieved in a convenient and efficient manner;(3) it effectively takes human factors such as flexibility and compliance into account [11].

These benefits facilitate task accomplishment [11]. Character-learning based on TbDtechnology with trajectory matching is a highly-researched topic in imitation learning andhas received considerable attention in the past few decades. Conventional approachesin this area constitute spline-based methods [155], dynamic system methods [156] andprobabilistic model methods [157]. The main advantage of robot learning is that it seekseffective control strategies to complete complex motion tasks which are challenging toattain through traditional methods.

Spline-based methods can generate trajectories quickly. However, they are time-dependent,sensitive to interference, do not have the ability to adjust in real-time, and the calculationsneed to be revised when new data is received. Dynamic system approaches that modeldiscrete and rhythmic motions, are not time-dependent, and they allow for real-time ad-justments. It has topological equivalence and is often used as a dynamic movement forhigh level characterization primitives, such as DMP [158] for the construction of complexmodified primitive libraries (complex motions consisting of simple motions representedby a series of primitives). However, DMP is not suitable for direct encoding of com-plex motions and requires more teaching information (position, speed and acceleration).Probabilistic model methods, such as a hidden Markov model (HMM), GMM, dynam-ic Bayesian network (DBN), are often used to match trajectory [31]. Probability modelshave strong coding and noise-processing abilities, excellent robustness, which can dealwith high-dimensional problems. GMM, in particular, has a strong ability for encodingand reproducing continuous complex trajectories. Compared to the DMP, it only need-s space and position-based teaching information that enables it to be used for imitationlearning of complex motions.

The basic theory of GMM is that as long as the number of Gaussian mixtures is suffi-ciently large, an arbitrary continuous distribution can be approximated by calculating theweighted averages of the Gaussian mixtures with arbitrary precision. It is widely used

73

Page 86: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 7.1: Graphical representation of the overview of the proposed technology, modified from [9]

in trajectory generation for robot simulation learning and has a strong behavioural cod-ing capability. For example, in the research [159], by training the teaching data to therobot, learning GMM and stable estimation of the multi-dimensional dynamic system ofnonlinear motion are achieved. These cannot be only generalized to reach the unknownposition but can also be adjusted online in case of interference. In [160], the frameworkof variance-based imitation learning is given in the task space, the motion is modelled byusing GMM, the expectation maximization (EM) algorithm is used to reconstruct the tra-jectory, which is generated by using the GMR and the optimization evaluator realizes theimitative learning. In [161] and [162], the authors employ a Gaussian process to establisha random forward model to represent the motion to be simulated with the Kullback-Leiblerdivergence as an indicator of imitative performance, to ensure effective learning of the datathrough distribution prediction and to perform trajectory matching, and finally to achievemotion simulation. Content-based retrieval methods have gained significance in motion-

74

Page 87: Learning Algorithm Design for Human-Robot Skill Transfer

captured data retrieval [163]. During the data matching process, the start frame and the endframe of the search condition sequence are first indexed into the library to select possiblealternative segments in the motion capture database, and finally a DTW method is used tocalculate the similarity to determine the final search results [164]. With the continued de-velopments in robot research, it has come to be known that robot movement behaviour ishighly complex, and requires the robot’s learning ability to be more demanding. The tradi-tional algorithms cannot achieve complex movements that are not obtained solely throughlaws of movement, such as hitting the ball and a writing task [164]. Furthermore, robotsare required to have enhanced learning ability so that it can self-compensate, interact witha random dynamic environment, and deal with sudden and unknown situations. However,the existing methods are to predict the model in the finite time domain, and the globalstability of the system is difficult to prove.

Writing is a complex task with problems that include continuous complex trajectorycharacterization and discrete trajectory generation. Some scholars have proposed a con-trol chart model [165] and a recurrent neural network [166] to teach robots writing skills,however, they fall short of fixing the problems. Though the control chart model can gen-erate discrete trajectories, the ability to represent complex trajectories is not enough, andthe recurrent neural network can only be used for the reproduction of simple trajectories.Owing to the above problems, a GMM-based teaching method is required that can achievecomplex trajectory characterization for acquiring writing skills.

The test of the imitation learning process in this thesis aims to encode the teachingdata through GMM, extract the behavior characteristics, and reconstruct the data throughGMR, so as to actualize the continuous Chinese character writing skills in trajectories.This method, with the aid of the DMP, is found to effectively solve problems. Based onthe basic DMP, a multi-task extension is applied by teaching the robot every single unitof the Chinese characters needed to achieve the regroup tasks after segmentation of tra-jectories. The Baxter robot successfully learns the writing skills for scribbling Chinesecharacters with a non-continuous trajectory through good coding ability and generaliza-tion performance. The framework that has a basic TbD procedure is divided into fourphases: demonstration, segmentation, alignment and generalization [9]. The overview ofthe framework is shown in Fig.7.1.

7.2 Methodology

In this section, we investigate the proposed teaching process method, playback and theDMP segmentation-based regroup tasks, including the proposed DMP promoted by addingGMM, after the preprocessing using DTW. Here we apply the DTW to match the similar-ity between the character and the candidate domain, using the weighted distance in bothdirections as the final distance.

75

Page 88: Learning Algorithm Design for Human-Robot Skill Transfer

7.2.1 Data Preprocessing using DTW

DTW is an effective time series matching method and is widely used for time series pro-cessing and signal processing [164]. Earlier applications were applied in areas such asspeech recognition. Given the first character sample C as the reference with a width Mand the second character sample Q to be aligned with a width N, the size of the referencearea is the approximate as the size of the character to be aligned, M ≈ N.

Assuming that the writing task is well-structured and continuous, for any two columnswhere there is no cross-matching occurring in between, and each column can find anotherto be matched with, continuous results were achieved. The restriction ensures that thei− th column of the sample C and the j− th column of the sample Q have an accumulateddistance D(ci, q j), and it is only jointly determined by D(ci, q j−1), D(ci−1, q j), D(ci−1, q j−1)and D(ci, q j), which is shown as follows [164]:

D(ci, q j) = min

D(ci, q j−1)D(ci−1, q j)

D(ci−1, q j−1)

+ d(ci, q j) (87)

where both i, j > 1, d(ci, q j) is the distance function between two samples, and the distancebetween element ci in column i in C and element q j in column j in Q, is defined by theEuclidean distance defined as [164]:

d(ci, q j) = (ci − q j)2 (88)

Although there is a difference in writing between the same characters, this differenceshould be kept within a small local area. Therefore, the global path must be constrained tomaintain the invariance of the local structure and accelerate the solution of the problem. Itis important to limit the spatial distance between the two columns ci and q j to be less thanr elements [167].

‖i − j‖ ≤ r (89)r = dk ∗ seqLe (90)

where k is a constant coefficient and seqL is the length of the feature sequence.Using the dynamic programming method can accelerate the solution of the horizontal

distance D(cM, qN). Since the feature sequence length of the sample Q to be aligned andthe feature sequence length of the reference domain are the almost the same, and it isnot necessary to normalize the accumulated distance D(cM, qN) by the sequence length[167]. Note that the length and height of the characters (as the dis-aligned sample andthe reference sample) are inconsistent, we used the weighted two-way DTW algorithmto calculate the final distance of two characters, the weight is the length of the sequenceitself:

dist(C,Q) = D(cN , qN) ∗ N + D(cM, qM) ∗ M (91)

76

Page 89: Learning Algorithm Design for Human-Robot Skill Transfer

Finally, each reference sample C and the dis-aligned sample Q were arranged in as-cending order from dist(C,Q) to obtain a list of candidate regions. Since each sample willhave several key points, the overlapping candidate fields need to be eliminated. Accordingto the obtained candidate sample list, for any single character inside, if there is an orderprior to, and if the overlapping area ratio exceeds the threshold (here we define it as 0.2),it is eliminated. The final list is the result of the optimization.

7.2.2 Trajectory Generation

The basic idea underlying DMP is to use an easily understood dynamic system to achievethe expression of motion trajectory [168]. The spring damping system is the simplestinstance, as shown in (2) of Chapter 3. Here we use a proportional coefficient to scalethe trajectory shape when the new target point is farther than the initial target point of theteaching trajectory.

Equation (2) represents a transformation system. Every independent transformationsystem refers to one degree of freedom, where f (s) is a nonlinear disturbance force andcan be generated through learning. A specific expression is as follows:

f (s) =

∑i ωiϕi(s)s∑

i ϕi(s)(92)

τs s = −as (93)

ϕi(s) = exp(−hi(s − ci)2) (94)

where ϕi is the Gaussian function, wherein ci is the center, hi is the width. By adjusting theweight ωi, equation (92) can be used to express arbitrary shape trajectories. a is constant,s is a phase parameter with the value from 1 monotonically converges to 0 [13]. Wecan conclude that during the process, the external factors decrease as it nears the targetposition g, which ensures the stability of the system to the goal. Equation (93) representsa canonical system, which is used to realize the coupling between multiple transformationsystems, and is not directly dependent on time [13].

The policy parameter ωi can be commonly learned using supervised learning algo-rithms such as locally weighted regression (LWR). To put it simply, in this thesis, weobtain the weight via finding a proper parameter vector by minimizing the following er-ror, where the ftarget can be obtained from the equation (2) from the Section 7 of Chapter3. Combining equations (2) with (92), on the basis of least square method, this locallyweighted linear regression problem can be solved efficiently. If J equals the minimum, ωi

is the optimum.

J =∑

s

( ftarget − f (s))2 (95)

77

Page 90: Learning Algorithm Design for Human-Robot Skill Transfer

After selecting the starting point x0 and target g of the canonical system s=0, then inte-grating the canonical system, we can generate a movement by using the weight parameter.The principle of DMP is to obtain the nonlinear transformation function f (s) by learn-ing from the movements of the demonstrator [13]. So far, the spatial value f (s) and thetemporal value s have been obtained. However, there is a limitation on the creation ofthe transformed systems with multiple demonstrated paths, hence we have employed theGMM to overcome the above issues.

The teaching data is acquired by the motion capture system, and the spatial coordinateinformation of the end-effector of the teacher is obtained. First, dimensionality reduc-tion techniques, such as principal component analysis, are used to perform data prepro-cessing, and the three-dimensional data is mapped into two-dimensional space to obtaintwo-dimensional data. Teaching data are inputted into the learning model during the dataencoding. In order to simplify the data processing steps, the emphasis is on the representa-tion learning and generalization of the system. The two-dimensional teaching data of thisarticle α = αs, αt are obtained directly from the previous section, where we assume thatαs, αt, respectively, representing the spatial value and the temporal value of the teachinginformation. The GMM consists of multiple Gaussian distributions of the value α of eachelement [169],

P(α) =

K∑i=1

ωiη(α; µi,∑

i

) (96)

η(α; µi,∑

i

) =1√

(2π)n/2|∑

i |1/2

e−1/2(α−µi)T ∑−1i (α−µi) (97)

where K is the number of Gaussian functions. The larger the K value is, the better themodel can represent a complex situation; ωi, µi and

∑i are the weights, mean values, and

covariance matrices of the i−th Gaussian distribution, which are required to be determined.The EM algorithm is used to estimate the parameters of the GMM (ωi, µi and

∑i), and

the parameters are learned by searching the parameters in the probability model [170]. TheEM algorithm is commonly employed to estimate the parameter with hidden variables.The above is a maximum likelihood estimation problem. This algorithm continuouslyimproves the lower bound of the likelihood function and optimizes the parameters [170].The training process is very sensitive to the initial value of parameters, which need to beinitialized by k-means clustering. After the parameters (ωi, µi and

∑i) are determined, the

GMM model can be learned based on the data from demonstrations, and thus the skill isencoded by the GMM model.

In this research work, GMR is used to reconstruct the teaching data via GMM learning,by doing this, the generalized output is obtained. αt of teaching data is used as a searchingpoint, and its according spatial value α′s is estimated by GMR. Note that η(α, µi,

∑i) meet

78

Page 91: Learning Algorithm Design for Human-Robot Skill Transfer

the Gaussian distribution [171]: (αs,i

αt,i

)∼ η(α, µi,

∑i

) (98)

where µi=µs,i, µt,i

,∑

i=

∑s,i

∑st,i∑

ts,i∑

t,i

, and the conditional probability of αs,i satisfies the

Gaussian distribution at the given αt,i, i [171]. Then we have

µ′s,i = µs,i +∑

st,i

∑−1

t,i(αt,i − µt,i) (99)∑′

s,i=

∑s,i−

∑st,i

∑−1

t,i

∑ts,i

(100)

and then the average µ′s and variance∑′

s of the number i of GMM components can becalculated as follows [171]:

µ′s =

K∑i=1

ηiµ′s,i (101)

∑′

s=

K∑i=1

η2i

∑′

s,i(102)

ηi =p(αt | i)∑K

n=1 p(αt | n)(103)

where the mean µs′ is the required teaching data reconstruction value (µs

′= αs′), and final-

ly the generalized data points α′ = (αs′, αt) and the variance memory for extracting task

constraints∑

s′ can be obtained. The generalized data points are not included in the teach-

ing data, but they encapsulate all the essential features of the teaching behaviour. Underthe relevant constraint of

∑s′, smooth and reliable motion trajectories can be generated to

achieve effective control of the robot.

7.3 Experimental Studies

7.3.1 Experimental Setup

A Baxter robot is used to verify the effectiveness of the proposed method. A marker penis attached to the gripper of the robot. The operator physically guides the Baxter to writea Chinese character on a flat paper. The experimental setup is shown in Fig.7.2.

The experimental platform, Visual Studio 2013 and OpenCV library, are used within aWindows 10 operation system. The experimental environment was indoor and adequate-ly illuminated. During the teaching process, the operator demonstrated how to write theChinese character “Mu” five times. In doing this, we had four separate single primitives,

79

Page 92: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 7.2: The experimental setup for the Chinese character writing task. Step 1: across stroke; Step 2: vertical stroke;Step 3: left-falling stroke; Step 4: right-falling stroke

which were generalized by DMP to regroup other Chinese words. There was a self-madeimplementation running on a remote PC to control the recording and playback of the tra-jectories of the Baxter robot. This was done by defining any text for the locally outputtedtrajectory files. In addition, Dragon NaturallySpeaking was installed on the remote PC,allowing transfer from voice to text signal, and the generation of robot motion controlcommands via UDP.

The demonstration process is repeated five times with the joint W2 fixed, and the tra-jectories of the joints S 0, E1, W0 and W1 with time were recorded. The demonstration datawas thereafter used to train the modified DMP. The parameters of the DMP model wereset as: τs = 1, K = 20, a = 8. The GMM has a strong trajectory coding ability for complextasks. This research used the JAVA based codes to acquire the writing data in the teach-ing mode, applied the GMM-based imitative learning for writing skills, and obtained thegeneralized output through the GMR, which was performed automatically by the Baxterrobot. Based on the operator’s demonstrated strokes, the second, third and fourth strokesof the Chinese character “Mu” were chosen to be generalized.

7.3.2 Results and Analyses

The experimental trajectories were plotted using MATLAB. The five recorded movementtrajectories for each stroke were saved in Cartesian space, where the K-means method was

80

Page 93: Learning Algorithm Design for Human-Robot Skill Transfer

(a) : The 5 demonstrated trajectories for the across stroke (b) : The result of trajectory for the across stroke using GMR

(c) : The 5 demonstrated trajectories for the vertical stroke (d) : The result of trajectory for the vertical stroke usingGMR

used to initial the analysed data. Thereafter, the EM algorithm was used to obtain theGMMs. After which the DTW method was used to align the five recorded trajectories,where the first curve was chosen as the reference to be aligned with others.

It can be seen from Fig.7.3 that b,d,f,h are GMR-reconstructed trajectories using MAT-LAB and Fig.7.3a,c,e,g are the five demonstrated trajectories. Here we take the secondstep to be spatially generalized, which is the across stroke. Using the GMM-based imita-tive learning, the trajectory can be continuously used to write the Chinese character “Mu”.The blue dotted line is the teaching trajectory by demonstration, the solid black line is thegenerated trajectory and the red solid line is the generalized result after DMP and GMMcoding. Next, the first stroke and all the other generalized strokes are able to form a newChinese character “Bu” by using the verbal commands orderly as shown in Fig.7.4. Thedemonstration process recorded the variable data of six joints S 0, S 1, E0, E1, W0 and W1.The joint W2 was fixed to the value 0 and these data were used to train the modified DMP.The training results are shown in Fig.7.5.

Obviously there are maximum and the minimum values of all the joints angles between

81

Page 94: Learning Algorithm Design for Human-Robot Skill Transfer

(e) : The 5 demonstrated trajectories for the left-fallingstroke

(f) : The result of trajectory for the left-falling stroke usingGMR

(g) : The 5 demonstrated trajectories for the right-fallingstroke

(h) : The result of trajectory for the right-falling stroke usingGMR

Figure 7.3: The demonstrated and reconstructed trajectories of the “Mu” character strokes, the x axis represents xdirection and the y axis represents y direction

82

Page 95: Learning Algorithm Design for Human-Robot Skill Transfer

Figure 7.4: The initial and generalized Chinese character

the demonstration and generalization. In some special time points, spaced about 0.04radians apart (i.e. the range of the arm movement of the robot in those two situationsdiffers at about 0.04), could lead to different arm motions in two different positions. Themovement of joints S 0 to W1 is regenerated through the teaching process, which enabledthe robot to perform the Chinese character writing task successfully as shown in Fig.7.4,and synthesizes the features of our proposed technology as well. As shown in Fig.7.3,smooth curves are obtained from multiple demonstrations using the modified DMP. Hence,the robot acquired the writing skill after the human demonstration.

7.4 Conclusion

A Chinese character recombination technology based on DMP segmentation using verbalcommand for a Baxter robot is developed in this thesis. The technology performs well incontinuous writing of trajectories with generalizations. In the motion generation part, wechose the discrete DMP as the basic motion model because it can generalize the motiontrajectories. To promote the learning efficiency of the DMP model, we employ the GMMand GMR to estimate the unknown function of the model referring to the movement. Withthis modification, the DMP model is able to create better movement (under the multipledemonstrations based tasks). Through this method, the simultaneous encoding of multipleteachings effectively reduces the influence of errors, which, in turn, improves the stability

83

Page 96: Learning Algorithm Design for Human-Robot Skill Transfer

(a) : The value of angles for the initial vertical stroke

84

Page 97: Learning Algorithm Design for Human-Robot Skill Transfer

(b) : The value of angles for the generalized vertical stroke

Figure 7.5: The comparison of the joints angles for the vertical stroke of the “Mu” character with or without DMP, thex axis represents time (in seconds) and the y axis represents joint angles (in radians)

85

Page 98: Learning Algorithm Design for Human-Robot Skill Transfer

of the system.Our proposed method can be extended to achieve multi-task learning and improve the

system’s co-processing capabilities. For this purpose, Dragon NaturallySpeaking softwarehas been installed on the PC and provides an efficient voice interaction framework for TbD,facilitating a higher-level robot learning.

86

Page 99: Learning Algorithm Design for Human-Robot Skill Transfer

8 Chapter Eight: Conclusions

8.1 Summary of Contributions

The unprecedented advancement in modern technology has led to an ever-increasing de-mand for the rapid development in robotic technology. The complexity of today’s robotsystem and the complexity of the work environment influence a robot’s functioning in ex-ecuting tasks and speeding performance. State-of-the-art, intelligent and controlled robotsthat can function autonomously have not yet been developed. In terms of the autonomous,safe and accurate control of robots that operate in complex environments there are stillmajor challenges to be overcome. This thesis has studied those challenges from the per-spective of human-robot skill transfer. The main content of this thesis is the design of theskill transfer method between industrial robots and human beings based on learning algo-rithms. Industrial robots are working in unknown and complex environments. The designof interactive control method needs to meet the requirements of high accuracy and safety.

This thesis first introduces the research status of industrial robots, then summarizesand analyses the research status, advantages and disadvantages of human-robot interac-tion control, machine learning methods and control methods in the field of robotics. Inthis thesis, common problems in robot teaching and behaviour reproduction, such as safe-ty, control accuracy, adaptability and self-expansion, are studied in depth. This thesisstudies the challenges of a human-robot skill transfer perspective. This has been donevia investigating a learning algorithm design validated on a NAO robot, Baxter robot andKUKA LBR iiwa robot, respectively. The main contents of this research are summarizedas follows:

1. Chapter one provides an introduction to robotic research and a short discussion ofrelated publications;

2. Chapter two is a review of the literature related to the scope of the study. Thisreview covers literature on the development of industrial robotics, the relating human-robot interaction, machine learning in robot areas and robot controller design;

3. Chapter three covers a discussion of robot platforms, equipment including the Kinectsensor and the MYO armband and basic mathematical methodologies including the DMP,GMM, GMR, DTW algorithms;

4. Chapter four illustrates a tracking system using a Kinect sensor to teleoperate witha NAO Robot. The operator is able to move his head and arm in a natural way to interactwith the NAO robot, which imitates the operator’s actions in real-time;

5. Chapter five covers the investigation of a NN learning and KF based TbD techniquefor a Baxter robot to further increase the accuracy of the teleoperation process by fusing thedata collected from the MYO armband and Kinect v2 sensor. The NN system is employedto compensate for the uncertain dynamics that exist in the learning process.

6. In chapter six, an enhanced teaching interface for a robot using DMP and GMR is

87

Page 100: Learning Algorithm Design for Human-Robot Skill Transfer

developed. A Baxter robot is commanded to go around obstacles set at different heights.Thereafter, the KUKA LBR iiwa robot is used to draw curves in a horizontal plane with aplayback in a vertical pad at different speeds.

7. Chapter seven demonstrates a robot writing technology inspired by DMP based seg-mentation and speech recognition. This involves the operator teaching the robot to writea Chinese character via verbal commands and robot playback via spatial and temporalgeneralization.

In addition, the novelties of this research project are as follows:1. This thesis presents and designs a robot control system which integrates vision

technology and RBFNN approximation technology to solve the problem of remote controlfor a robot manipulator. Finally, the method is employed on the Baxter robot platform.

2. Through the combination of DMP, GMR GMM and DTW algorithms, the robot hasthe ability to self-adapt and generalize spatially/temporally, which greatly reduces the timeof teaching and training for robots, improves work efficiency. It is validated in the Baxterand KUKA iiwa robots.

3. According to different working needs, different teaching methods have been de-signed: for production of hazardous chemicals, the real-time remote teleoperation basedhuman-robot skill transfer has been applied; for those producing operations with high-precision requirements, such as binding, cutting, the physical teaching by demonstrationmethod can be applied.

4. The transfer of human motor skills is not just one-way based, such as Chinesewriting tasks, human operators teach writing skills to robots, robots can still transfer thesame skills to humans in the learning stage, such as children, disabled people.

5. Combining voice interaction with human-robot skill transfer to achieve verbal com-mands such that controlling the robot to complete specific tasks is more convenient.

8.2 Future Works

Although this thesis has achieved some research results in solving the problem of skilltransfer between humans and robots, there are still some aspects to be improved:

1. This research work was limited by the inability to carry it out in an actual problem-solving situation even though there were three robots available. Hence, in the future, anactual industrial application or task based the research will be further carried out.

2. In addition, the dynamic model provided is not reliable outside the workspace whereit was verified and tested. The controller developed in this work needs physical verificationto determine how it will deal with actual environmental disturbances.

3. There exists a chattering phenomenon in extracting human motion information invisual technology. If the chattering signal is used as the input signal of the controller, itwill not only reduce the control performance of the system, but also cause mechanical wear

88

Page 101: Learning Algorithm Design for Human-Robot Skill Transfer

and tear over time, which will lead to instability of the system. Therefore, it is necessaryto solve the problem of input signal jitter.

4. As suggested previously, there is a need for research on the practical uses to whichthe different types of sensors and algorithms can be adopted. For example, the fuzzy logiccan be used for the sensor fusion instead of the KF.

89

Page 102: Learning Algorithm Design for Human-Robot Skill Transfer

References

[1] Hyojoo Son, Changwan Kim, Hyoungkwan Kim, Seung Heon Han, andMoon Kyum Kim. Trend analysis of research and development on automation androbotics technology in the construction industry. KSCE Journal of Civil Engineer-ing, 14(2):131–139, 2010.

[2] Atsushi Watanabe, Yoshiharu Nagatsuka, and Jun Mizuno. Robot teaching device,May 13 2008. US Patent 7,373,220.

[3] Xu Junyan, Zhang Peiren, and Cheng Junfei. Real-time trajectory tracking controlof mobile robot based on backstepping time-varying state feedback and pid controlmethod. Electric Machines and Control, 8(1):35–38, 2004.

[4] Zhiyong Yang, Jiang Wu, and Jiangping Mei. Motor-mechanism dynamic modelbased neural network optimized computed torque control of a high speed parallelmanipulator. Mechatronics, 17(7):381–390, 2007.

[5] Clemens Amon, Ferdinand Fuhrmann, and Franz Graf. Evaluation of the spatialresolution accuracy of the face tracking system for kinect for windows v1 and v2.In Proceedings of the 6th Congress of the Alps Adria Acoustics Association, 2014.

[6] Francois Petitjean, Germain Forestier, Geoffrey I Webb, Ann E Nicholson, Yan-ping Chen, and Eamonn Keogh. Dynamic time warping averaging of time seriesallows faster and more accurate classification. In Data Mining (ICDM), 2014 IEEEInternational Conference on, pages 470–479. IEEE, 2014.

[7] Hitesh Reddivari, Chenguang Yang, Zhaojie ju, Peidong Liang, Zhijun Li, and BinXu. Teleoperation control of baxter robot using body motion tracking. In Mul-tisensor Fusion and Information Integration for Intelligent Systems (MFI), 2014International Conference on, pages 1–6, 2014.

[8] Chunxu Li, Chenguang Yang, Peidong Liang, Angelo Cangelosi, and Jian Wan. De-velopment of kinect based teleoperation of nao robot. In 2016 International Con-ference on Advanced Robotics and Mechatronics (ICARM), pages 133–138. IEEE,2016.

[9] Chenguang Yang, Chao Zeng, Peidong Liang, Zhijun Li, Ruifeng Li, and Chun-Yi Su. Interface design of a physical human–robot interaction system for humanimpedance adaptive skill transfer. IEEE Transactions on Automation Science andEngineering, 15(1):329–340, 2018.

[10] Peidong Liang, Lianzheng Ge, Yihuan Liu, Lijun Zhao, Ruifeng Li, and Ke Wang.An augmented discrete-time approach for human-robot collaboration. Discrete Dy-namics in Nature and Society, 2016, 2016.

90

Page 103: Learning Algorithm Design for Human-Robot Skill Transfer

[11] Chenguang Yang, Chao Zeng, Yang Cong, Ning Wang, and Min Wang. A learningframework of adaptive manipulative skills from human to robot. IEEE Transactionson Industrial Informatics, 2018.

[12] Jorge Solis, Simone Marcheschi, Antonio Frisoli, Carlo Alberto Avizzano, andMassimo Bergamasco. Reactive robot system using a haptic interface: an activeinteraction to transfer skills from the robot to unskilled persons. Advanced robotics,21(3-4):267–291, 2007.

[13] Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning andgeneralization of motor skills by learning from demonstration. In Robotics andAutomation, 2009. ICRA’09. IEEE International Conference, pages 763–768, 2009.

[14] Hanxuan Yang, Ling Shao, Feng Zheng, Liang Wang, and Zhan Song. Recentadvances and trends in visual tracking: A review. Neurocomputing, 74(18):3823–3831, 2011.

[15] Florian Faion, Simon Friedberger, Antonio Zea, and Uwe D Hanebeck. Intelli-gent sensor-scheduling for multi-kinect-tracking. In Intelligent Robots and System-s (IROS), 2012 IEEE/RSJ International Conference on, pages 3993–3999. IEEE,2012.

[16] Michael C Nechyba and Yangsheng Xu. Human skill transfer: neural networks aslearners and teachers. In Proceedings 1995 IEEE/RSJ International Conference onIntelligent Robots and Systems. Human Robot Interaction and Cooperative Robots,volume 3, pages 314–319. IEEE, 1995.

[17] Chunxu Li, Chenguang Yang, Jian Wan, Andy SK Annamalai, and Angelo Can-gelosi. Teleoperation control of baxter robot using kalman filter-based sensor fu-sion. Systems Science & Control Engineering, 5(1):156–167, 2017.

[18] Jagdish L Raheja, Ankit Chaudhary, and Kunal Singal. Tracking of fingertips andcenters of palm using kinect. In Computational intelligence, modelling and sim-ulation (CIMSiM), 2011 third international conference on, pages 248–252. IEEE,2011.

[19] Tao Gordon, Philippe Archambault, and Mindy Levin. Evaluation of kinect skeletaltracking in a virtual reality rehabilitation system for upper limb hemiparesis. InVirtual Rehabilitation (ICVR), 2013 International Conference on, pages 164–165.IEEE, 2013.

[20] Cha Zhang and Zhengyou Zhang. Calibration between depth and color sensors forcommodity depth cameras. In Computer Vision and Machine Learning with RGB-DSensors, pages 47–64. Springer, 2014.

91

Page 104: Learning Algorithm Design for Human-Robot Skill Transfer

[21] Lindasalwa Muda, Mumtaj Begam, and Irraivan Elamvazuthi. Voice recognition al-gorithms using mel frequency cepstral coefficient (mfcc) and dynamic time warping(dtw) techniques. arXiv preprint arXiv:1003.4083, 2010.

[22] John Bares, Martial Hebert, Takeo Kanade, Eric Krotkov, Tom Mitchell, Reid Sim-mons, and William Whittaker. Ambler: An autonomous rover for planetary explo-ration. Computer, 22(6):18–26, 1989.

[23] Anirban Mazumdar, Martin Lozano, Aaron Fittery, and H Harry Asada. A compact,maneuverable, underwater robot for direct inspection of nuclear power piping sys-tems. In 2012 IEEE International Conference on Robotics and Automation, pages2818–2823. IEEE, 2012.

[24] WC Flannigan, Gabriel M Nelson, and Roger D Quinn. Locomotion controllerfor a crab-like robot. In Robotics and Automation, 1998. Proceedings. 1998 IEEEInternational Conference on, volume 1, pages 152–156. IEEE, 1998.

[25] John E Bares and David S Wettergreen. Dante ii: Technical description, results, andlessons learned. The International Journal of Robotics Research, 18(7):621–649,1999.

[26] Zhongcheng Wu, Fei Shen, Dezhang Xu, and Huaguo Zhou. Networked transducerinterface module design and its application in bionic robotic sensing system. InIntelligent Control and Automation, 2004. WCICA 2004. Fifth World Congress on,volume 6, pages 4759–4762. IEEE, 2004.

[27] Davis Meike and Leonids Ribickis. Energy efficient use of robotics in the automo-bile industry. In Advanced Robotics (ICAR), 2011 15th International Conferenceon, pages 507–511. IEEE, 2011.

[28] Scott A Green, Mark Billinghurst, XiaoQi Chen, and J Geoffrey Chase. Human-robot collaboration: A literature review and augmented reality approach in design.International journal of advanced robotic systems, 5(1):1, 2008.

[29] Torgny Brogårdh. Present and future robot control developmentan industrial per-spective. Annual Reviews in Control, 31(1):69–79, 2007.

[30] Chunxu Li, Chenguang Yang, and Cinzia Giannetti. Segmentation and generaliza-tion for writing skills transfer from humans to robots. Cognitive Computation andSystems, 2019.

[31] Chunxu Li, Chenguang Yang, Zhaojie Ju, and Andy SK Annamalai. An enhancedteaching interface for a robot using DMP and GMR. International Journal of Intel-ligent Robotics and Applications, pages 1–12, 2018.

92

Page 105: Learning Algorithm Design for Human-Robot Skill Transfer

[32] Chunxu Li, Chenguang Yang, Andy Annamalai, Qingsong Xu, and Shaoxiang Li.Development of writing task recombination technology based on dmp segmentationvia verbal command for baxter robot. Systems Science & Control Engineering,6(1):350–359, 2018.

[33] Chunxu Li, Chenguang Yang, Jian Wan, Andy Annamalai, and Angelo Cangelosi.Neural learning and kalman filtering enhanced teaching by demonstration for a bax-ter robot. In 2017 23rd International Conference on Automation and Computing(ICAC), pages 1–6. IEEE, 2017.

[34] Songde Ma. Integration technology for advanced manufacturing: R&d in china 863program. In IEEE International Conference on Integration Technology, 2007.

[35] Richard Volpe and Pradeep Khosla. A theoretical and experimental investigation ofexplicit force control strategies for manipulators. Automatic Control IEEE Trans-actions on, 38(11):1634–1650, 1993.

[36] Loulin Huang, Shuzhi Sam Ge, and Tongheng Lee. Fuzzy unidirectional forcecontrol of constrained robotic manipulators. Fuzzy Sets & Systems, 134(1):135–146, 2003.

[37] Haifa Mehdi. Stiffness and impedance control using lyapunov theory for robot-aided rehabilitation. International Journal of Social Robotics, 4(1):107–119, 2012.

[38] Anders Blomdell, Isolde Dressler, Klas Nilsson, and Anders Robertsson. Flexibleapplication development and high-performance motion control based on externalsensing and reconfiguration of abb industrial robot controllers. In Proc. ICRA 2010Workshop on Innovative Robot Control Architectures for Demanding (Research)Applications, pages 62–66. Citeseer, 2010.

[39] Auke Jan Ijspeert. Central pattern generators for locomotion control in animals androbots: a review. Neural Networks, 21(4):642–653, 2008.

[40] Christine Connolly. A new integrated robot vision system from fanuc robotics.Industrial Robot: An International Journal, 34(2):103–106, 2007.

[41] Thanh Vo et al. Autonomous packaging robot. 2010.

[42] Lianzheng Ge, Jian Chen, and Ruifeng Li. Feedforward control based on fourierseries trajectory fitting method for industrial robot. In Control And Decision Con-ference (CCDC), 2017 29th Chinese, pages 4890–4894. IEEE, 2017.

[43] Michael A Goodrich, Alan C Schultz, et al. Human–robot interaction: a survey.Foundations and Trends® in Human–Computer Interaction, 1(3):203–275, 2008.

93

Page 106: Learning Algorithm Design for Human-Robot Skill Transfer

[44] Christoph Bartneck and Jodi Forlizzi. Shaping human-robot interaction: under-standing the social aspects of intelligent robotic products. In CHI’04 Extended Ab-stracts on Human Factors in Computing Systems, pages 1731–1732. ACM, 2004.

[45] Neville Hogan. Controlling impedance at the man/machine interface. In Robotic-s and Automation, 1989. Proceedings., 1989 IEEE International Conference on,pages 1626–1631. IEEE, 1989.

[46] Russell H Taylor. A perspective on medical robotics. Proceedings of the IEEE,94(9):1652–1664, 2006.

[47] Allison M Okamura. Methods for haptic feedback in teleoperated robot-assistedsurgery. Industrial Robot: An International Journal, 31(6):499–508, 2004.

[48] Yoshihiro Nakabo and Masatoshi Ishikawa. Visual impedance using 1 ms visu-al feedback system. In Robotics and Automation, 1998. Proceedings. 1998 IEEEInternational Conference on, volume 3, pages 2333–2338. IEEE, 1998.

[49] Timo Oksanen, Jari Kostamo, Petro Tamminen, and Johannes Tiusanen. Robotcompetition as a teaching and learning platform. In Proc. the 18th IFAC WorldCongress, Milano, August, pages 5176–5181, 2011.

[50] Takayuki Yazawa, Masago Shiba, Hiroto Nakajima, and Masashi Fujiwara. Indus-trial robot, July 12 2016. US Patent 9,387,584.

[51] UG Student. Robot truck loading by unique matrix routine. Development, 2(5),2015.

[52] Rainer Bischoff, Johannes Kurth, Gunter Schreiber, Ralf Koeppe, Alin Albu-Schaffer, Alexander Beyer, Oliver Eiberger, Sami Haddadin, Andreas Stemmer,Gerhard Grunwald, et al. The kuka-dlr lightweight robot arm-a new reference plat-form for robotics research and manufacturing. In Robotics (ISR), 2010 41st inter-national symposium on and 2010 6th German conference on robotics (ROBOTIK),pages 1–8. VDE, 2010.

[53] Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine. Deep reinforce-ment learning for robotic manipulation with asynchronous off-policy updates. InIEEE International Conference on Robotics & Automation, 2017.

[54] Mark Pfeiffer, Michael Schaeuble, Juan Nieto, Roland Siegwart, and Cesar Cadena.From perception to decision: A data-driven approach to end-to-end motion planningfor autonomous ground robots. 2016.

[55] Tai Lei and Liu Ming. Deep-learning in mobile robotics - from perception to controlsystems: A survey on why and why not. 2016.

94

Page 107: Learning Algorithm Design for Human-Robot Skill Transfer

[56] Juergen Schmidhuber. Deep learning in neural networks: an overview. NeuralNetw, 61:85–117, 2015.

[57] Vincent Franois-Lavet, Raphael Fonteneau, and Damien Ernst. How to discountdeep reinforcement learning: Towards new dynamic strategies. Computer Science,2015.

[58] Juan C. Caicedo and Svetlana Lazebnik. Active object localization with deep rein-forcement learning. In IEEE International Conference on Computer Vision, 2015.

[59] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-endtraining of deep visuomotor policies. Journal of Machine Learning Research,17(1):1334–1373, 2015.

[60] Alessandro Giusti, Jerome Guzzi, Ciresan Dan, Fang Lin He, Juan Pablo Rodriguez,Flavio Fontana, Matthias Faessler, Christian Forster, Jurgen Schmidhuber, and Gi-anni Di Caro. A machine learning approach to visual perception of forest trails formobile robots. IEEE Robotics & Automation Letters, 1(2):661–667, 2017.

[61] Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with large-scale data collection. InternationalJournal of Robotics Research, (10), 2016.

[62] Ke Li, Yue Lei Zhang, and Zhi Xiong Li. Application research of kalman filter andSVM applied to condition monitoring and fault diagnosis. In Applied Mechanicsand Materials, volume 121, pages 268–272. Trans Tech Publ, 2012.

[63] Sridhar Seshagiri and Hassan K Khalil. Output feedback control of nonlinear sys-tems using rbf neural networks. IEEE Transactions on Neural Networks, 11(1):69–79, 2000.

[64] Carlo Cecati, Janusz Kolbusz, Paweł Rozycki, Pierluigi Siano, and Bogdan M Wil-amowski. A novel rbf training algorithm for short-term electric load forecasting andcomparative studies. IEEE Transactions on industrial Electronics, 62(10):6519–6529, 2015.

[65] Marko Robnik-Sikonja. Data generators for learning systems based on rbf networks.IEEE transactions on neural networks and learning systems, 27(5):926–938, 2016.

[66] Rui Yang, Poi Voon Er, Zidong Wang, and Kok Kiong Tan. An rbf neural networkapproach towards precision motion system with selective sensor fusion. Neurocom-puting, 199:31–39, 2016.

[67] Yukiko Hoshino, Tsuyoshi Takagi, U Di Profio, and Masahiro Fujita. Behaviordescription and control using behavior module for personal robot. In IEEE Interna-tional Conference on Robotics & Automation, 2004.

95

Page 108: Learning Algorithm Design for Human-Robot Skill Transfer

[68] Reza N Jazar. Theory of applied robotics: kinematics, dynamics, and control.Springer Science & Business Media, 2010.

[69] Philippe Poignet and Maxime Gautier. Nonlinear model predictive control of a robotmanipulator. In Advanced Motion Control, 2000. Proceedings. 6th InternationalWorkshop on, pages 401–406. IEEE, 2000.

[70] Yong Feng, Xinghuo Yu, and Zhihong Man. Non-singular terminal sliding modecontrol of rigid manipulators. Automatica, 38(12):2159–2167, 2002.

[71] Mark Uebel, Ioannis Minis, and Kevin Cleary. Improved computed torque controlfor industrial robots. In Robotics and Automation, 1992. Proceedings., 1992 IEEEInternational Conference on, pages 528–533. IEEE, 1992.

[72] J Adongo Ochier, Clementina D Mladenova, and Peter C Muller. An approach toautomatic generation of dynamic equations of elastic joint manipulators in symboliclanguage. Journal of Intelligent and Robotic Systems, 14(2):199–218, 1995.

[73] Antal K Bejczy. Robot arm dynamics and control. Jet Propulsion LaboratoryTechnical Memo, pages 33–669, 1974.

[74] Antal K Bejczy and Richard Paul. Simplified robot arm dynamics for control. InDecision and Control including the Symposium on Adaptive Processes, 1981 20thIEEE Conference on, pages 261–262. IEEE, 1981.

[75] SaFid M Megahed. Principles of robot modelling and simulation. John Wiley &Sons, Inc., 1993.

[76] George Lee, Bowen Lee, and Ricahrd Nigam. Development of the generalizedd’alembert equations of motion for mechanical manipulators. In Decision and Con-trol, 1983. The 22nd IEEE Conference on, pages 1205–1210. IEEE, 1983.

[77] King Sun Fu, Rafael Gonzalez, and Lee George. Robotics control, sensing, vision,and intelligence. 1987.

[78] Bruno Siciliano, Lorenzo Sciavicco, Luigi Villani, and Giuseppe Oriolo. Robotics:modelling, planning and control. Springer Science & Business Media, 2009.

[79] Philip Mckerrow. Introduction to robotics. Addison-Wesley Longman PublishingCo., Inc., 1991.

[80] Yongli Huang and Seiji Yasunobu. A general practical design method for fuzzy pidcontrol from conventional pid control. In Ninth IEEE International Conference onFuzzy Systems. FUZZ-IEEE 2000 (Cat. No. 00CH37063), volume 2, pages 969–972. IEEE, 2000.

96

Page 109: Learning Algorithm Design for Human-Robot Skill Transfer

[81] Puren R Ouyang, Wen-Jun Zhang, and Fang-Xiang Wu. Nonlinear pd control fortrajectory tracking with consideration of the design for control methodology. InRobotics and Automation, 2002. Proceedings. ICRA’02. IEEE International Con-ference on, volume 4, pages 4126–4131. IEEE, 2002.

[82] Guilin Yang, I-Ming Chen, Weihai Chen, and Wei Lin. Kinematic design of a six-dof parallel-kinematics machine with decoupled-motion architecture. IEEE trans-actions on robotics, 20(5):876–887, 2004.

[83] Fathi Ghorbel and Ruvinda Gunawardana. A validation study of pd control of aclosed-chain mechanical system. In Decision and Control, 1997., Proceedings ofthe 36th IEEE Conference on, volume 2, pages 1998–2004. IEEE, 1997.

[84] Zhen Gao, Dan Zhang, and Yunjian Ge. Design optimization of a spatial sixdegree-of-freedom parallel manipulator based on artificial intelligence approaches.Robotics and Computer-Integrated Manufacturing, 26(2):180–189, 2010.

[85] Yunxing Su, Binye Duan, and Chunhong Zheng. Nonlinear pid control of asix-dof parallel manipulator. IEE Proceedings-Control Theory and Applications,151(1):95–102, 2004.

[86] Warren E Dixon, Darren M Dawson, Erkan Zergeroglu, and Aman Behal. Adaptivetracking control of a wheeled mobile robot via an uncalibrated camera system. IEEETransactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 31(3):341–352, 2001.

[87] Se-Han Lee, Jae-Bok Song, Woo-Chun Choi, and Daehie Hong. Position controlof a stewart platform using inverse dynamics control with approximate dynamics.Mechatronics, 13(6):605–619, 2003.

[88] Guangfeng Liu, Yulie Wu, Xuezhong Wu, Yingying Kuen, and Zhixing Li. Analysisand control of redundant parallel manipulators. In Robotics and Automation, 2001.Proceedings 2001 ICRA. IEEE International Conference on, volume 4, pages 3748–3754. IEEE, 2001.

[89] Hui Cheng, Yiu-Kuen Yiu, and Zexiang Li. Dynamics and control of redun-dantly actuated parallel manipulators. IEEE/ASME Transactions on mechatronics,8(4):483–491, 2003.

[90] Guanfeng Liu and Zexiang Li. A unified geometric approach to modeling and con-trol of constrained mechanical systems. IEEE Transactions on robotics and au-tomation, 18(4):574–587, 2002.

[91] Farhad Aghili. A unified approach for inverse and direct dynamics of constrainedmultibody systems based on linear projection operator: applications to control andsimulation. IEEE Transactions on Robotics, 21(5):834–849, 2005.

97

Page 110: Learning Algorithm Design for Human-Robot Skill Transfer

[92] Hag Seong Kim, Young Man Cho, and Kyo-II Lee. Robust nonlinear task spacecontrol for 6 dof parallel manipulator. Automatica, 41(9):1591–1600, 2005.

[93] David Gouaillier, Vincent Hugel, Pierre Blazevic, Chris Kilner, Jer Ome Monceaux,Pascal Lafourcade, Brice Marnier, Julien Serre, and Bruno Maisonnier. Mechatron-ic design of nao humanoid. In Robotics and Automation, 2009. ICRA’09. IEEEInternational Conference on, pages 769–774. IEEE, 2009.

[94] David Gouaillier, Vincent Hugel, Pierre Blazevic, Chris Kilner, Jerome Monceaux,Pascal Lafourcade, Brice Marnier, Julien Serre, and Bruno Maisonnier. The naohumanoid: a combination of performance and affordability. CoRR abs/0807.3223,2008.

[95] Jason Kulk and JS Welsh. A low power walk for the nao robot. preprint, availableat, 2008.

[96] Chenguang Yang, Huaiwei Wu, Zhijun Li, Wei He, Ning Wang, and Chun-Yi Su.Mind control of a robotic arm with visual fusion technology. IEEE Transactions onIndustrial Informatics, 2017.

[97] Andrew D Wilson, Jarvis A Schultz, Alex R Ansari, and Todd D Murphey. Dynamictask execution using active parameter identification with the baxter research robot.IEEE Transactions on Automation Science and Engineering, 14(1):391–397, 2017.

[98] Cliff Fitzgerald. Developing baxter. In 2013 IEEE Conference on Technologies forPractical Robot Applications (TePRA), pages 1–6. IEEE, 2013.

[99] Steve Cousins. Ros on the pr2 [ros topics]. IEEE Robotics& Automation Magazine,17(3):23–25, 2010.

[100] Gunter Schreiber, Andreas Stemmer, and Rainer Bischoff. The fast research inter-face for the kuka lightweight robot. In IEEE Workshop on Innovative Robot ControlArchitectures for Demanding (Research) Applications How to Modify and EnhanceCommercial Controllers (ICRA 2010), pages 15–21. Citeseer, 2010.

[101] Kourosh Khoshelham and Sander Oude Elberink. Accuracy and resolution of kinectdepth data for indoor mapping applications. Sensors, 12(2):1437–1454, 2012.

[102] Jarrett Webb and James Ashley. Beginning Kinect Programming with the MicrosoftKinect SDK. Apress, 2012.

[103] James Altman. Taming the dragon effective use of dragon naturallyspeaking speechrecognition software as an avenue to universal access. Writing & Pedagogy,5(2):333–348, 2014.

98

Page 111: Learning Algorithm Design for Human-Robot Skill Transfer

[104] Stefan Schaal, Jan Peters, Jun Nakanishi, and Auke Ijspeert. Learning movementprimitives. In Robotics research, the eleventh international symposium, Springer,pages 561–572, 2005.

[105] Stefan Schaal. Dynamic movement primitives-a framework for motor control inhumans and humanoid robotics. In Adaptive motion of animals and machines, pages261–280. Springer, 2006.

[106] Affan Pervez and Dongheui Lee. Learning task-parameterized dynamic movementprimitives using mixture of gmms. Intelligent Service Robotics, 11(1):61–78, 2018.

[107] Hongyu Ren, Da Xu, and Wenxin Li. Modeling the uncertainty in finger-vein au-thentication by the gaussian mixture model. In Proceedings of the 2017 4th Inter-national Conference on Biomedical and Bioinformatics Engineering, pages 39–42.ACM, 2017.

[108] Narjes Davari, Asghar Gholami, and Mohammad Shabani. Multirate adaptiveKalman filter for marine integrated navigation system. The Journal of Navigation,pages 1–20, 2016.

[109] Simo Sarkka and Arno Solin. On continuous-discrete cubature Kalman filtering.IFAC Proceedings Volumes, 45(16):1221–1226, 2012.

[110] Chenguang Yang, Junshen Chen, and Fei Chen. Neural learning enhanced teleoper-ation control of baxter robot using IMU based motion capture. In Automation andComputing (ICAC), 2016 22nd IEEE International Conference on, pages 389–394,2016.

[111] Wenyi Zhao, Rama Chellappa, P Jonathon Phillips, and Azriel Rosenfeld. Facerecognition: A literature survey. ACM computing surveys (CSUR), 35(4):399–458,2003.

[112] Frank L Lewis, Aydin Yesildirek, and Kai Liu. Multilayer neural-net robot con-troller with guaranteed tracking performance. Neural Networks, IEEE Transactionson, 7(2):388–399, 1996.

[113] Masanobu Yamamoto and Kazutada Koshikawa. Human motion analysis based ona robot arm model. In Computer Vision and Pattern Recognition, 1991. ProceedingsCVPR’91., IEEE Computer Society Conference on, pages 664–665. IEEE, 1991.

[114] Alina Ninett Panfir, Razvan Gabriel Boboc, and Gheorghe Leonte Mogan. Naorobots collaboration for object manipulation. In Applied Mechanics and Materials,volume 332, pages 218–223. Trans Tech Publications Ltd, 2013.

99

Page 112: Learning Algorithm Design for Human-Robot Skill Transfer

[115] Jason Kulk and James S Welsh. Evaluation of walk optimisation techniques for thenao robot. In Humanoid Robots (Humanoids), 2011 11th IEEE-RAS InternationalConference on, pages 306–311. IEEE, 2011.

[116] Athanasia Louloudi, Ahmed Mosallam, Naresh Marturi, Pieter Janse, and VictorHernandez. Integration of the humanoid robot nao inside a smart home: A casestudy. In Proceedings of the Swedish AI Society Workshop (SAIS). Linkoping Elec-tronic Conference Proceedings, volume 48, pages 35–44, 2010.

[117] Elena Garcia, Maria Antonia Jimenez, Pablo Gonzalez De Santos, and Manuel Ar-mada. The evolution of robotics research. IEEE Robotics & Automation Magazine,14(1):90–103, 2007.

[118] Kerstin Dautenhahn. Socially intelligent robots: dimensions of human–robot in-teraction. Philosophical Transactions of the Royal Society B: Biological Sciences,362(1480):679–704, 2007.

[119] Chenguang Yang, Sai Chang, Peidong Liang, Zhijun Li, and Chun-Yi Su. Teleoper-ated robot writing using EMG signals. In Information and Automation, 2015 IEEEInternational Conference on, pages 2264–2269, 2015.

[120] Chenguang Yang, Junshen Chen, Zhijun Li, Wei He, and Chun-Yi Su. Develop-ment of a physiological signals enhanced teleoperation strategy. In Information andAutomation, 2015 IEEE International Conference on, pages 13–19, 2015.

[121] Chen Li, Hongbin Ma, Chenguang Yang, and Menyin Fu. Teleoperation of a virtualicub robot under framework of parallel system via hand gesture recognition. In 2014IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1469–1474,2014.

[122] YuKang Liu, YuMing Zhang, Bo Fu, and Ruigang Yang. Predictive control forrobot arm teleoperation. In Industrial Electronics Society, IECON 2013-39th Annu-al Conference of the IEEE, pages 3693–3698, 2013.

[123] Inseong Jo, Younkyu Park, and Joonbum Bae. A teleoperation system with anexoskeleton interface. In 2013 IEEE/ASME International Conference on AdvancedIntelligent Mechatronics, pages 1649–1654, 2013.

[124] Zhangfeng Ju, Chenguang Yang, Zhijun Li, Long Cheng, and Hongbin Ma. Tele-operation of humanoid baxter robot using haptic feedback. In Multisensor Fusionand Information Integration for Intelligent Systems (MFI), 2014 International Con-ference on, pages 1–6. IEEE, 2014.

[125] Fuwen Yang, Zidong Wang, and YS Hung. Robust kalman filtering for discretetime-varying uncertain systems with multiplicative noises. IEEE Transactions onAutomatic Control, 47(7):1179–1183, 2002.

100

Page 113: Learning Algorithm Design for Human-Robot Skill Transfer

[126] Chenguang Yang, Shuzhi Sam Ge, Cheng Xiang, Tianyou Chai, and Tong HengLee. Output feedback NN control for two classes of discrete-time systems withunknown control directions in a unified approach. IEEE Transactions on NeuralNetworks, 19(11):1873–1886, 2008.

[127] Yan-Jun Liu, CL Philip Chen, Guo-Xing Wen, and Shaocheng Tong. Adaptive neu-ral output feedback tracking control for a class of uncertain discrete-time nonlinearsystems. IEEE Transactions on Neural Networks, 22(7):1162–1167, 2011.

[128] Weisheng Chen and Licheng Jiao. Adaptive tracking for periodically time-varyingand nonlinearly parameterized systems using multilayer neural networks. IEEETransactions on Neural Networks, 21(2):345–351, 2010.

[129] Peter Henry, Michael Krainin, Evan Herbst, Xiaofeng Ren, and Dieter Fox. RGB-Dmapping: Using kinect-style depth cameras for dense 3d modeling of indoor envi-ronments. The International Journal of Robotics Research, 31(5):647–663, 2012.

[130] Adi Sucipto, Agung Harsoyo, and Pranoto Hidaya Rusmin. Implementation of ges-ture recognition on aquarium application. In System Engineering and Technology(ICSET), 2012 IEEE International Conference on, pages 1–4, 2012.

[131] Maged N Kamel Boulos, Bryan J Blanchard, Cory Walker, Julio Montero, AalapTripathy, and Ricardo Gutierrez-Osuna. Web GIS in practice x: a microsoft kinectnatural user interface for google earth navigation. International journal of healthgeographics, 10(1):1, 2011.

[132] Norman Villaroman, Dale Rowe, and Bret Swan. Teaching natural user interactionusing OpenNI and the Microsoft Kinect sensor. In Proceedings of the 2011 con-ference on Information technology education, pages 227–232. The Journal of theACM, 2011.

[133] Jordi Bolıbar. Kinect audio-runner: Audio feedback for improving performance inlong-distance running. Master of Science Thesis, KTH Royal Institute of Technolo-gy, Stockholm, Sweden, 2012.

[134] Marco Ronchetti and Mattia Avancini. Using kinect to emulate an interactive white-board. MS in Computer Science, University of Trento, 2011.

[135] Jared St Jean. Kinect Hacks: Tips & Tools for Motion and Pattern Detection.“O’Reilly Media, Inc.”, 2012.

[136] Enrique Ramos. Processing. In Arduino and Kinect Projects, pages 35–60. Springer,2012.

[137] Ryan Calo. Open robotics. Maryland Law Review, 70(3), 2011.

101

Page 114: Learning Algorithm Design for Human-Robot Skill Transfer

[138] Craig Partridge and Stephen Pink. A faster udp (user datagram protocol).IEEE/ACM Transactions on Networking, 1(4):429–440, 1993.

[139] Yuguang Fang, Kenneth A Loparo, and Xiangbo Feng. Inequalities for the traceof matrix product. IEEE Transactions on Automatic Control, 39(12):2489–2490,1994.

[140] Cheng S Chin and Keng M Yue. Application of an intelligent table-top vacuumrobot cleaner in mechatronics system design education. Journal of Robotics andMechatronics, 23(5):645, 2011.

[141] Sonia Chernova and Manuela Veloso. Teaching multi-robot coordination usingdemonstration of communication and state sharing. In Proceedings of the 7th in-ternational joint conference on Autonomous agents and multiagent systems-Volume3, pages 1183–1186. International Foundation for Autonomous Agents and Multia-gent Systems, 2008.

[142] Sylvain Calinon, Danilo Bruno, Milad S Malekzadeh, Thrishantha Nanayakkara,and Darwin G Caldwell. Human–robot skills transfer interfaces for a flexible surgi-cal robot. Computer methods and programs in biomedicine, 116(2):81–96, 2014.

[143] Cheng Chin and Keng Yue. Vertical stream curricula integration of problem-basedlearning using an autonomous vacuum robot in a mechatronics course. EuropeanJournal of Engineering Education, 36(5):485–504, 2011.

[144] Cheng Siong Chin, MW Shing Lau, Eicher Low, and GG Lee Seet. A robust con-troller design method and stability analysis of an underactuated underwater vehicle.International Journal of Applied Mathematics and Computer Science, 16:345–356,2006.

[145] Yonghong Huang, Kevin B Englehart, Bernard Hudgins, and Adrian DC Chan.A gaussian mixture model based classification scheme for myoelectric control ofpowered upper limb prostheses. IEEE Transactions on Biomedical Engineering,52(11):1801–1811, 2005.

[146] Ming-Hsuan Yang and Narendra Ahuja. Gaussian mixture model for human skincolor and its applications in image and video databases. In Storage and Retrievalfor Image and Video Databases VII, volume 3656, pages 458–467. InternationalSociety for Optics and Photonics, 1998.

[147] Petar Kormushev, Sylvain Calinon, and Darwin G Caldwell. Imitation learning ofpositional and force skills demonstrated via kinesthetic teaching and haptic input.Advanced Robotics, 25(5):581–603, 2011.

102

Page 115: Learning Algorithm Design for Human-Robot Skill Transfer

[148] Freek Stulp, Gennaro Raiola, Antoine Hoarau, Serena Ivaldi, and Olivier Sigaud.Learning compact parameterized skills with a single regression. In HumanoidRobots (Humanoids), 2013 13th IEEE-RAS International Conference on, pages417–422. IEEE, 2013.

[149] Micha Hersch, Florent Guenter, Sylvain Calinon, and Aude G Billard. Learningdynamical system modulation for constrained reaching tasks. In Humanoid Robots,2006 6th IEEE-RAS International Conference on, pages 444–449. IEEE, 2006.

[150] Abdullah Mueen and Eamonn Keogh. Extracting optimal performance from dy-namic time warping. In Proceedings of the 22nd ACM SIGKDD International Con-ference on Knowledge Discovery and Data Mining, pages 2129–2130. ACM, 2016.

[151] Neville Hogan and Dagmar Sternad. Dynamic primitives of motor behavior. Bio-logical cybernetics, pages 1–13, 2012.

[152] Douglas A Reynolds, Thomas F Quatieri, and Robert B Dunn. Speaker verificationusing adapted gaussian mixture models. Digital signal processing, 10(1-3):19–41,2000.

[153] Jianjun Yu, Yusen Men, Xiaogang Ruan, and Shaoqiong Zhao. The research andimplementation of behavior imitation system about nao robot based on kinect. CAAITransactions on Intelligent Systems, 2:006, 2016.

[154] Kenneth P Burnham and David R Anderson. Multimodel inference: understandingaic and bic in model selection. Sociological methods & research, 33(2):261–304,2004.

[155] Lingyun Hu, Changjiu Zhou, and Zengqi Sun. Estimating biped gait using spline-based probability distribution function with q-learning. IEEE Transactions on In-dustrial Electronics, 55(3):1444–1452, 2008.

[156] Satoshi Kurosawa, Hidehisa Nakayama, Nei Kato, Abbas Jamalipour, and YoshiakiNemoto. Detecting blackhole attack on aodv-based mobile ad hoc networks bydynamic learning method. IJ Network Security, 5(3):338–346, 2007.

[157] Simon Thompson, Takehiro Horiuchi, and Satoshi Kagami. A probabilistic modelof human motion and navigation intent for mobile robot path planning. In 20094th International Conference on Autonomous Robots and Agents, pages 663–668.IEEE, 2009.

[158] Rui Huang, Chenguang Yang, Fan Yang, and Zhijun Li. Robot hand learning frommultiple demonstrations using dynamics motor primitives. In Advanced Mechatron-ic Systems (ICAMechS), 2017 IEEE International Conference, pages 1–6, 2017.

103

Page 116: Learning Algorithm Design for Human-Robot Skill Transfer

[159] Andrej Gams, Auke J Ijspeert, Stefan Schaal, and Jadran Lenarcic. On-line learn-ing and modulation of periodic movements with nonlinear dynamical systems. Au-tonomous robots, 27(1):3–23, 2009.

[160] Manuel Muhlig, Michael Gienger, Sven Hellbach, Jochen J Steil, and Christian Go-erick. Task-level imitation learning using variance-based movement optimization.In Robotics and Automation, 2009. ICRA’09. IEEE International Conference, pages1177–1184, 2009.

[161] Tim Van Erven and Peter Harremos. Renyi divergence and kullback-leibler diver-gence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.

[162] Wei Wang, Baoju Zhang, Dan Wang, Yu Jiang, Shan Qin, and Lei Xue. Anomalydetection based on probability density function with kullback–leibler divergence.Signal Processing, 126:12–17, 2016.

[163] Michael S Lew, Nicu Sebe, Chabane Djeraba, and Ramesh Jain. Content-basedmultimedia information retrieval: State of the art and challenges. ACM Transactionson Multimedia Computing, Communications, and Applications (TOMM), 2(1):1–19, 2006.

[164] Shunyi Yao, Ying Wen, and Yue Lu. Hog based two-directional dynamic timewarping for handwritten word spotting. In Document Analysis and Recognition(ICDAR), 2015 13th IEEE International Conference, pages 161–165, 2015.

[165] Hector De la Torre Gutierrez and DT Pham. Estimation and generation of trainingpatterns for control chart pattern recognition. Computers & Industrial Engineering,95:72–82, 2016.

[166] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. Empir-ical evaluation of gated recurrent neural networks on sequence modeling. In NIPS2014 Workshop on Deep Learning, December 2014, 2014.

[167] Yaodong Zhang, Kiarash Adl, and James Glass. Fast spoken query detection usinglower-bound dynamic time warping on graphical processing units. In Acoustic-s, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference,pages 5173–5176, 2012.

[168] Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal.Dynamical movement primitives: learning attractor models for motor behaviors.Neural computation, 25(2):328–373, 2013.

[169] Mingzhi Li, Zhiqiang Ma, Yong Shan, and Xiaoyan Zhang. Adaptive backgroundupdate based on gaussian mixture model under complex condition. Jisuanji Yingy-ong/ Journal of Computer Applications, 31(7):1831–1834, 2011.

104

Page 117: Learning Algorithm Design for Human-Robot Skill Transfer

[170] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. InAdvances in Neural Information Processing Systems, pages 4565–4573, 2016.

[171] Sylvain Calinon and Aude Billard. Statistical learning by imitation of competingconstraints in joint space and task space. Advanced Robotics, 23(15):2059–2076,2009.

105