Dr Changjiu Zhou School of Electrical & Electronic Engineering Singapore Polytechnic
Development of Humanoid Soccer Robots
Robo-Erectus
www.robo-erectus.org
Learning and Control of Biped Locomotion
Dr Changjiu Zhou
School of Electrical & Electronic Engineering, Singapore Polytechnic
[email protected]
Outline
• Introduction
• Biped Walking Cycles
• How to Control Biped Locomotion
• How to Plan/Learn Biped Gaits
• Biped Learning by Reinforcement
• Some Research Topics
Biped Gait (Frontal Plane)
[Figure: frontal view of the biped gait along a time axis, showing the single-support and double-support phases.]
Biped Gait (Sagittal Plane)
[Figure: gait timing diagram for the right and left legs with time marks t = kTc, kTc + Td, kTc + Tm, (k+1)Tc and (k+1)Tc + Td. Labelled phases: double-support phase, single-support phase, right-leg swing phase, left-leg swing phase, left-leg stance phase.]
Finite State Machine for Biped Walking Control
[Figure: state diagram with four states (Right Support, Left-to-Right Transition, Left Support, Right-to-Left Transition) and the transition events "Swing time completed", "Left foot touches down" and "Right foot touches down".]
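A walking state machine of this kind can be sketched in a few lines. The states and events are the ones named on the slide; which event drives which transition is an assumption here, because the arrows of the diagram are not available in the transcript, and the names `State`, `TRANSITIONS`, `step` and `run` are hypothetical.

```python
from enum import Enum


class State(Enum):
    RIGHT_SUPPORT = "Right Support"
    RIGHT_TO_LEFT = "Right-to-Left Transition"
    LEFT_SUPPORT = "Left Support"
    LEFT_TO_RIGHT = "Left-to-Right Transition"


# Assumed wiring of (state, event) -> next state. The slide names the states
# and events; the pairing below is a plausible guess, not the original diagram.
TRANSITIONS = {
    (State.RIGHT_SUPPORT, "swing_time_completed"): State.RIGHT_TO_LEFT,
    (State.RIGHT_TO_LEFT, "left_foot_touches_down"): State.LEFT_SUPPORT,
    (State.LEFT_SUPPORT, "swing_time_completed"): State.LEFT_TO_RIGHT,
    (State.LEFT_TO_RIGHT, "right_foot_touches_down"): State.RIGHT_SUPPORT,
}


def step(state, event):
    """Return the next state, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.RIGHT_SUPPORT):
    """Feed a sequence of events through the machine and record every state."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


if __name__ == "__main__":
    one_cycle = [
        "swing_time_completed",
        "left_foot_touches_down",
        "swing_time_completed",
        "right_foot_touches_down",
    ]
    for s in run(one_cycle):
        print(s.value)
```

Keeping the transitions in a lookup table makes it easy to swap in the real wiring once the diagram is at hand; events that do not apply in the current state are simply ignored.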
Static Walking
• In static walking, the biped has to move very slowly so that the dynamics can be ignored.
• The biped's projected center of gravity (PCOG) must stay within the supporting area.
[Figure: supporting area in single support and in double support.]
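The static-walking condition amounts to a point-in-polygon test. The sketch below assumes the supporting area is a convex polygon given as counter-clockwise (x, y) vertices on the ground plane; the function name and the foot dimensions in the usage example are hypothetical.

```python
def pcog_inside_support(pcog, polygon):
    """Return True if the 2D point lies inside or on the edge of a convex polygon.

    polygon: list of (x, y) vertices in counter-clockwise order (assumed).
    """
    px, py = pcog
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The cross product is negative when the point is to the right of
        # the edge, i.e. outside a counter-clockwise convex polygon.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross < 0:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical single-support area: one 10 cm x 6 cm foot sole (metres).
    foot = [(0.0, 0.0), (0.10, 0.0), (0.10, 0.06), (0.0, 0.06)]
    print(pcog_inside_support((0.05, 0.03), foot))
    print(pcog_inside_support((0.12, 0.03), foot))
```

In double support the same test would be applied to the convex hull of both feet.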
Dynamic Walking
• In dynamic walking, the motion is fast, so the dynamics cannot be neglected.
• In dynamic walking, we should look at the zero moment point (ZMP) rather than the PCOG.
• The stability margin of dynamic walking is much harder to quantify.
$$\text{Minimize} \int_{t_i}^{t_f} \left\| P_{zmp}^{d}(t) - P_{zmp}(t) \right\|^{2} \, dt$$
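The ZMP tracking objective (the integral of the squared distance between the actual and desired ZMP) can be approximated on sampled data. This is a minimal sketch using the trapezoidal rule; the function name and the sampled-trajectory representation are assumptions.

```python
def zmp_tracking_cost(times, zmp_actual, zmp_desired):
    """Trapezoidal approximation of the integral of |P_zmp - P_zmp^d|^2 dt.

    times:       increasing sample times
    zmp_actual:  list of (x, y) actual ZMP samples, one per time
    zmp_desired: list of (x, y) desired ZMP samples, one per time
    """
    def squared_error(i):
        (xa, ya), (xd, yd) = zmp_actual[i], zmp_desired[i]
        return (xa - xd) ** 2 + (ya - yd) ** 2

    cost = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        cost += 0.5 * dt * (squared_error(i) + squared_error(i + 1))
    return cost


if __name__ == "__main__":
    t = [0.0, 0.1, 0.2, 0.3]
    desired = [(0.00, 0.0), (0.01, 0.0), (0.02, 0.0), (0.03, 0.0)]
    actual = [(0.00, 0.0), (0.012, 0.001), (0.019, -0.001), (0.03, 0.0)]
    print(zmp_tracking_cost(t, actual, desired))
```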
Why is Biped Robotics Hard?
• There is an unpowered DOF between the foot and the ground.
• This constraint limits the trajectory-tracking approaches commonly used in manipulator research.
Biped Control: Model-based
[Figure: block diagram with the blocks "Feet position and ZMP (PCOG)", "Inverse kinematics model", "Desired joint angles" and "Biped Robot".]
Biped Control: Model-based
• Except for certain massless-leg models, most biped models are nonlinear and do not have analytical solutions.
• The massless-leg model is the simplest model. The body of the robot is usually assumed to be a point mass and can be viewed as an inverted pendulum.
• When the leg inertia and other dynamics, such as those of the actuators and joint friction, are included, the overall dynamic equations can be highly nonlinear and complex.
Example: Massless Leg Model
• The simplest biped model
• Some assumptions, e.g., $\ddot{z} = 0$
• From D'Alembert's principle:
$$X_{ZMP} = X_c - \frac{\ddot{x}\, z}{\ddot{z} + g}, \qquad Y_{ZMP} = Y_c - \frac{\ddot{y}\, z}{\ddot{z} + g}$$
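For a point-mass (massless-leg) model, the ZMP follows from the standard relation X_ZMP = X_c - x_acc * z / (z_acc + g), and likewise for Y. The sketch below evaluates it numerically; the function name, argument names and the numbers in the usage example are assumptions for illustration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def zmp_point_mass(x_c, y_c, z, x_acc, y_acc, z_acc=0.0, g=G):
    """ZMP of a point mass at height z above the ground.

    x_c, y_c:     horizontal position of the mass
    x_acc, y_acc: horizontal accelerations of the mass
    z_acc:        vertical acceleration (0 under the constant-height assumption)
    """
    denom = z_acc + g
    x_zmp = x_c - x_acc * z / denom
    y_zmp = y_c - y_acc * z / denom
    return x_zmp, y_zmp


if __name__ == "__main__":
    # Hypothetical values: mass 0.3 m above ground, accelerating forward.
    print(zmp_point_mass(x_c=0.02, y_c=0.0, z=0.3, x_acc=0.5, y_acc=0.0))
```

With zero acceleration the ZMP coincides with the projected position of the mass, which is the static-walking case; a forward acceleration shifts the ZMP backward.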
Biped Control: Biologically Inspired
• Reverse engineering: since no humanoid robot matches biological bipeds in mobility, adaptability, and stability, many researchers study biological bipeds to extract algorithms that can be applied to robots.
Biped Control: Biologically Inspired
Two Main Research Areas
1. Central Pattern Generators (CPG)
2. Passive Walking
ZMP-based Gait Planning
• Plan the hip and ankle trajectories according to walking constraints and ground constraints.
• Derive all joint trajectories by inverse kinematics.
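The inverse-kinematics step can be sketched for a planar two-link leg (thigh and shank) in the sagittal plane. The function names, the sign conventions (angles measured from the downward vertical, positive forward), the knee-backward solution and the flat-foot ankle angle are all assumptions made for illustration, not the robot's actual kinematic model.

```python
import math


def leg_ik(dx, dz, l_thigh, l_shank):
    """Hip, knee and ankle pitch angles (rad) that place the ankle at
    (dx, dz) relative to the hip; dx is forward, dz is up (negative below hip)."""
    u, v = -dz, dx  # u: distance below the hip, v: distance in front of it
    c = (u * u + v * v - l_thigh ** 2 - l_shank ** 2) / (2 * l_thigh * l_shank)
    if abs(c) > 1.0 + 1e-9:
        raise ValueError("ankle target is out of reach")
    c = max(-1.0, min(1.0, c))
    knee = -math.acos(c)  # negative: shank rotates backward relative to thigh
    hip = math.atan2(v, u) - math.atan2(
        l_shank * math.sin(knee), l_thigh + l_shank * math.cos(knee)
    )
    ankle = -(hip + knee)  # keeps the foot parallel to the ground
    return hip, knee, ankle


def leg_fk(hip, knee, l_thigh, l_shank):
    """Forward kinematics matching leg_ik, used here as a consistency check."""
    dx = l_thigh * math.sin(hip) + l_shank * math.sin(hip + knee)
    dz = -(l_thigh * math.cos(hip) + l_shank * math.cos(hip + knee))
    return dx, dz


if __name__ == "__main__":
    # Hypothetical link lengths (m) and ankle target.
    hip, knee, ankle = leg_ik(0.03, -0.17, 0.10, 0.10)
    print([round(math.degrees(a), 2) for a in (hip, knee, ankle)])
    print(leg_fk(hip, knee, 0.10, 0.10))
```

Applying `leg_ik` at every sample of the planned hip and ankle trajectories yields the joint trajectories for that leg.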
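Example: Gait Planning for Walking on Slope
[Figure: biped model on a slope in X-Y-Z axes, with the labelled dimensions and angles Lao, Ds, L, Qs, Qf, Qb, Hao, Lan, Lab, Laf, Lthigh, Lshank and Lhip, and the time marks t = k*Tc + Td, Tm, Tc.]
• The angle $\theta_a(t)$ is specified at four key times: it equals $Q_s$ at $t = k T_c$ and at $t = (k+1) T_c + T_d$, and is set from $Q_b$ and $Q_s$ at $t = k T_c + T_d$ and from $Q_f$ and $Q_s$ at $t = (k+1) T_c$.
• Derivative constraints at the start and end of the step:
$$\dot{\theta}_a(k T_c) = 0, \qquad \dot{\theta}_a((k+1) T_c + T_d) = 0$$
$$\dot{x}_a(k T_c) = 0, \qquad \dot{x}_a((k+1) T_c + T_d) = 0$$
$$\dot{z}_a(k T_c) = 0, \qquad \dot{z}_a((k+1) T_c + T_d) = 0$$
• Plan the gait using third-order (cubic) spline interpolation, which guarantees continuity of both the first and second derivatives.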
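Third-order spline planning through a few key points with zero end velocities can be sketched as a clamped cubic spline. The implementation below is a generic textbook formulation in plain Python, not the authors' planner; the function name and the key-point values in the usage example are hypothetical.

```python
from bisect import bisect_right


def clamped_cubic_spline(ts, ys, d_start=0.0, d_end=0.0):
    """Return a function S(t) interpolating (ts, ys) with a C2 cubic spline
    whose first derivative is d_start at ts[0] and d_end at ts[-1]."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    slope = [(ys[i + 1] - ys[i]) / h[i] for i in range(n)]

    # Tridiagonal system for the second derivatives m[0..n] at the knots.
    lower = [0.0] + [h[i - 1] for i in range(1, n)] + [h[n - 1]]
    diag = [2 * h[0]] + [2 * (h[i - 1] + h[i]) for i in range(1, n)] + [2 * h[n - 1]]
    upper = [h[0]] + [h[i] for i in range(1, n)] + [0.0]
    rhs = (
        [6 * (slope[0] - d_start)]
        + [6 * (slope[i] - slope[i - 1]) for i in range(1, n)]
        + [6 * (d_end - slope[n - 1])]
    )

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]

    def evaluate(t):
        i = min(max(bisect_right(ts, t) - 1, 0), n - 1)
        a, b = ts[i + 1] - t, t - ts[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b
        )

    return evaluate


if __name__ == "__main__":
    # Hypothetical key points of one step: times (s) and an angle (rad).
    key_t = [0.0, 0.2, 0.8, 1.0]
    key_q = [0.0, 0.2, -0.1, 0.0]
    q = clamped_cubic_spline(key_t, key_q)
    for k in range(11):
        print(round(0.1 * k, 1), round(q(0.1 * k), 4))
```

The zero end slopes correspond to the zero-derivative constraints at the start and end of the step; the same routine can be applied separately to each planned coordinate.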
Example: Planning Results
[Figure: joint angles (deg, 0 to 80) versus time (s, 0.06 to 5.46) for a consecutive walking gait along a slope; curves for the hip, knee and ankle joint angles.]
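IP-based Gait Planning
• The dynamic equation of the inverted pendulum (IP) model:
$$\ddot{\theta} = \frac{g}{L} \sin\theta$$
• If the angle is small, this simplifies to a linear homogeneous second-order differential equation:
$$\ddot{\theta} = \frac{g}{L}\,\theta$$
with solution
$$\theta(t) = C_1 e^{\omega t} + C_2 e^{-\omega t}, \qquad \omega = \sqrt{g / L}$$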
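The linearised inverted-pendulum equation has the closed-form solution theta(t) = C1 * exp(w t) + C2 * exp(-w t) with w = sqrt(g / L). A short sketch of fixing C1 and C2 from initial conditions follows; the function name, the initial-condition parametrisation and the numbers in the usage example are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def linear_ip_solution(theta0, theta_dot0, length, g=G):
    """Return theta(t) for the small-angle inverted pendulum with
    theta(0) = theta0 and theta'(0) = theta_dot0."""
    w = math.sqrt(g / length)
    # From theta(0) = C1 + C2 and theta'(0) = w * (C1 - C2).
    c1 = 0.5 * (theta0 + theta_dot0 / w)
    c2 = 0.5 * (theta0 - theta_dot0 / w)

    def theta(t):
        return c1 * math.exp(w * t) + c2 * math.exp(-w * t)

    return theta


if __name__ == "__main__":
    # Hypothetical pendulum: 0.3 m long, starting 0.05 rad behind the
    # vertical with enough forward velocity to pass over it.
    theta = linear_ip_solution(theta0=-0.05, theta_dot0=0.4, length=0.3)
    for k in range(6):
        print(round(0.05 * k, 2), round(theta(0.05 * k), 4))
```

Because the solution contains a growing exponential, the pendulum is unstable on its own; gait planning chooses step timing and foot placement so that each support phase ends before the angle grows too large.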
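3D Linear Pendulum Model
[Figure: point mass m at position p, connected to the origin O by a leg of length r, drawn in X-Y-Z axes.]
$$m \begin{pmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{pmatrix} = \left(J^{T}\right)^{-1} \begin{pmatrix} \tau_r \\ \tau_p \\ f \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ -mg \end{pmatrix}$$
$$J = \frac{\partial p}{\partial q} = \begin{pmatrix} 0 & r C_p & S_p \\ -r C_r & 0 & -S_r \\ -r C_r S_r / D & -r C_p S_p / D & D \end{pmatrix}$$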
Example: IP-based Gait Planning
[Figure: joint angles (deg, -25 to 25) versus time (s, 0 to 5.76) for the left ankle, left hip, right ankle and right hip.]
Biped Kicking
Kicking constraints:
– Kicking range
– Friction
– …
Kicking Pattern
[Figure: hip, knee and ankle angles (deg, -80 to 80) versus t (s, 0 to 5) during a kick, with the phase labels "Initialize Angle", "Swing Backward" and "Kick".]
Biped Learning by Reinforcement (1)
• A humanoid robot aims to select good values for the swing-leg parameters at each consecutive step so that it achieves stable walking.
• A reward function that correctly defines this objective is critical for reinforcement learning.
[Figure: supporting foot with a stable region, r = 0 (reward), and an unstable region, r = -1 (punishment).]
Biped Learning by Reinforcement (2)
• The control objective of gait synthesis for biped dynamic balance can be described as
$$P_{zmp} = (x_{zmp}, y_{zmp}, 0) \in S$$
• To evaluate biped dynamic balance in the frontal plane, a penalty signal should be given if the biped robot falls down in the frontal plane: $r_y = 0$ if $y_{zmp} \in S_y$ and the accompanying condition on $y_{zmp}$ involving the bound $v_{yu}$ holds; $r_y = -1$ otherwise.
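The numerical reinforcement signal can be sketched as a two-valued function of the ZMP. This sketch covers only the region test (the ZMP coordinate lying inside the stable region) and models that region as a closed interval; the interval representation, the function name and the bounds in the usage example are assumptions, and any additional condition from the original formula is left out.

```python
def balance_reward(y_zmp, y_min, y_max):
    """Return 0 (reward) if the ZMP coordinate lies in the stable interval
    [y_min, y_max], otherwise -1 (punishment)."""
    return 0 if y_min <= y_zmp <= y_max else -1


if __name__ == "__main__":
    # Hypothetical stable interval of +/- 3 cm around the foot centre.
    for y in (0.0, 0.02, 0.05):
        print(y, balance_reward(y, -0.03, 0.03))
```

The same function applied to x_zmp with sagittal-plane bounds would give the corresponding signal for the sagittal plane.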
Biped Learning by Reinforcement (3)
Reinforcement Learning with Fuzzy Evaluative Feedback
[Figure: supporting foot divided into graded zones labelled Excellent, Good, OK, Bad and Very Bad.]
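A graded evaluation of this kind can be sketched with triangular membership functions and weighted-average defuzzification. The five labels come from the slide; the input variable (a normalised distance of the ZMP from the centre of the supporting foot, 0 at the centre and 1 at the edge), the membership shapes and the numeric value attached to each label are all assumptions.

```python
# (label, centre of its triangular membership, assumed crisp value)
LABELS = [
    ("Excellent", 0.00, 0.00),
    ("Good", 0.25, -0.25),
    ("OK", 0.50, -0.50),
    ("Bad", 0.75, -0.75),
    ("Very Bad", 1.00, -1.00),
]
HALF_WIDTH = 0.25  # each triangle reaches zero at the neighbouring centres


def memberships(distance):
    """Degree of membership of the normalised distance in each label."""
    d = min(max(distance, 0.0), 1.0)
    return {
        name: max(0.0, 1.0 - abs(d - centre) / HALF_WIDTH)
        for name, centre, _ in LABELS
    }


def fuzzy_reward(distance):
    """Weighted-average defuzzification into a continuous signal in [-1, 0]."""
    mu = memberships(distance)
    total = sum(mu.values())
    return sum(mu[name] * value for name, _, value in LABELS) / total


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.5, 0.9, 1.2):
        print(d, round(fuzzy_reward(d), 3))
```

Unlike a two-valued signal, this feedback changes gradually as the ZMP drifts toward the edge of the foot, so the learner receives information before a fall occurs.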
The RL Agent
[Figure: block diagram of the RL agent with the AEN, ASN and SAM blocks, the state variables X, the external RL signal r, the internal signal r̂, the ASN output F and the output action.]
• AEN: the action-state evaluation network
• ASN: the action selection network
• SAM: the stochastic action modifier
• Both the AEN and the ASN are initialized randomly.
• Learning starts from scratch.
• It needs a large number of trials for learning.
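The role of the stochastic action modifier can be sketched as adding exploration noise to the action recommended by the ASN, with less noise when the internal signal r̂ is high. The exponential mapping from r̂ to the noise level, the function names and the `scale` parameter are assumptions; the slides do not specify the SAM's formula.

```python
import math
import random


def exploration_sigma(r_hat, scale=1.0):
    """Assumed noise level: non-negative and decreasing in r_hat."""
    return scale * math.exp(-r_hat)


def stochastic_action_modifier(action, r_hat, rng, scale=1.0):
    """Sample the applied action from a Gaussian centred on the ASN output."""
    return rng.gauss(action, exploration_sigma(r_hat, scale))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatability
    recommended = 0.2  # hypothetical ASN output F
    for r_hat in (-1.0, 0.0, 2.0):
        samples = [stochastic_action_modifier(recommended, r_hat, rng) for _ in range(5)]
        print(r_hat, [round(s, 3) for s in samples])
```

When the evaluation is poor the agent explores widely around the recommended action; as the evaluation improves, the applied action stays close to what the ASN proposes.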
The FRL Agent
[Figure: block diagram of the FRL agent with the AEN, ASN and SAM blocks, the state variables X, the external RL signal r, the internal signal r̂, the ASN output F and the output action.]
• Expert knowledge for the action selection, e.g.: IF X is PM THEN F is NB
• Expert knowledge for the action-state evaluation, e.g.: IF X is PB THEN the evaluation is NS
• Neural fuzzy networks are used to replace the neuron-like adaptive elements.
• The expert knowledge can be directly built into the FRL agent as a starting configuration.
• The ASN and/or AEN could house available expert knowledge to speed up learning.
The FRL Agent with Fuzzy Evaluative Feedback
[Figure: block diagram of the FRL agent (fuzzy evaluative feedback) with the AEN, ASN and SAM blocks, the state variables X, the external RL signal, the internal signal r̂, the ASN output F and the output action. The evaluative feedback path consists of fuzzification, fuzzy inference and defuzzification, driven by an evaluation rule base.]
[Figure: feedback r (+1, 0, -1) versus state over the regions Success, Viable and Failure, comparing fuzzy and numerical feedback.]
• Numerical evaluative feedback is not biologically plausible.
• Fuzzy evaluative feedback is much closer to the learning environment in the real world.
• Fuzzy evaluative feedback is based on a form of continuous evaluation.
Comparison of FRL Agents
Type               | Action Network (ASN) | Critic Network (AEN) | Evaluative Feedback
RL agent           | neural               | neural               | numerical
FRL agent (Type A) | neuro-fuzzy          | neural               | numerical
FRL agent (Type B) | neuro-fuzzy          | neuro-fuzzy          | numerical
FRL agent (Type C) | neuro-fuzzy          | neuro-fuzzy          | fuzzy
Information Available for Biped Gait Synthesizing
Case   | Description of the Information
Case A | No expert knowledge is available. Only the numerical reinforcement signal is used to train the gait synthesizer.
Case B | Only the intuitive biped balancing knowledge is used as the initial configuration of the gait synthesizer.
Case C | Both the intuitive biped balancing knowledge and the walking evaluation knowledge are utilized.
Case D | Besides all the information used in Case C, fuzzy evaluative feedback, rather than numerical evaluative feedback, is included.
The Gait Synthesizer Using Two Independent FRL Agents
[Figure: block diagram of the gait synthesizer. FRL Agent-y (frontal plane) and FRL Agent-x (sagittal plane) take the biped states and produce the outputs dy and dx. Each agent has its own fuzzy evaluative feedback unit (inputs zmpy and zmpx, outputs ry and rx), evaluation rules and intuitive balancing rules. The diagram also contains an update equation with terms indexed by i and evaluated at time t, relating a "new" quantity to the "old" one.]
Before and After Learning
[Figure: two panels, ankle joint and knee joint, each plotting angle (rad) versus t (s, 0 to 3) with curves before learning and after learning.]
Results (1)
The ZMP trajectory after FRL (type C)
[Figure: the ZMP trajectory with FRL in the X-Y plane (X-axis -60 to 60 cm, Y-axis -20 to 40 cm), showing the prescribed ZMP and the tracking result using FRL (type C), the left and right foot placements and the moving direction.]
Results (2)
Walk (Backward)
Some Research Topics
• Online gait generation
• Online footprint planning
• Constraints
  – ZMP constraint for stable walking
  – Friction constraint for stable walking
  – …
• Current Challenges
  – Knee bending
  – Body shifting
  – …
• …
Acknowledgements
• Staff Members: P.K. Yue, F.S. Choy, Nazeer Ahmed, M.F. Ercan, Mike Wong, H. Li
• Research Associates: Z. Tang (Tsinghua U.), J. Ni (Shanghai Jiao Tong U.)
• Technical Support Officers: H.M. Tan, W. Ye
• Students: P.P. Khing, H.W. Yin, H.F. Lu, H.X. Tan, J.X. Teo, Stephen Quah, H.M. Tan, Y.T. Tan
Thanks!