Dr Changjiu Zhou School of Electrical & Electronic Engineering Singapore Polytechnic


Learning and Control of Biped Locomotion. Dr Changjiu Zhou, School of Electrical & Electronic Engineering, Singapore Polytechnic. [email protected], www.robo-erectus.org. Outline: Introduction; Biped Walking Cycles; How to Control Biped Locomotion; How to Plan/Learn Biped Gaits.

Transcript of Dr Changjiu Zhou School of Electrical & Electronic Engineering Singapore Polytechnic

Page 1: Dr Changjiu Zhou School of Electrical & Electronic Engineering Singapore Polytechnic

Development of Humanoid Soccer Robots

Robo-Erectus
www.robo-erectus.org

Learning and Control of Biped Locomotion

Dr Changjiu Zhou
School of Electrical & Electronic Engineering
Singapore Polytechnic
[email protected]

Page 2: Outline

• Introduction
• Biped Walking Cycles
• How to Control Biped Locomotion
• How to Plan/Learn Biped Gaits
• Biped Learning by Reinforcement
• Some Research Topics

Page 3: Biped Gait (Frontal Plane)

[Figure: biped gait in frontal view along a time axis, showing single-support and double-support phases.]

Page 4: Biped Gait (Sagittal Plane)

[Figure: gait timing diagram with one row for the right leg and one for the left leg. Time marks: t = kTc, kTc + Td, kTc + Tm, (k+1)Tc, (k+1)Tc + Td. Phases labelled: double-support phase, single-support phase, right-leg swing phase, left-leg swing phase, left-leg stance phase.]

Page 5: Finite State Machine for Biped Walking Control

[Figure: state diagram with four states: Right Support, Left Support, Left-to-Right Transition, Right-to-Left Transition. Transition events: swing time completed (on two transitions), left foot touches down, right foot touches down.]
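The state machine named on this slide can be sketched as a small transition table. The transcript preserves only the state and event names, not which arrow carries which event, so the wiring below is an assumption chosen for illustration; the `State`, `Event`, and `step` names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    RIGHT_SUPPORT = auto()
    RIGHT_TO_LEFT = auto()   # Right-to-Left Transition
    LEFT_SUPPORT = auto()
    LEFT_TO_RIGHT = auto()   # Left-to-Right Transition


class Event(Enum):
    SWING_TIME_COMPLETED = auto()
    LEFT_FOOT_TOUCHES_DOWN = auto()
    RIGHT_FOOT_TOUCHES_DOWN = auto()


# Assumed wiring: the slide lists the states and events, but the pairing of
# events with arrows is not recoverable from the transcript.
TRANSITIONS = {
    (State.RIGHT_SUPPORT, Event.SWING_TIME_COMPLETED): State.RIGHT_TO_LEFT,
    (State.RIGHT_TO_LEFT, Event.LEFT_FOOT_TOUCHES_DOWN): State.LEFT_SUPPORT,
    (State.LEFT_SUPPORT, Event.SWING_TIME_COMPLETED): State.LEFT_TO_RIGHT,
    (State.LEFT_TO_RIGHT, Event.RIGHT_FOOT_TOUCHES_DOWN): State.RIGHT_SUPPORT,
}


def step(state, event):
    """Return the next state; an event that does not apply leaves the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Feeding the four events in order from `State.RIGHT_SUPPORT` walks once around the cycle and returns to the starting state.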

Page 6: Static Walking

In static walking, the biped has to move very slowly so that the dynamics can be ignored.

The biped's projected center of gravity (PCOG) must stay within the supporting area.

[Figure: supporting area in single support and in double support.]
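The static-walking condition can be checked with a point-in-polygon test. The sketch below assumes the supporting area is a convex polygon given as counter-clockwise vertices; the function name and the foot dimensions are hypothetical and do not come from the slides.

```python
def pcog_inside_support(pcog, polygon):
    """Return True if the 2-D point `pcog` lies inside or on the convex `polygon`.

    `polygon` is a list of (x, y) vertices in counter-clockwise order.
    """
    px, py = pcog
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Cross product of the edge with the vector to the point: a negative
        # value means the point is to the right of the edge, i.e. outside.
        if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0:
            return False
    return True


# Hypothetical single-support footprint: a 10 cm x 20 cm foot, in metres.
foot = [(0.0, 0.0), (0.10, 0.0), (0.10, 0.20), (0.0, 0.20)]
statically_stable = pcog_inside_support((0.05, 0.10), foot)
```

In double support the polygon would be the convex hull of both footprints, which is why the supporting area is larger in that phase.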

Page 7: Dynamic Walking

In dynamic walking, the motion is fast, so the dynamics cannot be neglected.

In dynamic walking, we should look at the zero moment point (ZMP) rather than the PCOG.

The stability margin of dynamic walking is much harder to quantify.

Minimize $\int_{t_i}^{t_f} \left\| P_{zmp}^{d}(t) - P_{zmp}(t) \right\|^2 \, dt$
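The objective on this slide, read here as the time integral of the squared distance between the desired and the actual ZMP, can be approximated from sampled trajectories. This is a minimal sketch using the trapezoidal rule; the function name and the sampled-data layout are assumptions.

```python
def zmp_tracking_cost(times, zmp, zmp_desired):
    """Approximate the integral of ||P_zmp^d(t) - P_zmp(t)||^2 dt (trapezoidal rule).

    times: increasing sample times in seconds.
    zmp, zmp_desired: lists of (x, y) samples taken at those times.
    """
    sq_err = [
        (xd - x) ** 2 + (yd - y) ** 2
        for (x, y), (xd, yd) in zip(zmp, zmp_desired)
    ]
    cost = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        cost += 0.5 * (sq_err[k] + sq_err[k + 1]) * dt
    return cost
```

A gait planner or learner could compare candidate gaits by this scalar: a lower value means the realised ZMP stays closer to the prescribed one.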

Page 8: Why is Biped Robotics Hard?

Unpowered DOF between the foot and the ground

This constraint limits the trajectory-tracking approaches commonly used in manipulator research.

Page 9: Biped Control: Model-based

[Block diagram: feet position and ZMP (PCOG) → inverse kinematics model → desired joint angles → biped robot.]

Page 10: Biped Control: Model-based

Except for certain massless leg models, most biped models are nonlinear and do not have analytical solutions.

The massless leg model is the simplest model. The body of the robot is usually assumed to be a point mass and can be viewed as an inverted pendulum.

When the leg inertia and other dynamics, such as those of the actuators and joint friction, are included, the overall dynamic equations can be highly nonlinear and complex.

Page 11: Example: Massless Leg Model

• The simplest biped model
• Some assumptions, e.g., $\ddot{z} = 0$
• From D'Alembert's principle:

$$X_{ZMP} = X_c - \frac{\ddot{x}\, z}{\ddot{z} + g}, \qquad Y_{ZMP} = Y_c - \frac{\ddot{y}\, z}{\ddot{z} + g}$$
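For a point-mass body, D'Alembert's principle gives the standard relation x_zmp = x_c - z * x_acc / (z_acc + g), and likewise for y. The sketch below evaluates it; the function and parameter names are hypothetical, and the default `z_acc = 0.0` reflects a constant-height assumption.

```python
G = 9.81  # gravitational acceleration, m/s^2


def zmp_point_mass(x_c, y_c, z, x_acc, y_acc, z_acc=0.0):
    """ZMP (x, y) of a point mass at (x_c, y_c, z) with the given accelerations.

    Uses x_zmp = x_c - z * x_acc / (z_acc + g), and the same form for y.
    """
    denom = z_acc + G
    return (x_c - z * x_acc / denom, y_c - z * y_acc / denom)
```

With zero acceleration the ZMP coincides with the projection of the mass, which is the static-walking case; accelerating the mass forward moves the ZMP backward.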

Page 12: Biped Control: Biologically Inspired

Reverse Engineering

Since no humanoid robot matches biological humanoids in mobility, adaptability, and stability, many researchers examine biological bipeds to extract algorithms that are applicable to robots.

Page 13: Biped Control: Biologically Inspired

Two Main Research Areas

1. Central Pattern Generators (CPG)
2. Passive Walking

Page 14: ZMP-based Gait Planning

• Plan the hip and ankle trajectories according to walking constraints and ground constraints.
• Derive all joint trajectories by inverse kinematics.
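The inverse-kinematics step can be sketched for a planar two-link leg. This is an illustration under stated assumptions, not the kinematic model used in the presentation: the ankle target is given as (forward, down) offsets from the hip, the hip angle is measured from the downward vertical, the knee angle is a non-negative flexion, and the function names are hypothetical.

```python
import math


def leg_fk(q_hip, q_knee, l_thigh, l_shank):
    """Forward kinematics: ankle (forward, down) offset from the hip."""
    f = l_thigh * math.sin(q_hip) + l_shank * math.sin(q_hip - q_knee)
    d = l_thigh * math.cos(q_hip) + l_shank * math.cos(q_hip - q_knee)
    return f, d


def leg_ik(f, d, l_thigh, l_shank):
    """Inverse kinematics: (q_hip, q_knee) in radians reaching the offset (f, d)."""
    r2 = f * f + d * d
    # Law of cosines gives the knee flexion from the hip-ankle distance.
    c = (r2 - l_thigh**2 - l_shank**2) / (2.0 * l_thigh * l_shank)
    if not -1.0 <= c <= 1.0:
        raise ValueError("ankle target out of reach")
    q_knee = math.acos(c)
    q_hip = math.atan2(f, d) + math.atan2(
        l_shank * math.sin(q_knee), l_thigh + l_shank * math.cos(q_knee)
    )
    return q_hip, q_knee
```

Applying `leg_ik` at each sample of the planned hip and ankle trajectories yields the joint trajectories; `leg_fk` is included only to check the result by round trip.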

Page 15: Example: Gait Planning for Walking on Slope

[Figure: biped on a slope with X, Y, Z axes. Labelled quantities: Lao, Ds, L, Qs, Qf, Qb, Hao, Lan, Lab, Laf, Lthigh, Lshank, Lhip. Timing marks: t = k*Tc + Td, Tm, Tc.]

Ankle angle $\theta_a(t)$ at the key instants of step k (the operators joining Qs with Qb and with Qf are not legible in the transcript):

• t = k*Tc: Qs
• t = k*Tc + Td: Qs combined with Qb
• t = (k+1)*Tc: Qs combined with Qf
• t = (k+1)*Tc + Td: Qs

Boundary conditions on the derivatives:

$$\dot{\theta}_a(k*T_c) = 0, \qquad \dot{\theta}_a((k+1)*T_c + T_d) = 0$$

$$\dot{x}_a(k*T_c) = 0, \qquad \dot{x}_a((k+1)*T_c + T_d) = 0$$

$$\dot{z}_a(k*T_c) = 0, \qquad \dot{z}_a((k+1)*T_c + T_d) = 0$$

• Plan the gait using a 3rd-order spline, which guarantees the continuity of both the 1st and 2nd derivatives.
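A 3rd-order (cubic) spline with prescribed end velocities can be built by solving a tridiagonal system for the second derivatives at the knots. The sketch below is a generic clamped cubic spline in pure Python, not the presentation's planner; the knot times and angles in the usage lines are hypothetical values standing in for k*Tc, k*Tc + Td, (k+1)*Tc, and (k+1)*Tc + Td.

```python
def clamped_cubic_spline(ts, ys, v_start=0.0, v_end=0.0):
    """Return a function S(t) interpolating (ts, ys) with end slopes v_start, v_end."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[i] at the knots.
    sub = [0.0] * (n + 1)
    diag = [0.0] * (n + 1)
    sup = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    diag[0], sup[0] = 2.0 * h[0], h[0]
    rhs[0] = 6.0 * ((ys[1] - ys[0]) / h[0] - v_start)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    sub[n], diag[n] = h[n - 1], 2.0 * h[n - 1]
    rhs[n] = 6.0 * (v_end - (ys[n] - ys[n - 1]) / h[n - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def evaluate(t):
        i = 0
        while i < n - 1 and t > ts[i + 1]:
            i += 1
        a, b = ts[i + 1] - t, t - ts[i]
        return (
            (m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Hypothetical knots: Tc = 1.0 s, Td = 0.2 s, ankle angles in radians.
knot_times = [0.0, 0.2, 1.0, 1.2]
knot_angles = [0.0, 0.1, -0.1, 0.0]
ankle_angle = clamped_cubic_spline(knot_times, knot_angles)
```

Because the second derivatives are shared between neighbouring segments and the first derivatives are matched by the linear system, the resulting curve has continuous first and second derivatives, with zero velocity at both ends by default.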

Page 16: Example: Planning Results

[Figure: joint angles (deg) versus time (s) for a consecutive walking gait along a slope, with curves for the hip, knee, and ankle joint angles.]

Page 17: IP-based Gait Planning

• The dynamic equation of the inverted pendulum (IP) model:

$$\ddot{\theta} = \frac{g}{L} \sin\theta$$

[Figure: inverted pendulum of length L moving with velocity v.]

• If the angle is small, it can be simplified to a linear homogeneous 2nd-order differential equation:

$$\ddot{\theta} - \frac{g}{L}\,\theta = 0$$

$$\theta(t) = C_1 e^{wt} + C_2 e^{-wt}, \qquad w = \sqrt{g/h}$$

(The slide writes h in the last expression; for small angles the pendulum height h is approximately its length L.)
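The constants C1 and C2 follow from the initial angle and angular velocity. The sketch below assumes the small-angle equation in the form theta'' = (g / L) * theta, so that w = sqrt(g / L); the function name and the numbers in the usage line are hypothetical.

```python
import math


def linearized_pendulum(theta0, theta_dot0, length, g=9.81):
    """Return theta(t) = C1*exp(w*t) + C2*exp(-w*t) for the given initial state."""
    w = math.sqrt(g / length)
    # From theta(0) = C1 + C2 and theta'(0) = w * (C1 - C2).
    c1 = 0.5 * (theta0 + theta_dot0 / w)
    c2 = 0.5 * (theta0 - theta_dot0 / w)

    def theta(t):
        return c1 * math.exp(w * t) + c2 * math.exp(-w * t)

    return theta


# Hypothetical example: 0.5 m pendulum released at rest, 0.1 rad from vertical.
theta = linearized_pendulum(0.1, 0.0, 0.5)
```

Starting from rest gives C1 = C2, so the angle grows as a hyperbolic cosine, which is why the support leg must be exchanged before the pendulum tips too far.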

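Page 18: 3D Linear Pendulum Model

[Figure: point mass m on a massless leg of length r pivoting at the origin O, in X, Y, Z axes, with angles $\theta_r$ and $\theta_p$.]

$$m \begin{pmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{pmatrix} = (J^{T})^{-1} \begin{pmatrix} \tau_r \\ \tau_p \\ f \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ -mg \end{pmatrix}$$

$$J = \frac{\partial p}{\partial q} = \begin{pmatrix} 0 & r C_p & S_p \\ -r C_r & 0 & -S_r \\ -r C_r S_r / D & -r C_p S_p / D & D \end{pmatrix}$$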

Page 19: Example: IP-based Gait Planning

[Figure: joint angles (deg) versus time (s), from 0 to 5.76 s, for the left ankle, left hip, right ankle, and right hip.]

Page 20: Biped Kicking

Kicking constraints:
– Kicking range
– Friction
– …

Page 21: Kicking Pattern

[Figure: hip, knee, and ankle angles (deg) versus t (s), from 0 to 5 s, with three phases marked: initialize angle, swing backward, kick.]

Page 22: Biped Learning by Reinforcement (1)

A humanoid robot aims to select good values of the swing-leg parameters at each consecutive step so that it achieves stable walking.

A reward function that correctly defines this objective is critical for reinforcement learning.

[Figure: supporting foot with a stable region, r = 0 (reward), and an unstable region, r = -1 (punishment).]

Page 23: Biped Learning by Reinforcement (2)

• The control objective of gait synthesis for biped dynamic balance can be described as

$$P_{zmp} = (x_{zmp}, y_{zmp}, 0) \in S$$

• To evaluate biped dynamic balance in the frontal plane, a penalty signal should be given if the biped robot falls down in the frontal plane:

$$r_y = \begin{cases} 0 & \text{if } y_{zmp} \in S_y \text{ and a second condition on } y_{zmp} \text{ and } v_{yu} \text{ holds} \\ -1 & \text{otherwise} \end{cases}$$

(The exact form of the second condition is not legible in the transcript.)
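The numerical reinforcement signal can be sketched as a simple region test: 0 while the ZMP stays in the stable region, -1 otherwise. The rectangular region and the function name below are assumptions for illustration, and the sketch covers only the region test, not any additional condition on the slide.

```python
def balance_reward(zmp, stable_region):
    """Return 0 if the ZMP lies in the stable region, otherwise -1.

    zmp: (x, y) coordinates of the ZMP.
    stable_region: (x_min, x_max, y_min, y_max), a rectangle assumed here
    for illustration.
    """
    x, y = zmp
    x_min, x_max, y_min, y_max = stable_region
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    return 0 if inside else -1
```

This signal is sparse: it carries no information about how close the ZMP is to the boundary, which motivates the graded feedback introduced on the following slides.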

Page 24: Biped Learning by Reinforcement (3)

Reinforcement Learning with Fuzzy Evaluative Feedback

[Figure: supporting foot with regions graded Excellent, Good, OK, Bad, and Very Bad.]

Page 25: The RL Agent

[Block diagram of the RL agent. Components: AEN, ASN, SAM. Signals: state variables X, external RL signal r, output action F.]

• AEN: the action-state evaluation network
• ASN: the action selection network
• SAM: the stochastic action modifier

• Both the AEN and ASN are initialized randomly.
• Learning starts from scratch.
• It needs a large number of trials for learning.

Page 26: The FRL Agent

[Block diagram of the FRL agent. Components: AEN, ASN, SAM. Signals: state variables X, external RL signal r, internal signal r̂, output action F. Expert knowledge is fed into the networks.]

Expert knowledge for the action selection: IF X is PM THEN F is NB, …

Expert knowledge for the action-state evaluation: IF X is PB THEN the evaluation is NS, …

• Neural fuzzy networks are used to replace the neuron-like adaptive elements.
• The expert knowledge can be directly built into the FRL agent as a starting configuration.
• The ASN and/or AEN could house available expert knowledge to speed up learning.

Page 27: The FRL Agent with Fuzzy Evaluative Feedback

[Block diagram: FRL agent (AEN, ASN, SAM) with state variables X and output action F. The external RL signal r is produced from the state by a fuzzy evaluative feedback unit: fuzzification, fuzzy inference with an evaluation rule base, defuzzification.]

[Plot: reinforcement r (+1, 0, -1) against state, comparing fuzzy and numerical feedback over the Success, Viable, and Failure regions.]

• Numerical evaluative feedback is not biologically plausible.
• Fuzzy evaluative feedback is much closer to the learning environment in the real world.
• Fuzzy evaluative feedback is based on a form of continuous evaluation.
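The fuzzification, inference, and defuzzification chain can be illustrated with three fuzzy labels taken from this slide (Success, Viable, Failure) and the crisp values +1, 0, and -1. Everything else below is an assumption made for the sketch: the input is a normalised ZMP offset d in [0, 1], the membership breakpoints are arbitrary, and the function names are hypothetical.

```python
def triangle(x, left, peak, right):
    """Triangular membership function, zero at and outside [left, right]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_feedback(d):
    """Map a normalised ZMP offset d (0 = centre, 1 = edge) to r in [-1, +1]."""
    d = min(max(d, 0.0), 1.0)
    # Fuzzification: degrees of membership (breakpoints are illustrative).
    mu = {
        "Success": max(0.0, 1.0 - 2.0 * d),
        "Viable": triangle(d, 0.0, 0.5, 1.0),
        "Failure": max(0.0, 2.0 * d - 1.0),
    }
    # Rule base: each label implies a crisp reinforcement value.
    value = {"Success": 1.0, "Viable": 0.0, "Failure": -1.0}
    # Defuzzification: membership-weighted average of the rule outputs.
    total = sum(mu.values())
    return sum(mu[label] * value[label] for label in mu) / total
```

Unlike a two-valued signal, this feedback changes continuously as the ZMP drifts, so the learner is told that a gait is getting worse before it actually fails.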

Page 28: Comparison of FRL Agents

Type                 Action Network (ASN)   Critic Network (AEN)   Evaluative Feedback
RL agent             neural                 neural                 numerical
FRL agent (Type A)   neuro-fuzzy            neural                 numerical
FRL agent (Type B)   neuro-fuzzy            neuro-fuzzy            numerical
FRL agent (Type C)   neuro-fuzzy            neuro-fuzzy            fuzzy

Page 29: Information Available for Biped Gait Synthesizing

The Description of the Information

Case A: No expert knowledge is available. Only the numerical reinforcement signal is used to train the gait synthesizer.

Case B: Only the intuitive biped balancing knowledge is used as the initial configuration of the gait synthesizer.

Case C: Both the intuitive biped balancing knowledge and the walking evaluation knowledge are utilized.

Case D: Besides all the information used in Case C, fuzzy evaluative feedback, rather than numerical evaluative feedback, is included.

Page 30: The Gait Synthesizer Using Two Independent FRL Agents

[Block diagram: the biped states feed two agents, FRL Agent-x (sagittal plane) and FRL Agent-y (frontal plane), with outputs dx and dy. Each agent is configured with intuitive balancing rules and evaluation rules for its plane, and receives a reinforcement signal (r_x or r_y) from a fuzzy evaluative feedback unit driven by the corresponding ZMP coordinate (x_zmp or y_zmp).]

$$\theta_i^{new}(t) = \theta_i^{old}(t) + \Delta\theta_i(t)$$

Page 31: Before and After Learning

[Figure: two panels, ankle joint and knee joint, each showing joint angle (rad) versus t (s) from 0 to 3 s, before learning and after learning.]

Page 32: Results (1)

The ZMP trajectory after FRL (type C)

[Figure: the ZMP trajectory with FRL, plotted as Y-axis (cm) versus X-axis (cm): the prescribed ZMP and the tracking result using FRL (type C), with left and right footprints and the moving direction marked.]

Page 33: Results (2)

Walk (Backward)

Page 34: Some Research Topics

• Online gait generation
• Online footprint planning
• Constraints
  – ZMP constraint for stable walking
  – Friction constraint for stable walking
  – …
• Current challenges
  – Knee bending
  – Body shifting
  – …
• …


Page 36: Acknowledgements

• Staff Member: P.K. Yue, F.S. Choy, Nazeer Ahmed, M.F. Ercan, Mike Wong, H. Li
• Research Associate: Z. Tang (Tsinghua U.), J. Ni (Shanghai Jiao Tong U.)
• Technical Support Officer: H.M. Tan, W. Ye
• Students: P.P. Khing, H.W. Yin, H.F. Lu, H.X. Tan, J.X. Teo, Stephen Quah, H.M. Tan, Y.T. Tan

Page 37: Thanks!

Dr Changjiu Zhou
School of Electrical and Electronic Engineering
Singapore Polytechnic
[email protected]
www.robo-erectus.org