Page 1: Takeshi Shibuya University of Tsukuba shibuya@iit.tsukuba.ac.jp


A fundamental study on the representation of reward for reinforcement learning in dynamic environments, plus an introduction to rescue simulation

Page 2

Outline
・ Reinforcement learning
  ・ An interactive learning framework in soft computing
  ・ A method to learn in dynamic environments
・ RoboCup Rescue: overview
  ・ An application of soft computing

[Figure: reinforcement learning, with learning in dynamic environments on the theoretical side and rescue simulation on the application side]

Page 3

REINFORCEMENT LEARNING

Contents:
・ Reinforcement learning in psychology
・ Learning in dynamic environments

Page 4

Reinforcement Learning in psychology

If he finishes pressing the numbers in order, he gets a peanut as a reward.

Kyoto University

Page 5

Notable things in reinforcement learning

The learner acquires suitable behavior from the reward alone.

The trainer does not have to tell the learner how to behave step by step.

Page 6

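What is reinforcement learning (RL)?

The agent increases the values of the actions that bring rewards.

The agent selects the action whose value is highest.

[Figure: the agent-environment loop (the environment sends the state and reward to the agent, and the agent returns an action), beside a bar chart of the value of each action (Actions 1 and 2)]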
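The two rules on this slide can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes a hypothetical two-action environment with fixed mean rewards plus Gaussian noise, keeps one value per action as an incremental sample average of the rewards it brought, and adds a small ε-greedy exploration rate (an assumption; the slide only says the agent picks the highest value). The function name `run_value_learning` is hypothetical.

```python
import random

def run_value_learning(mean_reward, steps=1000, epsilon=0.1, seed=0):
    """Learn one value per action and mostly pick the highest-valued one.

    mean_reward: hypothetical environment, mapping action -> mean reward.
    """
    rng = random.Random(seed)
    actions = list(mean_reward)
    value = {a: 0.0 for a in actions}   # estimated value of each action
    count = {a: 0 for a in actions}     # how often each action was tried
    for _ in range(steps):
        # Mostly select the action whose value is highest; explore sometimes.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: value[a])
        # The environment returns a noisy reward around the action's mean.
        reward = mean_reward[action] + rng.gauss(0.0, 0.1)
        # Raise (or lower) the action's value toward the reward it brought,
        # using an incremental sample average.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value, count

values, counts = run_value_learning({"Action 1": 0.2, "Action 2": 1.0})
print(max(values, key=values.get))
```

Without the exploration step, an agent that happens to try Action 1 first could keep choosing it forever, because its value never drops below the untried Action 2's initial value of 0.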

Page 7

Research theme: learning in dynamic environments
How can the agent learn behavior when the suitable action changes?

[Figure: along the time axis, the action that brings a great reward changes from Action 1 to Action 2]
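Why the change is a problem can be seen from how a value is estimated. The sketch below uses a hypothetical reward history for Action 1 (great reward for 50 steps, then nothing) and compares a plain sample average with a constant step-size (recency-weighted) average. The constant step size is a standard remedy from the general RL literature, shown here only to illustrate the difficulty; it is not the method proposed in these slides.

```python
def sample_average(rewards):
    """Value estimate that weights every past reward equally."""
    value = 0.0
    for n, r in enumerate(rewards, start=1):
        value += (r - value) / n
    return value

def constant_step(rewards, alpha=0.1):
    """Recency-weighted estimate: old rewards decay by (1 - alpha) per step."""
    value = 0.0
    for r in rewards:
        value += alpha * (r - value)
    return value

# Hypothetical history for Action 1: it brought a great reward (1.0) for
# 50 steps, then the environment changed and it brings nothing (0.0).
history = [1.0] * 50 + [0.0] * 50

print(sample_average(history))  # about 0.5: half the evidence is out of date
print(constant_step(history))   # close to 0.0: tracks the change
```

The sample average still rates Action 1 at about 0.5 long after it stopped paying off, so a greedy agent keeps choosing it; forgetting old rewards helps, at the cost of noisier estimates.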

Page 8

Research theme: learning in dynamic environments

Dividing the reward into two parts:
・ Time-dependent part: to be designed
・ Time-independent part: to be learned
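The idea on this slide can be sketched for a two-action problem. Everything concrete here is hypothetical: the slides do not say how the two parts are combined, so the sketch assumes they simply add; `designed_part`, `DecomposedRewardAgent` and `run` are illustrative names; and the WEST/EAST schedule and constant values are invented for the example. The agent subtracts the known time-dependent part from each observed reward, so the quantity it learns does not change over time.

```python
ACTIONS = ("WEST", "EAST")

def designed_part(t, action):
    """Hypothetical time-dependent part, known to the designer.

    WEST is favoured before t = 50 and EAST afterwards.
    """
    favoured = "WEST" if t < 50 else "EAST"
    return 1.0 if action == favoured else 0.0

class DecomposedRewardAgent:
    """Learns only the time-independent part of the reward."""

    def __init__(self):
        self.learned = {a: 0.0 for a in ACTIONS}  # estimate of the constant part
        self.count = {a: 0 for a in ACTIONS}

    def select(self, t):
        # Try every action once, then act greedily on designed + learned parts.
        for a in ACTIONS:
            if self.count[a] == 0:
                return a
        return max(ACTIONS, key=lambda a: designed_part(t, a) + self.learned[a])

    def update(self, t, action, reward):
        # Remove the known time-dependent part, so the learning target is
        # stationary even though the observed reward changes over time.
        target = reward - designed_part(t, action)
        self.count[action] += 1
        self.learned[action] += (target - self.learned[action]) / self.count[action]

def run(constant_part, steps=100):
    agent = DecomposedRewardAgent()
    chosen = []
    for t in range(steps):
        action = agent.select(t)
        # Hypothetical environment: observed reward = designed + constant part.
        reward = designed_part(t, action) + constant_part[action]
        agent.update(t, action, reward)
        chosen.append(action)
    return agent, chosen

agent, chosen = run({"WEST": 0.2, "EAST": 0.3})
print(chosen[49], chosen[50])  # WEST EAST
```

Because all of the time dependence lives in the designed part, the learned estimates stay valid after the change, and the agent switches from WEST to EAST at t = 50 without relearning anything.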

Page 9

Research theme: learning in dynamic environments

[Figure: the probability of selecting each action over time; after the change, the probability of selecting the EAST action increases]

The proposed method enables the agent to adapt to the change in the environment: the probability of selecting each action switches after the environment changes.
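The slide reports the probability of selecting each action. One common way to turn action values into such probabilities is Boltzmann (softmax) selection; the slides do not state which selection rule was used, so softmax, the temperature, and the action values below are all assumptions for illustration.

```python
import math

def softmax_probabilities(values, temperature=0.5):
    """Boltzmann (softmax) action-selection probabilities.

    Subtracting the maximum value keeps exp() from overflowing.
    """
    top = max(values.values())
    weights = {a: math.exp((v - top) / temperature) for a, v in values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical action values before and after the environment changes.
before = softmax_probabilities({"WEST": 1.2, "EAST": 0.3})
after = softmax_probabilities({"WEST": 0.2, "EAST": 1.3})
print(round(before["EAST"], 3), round(after["EAST"], 3))
```

Once the value of EAST overtakes that of WEST, its selection probability rises above one half; a lower temperature makes the switch sharper.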

Page 10

ROBOCUP RESCUE

Contents:
・ Overview of RoboCup Rescue
・ Demonstration

Page 11

Leagues in RoboCup

・ Soccer: robot leagues; simulation leagues (2D, 3D)
・ Rescue: robot leagues; simulation leagues

Ultimate goal of RoboCup: "By the mid-21st century, a team of fully autonomous humanoid robot soccer players shall win a soccer game, complying with the official rules of FIFA, against the winner of the most recent World Cup." (from the official site)

Page 12

RoboCup Rescue
The purpose (from the official site):
(1) to develop simulators that form the infrastructure of the simulation system and emulate realistic phenomena predominant in disasters;
(2) to develop intelligent agents and robots that are given the capabilities of the main actors in a disaster response scenario.

Simulation leagues:
・ Virtual robots simulation (powered by USARSim)
・ Agent simulation

Page 13

RoboCup Rescue: The agent simulation

・ Buildings: fire, collapse
・ Roads: traffic movement, roads blocked by rubble, etc.
・ Emergency services: fire brigades, ambulance teams, police forces

Page 14

Agent’s observation and Action


Page 15

Demonstration (movie)

Page 16

RoboCup Rescue + RL (Team MRL)

Reinforcement learning is employed to control the agents. The details are not given in the paper.

Team MRL was the champion of RoboCup 2007 (8 teams in total).

Omid Aghazadeh et al., "Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation," in RoboCup 2007: Robot Soccer World Cup XI, Lecture Notes in Computer Science, vol. 5001, 2008, pp. 409-416, DOI: 10.1007/978-3-540-68847-1_42

Page 17

Summary
The following topics were overviewed:
・ Reinforcement learning: the framework and research themes
・ RoboCup Rescue: the aims of its leagues, and a demonstration

Page 18


Page 19

[Figure labels: unknown constant quantity; known varying quantity; target of learning]
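Read together with the slide on dividing the reward into a time-dependent and a time-independent part, these labels suggest the decomposition below. This is a sketch under stated assumptions: the two parts are assumed to add, and the symbols $g_t$ (the known, designed, time-varying part) and $h$ (the unknown constant part, the target of learning) are introduced here for illustration rather than taken from the slides. The last line assumes a one-step choice that ignores future rewards.

```latex
\begin{aligned}
r_t(s,a) &= g_t(s,a) + h(s,a) \\
\hat{h}(s,a) &\approx \mathbb{E}\left[\, r_t(s,a) - g_t(s,a) \,\right] \\
a_t &= \arg\max_{a} \left[\, g_t(s_t,a) + \hat{h}(s_t,a) \,\right]
\end{aligned}
```

Since $g_t$ is known, the learning target $r_t - g_t$ is stationary even when $r_t$ itself changes over time.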

Page 20

Reinforcement learning as an engineering approach

[Figure: the agent (learner) receives the state and reward from the environment and returns an action]

Page 21

Dividing the reward into two parts:
・ Time-dependent part: to be designed
・ Time-independent part: to be learned

Page 22

Research theme 1: learning in partially observable environments

[Figure: a two-joint system with labels for torque and angular velocity]

If the agent can observe all four state variables (the angle and angular velocity of each joint), it can control the system.

If the agent cannot use the velocity information, it cannot determine the direction in which to apply torque.
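Why missing velocity information blocks control can be shown with a simplified stand-in. The sketch below uses a single pendulum instead of the two-joint system on the slide, and a textbook energy-pumping heuristic (apply torque in the direction of motion, which adds energy because power is torque times angular velocity) instead of a learned controller; `pumping_torque`, `torque_from_history` and all numbers are hypothetical. The history-based fix at the end is a simple finite-difference illustration of using context, not the complex-valued method from these slides.

```python
def pumping_torque(angle, angular_velocity, max_torque=1.0):
    """Hypothetical energy-pumping rule: push in the direction of motion."""
    if angular_velocity > 0:
        return max_torque
    if angular_velocity < 0:
        return -max_torque
    return 0.0

# Two states that share an angle but move in opposite directions.
state_a = (0.5, +1.0)   # (angle [rad], angular velocity [rad/s])
state_b = (0.5, -1.0)

# With full observation the rule picks opposite torques ...
print(pumping_torque(*state_a), pumping_torque(*state_b))  # 1.0 -1.0

# ... but an angle-only observation is identical for both states,
# so no function of the current observation alone can pick both torques.
print(state_a[0] == state_b[0])  # True

def torque_from_history(previous_angle, angle, dt=0.05, max_torque=1.0):
    """Use context: estimate the velocity sign from two successive angles."""
    estimated_velocity = (angle - previous_angle) / dt
    return pumping_torque(angle, estimated_velocity, max_torque)

print(torque_from_history(0.45, 0.5), torque_from_history(0.55, 0.5))  # 1.0 -1.0
```

The two states are aliased under the angle-only observation; remembering just one previous observation is enough to separate them in this toy case.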

Page 23

Research theme 1: learning in partially observable environments

Complex-valued reinforcement learning enables the agent to overcome this problem by using the context of its behavior.

[Figure: swing-up task; plot with a vertical axis from -100 % to 100 %]

Page 24

reward function
