Takeshi Shibuya University of Tsukuba shibuya@iit.tsukuba.ac.jp

A fundamental study on the representation of reward for reinforcement learning in dynamic environments, with an introduction to rescue simulation

Outline

Reinforcement learning
・ An interactive learning framework in soft computing
・ A method to learn in dynamic environments

RoboCup Rescue: Overview
・ An application of soft computing

[Diagram: reinforcement learning and learning in dynamic environments (theoretical side); rescue simulation (application side)]

REINFORCEMENT LEARNING

Contents:
・ Reinforcement learning in psychology
・ Learning in dynamic environments

Reinforcement Learning in psychology

If he finishes pushing the numbers in order, he gets a peanut as a reward.

Kyoto University

Notable things in reinforcement learning

The learner acquires suitable behavior from the reward alone.

The trainer does not have to tell the learner how to behave step by step.

What is reinforcement learning (RL)?

The agent increases the values of actions that bring rewards.

The agent selects the action whose value is highest.

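[Figure: agent-environment loop. The environment gives the agent a state and a reward; the agent returns an action (actions 1 and 2 are shown).]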
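The two statements on this slide (values that bring rewards are increased, and the action with the highest value is selected) can be illustrated with a minimal sketch. It assumes a two-action problem with fixed rewards, an incremental value update with a hypothetical `step_size`, and occasional random exploration controlled by `epsilon`; none of these details are given on the slide.

```python
import random


def greedy_action(values):
    """Return the index of the action with the highest value (first on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def update_value(values, action, reward, step_size=0.1):
    """Move the value of the chosen action toward the received reward."""
    values[action] += step_size * (reward - values[action])


def run_bandit(rewards, steps=500, epsilon=0.1, seed=0):
    """Learn action values for fixed per-action rewards (an assumed toy task)."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action now and then.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: select the action whose value is highest.
            action = greedy_action(values)
        update_value(values, action, rewards[action])
    return values
```

Calling `run_bandit([0.0, 1.0])` returns one value per action; once the second action has been tried, its value rises above zero and the greedy choice moves to it.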

Research theme: learning in a dynamic environment

How can the agent learn its behavior when the suitable action changes?

[Figure labels: Action 1, Action 2, "? Great reward"]

Dividing the reward into two parts:
・ Time-dependent part: to be designed.
・ Time-independent part: to be learnt.
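The slide does not say how the two parts are combined. As a sketch only, the example below assumes the observed reward is the sum of a known, designed time-dependent part and an unknown constant part that is learnt from the residual. The function names, the `change_time` parameter, and the additive form are all assumptions made for illustration, not the proposed method itself.

```python
def designed_part(action, t, change_time=50):
    """Hypothetical known, time-dependent reward component.

    Before change_time action 0 receives a bonus; afterwards action 1 does.
    """
    if t < change_time:
        return 1.0 if action == 0 else 0.0
    return 1.0 if action == 1 else 0.0


def learn_constant_part(observations, step_size=0.5):
    """Estimate the time-independent component from (action, t, reward) samples."""
    estimate = {}
    for action, t, reward in observations:
        # Remove the known time-dependent part; what remains is to be learnt.
        residual = reward - designed_part(action, t)
        old = estimate.get(action, 0.0)
        estimate[action] = old + step_size * (residual - old)
    return estimate


def best_action(estimate, t, actions=(0, 1)):
    """Pick the action with the highest learnt-plus-designed reward at time t."""
    return max(actions, key=lambda a: estimate.get(a, 0.0) + designed_part(a, t))
```

Under these assumptions, the learnt part stays valid after the environment changes, so `best_action` switches as soon as the designed part does, without relearning.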

Research theme: learning in a dynamic environment

The probability of selecting the EAST action increases.

The proposed method enables the agent to adapt to a change in the environment.

The action selection probabilities switch after the environment changes.
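The slides do not state how selection probabilities are computed from values. One common choice, assumed here purely for illustration, is a softmax (Boltzmann) rule; the action name `WEST` and the numeric values are hypothetical.

```python
import math


def selection_probabilities(values, temperature=1.0):
    """Softmax (Boltzmann) selection probabilities over action values."""
    highest = max(values.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = {a: math.exp((v - highest) / temperature) for a, v in values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Assumed values before and after a change in the environment.
before = selection_probabilities({"EAST": 0.0, "WEST": 1.0})
after = selection_probabilities({"EAST": 1.0, "WEST": 0.0})
```

With these assumed values, the probability of selecting EAST is below one half before the change and above one half after it.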

ROBOCUP RESCUE

Contents:
・ Overview of RoboCup Rescue
・ Demonstration

Leagues in RoboCup

Soccer: robot leagues, simulation leagues

Rescue: robot leagues, simulation leagues

Ultimate goal of the RoboCup:
By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rule of the FIFA, against the winner of the most recent World Cup. (from the official site)

RoboCup Rescue

The purpose:
(1) to develop simulators that form the infrastructure of the simulation system and emulate realistic phenomena predominant in disasters.
(2) to develop intelligent agents and robots that are given the capabilities of the main actors in a disaster response scenario.
(from the official site)

Virtual Robots simulation (powered by USARSim)

Agent simulation

RoboCup Rescue: The agent simulation

・ Buildings: fire, collapse
・ Roads: traffic movement; roads blocked by rubble, etc.
・ Emergency services: fire brigades, ambulance teams, police forces
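The entities listed above can be pictured as a small data model. The sketch below is hypothetical and is not the RoboCup Rescue simulator's API: the class names, fields, and the `tasks_for` helper are assumptions, and the pairing of services with tasks (fire brigades with fires, ambulance teams with collapsed buildings, police forces with blocked roads) is a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentKind(Enum):
    FIRE_BRIGADE = "fire brigade"
    AMBULANCE_TEAM = "ambulance team"
    POLICE_FORCE = "police force"


@dataclass
class Building:
    name: str
    on_fire: bool = False
    collapsed: bool = False


@dataclass
class Road:
    name: str
    blocked: bool = False  # blocked by rubble


@dataclass
class World:
    buildings: list = field(default_factory=list)
    roads: list = field(default_factory=list)


def tasks_for(kind, world):
    """Hypothetical helper: names of entities that need this kind of agent."""
    if kind is AgentKind.FIRE_BRIGADE:
        return [b.name for b in world.buildings if b.on_fire]
    if kind is AgentKind.AMBULANCE_TEAM:
        return [b.name for b in world.buildings if b.collapsed]
    return [r.name for r in world.roads if r.blocked]
```

For example, a `World` with one burning building and one blocked road yields one task for the fire brigade and one for the police force.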

Agent's observation and action

Demonstration (movie)

RoboCup Rescue + RL (Team MRL)

Reinforcement learning is employed for controlling the agents. The details are not shown in the paper.

Team MRL was the champion of RoboCup 2007 (8 teams in total).

Omid Aghazadeh et al., "Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation," RoboCup 2007: Robot Soccer World Cup XI, Lecture Notes in Computer Science, vol. 5001, 2008, pp. 409-416. DOI: 10.1007/978-3-540-68847-1_42

Summary

The following topics were overviewed:
・ Reinforcement learning: the framework and some research themes
・ RoboCup Rescue: the aims of some leagues, and a demonstration

[Figure labels: unknown constant quantity; known varying quantity; target of learning]

Reinforcement learning as an engineering approach

[Figure: the agent (learner) receives a reward from the environment and sends it an action.]

Dividing the reward into two parts:
・ Time-dependent part: to be designed.
・ Time-independent part: to be learnt.

Research theme 1: learning in a partially observable environment

[Figure labels: torque, angular velocity]

If the agent can observe four state variables (the angle and angular velocity of each joint), it can control the system.

If the agent cannot use the velocity information, it cannot determine the direction in which to apply torque.

Complex-valued reinforcement learning enables the agent to overcome this problem by using the context of its behavior.
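The velocity problem described on this slide can be made concrete with a small sketch. It uses a single joint for brevity and a hypothetical swing-up rule (push in the direction of motion); both are assumptions for illustration, and the code does not implement complex-valued reinforcement learning.

```python
from collections import namedtuple

# Full state of one joint; the slide's system has two such joints.
JointState = namedtuple("JointState", ["angle", "angular_velocity"])


def observe_angle_only(state):
    """Partial observation: the angular velocity is hidden from the agent."""
    return state.angle


def pumping_torque(state, magnitude=1.0):
    """Hypothetical swing-up rule: apply torque in the direction of motion."""
    if state.angular_velocity > 0:
        return magnitude
    if state.angular_velocity < 0:
        return -magnitude
    return 0.0


# Two states with the same angle but opposite velocities.
swinging_forward = JointState(angle=0.3, angular_velocity=2.0)
swinging_back = JointState(angle=0.3, angular_velocity=-2.0)
```

Both states produce the same angle-only observation, yet the rule asks for opposite torques, so an agent that sees only the angle cannot tell which direction to push.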

[Figure: swing-up task. Axis ticks: 100 %, 50 %, -50 %, -100 %. Label: reward function.]
