Takeshi Shibuya, University of Tsukuba
A fundamental study on representation of reward for reinforcement learning in dynamic environments + an introduction to rescue simulation
Outline
Reinforcement learning
・ An interactive learning framework in soft computing
・ A method to learn in dynamic environments
RoboCup Rescue: Overview
・ An application of soft computing
Reinforcement learning (theoretical side)
Learning in dynamic environment
Rescue simulation (application side)
REINFORCEMENT LEARNING
Contents:
・ Reinforcement Learning in psychology
・ Learning in dynamic environments
Reinforcement Learning in psychology
If he finishes pushing the numbers in order, he gets a peanut as a reward.
Kyoto University
Notable things in Reinforcement Learning
The learner acquires suitable behavior from the reward alone.
The trainer does not have to tell the learner how to behave step by step.
What is reinforcement learning (RL)?
The agent increases the values of actions that bring rewards.
The agent selects the action whose value is highest.
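[Figure: agent-environment loop. The agent sends an action to the environment; the environment returns a state and a reward. Inset: value plotted for actions 1 and 2.]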
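The two statements above (raise the value of rewarded actions, pick the highest-valued action) can be sketched roughly as follows. This is only a minimal illustration, not the method from the slides: the learning rate `alpha`, the exploration rate `epsilon`, the two-action reward table, and the function names are all assumptions made for the example.

```python
import random

def select_action(values, epsilon=0.0, rng=random):
    """Pick the highest-valued action; explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def update_value(values, action, reward, alpha=0.1):
    """Move the chosen action's value toward the received reward."""
    values[action] += alpha * (reward - values[action])

def run(steps=200, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]    # one value per action
    rewards = [0.0, 1.0]   # hypothetical environment: only the second action pays off
    for _ in range(steps):
        action = select_action(values, epsilon=0.1, rng=rng)
        update_value(values, action, rewards[action])
    return values
```

After `run()`, the value of the rewarded action has grown, so `select_action(values)` with no exploration returns that action.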
Research theme: learning in dynamic environment
How to learn behavior when the suitable action changes?
[Figure: time axis with Action 1 and Action 2; labels "?" and "Great reward".]
Dividing the reward into two parts:
・ Time-dependent part: to be designed.
・ Time-independent part: to be learnt.
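One possible reading of this split is sketched below. The slides do not say how the two parts are combined, so the additive form (reward = constant part + designed part), the switch time of 100 steps, the reward numbers, and all function names are assumptions for illustration only. The agent learns only the time-independent part and adds the known time-dependent part when choosing an action.

```python
def designed_part(t, action):
    """Hypothetical known, time-dependent reward component."""
    # Assumption: action 0 earns a bonus before t = 100, action 1 afterwards.
    bonus_action = 0 if t < 100 else 1
    return 1.0 if action == bonus_action else 0.0

def true_reward(t, action):
    """Hypothetical environment: unknown constant part plus the designed part."""
    constant = [0.2, 0.3]  # not visible to the agent
    return constant[action] + designed_part(t, action)

def run(steps=200, alpha=0.1):
    learned = [0.0, 0.0]   # estimate of the time-independent part
    chosen = []
    for t in range(steps):
        # Act on the sum of the learned part and the known designed part.
        action = max((0, 1), key=lambda a: learned[a] + designed_part(t, a))
        # Learn only what remains after removing the known part.
        residual = true_reward(t, action) - designed_part(t, action)
        learned[action] += alpha * (residual - learned[action])
        chosen.append(action)
    return learned, chosen
```

Under these assumptions the selected action switches at the step where the designed part changes, without relearning the time-independent estimates.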
Research theme: learning in dynamic environment
The probability of selecting the EAST action increases.
The proposed method enables the agent to adapt to changes in the environment.
The action-selection probabilities switch after the environment changes.
ROBOCUP RESCUE
Contents:
・ Overview of RoboCup Rescue
・ Demonstration
Leagues in RoboCup
Soccer: Robot leagues, Simulation leagues (2D, 3D)
Rescue: Robot leagues, Simulation leagues
Ultimate goal of the RoboCup:
By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rule of the FIFA, against the winner of the most recent World Cup.
(from official site)
RoboCup Rescue
The purpose:
(1) to develop simulators that form the infrastructure of the simulation system and emulate realistic phenomena predominant in disasters.
(2) to develop intelligent agents and robots that are given the capabilities of the main actors in a disaster response scenario.
(from official site)
Virtual Robots simulation (Powered by USARSim)
Agent simulation
RoboCup Rescue: The agent simulation
Buildings: fire, collapse
Roads: traffic movement, blocked roads due to rubble, etc.
Emergency services: fire brigades, ambulance teams, police forces
Agent’s observation and Action
Demonstration / movie
RoboCup Rescue + RL (Team MRL)
Reinforcement learning is employed for controlling the agents.
The details are not shown in the paper.
Team MRL was the champion of RoboCup 2007 (total: 8 teams).
Omid Aghazadeh et al., "Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation," RoboCup 2007: Robot Soccer World Cup XI, Lecture Notes in Computer Science, vol. 5001, pp. 409-416, 2008, DOI: 10.1007/978-3-540-68847-1_42
Summary
The following topics were overviewed:
・ Reinforcement learning: the framework and some research themes
・ RoboCup Rescue: aims of some leagues and demonstrations
Unknown constant quantity / Known varying quantity
Target of learning
Reinforcement Learning as an engineering approach
[Figure: agent-environment loop. The agent (learner) sends an action to the environment; the environment returns a state and a reward.]
Dividing the reward into two parts:
・ Time-dependent part: to be designed.
・ Time-independent part: to be learnt.
Research Theme 1: learning in partially observable environment
[Figure labels: Torque, Angular velocity]
If the agent can observe four state variables (the angle and angular velocity of each joint), it can control the system.
If the agent cannot use velocity information, it cannot determine the direction in which to apply torque.
Complex-valued reinforcement learning enables the agent to overcome this problem by using the context of its behavior.
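The observation problem described above can be illustrated with a small sketch. It shows only why angle-only observations are ambiguous; it does not implement complex-valued reinforcement learning. The `JointState` type, the function names, and the numeric values are hypothetical.

```python
from typing import NamedTuple

class JointState(NamedTuple):
    """Full state of a two-joint system: angle and angular velocity per joint."""
    angle1: float
    velocity1: float
    angle2: float
    velocity2: float

def observe_full(state):
    """Fully observable case: all four state variables are visible."""
    return tuple(state)

def observe_angles_only(state):
    """Partially observable case: velocities are hidden from the agent."""
    return (state.angle1, state.angle2)

# Same joint angles, opposite direction of motion at the first joint.
swinging_one_way = JointState(0.5, -1.0, 0.0, 0.0)
swinging_other_way = JointState(0.5, 1.0, 0.0, 0.0)
```

With `observe_full` the two states are distinguishable, but `observe_angles_only` maps both to the same observation, so an agent acting on that observation alone cannot choose a different torque direction for each.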
[Figure: swing-up task; levels 100 %, 50 %, -50 %, -100 %; labels 1 and 2.]
reward function