![Page 1: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/1.jpg)
Design and Implementation of General Purpose
Reinforcement Learning Agents
Tyler Streeter
November 17, 2005
![Page 2: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/2.jpg)
Motivation
Intelligent agents are becoming increasingly important.
![Page 3: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/3.jpg)
Motivation
• Most intelligent agents today are designed for very specific tasks.
• Ideally, agents could be reused for many tasks.
• Goal: provide a general purpose agent implementation.
• Reinforcement learning provides practical algorithms.
![Page 4: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/4.jpg)
Initial Work
Humanoid motor control through neuroevolution (i.e. controlling physically simulated characters with neural networks optimized with genetic algorithms) (videos)
![Page 5: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/5.jpg)
What is Reinforcement Learning?
• Learning how to behave in order to maximize a numerical reward signal
• Very general: almost any problem can be formulated as a reinforcement learning problem
![Page 6: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/6.jpg)
Basic RL Agent Implementation
• Main components:
– Value function: maps states to “values”
– Policy: maps states to actions
• State representation converts observations to features (allows linear function approximation methods for value function and policy)
• Temporal difference (TD) prediction errors train value function and policy
![Page 7: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/7.jpg)
RBF State Representation
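The text of this slide does not state the RBF layout, so the following is only a minimal sketch of how a radial basis function state representation could feed a linear value function. It assumes Gaussian basis functions with evenly spaced centers over a scalar observation; the names `rbf_features`, `linear_value`, `centers`, and `width` are hypothetical and not taken from the original work.

```python
import math

def rbf_features(observation, centers, width):
    """Return one Gaussian activation per RBF center for a scalar observation."""
    return [math.exp(-((observation - c) ** 2) / (2.0 * width ** 2))
            for c in centers]

def linear_value(features, weights):
    """Linear function approximation: the value is a weighted sum of features."""
    return sum(f * w for f, w in zip(features, weights))

# Assumed layout: five centers evenly spaced over [0, 1].
centers = [i / 4 for i in range(5)]
features = rbf_features(0.5, centers, width=0.25)

weights = [0.0] * len(centers)  # untrained value function
value = linear_value(features, weights)
```

The RBF centered on the observation responds most strongly and its neighbors respond less, so nearby observations share features and learning generalizes locally.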
![Page 8: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/8.jpg)
Temporal Difference Learning
• Learning to predict the difference in value between successive time steps.
• Compute TD error: $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$
• Train value function: $V(s_t) \leftarrow V(s_t) + \eta \delta_t$
• Train policy by adjusting action probabilities
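The TD update above can be sketched in a few lines. This is an illustrative sketch, not the author's implementation: it assumes a linear value function over state features and a softmax policy over linear action preferences, which is one common way to adjust action probabilities. The function names and the default values for the discount `gamma` and learning rate `eta` are assumptions.

```python
import math

def dot(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def softmax(preferences):
    """Convert action preferences into action probabilities."""
    highest = max(preferences)
    exps = [math.exp(p - highest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def action_probabilities(policy_weights, features):
    return softmax([dot(w, features) for w in policy_weights])

def td_step(value_weights, policy_weights, features, action, reward,
            next_features, gamma=0.9, eta=0.1):
    """Apply one TD update to the value function and the chosen action's preference."""
    # TD error: delta = r + gamma * V(s') - V(s)
    delta = (reward + gamma * dot(value_weights, next_features)
             - dot(value_weights, features))
    for i, f in enumerate(features):
        value_weights[i] += eta * delta * f            # train value function
        policy_weights[action][i] += eta * delta * f   # train policy
    return delta

# Two features, two actions, everything initially zero.
value_weights = [0.0, 0.0]
policy_weights = [[0.0, 0.0], [0.0, 0.0]]
delta = td_step(value_weights, policy_weights,
                features=[1.0, 0.0], action=0, reward=1.0,
                next_features=[0.0, 1.0])
probs = action_probabilities(policy_weights, [1.0, 0.0])
```

A positive TD error means the outcome was better than predicted, so both the state's value and the probability of the chosen action increase; a negative error decreases them.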
![Page 9: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/9.jpg)
Biological Inspiration
• Biological brains: the proof of concept that intelligence actually works.
• Midbrain dopamine neuron activity is very similar to temporal difference errors.
Figure from Suri, R.E. (2002). TD Models of Reward Predictive Responses in Dopamine Neurons. Neural Networks, 15:523-533.
![Page 10: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/10.jpg)
Internal Predictive Model
An accurate predictive model can temporarily replace actual experience from the environment.
![Page 11: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/11.jpg)
Training a Predictive Model
• Given the previous observation and action, the predictive model tries to predict the current observation and reward.
• Training signals are computed from the error between actual and predicted information.
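As a rough illustration of this training scheme, the sketch below uses a linear model trained with the delta rule. The model type is an assumption (the slides do not say which function approximator the predictive model uses), and the class name `LinearPredictiveModel` and the toy one-dimensional world are hypothetical.

```python
class LinearPredictiveModel:
    """Predicts [current observation..., reward] from [previous observation..., action...]."""

    def __init__(self, num_inputs, num_outputs, learning_rate=0.1):
        self.weights = [[0.0] * num_inputs for _ in range(num_outputs)]
        self.learning_rate = learning_rate

    def predict(self, inputs):
        return [sum(w * x for w, x in zip(row, inputs)) for row in self.weights]

    def train(self, inputs, targets):
        """Move predictions toward targets; return the errors (actual - predicted)."""
        predictions = self.predict(inputs)
        errors = [t - p for t, p in zip(targets, predictions)]
        for row, error in zip(self.weights, errors):
            for i, x in enumerate(inputs):
                row[i] += self.learning_rate * error * x
        return errors

# Toy world (assumed): inputs are [position, moved_left, moved_right],
# targets are [next_position, reward]. Moving right earns a reward of 1.
samples = [
    ([0.2, 1.0, 0.0], [0.1, 0.0]),
    ([0.8, 1.0, 0.0], [0.7, 0.0]),
    ([0.2, 0.0, 1.0], [0.3, 1.0]),
    ([0.8, 0.0, 1.0], [0.9, 1.0]),
]
model = LinearPredictiveModel(num_inputs=3, num_outputs=2)
first_errors = model.train(*samples[0])
for _ in range(500):
    for inputs, targets in samples:
        model.train(inputs, targets)
prediction = model.predict([0.5, 0.0, 1.0])  # an input not seen in training
```

The errors returned by `train` are the training signal described above, and their magnitude can also serve as a rough indicator of how unfamiliar a situation is.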
![Page 12: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/12.jpg)
Planning
• Reinforcement learning from simulated experiences.
• Planning sequences end when uncertainty is too high.
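A planning sequence of this kind can be sketched as a loop that learns from model-generated steps and stops once the model's uncertainty passes a threshold. Everything here is an assumption made for illustration: the `plan` function, the threshold value, and the toy model whose uncertainty grows with distance from the start state are not taken from the original work.

```python
def plan(start_state, model, choose_action, learn,
         max_steps=10, uncertainty_threshold=0.5):
    """Run one simulated sequence; return the number of simulated steps learned from."""
    state = start_state
    steps = 0
    while steps < max_steps:
        action = choose_action(state)
        next_state, reward, uncertainty = model(state, action)
        if uncertainty > uncertainty_threshold:
            break  # the model is too unsure here to keep planning
        learn(state, reward, next_state)
        state = next_state
        steps += 1
    return steps

# Toy stand-ins (assumed): integer states on a line, tabular values.
values = {}

def toy_model(state, action):
    next_state = state + action
    reward = 1.0 if next_state == 2 else 0.0
    uncertainty = 0.2 * next_state  # less familiar farther from the start
    return next_state, reward, uncertainty

def always_right(state):
    return 1

def td_learn(state, reward, next_state, gamma=0.9, eta=0.1):
    delta = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + eta * delta

steps = plan(0, toy_model, always_right, td_learn)
```

Because `learn` is called with simulated transitions, the same TD update used for real experience also drives planning; only the source of the experience changes.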
![Page 13: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/13.jpg)
Curiosity Rewards
• Intrinsic drive to explore unfamiliar states.
• Provide extra rewards proportional to uncertainty.
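A minimal sketch of a curiosity reward, assuming uncertainty is tracked as a running average of prediction error magnitude. The slides do not say how uncertainty is estimated, so the `UncertaintyEstimate` class, the `scale` factor, and the averaging rate are all hypothetical.

```python
class UncertaintyEstimate:
    """Running average of prediction error magnitude (an assumed uncertainty proxy)."""

    def __init__(self, initial=1.0, rate=0.5):
        self.value = initial  # start out maximally uncertain
        self.rate = rate

    def update(self, prediction_error):
        self.value += self.rate * (abs(prediction_error) - self.value)
        return self.value

def total_reward(external_reward, uncertainty, scale=1.0):
    """External reward plus a curiosity bonus proportional to uncertainty."""
    return external_reward + scale * uncertainty

estimate = UncertaintyEstimate()
unfamiliar = total_reward(0.0, estimate.value, scale=0.5)

# After several accurate predictions the bonus shrinks.
for _ in range(3):
    estimate.update(0.0)
familiar = total_reward(0.0, estimate.value, scale=0.5)
```

As predictions improve, the bonus decays toward zero, so the agent's interest moves on to states it still predicts poorly.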
![Page 14: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/14.jpg)
Verve
• Verve is an Open Source implementation of curious, planning reinforcement learning agents.
• Intended to be an out-of-the-box solution, e.g. for game development or robotics.
• Distributed as a cross-platform library written in C++.
• Agents can be saved to and loaded from XML files.
• Includes Python bindings.
http://verve-agents.sourceforge.net
![Page 15: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/15.jpg)
1D Hot Plate Task
![Page 16: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/16.jpg)
1D Signaled Hot Plate Task
![Page 17: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/17.jpg)
2D Hot Plate Task
![Page 18: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/18.jpg)
2D Maze Task #1
![Page 19: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/19.jpg)
2D Maze Task #2
![Page 20: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/20.jpg)
Pendulum Swing-Up Task
![Page 21: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/21.jpg)
Pendulum Neural Networks
![Page 22: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/22.jpg)
Cart-Pole/Inverted Pendulum Task
![Page 23: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/23.jpg)
Planning in Maze #2
![Page 24: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/24.jpg)
Curiosity Task
![Page 25: Design and Implementation of General Purpose Reinforcement Learning Agents Tyler Streeter November 17, 2005.](https://reader036.fdocuments.in/reader036/viewer/2022062806/5697bfc91a28abf838ca8c9c/html5/thumbnails/25.jpg)
Future Work
• More applications (real robots, game development, interactive training)
• Hierarchies of motor programs
– Constructed from low-level primitive actions
– High-level planning
– High-level exploration