From Exploration to Planning
Cornelius Weber and Jochen Triesch
Frankfurt Institute for Advanced Studies
Goethe University Frankfurt, Germany
18th International Conference on Artificial Neural Networks
3rd - 6th September 2008, Prague
Reinforcement Learning
[Figure labels: value, actor units]
fixed reactive system that always strives for the same goal
Trained Weights
Reinforcement learning does not use the exploration phase to learn a general model of the environment that would allow the agent to plan a route to any goal.
So let's do this.
Learning
actor
state space
randomly move around the state space
learn world models:
● associative model
● inverse model
● forward model
Learning: Associative Model
weights to associate neighbouring states
use these to find any possible routes between agent and goal
prediction: $\tilde{s}'_i = \sum_j w^{s's}_{ij} \, s_j$
learning: $\Delta w^{s's}_{ij} = \varepsilon \, (s'_i - \tilde{s}'_i) \, s_j$
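The associative learning rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-cell corridor, the one-hot state coding, the helper names (`step`, `learn_associative`, `goal_hill`) and all parameter values are hypothetical. The `goal_hill` function is one simple way to spread activation from a goal through the learned weights; the slides do not specify the exact procedure.

```python
import numpy as np

N_STATES = 5   # hypothetical toy world: a corridor of 5 cells
EPS = 0.05     # learning rate (epsilon in the slides); value is an assumption

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def step(state, action):
    """Toy dynamics: action 0 moves left, 1 moves right; walls block movement."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)

def learn_associative(n_steps=5000, seed=0):
    """Random exploration with the delta rule dW_ij = eps * (s'_i - s~'_i) * s_j."""
    rng = np.random.default_rng(seed)
    W = np.zeros((N_STATES, N_STATES))   # W[i, j]: weight from state j to successor i
    state = 0
    for _ in range(n_steps):
        nxt = step(state, int(rng.integers(2)))
        s, s_next = one_hot(state, N_STATES), one_hot(nxt, N_STATES)
        s_pred = W @ s                                # s~'_i = sum_j W_ij s_j
        W += EPS * np.outer(s_next - s_pred, s)
        state = nxt
    return W

def goal_hill(W, goal, n_iter=200, decay=0.9):
    """Assumed planning step: spread activation backwards from the goal."""
    h0 = one_hot(goal, N_STATES)
    h = h0.copy()
    for _ in range(n_iter):
        h = np.maximum(h0, decay * (W.T @ h))         # goal stays clamped at 1
    return h

W_assoc = learn_associative()
hill = goal_hill(W_assoc, goal=N_STATES - 1)
```

With one-hot states, only weights between states that actually follow each other during exploration become non-zero, and in this toy corridor the resulting hill rises towards the goal, so an agent could follow it by moving to the neighbouring state with the highest activation.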
Learning: Inverse Model
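weights to “postdict” action given state pair
use these to identify the action that leads to a desired state
prediction: $\tilde{a}_k = \sum_{ij} w^{as's}_{kij} \, s'_i \, s_j$
learning: $\Delta w^{as's}_{kij} = \varepsilon \, (a_k - \tilde{a}_k) \, s'_i \, s_j$
Sigma-Pi neuron model (sum of products)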
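The inverse model can be sketched the same way. The Sigma-Pi unit sums over products of the two state inputs, which `np.einsum` expresses directly. The corridor world, the one-hot coding, the function names (`postdict`, `learn_inverse`) and the parameter values below are assumptions made for illustration only.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # hypothetical corridor; actions: 0 = left, 1 = right
EPS = 0.1                    # learning rate; value is an assumption

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def step(state, action):
    """Toy dynamics: walls at both ends block movement."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)

def postdict(W, s, s_next):
    """Sigma-Pi unit: a~_k = sum_ij W_kij * s'_i * s_j."""
    return np.einsum("kij,i,j->k", W, s_next, s)

def learn_inverse(n_steps=5000, seed=0):
    """Random exploration with dW_kij = eps * (a_k - a~_k) * s'_i * s_j."""
    rng = np.random.default_rng(seed)
    W = np.zeros((N_ACTIONS, N_STATES, N_STATES))
    state = 0
    for _ in range(n_steps):
        action = int(rng.integers(N_ACTIONS))
        nxt = step(state, action)
        s = one_hot(state, N_STATES)
        s_next = one_hot(nxt, N_STATES)
        a = one_hot(action, N_ACTIONS)
        a_pred = postdict(W, s, s_next)
        W += EPS * np.einsum("k,i,j->kij", a - a_pred, s_next, s)
        state = nxt
    return W

W_inv = learn_inverse()
# Which action leads from state 2 to state 3?
a_to_right = postdict(W_inv, one_hot(2, N_STATES), one_hot(3, N_STATES))
```

Taking `np.argmax(a_to_right)` then yields the action the agent should execute to reach the desired neighbouring state.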
Learning: Forward Model
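weights to predict state given state-action pair
use these to predict the next state given the chosen action
prediction: $\tilde{s}'_i = \sum_{kj} w^{s'as}_{ikj} \, a_k \, s_j$
learning: $\Delta w^{s'as}_{ikj} = \varepsilon \, (s'_i - \tilde{s}'_i) \, a_k \, s_j$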
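A corresponding sketch of the forward model follows, again as a hedged illustration rather than the authors' code. The toy corridor, one-hot coding, function names (`predict`, `learn_forward`) and parameter values are hypothetical.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # hypothetical corridor; actions: 0 = left, 1 = right
EPS = 0.1                    # learning rate; value is an assumption

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def step(state, action):
    """Toy dynamics: walls at both ends block movement."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)

def predict(W, a, s):
    """s~'_i = sum_kj W_ikj * a_k * s_j."""
    return np.einsum("ikj,k,j->i", W, a, s)

def learn_forward(n_steps=5000, seed=0):
    """Random exploration with dW_ikj = eps * (s'_i - s~'_i) * a_k * s_j."""
    rng = np.random.default_rng(seed)
    W = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    state = 0
    for _ in range(n_steps):
        action = int(rng.integers(N_ACTIONS))
        nxt = step(state, action)
        s = one_hot(state, N_STATES)
        s_next = one_hot(nxt, N_STATES)
        a = one_hot(action, N_ACTIONS)
        s_pred = predict(W, a, s)
        W += EPS * np.einsum("i,k,j->ikj", s_next - s_pred, a, s)
        state = nxt
    return W

W_fwd = learn_forward()
# Predicted next state when moving right from state 2
s_after_right = predict(W_fwd, one_hot(1, N_ACTIONS), one_hot(2, N_STATES))
```

Because the toy world is deterministic, the prediction for each visited state-action pair approaches a one-hot vector for the true successor; in a noisy world it would instead approximate a distribution over successors.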
Discussion
- reinforcement learning ... if no access to full state space
- previous work ... AI-like planners assume links between states
- noise ... wide “goal hills” will have flat slopes
- shortest path ... not taken; how to define?
- biological plausibility ... Sigma-Pi neurons; winner-take-all
- to do: embedding ... learn state space from sensor input
- to do: embedding ... let the goal be assigned naturally
- to do: embedding ... hand-designed planning phases