On Linking Reinforcement Learning with Unsupervised Learning
Cornelius Weber, FIAS
presented at Honda HRI, Offenbach, 17th March 2009
[Figure: actor selecting ‘go left?’ / ‘go right?’ over a state space]
A 1-layer RL model of the BG ... is too simple to handle complex (cortical) input.
Need another layer (or layers) to pre-process complex data.
[Figure: two-layer architecture: action selection (actor over a state space) on top of feature detection]
models’ background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, Reinforcement Learning (1998); Tesauro (1992; 1995)
- reward-modulated Hebbian learning: Triesch, Neural Comput 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comput 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neural Comput 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neural Comput 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
[Figure: agent loop: sensory input in, action out, reward feedback]
scenario: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’;
reward is given if the horizontal bar is at a specific position
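A minimal Python sketch of this scenario (the grid size, the target row, and a distracting vertical bar are illustrative assumptions; the slide specifies only that ‘up’/‘down’ and ‘left’/‘right’ move bars and that reward depends on the horizontal bar's position):

    import numpy as np

    class BarsWorld:
        # Sketch of the slide's scenario: the input image contains one
        # horizontal and one vertical bar; 'up'/'down' move the horizontal
        # bar, 'left'/'right' the vertical one; reward is given only when
        # the horizontal bar sits at the target row (an assumed detail).
        def __init__(self, size=10, target_row=0, seed=0):
            self.size = size
            self.target_row = target_row
            self.rng = np.random.default_rng(seed)
            self.reset()

        def reset(self):
            self.row = int(self.rng.integers(self.size))  # horizontal bar (reward-relevant)
            self.col = int(self.rng.integers(self.size))  # vertical bar (irrelevant)
            return self.observe()

        def observe(self):
            img = np.zeros((self.size, self.size))
            img[self.row, :] = 1.0                        # horizontal bar
            img[:, self.col] = 1.0                        # distractor vertical bar
            return img.ravel()                            # flat, non-negative input

        def step(self, action):                           # 0:up 1:down 2:left 3:right
            if action == 0:   self.row = max(self.row - 1, 0)
            elif action == 1: self.row = min(self.row + 1, self.size - 1)
            elif action == 2: self.col = max(self.col - 1, 0)
            elif action == 3: self.col = min(self.col + 1, self.size - 1)
            reward = 1.0 if self.row == self.target_row else 0.0
            return self.observe(), reward

Because the vertical bar never affects the reward, a model that learns only reward-relevant features should come to ignore it.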
model that learns the relevant features
top layer: SARSA RL
lower layer: winner-take-all feature learning
both layers: learning modulated by the TD error δ (see the sketch below)
[Figure: network: input → feature weights → state space → RL weights → action]
note: non-negativity constraint on weights
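A minimal Python sketch of the two-layer model under these assumptions (the learning rates, ε-greedy exploration, and the exact form of the δ-modulated winner-take-all rule are illustrative; the slide names only the ingredients):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_feat, n_act = 100, 10, 4            # input size, feature units, actions
    gamma, eps, lr = 0.9, 0.1, 0.05             # discount, exploration, learning rate

    W = rng.uniform(0.0, 0.1, (n_feat, n_in))   # feature weights (kept non-negative)
    Q = rng.uniform(0.0, 0.1, (n_act, n_feat))  # RL weights (kept non-negative)

    def state(x):
        # lower layer: winner-take-all over the feature activations
        s = np.zeros(n_feat)
        s[int(np.argmax(W @ x))] = 1.0
        return s

    def act(s):
        # epsilon-greedy action selection on the linear Q readout
        if rng.random() < eps:
            return int(rng.integers(n_act))
        return int(np.argmax(Q @ s))

    def update(x, s, a, r, s_next, a_next):
        # the SARSA TD error delta modulates Hebbian learning in BOTH layers
        delta = r + gamma * (Q[a_next] @ s_next) - (Q[a] @ s)
        Q[a] += lr * delta * s                  # top layer: SARSA update
        k = int(np.argmax(s))                   # winning feature unit
        W[k] += lr * delta * (x - W[k])         # delta-gated feature learning
        np.clip(Q, 0.0, None, out=Q)            # non-negativity constraint
        np.clip(W, 0.0, None, out=W)            # (as noted on the slide)
        return delta

A training loop against the BarsWorld sketch above might then look like:

    env = BarsWorld(size=10)                    # 10x10 image -> n_in = 100
    x = env.reset(); s = state(x); a = act(s)
    for t in range(10000):
        x2, r = env.step(a)
        s2 = state(x2); a2 = act(s2)
        update(x, s, a, r, s2, a2)
        x, s, a = x2, s2, a2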
Energy function: estimation error of the state-action value (the squared TD error)
identities used: the winner-take-all state (one active unit) and the linear Q readout, as sketched below
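A reconstruction in standard SARSA form (assuming the usual symbols: reward r, discount γ, winner-take-all state vector s with one active unit, RL weights q_{ak}):

    E = \tfrac{1}{2}\,\delta^2 ,
    \qquad
    \delta = r + \gamma\, Q(s',a') - Q(s,a) ,
    \qquad
    Q(s,a) = \sum_k q_{ak}\, s_k

Gradient descent on E with respect to q_{ak} gives the SARSA weight update; pushing the gradient through the winner-take-all state (using s_k ∈ {0,1}, Σ_k s_k = 1) gives a feature-weight update in which δ multiplies a Hebbian term, matching the δ-modulation of both layers described above.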
Discussion
- simple model: SARSA on a winner-take-all network with δ-feedback
- learns only the features that are relevant for the action strategy
- theory behind it: an (approximate) derivation from value function estimation
- non-negative coding aids feature extraction
- links unsupervised and reinforcement learning
- a demonstration with more realistic data is needed
Sponsors:
- Bernstein Focus Neurotechnology, BMBF grant 01GQ0840
- EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3
- Frankfurt Institute for Advanced Studies, FIAS