Chapter 16. Basal Ganglia Models for Autonomous Behavior Learning
in Creating Brain-Like Intelligence, Sendhoff et al.
Course: Robots Learning from Humans
Kim, Jung Ah
College of Natural Sciences, Interdisciplinary Program in Brain Science
Seoul National University
Contents
1. Introduction
2. Related Works
  – Reinforcement Learning Research
  – Neuroscience Findings
  – Open Questions
3. Basal Ganglia Models
  – Basal Ganglia System Model
  – Basal Ganglia Spiking Neural Network Model
4. Discussion
5. Conclusion
Introduction
• Research objective:
  – To develop an autonomous behavior learning system for machines
• Why?
  – People will demand that a machine select the best option by assessing the situation
• What to investigate:
  – The learning mechanism underlying behavior selection in the animal basal ganglia (BG)
Introduction
Animal Learning: Basal Ganglia (BG)
• Classical Pavlovian conditioning
• Instrumental conditioning
Machine Learning: Reinforcement Learning (RL)
• Temporal difference (TD) learning
Phasic activity of dopamine neurons in the BG
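The usual way to relate TD learning to phasic dopamine activity is through the TD error. The form below is the standard textbook definition, not a formula taken from the slides; the symbols \gamma (discount factor), \alpha (learning rate), and V (state-value estimate) are assumed notation.

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

Under the reward prediction error view, phasic dopamine responses are compared with \delta_t: positive when a reward is better than predicted, near zero when it is fully predicted, and negative when a predicted reward is omitted.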
Related Works (1) – What is RL?
[Figure: agent-environment loop showing action a_t, reward r_t, and next state s_{t+1}]
In reinforcement learning:
• agent interacts with its environment
• perceptions (state), actions, rewards (repeat)
• task is to choose actions to maximize rewards
• complete background knowledge unavailable
Reinforcement learning (RL) is concerned with “how an agent ought to take actions in an environment so as to maximize some notion of long-term reward.”
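The interaction loop described above can be sketched in a few lines. This is a minimal illustration only: the `TwoStateEnv` environment, its reward rule, and the `run_episode` helper are hypothetical names invented for this example, not part of the chapter.

```python
class TwoStateEnv:
    """Hypothetical toy environment with states 0 and 1.

    Taking action 1 while in state 1 pays a reward of 1.0.
    Action 1 also flips the state; action 0 leaves it unchanged.
    """

    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        if action == 1:
            self.state = 1 - self.state
        return reward, self.state


def run_episode(env, policy, steps):
    """Run the state -> action -> reward -> next state loop and sum the rewards."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        action = policy(state)                 # agent chooses a_t from s_t
        reward, next_state = env.step(action)  # environment returns r_t and s_{t+1}
        total += reward
        state = next_state
    return total


always_move = run_episode(TwoStateEnv(), lambda s: 1, steps=10)
never_move = run_episode(TwoStateEnv(), lambda s: 0, steps=10)
```

The agent here is just a fixed function from state to action; a learning agent would adjust that function from the rewards it observes.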
Related Works (2) – BG
Cerebral cortex (frontal, prefrontal, and parietal areas) → BG (striatum) → cerebral cortex (motor area)
• The basal ganglia (BG) are buried deep within the telencephalon
Related Works (2) – Why BG?
• Information from the cortex flows through the direct and indirect pathways in parallel
• The outputs of both pathways ultimately regulate the motor thalamus
• The direct pathway helps to select certain motor actions, while the indirect pathway simultaneously suppresses competing and inappropriate motor programs
Related Works
Machine learning (Reinforcement learning, RL)
Model-based RL
• Uses experiences to construct an internal model of state transitions
• Can solve complex structured tasks
• Dyna and real-time dynamic programming
Model-free RL
• Uses experiences to directly learn one or two simpler quantities, which can then be used to achieve optimal behavior without learning a world model
• Can be used to solve unknown tasks
• Temporal difference (TD) learning and Q-learning
Neuroscience (Basal Ganglia, BG)
Model-based RL
• Dorsomedial striatum
• Prelimbic prefrontal cortex
• Orbitofrontal cortex
• Medial prefrontal cortex
• Parts of the amygdala
Model-free RL (Neuromodulatory system)
• Dorsolateral striatum
• Amygdala
These brain areas are interconnected by parallel loops! The BG run model-free RL independently, and also run model-based RL by receiving modeled, or structured, input from the cortex.
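The model-free / model-based distinction can be made concrete with tabular Q-learning plus optional Dyna-style planning. This is a sketch under assumptions, not the chapter's BG system model: the five-state chain task, the `step` and `train` functions, and the parameters `alpha`, `gamma`, and `epsilon` are all hypothetical choices for illustration.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain task: reward 1.0 on entering the goal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return (1.0 if nxt == GOAL else 0.0), nxt


def train(episodes, planning_steps, seed=0, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Q-learning; planning_steps > 0 adds Dyna-style replay from a learned model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # (state, action) -> (reward, next_state)

    def update(s, a, r, s2):
        # One-step TD update toward r + gamma * max_a' Q(s', a')
        future = 0.0 if s2 == GOAL else gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + future - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            r, s2 = step(s, a)
            update(s, a, r, s2)      # model-free: learn from the real transition
            model[(s, a)] = (r, s2)  # model-based: remember the transition
            for _ in range(planning_steps):
                ps, pa = rng.choice(sorted(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)  # learn from a simulated transition
            s = s2
    return q


q_model_free = train(episodes=200, planning_steps=0)
q_dyna = train(episodes=50, planning_steps=5)
```

With `planning_steps=0` the agent learns values only from real experience. With `planning_steps > 0` it also replays transitions from its stored model, which is the sense in which Dyna combines the two approaches.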
Basal Ganglia (BG) Models (I)
• BG System Model
Two types of test environments:
1) MDP (Markov Decision Process)
2) HOMDP (High Order Markov Decision Process)
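To illustrate the difference between the two environment types, the sketch below assumes that "high order" means the rewarded action depends on the previous observation as well as the current one. The three-observation alternation task and all names (`correct_action`, `rollout`, `best_memoryless_return`) are hypothetical and are not the chapter's test environments.

```python
from itertools import product

OBS = ("A", "B", "C")
ACTIONS = ("to_A", "to_B", "to_C")
STEPS = 8


def correct_action(prev_obs, obs):
    """Second-order rule: at A, go to whichever side was not just visited."""
    if obs == "A":
        return "to_C" if prev_obs == "B" else "to_B"
    return "to_A"


def rollout(policy):
    """policy(prev_obs, obs) -> action; returns the total reward over STEPS steps."""
    prev_obs, obs, total = "C", "A", 0
    for _ in range(STEPS):
        action = policy(prev_obs, obs)
        total += int(action == correct_action(prev_obs, obs))
        prev_obs, obs = obs, action[-1]  # "to_X" moves to observation X
    return total


def best_memoryless_return():
    """Try every policy that looks only at the current observation."""
    best = 0
    for choice in product(ACTIONS, repeat=len(OBS)):
        table = dict(zip(OBS, choice))
        best = max(best, rollout(lambda prev_obs, obs: table[obs]))
    return best


history_return = rollout(correct_action)     # policy that uses (prev_obs, obs)
memoryless_return = best_memoryless_return()
```

No policy keyed on the current observation alone can collect every reward here, while a policy keyed on the pair (previous observation, current observation) can. Augmenting the state with recent history turns the high-order problem back into an ordinary MDP.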
Basal Ganglia (BG) Models (II)
• BG Spiking Neural Network Model
  – Can select and initiate an action for trial and error in the presence of noisy, ambiguous input streams, and then adaptively tune the selection probability and timing of actions
  – The indirect pathway selects an action and the direct pathway initiates the selected action
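Probabilistic selection with adjustable timing can be illustrated with a noisy race-to-threshold abstraction. This is not the chapter's spiking network: the accumulator model, the `race_trial` and `summarize` functions, and the `drives`, `threshold`, and `noise` parameters are all assumptions chosen for a compact example.

```python
import random


def race_trial(drives, rng, threshold=10.0, noise=1.0, max_steps=10_000):
    """Each action accumulates noisy input; the first to reach threshold is executed."""
    levels = [0.0] * len(drives)
    for t in range(1, max_steps + 1):
        for i, drive in enumerate(drives):
            levels[i] += drive + rng.gauss(0.0, noise)
        winner = max(range(len(levels)), key=levels.__getitem__)
        if levels[winner] >= threshold:
            return winner, t  # selected action and its initiation time
    return None, max_steps    # no action crossed the threshold


def summarize(drives, trials=2000, seed=0):
    """Estimate selection probabilities and mean initiation time over many trials."""
    rng = random.Random(seed)
    counts = [0] * len(drives)
    total_time = 0
    for _ in range(trials):
        winner, t = race_trial(drives, rng)
        if winner is not None:
            counts[winner] += 1
        total_time += t
    return [c / trials for c in counts], total_time / trials


weak_probs, weak_time = summarize([0.5, 0.3])
strong_probs, strong_time = summarize([1.0, 0.3])
```

Because the input is noisy, the weaker action still wins on some trials, so selection is probabilistic. Raising the drive of one action makes it win more often and earlier, which is one simple way in which learning could modulate both selection probability and timing.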
Discussion
• The BG system model focuses on the problems associated with relating model-based RL and model-free RL in a single system
• Open questions:
  – Architecture
  – Role of Neuromodulators
  – Neural Mechanisms of the Timing
• Perspective:
  – Associative Interacting Intelligence
Conclusion
• The BG system model illustrates the effectiveness of internal state representation and internal reward for achieving a goal in shorter trials
• The BG spiking neural circuit model has the capacity for probabilistic selection of action and also shows that selection probability and execution timing can be modulated