
Chapter 16. Basal Ganglia Models for Autonomous Behavior Learning

in Creating Brain-Like Intelligence, Sendhoff et al.

Course: Robots Learning from Humans

Kim, Jung Ah

College of Natural Sciences, Interdisciplinary Program in Brain Science

Seoul National University


Contents

1. Introduction

2. Related Works

– Reinforcement Learning Research

– Neuroscience Findings

– Open Questions

3. Basal Ganglia Models

– Basal Ganglia System Model

– Basal Ganglia Spiking Neural Network Model

4. Discussion

5. Conclusion

Introduction

• Research objective:

– to develop an autonomous behavior learning system for machines

• Why?

– People will demand that the machine selects the best option by assessing the situation

• What to investigate:

– Learning mechanism underlying behavior selection in the animal basal ganglia (BG)

Introduction

Animal Learning: Basal Ganglia (BG)

• Classical Pavlovian conditioning

• Instrumental conditioning

Machine Learning: Reinforcement Learning (RL)

• Temporal difference (TD) learning

Phasic activity of dopamine neurons in the BG
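The link this slide draws between TD learning and phasic dopamine activity is usually stated through the TD error. As a reminder, and not as the chapter's own formulation, the standard form can be written as below. The value function V, discount factor γ, and learning rate α are not defined in the slides and are assumed here; the reward index follows the r_t notation used in this deck.

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

Under the common interpretation, phasic dopamine responses resemble δ_t: positive when an outcome is better than predicted, near zero when it is fully predicted, and negative when an expected reward is omitted.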

Related Works (1) – What is RL?

[Figure: agent-environment interaction loop, labeled with action a_t, reward r_t, and next state s_{t+1}]

In reinforcement learning:

• agent interacts with its environment

• perceptions (state), actions, rewards (repeat)

• task is to choose actions to maximize rewards

• complete background knowledge unavailable

Reinforcement learning (RL) is concerned with “how an agent ought to take actions in an environment so as to maximize some notion of long-term reward.”
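The interaction loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the chapter: the `TwoArmedBandit` environment, its payoff probabilities, and the epsilon-greedy rule are all hypothetical choices made only to show the perceive, act, reward cycle.

```python
import random


class TwoArmedBandit:
    """Hypothetical toy environment: a single state and two actions."""

    def __init__(self, rng):
        self.rng = rng
        self.state = 0

    def step(self, action):
        # Assumed payoff probabilities: action 1 is rewarded more often.
        p = 0.8 if action == 1 else 0.2
        reward = 1.0 if self.rng.random() < p else 0.0
        return self.state, reward


def run_episode(steps=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = TwoArmedBandit(rng)
    value = [0.0, 0.0]   # running average reward per action
    counts = [0, 0]
    total = 0.0
    state = env.state
    for _ in range(steps):
        # Choose an action: mostly greedy, occasionally exploratory.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: value[a])
        # Act, then observe the next state and the reward.
        state, reward = env.step(action)
        # Update the estimate from experience only (no prior knowledge).
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]
        total += reward
    return value, total
```

Calling `run_episode()` returns the learned per-action estimates and the accumulated reward; the agent starts with no knowledge of the payoffs and improves only through repeated interaction.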

Related Works (2) – BG

[Figure: Cerebral cortex (frontal, prefrontal, and parietal areas) → BG (striatum) → Cerebral cortex (motor area)]

• The basal ganglia (BG) are buried deep within the telencephalon

Related Works (2) – Why BG?

• Information from the cortex flows through the direct and indirect pathways in parallel

• The outputs of both pathways ultimately regulate the motor thalamus

• The direct pathway helps to select certain motor actions while the indirect pathway simultaneously suppresses competing, and inappropriate, motor programs

Related Works

Machine learning (Reinforcement learning, RL)

Model-based RL

• Uses experiences to construct an internal model of state transitions

• Can solve complex structured tasks

• Dyna and real-time dynamic programming

Model-free RL

• Uses experiences to directly learn one or two simpler quantities, which can then achieve the optimal behavior without learning a world model

• Can be used to solve unknown tasks

• Temporal difference (TD) learning and Q-learning

Neuroscience (Basal Ganglia, BG)

Model-based RL

• Dorsomedial striatum

• Prelimbic prefrontal cortex

• Orbitofrontal cortex

• Medial prefrontal cortex

• Parts of the amygdala

Model-free RL (Neuromodulatory system)

• Dorsolateral striatum

• Amygdala

These brain areas are interconnected by parallel loops!

BG are running model-free RL independently, as well as running model-based RL by receiving modeled, or structured, input from cortex.
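To make the model-free side concrete, here is a minimal tabular Q-learning sketch. It is an illustration only: the five-state corridor task, the parameter values, and the function names are assumptions introduced here and do not come from the chapter. The point is that the agent learns action values directly from experienced transitions and never builds a table of state transitions.

```python
import random

N_STATES = 5        # assumed toy corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic corridor dynamics; reward 1.0 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                # Greedy choice with random tie-breaking.
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s2, r, done = step(s, ACTIONS[a])
            # TD target: bootstrap from the best next-state value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

After training, `q[s][1]` (move right) exceeds `q[s][0]` in every non-goal state. A model-based method such as Dyna would additionally store the observed transitions and replay them to update the same values.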


Basal Ganglia (BG) Models (I)

• BG System Model

Two types of test environments:

1) MDP (Markov Decision Process)

2) HOMDP (High Order Markov Decision Process)
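The slides do not define the two environments further. Assuming that "high order" means the outcome depends on more than the current observation, the difference can be sketched as follows: in a second-order task the same observation can lead to different rewards, and stacking recent observations into an internal state removes the ambiguity. The reward rule, trajectory, and helper names below are hypothetical.

```python
from collections import defaultdict, deque


def second_order_reward(prev_obs, obs):
    # Assumed toy rule: arriving at "B" pays off only when coming from "A".
    return 1.0 if (prev_obs, obs) == ("A", "B") else 0.0


def internal_states(observations, order):
    """Yield a tuple of the last `order` observations at each step."""
    window = deque(maxlen=order)
    for obs in observations:
        window.append(obs)
        yield tuple(window)


def rewards_by_state(observations, order):
    """Collect the set of rewards observed under each internal state."""
    seen = defaultdict(set)
    prev = None
    for obs, state in zip(observations, internal_states(observations, order)):
        seen[state].add(second_order_reward(prev, obs))
        prev = obs
    return dict(seen)


trajectory = ["A", "B", "C", "B", "A", "B"]
first_order = rewards_by_state(trajectory, order=1)
second_order = rewards_by_state(trajectory, order=2)
```

With `order=1` the state `("B",)` is associated with both rewards, so a learner that sees only the current observation cannot predict the outcome. With `order=2` every internal state maps to a single reward, which is the sense in which an internal state representation turns the task back into an ordinary MDP.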

Basal Ganglia (BG) Models (II)

• BG Spiking Neural Network Model

– Can select and initiate an action for trial and error in the presence of noisy, ambiguous input streams, and then adaptively tune the selection probability and timing of actions

– Indirect pathway selects an action and the direct pathway initiates the selected action
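The idea of probabilistic selection with adjustable timing can be illustrated with a much simpler stand-in than a spiking network: a race between noisy accumulators, one per candidate action. This is not the chapter's circuit model, and it makes no claim about which pathway does what. The drive values, noise level, threshold, and function names are all assumptions chosen for illustration.

```python
import random


def race_trial(drives, rng, threshold=1.0, noise=0.3, dt=0.01, max_steps=10000):
    """Run one trial. Each channel integrates its drive plus Gaussian noise.

    Returns (index of the first channel to reach threshold, elapsed time),
    or (None, elapsed time) if no channel crosses within max_steps.
    """
    levels = [0.0] * len(drives)
    for step in range(1, max_steps + 1):
        for i, drive in enumerate(drives):
            levels[i] += drive * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            levels[i] = max(levels[i], 0.0)  # activity cannot go negative
        winner = max(range(len(levels)), key=lambda i: levels[i])
        if levels[winner] >= threshold:
            return winner, step * dt
    return None, max_steps * dt


def summarize(drives, trials=300, seed=0):
    """Estimate selection probabilities and mean initiation time."""
    rng = random.Random(seed)
    wins = [0] * len(drives)
    total_time = 0.0
    for _ in range(trials):
        winner, elapsed = race_trial(drives, rng)
        if winner is not None:
            wins[winner] += 1
        total_time += elapsed
    return [w / trials for w in wins], total_time / trials
```

For example, `summarize([1.0, 1.5])` selects the more strongly driven action on most trials but not all of them, and `summarize([2.0, 3.0])` keeps the same ratio of drives while producing earlier responses. In this toy setting, the relative drive shapes selection probability and the overall drive shapes timing.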

Discussion

• The BG system model focuses on the problems associated with relating model-based RL and model-free RL in a single system

• Open questions:

– Architecture

– Role of neuromodulators

– Neural mechanisms of timing

• Perspective:

– Associative Interacting Intelligence

Conclusion

• The BG system model illustrates the effectiveness of internal state representation and internal reward for achieving a goal in shorter trials

• The BG spiking neural circuit model has the capacity for probabilistic selection of action and also shows that selection probability and execution timing can be modulated