Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.

Show Me the Money!

Dmitry Kit

Outline

Overview Reinforcement Learning Other Topics Conclusions

Learning Models

Hebbian Learning: strengthens the relationship between neurons that exhibit similar activity patterns and/or are in close proximity

Might explain topological features of the brain: population coding, basis function learning, area allocation to different functions

Reinforcement Learning (RL): strengthens the relationship between choices that are causally connected in obtaining some reward
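As a rough illustration of the Hebbian idea on this slide, the sketch below strengthens a connection only when the two neurons are active together. The function name, the learning rate, and the activity values are assumptions made for illustration, not taken from the talk.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Increase the weight in proportion to correlated pre/post activity."""
    return weight + learning_rate * pre * post


# Hypothetical activity: one pair of neurons fires together, the other does not.
w_correlated = 0.0
w_uncorrelated = 0.0
for _ in range(10):
    w_correlated = hebbian_update(w_correlated, pre=1.0, post=1.0)
    w_uncorrelated = hebbian_update(w_uncorrelated, pre=1.0, post=0.0)

print(w_correlated, w_uncorrelated)
```

Only the pair with similar activity ends up with a stronger connection. No reward enters the update, which is the contrast with RL.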

RL Framework and the Brain

Reward signal representation: Dopamine

Local action selection structures: Lateral Intra-parietal Area (LIP), Supplementary Eye Field (SEF), Frontal Eye Field (FEF)

Global mechanism for action selection: Basal Ganglia

Outline

Overview Reinforcement Learning

Reward Signal (Dopamine) Decision Variables (SEF, LIP, Other) Global Mechanism for Choice (Basal Ganglia)

Other Topics Conclusions

Reward Signal (Dopamine)

Located in the Substantia Nigra Pars Compacta (SNc)

Modulates neurons in many different regions

Tonic low frequency activity

Only sends the error signal between expected and actual rewards

RL and Dopamine
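A minimal sketch of the error signal described above, assuming a simple delta-rule model in which the expectation moves toward the received reward. The learning rate, the trial count, and the function names are illustrative assumptions, not values from the talk.

```python
def prediction_error(expected, actual):
    """Error signal: difference between actual and expected reward."""
    return actual - expected


def run_trials(reward, n_trials, learning_rate=0.5):
    """Repeat the same rewarded trial and track the error on each one."""
    expected = 0.0
    errors = []
    for _ in range(n_trials):
        delta = prediction_error(expected, reward)
        errors.append(delta)
        expected += learning_rate * delta  # move expectation toward the reward
    return expected, errors


expected, errors = run_trials(reward=1.0, n_trials=10)
# After learning, omitting the expected reward yields a negative error.
omission_error = prediction_error(expected, 0.0)
print(errors[0], errors[-1], omission_error)
```

In this model the error is large for an unexpected reward, shrinks toward zero once the reward is fully expected, and turns negative when an expected reward is withheld.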

Decision Variables in LIP

Contains neurons that code for: expected gain relative rewards between different actions

This activity was observed before the choices were presented and before the movement was made, suggesting that these neurons were used to decide on the appropriate action

LIP Neuron Activity

Expectation of:

High reward produced high firing frequency (black line)

Low reward produced low firing frequency (gray line)

The firing rate was correlated with gain expectation early in the trial

Overall Neural Activity in LIP

A large portion of the examined neurons showed significant activity related to gain expectation, outcome probability, and estimated value

This activity was mostly exhibited in the early part of the trial

These neurons were also modulated by the actual movement

Neural Features of SEF

Three types of neurons found:

Active upon failure to perform the task: not responsible for executing actions; not related to spatial stimuli; possible error signal coding

Active upon success: not a response to visual stimuli; not responsible for motor control; related to some internal coding of performance

Active before and during the delivery of reinforcement: possibly interconnected with other regions of the brain; seem to code expected reward versus actual reward received

Function of SEF

Monitoring and controlling: perception and production systems during decision making; error correction; production of responses that are not well learned; overcoming habitual responses

Evidence: neurons do not generate eye movements; they monitor performance and reward

Reward Coding in Other Structures

Neurons in orbitofrontal cortex show selectivity to the type of physical reward (solid, liquid, etc.) and distinguish between rewards and punishers

Some neurons in the amygdala respond to the magnitude of reward

Local Choices

Multiple areas are in charge of decision making: Frontal Eye Field (FEF), LIP, Supplementary Eye Field (SEF), etc.

Might have different goals

Need a global mechanism to arbitrate between these different goals

Physiology (Basal Ganglia)

Located at the base of the cerebrum

Consists of:

Caudate Nucleus (CD) and Putamen (PUT), collectively called the striatum: input from the cerebral cortex and part of the thalamus

Globus Pallidus: External Segment (GPe) and Internal Segment (GPi)

Subthalamic Nucleus (STN): receives direct input from the cerebral cortex

Substantia Nigra: Pars reticulata (SNr) and Pars compacta (SNc)

Output stations (GPi and SNr): to the thalamus and brain stem motor areas

Anatomical Locations (BG)

BG: Function

Controls thalamocortical networks: mainly involved in hand or arm movements

Controls brain stem motor networks:

Superior colliculus: eye-head orienting

Pedunculopontine nucleus: locomotion

Periaqueductal gray: vocalization and autonomic responses

BG-SC connection

Exists in many lower mammals

Method of control: CD inhibits neural activity in SNr; SNr projects inhibitory connections to SC

Inhibition is the main method of control

The appropriate action is selected by inhibiting all except the desired action
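The CD to SNr to SC chain above can be sketched as a gate. This is a toy model under stated assumptions: the tonic SNr level, the subtraction rule, and the action names are hypothetical, chosen only to show selection by removing inhibition.

```python
def select_by_disinhibition(sc_drive, cd_activity, snr_tonic=1.0):
    """Return SC output per action after SNr gating.

    SNr inhibits every SC channel by default; CD activity on a channel
    reduces the SNr inhibition there and releases that action only.
    """
    sc_output = {}
    for action, drive in sc_drive.items():
        snr = max(0.0, snr_tonic - cd_activity.get(action, 0.0))
        sc_output[action] = max(0.0, drive - snr)
    return sc_output


# Hypothetical candidate saccades with their excitatory drive to SC.
sc_drive = {"left": 0.8, "right": 0.9, "up": 0.7}
# CD is active only for the desired action.
cd_activity = {"left": 1.0}

output = select_by_disinhibition(sc_drive, cd_activity)
print(output)
```

Note that "right" has the strongest drive but stays suppressed: in this sketch the gate decides which action passes and does not amplify anything.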

Neural Properties of BG

Contains memory-guided neurons

Contains neurons that code the expectation of task-specific events

SNr: only affected by planned movements; the response fields of its neurons are the same as those of the SC neurons they connect to

Circuit Diagram

Coordinated Activity Model

Use GPe to select just the activity you need (Focus)

Use STN to inhibit a planned future activity (Sequencing)

Might be an incorrect model if we emphasize the direct cortical input to the STN: direct control over movement suppression

Learning of Sequential Procedures

Frontoparietal association cortices and the anterior part of the basal ganglia learn new sequences: uses visuospatial coordinates

Motor-premotor cortices and the mid-posterior part of the basal ganglia exploit learned sequences: uses motor coordinates

BG and Decision Making

Ventral striatum receives input from neocortical areas (cognition) and limbic (emotional) areas

Speed of saccades is related to emotional or motivational state

As with SEF and LIP many BG neurons respond to the expectation of reward

Uses dopaminergic neurons to: modulate the selectivity of individual neurons; modulate the response magnitude of individual neurons

Circuit Diagram Revisited

Consequences of BG Disorders

Involuntary movement Random movement

Visually guided saccades: shorter saccades

Problems with coordinated movements

Response deficit in memory-guided saccades

Trouble holding fixation, especially if STN is damaged

Inability to learn sequential procedures

Lack of motivation to perform actions

Why Disinhibition?

Possibly an evolutionary by-product

Need a gating mechanism not an enhancement mechanism

Outline

Overview Reinforcement Learning Other Topics Attention Vs. Reward Credit Assignment Problem Conclusions

Attention or Reward?

Attention is a more global concept than reward: defined as the study of vigilance, selective processing of stimuli, and control systems for complex behavior

Attention can modulate neurons before the onset of stimuli, just like reward expectation neurons

Attention is dependent on task difficulty

How does one distinguish between a reward expectation signal and attention to a particular stimulus at a single neuron? Some studies of attention might have been looking at the same neural signal as those studying reward

Provide better definitions for reward and attention: attention might be defined in terms of rewards

The Credit Assignment Problem

What chain of actions resulted in reward?

Which of the actions to the right got you your steak?

Action/order 1st 2nd 3rd 4th 5th

Take a car

Take a bike

Open Door

Sit down

Shift in seat

Point at Menu Item

Tapped the table

Solution

Start from the point of receiving the reward: recently shown that in rats the hippocampus replays their daily experiences backwards

Back-propagate the reward at a discounted rate: in monkeys, the neurons coding for reward expectation decreased their activity when the delay between the cue and the reward was increased

Converges to optimal policy if: any action has some positive probability of being chosen; infinite time
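One way to picture the backward, discounted propagation of reward is the sketch below. It walks an episode from the rewarded step back to the first action and shrinks the credit at each step. The discount factor of 0.9 and the particular action sequence are assumptions for illustration.

```python
def discounted_credit(actions, reward, discount=0.9):
    """Assign credit to each action, starting from the reward and moving backwards."""
    credit = {}
    value = reward
    for action in reversed(actions):
        credit[action] = value
        value *= discount  # earlier actions receive less credit
    return credit


# Hypothetical episode that ended with the steak arriving.
episode = ["Take a car", "Open door", "Sit down", "Point at menu item"]
credit = discounted_credit(episode, reward=1.0)
for action in episode:
    print(action, round(credit[action], 3))
```

The action closest to the reward gets the most credit. Over many varied episodes, actions that are unrelated to the reward would not consistently sit near it, so their accumulated credit would stay low.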

Infinite Time

Humans do not need an optimal policy

The inherent randomness in our environment, behaviors, and tasks makes it unlikely that a set of truly unrelated actions coincides frequently

Conclusion

Choose a set of actions (e.g., SEF, LIP, etc.)

Execute a subset of actions that do not violate physical limitations (Basal Ganglia)

Compare the final result against the expected result (Dopamine)

Try to do the task again

Almost every task that we execute uses the eyes to locate the target of interest and therefore it is not surprising that the eye is closely related to the current task.

Might be a huge oversimplification: "Correlation does not imply causation"

More experiments are needed to show these relationships
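The choose, execute, compare, repeat loop from the conclusion can be sketched as a small trial-and-error learner. This is a heavily simplified assumption-laden model: it uses epsilon-greedy choice (so every action keeps some positive probability of being chosen), fixed hypothetical reward values, and a delta-rule update. It does not model physical limitations or any specific brain area.

```python
import random


def run_loop(rewards, n_trials=2000, epsilon=0.1, learning_rate=0.1, seed=0):
    """Repeat: choose an action, observe its reward, compare with expectation."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)  # expected reward per action
    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: any action can be chosen.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the highest expected reward.
            action = max(range(len(rewards)), key=lambda a: values[a])
        error = rewards[action] - values[action]  # actual minus expected
        values[action] += learning_rate * error
    return values


# Hypothetical fixed rewards for three candidate actions.
values = run_loop(rewards=[0.2, 1.0, 0.5])
best = max(range(len(values)), key=lambda a: values[a])
print(best, [round(v, 2) for v in values])
```

With a finite number of trials the estimates for rarely chosen actions remain approximate, which is in line with the point that a good-enough policy, rather than an optimal one, may be all that is needed.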

The End

Thank You
