Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.
![Page 1: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/1.jpg)
Show Me the Money!
Dmitry Kit
![Page 2: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/2.jpg)
Outline
- Overview
- Reinforcement Learning
- Other Topics
- Conclusions
![Page 3: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/3.jpg)
Learning Models
- Hebbian Learning
  - Strengthens the relationship between neurons that exhibit similar activity patterns and/or are in close proximity
  - Might explain topological features of the brain
    - Population coding
    - Basis function learning
    - Area allocation to different functions
- Reinforcement Learning (RL)
  - Strengthens the relationship between choices that are causally connected in obtaining some reward
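As a rough illustration of the two learning rules on this slide, the sketch below contrasts a plain Hebbian weight update with a reward-gated variant. The function names, the learning rate `lr`, and the choice to scale the Hebbian term by a reward error are illustrative assumptions, not something stated in the slides.

```python
def hebbian_update(weight, pre, post, lr=0.1):
    """One Hebbian step: the weight grows when pre and post are co-active."""
    return weight + lr * pre * post


def reward_gated_update(weight, pre, post, reward_error, lr=0.1):
    """Assumed RL-style variant: the co-activity term is scaled by a reward error.

    A positive error strengthens the connection, a negative error weakens it,
    and a zero error (reward fully expected) leaves it unchanged.
    """
    return weight + lr * reward_error * pre * post
```

With this form, two co-active units always strengthen their connection under the Hebbian rule, whereas the reward-gated rule changes the weight only when the outcome differs from what was expected.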
![Page 4: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/4.jpg)
RL Framework and the Brain
- Reward signal representation
  - Dopamine
- Local action selection structures
  - Lateral Intraparietal Area (LIP)
  - Supplementary Eye Field (SEF)
  - Frontal Eye Field (FEF)
- Global mechanism for action selection
  - Basal Ganglia
![Page 5: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/5.jpg)
Outline
- Overview
- Reinforcement Learning
  - Reward Signal (Dopamine)
  - Decision Variables (SEF, LIP, Other)
  - Global Mechanism for Choice (Basal Ganglia)
- Other Topics
- Conclusions
![Page 6: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/6.jpg)
Reward Signal (Dopamine)
- Located in the Substantia Nigra Pars Compacta (SNc)
- Modulates neurons in many different regions
- Tonic low-frequency activity
- Sends only the error signal between expected and actual rewards
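The error signal between expected and actual reward can be sketched as a simple prediction-error update. This is a minimal sketch, assuming a Rescorla-Wagner style rule; the names `prediction_error` and `learn_expectation` and the learning rate `alpha` are hypothetical and not taken from the slides.

```python
def prediction_error(expected, actual):
    """Error signal: positive when the reward is better than expected."""
    return actual - expected


def learn_expectation(rewards, alpha=0.5, initial=0.0):
    """Update a reward expectation trial by trial (assumed update rule).

    Returns the final expectation and the error observed on each trial.
    """
    expected = initial
    errors = []
    for actual in rewards:
        delta = prediction_error(expected, actual)
        errors.append(delta)
        expected += alpha * delta  # move the expectation toward the outcome
    return expected, errors
```

Under this rule the error shrinks as a repeated reward becomes expected, and an omitted reward that was fully expected produces a negative error.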
![Page 7: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/7.jpg)
RL and Dopamine
![Page 8: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/8.jpg)
Decision Variables in LIP
- Contains neurons that code for:
  - Expected gain
  - Relative rewards between different actions
- This activity was observed before the choices were actually presented and before a movement was made
  - Suggesting that these neurons were used to decide on the appropriate action
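One way to make the two decision variables concrete is sketched below. It assumes that expected gain is reward probability times reward magnitude and that relative reward is each action's share of the total expected gain; the slides do not give these definitions, so both formulas and all names are assumptions.

```python
def expected_gain(probability, magnitude):
    """Assumed definition: probability of reward times its magnitude."""
    return probability * magnitude


def relative_gains(options):
    """Map each action to its share of the total expected gain.

    `options` maps an action name to a (probability, magnitude) pair.
    """
    gains = {name: expected_gain(p, m) for name, (p, m) in options.items()}
    total = sum(gains.values())
    if total == 0:
        return {name: 0.0 for name in gains}
    return {name: gain / total for name, gain in gains.items()}
```

For two equally likely targets where one pays three times as much as the other, the larger target would hold three quarters of the relative gain in this sketch.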
![Page 9: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/9.jpg)
LIP Neuron Activity
- Expectation of:
  - High reward produced a high firing frequency (black line)
  - Low reward produced a low firing frequency (gray line)
- The firing rate was correlated with gain expectation early in the trial
![Page 10: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/10.jpg)
Overall Neural Activity in LIP
- A large portion of the examined neurons showed significant activity related to gain expectation, outcome probability, and estimated value
  - This activity was mostly exhibited in the early part of the trial
- These neurons were also modulated by the actual movement
![Page 11: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/11.jpg)
Neural Features of SEF
- Three types of neurons found
  - Active upon failure to perform the task
    - Not responsible for executing actions
    - Not related to spatial stimuli
    - Possible error signal coding
  - Active upon success
    - Not a response to visual stimuli
    - Not responsible for motor control
    - Related to some internal coding of performance
  - Active before and during the delivery of reinforcement
    - Possibly interconnected with other regions of the brain
    - Seem to code expected reward versus actual reward received
![Page 12: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/12.jpg)
Function of SEF
- Monitoring and controlling:
  - Perception and production systems during decision making
  - Error correction
  - Production of responses that are not well learned
  - Overcoming habitual responses
- Evidence:
  - Neurons do not generate eye movements
  - Monitor performance and reward
![Page 13: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/13.jpg)
Reward Coding in Other Structures
- Neurons in orbitofrontal cortex:
  - Show selectivity to the type of physical reward
    - Solid
    - Liquid
    - Etc.
  - Distinguish between rewards and punishers
- Some neurons in the amygdala respond to the magnitude of reward
![Page 14: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/14.jpg)
Local Choices
- Multiple areas in charge of decision making
  - Frontal Eye Field (FEF)
  - LIP
  - Supplementary Eye Field (SEF)
  - Etc.
- Might have different goals
- Need a global mechanism to arbitrate between these different goals
![Page 15: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/15.jpg)
Physiology (Basal Ganglia)
- Located at the base of the cerebrum
- Consists of:
  - Caudate Nucleus (CD) and Putamen (PUT) (collectively called the striatum)
    - Input from the cerebral cortex and part of the thalamus
  - Globus Pallidus
    - External Segment (GPe)
    - Internal Segment (GPi)
  - Subthalamic Nucleus (STN)
    - Receives direct input from the cerebral cortex
  - Substantia Nigra
    - Pars reticulata (SNr)
    - Pars compacta (SNc)
  - Output stations (GPi and SNr)
    - To the thalamus and brain stem motor areas
![Page 16: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/16.jpg)
Anatomical Locations (BG)
![Page 17: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/17.jpg)
BG: Function
- Controls:
  - Thalamocortical networks
    - Mainly involved in hand or arm movements
  - Brain stem motor networks
    - Superior colliculus: eye-head orienting
    - Pedunculopontine nucleus: locomotion
    - Periaqueductal gray: vocalization, autonomic responses
![Page 18: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/18.jpg)
BG-SC connection
- Exists in many lower mammals
- Method of control
  - CD inhibits neural activity in SNr
  - SNr projects inhibitory connections to SC
- Inhibition is the main method of control
  - The appropriate action is selected by inhibiting all except the desired action
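The double-inhibition pathway described on this slide (CD inhibits SNr, SNr inhibits SC) can be sketched as a gate: SNr holds every SC channel shut by default, and CD activity on one channel releases only that channel. The rate model, the tonic level `snr_tonic`, and the function name `sc_output` are illustrative assumptions, not measurements or a model from the slides.

```python
def sc_output(cortical_drive, cd_activity, snr_tonic=1.0):
    """Return SC activity per channel under tonic SNr inhibition.

    cortical_drive: excitatory input arriving at each SC channel.
    cd_activity:    caudate activity per channel; it suppresses the SNr cell
                    that inhibits the same SC channel (disinhibition).
    """
    # SNr fires tonically unless the caudate inhibits it.
    snr = [max(0.0, snr_tonic - cd) for cd in cd_activity]
    # SC responds only where its drive exceeds the remaining SNr inhibition.
    return [max(0.0, drive - inhibition)
            for drive, inhibition in zip(cortical_drive, snr)]
```

In this sketch three equally driven channels stay silent until the caudate becomes active on one of them, at which point only that channel passes its drive through.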
![Page 19: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/19.jpg)
Neural Properties of BG
- Contains memory-guided neurons
- Contains neurons that code expectation of task-specific events
- SNr
  - Only affected by planned movements
  - Response fields of its neurons are the same as those of the neurons they connect to in SC
![Page 20: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/20.jpg)
Circuit Diagram
![Page 21: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/21.jpg)
![Page 22: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/22.jpg)
Coordinated Activity Model
- Use GPe to select just the activity you need (Focus)
- Use STN to inhibit a planned future activity (Sequencing)
- Might be an incorrect model if we emphasize the direct cortical input to the STN
  - Direct control over movement suppression
![Page 23: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/23.jpg)
Learning of Sequential Procedures
- Frontoparietal association cortices and the anterior part of the basal ganglia learn new sequences
  - Uses visuospatial coordinates
- Motor-premotor cortices and the mid-posterior part of the basal ganglia exploit learned sequences
  - Uses motor coordinates
![Page 24: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/24.jpg)
BG and Decision Making
- Ventral striatum receives input from neocortical (cognitive) and limbic (emotional) areas
- Speed of saccades is related to emotional or motivational state
- As with SEF and LIP, many BG neurons respond to the expectation of reward
- Uses dopaminergic neurons to:
  - Modulate selectivity of individual neurons
  - Modulate response magnitude of individual neurons
![Page 25: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/25.jpg)
Circuit Diagram Revisited
![Page 26: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/26.jpg)
Consequences of BG Disorders
- Involuntary movement
  - Random movement
- Visually guided saccades
  - Shorter saccades
- Problems with coordinated movements
- Response deficits in memory-guided saccades
- Trouble holding fixation
  - Especially if STN is damaged
- Inability to learn sequential procedures
- Lack of motivation to perform actions
![Page 27: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/27.jpg)
Why Disinhibition?
Possibly an evolutionary by-product
Need a gating mechanism not an enhancement mechanism
![Page 28: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/28.jpg)
Outline
- Overview
- Reinforcement Learning
- Other Topics
  - Attention vs. Reward
  - Credit Assignment Problem
- Conclusions
![Page 29: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/29.jpg)
Attention or Reward?
- Attention is a more global concept than reward
  - Defined as the study of vigilance, selective processing of stimuli, and control systems for complex behavior
- Attention can modulate neurons before the onset of stimuli, just like reward expectation neurons
- Attention is dependent on task difficulty
- How does one distinguish between a reward expectation signal and attention to a particular stimulus at a single neuron?
  - Some studies of attention might have been looking at the same neural signal as those studying reward
- Provide better definitions for reward and attention
  - Attention might be defined in terms of rewards
![Page 30: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/30.jpg)
The Credit Assignment Problem
What chain of actions resulted in reward?
Which of the actions to the right got you your steak?
Action/order 1st 2nd 3rd 4th 5th
Take a car
Take a bike
Open Door
Sit down
Shift in seat
Point at Menu Item
Tapped the table
![Page 31: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/31.jpg)
Solution
- Start from the point of receiving the reward
  - Recently shown that in rats the hippocampus replays their daily experiences backwards
- Back-propagate the reward at a discounted rate
  - In monkeys, the neurons coding for reward expectation decreased their activity when the delay between the cue and reward was increased
- Converges to the optimal policy if:
  - Any action has some positive probability of being chosen
  - Infinite time
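The two ideas on this slide can be sketched in a few lines: walk the action sequence backwards from the reward, giving each earlier action a discounted share of the credit, and choose actions with an epsilon-greedy rule so that every action keeps some positive probability of being selected. The function names, the discount `gamma`, the exploration rate `epsilon`, and the example sequence (built from actions listed on the previous slide) are all illustrative assumptions. The sketch also assumes each action appears only once in the sequence.

```python
import random


def backpropagate_reward(actions, reward, gamma=0.5):
    """Assign discounted credit to each action, starting from the reward."""
    credit = {}
    discounted = reward
    for action in reversed(actions):  # replay the experience backwards
        credit[action] = discounted
        discounted *= gamma           # earlier actions receive less credit
    return credit


def epsilon_greedy(values, epsilon=0.1, rng=random):
    """Pick the best-valued action, but explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(list(values))  # any action can still be chosen
    return max(values, key=values.get)


sequence = ["Take a car", "Open door", "Sit down", "Point at menu item"]
credit = backpropagate_reward(sequence, reward=1.0, gamma=0.5)
```

With `gamma=0.5`, the action closest to the reward receives full credit and each earlier action receives half the credit of the one after it.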
![Page 32: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/32.jpg)
Infinite Time
- Humans do not need an optimal policy
- The inherent randomness in our environment, behaviors, and tasks makes it unlikely that a set of truly unrelated actions will coincide frequently
![Page 33: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/33.jpg)
Conclusion
- Choose a set of actions (e.g., SEF, LIP, etc.)
- Execute a subset of actions that do not violate physical limitations (Basal Ganglia)
- Compare the final result against the expected result (Dopamine)
- Try to do the task again
- Almost every task that we execute uses the eyes to locate the target of interest, so it is not surprising that the eye is closely related to the current task.
- Might be a huge oversimplification:
  - "Correlation does not imply causation"
  - More experiments are needed to show these relationships
![Page 34: Show Me the Money! Dmitry Kit. Outline Overview Reinforcement Learning Other Topics Conclusions.](https://reader030.fdocuments.in/reader030/viewer/2022032415/56649efc5503460f94c0f7cd/html5/thumbnails/34.jpg)
The End
Thank You