Neurobiological Models and Research Themes
Matthew J. Crossley
Department of Psychological and Brain Sciences, University of California, Santa Barbara, 93106
Talk Goals
I. A neurobiological model of appetitive instrumental conditioning
II. Overview of my research
III. Contribution to the Ivry lab
Why Instrumental Conditioning?
• Category learning is the Ashby lab's bread and butter
• Information-Integration category-learning is a procedural skill
• Appetitive Instrumental Conditioning is a procedural skill
Part I Outline
• Procedural Skills
• Model Architecture
• Instrumental Conditioning Applications
• Instrumental Conditioning Summary
Outline
• Procedural Skills
• Model Architecture
• Instrumental Conditioning Applications
• Category Learning Applications
• Instrumental Conditioning Summary
Procedural Skills
• Learned incrementally from feedback
• Model-free reinforcement learning
• Habitual control
• E.g., riding a bike or playing an instrument
• E.g., radiology
Procedural Skills Depend on the Basal Ganglia
• Basal ganglia are a collection of subcortical nuclei
• Interconnected with cortex in well-defined circuits
• Striatum is a major input structure
Procedural Learning Depends on the Striatum
• Single-cell recordings: Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995
• Lesion studies: Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992
• Neuropsychological patient studies: Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996
• Neuroimaging: Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011
Striatal Neurons
• Medium Spiny Projection Neurons (MSNs): 96%
• GABA Interneurons: 2%
• TANs (Cholinergic Interneurons): 2%
The TANs are of Particular Interest
• Tonically active, and pause in response to excitatory input
• Presynaptically inhibit cortical input to MSNs
• Get major input from CM-Pf (thalamus)
• Learn to pause to stimuli that predict reward (requires dopamine)
Outline
• Procedural Skills
• Model Architecture
• Instrumental Conditioning Applications
• Category Learning Applications
• Closing Remarks
Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses
Pf-TAN Synapse
CTX-MSN Synapse
Ashby and Crossley (2011)
Response and Feedback
• Model responds if SMA crosses threshold
• Model is given feedback after every trial
CTX-MSN Synaptic Modification Requires a TANs Pause
• Synaptic Strengthening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Elevated DA levels
• Synaptic Weakening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)
Synaptic Plasticity in the Striatum Depends on Dopamine (DA)
• Synaptic Strengthening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Elevated DA levels
• Synaptic Weakening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)
DA Encodes Reward Prediction Error (RPE)
• Elevated after unexpected reward
• Depressed after unexpected omission of reward
• Unchanged when the outcome is as expected
Bayer & Glimcher (2005)
Computing RPE
Obtained feedback on trial n:
$$R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}$$
Predicted feedback on trial n:
$$P_n = P_{n-1} + \alpha \left(R_{n-1} - P_{n-1}\right)$$
RPE on trial n:
$$\mathrm{RPE}(n) = R_n - P_n$$
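The RPE computation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the function name `run_rpe`, the learning-rate value, and the initial prediction of 0 are all assumptions made for the example.

```python
def run_rpe(feedback, learning_rate=0.1, initial_prediction=0.0):
    """Return the reward prediction error (RPE) on each trial.

    feedback: sequence of booleans, True for positive feedback.
    learning_rate and initial_prediction are illustrative assumptions,
    not values taken from the talk.
    """
    prediction = initial_prediction          # P_n
    rpes = []
    for positive in feedback:
        reward = 1.0 if positive else 0.0    # R_n
        rpes.append(reward - prediction)     # RPE(n) = R_n - P_n
        # Prediction used on the next trial:
        # P_{n+1} = P_n + rate * (R_n - P_n)
        prediction += learning_rate * (reward - prediction)
    return rpes


if __name__ == "__main__":
    # Three rewarded trials followed by an unexpected omission of reward.
    print(run_rpe([True, True, True, False]))
```

With repeated reward the RPE shrinks toward zero as the prediction catches up, and an omitted reward after that produces a negative RPE.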
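Updating Synapses in the Model
$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n) \left[S_J(n) - \theta_{NMDA}\right]^+ \left[D(n) - D_{base}\right]^+ \left[1 - w_{K,J}(n)\right] \\
&- \beta_w I_K(n) \left[S_J(n) - \theta_{NMDA}\right]^+ \left[D_{base} - D(n)\right]^+ w_{K,J}(n) \\
&- \gamma_w I_K(n) \left[\theta_{NMDA} - S_J(n)\right]^+ \left[S_J(n) - \theta_{AMPA}\right]^+ w_{K,J}(n).
\end{aligned}
$$
Highlighted: Presynaptic Activity, $I_K(n)$, in the Synaptic Strengthening term (first line) and the Synaptic Weakening term (second line).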
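The update rule on this slide has three parts: a strengthening term that needs presynaptic activity, postsynaptic activation above the NMDA threshold, and DA above baseline; a weakening term with the same activity requirements but DA below baseline; and a second weakening term for postsynaptic activation between the AMPA and NMDA thresholds. The sketch below is one reading of that rule in Python. The function names and every numeric value (thresholds, baseline, rate constants) are placeholders chosen for illustration, not parameters from the model.

```python
def rectify(x):
    """[x]^+ : x if x is positive, otherwise 0."""
    return x if x > 0.0 else 0.0


def update_weight(w, pre, post, da, *,
                  da_base=0.2, theta_nmda=0.5, theta_ampa=0.1,
                  alpha=0.1, beta=0.05, gamma=0.01):
    """Return the CTX-MSN weight after one trial.

    w    : current weight w_{K,J}(n), assumed to lie in [0, 1]
    pre  : presynaptic (cortical) activity I_K(n)
    post : postsynaptic (MSN) activation S_J(n)
    da   : dopamine level D(n)
    All keyword defaults are hypothetical values for illustration.
    """
    # Strengthening: strong post activation and elevated DA.
    strengthen = (alpha * pre * rectify(post - theta_nmda)
                  * rectify(da - da_base) * (1.0 - w))
    # Weakening: strong post activation and depressed DA.
    weaken_da = (beta * pre * rectify(post - theta_nmda)
                 * rectify(da_base - da) * w)
    # Weakening: post activation between the AMPA and NMDA thresholds.
    weaken_low = (gamma * pre * rectify(theta_nmda - post)
                  * rectify(post - theta_ampa) * w)
    return w + strengthen - weaken_da - weaken_low


if __name__ == "__main__":
    print(update_weight(0.5, pre=1.0, post=1.0, da=1.0))  # elevated DA
    print(update_weight(0.5, pre=1.0, post=1.0, da=0.0))  # depressed DA
```

Because every term is multiplied by the presynaptic activity, a synapse whose cortical input is silent (or is presynaptically inhibited) does not change under this rule.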
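Updating Synapses in the Model
$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n) \left[S_J(n) - \theta_{NMDA}\right]^+ \left[D(n) - D_{base}\right]^+ \left[1 - w_{K,J}(n)\right] \\
&- \beta_w I_K(n) \left[S_J(n) - \theta_{NMDA}\right]^+ \left[D_{base} - D(n)\right]^+ w_{K,J}(n) \\
&- \gamma_w I_K(n) \left[\theta_{NMDA} - S_J(n)\right]^+ \left[S_J(n) - \theta_{AMPA}\right]^+ w_{K,J}(n).
\end{aligned}
$$
Highlighted: Postsynaptic Activation, $\left[S_J(n) - \theta_{NMDA}\right]^+$, in the Synaptic Strengthening term (first line) and the Synaptic Weakening term (second line).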
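Updating Synapses in the Model
$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n) \left[S_J(n) - \theta_{NMDA}\right]^+ \left[D(n) - D_{base}\right]^+ \left[1 - w_{K,J}(n)\right] \\
&- \beta_w I_K(n) \left[S_J(n) - \theta_{NMDA}\right]^+ \left[D_{base} - D(n)\right]^+ w_{K,J}(n) \\
&- \gamma_w I_K(n) \left[\theta_{NMDA} - S_J(n)\right]^+ \left[S_J(n) - \theta_{AMPA}\right]^+ w_{K,J}(n).
\end{aligned}
$$
Highlighted: Elevated DA, $\left[D(n) - D_{base}\right]^+$, in the Synaptic Strengthening term (first line); Depressed DA, $\left[D_{base} - D(n)\right]^+$, in the Synaptic Weakening term (second line).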
Outline
• Procedural Skills
• Model Architecture
• Instrumental Conditioning Applications
• Instrumental Conditioning Summary
Fast Reacquisition
Ashby and Crossley (2011)
Fast reacquisition is evidence that extinction did not erase initial learning
Fast Reacquisition Mechanics
TANs quickly stop pausing, and thereby protect cortico-striatal synapses
Partial Reinforcement Extinction (PRE)
Extinction is slower when acquisition is trained with partial reinforcement
Slowed Reacquisition
Phase / Condition  Ext2              Ext8              Prf2           Prf8
Acquisition        VI-30 sec         VI-30 sec         VI-30 sec      VI-30 sec
Extinction         No Reinforcement  No Reinforcement  Lean Schedule  Lean Schedule
Reacquisition      VI-2 min          VI-8 min          VI-2 min       VI-8 min
Woods and Bouton (2007)
Renewal - Basic Design
Phase / Condition     ABA            AAB            ABC
Acquisition           Environment A  Environment A  Environment A
Extinction            Environment B  Environment A  Environment B
Renewal (Extinction)  Environment A  Environment B  Environment C
Bouton et al. (2011)
ABA Mechanics
Crossley, Horvitz, Balsam, & Ashby (in prep)
Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
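The averaging rule stated on this slide can be illustrated with a small Python sketch. It assumes each Pf-TAN synapse has its own weight and a flag saying whether its input is currently active; the function name, the example weights, and the choice to return 0 when nothing is active are all hypothetical.

```python
def net_pf_tan_weight(weights, active):
    """Average the weights of the Pf-TAN synapses that are currently active.

    weights: per-synapse weights
    active : per-synapse booleans, True if that synapse's input is active
    Returning 0.0 when no synapse is active is an assumption of this sketch.
    """
    active_weights = [w for w, is_on in zip(weights, active) if is_on]
    if not active_weights:
        return 0.0
    return sum(active_weights) / len(active_weights)


if __name__ == "__main__":
    weights = [0.8, 0.2]                                  # hypothetical values
    print(net_pf_tan_weight(weights, [True, False]))      # only the first input active
    print(net_pf_tan_weight(weights, [True, True]))       # both inputs active
```

Under this rule the net weight depends on which inputs are active, so the same set of synapses can yield a different net drive to the TANs when the set of active inputs changes.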
Instrumental Conditioning Summary
• The TANs protect learning at CTX-MSN synapses.
• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.
Talk Goals
I. A neurobiological model of appetitive instrumental conditioning
II. Overview of my research
III. Contribution to the Ivry lab
Many Qualitative Differences Between RB and II
                                  RB   II
Unsupervised learning             Yes  No
Observational learning            Yes  No
Dual-task interference            Yes  No
Time needed to process feedback   Yes  No
Interference from button switch   No   Yes
Interference from feedback delay  No   Yes
II Category Learning is a Procedural Skill
Unlearning Experiment Design
Crossley, Maddox & Ashby (under review)
Phase / Condition  Active Condition       Meta-Learning Condition
Acquisition        True Feedback          True Feedback
Extinction         Feedback Manipulation  Feedback Manipulation
Reacquisition      True Feedback          True Feedback, New Categories
We Achieved Unlearning
Unlearning requires partially-contingent feedback
Crossley, Maddox & Ashby (under review)
Theoretical Account
Network architecture and new DA model
Crossley, Maddox & Ashby (under review)
• DA is RPE scaled by response-feedback contingency
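One way to picture "RPE scaled by response-feedback contingency" is the sketch below. It is only an illustration of the idea: the slide does not say how contingency is estimated or how DA is bounded, so the function name, the baseline value, the [0, 1] range for contingency, and the clipping are all assumptions.

```python
def dopamine_response(rpe, contingency, da_base=0.2):
    """Phasic DA as baseline plus RPE scaled by contingency.

    rpe         : reward prediction error on this trial
    contingency : assumed to be a number in [0, 1]; how it is
                  estimated is left open here
    da_base and the clipping to [0, 1] are illustrative assumptions.
    """
    if not 0.0 <= contingency <= 1.0:
        raise ValueError("contingency must be between 0 and 1")
    da = da_base + contingency * rpe
    return min(1.0, max(0.0, da))


if __name__ == "__main__":
    print(dopamine_response(0.5, contingency=1.0))  # fully contingent feedback
    print(dopamine_response(0.5, contingency=0.0))  # noncontingent feedback
```

In this sketch, noncontingent feedback leaves DA at baseline whatever the RPE, so a DA-dependent plasticity rule would leave the synapses unchanged.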
System Interaction Theme
• Development of TANs pause precedes development of category-specific responses in MSNs
• TANs should stop pausing during extinction (i.e., reward removal in instrumental conditioning and noncontingent feedback in category learning).
• Phasic DA response should be scaled by response-feedback contingency.
• Do systems cooperate to learn optimal behavior?
• What does it take to get system-switching?
• Does the procedural system learn during declarative control?
• What mechanistic models describe system switching throughout learning?
• What is the correct neurobiological model of system switching?
Do Systems Cooperate?
Perfect accuracy is possible with trial-by-trial switching between RB and II strategies
Ashby & Crossley (2010)
2 days (1200 trials) of training on:
Systems Compete
[Figure: Decision-Bound Model Fit Summary. Panels: Information-Integration, Uniform Hybrid, Non-Uniform Hybrid. Legend: Guessing, Rule-Based, Information-Integration, Hybrid. Y-axis: Number of Participants, 0 to 20.]
Almost nobody was best fit by a hybrid model
Ashby & Crossley (2010)
What does it take to get successful system switching?
[Figure: regions labeled A, B, C, and D]
Behavioral: Crossley, Roeder & Ashby (in prep)
fMRI: Turner, Crossley & Ashby (in prep)
Crossley, Roeder & Ashby (in prep)
Successful System-Switching
Training Protocol
• 100 RB trials
• 400 II trials
• 300 intermixed trials
• 100 button-switched intermixed trials
Successful System-Switching
Button Switch
Crossley, Roeder & Ashby (in prep)
Persistent button-switch interference on II trials but not RB trials supports true system switching
[Figure: two panels, each with the axis label "Button Switch Interference"]
Does the procedural system learn during declarative control?
Conditions
• Transfer Positive
• All Positive
• Transfer Negative
• All Negative
Crossley & Ashby (in prep)
Potential for weak bootstrapping
Small but significant cost in the Transfer Negative condition during the first 50 trials after transfer
[Figure: Train and Transfer phases]
Crossley & Ashby (in prep)
System Interaction Theme
• Development of TANs pause precedes development of category-specific responses in MSNs
• TANs should stop pausing during extinction (i.e., reward removal in instrumental conditioning and noncontingent feedback in category learning).
• Phasic DA response should be scaled by response-feedback contingency.
• Do systems cooperate to learn optimal behavior?
• What does it take to get system-switching?
• Does the II system learn during RB control?
• What mechanistic models describe system switching throughout learning?
• What is the correct neurobiological model of system switching?
Category Structure and Feedback Effects
• Development of TANs pause precedes development of category-specific responses in MSNs
• TANs should stop pausing during extinction (i.e., reward removal in instrumental conditioning and noncontingent feedback in category learning).
• Phasic DA response should be scaled by response-feedback contingency.
• What system learns unstructured categories?
• Does probabilistic feedback induce procedural learning?
The Experiment
Crossley, Madsen & Ashby (in prep)
Conditions
• Unstructured - Deterministic
• Unstructured - Probabilistic
• Rule-based - Deterministic
• Rule-based - Probabilistic
The Experiment
Crossley, Madsen & Ashby (in prep)
[Figure: Button Switch Interference, shown for Accuracy and for Reaction Time]
Button-switch effect on unstructured categories suggests procedural control
Learning Under a Dual-Task
• Development of TANs pause precedes development of category-specific responses in MSNs
• TANs should stop pausing during extinction (i.e., reward removal in instrumental conditioning and noncontingent feedback in category learning).
• Phasic DA response should be scaled by response-feedback contingency.
• Hypothesis 1: Dual-task induces procedural control.
• Hypothesis 2: Dual-task only slows the declarative system down.
RB category learning with a simultaneous numerical Stroop task
The Experiment
Paul, Crossley & Ashby (in prep)
• Every participant does either RB or II structures with:
• Single-task, button-switch
• Dual-task, button-switch
Talk Goals
I. A neurobiological model of appetitive instrumental conditioning
II. Overview of my research
III. Contribution to the Ivry lab
Contribution to the Ivry Lab
I. Lots of room to build spiking networks
• Hand / Object Choice networks
• Inhibitory Control and Competition Resolution
• Supervised learning in the cerebellum
• Model of timing in instrumental conditioning
II. Object choice, hand choice, and categorization: Experiment ideas
Spiking Networks of Hand and Object Choice
Motivation
• Predictive clarity
• Model-based imaging
• Natural ability to account for patient data
• Generate new experiments
Supervised Learning in the Cerebellum
Hypothesized hand and object choice brain systems operate with different learning algorithms.
Doya, 2000
Spiking Networks of IC and CR
• Role of the hyperdirect pathway?
• Relationship to our studies of system switching?
Object choice, hand choice, and categorization experiment ideas
I. Many of the tools used to dissociate RB and II category learning systems might be used to dissociate hand choice from object choice, and subsystems thereof.
• Feedback delay
• Time duration to process feedback
• Feedback contingency
• Automaticity