Neurobiological Models of Instrumental Conditioning

Neurobiological Models of Instrumental Conditioning Matthew J. Crossley Department of Psychological and Brain Sciences University of California, Santa Barbara, 93106


Page 1: Neurobiological Models of Instrumental Conditioning

Neurobiological Models of Instrumental Conditioning

Matthew J. Crossley

Department of Psychological and Brain Sciences University of California, Santa Barbara, 93106

Page 2: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Page 3: Neurobiological Models of Instrumental Conditioning

Why Instrumental Conditioning?

• The Ashby lab bread and butter is category learning

• Information-Integration category-learning is a procedural skill

• Appetitive Instrumental Conditioning is a procedural skill

Page 4: Neurobiological Models of Instrumental Conditioning

Procedural Skills

• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology

Page 5: Neurobiological Models of Instrumental Conditioning

Procedural Skills

Where are the tumors?

Page 6: Neurobiological Models of Instrumental Conditioning

Procedural Skills

TUMORS!

Page 7: Neurobiological Models of Instrumental Conditioning

Procedural Skills Depend on the Basal Ganglia

• Basal ganglia are a collection of subcortical nuclei

• Interconnects with cortex in well defined circuits

• Striatum is a major input structure

Page 8: Neurobiological Models of Instrumental Conditioning

Cortex Excites the Striatum

Page 9: Neurobiological Models of Instrumental Conditioning

Striatum Inhibits the GPi

Page 10: Neurobiological Models of Instrumental Conditioning

GPi Inhibits the Thalamus

High baseline firing rate

Page 11: Neurobiological Models of Instrumental Conditioning

Striatum Disinhibits the Thalamus

Page 12: Neurobiological Models of Instrumental Conditioning

Thalamus Excites Cortex
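
The chain on the preceding slides (cortex excites striatum, striatum inhibits GPi, GPi tonically inhibits thalamus, thalamus excites cortex) can be sketched as a toy rate model. This is only an illustration of disinhibition, not the model presented later in the talk; the function name, the unit connection weights, and the rectified-linear rates are all assumptions made for the example.

```python
def relu(x):
    """Rectify a firing rate so it cannot go negative."""
    return max(0.0, x)


def direct_pathway(cortical_input, gpi_baseline=1.0, thalamic_drive=1.0):
    """Toy rate model of the cortex -> striatum -> GPi -> thalamus -> cortex loop.

    All connection weights are assumed to be 1.0 (hypothetical values).
    """
    striatum = relu(cortical_input)            # cortex excites the striatum
    gpi = relu(gpi_baseline - striatum)        # striatum inhibits the GPi
    thalamus = relu(thalamic_drive - gpi)      # GPi inhibits the thalamus
    cortex_feedback = thalamus                 # thalamus excites cortex
    return {
        "striatum": striatum,
        "gpi": gpi,
        "thalamus": thalamus,
        "cortex_feedback": cortex_feedback,
    }


rest = direct_pathway(cortical_input=0.0)
active = direct_pathway(cortical_input=1.0)
print(rest["thalamus"], active["thalamus"])
```

With no cortical input the GPi stays at its high baseline and holds the thalamus silent; with cortical input the striatum silences the GPi and the thalamus is released.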

Page 13: Neurobiological Models of Instrumental Conditioning

Dopamine Modulates Activity

Page 14: Neurobiological Models of Instrumental Conditioning

Procedural Learning Depends on the Striatum

• Single-cell recordings: Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995

• Lesion studies: Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992

• Neuropsychological patient studies: Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996

• Neuroimaging: Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011

Page 15: Neurobiological Models of Instrumental Conditioning

Striatal Neurons

Medium Spiny Projection Neurons (MSNs): 96%

GABA Interneurons: 2%

TANs (Cholinergic Interneurons): 2%

Page 16: Neurobiological Models of Instrumental Conditioning

The TANs are of Particular Interest

• Tonically active; pause in response to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Get major input from CM-Pf (thalamus)

• Learn to pause to stimuli that predict reward (requires dopamine)

Page 17: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Page 18: Neurobiological Models of Instrumental Conditioning

Model Architecture

Ashby and Crossley (2011)

Page 19: Neurobiological Models of Instrumental Conditioning

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse

Ashby and Crossley (2011)

Page 20: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Early Trial

Page 21: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Early Trial

Page 22: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Early Trial

Page 23: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Early Trial

Page 24: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Early Trial

SMA

Page 25: Neurobiological Models of Instrumental Conditioning

Response and Feedback

• Model responds if SMA crosses threshold

• Model is given feedback after every trial

Page 26: Neurobiological Models of Instrumental Conditioning

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse

Ashby and Crossley (2011)

Page 27: Neurobiological Models of Instrumental Conditioning

CTX-MSN Synaptic Modification Requires a TANs Pause

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

Page 28: Neurobiological Models of Instrumental Conditioning

Synaptic Plasticity in the Striatum Depends on Dopamine (DA)

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

Page 29: Neurobiological Models of Instrumental Conditioning

DA Encodes Reward Prediction Error (RPE)

• Elevated after unexpected reward

• Depressed after unexpected omission of reward

• Stays at baseline when the expected outcome occurs

Bayer & Glimcher (2005)

Page 30: Neurobiological Models of Instrumental Conditioning

Computing RPE

Obtained feedback on trial n:

\[
R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}
\]

Predicted feedback on trial n:

\[
P_n = P_{n-1} + \alpha\,(R_{n-1} - P_{n-1})
\]

RPE on trial n:

\[
\mathrm{RPE}(n) = R_n - P_n
\]
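
The trial-by-trial computation on this slide can be sketched in a few lines of Python. The prediction for each trial is the previous prediction moved a fraction of the way toward the previous feedback, and the RPE is the obtained feedback minus that prediction. The function name, the starting prediction `p0`, and the value of the step-size coefficient `alpha` are assumptions for illustration; the slide does not give them.

```python
def reward_prediction_errors(feedback, alpha=0.1, p0=0.0):
    """Return (predictions, rpes) for a sequence of 0/1 feedback values.

    alpha and p0 are hypothetical values chosen for illustration.
    """
    predictions, rpes = [], []
    p = p0
    for r in feedback:
        predictions.append(p)        # P_n, formed before feedback on trial n
        rpes.append(r - p)           # RPE(n) = R_n - P_n
        p = p + alpha * (r - p)      # P_{n+1} = P_n + alpha * (R_n - P_n)
    return predictions, rpes


# Three rewarded trials: the prediction rises, so the RPE shrinks.
preds, rpes = reward_prediction_errors([1, 1, 1], alpha=0.5)
print(preds, rpes)
```

After many rewarded trials the prediction approaches 1, so a single unrewarded trial then yields an RPE close to -1.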

Page 31: Neurobiological Models of Instrumental Conditioning

DA Released on Trial n

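\[
\mathrm{DA}(n) = \begin{cases}
1 & \text{if } \mathrm{RPE} > 1 \\
0.8\,\mathrm{RPE} + 0.2 & \text{if } -0.25 < \mathrm{RPE} \le 1 \\
0 & \text{if } \mathrm{RPE} \le -0.25
\end{cases}
\]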
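
A minimal sketch of this piecewise mapping from RPE to dopamine release, reading the lower breakpoint as -0.25 so that the linear segment meets zero continuously. The function name is an assumption.

```python
def dopamine_release(rpe):
    """Map a reward prediction error to a dopamine level in [0, 1]."""
    if rpe > 1.0:
        return 1.0                    # ceiling
    if rpe > -0.25:
        return 0.8 * rpe + 0.2        # linear segment
    return 0.0                        # floor


for rpe in (-1.0, -0.25, 0.0, 0.5, 1.0):
    print(rpe, dopamine_release(rpe))
```

Note the asymmetry: an RPE of 0 gives a level of 0.2, so dopamine has more room to rise above that level than to fall below it.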

Page 32: Neurobiological Models of Instrumental Conditioning

Updating Synapses in the Model

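\[
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D(n) - D_{\mathrm{base}}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D_{\mathrm{base}} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{\mathrm{NMDA}} - S_J(n)]^+\,[S_J(n) - \theta_{\mathrm{AMPA}}]^+\,w_{K,J}(n).
\end{aligned}
\]

Synaptic Strengthening: the $\alpha_w$ term

Synaptic Weakening: the $\beta_w$ term

Presynaptic Activity: $I_K(n)$, in both terms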

Page 33: Neurobiological Models of Instrumental Conditioning

Updating Synapses in the Model

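\[
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D(n) - D_{\mathrm{base}}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D_{\mathrm{base}} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{\mathrm{NMDA}} - S_J(n)]^+\,[S_J(n) - \theta_{\mathrm{AMPA}}]^+\,w_{K,J}(n).
\end{aligned}
\]

Synaptic Strengthening: the $\alpha_w$ term

Synaptic Weakening: the $\beta_w$ term

Postsynaptic Activation: $[S_J(n) - \theta_{\mathrm{NMDA}}]^+$, in both terms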

Page 34: Neurobiological Models of Instrumental Conditioning

Updating Synapses in the Model

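\[
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D(n) - D_{\mathrm{base}}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D_{\mathrm{base}} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{\mathrm{NMDA}} - S_J(n)]^+\,[S_J(n) - \theta_{\mathrm{AMPA}}]^+\,w_{K,J}(n).
\end{aligned}
\]

Synaptic Strengthening: the $\alpha_w$ term, driven by Elevated DA, $[D(n) - D_{\mathrm{base}}]^+$

Synaptic Weakening: the $\beta_w$ term, driven by Depressed DA, $[D_{\mathrm{base}} - D(n)]^+$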
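
The update rule on the last three slides can be written as a single function, which makes the role of each factor easy to check. This is a sketch under stated assumptions: the rule is read as three terms (strengthening with elevated DA, weakening with depressed DA, and weakening when postsynaptic activation falls between the AMPA and NMDA thresholds), `[x]^+` is taken to mean max(0, x), and every parameter value below is hypothetical rather than taken from Ashby and Crossley (2011).

```python
def pos(x):
    """Positive part, [x]^+ = max(0, x)."""
    return max(0.0, x)


def update_weight(w, pre, post, da,
                  alpha_w=0.1, beta_w=0.05, gamma_w=0.01,
                  theta_nmda=0.5, theta_ampa=0.1, da_base=0.2):
    """One-trial update of a CTX-MSN weight w in [0, 1].

    pre  : presynaptic activity, I_K(n)
    post : postsynaptic activation, S_J(n)
    da   : dopamine level, D(n)
    All default parameter values are illustrative assumptions.
    """
    above_nmda = pos(post - theta_nmda)
    strengthen = alpha_w * pre * above_nmda * pos(da - da_base) * (1.0 - w)
    weaken_da = beta_w * pre * above_nmda * pos(da_base - da) * w
    weaken_weak = (gamma_w * pre * pos(theta_nmda - post)
                   * pos(post - theta_ampa) * w)
    return w + strengthen - weaken_da - weaken_weak


print(update_weight(0.5, pre=1.0, post=1.0, da=1.0))  # elevated DA
print(update_weight(0.5, pre=1.0, post=1.0, da=0.0))  # depressed DA
print(update_weight(0.5, pre=0.0, post=1.0, da=1.0))  # no presynaptic activity
```

Because every term is multiplied by the presynaptic activity, the weight cannot change on a trial where that activity is zero, whatever the dopamine level.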

Page 35: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Late Trial

Page 36: Neurobiological Models of Instrumental Conditioning

Network Dynamics: Late Trial

Page 37: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Late Trial

Page 38: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Late Trial

Page 39: Neurobiological Models of Instrumental Conditioning

Network Dynamics - Late Trial

SMA

Page 40: Neurobiological Models of Instrumental Conditioning

Model Accounts for Electrophysiological Recordings from TANs

Ashby and Crossley (2011)

Page 41: Neurobiological Models of Instrumental Conditioning

Model Accounts for Electrophysiological Recordings from MSNs

Ashby and Crossley (2011)

Page 42: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference model of DA

Page 43: Neurobiological Models of Instrumental Conditioning

Fast Reacquisition

Ashby and Crossley (2011)

Fast reacquisition is evidence that extinction did not erase initial learning

Page 44: Neurobiological Models of Instrumental Conditioning

Fast Reacquisition Mechanics

TANs quickly stop pausing, and thereby protect cortico-striatal synapses
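
One way to see the protective effect is to let TAN activity scale the cortical input that reaches the MSN, as a stand-in for the presynaptic inhibition described earlier in the talk. When the TANs are active the effective presynaptic activity is zero, so a dopamine dip cannot weaken the synapse. The sketch below is a simplification: it keeps only the two dopamine-dependent terms of the learning rule, takes postsynaptic activation as a given argument, and uses hypothetical names and parameter values.

```python
def pos(x):
    """Positive part, max(0, x)."""
    return max(0.0, x)


def gated_update(w, ctx_input, tan_activity, post, da,
                 alpha_w=0.1, beta_w=0.1, theta_nmda=0.5, da_base=0.2):
    """Update a CTX-MSN weight with cortical input gated by the TANs.

    tan_activity is 1.0 when the TANs fire tonically and 0.0 during a pause.
    Parameter values are illustrative assumptions.
    """
    pre = ctx_input * (1.0 - tan_activity)     # TANs block cortical input
    drive = pre * pos(post - theta_nmda)
    strengthen = alpha_w * drive * pos(da - da_base) * (1.0 - w)
    weaken = beta_w * drive * pos(da_base - da) * w
    return w + strengthen - weaken


# Extinction trial: dopamine dips below baseline (da = 0.0).
protected = gated_update(0.8, ctx_input=1.0, tan_activity=1.0, post=1.0, da=0.0)
exposed = gated_update(0.8, ctx_input=1.0, tan_activity=0.0, post=1.0, da=0.0)
print(protected, exposed)
```

With the TANs active the weight keeps its acquired value; with the TANs paused the same dopamine dip reduces it.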

Page 45: Neurobiological Models of Instrumental Conditioning

Fast Reacquisition Mechanics

Page 46: Neurobiological Models of Instrumental Conditioning

Partial Reinforcement Extinction (PRE)

Extinction is slower when acquisition is trained with partial reinforcement

Page 47: Neurobiological Models of Instrumental Conditioning

PRE Mechanics

TANs take longer to stop pausing under partial reinforcement

Page 48: Neurobiological Models of Instrumental Conditioning

Slowed Reacquisition

Phase           Condition:
                Ext2               Ext8               Prf2            Prf8

Acquisition     VI-30 sec          VI-30 sec          VI-30 sec       VI-30 sec

Extinction      No Reinforcement   No Reinforcement   Lean Schedule   Lean Schedule

Reacquisition   VI-2 min           VI-8 min           VI-2 min        VI-8 min

Woods and Bouton (2007)

Page 49: Neurobiological Models of Instrumental Conditioning

Behavioral Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 50: Neurobiological Models of Instrumental Conditioning

Modeling Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 51: Neurobiological Models of Instrumental Conditioning

TANs don’t stop pausing during extinction in Prf Conditions

[Figure panels: CTX-MSN Synapse; Pf-TAN Synapse]

Page 52: Neurobiological Models of Instrumental Conditioning

Renewal - Basic Design

Phase                  Condition:
                       ABA             AAB             ABC

Acquisition            Environment A   Environment A   Environment A

Extinction             Environment B   Environment A   Environment B

Renewal (Extinction)   Environment A   Environment B   Environment C

Bouton et al. (2011)

Page 53: Neurobiological Models of Instrumental Conditioning

Renewal

Page 54: Neurobiological Models of Instrumental Conditioning

Model Architecture

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 55: Neurobiological Models of Instrumental Conditioning

Synaptic Plasticity at ALL Pf-TAN Synapses

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 56: Neurobiological Models of Instrumental Conditioning

Renewal

Crossley, Horvitz, Balsam, & Ashby (in prep)

Page 57: Neurobiological Models of Instrumental Conditioning

ABA Mechanics

Crossley, Horvitz, Balsam, & Ashby (in prep)

Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
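
The averaging rule on this slide is simple enough to state directly in code. The sketch below assumes one Pf-TAN weight per input and that a test environment activates some subset of them; the input names and weight values are hypothetical and chosen only to show how returning to the acquisition context can raise the net weight again.

```python
def net_pf_tan_weight(weights, active):
    """Average the Pf-TAN weights of the currently active inputs.

    weights : dict mapping input name -> synaptic weight
    active  : iterable of input names active in the current environment
    """
    active_weights = [weights[name] for name in active]
    if not active_weights:
        raise ValueError("at least one Pf-TAN input must be active")
    return sum(active_weights) / len(active_weights)


# Hypothetical weights after acquisition in A and extinction in B.
weights = {"cue": 0.2, "context_A": 0.9, "context_B": 0.1}

in_b = net_pf_tan_weight(weights, ["cue", "context_B"])
in_a = net_pf_tan_weight(weights, ["cue", "context_A"])
print(in_b, in_a)
```

Under these made-up values the net weight is low in environment B and higher in environment A, which is the direction an ABA renewal account needs.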

Page 58: Neurobiological Models of Instrumental Conditioning

Instrumental Conditioning Summary

• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.

Page 59: Neurobiological Models of Instrumental Conditioning

Untested Physiological Predictions

• Development of TANs pause precedes development of category-specific responses in MSNs

• TANs should stop pausing during extinction

Page 60: Neurobiological Models of Instrumental Conditioning

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of model

- Fast Reacquisition

- Partial Reinforcement Extinction

- Renewal

III. Temporal-Difference (TD) model of DA

Page 61: Neurobiological Models of Instrumental Conditioning

Putting TD into the model

We want to replace the discrete-trial model of DA with a continuous-time model

Page 62: Neurobiological Models of Instrumental Conditioning

The TD Prediction Error

[Figure: prediction error plotted by trial and time step]

Page 63: Neurobiological Models of Instrumental Conditioning

The TD Prediction Error

\[
\delta_t = r_t + \gamma V(t+1) - V(t)
\]

\[
r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases}
\]

Montague, Dayan, & Sejnowski (1996). Journal of Neuroscience, 16(5), 1936-1947.
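
The prediction error above can be illustrated with a tabular TD(0) sketch in which each time step of a trial has its own value estimate. This is not the representation used by Montague, Dayan, and Sejnowski (1996) or the spiking implementation in this talk; the function name, the discount `gamma`, the step size `alpha`, and the five-step trial are assumptions for illustration.

```python
def run_td_trial(V, reward_step, gamma=0.98, alpha=0.1):
    """Run one trial of tabular TD(0) and return the prediction errors.

    V           : list of values, one per time step plus a terminal entry
    reward_step : time step at which reward arrives, or None for no reward
    V is updated in place.
    """
    n_steps = len(V) - 1
    deltas = []
    for t in range(n_steps):
        r = 1.0 if t == reward_step else 0.0
        delta = r + gamma * V[t + 1] - V[t]    # TD prediction error
        V[t] += alpha * delta                  # move V(t) toward its target
        deltas.append(delta)
    return deltas


V = [0.0] * 6                                  # 5 time steps + terminal value
first = run_td_trial(V, reward_step=4)         # reward is fully unexpected
for _ in range(500):
    learned = run_td_trial(V, reward_step=4)   # reward becomes predicted
omitted = run_td_trial(V, reward_step=None)    # predicted reward withheld
print(first[4], learned[4], omitted[4])
```

The error at the reward step starts at 1, shrinks toward 0 as the reward becomes predicted, and turns negative when the predicted reward is withheld.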

Page 64: Neurobiological Models of Instrumental Conditioning

Model Architecture

Spiking Neuron Driven by TD prediction error:

\[
\delta_t = r_t + \gamma V(t+1) - V(t)
\]

TANs were removed for initial TD applications

Page 65: Neurobiological Models of Instrumental Conditioning

We Need Modified Learning Equations

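\[
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D(n) - D_{\mathrm{base}}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{\mathrm{NMDA}}]^+\,[D_{\mathrm{base}} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{\mathrm{NMDA}} - S_J(n)]^+\,[S_J(n) - \theta_{\mathrm{AMPA}}]^+\,w_{K,J}(n).
\end{aligned}
\]

Synaptic Strengthening: the $\alpha_w$ term

Synaptic Weakening: the $\beta_w$ term

DA is no longer modeled on a discrete trial-by-trial basis!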

Page 66: Neurobiological Models of Instrumental Conditioning

A Cortico-Striatal Synapse

Page 67: Neurobiological Models of Instrumental Conditioning

CaMKII, PP-1 and Striatal Plasticity

Page 68: Neurobiological Models of Instrumental Conditioning

Learning Equations

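\[
\begin{aligned}
w(n+1) = w(n)
&+ \alpha_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+\,[D_{\mathrm{PP\text{-}1}}(t) - D_{\mathrm{base}}]^+\,[w_{\max} - w(n)]\,dt \\
&- \beta_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+\,[D_{\mathrm{base}} - D_{\mathrm{PP\text{-}1}}(t)]^+\,w(n)\,dt
\end{aligned}
\]

Synaptic Strengthening: the $\alpha_w$ term

Synaptic Weakening: the $\beta_w$ term

CaMKII Activity: $[S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+$, in both terms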

Page 69: Neurobiological Models of Instrumental Conditioning

Learning Equations

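\[
\begin{aligned}
w(n+1) = w(n)
&+ \alpha_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+\,[D_{\mathrm{PP\text{-}1}}(t) - D_{\mathrm{base}}]^+\,[w_{\max} - w(n)]\,dt \\
&- \beta_w \int [S_{\mathrm{CaMKII}}(t) - S_{\mathrm{CaMKII\,base}}]^+\,[D_{\mathrm{base}} - D_{\mathrm{PP\text{-}1}}(t)]^+\,w(n)\,dt
\end{aligned}
\]

Synaptic Strengthening: the $\alpha_w$ term, with PP-1 Activity $[D_{\mathrm{PP\text{-}1}}(t) - D_{\mathrm{base}}]^+$

Synaptic Weakening: the $\beta_w$ term, with PP-1 Activity $[D_{\mathrm{base}} - D_{\mathrm{PP\text{-}1}}(t)]^+$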
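
The continuous-time rule can be approximated by summing over small time steps within a trial and applying the result once at the end of the trial. The sketch below reads the rule as two integrals over the trial (one strengthening, one weakening), treats the CaMKII and PP-1 signals as sampled traces, and holds w(n) fixed within the trial. The function name, the step `dt`, the baselines, and the rate constants are assumptions for illustration.

```python
def pos(x):
    """Positive part, max(0, x)."""
    return max(0.0, x)


def integrate_weight(w, camkii, d_pp1, dt=0.001,
                     alpha_w=1.0, beta_w=1.0,
                     camkii_base=0.1, d_base=0.2, w_max=1.0):
    """Apply one trial of the continuous-time rule by a Riemann sum.

    camkii : samples of the CaMKII-related trace, S_CaMKII(t)
    d_pp1  : samples of the PP-1-related dopamine trace, D_PP-1(t)
    All default parameter values are illustrative assumptions.
    """
    strengthen = 0.0
    weaken = 0.0
    for s, d in zip(camkii, d_pp1):
        gate = pos(s - camkii_base)                 # CaMKII above baseline
        strengthen += gate * pos(d - d_base) * (w_max - w) * dt
        weaken += gate * pos(d_base - d) * w * dt
    return w + alpha_w * strengthen - beta_w * weaken


steps = 1000                                        # 1 s at dt = 1 ms
w_up = integrate_weight(0.5, [1.0] * steps, [1.0] * steps)
w_down = integrate_weight(0.5, [1.0] * steps, [0.0] * steps)
w_same = integrate_weight(0.5, [0.1] * steps, [1.0] * steps)
print(w_up, w_down, w_same)
```

As in the discrete-trial rule, the weight moves only while the CaMKII trace is above its baseline, and the direction of change depends on whether the dopamine-related trace is above or below its own baseline.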

Page 70: Neurobiological Models of Instrumental Conditioning

Acquisition and Extinction

[Figure: two panels plotted by trial: Proportion Responses Emitted; CTX-MSN Synaptic Strength]

Page 71: Neurobiological Models of Instrumental Conditioning

MSN and SNc

[Figure: two panels plotted by trial and time step: MSN Output; SNc Output]

Page 72: Neurobiological Models of Instrumental Conditioning

CaMKII and PP-1

DA model learns very quickly that reward is taken away

[Figure: two panels plotted by trial and time step]

Page 73: Neurobiological Models of Instrumental Conditioning

Extinction under noncontingent reward delivery

[Figure: two panels plotted by trial: Proportion Responses Emitted; CTX-MSN Synaptic Strength]

Page 74: Neurobiological Models of Instrumental Conditioning

MSN and SNc

[Figure: two panels plotted by trial and time step: MSN Output; SNc Output]

Page 75: Neurobiological Models of Instrumental Conditioning

MSN and SNc

Noncontingent reward delivery keeps DA surprised

[Figure: two panels plotted by trial and time step]

Page 76: Neurobiological Models of Instrumental Conditioning

CaMKII and PP-1

Noncontingent reward delivery keeps DA surprised

[Figure: two panels plotted by trial and time step]

Page 77: Neurobiological Models of Instrumental Conditioning

Summary and Future Directions

• TANs need to be added to account for reacquisition, renewal, and other effects after extinction with noncontingent reward

• TD model might need to be modified once the TANs are included and post-extinction effects are examined

Page 78: Neurobiological Models of Instrumental Conditioning

Acknowledgments

Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam


Funding:

NIMH Grant MH3760-2, Todd Wilkinson