Dopamine, Uncertainty and TD Learning

Dopamine, Uncertainty and TD Learning CNS 2004 Yael Niv Michael Duff Peter Dayan Gatsby Computational Neuroscience Unit, UCL


Page 1: Dopamine, Uncertainty  and TD Learning

Dopamine, Uncertainty

and TD Learning

CNS 2004

Yael Niv

Michael Duff

Peter Dayan

Gatsby Computational Neuroscience Unit, UCL

Page 2: Dopamine, Uncertainty  and TD Learning

What is the function of Dopamine?

Dorsal Striatum (Caudate, Putamen)

Ventral Tegmental Area

Substantia Nigra

Amygdala

Nucleus Accumbens (Ventral Striatum)

Prefrontal Cortex

Parkinson’s Disease -> Movement control?

Intracranial self-stimulation; Drug addiction -> Reward pathway? -> Learning?

Also involved in:
- Working memory
- Novel situations
- ADHD
- Schizophrenia…

Page 3: Dopamine, Uncertainty  and TD Learning

What does phasic Dopamine encode?

Unpredicted reward (neutral/no stimulus)

Predicted reward (learned task)

Omitted reward (probe trial)

(Schultz et al.)

Page 4: Dopamine, Uncertainty  and TD Learning

The TD Hypothesis of Dopamine

Phasic DA encodes a reward prediction error

• Precise theory for generation of DA firing patterns

• Compelling account for the role of DA in classical conditioning

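Value: the expected sum of future rewards

$V(t) = E[\, r(t+1) + r(t+2) + r(t+3) + \dots \,]$

Consecutive predictions should agree: $V(t)$ is compared with $r(t+1) + V(t+1)$

Temporal difference error:

$\delta(t+1) = r(t+1) + V(t+1) - V(t)$

[Figure: reward r and value V over time]

(Sutton & Barto 1987; Schultz, Dayan & Montague 1997)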
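The temporal difference error above can be illustrated with a minimal sketch. It assumes a tapped-delay-line representation in which every time step after stimulus onset has its own weight, so that `w[t]` plays the role of V(t). The function name `run_trial`, the learning rate `alpha`, and the trial length of 10 steps are illustrative choices, not values from the slides.

```python
import numpy as np

def run_trial(w, reward, alpha=0.1):
    """One conditioning trial of TD learning; returns the error at each step.

    w[t] is the learned value V(t), t steps after stimulus onset (assumed
    tapped-delay-line representation). The reward arrives at step len(w).
    """
    T = len(w)
    delta = np.zeros(T + 1)
    # Stimulus onset is unpredicted, so the value just before it is 0.
    delta[0] = w[0]
    for t in range(T):
        r_next = reward if t + 1 == T else 0.0
        v_next = w[t + 1] if t + 1 < T else 0.0   # V = 0 after the trial ends
        # delta(t+1) = r(t+1) + V(t+1) - V(t)
        delta[t + 1] = r_next + v_next - w[t]
        w[t] += alpha * delta[t + 1]
    return delta

w = np.zeros(10)
first = run_trial(w, reward=1.0)           # unpredicted reward
for _ in range(2000):
    trained = run_trial(w, reward=1.0)     # predicted reward
omitted = run_trial(w.copy(), reward=0.0)  # omitted reward (probe trial)
```

Under these assumptions the error sits at the time of reward on the first trial, moves to stimulus onset after training, and turns negative at the expected reward time when the reward is omitted, which is the qualitative pattern attributed to Schultz et al. earlier in the deck.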

Page 5: Dopamine, Uncertainty  and TD Learning

But: Fiorillo, Tobler & Schultz 2003

• Introduce inherent uncertainty into the classical conditioning paradigm

• Five visual stimuli indicating different reward probabilities: P= 100%, 75%, 50%, 25%, 0%

Stimulus = 2 sec visual stimulus

Reward (probabilistic) = drops of juice

Page 6: Dopamine, Uncertainty  and TD Learning

Fiorillo, Tobler & Schultz 2003

At stimulus time - DA represents mean expected reward

Delay activity - A ramp in activity up to reward

Hypothesis: DA ramp encodes uncertainty in reward

Page 7: Dopamine, Uncertainty  and TD Learning

“Uncertainty Ramping” and TD error?

• The uncertainty is predictable from the stimulus

• TD predicts away predictable quantities

If it represents uncertainty, the ramping activity should disappear with learning according to TD.

Uncertainty ramping is not easily compatible with the TD hypothesis

Are the ramps really coding uncertainty?

Page 8: Dopamine, Uncertainty  and TD Learning

A closer look at FTS’s results

At time of reward:

• Prediction errors result from probabilistic reward delivery

• Crucially: Positive and negative errors cancel out

[Figure panels: p = 50%, p = 75%]
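The cancellation can be checked with a short worked equation. This is a sketch under stated assumptions rather than a result from the slides: the reward has unit magnitude, learning has converged, and the learned prediction just before the reward time equals the reward probability p.

```latex
\begin{align*}
\delta_{\text{rewarded}}   &= 1 - p \quad (\text{with probability } p), \\
\delta_{\text{unrewarded}} &= 0 - p \quad (\text{with probability } 1 - p), \\
E[\delta] &= p\,(1 - p) + (1 - p)\,(-p) = 0 .
\end{align*}
```

With symmetric coding, the trial-averaged error at the time of reward is therefore zero for every p.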

Page 9: Dopamine, Uncertainty  and TD Learning

A TD Resolution:

• TD prediction error δ(t) can be positive or negative

• Neuronal firing rate is only positive (negative values can be encoded relative to base firing rate)

But: DA base firing rate is low -> asymmetric encoding of δ(t)

[Figure: DA response versus δ(t); axis labels 55% and 270%]

Page 10: Dopamine, Uncertainty  and TD Learning

Simulating TD with asymmetric errors

Negative δ(t) scaled by d = 1/6 prior to PSTH summation

Learning proceeds normally (without scaling)
− Necessary to produce the right predictions
− Can be biologically plausible
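The procedure on this slide can be sketched as follows. This is a hedged illustration, not the authors' simulation code. It assumes a tapped-delay-line representation and a Bernoulli reward of unit size. The function name `simulate` and the parameters `T`, `alpha`, `n_trials`, `burn_in`, and `seed` are hypothetical choices; only the scaling factor d = 1/6 comes from the slide.

```python
import numpy as np

def simulate(p=0.5, T=12, alpha=0.1, n_trials=5000, burn_in=500,
             d=1/6, seed=0):
    """TD learning with probabilistic reward, averaged with asymmetric scaling.

    Returns (raw_mean, scaled_mean): the trial-averaged prediction error at
    each time step, without and with negative errors scaled by d.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(T)                       # w[t] stands in for V(t)
    deltas = np.zeros((n_trials, T + 1))
    for n in range(n_trials):
        reward = float(rng.random() < p)  # reward delivered with probability p
        deltas[n, 0] = w[0]               # error at unpredicted stimulus onset
        for t in range(T):
            r_next = reward if t + 1 == T else 0.0
            v_next = w[t + 1] if t + 1 < T else 0.0
            deltas[n, t + 1] = r_next + v_next - w[t]
            # Learning uses the unscaled error.
            w[t] += alpha * deltas[n, t + 1]
    kept = deltas[burn_in:]               # discard the initial learning phase
    raw_mean = kept.mean(axis=0)
    # Scale only the negative errors, then average across trials (PSTH).
    scaled = np.where(kept < 0, d * kept, kept)
    return raw_mean, scaled.mean(axis=0)

raw_mean, ramp = simulate()
```

In this sketch the unscaled average at the time of reward stays near zero, while the scaled average is positive and grows over the delay toward the reward. The scaling is applied only when summarising the errors, never inside the weight update, which mirrors the slide's point that learning proceeds without scaling.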

Page 11: Dopamine, Uncertainty  and TD Learning

DA - Uncertainty or Temporal Difference?

With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) => Maximal at p = 50%

[Figure panels: Experiment, Model]

However:

• No need to assume explicit coding of uncertainty - ramping is explained by neural constraints.

• Explanation for puzzling absence of ramp in trace conditioning results.

• Experimental test: Ramp as within or between trial phenomenon?

Challenges: TD and noise; conditioned inhibition, additivity
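The p(1-p) dependence stated on this slide can be derived in a few lines. This is a sketch under assumptions not spelled out in the slides: a unit reward, a converged prediction of p just before the reward time, and negative errors scaled by the factor d from the simulation slide.

```latex
\begin{align*}
E[\delta_{\text{scaled}}]
  &= p\,(1 - p) + (1 - p)\, d\,(-p) \\
  &= (1 - d)\; p\,(1 - p), \\
\frac{\partial}{\partial p} E[\delta_{\text{scaled}}]
  &= (1 - d)(1 - 2p) = 0 \;\Rightarrow\; p = \tfrac{1}{2}, \\
E[\delta_{\text{scaled}}]\Big|_{p = 1/2,\; d = 1/6}
  &= \tfrac{5}{6} \cdot \tfrac{1}{4} = \tfrac{5}{24} \approx 0.21 .
\end{align*}
```

The mean is zero at p = 0 and p = 1 and peaks at p = 50%, the same shape as reward variance, without any explicit uncertainty signal.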

Page 12: Dopamine, Uncertainty  and TD Learning

Trace conditioning: A puzzle and its resolution

• Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)

• Resolution: lower learning rate in trace conditioning eliminates ramp

CS = short visual stimulus

Trace period

US (probabilistic) = drops of juice
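One way to see why a lower learning rate could remove the ramp is to look at how much the learned prediction fluctuates from trial to trial. The following is a hedged sketch rather than the authors' analysis. It assumes the last prediction before the reward is updated as w ← w + α(r − w), with a Bernoulli reward r of probability p and a hypothetical learning rate α.

```latex
\begin{align*}
w_{n+1} &= (1 - \alpha)\, w_n + \alpha\, r_n, \qquad r_n \sim \text{Bernoulli}(p), \\
E[w] &= p, \\
\operatorname{Var}(w) &= (1 - \alpha)^2 \operatorname{Var}(w) + \alpha^2\, p\,(1 - p)
  \;\Rightarrow\;
  \operatorname{Var}(w) = \frac{\alpha}{2 - \alpha}\; p\,(1 - p) .
\end{align*}
```

As α shrinks, the predictions stop fluctuating, the within-trial differences V(t+1) − V(t) approach zero, and so does their asymmetrically scaled average during the delay.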

Page 13: Dopamine, Uncertainty  and TD Learning

Other sources of uncertainty: Representational Noise (1)

• Rate coding is inherently stochastic

• Add noise to tapped delay line representation

=> TD learning is robust to this type of noise

[Figure panels: prediction error, weights; σ = 0.0577, σ = 0.0866, σ = 0.1155]

Mirenowicz and Schultz (1996)

Page 14: Dopamine, Uncertainty  and TD Learning

Other sources of uncertainty: Representational Noise (2)

• Neural timing of events is necessarily inaccurate

• Add temporal noise to tapped delay line representation

=> Devastating effects of even small amounts of temporal noise on TD predictions

[Figure panels: ε = 0.05, ε = 0.10]