Transcript of Rouault sfn2014

Page 1: Rouault sfn2014

INTEGRATION OF VALUES AND INFORMATION IN DECISION-MAKING

Marion ROUAULT, Jan DRUGOWITSCH and Étienne KOECHLIN

Laboratoire de Neurosciences Cognitives INSERM U960,

Ecole Normale Supérieure, Paris

Page 2: Rouault sfn2014

Neural bases of action outcome evaluation

Fronto-striatal loops

• Executive control of behavior relies on the evaluation of action outcomes to adjust subsequent actions

[Figure: fronto-striatal anatomy (atlas of Yelnik and Bardinet); striatum; dopaminergic system: reward processing; ventromedial prefrontal cortex]

Page 3: Rouault sfn2014

Working hypothesis

Action outcomes may convey two types of value signals:

• "Rewarding" value: valuation of the action outcome along an axis of subjective preferences
• "Informational" value: information conveyed by the action outcome about choice reliability (the probability that, in the current situation, the chosen action was the most appropriate one)

Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, but rapidly saturating

How are the rewarding and informational aspects of action outcomes processed? What are their neural and functional interactions?

Page 4: Rouault sfn2014

Probabilistic reversal learning task

The correct state is rewarded 80% of the time, with contingency reversals.

• States:

• Values: 2, 4, 6, 8, 10 € before the decision; range 1–11 € after the decision
• Minimal instructions
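A minimal sketch of this trial structure; the reversal schedule and function names are illustrative assumptions, not the authors' exact protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_task(n_trials=200, p_reward=0.8, reversal_every=(20, 30)):
    """One of two states is 'correct' and rewarded 80% of the time;
    the correct state reverses at irregular intervals (bounds assumed)."""
    correct = int(rng.integers(2))
    next_reversal = int(rng.integers(*reversal_every))
    states, rewarded = [], []
    for t in range(n_trials):
        if t == next_reversal:  # contingency reversal
            correct = 1 - correct
            next_reversal = t + int(rng.integers(*reversal_every))
        states.append(correct)
        # choosing the correct state pays off on 80% of trials
        rewarded.append(rng.random() < p_reward)
    return np.array(states), np.array(rewarded)
```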

Page 5: Rouault sfn2014

3 conditions: manipulate values and information separately (a sampling sketch follows the figure below)

• RANDOM: values provide no information about the most frequently rewarded state
• CORRELATED: higher values are correlated with the most frequently rewarded state
• ANTI-CORRELATED: higher values are correlated with the less frequently rewarded state

[Figure: reward probability distributions (80% / 20%) as a function of reward, for the CORRELATED, RANDOM, and ANTI-CORRELATED conditions]
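A sketch of how the displayed values could be coupled to the correct state in each condition; the deterministic assignment below is a simplification of what was presumably a probabilistic correlation:

```python
def draw_values(correct_state, condition, rng):
    """Draw the two displayed reward magnitudes (in euros) so that the
    higher value lands on the most frequently rewarded target (CORRELATED),
    on the other target (ANTI-CORRELATED), or at random (RANDOM)."""
    v = rng.choice([2, 4, 6, 8, 10], size=2, replace=False)
    hi, lo = v.max(), v.min()
    if condition == "RANDOM":
        return tuple(rng.permutation(v))  # values carry no state information
    if condition == "CORRELATED":
        return (hi, lo) if correct_state == 0 else (lo, hi)
    return (lo, hi) if correct_state == 0 else (hi, lo)  # ANTI-CORRELATED
```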

Page 6: Rouault sfn2014

Behavior

Subjects favor accuracy ("being correct") over simply maximizing reward.

[Figure: choice % of the target with the best expected value, and choice % of the most frequently rewarded target, as a function of trial number after contingency reversal; conditions CORRELATED, ANTI-CORRELATED, and RANDOM; 22 subjects]

Page 7: Rouault sfn2014

Which variables contribute to choice? Logistic regressions

[Figure: contribution to choice (beta weights) of regressors p, r1, r2, EV1, EV2, x(t-1), shown separately for the CORRELATED, ANTI-CORRELATED, and RANDOM conditions]

• Differential processing of rewards across experimental conditions: informational value
• No computation of expected value
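A sketch of the kind of trial-by-trial logistic regression this analysis suggests, assuming scikit-learn; the regressor construction (e.g., EV1 = p·r1) is an assumption based on the slide's labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def choice_regression(p, r1, r2, prev_choice, choices):
    """Regress trial-by-trial choices on the slide's regressors:
    p (probability/belief), r1, r2 (displayed rewards),
    EV1 = p*r1, EV2 = (1-p)*r2, and the previous choice x(t-1)."""
    X = np.column_stack([p, r1, r2, p * r1, (1 - p) * r2, prev_choice])
    betas = LogisticRegression().fit(X, choices).coef_[0]
    return betas  # one beta weight (contribution to choice) per regressor
```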

Page 8: Rouault sfn2014

Choice models

• The optimal choice would be a rational combination of probabilities and rewards: Probability × Reward
• However, people's behavior is usually suboptimal
• To explain this suboptimality, it is assumed that subjects have distortions in their representations of probabilities and rewards

Kahneman and Tversky 1979; Zhang and Maloney 2012
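The slides do not specify the exact distortion functions; a sketch using two standard forms (Prelec probability weighting and power-law utility) illustrates the idea:

```python
import numpy as np

def weight_probability(p, gamma):
    """Prelec probability weighting: overweights small, underweights large p."""
    return np.exp(-(-np.log(p)) ** gamma)

def distort_reward(r, rho):
    """Power-law utility: concave (diminishing sensitivity) for rho < 1."""
    return r ** rho

def subjective_value(p, r, gamma, rho):
    # Distorted analogue of the rational Probability x Reward combination
    return weight_probability(p, gamma) * distort_reward(r, rho)
```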

Page 9: Rouault sfn2014

Distortions model (1000 simulations)

[Figure: SUBJECTS vs. DISTORTIONS MODEL; choice % of the target with the best expected value, and choice % of the most frequently rewarded target, as a function of trial number after contingency reversal; conditions CORRELATED, ANTI-CORRELATED, and RANDOM]

Page 10: Rouault sfn2014

Mixed model: integration of two concurrent decision-making systems

Particularity of the protocol: the possible rewards to gain are presented before the choice.

RL:
• Learning update: Q_{t+1} = Q_t + α (R_t − Q_t)
• Revision of Qs before choice: (1 − w) Q_t + w R_t, with w biasing current expected returns

Bayesian inference:
• Revision of beliefs before choice, given the reward distributions

Linear combination of beliefs and reinforcement, choice over: 0.75 Belief_Bay + 0.25 Q_RL
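A schematic sketch of one trial of such a mixed model; the fixed 0.75/0.25 mixture and the pre-choice Q revision follow the slide, while everything else is an illustrative assumption:

```python
import numpy as np

def mixed_choice(Q, belief, rewards, w, mix=0.75):
    """belief: P(target 0 is the correct state); Q: RL values per target;
    rewards: the two displayed reward magnitudes on this trial."""
    Q_pre = (1 - w) * Q + w * np.asarray(rewards)  # revision of Qs before choice
    beliefs = np.array([belief, 1 - belief])
    # The slide's fixed mixture; note beliefs and Qs live on different
    # scales, so Q_pre would need normalizing in a full implementation.
    value = mix * beliefs + (1 - mix) * Q_pre
    return int(value[1] > value[0])  # pick the higher-valued target

def rl_update(Q, choice, reward, alpha):
    """Delta rule from the slide: Q_{t+1} = Q_t + alpha * (R_t - Q_t)."""
    Q[choice] += alpha * (reward - Q[choice])
    return Q
```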

Page 11: Rouault sfn2014

Mixed model (1000 simulations)

[Figure: SUBJECTS vs. MIXED MODEL; choice % of the target with the best expected value, and choice % of the most frequently rewarded target, as a function of trial number after contingency reversal; conditions CORRELATED, ANTI-CORRELATED, and RANDOM]

Page 12: Rouault sfn2014

Model comparison

[Figure: relative gain over a Bayesian model solely monitoring beliefs, in terms of LLH, BIC, and AIC, for the DISTORTIONS and MIXED models; p = .057, p < .005, p < .05]

Distortions might be better explained by a mixed model integrating two systems for decision-making
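For reference, a sketch of the standard BIC and AIC formulas underlying this comparison; per-subject relative gains would be computed against the belief-only reference model:

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """BIC = k * ln(n) - 2 * LLH (lower is better)."""
    return n_params * np.log(n_obs) - 2 * log_lik

def aic(log_lik, n_params):
    """AIC = 2k - 2 * LLH (lower is better)."""
    return 2 * n_params - 2 * log_lik

def relative_gain(metric_reference, metric_candidate):
    """Gain of a candidate model over the belief-only reference model."""
    return metric_reference - metric_candidate
```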

Page 13: Rouault sfn2014

Mixed model without informational value

[Figure: SUBJECTS vs. MIXED MODEL WITHOUT INFORMATIONAL VALUE; choice % of the target with the best expected value, and choice % of the most frequently rewarded target, as a function of trial number after contingency reversal; conditions CORRELATED, RANDOM, and ANTI-CORRELATED]

Page 14: Rouault sfn2014

Informational value processing

p < .005 unc., c > 10 voxels, z = 40.

Small but significant positive correlation with informational value within dlPFC regions.

(Note: redo the beta extraction in GLM36.)

Page 15: Rouault sfn2014

Neuroimaging results: belief system and RL system (linear and quadratic contrasts)

p < 0.005 unc., c > 10.

Page 16: Rouault sfn2014

Neuroimaging results

[Figure: activation maps for the belief system and the RL system]

Neural activations are consistent with a mixed model involving two decision-making systems.

Page 17: Rouault sfn2014

Summary

• The observed distortions are actually explained by the integration of two decision-making systems
• Rewarding value: network involving
• Informational value: network involving dlPFC,

Reinforcement learning: simple, rapid, phylogenetically old
Bayesian inference: sophisticated, but rapidly saturating

Page 18: Rouault sfn2014

Acknowledgments


Frontal lobe functions team

Page 19: Rouault sfn2014

Choice given the reward presented

How often do subjects choose the 10 € option, independently of their belief about the current state?

[Figure: choice % of a given euro amount when presented (e.g., 4 € vs. 10 €), as a function of the reward presented (2, 4, 6, 8, 10 €), for the CORRELATED, RANDOM, and ANTI-CORRELATED conditions]

A residual effect of rewarding value is visible in the RANDOM condition.

Choices are instead mainly related to states.
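A sketch of this control analysis as it might be computed from a trial table; the column names are hypothetical:

```python
import pandas as pd

def choice_rate_by_reward(trials: pd.DataFrame) -> pd.Series:
    """P(choose a target | reward displayed on it), split by condition.
    A flat profile over reward means choices track states, not rewards."""
    return trials.groupby(["condition", "reward_shown"])["chose_target"].mean()
```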

Page 20: Rouault sfn2014

Reinforcement learning model

Computations associated with the RL model:

Q_{t+1} = Q_t + α (R_t − Q_t)

Revision of Qs before choice: (1 − w) Q_t + w R_t, with w biasing current expected returns

Page 21: Rouault sfn2014

Action selection:

Generative model of the task: z = state of the world (not observed)
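A sketch of the Bayesian belief update such a generative model implies, assuming the 80% contingency and an illustrative constant reversal (hazard) rate:

```python
def update_belief(belief, choice, rewarded, p_reward=0.8, hazard=0.04):
    """belief: P(z = target 1 is correct). Updates it after observing
    whether the chosen target was rewarded; hazard value is illustrative."""
    # Outcome likelihoods under 'chosen target correct' vs. 'wrong'
    lik_correct = p_reward if rewarded else 1 - p_reward
    lik_wrong = (1 - p_reward) if rewarded else p_reward
    # Prior that the *chosen* target is the correct state
    p_chosen = belief if choice == 1 else 1 - belief
    post = (lik_correct * p_chosen /
            (lik_correct * p_chosen + lik_wrong * (1 - p_chosen)))
    p1 = post if choice == 1 else 1 - post  # back to P(z = target 1)
    # Account for a possible reversal before the next trial
    return (1 - hazard) * p1 + hazard * (1 - p1)
```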

Page 22: Rouault sfn2014

Which variables contribute to choice? Logistic regressions

[Figure: contribution to choice (beta weights) of regressors p, r1, r2, x(t-1) in each condition, comparing the DISTORTIONS model (22 subjects) and the MIXED model (19 subjects)]
