Two is better than one: distinct roles for familiarity and...

1
Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories Cristina Savin , Peter Dayan , Máté Lengyel Computational and Biological Learning, Department of Engineering, University of Cambridge AUTOASSOCIATIVE MEMORY NETWORK APPROXIMATELY OPTIMAL RETRIEVAL: MONOLITHIC SYSTEM PLASTICITY IN A SINGLE SYNAPSE This work was supported by the Wellcome Trust and the Gatsby Charitable Foundation (PD). Computational and Biological Learning Gatsby Computational Neuroscience Unit, University College London 1 2 1 1 2 APPROXIMATELY OPTIMAL RETRIEVAL: DUAL SYSTEM t unknown at recall ! weak A LINK TO BEHAVIOUR: RECOGNITION MEMORY t known dual system 0 5 10 15 single module control error (%) familiarity module recollection module 1 100 200 0 0.05 neuron index activation 0 50 100 150 10 −1 10 0 10 1 t familiarity signal c performance improvement because it is easier to explore the space of possible patterns (as in auxiliary variable methods) these benefits are not restricted to a sampling based representation familiarity = pattern age here, but generalization is possible (c is a sufficient statistic, gives overall expected signal strength) alternative representation (mean-field) task: has the current pattern been seen before? in the context of palimpsest memories = has this pattern been seen recently? an additional decision layer estimates the probability that the current pattern is familiar Biologically-plausible synapses exhibit the palimpsest property: memory traces decay over time, overwritten by ongoing plasticity. This makes retrieval from memory difficult. Here, we propose a probabilistic framework for optimal retrieval from palimpsest synapses. We compare two possible solutions for dealing with the unknown pattern age: 1. integrating it out as an unknown quantity or 2. estimating it in a separate subsystem, in which age is treated as an auxiliary variable. We show that the dual system, inspired by hippocampal and perirhinal cortex connectivity, exhibits significantly better performance. In a recognition task, the same system exhibits several characteristics similar to human and animal experiments. Our results suggest a normative motivation for dual module models for recognition memory, as being beneficial even when the task to solve is recollection. dual system single module error (%) 0 5 10 15 t known control ... all-to-all connectivity (can be relaxed) recurrent network of excitatory neurons binary patterns recall cue = noisy version of original pattern storage retrieval pattern to efficacies cue retrieved pattern states synaptic synaptic retrieve 10 20 30 40 50 0 0.2 0.4 0.6 0.8 1 recurrent input probability of firing external input = 0 external input = 1 t known local complex 0 5 10 15 control dynamics error (%) dynamics Gibbs sampling does poorly because the posterior is multimodal/correlated (’frustrated’) complex dynamics (tempered transitions, Neal 1996) work, but hard to build a neurally plausible implementation goal of optimal recall: represent distribution over patterns given information in the weights and recall cue (Sommer Dayan 1998; Lengyel et al 2005) sampling-based representation of the full distribution over patterns alternative representations possible (mean-field) since t unknown, marginalize over all possible pattern ages input present throughout recall error (%) 0 50 100 150 0 10 20 30 40 t known t unknown control sampling-based representation 0.05 0.1 10 0 10 1 10 2 10 3 0 0.2 0 0 0.2 0.4 0.6 strong n=5 W=0 W=1 10 0 10 2 10 4 10 6 10 −10 10 −5 10 0 n=5 n=10 n=15 n=25 n=1 discrete state-synapses example: cascade model (Fusi et al 2005) palimpsest memory: memory trace is a perturbation away from the stationary distribution of synaptic weights; it decays over time back to baseline signature of a memory depends on age -- P D pre post Idea: joint Gibbs in (x,t) space hits false alarms model 0 0.2 0.4 0.6 0.8 1 1 0 0.2 0.6 0.4 odour list recognition in rats 0.8 (Fortin et al 2004) false alarms 0 0.2 0.4 0.6 0.8 1 controls amnesics recognition memory ROCs from amnesics and aged-matched controls 0 0.2 0.4 0.6 0.8 1 (Yonelinas et al 1998) false alarms 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.04 0.08 0.12 0 10 20 30 40 recollection: average entropy familiarity: estimated t 0.1 0.3 0.5 0.7 0.9 novel familiar GP classifier familiarity-based possible decision layer implementations 1 100 200 0 0.05 neuron index Tth novel familiar the recognition performance still depends on the recollection module (indirectly) recognition relies on both familiarity and recollection (Fortin et al 2004, Yonelinas 2001, etc) familiarity is more important for recent memories (Fortin et al 2004) future work: a generative model for recognition modeling hippocampal/perirhinal lesions both rec fam 0 50 100 * * fam rec 80 90 100 * recent patterns only all patterns a prior for t that is closer to experiments 50 100 150 200 0 0.05 0.1 t 0 ? total current to a neuron recurrent input external input t unknown control = feedforward network, no information from recurrent weights

Transcript of Two is better than one: distinct roles for familiarity and...

Two is better than one: distinct roles for familiarityand recollection in retrieving palimpsest memories

Cristina Savin , Peter Dayan , Máté LengyelComputational and Biological Learning, Department of Engineering, University of Cambridge

AUTOASSOCIATIVE MEMORY NETWORK

APPROXIMATELY OPTIMAL RETRIEVAL: MONOLITHIC SYSTEM

PLASTICITY IN A SINGLE SYNAPSE

This work was supported by the Wellcome Trust and the Gatsby Charitable Foundation (PD).

Computational andBiological Learning

Starting point:

Gatsby Computational Neuroscience Unit, University College London

1

2

1 12

APPROXIMATELY OPTIMAL RETRIEVAL: DUAL SYSTEM

t unknown at recall !

wea

k

A LINK TO BEHAVIOUR: RECOGNITION MEMORY

t known dual system

0

5

10

15

singlemodule

control

erro

r (%

)

familiarity module

recollection module1 100 200

0

0.05

neuron index

activ

atio

n

0 50 100 150

10−1100

101

t

familiarity signalc

performance improvement because it is easier to explore the space of possible patterns (as in auxiliary variable methods)

these benefits are not restricted to a sampling based representation familiarity = pattern age here, but generalization is possible (c is a sufficient statistic, gives overall expected signal strength)

alternative representation(mean-field)

task: has the current pattern been seen before?

in the context of palimpsest memories = has thispattern been seen recently?

an additional decision layer estimates theprobability that the current pattern is familiar

Biologically-plausible synapses exhibit the palimpsest property: memory traces decay over time, overwritten by ongoing plasticity. This makes retrieval from memory difficult. Here, we propose a probabilistic framework for optimal retrieval from palimpsest synapses. We compare two possible solutions for dealing with the unknown pattern age: 1. integrating it out as an unknown quantity or 2. estimating it in a separate subsystem, in which age is treated asan auxiliary variable. We show that the dual system, inspired by hippocampal and perirhinal cortex connectivity, exhibits significantly better performance. In a recognition task, the same system exhibits several characteristics similar to human and animal experiments.Our results suggest a normative motivation for dual module models for recognition memory, as being beneficial even when the task to solve is recollection.

dual system

singlemodule

erro

r (%

)

0

5

10

15

t known

control

...

all-to-all connectivity (can be relaxed)recurrent network of excitatory neurons

binary patternsrecall cue = noisy version of original pattern

storage retrievalpattern to

efficacies

cue

retrieved pattern

statessynaptic synaptic

retrieve

10 20 30 40 500

0.20.40.60.8

1

recurrent input

prob

abili

ty o

f firi

ng

external input = 0external input = 1

t known local complex 0

5

10

15

control

dynamics

erro

r (%

)

dynamics

Gibbs sampling does poorly because the posterioris multimodal/correlated (’frustrated’)

complex dynamics (tempered transitions, Neal 1996)work, but hard to build a neurally plausibleimplementation

goal of optimal recall: represent distribution over patterns given information in the weightsand recall cue (Sommer Dayan 1998; Lengyel et al 2005)

sampling-based representation of the fulldistribution over patterns alternative representations possible (mean-field)

since t unknown, marginalize over all possiblepattern ages

input present throughout recall

erro

r (%

)

0 50 100 1500

10

20

30

40 t knownt unknowncontrol

sampling-basedrepresentation

0.05 0.1100 101 102 1030

0.2

0 0 0.2 0.4 0.6st

rong

n=5

W=0 W=1

100 102 104 10610−10

10−5

100

n=5n=10

n=15n=25

n=1discrete state-synapsesexample: cascade model (Fusi et al 2005)

palimpsest memory:memory trace is a perturbation away from the stationary distribution of synaptic weights; it decays over time back to baselinesignature of a memory depends on age

- -PD

pre

post

Idea: joint Gibbs in (x,t) space

hits

false alarms

model

0 0.2 0.4 0.6 0.8 1

1

0

0.2

0.6

0.4

odour list recognition in rats

0.8

(Fortin et al 2004)false alarms

0

0.2

0.4

0.6

0.8

1

controls

amnesics

recognition memory ROCs from amnesicsand aged-matched controls

0 0.2 0.4 0.6 0.8 1

(Yonelinas et al 1998)

false alarms0 0.2 0.4 0.6 0.8 1

0

0.2

0.4

0.6

0.8

1

0 0.04 0.08 0.120

10

20

30

40

recollection: average entropy

fam

iliar

ity: e

stim

ated

t

0.1

0.3

0.5

0.7

0.9novel

familiar

GP classifier familiarity-based

possible decision layer implementations

1 100 2000

0.05

neuron index

Tth

novel

familiar

the recognition performance still depends on the recollectionmodule (indirectly)

recognition relies on both familiarity and recollection (Fortin et al 2004, Yonelinas 2001, etc)

familiarity is more important for recent memories (Fortin et al 2004)

future work: a generative model for recognition modeling hippocampal/perirhinal lesions

both recfam0

50

100*

*

fam rec

80

90

100 *

recent patterns onlyall patterns

a prior for t that is closer to experiments

50 100 150 2000

0.05

0.1

t0

?

total current to a neuron

recurrent input external input

t unknown

control = feedforward network, no information from recurrent weights