
Pergamon  Neural Networks, Vol. 10, No. 6, pp. 1017-1036, 1997

© 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain

0893-6080/97 $17.00+.00

CONTRIBUTED ARTICLE

PII: S0893-6080(97)00016-6

Convergence-Zone Episodic Memory: Analysis and Simulations

MARK MOLL¹ AND RISTO MIIKKULAINEN²

¹School of Computer Science, Carnegie Mellon University and ²Department of Computer Sciences, The University of Texas at Austin

(Received 3 March 1995; accepted 11 November 1996)

Abstract—Human episodic memory provides a seemingly unlimited storage for everyday experiences, and a retrieval system that allows us to access the experiences with partial activation of their components. The system is believed to consist of a fast, temporary storage in the hippocampus, and a slow, long-term storage within the neocortex. This paper presents a neural network model of the hippocampal episodic memory inspired by Damasio's idea of Convergence Zones. The model consists of a layer of perceptual feature maps and a binding layer. A perceptual feature pattern is coarse coded in the binding layer, and stored on the weights between layers. A partial activation of the stored features activates the binding pattern, which in turn reactivates the entire stored pattern. For many configurations of the model, a theoretical lower bound for the memory capacity can be derived, and it can be an order of magnitude or higher than the number of all units in the model, and several orders of magnitude higher than the number of binding-layer units. Computational simulations further indicate that the average capacity is an order of magnitude larger than the theoretical lower bound, and making the connectivity between layers sparser causes an even further increase in capacity. Simulations also show that if more descriptive binding patterns are used, the errors tend to be more plausible (patterns are confused with other similar patterns), with a slight cost in capacity. The convergence-zone episodic memory therefore accounts for the immediate storage and associative retrieval capability and large capacity of the hippocampal memory, and shows why the memory encoding areas can be much smaller than the perceptual maps, consist of rather coarse computational units, and are only sparsely connected to the perceptual maps. © 1997 Elsevier Science Ltd.

Keywords—Convergence zones, Episodic memory, Associative memory, Long-term memory, Content-addressable memory, Memory capacity, Retrieval errors, Hippocampus.

1. INTRODUCTION

The human memory system can be divided into semantic memory of facts, rules, and general knowledge, and episodic memory that records the individual's day-to-day experiences (Tulving, 1972, 1983). Episodic memory is characterized by extreme efficiency and high capacity. New memories are formed every few seconds, and many of those persist for years, even decades (Squire, 1987). Another significant characteristic of human memory is

Acknowledgements: We thank Greg Plaxton for pointing us to martingale analysis on this problem, and two anonymous reviewers for references on hippocampal modeling, and for suggesting experiments with sparse connectivity. This research was supported in part by NSF grant no. IRI-9309273 and Texas Higher Education Coordinating Board Grant no. ARP-444 to the second author. The simulations were run on the Cray Y-MP 8/864 at the University of Texas Center for High-Performance Computing.

Requests for reprints should be sent to: Risto Miikkulainen, Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, USA; Tel: +1 512 471 9571; Fax: +1 512 471 8885; e-mail: [email protected].

content-addressability. Most of the memories can be retrieved simply by activating a partial representation of the experience, such as a sound, a smell, or a visual image.

Despite a vast amount of research, no clear understanding has yet emerged on exactly where and how the episodic memory traces are represented in the brain. Several recent results, however, suggest that the system consists of two components: the hippocampus serves as a fast, temporary storage where the traces are created immediately as the experiences come in, and the neocortex has the task of organizing and storing the experiences for the lifetime of the individual (Alvarez & Squire, 1994; Halgren, 1984; Marr, 1971; McClelland et al., 1995; Milner, 1989; Squire, 1992). It seems that the traces are transferred from the hippocampus to the neocortex in a slow and tedious process, which may take several days, or weeks, or even years. After that, the hippocampus is no longer necessary for maintaining these traces, and the resources can be reused for encoding new experiences.



Although several artificial neural network models of associative memory have been proposed (Ackley et al., 1985; Amari, 1977, 1988; Anderson, 1972; Anderson et al., 1977; Cooper, 1973; Grossberg, 1983; Gardner-Medwin, 1976; Hinton & Anderson, 1981; Hopfield, 1982, 1984; Kairiss & Miranker, 1997; Kanerva, 1988; Knapp & Anderson, 1984; Kohonen, 1971, 1972, 1977, 1989; Kohonen & Mäkisara, 1986; Kortge, 1990; Little & Shaw, 1975; McClelland & Rumelhart, 1986b; Miikkulainen, 1992; Steinbuch, 1961; Willshaw et al., 1969), the fast encoding, reliable associative retrieval, and capacity of even the hippocampal component of human memory has been difficult to account for. For example, in the Hopfield model of N units, N/(4 ln N) patterns can be stored with a 99% probability of correct retrieval when N is large (Amit, 1989; Hertz et al., 1991; Keeler, 1988; McEliece et al., 1986). This means that storing and retrieving, for example, 10^6 memories would require in the order of 10^8 nodes and 10^16 connections, which is unrealistic, given that the hippocampal formation in higher animals such as the rat is estimated to have about 10^6 primary excitatory neurons with 10^10 connections (Amaral, Ishizuka, & Claiborne, 1990), and the entire human brain is estimated to have about 10^11 neurons and 10^15 synapses (Jessell, 1991).
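The arithmetic behind these order-of-magnitude figures is easy to check; a quick sketch (the numbers simply restate the estimates quoted above):

```python
import math

# Hopfield capacity N / (4 ln N): N on the order of 10^8 units suffices to
# store 10^6 patterns, and full connectivity then implies ~10^16 connections.
N = 10**8
capacity = N / (4 * math.log(N))   # number of storable patterns
connections = N * (N - 1)          # fully connected network
```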

These earlier models had a uniform, abstract structure and were not specifically motivated by any particular part of the human memory system. In this paper, a new model for associative episodic memory is proposed that makes use of three ideas about how the hippocampal memory might be put together. The model abstracts most of the low-level biological circuitry, focusing on showing that with a biologically motivated overall architecture, an episodic memory model exhibits capacity and behavior very similar to that of the hippocampal memory system. The three central ideas are: (1) value-unit encoding in the input feature maps, (2) sparse, random encoding of traces in the hippocampus, and (3) a convergence-zone structure between them.

Since the input to the memory consists of sensory experience, in the model it should have a representation similar to the perceptual representations in the brain. The low-level sensory representations are organized into maps, that is, similar sensory inputs are represented by nearby locations on the cortical surface (Knudsen et al., 1987). It is possible that higher-level representations also have a map-like structure. This is hard to verify, but at least there is plenty of support for value-unit encoding, that is, that the neurons respond selectively to only certain types of inputs, such as particular faces, facial expressions, or particular words (Hasselmo et al., 1989; Heit et al., 1989; Rolls, 1984).

The structure of the hippocampus is quite well known, and recently its dynamics in memory processing have also been observed. Wilson & McNaughton (1993) found that rats encode locations in the maze through ensembles of seemingly random, sparse activation patterns in the hippocampal area CA1. When the rat explores new locations, new activation patterns appear, and when it returns to the earlier locations, the same pattern is activated as during the first visit. O'Reilly & McClelland (1994) showed that the hippocampal circuitry is well designed to form such sparse, diverse encodings, and that it can also perform pattern completion during recall.

Damasio (1989a, b) proposed a general framework for episodic representations, based on observations of typical patterns of injury-related deficits. The idea is that there is no multi-modal cortical area that would build an integrated and independent representation of an experience from its low-level sensory representations. Instead, the representation takes place only in the low-level cortices, with the different parts bound together by a hierarchy of convergence zones. An episodic representation can be recreated by activating its corresponding binding pattern in the convergence zone.

The convergence-zone episodic memory model is loosely based on the above three ideas. It consists of a layer of perceptual maps and a binding layer. An episodic experience appears as a pattern of local activations across the perceptual maps, and is encoded as a sparse, random pattern in the binding layer. The connections between the maps and the binding layer store the encoding in a single presentation, and the complete perceptual pattern can later be regenerated from partial activation of the input layer.

Many details of the low-level neural circuitry are abstracted in the model. The units in the model correspond to functional columns rather than neurons, and their activation levels are represented by integers. Multi-stage connections from the perceptual maps to the hippocampus are modeled by direct binary connections that are bidirectional, and the connections within the hippocampus are not taken into account. At this level of abstraction, the behavior of the model can be analyzed both theoretically and experimentally, and general results can be derived about its properties.

A theoretical analysis shows that: (1) with realistic-size maps and binding layer, the capacity of the convergence-zone memory can be very high, higher than the number of units in the model, and can be several orders of magnitude higher than the number of binding-layer units; (2) the majority of the neural hardware is required in the perceptual processing; the binding layer needs to be only a fraction of the size of the perceptual maps; and (3) the computational units could be very coarse in the hippocampus and in the perceptual maps; the required capacity is achieved with a very small number of such units. Computational simulations of the model further suggest that: (1) the average storage capacity may be an order of magnitude higher than the theoretical lower bound; (2) the capacity can be further increased by reducing the connectivity between feature maps and the binding layer, with best results when the connectivity matches the sparseness of the binding representations;



FIGURE 1. Storage. The weights on the connections between the appropriate feature units and the binding representation of the pattern are set to 1. [Figure: a binding layer above four feature maps.]

and (3) if the binding patterns for similar inputs are made more similar, the errors that the model makes become more plausible: the retrieved patterns are similar to the correct patterns. These results suggest how one-shot storage, content-addressability, high capacity, and robustness could all be achieved within the resources of the hippocampal memory system.

2. OUTLINE OF THE MODEL

The convergence-zone memory model consists of two layers of real-valued units (the feature map layer and the binding layer), and bidirectional binary connections between the layers (Figure 1). Perceptual experiences are represented as vectors of feature values, such as color = red, shape = round, size = small. The values are encoded as units on the feature maps. There is a separate map for each feature domain, and each unit on the map represents a particular value for that feature. For instance, on the map for the color feature, the value red could be specified by turning on the unit in the lower-right quarter (Figure 1). The feature map units are connected to the binding layer with bidirectional binary connections (i.e. the weight is either 0 or 1). An activation of units in the feature map layer causes a number of units to become active in the binding layer, and vice versa. In effect, the binding layer activation is a compressed, distributed encoding of the perceptual value-unit representation.

Initially, all connections are inactive at 0. A perceptual experience is stored in the memory through the feature map layer in three steps. First, those units that represent the appropriate feature values are activated at 1. Second, a subset of m binding units is randomly selected in the binding layer as the compressed encoding for the pattern, and activated at 1. Third, the weights of all the connections between the active units in the feature maps and the active units in the binding layer are set to 1 (Figure 1). Note that only one presentation is necessary to store a pattern this way.

To retrieve a pattern, first all binding units are set to 0. The pattern to be retrieved is partially specified in the feature maps by activating a subset of its feature units. For example, in Figure 2a the memory is cued with the two leftmost features. The activation propagates to the binding layer through all connections that have been turned on so far. The set of binding units that a particular feature unit turns on is called the binding constellation of that unit. All binding units in the binding encoding of the pattern to be retrieved are active at 2, because they belong to the binding constellation of both retrieval cue units. A number of other units are also activated at 1, because each cue unit takes part in representing multiple patterns, and therefore has several other active connections as well. Only those units active at 2 are retained; units with less activation are turned off (Figure 2b).

The activation of the remaining binding units is then propagated back to the feature maps (Figure 2c). A number of units are activated at various levels in each feature map, depending on how well their binding constellation matches the current pattern in the binding layer. Chances are that the unit that belongs to the same pattern as the cues has the largest overlap and becomes most highly activated. Only the most active unit in each feature map is retained, and as a result, a complete, unambiguous perceptual pattern is retrieved from the system (Figure 2d).
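The storage and retrieval procedure described above can be sketched in a few lines of Python (a toy illustration with made-up parameter values, not the authors' simulation code; retrieval keeps the binding units common to all cues rather than thresholding at a fixed activation level):

```python
import random

class ConvergenceZoneMemory:
    """Toy sketch: t feature maps with f value units each, a binding layer
    of n units, and binary weights stored as a set of binding units
    (the 'binding constellation') per feature unit."""

    def __init__(self, n=200, m=5, f=100, t=4, seed=0):
        self.n, self.m, self.f, self.t = n, m, f, t
        self.rng = random.Random(seed)
        # constellation[map][value] = binding units connected to that unit
        self.constellation = [[set() for _ in range(f)] for _ in range(t)]

    def store(self, pattern):
        """One-shot storage: pick a random m-subset of binding units and
        connect it to the active feature unit in every map."""
        binding = self.rng.sample(range(self.n), self.m)
        for map_i, value in enumerate(pattern):
            self.constellation[map_i][value].update(binding)

    def retrieve(self, cues):
        """cues maps a feature-map index to the cued value index."""
        # Forward pass: keep only binding units activated by every cue.
        active = None
        for map_i, value in cues.items():
            c = self.constellation[map_i][value]
            active = set(c) if active is None else active & c
        # Backward pass: in each map keep the unit with the largest overlap.
        result = dict(cues)
        for map_i in range(self.t):
            if map_i not in cues:
                result[map_i] = max(
                    range(self.f),
                    key=lambda v: len(self.constellation[map_i][v] & active))
        return [result[i] for i in range(self.t)]

# Store a few random patterns, then cue with the two leftmost features.
mem = ConvergenceZoneMemory()
patterns = [[mem.rng.randrange(mem.f) for _ in range(mem.t)]
            for _ in range(8)]
for pat in patterns:
    mem.store(pat)
successes = sum(mem.retrieve({0: pat[0], 1: pat[1]}) == pat
                for pat in patterns)
```

With this few stored patterns, the binding constellations barely overlap, so retrieval from two cues almost always reconstructs the full pattern.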

If there are n units in the binding layer and m units are chosen as a representation for a pattern, the number of possible different binding representations is equal to

\binom{n}{m}.

If n is sufficiently large and m is relatively small compared to n, this number is extremely large, suggesting that the convergence-zone memory could have a very large capacity.
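For illustration (with assumed values of n and m, not figures from the paper), this count of distinct binding codes can be evaluated directly:

```python
import math

# Number of different m-out-of-n binding representations, binomial(n, m).
n, m = 1000, 10          # illustrative values only
codes = math.comb(n, m)  # on the order of 10^23, vastly more than n units
```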

However, due to the probabilistic nature of the storage and retrieval processes, there is always a chance that the retrieval will fail. The binding constellations of the retrieval cue units may overlap significantly, and several spurious units may be turned on at the binding layer. When the activation is propagated back to the feature maps, some random unit in a feature map may have a binding constellation that matches the spurious units very well (Figure 3). This rogue unit may receive more activation than the correct unit, and a wrong feature value may be retrieved. As more patterns are stored, the binding constellations of the feature units become larger, and erroneous retrieval becomes more likely.

To determine the capacity of the convergence-zone memory, the chance of retrieval error must be computed. Below, a probabilistic formulation of the model is first given, and a lower bound for the retrieval error is derived.

FIGURE 2. Retrieval. A stored pattern is retrieved by presenting a partial representation as a cue. The size of the square indicates the activation level of the unit. (a) Retrieval cues activate a binding pattern. (b) Less active binding units are turned off. (c) Binding pattern activates feature units. (d) Less active feature units are turned off.

FIGURE 3. Erroneous retrieval. A rogue feature unit is retrieved, instead of the correct one, when its binding constellation has more units in common with the intersection of the retrieval cue constellations than the binding constellation of the correct unit.

3. PROBABILISTIC FORMULATION

Let Z_i be the size of the binding constellation of a feature unit after i patterns have been stored on it, and let Y_i be its increase after storing the ith pattern on it. Obviously, Y_1 = m. To obtain the distribution of Y_i when i > 1, note that the new active connections belong to the intersection of a randomly chosen subset of m connections among all n connections of the unit and its remaining inactive connections (a set with n - z_{i-1} elements, where z_{i-1} is the size of the binding constellation at the previous step). Therefore Y_i, i > 1, is hypergeometrically distributed (Appendix A.1) with parameters m, n - z_{i-1}, and n:

P(Y_i = y \mid Z_{i-1} = z_{i-1}) = \binom{n - z_{i-1}}{y} \binom{z_{i-1}}{m - y} \Big/ \binom{n}{m}.   (1)

The constellation size Z_i is then given by

Z_i = \sum_{k=1}^{i} Y_k.   (2)

Let I be the number of patterns stored on a particular feature unit after p random feature patterns have been stored in the entire memory. I is binomially distributed (Appendix A.1) with parameters p and 1/f, where f is the number of units in a feature map:

P(I = i) = \binom{p}{i} \left(\frac{1}{f}\right)^i \left(1 - \frac{1}{f}\right)^{p-i}.   (3)

Let Z be the binding constellation of a particular feature unit after p patterns have been stored in the memory. It can be shown (Appendix A.2) that


E(Z) = n\left(1 - \left(1 - \frac{m}{nf}\right)^p\right)   (4)

and

Var(Z) = n\left(1 - \frac{m}{nf}\right)^p \left(1 - \left(1 - \frac{m}{nf}\right)^p\right) + n(n-1)\left(\left(1 - \frac{m(2n - m - 1)}{n(n-1)f}\right)^p - \left(1 - \frac{m}{nf}\right)^{2p}\right).   (5)

Initially, when no patterns are stored, the binding constellation is zero, and it will converge to n as more patterns are stored (since 0 < 1 - m/(nf) < 1). Because the bases of the exponentials in the variance of Z are smaller than 1, the variance will go to zero when p goes to infinity. Therefore, in the limit the binding constellation will cover the entire binding layer with probability 1.

The binding constellation of a feature unit, given that at least one pattern has been stored on it, is denoted by Z̃. This variable represents the binding constellation of a retrieval cue, which necessarily must have at least one pattern stored on it (assuming that the retrieval cues are valid). The expected value and variance of Z̃ follow from (4) and (5):

E(\tilde{Z}) = m + (n - m)\left(1 - \left(1 - \frac{m}{nf}\right)^{p-1}\right)   (6)

and

Var(\tilde{Z}) = (n - m)\left(1 - \frac{m}{nf}\right)^{p-1} \left(1 - \left(1 - \frac{m}{nf}\right)^{p-1}\right) + (n - m)(n - m - 1)\left(\left(1 - \frac{m(2n - m - 1)}{n(n-1)f}\right)^{p-1} - \left(1 - \frac{m}{nf}\right)^{2(p-1)}\right).   (7)

Note that the expected value of Z̃ is always larger than that of Z. Initially the difference is exactly m, and it goes to zero as p goes to infinity (because Z̃ also converges to n).

Let Z̃_j be the binding constellation of the jth retrieval cue, and let X_j be the number of units in the intersection of the first j retrieval cues. Clearly, X_1 = Z̃_1. To get X_j for j > 1, we remove from consideration the m units all retrieval cues necessarily have in common (because they belong to the same stored pattern), randomly select z̃_j - m units from the remaining set of n - m units, and see how many of them belong to the current intersection of x_{j-1} - m units. This is a hypergeometric distribution with parameters z̃_j - m, x_{j-1} - m, and n - m:

P(X_j = x_j \mid \tilde{Z}_j = \tilde{z}_j, X_{j-1} = x_{j-1}) = \binom{x_{j-1} - m}{x_j - m} \binom{n - x_{j-1}}{\tilde{z}_j - x_j} \Big/ \binom{n - m}{\tilde{z}_j - m}.   (8)

The size of the total binding constellation activated during retrieval is obtained by taking this intersection over the binding constellations of all c retrieval cues.

The number of units in common between a potential rogue unit and the c retrieval cues is denoted by R_{c+1} and is also hypergeometrically distributed, however with parameters z, x_c, and n, because we cannot assume that the rogue unit has at least m units in common with the cues:

P(R_{c+1} = r \mid Z = z, X_c = x_c) = \binom{x_c}{r} \binom{n - x_c}{z - r} \Big/ \binom{n}{z}.   (9)

The correct unit in a retrieval map (i.e. in a feature map where a retrieval cue was not presented and where a feature value needs to be retrieved) will receive an activation X_{c+1}, because it also has at least m units in common with the retrieval cues. The correct unit will be retrieved if X_{c+1} > R_{c+1}. Now, X_{c+1} and R_{c+1} differ only in the last intersection step, where X_{c+1} depends on Z̃ and X_c, and R_{c+1} depends on Z and X_c. Since E(Z̃) > E(Z) ((4) and (6)), E(X_{c+1}) > E(R_{c+1}), and the correct unit will be retrieved most of the time, although this advantage gradually decreases as more patterns are stored in the memory. In each feature map there are (f - 1) potential rogue units, so the conditional probability of successful retrieval is (1 - P(R_{c+1} > X_{c+1} \mid X_{c+1}, Z̃, X_c))^{f-1}, not addressing tie-breaking. Unfortunately, it is very difficult to compute p_success, the unconditional probability of successful retrieval, because the distribution functions of Z̃, X_c, X_{c+1} and R_{c+1} are not known. However, it is possible to derive bounds for p_success, and show that with reasonable values for n, m, f, and p, the memory is reliable.
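The expected constellation size E(Z) = n(1 - (1 - m/(nf))^p) can be checked against a direct simulation of the storage process; a sketch with small illustrative parameter values (not figures from the paper):

```python
import random

def constellation_size(n, m, f, p, rng):
    """One feature unit: each of p stored patterns hits it with probability
    1/f and then connects it to a random m-subset of the n binding units."""
    units = set()
    for _ in range(p):
        if rng.random() < 1.0 / f:
            units.update(rng.sample(range(n), m))
    return len(units)

n, m, f, p, trials = 100, 5, 10, 50, 4000   # illustrative values only
rng = random.Random(1)
empirical = sum(constellation_size(n, m, f, p, rng)
                for _ in range(trials)) / trials
theory = n * (1 - (1 - m / (n * f)) ** p)   # expected constellation size
```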

4. LOWER BOUND FOR MEMORY CAPACITY

Memory capacity can be defined as the maximum number of patterns that can be stored in the memory so that the probability of correct retrieval with a given number of retrieval cues is greater than α (a constant close to 1). In this section, a lower bound for the chance of successful retrieval will be derived. The analysis consists of three steps: (1) bounds for the number of patterns stored on a feature unit; (2) bounds for the binding constellation size; and (3) bounds for the intersections of binding constellations. Given particular values for the system parameters, and ignoring dependencies among constellations, it is then possible to derive a lower bound for the capacity of the model.

4.1. Number of Patterns Stored on a Feature Unit

Since I is binomially distributed (with parameters p and 1/f), Chernoff bounds (Appendix A.1) can be applied:

P(I < (1 - \delta_1) p/f) < e^{-\delta_1^2 p / (2f)}   (10)

P(I > (1 + \delta_2) p/f) < e^{-\delta_2^2 p / (3f)}.   (11)

These equations give the probability that I is more than a fraction δ₁ or δ₂ off its mean. The parameters δ₁ and δ₂ determine the tradeoff between the tightness of the bounds and the probability of satisfying them. If bounds are desired with a given probability β, the right-hand sides of (10) and (11) are set equal to β and solved for δ₁ and δ₂. The lower and upper bounds for the number of patterns stored on a feature unit then are

i_l = (1 - \delta_1) p/f if a solution for δ₁ exists, and i_l = 0 otherwise,   (12)

i_u = (1 + \delta_2) p/f.   (13)

Given that at least one pattern has been stored on a feature unit, the bounds become

\tilde{i}_l = 1 + (1 - \delta_1)(p - 1)/f if a solution for δ₁ exists, and \tilde{i}_l = 1 otherwise,   (14)

\tilde{i}_u = 1 + (1 + \delta_2)(p - 1)/f.   (15)
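Setting the right-hand sides of the tail bounds above equal to β and solving for δ can be sketched as follows (the exponential forms and all parameter values here are illustrative assumptions, not figures from the paper):

```python
import math

p, f, beta = 10**5, 1000, 1e-6   # illustrative values only
mean = p / f
# Solve e^(-delta^2 p/(2f)) = beta and e^(-delta^2 p/(3f)) = beta for delta.
delta1 = math.sqrt(2 * f * math.log(1 / beta) / p)
delta2 = math.sqrt(3 * f * math.log(1 / beta) / p)
i_l = (1 - delta1) * mean if delta1 < 1 else 0   # lower bound on I
i_u = (1 + delta2) * mean                        # upper bound on I
```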

4.2. Size of the Binding Constellation

Instead of choosing exactly m different binding units for the binding constellation of a feature map unit, consider the process of randomly selecting k not-necessarily-distinct units in such a way that the expected number of different units is m. This will make the analysis easier at the cost of larger variance, but the bounds derived will also be valid for the actual process. To determine k, note that the number of units that do not belong to the binding representation is equal to n - m on average:

n\left(1 - \frac{1}{n}\right)^k = n - m.   (16)

Solving for k, we get

k = \frac{\ln n - \ln(n - m)}{\ln n - \ln(n - 1)}.   (17)
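By construction, this choice of k leaves n - m units uncovered in expectation, and k stays close to m; a quick numeric check with assumed values:

```python
import math

n, m = 10**6, 100   # illustrative values only
k = (math.log(n) - math.log(n - m)) / (math.log(n) - math.log(n - 1))
uncovered = n * (1 - 1 / n) ** k   # equals n - m by construction
```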

Note that k is almost equal to m for large n.

Assume i patterns are stored on the feature map unit, which is equivalent to selecting ki units from the binding layer at random. Let Z^E_v be the expected size of the final binding constellation, estimated after v binding units have been selected. Then

Z^E_v = Z'_v + (n - Z'_v)\left(1 - \left(1 - \frac{1}{n}\right)^{ki - v}\right) = n - (n - Z'_v)\left(1 - \frac{1}{n}\right)^{ki - v},   (18)

where Z'_v is the size of the binding constellation formed by the first v selected units. Obviously Z'_v is equal to Z'_{v-1} or exactly one larger, and the expected increase of Z'_v is 1 - Z'_{v-1}/n. Since Z^E_v depends stochastically only on Z'_{v-1}, the expected value of Z^E_v, given Z^E_{v-1}, is

E(Z^E_v \mid Z'_{v-1} = z'_{v-1}) = n - \left(n - z'_{v-1} - \left(1 - \frac{z'_{v-1}}{n}\right)\right)\left(1 - \frac{1}{n}\right)^{ki - v} = n - (n - z'_{v-1})\left(1 - \frac{1}{n}\right)^{ki - v + 1} = Z^E_{v-1}.   (19)

Therefore, E(Z^E_v \mid Z^E_{v-1}) = Z^E_{v-1} and the sequence of variables Z^E_0, ..., Z^E_{ki} is a martingale (see Appendix A.3). Moreover, it can be shown (Appendix A.4) that |Z^E_v - Z^E_{v-1}| \le 1, so that bounds for the final binding constellation Z can be obtained from Azuma's inequalities. For the lower bound, the martingale Z^E_0, ..., Z^E_{k i_l} (with length k i_l) is used, and for the upper bound, Z^E_0, ..., Z^E_{k i_u} (with length k i_u). Using (18) and noting that Z = Z^E_{k i_l} for the lower bound and Z = Z^E_{k i_u} for the upper bound, Azuma's inequalities can be written as

P\left(Z \le n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_l}\right) - \lambda\sqrt{k i_l}\right) \le e^{-\lambda^2/2}   (20)

P\left(Z \ge n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_u}\right) + \lambda\sqrt{k i_u}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0.   (21)

After deriving a value for λ based on the desired confidence level β, the following lower and upper bounds for Z are obtained:

z_l = n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_l}\right) - \lambda\sqrt{k i_l}   (22)

z_u = n\left(1 - \left(1 - \frac{1}{n}\right)^{k i_u}\right) + \lambda\sqrt{k i_u}.   (23)

The corresponding bounds for Z̃ are

\tilde{z}_l = n\left(1 - \left(1 - \frac{1}{n}\right)^{k \tilde{i}_l}\right) - \lambda\sqrt{k \tilde{i}_l}   (24)

\tilde{z}_u = n\left(1 - \left(1 - \frac{1}{n}\right)^{k \tilde{i}_u}\right) + \lambda\sqrt{k \tilde{i}_u}.   (25)

4.3. Intersection of Binding Constellations

The process of forming the intersection of c binding constellations incrementally, one cue at a time, can also be formulated as a martingale process. To see how, consider the process of forming an intersection of two subsets of a common superset incrementally, by checking (one at a time) whether each element of the first set occurs in the second set. Assume that v elements have been checked this way. Let X'_v denote the number of elements found to be in the intersection so far, and X^E_v the currently expected number of elements in the final intersection. Then

X^E_v = X'_v + \frac{(n_1 - v)(n_2 - X'_v)}{n_s - v},   (26)

where n_1, n_2, and n_s are the sizes of the first set, the second set, and the superset. As shown in Appendix A.5, the sequence X^E_0, ..., X^E_{n_1} is a martingale. In addition, if n_1 + n_2 - 1 < n_s, |X^E_v - X^E_{v-1}| \le 1, and Azuma's inequalities can be applied.

The above process applies to forming the intersection of binding constellations of retrieval cues when the intersection in the previous step is chosen as the first set, the binding constellation of the jth cue as the second set, the binding layer as the common superset, and the m units all retrieval cues have in common are excluded from the intersection. In this case

n_1 = x_{j-1,u} - m   (27)

n_2 = \tilde{z}_u - m   (28)

n_s = n - m,   (29)

where x_{j-1,u} is an upper bound for X_{j-1}. Azuma's inequality can be applied if x_{j-1,u} + \tilde{z}_u - 1 < n (which needs to be checked). Using (27)-(29) in (26) and noting that X'_{n_1} = X_j - m, Azuma's inequality becomes

P\left(X_j \ge m + \frac{(x_{j-1,u} - m)(\tilde{z}_u - m)}{n - m} + \lambda\sqrt{x_{j-1,u} - m}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0.   (30)

After deriving a value for λ based on the desired confidence, the following upper bound for X_j is obtained:

x_{j,u} = m + \frac{(x_{j-1,u} - m)(\tilde{z}_u - m)}{n - m} + \lambda\sqrt{x_{j-1,u} - m}.   (31)

This bound is computed recursively, with x_{1,u} = \tilde{z}_u. Comparing with the probabilistic formulation of Section 3, note that

\frac{(x_{j-1,u} - m)(\tilde{z}_u - m)}{n - m}

is the expected value of the hypergeometric distribution derived for X_j (8) when Z̃_j and X_{j-1} are at their upper bounds.

As the last step in the analysis of binding constellations, the bounds for X_{c+1} and R_{c+1} must be computed. When X_c is at its upper bound, the intersection is the largest, and a potential rogue unit has the largest chance of taking over. In this case, a lower bound for X_{c+1} is obtained by carrying the intersection process one step further, and applying Azuma's inequality:

P\left(X_{c+1} \le m + \frac{(x_{c,u} - m)(\tilde{z}_l - m)}{n - m} - \lambda\sqrt{x_{c,u} - m}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0,   (32)

which results in

x_{c+1,l} = m + \frac{(x_{c,u} - m)(\tilde{z}_l - m)}{n - m} - \lambda\sqrt{x_{c,u} - m}.   (33)

If the resulting lower bound is smaller than m, m can be used instead. Similarly, to get the upper bound for R_{c+1}, one more intersection step needs to be carried out, but this time the m units are not excluded:

P\left(R_{c+1} \ge \frac{x_{c,u} z_u}{n} + \lambda\sqrt{x_{c,u}}\right) \le e^{-\lambda^2/2}, \quad \lambda > 0,   (34)

and the upper bound becomes

r_{c+1,u} = \frac{x_{c,u} z_u}{n} + \lambda\sqrt{x_{c,u}}.   (35)
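The single-step expectation driving this intersection martingale (the v = 0 case of the formula above) says that two random subsets of sizes n1 and n2 drawn from a superset of size ns share n1·n2/ns elements on average; a quick Monte Carlo check with assumed sizes:

```python
import random

ns, n1, n2, trials = 1000, 100, 50, 2000   # illustrative sizes only
rng = random.Random(2)
total = 0
for _ in range(trials):
    a = set(rng.sample(range(ns), n1))
    b = set(rng.sample(range(ns), n2))
    total += len(a & b)
mean_overlap = total / trials
expected = n1 * n2 / ns   # expected intersection size
```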

4.4. DependenciesBetweenBindingConstellations

Strictlyspeaking,the aboveanalysisis valid only whenthebindingconstellationsofeachcueare independent.Ifthe same partialpattern is stored many times, the con-stellationswill overlap beyond the m units that theynecessarily have in common. Such overlap tends toincreasethe size of the finalintersection.

In mostcasesof realisticsize,however,the increaseisnegligible.Thenumberof featuresVin commonbetweentwo randompatternsof c featureseach is given by thebinomialdistribution:

1024 M. A4011and R. Miikkulainen

The chance that two random patterns of c features have more than one feature in common is

P(V > 1) = 1 - P(V = 0) - P(V = 1) = 1 - (1 - 1/f)^c - c(1/f)(1 - 1/f)^{c-1},    (37)

which can be rewritten as

P(V > 1) = 1 - (1 + c/(f - 1))(1 - 1/f)^c.    (38)

This chance is negligible for sufficiently large values of f. For example, already when f = 5000 and c = 3, the chance is 1.2 x 10^{-7}, and can be safely ignored when computing a lower bound for the capacity.
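The overlap chance above is easy to check numerically. A minimal sketch following the binomial reasoning of (36)-(38) (the function names are ours):

```python
from math import comb

def p_overlap_gt1(f: int, c: int) -> float:
    """Probability that two random c-feature patterns, each feature
    uniform over f values, share more than one feature (Eqs. 36-37)."""
    p_v = lambda v: comb(c, v) * (1 / f) ** v * (1 - 1 / f) ** (c - v)
    return 1 - p_v(0) - p_v(1)

def p_overlap_gt1_closed(f: int, c: int) -> float:
    """Closed form of Eq. (38), for comparison."""
    return 1 - (1 + c / (f - 1)) * (1 - 1 / f) ** c

print(p_overlap_gt1(5000, 3))  # ~1.2e-7, matching the text
```

Both forms agree to machine precision, confirming the algebraic rewriting from (37) to (38).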


4.5. Obtaining the Lower Bound

It is now possible to use (10)-(15), (17), (20)-(25), and (30)-(35) to derive a lower bound for the probability of successful retrieval with given system parameters n, m, f, t, c, and p, where t is the total number of feature maps. The retrieval is successful if r_{c+1,u}, the upper bound for R_{c+1}, is lower than x_{c+1,l}, the lower bound for X_{c+1}. Under this constraint, the probability that none of the variables in the analysis exceeds its bounds is a lower bound for successful retrieval.

Obtaining the upper bound for X_c involves bounding 3c - 1 variables: two for each of the c cues and X_j for the c - 1 intersections. Computing x_{c+1,l} and r_{c+1,u} each involves bounding three variables. There are t - c maps, each with one x_{c+1,l} bound and f - 1 different r_{c+1,u} bounds (one for each rogue unit). The total number of bounds is therefore 3c - 1 + 3f(t - c). Setting the right-hand sides of the inequalities (10), (11), (20), (21), (30), (32) and (34) equal to a small constant beta, a lower bound for successful retrieval is obtained:

P_success >= (1 - beta)^{3c-1+3f(t-c)},    (39)

which, for small beta, can be approximated by

P_success >= 1 - (3c - 1 + 3f(t - c)) beta.    (40)

On the other hand, if it is necessary to determine a lower bound for the capacity of a model with given n, m, f, t, and c at a given confidence level P_success, beta is first obtained from (40), and the number of patterns p is then increased until one of the bounds (10), (11), (20), (21), (30), (32) or (34) is exceeded, or r_{c+1,u} becomes greater than x_{c+1,l}.
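The bound-counting argument behind (39) can be sketched in a few lines. As a sanity check, the parameters of the coarse-grained model of Section 5.1 should give a success probability just above 99% (the function name and this packaging are ours):

```python
def success_lower_bound(beta: float, f: int, t: int, c: int) -> float:
    """Lower bound (39) on the probability of successful retrieval:
    all 3c - 1 + 3f(t - c) bounds must hold, each one failing with
    probability at most beta."""
    n_bounds = 3 * c - 1 + 3 * f * (t - c)
    return (1 - beta) ** n_bounds

# Coarse-grained example: f = 17000, t = 4, c = 3, beta = 1.96e-7
print(success_lower_bound(1.96e-7, 17000, 4, 3))  # ~0.990
```

With these parameters there are 3*3 - 1 + 3*17000*(4 - 3) = 51,008 bounds, and (1 - 1.96e-7)^51008 is approximately 0.990, consistent with the 99% confidence level quoted for that model.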

5. EXAMPLE: MODELING THE HIPPOCAMPAL MEMORY SYSTEM

As an example, let us apply the above analysis to the hippocampal memory system. It is difficult to estimate how coarse the representations are in such a system, and how many effective computational units and connections there should be. The numbers of neurons and connections in the rat hippocampal formation have been used as a guideline below. Although the human hippocampus is certainly larger than that of the rat, the hippocampus, being phylogenetically one of the oldest areas of the brain, is fairly similar across higher mammals and should give an indication of the orders of magnitude involved. More importantly, the convergence-zone model can be shown to apply to a wide range of these parameters. Two cases at opposite ends of the spectrum are analyzed below: one where the number of computational units and connections is assumed to be limited, and another that is based on a large number of effective units and connections.

5.1. A Coarse-Grained Model

First note that each unit in the model is meant to correspond to a vertical column in the cortex. It is reasonable to assume feature maps with 10^6 such columns (Sejnowski & Churchland, 1989). Each input activates a local area on the map, including perhaps 10^2 columns above the threshold. Therefore, the feature maps could be approximated with 10^4 computational units. There would be a minimum of perhaps 4 such maps, of which 3 could be used to cue the memory.

There are some 10^6 primary excitatory cells in the rat hippocampal formation (Amaral et al., 1990; Boss et al., 1985, 1987; Squire et al., 1989). If we assume that functional units contain 10^2 of them, then the model should have 10^4 binding units. Only about 0.5-2.5% of the hippocampal neurons are simultaneously highly active (O'Reilly & McClelland, 1994), so a binding pattern of 10^2 units would be appropriate. Assuming that all computational units in the feature maps are connected to all units in the hippocampus, there are a total of 10^8 afferent connections to the hippocampus, and the number of such connections per vertical column in the feature maps and per excitatory neuron in the hippocampus is 10^2, both of which are small but possible numbers (Amaral et al., 1990).

If we select f = 17,000, n = 11,500, m = 150, and store 1.5 x 10^4 patterns in the memory, z_u and x_{j-1,u} are small enough that x_{j-1,u} + z_u - 1 < n holds, the chance of partial overlap of more than 1 feature is less than 1.04 x 10^{-8}, and the analysis above is valid. Setting beta = 1.96 x 10^{-7} yields bounds r_{c+1,u} < x_{c+1,l} with P_success > 99%. In other words, 1.5 x 10^4 traces can be stored in the memory with 99% probability of successful retrieval. Such a capacity is approximately equivalent to storing one new memory every 15 s for 4 days, 16 h a day, which is similar to what is required from the human hippocampal system.
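The storage-rate arithmetic behind this claim can be checked in two lines (a trivial sketch; the variable names are ours):

```python
# One new memory every 15 s, 16 h a day: how many traces in 4 days?
traces_per_day = 16 * 3600 // 15   # 3840 traces per day
print(4 * traces_per_day)          # 15360, i.e. about 1.5e4
```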

5.2. A Fine-Grained Model

Convergence-Zone Episodic Memory: Analysis and Simulations 1025

It is possible that a lot more neurons and connections are involved in the hippocampal memory system than assumed above. For example, let us assume that each of the vertical columns in the feature maps is computationally distinctive, that is, there are 10^6 units in the feature maps. Let us further assume that the system has 15 feature maps, 10 of which are used to cue the memory, and the binding layer consists of 10^5 units, with 150 used for each binding pattern. Assuming full connectivity between the feature units and the binding units, there are 1.5 x 10^12 connections in the system, which might be possible if it is assumed that a large number of collaterals exist on the inputs to the hippocampus.

Applying the above analysis to this memory configuration, 0.85 x 10^8 patterns can be stored with 99% probability of successful retrieval. In this case, z_u and x_{j-1,u} again satisfy x_{j-1,u} + z_u - 1 < n, the chance of partial overlap of more than 1 feature is less than 0.45 x 10^{-10}, and setting beta = 0.5 x 10^{-9} yields bounds r_{c+1,u} < x_{c+1,l} with P_success > 99%. In other words, a new trace could be stored every 15 s for 62 years, 16 h a day, without much memory interference.

This kind of capacity is probably enough for the entire human lifetime, and exceeds the requirements for the hippocampal formation. With such a capacity, there would be no need to transfer representations to the neocortical memory system. One conclusion from this analysis is that the hippocampal formation is likely to have a more coarse-grained than fine-grained structure. Another conclusion is that it is possible that the neocortical memory component may also be based on convergence zones. The result is interesting also from the theoretical point of view, because the lower bound is an order of magnitude higher than the number of units in the system, and three orders of magnitude higher than the number of binding units. To our knowledge, this lower bound is already higher than what is practically possible with other neural network models of associative memory to date.

6. EXPERIMENTAL AVERAGE CAPACITY

The analysis above gives us a lower bound for the capacity of the convergence-zone memory; the average-case capacity may be much higher. Although it is difficult to derive the average capacity theoretically, an estimate can be obtained through computer simulation. Not all configurations of the model can be simulated, though. The model has to be small enough to fit in the available memory, while at the same time fulfilling the assumptions of the analysis so that lower bounds can be obtained for the same model.

To find such configurations, first the feature map parameters f, t, and c, and the confidence level are fixed to values such that P_success = 99% (40). Second, a value for n is chosen so that the model will fit in the available memory. The connections take up most of the memory space (even if each connection is represented by one bit), and the amount of memory allocated for the feature map and binding layer activations, the array of patterns, and the simulation program itself is negligible. Finally, the size of the binding pattern m and the maximum number of patterns p is found such that the theoretical bounds yield r_{c+1,u} < x_{c+1,l} and the partial overlap is negligible. In the models studied so far, the highest capacity has been obtained when m is only a few percent of the size of the binding layer, as in the hippocampus.

The simulation program is straightforward. The activations in each map are represented as arrays of integers. The connections between a feature map and the binding layer are encoded as a two-dimensional array of bits, one bit for each connection. Before the simulation, a set of random patterns is generated as a two-dimensional array of p x t integers. A simulation consists of storing the patterns one at a time and periodically testing how many of a randomly-selected subset of them can be correctly retrieved with a partial cue.
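The store-and-test loop described above can be sketched at toy scale. This is our own illustration, not the original Cray program: the sizes are far smaller than the paper's, thresholding keeps the maximally active binding units, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy-scale parameters (the paper's models are far larger).
f, n, m, t, c = 50, 300, 8, 4, 3
p_max = 200

# One bit per connection between each feature-map unit and binding unit.
W = np.zeros((t, f, n), dtype=bool)
patterns = rng.integers(0, f, size=(p_max, t))

def store(pat):
    """Pick a random m-unit binding pattern and set the connecting bits."""
    binding = rng.choice(n, size=m, replace=False)
    for map_i, feat in enumerate(pat):
        W[map_i, feat, binding] = True

def retrieve(pat):
    """Cue with the first c features; retrieve the feature in map t."""
    act = sum(W[i, pat[i]].astype(int) for i in range(c))
    binding = act == act.max()               # keep the most active binding units
    scores = W[t - 1][:, binding].sum(axis=1)
    return int(scores.argmax())

for pat in patterns:
    store(pat)
accuracy = sum(retrieve(pat) == pat[t - 1] for pat in patterns) / p_max
print(accuracy)
```

At this small scale, well below capacity, nearly all patterns are retrieved correctly; pushing p_max up reproduces the rapid degradation reported below.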

The 'fine-grained' example of Section 5.2 is unfortunately too large to simulate. With 15 x 10^5 x 10^6 = 1.5 x 10^12 connections it would require 187.5 GB of memory, which is not possible with current computers. However, the 'coarse-grained' model has 4 x 17,000 x 11,500 = 7.82 x 10^8 one-bit connections, which amounts to approximately 100 MB, and easily fits in available memory.

Several configurations were simulated, and they all gave qualitatively similar results (Figure 4). In the coarse-grained model, practically no retrieval errors were produced until 370,000 patterns had been stored. With 375,000 patterns, 99% were correctly retrieved, and after that the performance degraded quickly to 94% with 400,000 patterns, 71% with 460,000, and 23% with 550,000 (Figure 4). Each run took about two hours of CPU time on a Cray Y-MP 8/864. From these simulations, and those with other configurations shown in Figure 4, it can be concluded that the average capacity of the convergence-zone episodic model may be as much as one order of magnitude larger than the theoretical lower bound.

7. LESS IS MORE: THE EFFECT OF SPARSE CONNECTIVITY

In the convergence-zone model so far, the feature maps have been fully connected to the binding layer. Such a uniform configuration makes analysis and simulation of the model easier, but it is not very realistic. Both from the point of view of modeling the hippocampal memory and building artificial memory systems, it is important to know how the capacity is affected when the two layers are only sparsely connected.

A series of simulations were run to determine how the model would perform with decreasing connectivity. A given percentage of connections were chosen randomly and disabled, and the 99% capacity point of the resulting model was found. The binding patterns were chosen slightly differently to account for missing connections:


FIGURE 4. Experimental average capacity. The horizontal axis shows the number of patterns stored on a logarithmic scale of hundreds of thousands. The vertical axis indicates the percentage of correctly retrieved patterns out of a randomly-selected subset of 500 stored patterns. Each model consisted of four feature maps, and during retrieval, the fourth feature was retrieved using the first three as cues. The models differed in the sizes of the feature maps f, binding layer size n, and binding pattern size m. Each plot indicates averages over three simulations. The theoretical capacity C of the first model with P_success = 99% was 35,000, that of the second (f = 17,000, n = 11,500, m = 150) 15,000, and that of the third 70,000.

the m binding units were chosen among those binding layer units that were connected to all features in the feature pattern. If there were fewer than m such units, the binding pattern consisted of all available units.

Due to the high computational cost of the simulations, a small convergence-zone memory with f = 1000, n = 3000, m = 20, t = 4, and c = 3 was used in these experiments. At each level of connectivity from 100% down to 20%, five different simulations with different random connectivity were run and the results were averaged (Figure 5). Since the main effect of sparse connectivity is to limit the number of binding units that are available for the binding pattern, one would expect that the capacity would go down. However, just the opposite

FIGURE 5. Capacity with sparse connectivity. The horizontal axis shows the percentage of connections that were available to form binding patterns, and the vertical axis indicates the capacity at the 99% confidence level. As connectivity decreases, the retrieval patterns become more focused, and the retrieval becomes more reliable until about 30% connectivity, where there are no longer enough connections to form binding patterns of m units. The model had f = 1000, n = 3000, m = 20, t = 4, c = 3, and the plot is an average of five simulations.


turns out to be true: the fewer connections the model had available (down to 30% connectivity), the more patterns could be stored with 99% probability of correct retrieval.

The intuition turns out to be incorrect for an interesting reason: sparse connectivity causes the binding constellations to become more focused, removing spurious overlap that is the main cause of retrieval errors. To see this, consider how Z (the size of the binding constellation of a feature unit after at least one pattern has been stored on it) grows when a new pattern is stored on the unit. A set of binding units is chosen to represent the pattern. When only a fraction r of the connections to the binding layer are available, there are only rn binding units to choose from, compared to n in the fully connected model. It is therefore more likely that a chosen binding unit is already part of the binding constellation of the feature unit, and this constellation therefore grows at a slower pace than in the fully connected case. The expected size of the constellation after p patterns have been stored in the memory becomes (from (6))

E(Z) = m + (rn - m)(1 - (1 - m/(rn))^{p-1}),    (41)

which decreases with connectivity r as long as there are at least m binding units available (i.e. rn > m). When the binding constellations are small, their intersections beyond the m common units will be small as well. During retrieval it is then less likely that a rogue unit will become more active than the correct unit. The activity patterns in the sparsely connected system are thus better focused, and retrieval more reliable.
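The focusing effect can be illustrated with a small Monte Carlo sketch of the mechanism described above: the constellation is the union of the binding patterns stored on a feature unit, and when only rn binding units are reachable, that union grows more slowly. The toy parameters and names here are ours:

```python
import random

def constellation_size(p: int, m: int, n: int, r: float, trials: int = 200) -> float:
    """Average binding-constellation size after p patterns are stored on a
    feature unit, with m binding units chosen per pattern from the r*n
    binding units that the feature unit is actually connected to."""
    avail = int(r * n)
    total = 0
    for _ in range(trials):
        units = set()
        for _ in range(p):
            units.update(random.sample(range(avail), m))
        total += len(units)
    return total / trials

random.seed(0)
full = constellation_size(p=30, m=20, n=3000, r=1.0)
sparse = constellation_size(p=30, m=20, n=3000, r=0.5)
print(full, sparse)  # the sparse constellation is the smaller one
```

With these numbers the fully connected constellation averages roughly 545 units versus roughly 497 at 50% connectivity, so the sparser model indeed produces smaller, more focused constellations.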

When there are fewer than m binding units available, the capacity decreases very quickly. In the model of Figure 5 (with m = 20), the average number of binding units available is 45 for 35% connectivity, 24 for 30%, 12 for 25%, and 5 for 20%. In other words, the convergence-zone episodic memory performs best when it is connected just enough to support activation of the sparse binding patterns. This is an interesting and surprising result, indicating that sparse connectivity is not just a necessity due to limited resources, but also gives a computational advantage. In the context of the hippocampal memory system it makes sense, since evolution would be likely to produce a memory architecture that makes the best use of the available resources.
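The quoted counts of available binding units are consistent with assuming that each feature-to-binding connection survives independently with probability r, so that about n * r^t units remain connected to all t features of a pattern. This formula is our reading, not stated explicitly in the text:

```python
# Expected number of binding units connected to all t features of a
# pattern, for the Figure 5 model (n = 3000, t = 4), assuming each
# connection is independently present with probability r (our reading).
n, t = 3000, 4
expected = {r: round(n * r ** t) for r in (0.35, 0.30, 0.25, 0.20)}
print(expected)  # {0.35: 45, 0.3: 24, 0.25: 12, 0.2: 5}
```

The four values match the 45, 24, 12, and 5 reported above.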

8. ERROR BEHAVIOR

When the binding patterns are selected at random, as in the model outlined above, the feature values produced by retrieval errors are also random. Human memory, however, rarely produces such random output. Human memory performance is often approximate, but robust and plausible. That is, when a feature cannot be retrieved exactly, a value is generated that is close to the correct value.

To test whether the convergence-zone architecture can model such "human-like" memory behavior, the storage mechanism must be changed so that it takes into account similarities between stored patterns. So far the spatial arrangement of the units has been irrelevant in the model; let us now treat the feature maps and the binding layer as one-dimensional maps (Figure 6). The binding layer is divided into sections, one for each feature map. The binding units are selected stochastically from a distribution function that consists of one component for each of the sections. Each component is a flat rectangle with a radius of sigma units around the unit whose location corresponds to the location of the active feature map unit. The distribution function is scaled so that on average m units will be chosen for the entire binding pattern.

The radius sigma of the distribution components determines the variance of the resulting binding patterns. By setting sigma = (n/t - 1)/2, a uniform distribution is obtained, which gives us the basic convergence-zone model. By making sigma smaller, the binding representations for similar input patterns become more similar.
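One way to implement the sectioned, wrap-around selection scheme just described, using the parameters of Figure 6. The scaling rule (per-slot pick probability m/(t * width)) and all names are our own sketch:

```python
import random

def select_binding(pattern, f, n, m, t, sigma):
    """Pick binding units for a t-feature pattern: each binding-layer
    section (one per map) contributes units drawn from a rectangular
    window of radius sigma, wrapped around the section (cf. Figure 6),
    scaled so that about m units are chosen in total."""
    sec = n // t                       # section size
    width = min(2 * sigma + 1, sec)    # window width within a section
    p_pick = m / (t * width)           # makes the expected total ~m
    chosen = set()
    for i, feat in enumerate(pattern):
        center = feat * sec // f       # section position matching the feature
        for d in range(-sigma, sigma + 1):
            unit = i * sec + (center + d) % sec   # wrap within the section
            if random.random() < p_pick:
                chosen.add(unit)
    return chosen

# Figure 6 parameters: f = 10, n = 40, m = 6, t = 4, sigma = 2.
random.seed(1)
b = select_binding([3, 7, 1, 9], f=10, n=40, m=6, t=4, sigma=2)
```

With sigma = 2 each window is 2 x 2 + 1 = 5 units wide, so at most 20 units are candidates and on average 6 are chosen, matching the figure's description.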

Several simulations with varying degrees of sigma were run to see how the error behavior and the capacity of the model would be affected (Figure 7). A configuration of 4 feature maps (3 cues) with 20,000 binding units, 400 feature map units, and a 20-unit binding pattern was used. The results show that when the binding patterns are made more descriptive, the errors become more plausible: when an incorrect feature is retrieved, it tends to be close to the correct one. When the binding units are selected completely at random (sigma = (n/t - 1)/2), the average distance is 5000; when sigma = 20, it drops to 2500; and when sigma = 5, to 700. Such "human-like" behavior is obtained with a slight cost in memory capacity. When the patterns become less random, there is more overlap in the encodings, and the capacity tends to decrease. This effect, however, appears to be rather minor, at least in the small models simulated in our experiments.

FIGURE 6. Selecting a binding representation for an input pattern. The binding layer is divided into sections. On average, m binding units will be selected from a distribution function that consists of one rectangular component for each section, centered around the unit whose location corresponds to the location of the input feature unit. If the center is close to the boundary of the section, the distribution component wraps around the section (as for Feature Map 1). The parameter sigma determines the radius of the components, and thereby the variance in the binding pattern. This particular function had sigma = 2, that is, the width of the individual components was 2 x 2 + 1 = 5. The model parameters were f = 10, n = 40, m = 6.


FIGURE 7. Error behavior with descriptive binding patterns. The lower curve (with scale at left) shows the average distance of incorrectly-retrieved features as a function of the distribution radius sigma. The upper curve (with scale at right) indicates the corresponding capacity at the 99% confidence level. When the binding pattern is less random, the erroneous units tend to be closer to the correct ones. The model had n = 20,000, f = 400, m = 20, t = 4, c = 3. Retrieval of all stored patterns was checked. The plots indicate averages over six simulations.

9. DISCUSSION

The convergence-zone episodic model as analyzed and simulated above assumes that the feature patterns do not overlap much, and that the pattern is retrieved in a single iteration. Possible relaxation of these assumptions and the effects of such modifications are discussed below.

9.1. Pattern Overlap

The theoretical lower-bound calculations assumed that the chance of overlap of more than one feature is very small, and this was also true in the models that were analyzed and simulated. However, this restriction does not limit the applicability of the model as much as it might first seem, for two reasons:

First, it might appear that certain feature values occur more often than others in the real world, causing more overlap than there currently is in the model. However, note that the input to the model is represented on feature maps. One of the basic properties of both computational and biological maps is that they adapt to the input distribution by magnifying the dense areas of the input space. In other words, if some perceptual experience is more frequent, more units will be allocated for representing it so that each unit gets to respond equally often to inputs (Kohonen, 1989; Merzenich et al., 1984; Ritter, 1991). Therefore, overlap in the feature map representations is significantly more rare than it may be in the absolute experience: the minor differences are magnified and the representations become more distinguishable and more memorable.

Second, as discussed in Section 4.4, the chance of overlap of more than one feature is clearly small if the feature values are independent. For example, in the coarse-grained model, at the 99% capacity point, on average there were 88 other patterns that shared exactly one common feature with a given pattern, whereas there were only 0.0078 other patterns that shared more than one feature. To be sure, in the real world the feature values across maps are correlated, which would make overlap of more than one feature more likely than it currently is in the model. While it is hard to estimate how common such correlations would be, they could grow quite a bit before they become significant. In other words, the conclusions drawn from the current model are valid for at least small amounts of such correlations.

9.2. Progressive Recall

The retrieval process adopted in the convergence-zone model is a version of simple recall (Gardner-Medwin, 1976), where the pattern is retrieved based on only direct associations from the retrieval cues. In contrast, progressive recall is an iterative process that uses the retrieved pattern at each step as the new retrieval cue. Progressive recall could be implemented in the convergence-zone model. Suppose features need to be retrieved in several maps. After the first retrieval attempt, the right feature unit will be clearly identified in most maps. For the second retrieval iteration, all these units can be used as cues, and it is likely that a pattern will be retrieved that is closer to the correct pattern than the one obtained with just simple recall. This way, progressive recall would cause an increase in the capacity of the model. Also,

Convergence-Zbne Episdoci Memory: Analysis and Simulations 1029

such a retrieval would probably be more robust against invalid retrieval cues (i.e. cues that are not part of the pattern to be retrieved). The dynamics of the progressive recall process are difficult to analyze (see Gibson & Robinson (1992) for a possible approach) and expensive to simulate, and simple recall was thus used in this first implementation of the convergence-zone model.

Above, a theoretical lower bound for the capacity of simple recall within a given error tolerance was derived, and the average capacity was estimated experimentally. Two other types of capacity can also be defined for an associative memory model (Amari, 1988). The absolute capacity refers to the maximum number of patterns that the network can represent as equilibrium states, and the relative capacity is the maximum number of patterns that can be retrieved by progressive recall. The lower bound for simple recall derived in this paper is also a lower bound for the absolute capacity, and thus also a lower bound for the relative capacity, which may be rather difficult to derive directly.

10. RELATED WORK

Associative memory is one of the earliest and still most active areas of neural network research, and the convergence-zone model needs to be evaluated from this perspective. Although the architecture is mostly motivated by the neuroscience theory of perceptual maps, hippocampal encoding, and convergence zones, it is mathematically most closely related to statistical associative memories and the sparse distributed memory model. Contrasting the architecture with the Hopfield network and modified backpropagation is appropriate because these are the best-known associative memory models to date. Eventually, convergence-zone memory might serve as a model of human episodic memory together with the trace feature map model described below. Although it is an abstract model of the hippocampal system, it is consistent with the more low-level models of the hippocampal circuitry, and complements them well.

10.1. The Hopfield Model

The Hopfield network (Hopfield, 1982) was originally developed to model the computational properties of neurobiological systems from the perspective of statistical mechanics (Amit et al., 1985a, b; Kirkpatrick & Sherrington, 1988; Peretto & Niez, 1986). The Hopfield network is characterized by full connectivity, except from a unit to itself. Patterns can be stored one at a time, but the storage mechanism is rather involved. To store an additional pattern in a network of, say, N units, the weights of all the N x (N - 1) connections have to be changed. In contrast, the convergence-zone memory is more sparse in that only the t x m weights between the active feature units and the binding pattern have to be modified.

A pattern is retrieved from the Hopfield network through progressive recall. The cues provide initial activation to the network, and the unit activations are updated asynchronously until they stabilize. The final stable activation pattern is then taken as the output of the network. In the convergence-zone model, on the other hand, retrieval is a four-step version of simple recall: first the activation is propagated from the input maps to the binding layer, thresholded, and then propagated back to all feature maps, where it is thresholded again. This algorithm can be seen as a computational abstraction of an underlying asynchronous process. In a more low-level implementation, thresholding would be achieved through inhibitory lateral connections. The neurons would update their activation one at a time in random order, and eventually stabilize to a state that represents retrieval of a pattern.

The capacity of the Hopfield network has been shown theoretically to be N/(4 ln N) (Amit, 1989; Hertz et al., 1991; Keeler, 1988; McEliece et al., 1986) and experimentally to be about 0.15N (Hopfield, 1982). For the convergence-zone model such a simple closed-form formula is difficult to derive, because the model has many more parameters and correlations that complicate the analysis. However, as was shown above, a lower bound can be derived for a given set of parameters. Such bounds and also experimental simulations show that the capacity of the model can be orders of magnitude higher than the number of units, which is rather unique for an associative memory neural network.

However, it should be noted that in the convergence-zone model, each pattern is much smaller than the network. In a Hopfield network of size N each pattern contains N bits of information, while in the convergence-zone model each pattern consists of only t features. Each feature can be seen as a number between 1 and f corresponding to its location in the feature map. To represent such a number, log2 f bits are needed, and a feature pattern thus contains t log2 f bits of information. Compared to the Hopfield model and other similar associative memory models, the information content of each pattern has been traded off for the capacity to store more of them in a network of equal size.
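For concreteness, the per-pattern information content can be compared using the coarse-grained parameters of Section 5.1; taking N as the total number of units (four 17,000-unit maps plus the 11,500-unit binding layer) is our own illustration:

```python
from math import log2, ceil

# Bits per stored pattern: t features, each one of f values -> t*log2(f)
# bits, versus N bits per pattern in a Hopfield network of N units.
t, f = 4, 17000
N = 11500 + 4 * 17000          # all units of the coarse-grained model
print(ceil(t * log2(f)), N)    # about 57 bits versus 79500 bits
```

Each convergence-zone pattern thus carries three orders of magnitude less information than a Hopfield pattern over the same units, which is the trade-off described above.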

10.2. Backpropagation and Related Models

Several models of associative memory have been proposed that are based on backpropagation or similar incremental learning rules (Ackley et al., 1985; Anderson et al., 1977; Knapp & Anderson, 1984; McClelland & Rumelhart, 1986a, b). However, these models suffer from catastrophic interference, which makes it difficult to apply them to modeling human associative memory (Grossberg, 1987; McCloskey & Cohen, 1989; Ratcliff, 1990). If the patterns are to be learned incrementally, without repeating the earlier patterns, the later patterns in the sequence must erase the earlier associations from memory.

Several techniques have been proposed to alleviate forgetting, including using weights with different learning rates (Hinton & Plaut, 1987), gradually including new examples and phasing out earlier ones (Hetherington & Seidenberg, 1989), forcing semidistributed hidden-layer representations (French, 1991), concentrating changes on novel parts of the inputs (Kortge, 1990), using units with localized receptive fields (Kruschke, 1992), and adding new units and weights to encode new information (Fahlman, 1991; Fahlman & Lebiere, 1990). In these models, one-shot storage is still not possible, although the number of required iterations is reduced, and old information can be relearned very fast. At this point it is also unclear whether these architectures would scale up to the number of patterns appropriate for human memory.

10.3. Statistical Associative Memory Models

The convergence-zone model is perhaps most closely related to the correlation matrix memory (Kohonen, 1971, 1972; see also Anderson, 1972; Cooper, 1973). In this model there are a number of receptors (corresponding to feature maps in the convergence-zone model) that are connected to a set of associators (the binding layer). The receptors are divided into key fields where the retrieval cue is specified, and data fields where the retrieved pattern appears. Each key and data field corresponds to a feature map in the convergence-zone model. Instead of one value for each field, a whole feature map represents the field, modeling value-unit encoding in biological perceptual systems. There is no distinction between key and data fields either; every feature map can function as a key in the convergence-zone model.

Other related statistical models include the learning matrix (Steinbuch, 1961) and the associative net (Willshaw et al., 1969), which are precursors of the correlation matrix model. These had a uniform matrix structure connecting inputs to outputs in a single step. Such models are simple and easy to implement in hardware, although they do not have a very high capacity (Faris & Maier, 1988; Palm, 1980, 1981). Other associative matrix models have relied on progressive recall, and therefore are similar in spirit to the Hopfield network. The network of stochastic neurons of Little & Shaw (1975) and the models of the hippocampus of Gardner-Medwin (1976) and Marr (1971) fall into this category. Progressive recall gives them a potentially higher capacity, which with high connectivity exceeds that of the Hopfield network (Gibson & Robinson, 1992). They are also more plausible in that they do not require N^2 internal connections (where N is the number of units in the network), although the capacity decreases rapidly with decreasing connectivity.

10.4. Sparse Distributed Memory

The Sparse Distributed Memory model (SDM; Kanerva, 1988) was originally developed as a mathematical abstraction of an associative memory machine. Keeler (1988) developed a neural-network implementation of the idea and showed that the SDM compares favorably with the Hopfield model; the capacity is larger and the patterns do not have to include all units in the network.

It is possible to give the convergence-zone memory model an interpretation as a special case of the SDM model: a fully-connected two-layer network consisting of a combined input/output layer and a hidden layer. In alternating steps, activity is propagated from the input/output layer to the hidden layer and back. Seen this way, every unit in the input/output layer corresponds to one feature map in the convergence-zone model, and the hidden layer corresponds to the binding layer.

It can be shown that the capacity of the SDM is independent of the size of the input/output layer. Moreover, if the size of the input/output layer is fixed, the capacity increases linearly with the size of the hidden layer. These results suggest that similar properties apply also to the convergence-zone episodic memory model.

10.5. Trace Feature Maps

The Trace Feature Map model of Miikkulainen (1992, 1993) consists of a self-organizing feature map where the space of all possible experiences is first laid out. The map is laterally fully connected with weights that are initially inhibitory. Traces of experiences are encoded as attractors using these lateral connections. When a partial pattern is presented to the map as a cue, the lateral connections move the activation pattern around the nearest attractor.

The Trace Feature Map was designed as an episodic memory component of a story understanding system. The main emphasis was not on capacity, but on psychologically valid behavior. The basins of attraction for the traces interact, generating many interesting phenomena. For example, the later traces have larger attractor basins and are easier to retrieve, and unique traces are preserved even in an otherwise overloaded memory. On the other hand, because each basin is encoded through the lateral connections of several units, the capacity of the model is several times smaller than the number of units. Also, there is no mechanism for encoding truly novel experiences; only vectors that are already represented in the map can be stored. In this sense, the Trace Feature Map model can be seen as the cortical component of the human long-term memory system. It is responsible for many of the effects, but incorporating novel experiences into its existing structure is a lengthy process, as it appears to be in the human memory system (Halgren, 1984; McClelland et al., 1995).



10.6. Models of the Hippocampus

A large number of models of the hippocampal formation and its role in memory processing have been proposed (Alvarez & Squire, 1994; Gluck & Myers, 1993; McNaughton & Morris, 1987; Marr, 1971; Murre, 1995; O'Reilly & McClelland, 1994; Read et al., 1994; Schmajuk & DiCarlo, 1992; Sutherland & Rudy, 1989; Teyler & Discenna, 1986; Treves & Rolls, 1991, 1994; Wickelgren, 1979). They include a more detailed description of the circuitry inside the hippocampus, and aim at showing how memory traces could be created in such circuitry. The convergence-zone model operates at a higher level of abstraction than these models, and in this sense is complementary to them.

Marr (1971) presented a detailed theory of the hippocampal formation, including numerical constraints, capacity analysis, and interpretation at the level of neural circuitry. The input is based on local input fibers from the neocortex, processed by an input and output layer of hippocampal neurons with collateral connections. Recall is based on recurrent completion of a pattern. The convergence-zone memory can be seen as a high-level abstraction of Marr's theory, with the emphasis on the convergence-zone structure that allows for higher capacity than Marr predicted.

Several authors have proposed a role for the hippocampus similar to the convergence-zone idea (Alvarez & Squire, 1994; McClelland et al., 1995; Murre, 1995; Teyler & Discenna, 1986; Treves & Rolls, 1994; Wickelgren, 1979). In these models, the hippocampus itself does not store a complete representation of the episode, but acts as an indexing area, or compressed representation, that binds together parts of the actual representation in the neocortical areas. Treves & Rolls (1994) also propose backprojection circuitry for accomplishing such recall. Convergence-zone memory is consistent with such interpretations, focusing on analyzing the capacity of such structures.

One of the assumptions of the convergence-zone memory, motivated by recent results by Wilson & McNaughton (1993), is that the binding encoding is sparse and random. The model by O'Reilly & McClelland (1994) shows how the hippocampal circuitry could form such sparse, diverse encodings. They explored the tradeoff between pattern separation and completion and showed that the circuitry could be set up to perform both of these tasks simultaneously. The entorhinal-dentate-CA3 pathway could be responsible for forming random encodings of traces in CA3, and the separation between storage and recall could be due to an overall difference in the activity level in the system.

11. FUTURE WORK

Future work on the convergence-zone episodic memory model will focus on three areas. First, the model can be

extended in several ways towards a more accurate model of the actual neural processes. For instance, lateral inhibitory connections between units within a feature map could be added to select the unit with the highest activity. A similar extension could be applied to the binding layer; only instead of a single unit, multiple units should stay active in the end. Lateral connections in the binding layer could also be used to partially complete the binding pattern even before propagation to the feature maps. As the next step, the binding layer could be expanded to take into account finer structure in the hippocampus, including the encoding and retrieval circuitry proposed by O'Reilly & McClelland (1994). A variation of the Hebbian learning mechanism (Hebb, 1949; Miller & MacKay, 1992) could then be used to implement the storage and recall mechanisms. In addition to providing insight into the hippocampal memory system, such research could lead to a practical implementation of the convergence-zone memory, and perhaps even to a hardware implementation.

Second, a number of potential extensions to the model could be studied in more detail. It might be possible to take sparse connectivity into account in the analysis, and obtain tighter lower bounds in this more realistic case. Recurrence could be introduced between feature maps and the binding layer, and capacity could be measured under progressive recall. Possibilities for extending the model to tolerate more overlap between patterns should also be studied.

Third, the model could be used as a stepping stone towards a more comprehensive model of human episodic memory, including modules for the hippocampus and the neocortical component. As discussed above, the convergence-zone model seems to fit the capabilities of the hippocampal component well, whereas something like trace feature maps could be used to model the cortical component. It would be necessary to observe and characterize the memory interference effects of the components and compare them with experimental results on human memory. However, the main challenge of this research would be on the interaction of the components, that is, how the hippocampal memory could transfer its contents to the cortical component. At this point it is not clear how this process could take place, although several proposals exist (Alvarez & Squire, 1994; Halgren, 1984; McClelland et al., 1995; Milner, 1989; Murre, 1995; Treves & Rolls, 1994). Computational investigations could prove instrumental in understanding the foundations of this remarkable system.

12. CONCLUSION

Mathematical analysis and experimental simulations show that a large number of episodes can be stored in the convergence-zone memory with reliable content-addressable retrieval. For the hippocampus, a sufficient capacity can be achieved with a fairly small number of


units and connections. Moreover, the convergence zone itself requires only a fraction of the hardware required for perceptual representation. These results provide a possible explanation for why human memory is so efficient with such a high capacity, and why memory areas appear small compared to the areas devoted to low-level perceptual processing. It also suggests that the computational units of the hippocampus and the perceptual maps can be quite coarse, and gives a computational reason why the maps and the hippocampus should be sparsely connected.

The model makes use of the combinatorics and the clean-up properties of coarse coding in a neurally-inspired architecture. The practical storage capacity of the model appears to be at least two orders of magnitude higher than that of the Hopfield model with the same number of units, while using two orders of magnitude fewer connections. On the other hand, the patterns in the convergence-zone model are smaller than in the Hopfield network. Simulations also show that psychologically valid error behavior can be achieved if the binding patterns are made more descriptive: the erroneous patterns are close to the correct ones. The convergence-zone episodic memory is a step towards a psychologically and neurophysiologically accurate model of human episodic memory, the foundations of which are only now beginning to be understood.

REFERENCES

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.

Alon, N., & Spencer, J. H. (1992). The Probabilistic Method. New York: Wiley.

Alvarez, P., & Squire, L. R. (1994). Memory consolidation and the medial temporal lobe: A simple network model. Proceedings of the National Academy of Sciences of the USA, 91, 7041–7045.

Amaral, D. G., Ishizuka, N., & Claiborne, B. (1990). Neurons, numbers and the hippocampal network. In J. Storm-Mathisen, J. Zimmer & O. P. Ottersen (Eds), Understanding the Brain Through the Hippocampus, vol. 83 of Progress in Brain Research (pp. 1–11). New York: Elsevier.

Amari, S.-I. (1977). Neural theory of association and concept formation. Biological Cybernetics, 26, 175–185.

Amari, S.-I. (1988). Associative memory and its statistical neurodynamical analysis. In H. Haken (Ed.), Neural and Synergetic Computers (pp. 85–99). Berlin: Springer Verlag.

Amit, D. J. (1989). Modeling Brain Function: The World of Attractor Neural Networks. Cambridge: Cambridge University Press.

Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-glass models of neural networks. Physical Review A, 32, 1007–1018.

Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55, 1530–1533.

Anderson, J. A. (1972). A simple neural network generating an interactive memory. Mathematical Biosciences, 14, 197–220.

Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. (1977). Distinctive features, categorical perception and probability learning: some applications of a neural model. Psychological Review, 84, 413–451.

Bain, L. J., & Engelhardt, M. (1987). Introduction to Probability and Mathematical Statistics. Boston, MA: PWS Publishers.

Boss, B., Peterson, G., & Cowan, W. (1985). On the numbers of neurons in the dentate gyrus of the rat. Brain Research, 338, 144–150.

Boss, B., Turlejski, K., Stanfield, B., & Cowan, W. (1987). On the numbers of neurons in fields CA1 and CA3 of the hippocampus of Sprague-Dawley and Wistar rats. Brain Research, 406, 280–287.

Cooper, L. N. (1973). A possible organization of animal memory and learning. In B. Lundquist & S. Lundquist (Eds), Proceedings of the Nobel Symposium on Collective Properties of Physical Systems (pp. 252–264). New York: Academic Press.

Damasio, A. R. (1989). The brain binds entities and events by multiregional activation from convergence zones. Neural Computation, 1, 123–132.

Damasio, A. R. (1989). Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition, 33, 25–62.

Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody & D. S. Touretzky (Eds), Advances in Neural Information Processing Systems, 3 (pp. 190–205). San Mateo, CA: Morgan Kaufmann.

Fahlman, S. E., & Lebiere, C. (1990). The cascade-correlation learning architecture. In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems, 2 (pp. 524–532). San Mateo, CA: Morgan Kaufmann.

Faris, W. G., & Maier, R. S. (1988). Probabilistic analysis of a learning matrix. Advances in Applied Probability, 20, 695–705.

French, R. M. (1991). Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. In Proceedings of the 13th Annual Conference of the Cognitive Science Society (pp. 173–178). Hillsdale, NJ: Erlbaum.

Gardner-Medwin, A. R. (1976). The recall of events through the learning of associations between their parts. Proceedings of the Royal Society of London B, 194, 375–402.

Gibson, W. G., & Robinson, J. (1992). Statistical analysis of the dynamics of a sparse associative memory. Neural Networks, 5, 645–661.

Gluck, M. A., & Myers, C. E. (1993). Hippocampal mediation of stimulus representation: A computational theory. Hippocampus, 3, 491–516.

Grossberg, S. (1983). Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition and Motor Control. Boston; Dordrecht, Holland: Reidel.

Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23–63.

Halgren, E. (1984). Human hippocampal and amygdala recording and stimulation: Evidence for a neural model of recent memory. In L. Squire & N. Butters (Eds), The Neuropsychology of Memory (pp. 165–182). New York: Guilford.

Hasselmo, M. E., Rolls, E. T., & Baylis, G. C. (1989). The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behavioral Brain Research, 32, 203–218.

Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.

Heit, G., Smith, M. E., & Halgren, E. (1989). Neural encoding of individual words and faces by the human hippocampus and amygdala. Nature, 333, 773–775.

Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley.

Hetherington, P. A., & Seidenberg, M. S. (1989). Is there "catastrophic interference" in connectionist networks? In Proceedings of the 11th Annual Conference of the Cognitive Science Society (pp. 26–33). Hillsdale, NJ: Erlbaum.

Hinton, G. E., & Anderson, J. A. (Eds) (1981). Parallel Models of Associative Memory. Hillsdale, NJ: Erlbaum.

Hinton, G. E., & Plaut, D. C. (1987). Using fast weights to deblur old memories. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 177–186). Hillsdale, NJ: Erlbaum.


Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554–2558.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81, 3088–3092.

Kairiss, E. W., & Miranker, W. L. (1997). Cortical memory dynamics. Biological Cybernetics, in press.

Kandel, E. R. (1991). Nerve cells and behavior. In E. R. Kandel, J. H. Schwartz & T. M. Jessell (Eds), Principles of Neural Science (pp. 18–32). New York: Elsevier.

Kanerva, P. (1988). Sparse Distributed Memory. Cambridge, MA: MIT Press.

Keeler, J. D. (1988). Comparison between Kanerva's SDM and Hopfield-type neural networks. Cognitive Science, 12, 299–329.

Kirkpatrick, S., & Sherrington, D. (1978). Infinite range models of spin-glasses. Physical Review B, 17, 4384–4403.

Knapp, A. G., & Anderson, J. A. (1984). Theory of categorization based on distributed memory storage. Journal of Experimental Psychology: Learning, Memory and Cognition, 10, 616–637.

Knudsen, E. I., du Lac, S., & Esterly, S. D. (1987). Computational maps in the brain. In W. M. Cowan, E. M. Shooter, C. F. Stevens & R. F. Thompson (Eds), Annual Review of Neuroscience (pp. 41–65). Palo Alto, CA: Annual Reviews.

Kohonen, T. (1971). A class of randomly organized associative memories. Acta Polytechnica Scandinavica, EL25.

Kohonen, T. (1972). Correlation matrix memories. IEEE Transactions on Computers, C-21, 353–359.

Kohonen, T. (1977). Associative Memory: A System-Theoretical Approach. Berlin; Heidelberg; New York: Springer.

Kohonen, T. (1989). Self-Organization and Associative Memory, Third Edn. Berlin; Heidelberg; New York: Springer.

Kohonen, T., & Mäkisara, K. (1986). Representation of sensory information in self-organizing feature maps. In J. S. Denker (Ed.), Neural Networks for Computing (pp. 271–276). New York: American Institute of Physics.

Kortge, C. A. (1990). Episodic memory in connectionist networks. In Proceedings of the 12th Annual Conference of the Cognitive Science Society (pp. 7~–771). Hillsdale, NJ: Erlbaum.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.

Little, W. A., & Shaw, G. L. (1975). A statistical theory of short and long term memory. Behavioral Biology, 14, 115–133.

Marr, D. (1971). Simple memory: a theory for archicortex. Philosophical Transactions of the Royal Society of London B, 262, 23–81.

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419–457.

McClelland, J. L., & Rumelhart, D. E. (1986a). Amnesia and distributed memory. In D. E. Rumelhart & J. L. McClelland (Eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, Psychological and Biological Models (pp. 503–527). Cambridge, MA: MIT Press.

McClelland, J. L., & Rumelhart, D. E. (1986b). A distributed model of human learning and memory. In J. L. McClelland & D. E. Rumelhart (Eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, Psychological and Biological Models (pp. 170–215). Cambridge, MA: MIT Press.

McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24, 104–169.

McEliece, R. J., Posner, E. C., Rodemich, E. R., & Venkatesh, S. S. (1986). The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, 33, 461–482.

McNaughton, B. L., & Morris, R. G. M. (1987). Hippocampal synaptic enhancement and information storage within a distributed memory system. Trends in Neuroscience, 10, 408–415.

Merzenich, M. M., Nelson, R. J., Stryker, M. P., Cynader, M. S., Schoppmann, A., & Zook, J. M. (1984). Somatosensory cortical map changes following digit amputation in adult monkeys. Journal of Comparative Neurology, 224, 591–605.

Miikkulainen, R. (1992). Trace feature map: A model of episodic associative memory. Biological Cybernetics, 67, 273–282.

Miikkulainen, R. (1993). Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. Cambridge, MA: MIT Press.

Miller, K. D., & MacKay, D. J. C. (1992). The role of constraints in Hebbian learning. Neural Computation, 6, 100–126.

Milner, P. (1989). A cell assembly theory of hippocampal amnesia. Neuropsychologia, 27, 23–30.

Murre, J. M. J. (1995). A model of amnesia. Manuscript.

O'Reilly, R. C., & McClelland, J. L. (1994). Hippocampal conjunctive encoding, storage, and recall: Avoiding a trade-off. Hippocampus, 4, 661–682.

Palm, G. (1980). On associative memory. Biological Cybernetics, 36, 19–31.

Palm, G. (1981). On the storage capacity of an associative memory with randomly distributed storage elements. Biological Cybernetics, 39, 125–127.

Peretto, P., & Niez, J. (1986). Collective properties of neural networks. In E. Bienenstock, F. Fogelman Soulié & G. Weisbuch (Eds), Disordered Systems and Biological Organization (pp. 171–185). Berlin; Heidelberg; New York: Springer.

Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285–308.

Read, W., Nenov, V. I., & Halgren, E. (1994). Role of inhibition in memory retrieval by hippocampal area CA3. Neuroscience and Biobehavioral Reviews, 18, 55–68.

Ritter, H. J. (1991). Asymptotic level density for a class of vector quantization processes. IEEE Transactions on Neural Networks, 2, 173–175.

Rolls, E. T. (1984). Neurons in the cortex of the temporal lobe and in the amygdala of the monkey with responses selective for faces. Human Neurobiology, 3, 209–222.

Schmajuk, N. A., & DiCarlo, J. J. (1992). Stimulus configuration, classical conditioning, and hippocampal function. Psychological Review, 99, 268–305.

Sejnowski, T. J., & Churchland, P. S. (1989). Brain and cognition. In M. I. Posner (Ed.), Foundations of Cognitive Science (Chapter 8). Cambridge, MA: MIT Press.

Squire, L. R. (1987). Memory and Brain. Oxford, UK; New York: Oxford University Press.

Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99, 195–231.

Squire, L. R., Shimamura, A. P., & Amaral, D. G. (1989). Memory and the hippocampus. In J. H. Byrne & W. O. Berry (Eds), Neural Models of Plasticity (pp. 208–239). New York: Academic Press.

Steinbuch, K. (1961). Die Lernmatrix. Kybernetik, 1, 36–45.

Sutherland, R. W., & Rudy, J. W. (1989). Configural association theory: The role of the hippocampal formation in learning, memory and amnesia. Psychobiology, 17, 129–144.

Teyler, T. J., & Discenna, P. (1986). The hippocampal memory indexing theory. Behavioral Neuroscience, 100, 147–154.

Treves, A., & Rolls, E. T. (1991). What determines the capacity of autoassociative memories in the brain? Network, 2, 371–398.

Treves, A., & Rolls, E. T. (1994). Computational analysis of the role of the hippocampus in memory. Hippocampus, 4, 374–391.

Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds), Organization of Memory (pp. 381–403). New York: Academic Press.


Tulving, E. (1983). Elements of Episodic Memory. Oxford, UK; New York: Oxford University Press.

Wickelgren, W. A. (1979). Chunking and consolidation: A theoretical synthesis of semantic networks, configuring, S-R versus cognitive learning, normal forgetting, the amnesic syndrome, and the hippocampal arousal system. Psychological Review, 86, 44–60.

Willshaw, D. J., Buneman, O. P., & Longuet-Higgins, H. C. (1969). Non-holographic associative memory. Nature, 222, 960–962.

Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code for space. Science, 261, 1055–1058.

APPENDIX A: PROBABILITY THEORY BACKGROUND AND PROOFS

In this appendix, concepts from probability theory that are necessary for understanding the main text are briefly reviewed, and details of the probabilistic formulation and martingale analysis are presented (for more background on probability theory and statistics, see e.g. Alon & Spencer, 1992 or Bain & Engelhardt, 1987).

A.1. Distributions and Bounds

In the analysis of Sections 3 and 4, two probability density functions are used. The first one is the binomial distribution with parameters n and p, denoted as B(n,p); it gives the outcome of n independent trials where the two possible alternatives have probability p and 1 - p. This distribution has the expected value np and variance np(1 - p). The other one is the hypergeometric distribution with parameters n, m and N, denoted as HYP(n,m,N), and representing the number of elements in common between two independently-chosen subsets of n and m elements of a common superset of N elements. The distribution has the expected value nm/N and variance (nm/N)(1 - m/N)(N - n)/(N - 1).
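These moments can be checked numerically. The sketch below (an illustrative addition, not part of the original analysis; the parameter values are arbitrary) computes the exact mean and variance of HYP(n,m,N) from its probability mass function with rational arithmetic and compares them with the closed-form expressions above:

```python
from fractions import Fraction
from math import comb

def hyp_moments(n, m, N):
    """Exact mean and variance of HYP(n, m, N): the overlap between two
    independently chosen subsets of sizes n and m from a superset of N."""
    pmf = {k: Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))
           for k in range(max(0, n + m - N), min(n, m) + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum(k * k * p for k, p in pmf.items()) - mean ** 2
    return mean, var

n, m, N = 7, 5, 20
mean, var = hyp_moments(n, m, N)
assert mean == Fraction(n * m, N)
assert var == Fraction(n * m, N) * (1 - Fraction(m, N)) * Fraction(N - n, N - 1)
```

Because the computation uses exact fractions, the equalities hold exactly rather than up to floating-point error.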

The Chernoff bounds can be used to estimate how likely a binomially distributed variable X is to have a value within a given distance \delta from its mean:

P(X \le (1-\delta)np) \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{np}, \quad 0 < \delta < 1,    (A.1)

P(X \ge (1+\delta)np) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{np}, \quad \delta > 0.    (A.2)
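As a sanity check (an illustrative sketch, not part of the original derivation; the sample parameters are arbitrary), the upper-tail bound (A.2) can be compared against the binomial tail computed by direct summation:

```python
from math import comb, exp

def binom_upper_tail(n, p, t):
    """P(X >= t) for X ~ B(n, p), by direct summation of the pmf."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def chernoff_upper(n, p, delta):
    """Chernoff bound on P(X >= (1 + delta) n p), equation (A.2)."""
    return (exp(delta) / (1 + delta) ** (1 + delta)) ** (n * p)

n, p, delta = 200, 0.1, 0.5
t = int((1 + delta) * n * p)          # threshold (1 + delta) np = 30
exact = binom_upper_tail(n, p, t)
bound = chernoff_upper(n, p, delta)
assert exact <= bound                 # the bound dominates the true tail
```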

Even if the trials are not independent (and therefore X is not binomially distributed), in some cases the sequence of variables X_0, ..., X_n can be analyzed as a martingale, and bounds similar to the Chernoff bounds derived using Azuma's inequalities (see Appendix A.3).

A.2. Details of the Probabilistic Formulation

As in Section 3, let Z_i be the size of the binding constellation of a feature unit after i patterns have been stored on it, and let Y_i be its increase after storing the ith pattern on it. Then

Z_i = \sum_{k=1}^{i} Y_k.    (A.3)

Let Y_i, i > 1 be hypergeometrically distributed with parameters m, n - Z_{i-1}, and n:

P(Y_i = y_i \mid Z_{i-1} = z_{i-1}) = \frac{\binom{n - z_{i-1}}{y_i}\binom{z_{i-1}}{m - y_i}}{\binom{n}{m}},    (A.4)

and let Z_1 = Y_1 = m with probability 1. Then

E(Y_i) = E\left(\frac{m(n - Z_{i-1})}{n}\right) = m - \frac{m}{n}E(Z_{i-1}) = m - \frac{m}{n}\sum_{k=1}^{i-1}E(Y_k)
       = \left(1 - \frac{m}{n}\right)E(Y_{i-1}) = m\left(1 - \frac{m}{n}\right)^{i-1}.    (A.5)

Using (A.5), an expression for E(Z_i) can be derived:

E(Z_i) = \sum_{k=1}^{i}E(Y_k) = \sum_{k=1}^{i} m\left(1 - \frac{m}{n}\right)^{k-1} = n\left(1 - \left(1 - \frac{m}{n}\right)^{i}\right).    (A.6)

This equation indicates that initially, when no patterns are stored, Z_0 = 0, and that Z_i converges to n as i goes to infinity.
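The recursion (A.5) and the closed form (A.6) can be confirmed with exact rational arithmetic. The sketch below (an illustrative addition with arbitrary example values for n and m) iterates E(Y_i) and checks that the accumulated sum matches n(1 - (1 - m/n)^i) at every step:

```python
from fractions import Fraction

n, m = 100, 8                 # example binding-layer and constellation sizes
EY = Fraction(m)              # E(Y_1) = m
EZ = Fraction(0)
for i in range(1, 21):
    EZ += EY                                          # eq. (A.3)
    assert EZ == n * (1 - (1 - Fraction(m, n)) ** i)  # eq. (A.6)
    EY *= (1 - Fraction(m, n))                        # eq. (A.5)
print(float(EZ))              # approaches n as i grows
```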

The variance of Z_i can be computed in a similar way. First a recurrent expression for the variance of Y_i is derived:

Var(Y_i) = E_{Z_{i-1}}[Var(Y_i \mid Z_{i-1})] + Var_{Z_{i-1}}[E(Y_i \mid Z_{i-1})]
         = \frac{m(n-m)}{n^2(n-1)}\left(nE(Z_{i-1}) - E(Z_{i-1}^2)\right) + \frac{m^2}{n^2}Var(Z_{i-1}).    (A.7)

To obtain the variance of Z_i, the covariance of Z_{i-1} and Y_i needs to be computed:

Cov(Z_{i-1}, Y_i) = E(Z_{i-1}Y_i) - E(Z_{i-1})E(Y_i)
                  = E_{Z_{i-1}}[E(Z_{i-1}Y_i \mid Z_{i-1})] - E(Z_{i-1})E(Y_i)
                  = E\left(Z_{i-1}\,\frac{m(n - Z_{i-1})}{n}\right) - E(Z_{i-1})E(Y_i)
                  = mE(Z_{i-1}) - \frac{m}{n}\left[(E(Z_{i-1}))^2 + Var(Z_{i-1})\right] - E(Z_{i-1})E(Y_i)
                  = -\frac{m}{n}Var(Z_{i-1}).    (A.8)

The variance of Z_i can now be easily derived:

Var(Z_i) = Var(Z_{i-1} + Y_i)
         = Var(Z_{i-1}) + Var(Y_i) + 2Cov(Z_{i-1}, Y_i)
         = Var(Y_i) + \left(1 - \frac{2m}{n}\right)Var(Z_{i-1})
         = n\left(1 - \frac{m}{n}\right)^{i}\left(1 - n\left(1 - \frac{m}{n}\right)^{i}\right) + n(n-1)\left(1 - \frac{m(2n - m - 1)}{n(n-1)}\right)^{i}.    (A.9)

The base of every exponential in this expression is smaller than 1, so the variance goes to 0 as i goes to infinity. This makes sense in the model: if many patterns are stored, it becomes very likely that Z_i has a value close to n, and hence its variance should go to zero.
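The closed form (A.9) can be cross-checked by iterating the exact second-moment recursion implied by the hypergeometric conditional moments, E(Z_i | Z_{i-1}) = m + (1 - m/n)Z_{i-1} and Var(Y_i | Z_{i-1}). This is an illustrative sketch (the parameter values are arbitrary, and the recursion coefficients r and s below are derived from those conditional moments rather than taken from the paper):

```python
from fractions import Fraction

n, m = 60, 5                                        # example sizes
q = 1 - Fraction(m, n)
r = 1 - Fraction(m * (2 * n - m - 1), n * (n - 1))  # coefficient of E(Z^2)
s = Fraction(m * (n - m) * (2 * n - 1), n * (n - 1))

EZ, EZ2 = Fraction(m), Fraction(m * m)              # Z_1 = m deterministically
for i in range(2, 15):
    EZ2 = r * EZ2 + s * EZ + m * m                  # second-moment recursion
    EZ = m + q * EZ                                 # first-moment recursion
    var = EZ2 - EZ ** 2
    closed = n * q**i * (1 - n * q**i) + n * (n - 1) * r**i   # eq. (A.9)
    assert var == closed
```

Since all quantities are exact fractions, the assertion verifies (A.9) exactly for each i in the range.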

So far it has been assumed that i is an ordinary variable. If it is replaced by a stochastic variable I that is binomially distributed with parameters p and 1/n_f, the previously computed expected values and variances must be made conditional on I. To compute the unconditional expected values and variances, the following lemma is needed:

LEMMA A.1. If X ~ B(n,p), then E((1 - a)^X) = (1 - ap)^n.

Proof.

E((1-a)^X) = \sum_{x=0}^{n} (1-a)^x \binom{n}{x} p^x (1-p)^{n-x}
           = \sum_{x=0}^{n} \binom{n}{x} (p - ap)^x (1-p)^{n-x}
           = ((p - ap) + (1 - p))^n
           = (1 - ap)^n.
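Lemma A.1 is easy to confirm with exact arithmetic (an illustrative sketch; the parameter values are arbitrary):

```python
from fractions import Fraction
from math import comb

def lemma_a1_lhs(n, p, a):
    """E((1 - a)^X) for X ~ B(n, p), computed from the binomial pmf."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * (1 - a)**x
               for x in range(n + 1))

n = 12
p = Fraction(1, 3)
a = Fraction(2, 5)
assert lemma_a1_lhs(n, p, a) == (1 - a * p) ** n
```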


Let Z denote the unconditional value of Z_i. The desired results follow immediately from Lemma A.1 and the linearity of expected values:

E(Z) = n\left(1 - \left(1 - \frac{m}{n n_f}\right)^{p}\right),    (A.10)

Var(Z) = Var_I[E(Z \mid I)] + E_I[Var(Z \mid I)]
       = n^2 Var_I\left[\left(1 - \frac{m}{n}\right)^{I}\right] + E_I\left[n\left(1 - \frac{m}{n}\right)^{I} - n^2\left(1 - \frac{m}{n}\right)^{2I} + n(n-1)\left(1 - \frac{m(2n-m-1)}{n(n-1)}\right)^{I}\right]
       = n\left(1 - \frac{m}{n n_f}\right)^{p}\left(1 - n\left(1 - \frac{m}{n n_f}\right)^{p}\right) + n(n-1)\left(1 - \frac{m(2n-m-1)}{n(n-1)n_f}\right)^{p}.    (A.11)

Let \hat{Z} be defined as Z conditioned on I \ge 1. Then \hat{Z} will always be at least m. Expressions similar to (A.10) and (A.11) can be derived for \hat{Z}:

E(\hat{Z}) = m + (n-m)\left(1 - \left(1 - \frac{m}{n n_f}\right)^{p-1}\right),    (A.12)

Var(\hat{Z}) = (n-m)\left(1 - \frac{m}{n n_f}\right)^{p-1}\left(1 - (n-m)\left(1 - \frac{m}{n n_f}\right)^{p-1}\right) + (n-m)(n-m-1)\left(1 - \frac{m(2n-m-1)}{n(n-1)n_f}\right)^{p-1}.    (A.13)

From (A.10) and (A.12) it follows that on the average, \hat{Z} is larger than Z. Initially the difference is exactly m, and the difference goes to zero as p goes to infinity. This means that initially, when few patterns are stored, the right units are very likely to be retrieved, since they have the advantage of receiving activation from at least m binding units. However, when more patterns are stored, almost every unit in a feature map receives the same amount of activation. Units that have been used most often are most likely to be retrieved.

It is difficult to derive an exact expression for the expected value and variance of X_c (the intersection of c retrieval cues) because the binding constellations are correlated. Since the same partial patterns might have been stored several times, certain combinations of retrieval cues can give rise to spurious activity in the binding layer, which is difficult to take into account in the analysis. For this reason, the analysis is carried out under the assumption that the chance of repeated partial patterns is negligible, which is reasonable in most cases and easy to check.

A.3. Martingales

Martingales give us a way to analyze the outcome of n trials with limited dependencies. A martingale is a sequence of random variables X_0, ..., X_n such that

E(X_i \mid X_0, ..., X_{i-1}) = X_{i-1}, \quad 1 \le i \le n.    (A.14)

If a martingale satisfies the Lipschitz condition, that is,

|X_i - X_{i-1}| \le 1, \quad 1 \le i \le n,    (A.15)

then the following inequalities hold:

P(X_n \le X_0 - \lambda\sqrt{n}) \le e^{-\lambda^2/2},    (A.16)

P(X_n \ge X_0 + \lambda\sqrt{n}) \le e^{-\lambda^2/2}.    (A.17)

These equations are called Azuma's inequalities, and they can be used to bound the final value of the sequence within a chosen confidence level.
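The inequalities can be illustrated with the simplest sequence satisfying (A.14) and (A.15), a fair +/-1 random walk (an illustrative sketch, not from the paper; parameter values are arbitrary). The tail probability, computed from the binomial distribution, stays below the Azuma bound:

```python
from math import comb, sqrt, exp, ceil

def walk_tail(n, t):
    """P(X_n >= t) for X_n = sum of n independent fair +/-1 steps."""
    # X_n = 2 * heads - n, so X_n >= t  <=>  heads >= (n + t) / 2
    k_min = ceil((n + t) / 2)
    return sum(comb(n, k) for k in range(k_min, n + 1)) / 2**n

n, lam = 100, 2.0
t = lam * sqrt(n)                            # threshold lambda * sqrt(n) = 20
assert walk_tail(n, t) <= exp(-lam**2 / 2)   # Azuma bound e^{-lambda^2 / 2}
```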

A.4. The Binding Constellation Martingale

Let Z^E_v be defined as in Section 4.2:

Z^E_v = Z'_v + (n - Z'_v)\left(1 - \left(1 - \frac{1}{n}\right)^{ki-v}\right)
      = n - (n - Z'_v)\left(1 - \frac{1}{n}\right)^{ki-v}.    (A.18)

In order to apply Azuma's inequality, the Lipschitz condition |Z^E_v - Z^E_{v-1}| \le 1 must be satisfied. The binding units are chosen one at a time, and each choice may either already be part of the constellation or add one more unit to it. Therefore, the difference between Z'_v and Z'_{v-1} is always either 0 or 1.

Case 1: Z'_v - Z'_{v-1} = 0.

|Z^E_v - Z^E_{v-1}| = \left|(n - Z'_v)\left(1 - \frac{1}{n}\right)^{ki-v+1} - (n - Z'_v)\left(1 - \frac{1}{n}\right)^{ki-v}\right|
                    = \frac{n - Z'_v}{n}\left(1 - \frac{1}{n}\right)^{ki-v} \le 1.    (A.19)

Case 2: Z'_v = Z'_{v-1} + 1.

|Z^E_v - Z^E_{v-1}| = \left|(n - Z'_{v-1})\left(1 - \frac{1}{n}\right)^{ki-v+1} - (n - Z'_{v-1} - 1)\left(1 - \frac{1}{n}\right)^{ki-v}\right|
                    = \left(1 - \frac{1}{n}\right)^{ki-v}\left(1 - \frac{n - Z'_{v-1}}{n}\right) \le 1.    (A.20)

In both cases we have |Z^E_v - Z^E_{v-1}| \le 1, and hence Azuma's inequality can be applied to obtain a bound for Z.

A.5. The Intersection Martingale

Let X^E_v be defined as in Section 4.3:

X^E_v = X'_v + \frac{(n_1 - v)(n_2 - X'_v)}{n_s - v}
      = \frac{n_s - n_1}{n_s - v}X'_v + \frac{(n_1 - v)n_2}{n_s - v}.    (A.21)

First, X^E_0, ..., X^E_{n_1} is shown to be a martingale. The expected increase of X'_v is (n_2 - X'_{v-1})/(n_s - v + 1). The conditional expected value of X^E_v given X^E_{v-1} is

E(X^E_v \mid X^E_{v-1} = x^E_{v-1}) = E(X^E_v \mid X'_{v-1} = x'_{v-1})
= \frac{n_s - n_1}{n_s - v}\left(x'_{v-1} + \frac{n_2 - x'_{v-1}}{n_s - v + 1}\right) + \frac{(n_1 - v)n_2}{n_s - v}
= \frac{n_s - n_1}{n_s - v + 1}x'_{v-1} + \frac{(n_s - n_1)n_2}{(n_s - v)(n_s - v + 1)} + \frac{(n_1 - v)n_2}{n_s - v}
= \frac{n_s - n_1}{n_s - v + 1}x'_{v-1} + \frac{(n_1 - v + 1)n_2}{n_s - v + 1}
= x^E_{v-1}.    (A.22)
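The martingale property (A.22) can be confirmed mechanically with exact rational arithmetic. In the sketch below (an illustrative addition; the parameter values are arbitrary), the v-th draw hits the second subset with probability (n_2 - X'_{v-1})/(n_s - v + 1), and the conditional expectation of X^E_v reduces to X^E_{v-1} for every feasible state:

```python
from fractions import Fraction

def XE(v, x, n1, n2, ns):
    """X^E_v as defined in (A.21), for X'_v = x."""
    return x + Fraction((n1 - v) * (n2 - x), ns - v)

n1, n2, ns = 5, 7, 20
for v in range(1, n1 + 1):
    for x_prev in range(0, min(v - 1, n2) + 1):      # feasible X'_{v-1}
        p_hit = Fraction(n2 - x_prev, ns - v + 1)    # expected-increase probability
        expected = p_hit * XE(v, x_prev + 1, n1, n2, ns) \
                   + (1 - p_hit) * XE(v, x_prev, n1, n2, ns)
        assert expected == XE(v - 1, x_prev, n1, n2, ns)
```

The identity is algebraic, so it holds for every state, not only for states reachable with positive probability.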

Since E(X^E_v \mid X^E_{v-1}) = X^E_{v-1}, the sequence X^E_0, ..., X^E_{n_1} is a martingale. To apply Azuma's inequality, it is necessary to show that |X^E_v - X^E_{v-1}| \le 1 for v = 1, ..., n_1. Unlike for the martingale of Appendix A.4, this is not generally true, but true only when certain constraints on n_1, n_2 and n_s are satisfied. If these parameters are fixed, the difference can be seen as a function of v, X'_v and X'_{v-1}. The function is monotonic in these variables, and to find its maximum, it is sufficient to look only


at the extreme values of v, X'_v and X'_{v-1}. The maximum in turn determines the constraints on n_1, n_2 and n_s.

As in Appendix A.4, the proof consists of two main cases: (1) X'_v = X'_{v-1} and (2) X'_v = X'_{v-1} + 1. Each of these is divided into several subcases. The following tables show the subcases, the absolute difference in each case between X^E_v and X^E_{v-1}, and the constraint that follows from the difference.

Case 1: X'_v = X'_{v-1}. In this case, using (A.21), |X^E_v - X^E_{v-1}| can be rewritten as

\frac{(n_2 - X'_v)(n_s - n_1)}{(n_s - v)(n_s - v + 1)}.

There are three subcases, listed in the table below.

Subcase                                    Difference                                               Resulting constraint
v = 1, X'_v = 0                            n_2(n_s - n_1)/(n_s(n_s - 1))                            n_1 > 0 or n_2 < n_s
v = n_1, X'_v = max(0, n_1 + n_2 - n_s)    n_2/(n_s - n_1 + 1) if X'_v = 0                          n_1 + n_2 - 1 <= n_s
                                           (n_s - n_1)/(n_s - n_1 + 1) if X'_v = n_1 + n_2 - n_s    no constraint
v = n_1, X'_v = min(n_1, n_2)              (n_2 - n_1)/(n_s - n_1 + 1) if X'_v = n_1                no constraint
                                           0 if X'_v = n_2                                          no constraint

Case 2: X'_v = X'_{v-1} + 1. Now |X^E_v - X^E_{v-1}| can be rewritten as

\frac{(n_s - n_1)(n_s - v - n_2 + X'_v)}{(n_s - v)(n_s - v + 1)}.

Again there are three subcases, listed in the table below.

Subcase                                    Difference                                               Resulting constraint
v = 1, X'_v = 1                            (n_s - n_1)(n_s - n_2)/(n_s(n_s - 1))                    no constraint
v = n_1, X'_v = max(1, n_1 + n_2 - n_s)    (n_s - n_1 - n_2 + 1)/(n_s - n_1 + 1) if X'_v = 1        no constraint
                                           0 if X'_v = n_1 + n_2 - n_s                              no constraint
v = n_1, X'_v = min(n_1, n_2)              (n_s - n_2)/(n_s - n_1 + 1) if X'_v = n_1                no constraint
                                           (n_s - n_1)/(n_s - n_1 + 1) if X'_v = n_2                no constraint

To conclude, if n_1 + n_2 - 1 <= n_s, then |X^E_v - X^E_{v-1}| <= 1 and Azuma's inequality can be applied.
