
Research Article

Encoding Sequential Information in Semantic Space Models: Comparing Holographic Reduced Representation and Random Permutation

Gabriel Recchia,1 Magnus Sahlgren,2 Pentti Kanerva,3 and Michael N. Jones4

1 University of Cambridge, Cambridge CB2 1TN, UK
2 Swedish Institute of Computer Science, 164 29 Kista, Sweden
3 Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, CA 94720, USA
4 Indiana University, Bloomington, IN 47405, USA

Correspondence should be addressed to Michael N. Jones; jonesmn@indiana.edu

Received 14 December 2014; Accepted 26 February 2015

Academic Editor: Carlos M. Travieso-González

Copyright © 2015 Gabriel Recchia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Circular convolution and random permutation have each been proposed as neurally plausible binding operators capable of encoding sequential information in semantic memory. We perform several controlled comparisons of circular convolution and random permutation as means of encoding paired associates as well as encoding sequential information. Random permutations outperformed convolution with respect to the number of paired associates that can be reliably stored in a single memory trace. Performance was equal on semantic tasks when using a small corpus, but random permutations were ultimately capable of achieving superior performance due to their higher scalability to large corpora. Finally, "noisy" permutations in which units are mapped to other units arbitrarily (no one-to-one mapping) perform nearly as well as true permutations. These findings increase the neurological plausibility of random permutations and highlight their utility in vector space models of semantics.

1. Introduction

Semantic space models (SSMs) have seen considerable recent attention in cognitive science, both as automated tools to estimate semantic similarity between words and as psychological models of how humans learn and represent lexical semantics from contextual cooccurrences (for a review, see [1]). In general, these models build abstract semantic representations for words from statistical redundancies observed in a large corpus of text (e.g., [2, 3]). As tools, the models have provided valuable metrics of semantic similarity for stimulus selection and control in behavioral experiments using words, sentences, and larger units of discourse [4–6]. As psychological models, the vectors derived from SSMs serve as useful semantic representations in computational models of word recognition, priming, and higher-order comprehension processes [7–12]. In addition, the semantic abstraction algorithms themselves are often proposed as models of the cognitive mechanisms used by humans to learn

word meaning from repeated episodic experience, although there has been criticism that this theoretical claim may be overextending the original intention of SSMs [13–15].

A classic example of an SSM is Landauer and Dumais' [2] latent semantic analysis model (LSA). LSA begins with a word-by-document matrix representation of a text corpus, where a word is represented as a frequency distribution over documents. Next, a lexical association function is applied to dampen the importance of a word proportionate to its entropy across documents (see [16] for a review of functions used in various SSMs). Finally, singular value decomposition is applied to the matrix to reduce its dimensionality. In the reduced representation, a word's meaning is a vector of weights over the 300 latent dimensions with the largest eigenvalues. The dimensional reduction step has the effect of bringing out latent semantic relationships between words. The resulting space positions words proximally if they cooccur more frequently than would be expected by chance and



also if they tend to occur in similar semantic contexts (even if they never directly cooccur).

More recent SSMs employ sophisticated learning mechanisms borrowed from probabilistic inference [18], holographic encoding [19], minimum description length [20], random indexing [21], and global memory retrieval [22]. However, all are still based on the fundamental notion that lexical semantics may be induced by observing word cooccurrences across semantic contexts [23, 24], and no single model has yet proven itself to be the dominant methodology [1].

Despite their successes both as tools and as psychological models, current SSMs suffer from several shortcomings. Firstly, the models have been heavily criticized in recent literature because they learn only from linguistic information and are not grounded in perception and action; for a review of this debate, see de Vega et al. [25]. The lack of perceptual grounding is clearly at odds with the current literature in embodied cognition, and it limits the ability of SSMs to account for human behavior on a variety of semantic tasks [26]. While the current paper does not address the issue of incorporating perceptual grounding into computational models trained on linguistic data, the issue is discussed at length in several recent papers (e.g., [15, 27–30]). Secondly, SSMs are often criticized as "bag of words" models because (with the exception of several models to be discussed in the next section) they encode only the contexts in which words cooccur, ignoring statistical information about the temporal order of word use within those contexts. Finally, many SSMs have difficulty scaling to linguistic data of a magnitude comparable to what humans experience. In this paper, we simultaneously address order and scalability in SSMs.

The Role of Word Order in Lexical Semantics. A wealth of evidence has emphasized the importance of domain-general sequential learning abilities in language processing (see [31, 32] for reviews), with recent evidence suggesting that individual differences in statistical sequential learning abilities may even partially account for variations in linguistic performance [33]. Bag-of-words SSMs are blind to word order information when learning, and this has been criticized as an "architectural failure" of the models [13], insofar as it was clear a priori that humans utilize order information in almost all tasks involving semantic cognition. For example, interpretation of sentence meaning depends on the sequential usage tendencies of the specific component words [34–40].

One common rebuttal to this objection is that order information is unimportant for many tasks involving discourse [2]. However, this seems to apply mostly to applied problems with large discourse units, such as automatic essay grading [41]. A second rebuttal is that SSMs are models of how lexical semantics are learned and represented, but not of how words are used to build sentence or phrase meaning [42, 43]. Hence, order is not typically thought of as a part of word learning or representation but rather as part of how lexical representations are put together for comprehension of larger units of discourse. Compositional semantics is beyond the scope of SSMs and instead requires a process account of composition to build meaning from SSM representations, and this is the likely stage at which order plays a role [9, 11].

However, this explanation is now difficult to defend given a recent flurry of research in psycholinguistics demonstrating that temporal order information is used by humans when learning about words and that order is a core information component of the lexical representation of the word itself. The role of statistical information about word order was traditionally thought to apply only to the rules of word usage (grammar) rather than the lexical meaning of the word itself. However, temporal information is now taking a more prominent role in the lexical representation of a word's meaning. Elman [44] has recently argued that the lexical representations of individual words contain information about common temporal context, event knowledge, and habits of usage (cf. [4, 45–47]). In addition, recent SSMs that integrate word order information have seen greater success at fitting human data in semantic tasks than SSMs encoding only contextual information (e.g., [16, 19, 48–50]).

The Role of Data Scale in Lexical Semantics. SSMs have also been criticized due to their inability to scale to realistic sizes of linguistic data [51, 52]. The current corpora that SSMs such as LSA are frequently trained on contain approximately the number of tokens that children are estimated to have experienced in their ambient environment by age three (in the range of 10–30 million), not even including words produced by the child during this time [14, 53]. Given that SSMs are typically evaluated using benchmarks elicited from college-age participants, it would be ideal if they were trained upon a quantity of linguistic input approximating the experience of this age.

However, SSMs that rely on computationally complex decomposition techniques to reveal the latent components in a word-by-document matrix (e.g., singular value decomposition) are not able to scale up to corpora of hundreds of millions of tokens, even with high-end supercomputing resources. Although new methods for scaling up singular value decomposition to larger input corpora have shown promise [54, 55], there will always be a practical upper limit to the amount of data that can be processed when compared to continuous vector accumulation techniques. The problem is exacerbated by the fact that as the size of the corpus increases, the numbers of rows and columns in the matrix both increase significantly: the number of columns grows linearly in proportion to the number of documents, and the number of rows grows approximately in proportion to the square root of the number of tokens (Heaps' law).
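For reference, the sublinear growth of matrix rows alluded to here is usually stated as Heaps' law; the relation below is the standard formulation (the exponent is a typical empirical value for English text, not a figure reported in this article):

    V(N) \approx K N^{\beta}, \qquad 0 < \beta < 1,

where N is the number of tokens in the corpus, V(N) is the number of distinct word types (matrix rows), and a β near 0.5 yields the square-root growth described above.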

As the availability of text increases, it is an open question whether a better solution to semantic representation is to employ simpler algorithms that are capable of both integrating order information and scaling up to take advantage of large data samples, or whether time would better be spent optimizing decomposition techniques. Recchia and Jones [52] demonstrated that although an extremely simple method of assessing word pairs' semantic similarity (a simplified version of pointwise mutual information) was outperformed by more complex models such as LSA on small text corpora, the simple metric ultimately achieved better fits to human data when it was scaled up to an input corpus that was intractable for LSA. Similarly, Bullinaria and


Levy [16] found that simple vector space representations achieved high performance on a battery of semantic tasks, with performance increasing monotonically with the size of the input corpus. In addition, Louwerse and Connell's [56] simulations indicated that first-order cooccurrence structure in text was sufficient to account for a variety of behavioral trends that had seemed to be indicative of a "latent" learning mechanism, provided that the text learned from was at a sufficiently large scale. These findings were one factor that led these authors to favor simple and scalable algorithms over more complex, nonscalable algorithms.

The issue of scalability is more than simply a practical concern of computing time. Connectionist models of semantic cognition (e.g., [57, 58]) have been criticized because they are trained on "toy" artificial languages that have desirable structure built in by the theorist. These small training sets do not contain the complex structure inherent in real natural language. To produce humanlike behavior with an impoverished training set, the models are likely to be positing overly complex learning mechanisms compared to humans, who learn from experience with much larger amounts of complex linguistic data. Hence, humans may be using considerably simpler learning mechanisms because much of the requisite complexity to produce their semantic structure is the result of large sampling from a more complex dataset [14, 19]. A model of human learning should be able to learn data at a comparable scale to what humans experience, or it risks being overly complex. As Onnis and Christiansen [59] have noted, many models of semantic learning "assume a computational complexity and linguistic knowledge likely to be beyond the abilities of developing young children" [59, abstract].

The same complexity criticism applies to most current SSMs. Although they learn from real-world linguistic data rather than artificial languages, the amount of data they learn from is only about 5% of what is likely experienced by the college-age participants who produce the semantic data that the models are fit to. Of course, a strict version of this argument assumes equal and unchanging attention to incoming tokens, which is unlikely to be true (see [7, 60]). Hence, to produce a good fit to the human data with impoverished input, we may be developing SSMs that have unnecessary complexity built into them. This suggestion explains why recent research with simple and scalable semantic models has found that simple models that scale to large amounts of data consistently outperform computationally complex models that have difficulty scaling (e.g., [51, 52]; cf. [61]).

2. Methods of Integrating Word Order into SSMs

Early work with recurrent neural networks [57, 62–64] demonstrated that paradigmatic similarity between words could be learned across a distributed representation by attending to the sequential surroundings of the word in the linguistic stream. However, this work was limited to small artificial languages and did not scale to natural language corpora. More recently, work by Howard and colleagues with temporal context models [65–68] has shown promise

at applying neurally inspired recurrent networks of temporal prediction by the hippocampal system to large real-world language corpora. Tong and colleagues have demonstrated the utility of echo state networks in learning a grammar with long-distance dependencies [69], although their work focused on a corpus of an artificial language similar to that of Elman [70]. In a similar vein, liquid state machines have been successfully trained upon a corpus of conversations obtained from humans performing cooperative search tasks to recognize phrases unfolding in real time [71].

Other noteworthy works on distributional representations of word meaning include "deep learning" methods [72], which have attracted increasing attention in the artificial intelligence and machine learning literature due to their impressive performance on a wide variety of tasks (see [73, 74] for reviews). Deep learning refers to a constellation of related methods for learning functions composed of multiple nonlinear transformations by making use of "deep" (i.e., highly multilayered) neural networks. Intermediate layers, corresponding to intermediate levels of representation, are trained one at a time with restricted Boltzmann machines, autoencoders, or other unsupervised learning algorithms [72, 75, 76]. These methods have been applied to construct distributed representations of word meaning [77–79] and compositional semantics [80]. Of particular relevance to the present work, recurrent neural networks, referred to as the "temporal analogue" of deep neural networks [81], have been successfully used to model sequential dependencies in language. By applying a variant of Hessian-free optimization to recurrent neural networks, Sutskever et al. [82] surpassed the previous state-of-the-art performance in character-level language modeling. Similarly, Mikolov et al. [80] achieved new state-of-the-art performance on the Microsoft Research Sentence Completion challenge with a weighted combination of an order-sensitive neural network language model and a recurrent neural network language model.

The improvements in performance achieved by deep learning methods over the past decade, and the variety of tasks on which these improvements have been realized, are such that deep learning has been referred to as a "breakthrough" in machine learning within academia and the popular press [74]. However, reducing the computational complexity of training deep networks remains an active area of research, and deep networks have not been compared with human performance on "semantic" behavioral tasks (e.g., semantic priming and replicating human semantic judgments) as thoroughly as have most of the SSMs described previously in this section. Furthermore, although deep learning methods have several properties that are appealing from a cognitive perspective [73], researchers in machine learning are typically more concerned with a method's performance and mathematical properties than its cognitive plausibility. Given the similarity in the ultimate goals of both approaches (the development of unsupervised and semisupervised methods to compute vector representations of word meaning), cognitive scientists and machine learning researchers alike may benefit from increased familiarity with the most popular methods in each other's fields. This is particularly true given that both fields often settle on similar research questions,


for example, how best to integrate distributional lexical statistics with information from other modalities. Similar to findings in cognitive science demonstrating that better fits to human data are achieved when a distributed model is trained simultaneously (rather than separately) on textual data and data derived from perceptual descriptions [27], performance with deep networks is improved when learning features for one modality (e.g., video) with features corresponding to a second modality (e.g., audio) simultaneously rather than in isolation [83].

One of the earliest large-scale SSMs to integrate sequential information into a lexical representation was the Hyperspace Analogue to Language model (HAL [3]), and it has been proposed that HAL produces lexical organization akin to what a large-scale recurrent network would when trained on language corpora [84]. HAL essentially tabulates a word-by-word cooccurrence matrix in which cell entries are inversely weighted by distance within a moving window (typically 5–10 words) slid across a text corpus. A word's final lexical representation is the concatenation of its row (words preceding target) and column (words succeeding target) vectors from the matrix, normalized by length to reduce the effect of marginal frequency. Typically, columns with the lowest variance are removed prior to concatenation to reduce dimensionality. HAL has inspired several related models for tabulating context word distances (e.g., [50, 85, 86]), and this general class of model has seen considerable success at mimicking human data from sources as diverse as deep dyslexia [87], lexical decision times [88], semantic categorization [15, 89], and information flow [90].

Topic models (e.g., [18]) have seen a recent surge of popularity in modeling the semantic topics from which linguistic contexts could be generated. Topic models have been very successful at explaining high-level semantic phenomena such as the structure of word association norms, but they have also previously been integrated with hidden Markov models to simultaneously learn sequential structure [48, 91]. These models either independently infer a word's meaning and its syntactic category [91] or infer a hierarchical coupling of probability distributions for a word's topic context dependent on its sequential state. Although promising formal approaches, neither model has yet been applied to model behavioral data.

An alternative approach to encoding temporal information in vector representations is to use vector binding based on high-dimensional random representations (for a review, see [92]). Two random binding models that have been successfully applied to language corpora are the bound encoding of the aggregate language environment model (BEAGLE [19]) and the random permutation model (RPM [17]). BEAGLE and RPM can both be loosely thought of as noisy n-gram models. Each uses a dedicated function to associate two contiguous words in a corpus but may recursively apply the same function to create vectors representing multiple chunks. For example, in the short phrase "Mary loves John," an associative operator can be used to create a new vector that represents the n-grams Mary-loves and (Mary-loves)-John. The continuous binding of higher-order n-grams from a single operator in this fashion is remarkably

simple but produces very sophisticated vector representations that contain word transition information. In addition, the associative operations themselves may be inverted to retrieve from memory previously stored associations. Hence, given the probe Mary ___ John, the operation can be inverted to retrieve plausible words that fit this temporal context from the training corpus and that are stored in a distributed fashion in the vector. The applications of BEAGLE and RPM to natural language processing tasks have been studied extensively elsewhere. The focus of this current set of experiments is to study their respective association operators in depth.

Rather than beginning with a word-by-document matrix, BEAGLE and RPM each maintain a static, randomly generated signal vector for each word in the lexicon. A word's signal vector is intended to represent the mental representation elicited by its invariant physical properties, such as orthography and phonology. In both models, this signal structure is assumed to be randomly distributed across words in the environment, but vectors with realistic physical structure are also now possible and seem to enhance model predictions [93].

BEAGLE and RPM also maintain dynamic memory vectors for each word. A word's memory representation is updated each time it is experienced in a semantic context as the sum of the signal vectors for the other words in the context. By this process, a word's context is a mixture of the other words that surround it (rather than a frequency tabulation of a document cooccurrence), and words that appear in similar semantic contexts will come to have similar memory representations, as they have had many of the same random signal vectors summed into their memory representations. Thus, the dimensional reduction step in these models is implicitly achieved by superposition of signal vectors and seems to accomplish the same inductive results as those attained by dimensional reduction algorithms such as in LSA, but without the heavy computational requirements [49, 94]. Because they do not require either the overhead of a large word-by-document matrix or computationally intensive matrix decomposition techniques, both BEAGLE and RPM are significantly more scalable than traditional SSMs. For example, encoding with circular convolution in BEAGLE can be accomplished in O(k log k) time, where k is a constant representing the number of dimensions in the reduced representation [95], and in O(k) time with random permutation. By contrast, the complexity of LSA is O(z + k), where z is the number of nonzero entries in the matrix and k is the number of dimensions in the reduced representation [96]. Critically, z increases roughly exponentially with the number of documents [97]. Scalable and incremental random vector accumulation has been shown to be successful on a range of experimental tasks without being particularly sensitive to the choice of parameters such as dimensionality [21, 94, 98, 99].

To represent statistical information about the temporal order in which words are used, BEAGLE and RPM bind together n-gram chunks of signal vectors into composite order vectors that are added to the memory vectors during training. Integrating information about a word's sequential context (where words tend to appear around a target) in


BEAGLE has produced greater fits to human semantic data than only encoding a word's discourse context (what words tend to appear around a target) [19, 49]. Similarly, Sahlgren et al. [17] report superior performance when incorporating temporal information about word order. Hence, in both models, a word's representation becomes a pattern of elements that reflects both its history of cooccurrence with, and position relative to, other words in linguistic experience. Although BEAGLE and RPM differ in respects such as vector dimensionality and chunk size, arguably the most important difference between them is the binding operation used to create order vectors.

BEAGLE uses the operation of circular convolution to bind together signal vectors into a holographic reduced representation (HRR [95, 100]) of n-gram chunks that contain each target word. Convolution is a binary operation (denoted by ⊛) performed on two vectors x and y such that every element z_i of z = (x ⊛ y) is given by

    z_i = \sum_{j=0}^{D-1} x_{j \bmod D} \cdot y_{(i-j) \bmod D},    (1)

where D is the dimensionality of x and y. Circular convolution can be seen as a modulo-n variation of the tensor product of two vectors x and y, such that z is of the same dimensionality as x and y. Furthermore, although z is dissimilar from both x and y by any distance metric, approximations of x and y can be retrieved via the inverse operation of correlation (not related to Pearson's r), for example, y ≈ z ⊘ x. Hence, not only can BEAGLE encode temporal information together with contextual information in a single memory representation, but it can also invert the temporal encoding operation to retrieve grammatical information directly from a word's memory representation, without the need to store grammatical rules (see [19]). Convolution-based encoding and decoding have many precedents in memory modeling (e.g., [101–106]) and have played a key role in models of many other cognitive phenomena as well (e.g., audition [107], object perception [108], perceptual-motor skills [109], reasoning [110]).

In contrast to convolution, RPM employs the unary operation of random permutation (RP [17]) to encode temporal information about a word. RPs are functions that map input vectors to output vectors such that the outputs are simply randomly shuffled versions of the inputs,

    \Pi: \vec{x} \rightarrow \vec{x}^{*},    (2)

such that the expected correlation between x and x* is zero. Just as (x ⊛ y) produces a vector z that differs from x and y but from which approximations of x and y can be retrieved, the sum of two RPs of x and y (z = Πx + Π^2 y), where Π^2 y is defined as Π(Πy), produces a vector z dissimilar from x and y, but from which approximations of the original x and y can be retrieved via Π^-1 and Π^-2, respectively.
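A corresponding sketch for random permutation (again ours, for illustration; the permutation and its inverse are stored as index arrays):

    import numpy as np

    D = 1024
    rng = np.random.default_rng(1)
    x = rng.normal(0, 1 / np.sqrt(D), D)
    y = rng.normal(0, 1 / np.sqrt(D), D)

    perm = rng.permutation(D)              # the random permutation Pi as an index array
    inv = np.argsort(perm)                 # index array implementing Pi^-1

    def P(v, n):
        # apply Pi n times; negative n applies the inverse
        idx = perm if n >= 0 else inv
        for _ in range(abs(n)):
            v = v[idx]
        return v

    z = P(x, 1) + P(y, 2)                  # z = Pi x + Pi^2 y
    x_hat = P(z, -1)                       # Pi^-1 z = x + Pi y (x plus permutation noise)
    cos = x_hat @ x / (np.linalg.norm(x_hat) * np.linalg.norm(x))
    print(round(cos, 2))                   # roughly 0.7: x is recoverable up to noise

Note that applying Π^-1 strips the permutation from x but leaves Πy superposed as noise, which is why the match is approximate rather than exact.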

Both convolution and random permutation offer efficient storage properties, compressing order information into a single composite vector representation, and both encoding operations are reversible. However, RPs are much more computationally efficient to compute. In language applications of BEAGLE, the computationally expensive convolution operation is what limits the size of a text corpus that the model can encode. As several studies [16, 17, 52] have demonstrated, scaling a semantic model to more data produces much better fits to human semantic data. Hence, both order information and magnitude of linguistic input have been demonstrated to be important factors in human semantic learning. If RPs prove comparable to convolution in terms of storage capacity, performance on semantic evaluation metrics, and cognitive plausibility, the scalability of RPs to large datasets may afford the construction of vector spaces that better approximate human semantic structure while preserving many of the characteristics that have made convolution attractive as a means of encoding order information.

For scaling to large corpora, the implementation of RPs in semantic space models is more efficient than that of circular convolution. This is partly due to the higher computational complexity of convolution with respect to vector dimensionality. Encoding k-dimensional bindings with circular convolution can be accomplished in O(k log k) time [95] by means of the fast Fourier transform (FFT). The algorithm to bind two vectors a and b in O(k log k) time involves calculating the discrete Fourier transforms of a and b, multiplying them pointwise to yield a new vector c, and calculating the inverse discrete Fourier transform of c. In the BEAGLE model, storing a single bigram (e.g., updating the memory vector of "fox" upon observing "red fox") would require one such O(k log k) binding as well as the addition of the resulting vector c to the memory vector of "fox."

In contrast, encoding with RPs can be accomplished in O(k) (i.e., linear) time, as permuting a vector only requires copying the value at every index of the original vector to a different index of another vector of the same dimensionality. For example, the permutation function may state that the first cell in the original vector should be copied to the 1040th cell of the new vector, that the next should be copied to the 239th cell of the new vector, and so on. Thus, this process yields a new vector that contains a shuffled version of the original vector in a number of steps that scales linearly with vector dimensionality. To update the memory vector of "fox" upon observing "red fox," RPM would need to apply this process to the environmental vector of "red," yielding a new shuffled version that would then be added to the memory vector of "fox."

In addition to the complexity difference, the calculations involved in the FFT implementation of convolution require more time to execute on each vector element than the copy operations involved in random permutation. Combining these two factors means that circular convolution is considerably less efficient than random permutation in practice. In informal empirical comparisons using the FFT routines in a popular open-source mathematics library (http://math.net), we found circular convolution to be over 70 times slower than random permutation at a vector dimensionality of 2048. Due to convolution's greater computational complexity, the gap widened even further as dimensionality increased. These factors made it impossible to perform our simulations with BEAGLE on the large corpus.
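For a rough sense of the difference, the following sketch (ours; it is not the comparison described above, and numpy's highly optimized FFT will not reproduce the 70-times figure) times a single FFT-based binding against a single permutation at a dimensionality of 2048:

    import numpy as np
    import timeit

    D = 2048
    rng = np.random.default_rng(0)
    a = rng.normal(0, 1 / np.sqrt(D), D)
    b = rng.normal(0, 1 / np.sqrt(D), D)
    perm = rng.permutation(D)

    def bind_conv():
        # one circular convolution via FFT: O(k log k)
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

    def bind_perm():
        # one random permutation: an O(k) copy of elements to new positions
        return a[perm]

    print("convolution:", timeit.timeit(bind_conv, number=10000))
    print("permutation:", timeit.timeit(bind_perm, number=10000))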


We conducted four experiments intended to compare convolution and RP as means of encoding word order information with respect to performance and scalability. In Experiment 1, we conducted an empirical comparison of the storage capacity and the probability of correct decoding under each method. In Experiment 2, we compared RP with convolution in the context of a simple vector accumulation model equivalent to BEAGLE's "order space" on a battery of semantic evaluation tasks when trained on a Wikipedia corpus. The model was trained on both the full corpus and a smaller random subset; results improved markedly when RP was allowed to scale up to the full Wikipedia corpus, which proved to be intractable for the convolution-based HRR model. In Experiment 3, we specifically compared BEAGLE to RPM, which differs from BEAGLE in several important ways other than its binding operation, to assess whether using RP in the context of RPM improves performance further. Finally, Experiment 4 demonstrates that similar results can be achieved with random permutations when the constraint that every unit of the input must be mapped to a unique output node is removed. We conclude that RP is a promising and scalable alternative to circular convolution in the context of vector space models of semantic memory and has properties of interest to computational modelers and researchers interested in memory processes more generally.

3. Experiment 1: Associative Capacity of HRR and RP

Plate [95] made a compelling case for the use of circular convolution in HRRs of associative memory, demonstrating its utility in constructing distributed representations with high storage capacity and high probability of correct retrieval. However, the storage capacity and probability of correct retrieval with RPs have not been closely investigated. This experiment compared the probability of correct retrieval of RPs with that of circular convolution and explored how the memory capacity of RPs varies with respect to dimensionality, number of associations stored, and the nature of the input representation.

3.1. Method. As a test of the capacity of convolution-based associative memories, Plate [95, Appendix D] describes a simple paired-associative memory task in which a retrieval algorithm must select the vector x_i that is bound to its associate y_i out of a set E of m possible random vectors. The retrieval algorithm is provided with a memory vector of the form

    \vec{t} = \sum_{i=1}^{k} (\vec{x}_i \circledast \vec{y}_i),    (3)

that stores a total of k vectors. All vectors are of dimensionality D, and each of x_i and y_i is a normally distributed random vector, i.i.d. with elements sampled from N(0, 1/√D). The retrieval algorithm is provided with the memory vector t and the probe y_i, and works by first calculating a = (t ⊘ y_i), where ⊘ is the correlation operator described in detail in Plate [95, pp. 94–97], an approximate inverse of convolution. The algorithm then retrieves the vector in the "clean-up memory" set E that is the most similar to a. This is accomplished by calculating the cosine between a and each vector in the set E and retrieving the vector from E for which the cosine is highest. If this vector is not equal to x_i, this counts as a retrieval error. We replicated Plate's method to empirically derive retrieval accuracies for a variety of choices of k and D, keeping m fixed at 1000.

Sahlgren et al. [17] bind signal vectors to positions by means of successive self-composition of a permutation function Π and construct memory vectors by superposing the results. In contrast to circular convolution, which requires normally distributed random vectors, random permutations support a variety of possible inputs; Sahlgren et al. employ random ternary vectors, so called because elements take on one of three possible values (+1, 0, or −1). These are sparse vectors, or "spatter codes" [111, 112], whose elements are all zero with the exception of a few randomly placed positive and negative values (e.g., two +1s and two −1s). In this experiment, we tested the storage capacity of an RP-based associative memory, first with normally distributed random vectors (Gaussian vectors) to allow a proper comparison to convolution, and second with random ternary vectors (sparse vectors) with a varying number of positive and negative values in the input.
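For concreteness, one way to generate such a sparse ternary index vector (an illustrative sketch; the number of nonzero elements is the quantity varied in Figure 3):

    import numpy as np

    def sparse_ternary(dim, n_plus=2, n_minus=2, rng=None):
        # mostly zeros, with a few randomly placed +1s and -1s (a "spatter code")
        if rng is None:
            rng = np.random.default_rng()
        v = np.zeros(dim)
        idx = rng.choice(dim, n_plus + n_minus, replace=False)
        v[idx[:n_plus]] = 1.0
        v[idx[n_plus:]] = -1.0
        return v

    x = sparse_ternary(1024)    # e.g., two +1s and two -1s among 1024 elements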

As for the choice of the permutation function itself, any function that maps each element of the input onto a different element of the output will do; vector rotation (i.e., mapping element i of the input to element i + 1 of the output, with the exception of the final element of the input, which is mapped to the first element of the output) may be used for the sake of efficiency [17]. Using the notation of function exponentiation employed in our previous work [17, 113], Π^n refers to Π composed with itself n times: Π^2 = Π(Π), Π^3 = Π^2(Π), and so forth. The notion of a memory vector of paired associations can then be recast in RP terms as follows:

    \vec{t}_{RP} = (\Pi\vec{y}_1 + \Pi^2\vec{x}_1) + (\Pi^3\vec{y}_2 + \Pi^4\vec{x}_2) + (\Pi^5\vec{y}_3 + \Pi^6\vec{x}_3) + \cdots,    (4)

where the task again is to retrieve some y_i's associate x_i when presented only with y_i and t_RP. A retrieval algorithm for accomplishing this can be described as follows: given a probe vector y_i, the algorithm applies the inverse of the initial permutation to the memory vector t_RP, yielding Π^-1 t_RP. Next, the cosine between Π^-1 t_RP and the probe vector y_i is calculated, yielding a value that represents the similarity between y_i and Π^-1 t_RP. The previous steps are then iterated: the algorithm calculates the cosine between y_i and Π^-2 t_RP, between y_i and Π^-3 t_RP, and so forth, until this similarity value exceeds some high threshold; this indicates that the algorithm has "found" y_i in the memory. At that point, t_RP is permuted one more time, yielding x', a noisy approximation of y_i's associate x_i. This approximation x' can then be compared with clean-up memory to retrieve the original associate x_i.


Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the inverse permutation for which cosine(Π^-n t_RP, y_i) was the highest. We can permute one more time to retrieve Π^-(n+1) t_RP, that is, our noisy approximation x'. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x' and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval; an accuracy level of 0.1% would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = 0.001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < 0.001. The interaction was not significant, F(3, 48) = 2.60, P = 0.06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05. All other comparisons were not significant.

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors consisting of zeroes and an equal number of randomly placed −1s and +1s were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = 0.92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < 0.001, and the interaction was not significant, F(3, 48) = 0.004, P = 1. As before, post hoc Tukey's HSD tests

[Figure 1: Retrieval accuracies (%) for convolution-based associative memories with Gaussian vectors, as a function of the number of pairs stored in the trace (2–14) and vector dimensionality (256, 512, 1024, 2048).]

[Figure 2: Retrieval accuracies (%) for RP-based associative memories with Gaussian vectors, as a function of the number of pairs stored in the trace (2–14) and vector dimensionality (256, 512, 1024, 2048).]

showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.


[Figure 3: Retrieval accuracies (%) for RP-based associative memories with sparse vectors, plotted against the number of nonzero elements in the index vectors (2–38). The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.]

[Figure 4: Retrieval accuracies (%) for RP-based associative memories with sparse vectors, as a function of the number of pairs stored in the trace (2–14) and vector dimensionality (256, 512, 1024, 2048).]

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times, corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE would perform on a set of semantic tasks if RPs were used in place of circular convolution.

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced


with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE, each word w is assigned a static "environmental" signal vector e_w, as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

    m_{one} = m_{one} + (\Phi \circledast e_{two}) + (\Phi \circledast e_{two} \circledast e_{three}),    (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

    m_{one} = m_{one} + \Pi e_{two} + \Pi^2 e_{three}.    (6)
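As a concrete illustration of the difference between (5) and (6), the following sketch (ours, not the article's code; variable names mirror the equations) performs both updates for the phrase "one two three":

    import numpy as np

    D = 1024
    rng = np.random.default_rng(0)
    e_two, e_three, phi = rng.normal(0, 1 / np.sqrt(D), (3, D))   # signal vectors and placeholder
    perm = rng.permutation(D)

    def cconv(a, b):
        # circular convolution via FFT
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

    def P(v, n=1):
        # apply the random permutation n times
        for _ in range(n):
            v = v[perm]
        return v

    m_one = np.zeros(D)
    m_one = m_one + cconv(phi, e_two) + cconv(cconv(phi, e_two), e_three)   # Eq. (5)

    m_one_rp = np.zeros(D)
    m_one_rp = m_one_rp + P(e_two, 1) + P(e_three, 2)                       # Eq. (6)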

In addition to order information, the complete BEAGLE model also updates memory vectors with context information. Each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests: the English as a Second Language (ESL) and Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG [115]), Miller and Charles (MC [116]), Resnik (R [117]), and Finkelstein et al. (F [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

Task     Wikipedia subset                      Full Wikipedia
         Convolution    Random permutation     Random permutation
ESL      0.20           0.26                   0.32
TOEFL    0.46†          0.46†                  0.63†
RG       0.07           −0.06                  0.32*
MC       0.08           −0.01                  0.33*
R        0.06           −0.04                  0.35*
F        0.13*          0.12*                  0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

differences between the performance of convolution and RPs on the small corpus. Both performed nearly identically on F and TOEFL; neither showed any significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed the best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well-controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data,


model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1,024 or 2,048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) to BEAGLE, all BEAGLE parameters were

Table 2: Comparison of BEAGLE and RPM by corpus.

Task     Wikipedia subset        TASA                   Full Wikipedia
         BEAGLE     RPM          BEAGLE     RPM         RPM
ESL      0.24       0.27         0.30       0.36†       0.50†
TOEFL    0.47†      0.40†        0.54†      0.77†       0.66†
RG       0.10       0.10         0.21       0.53*       0.65*
MC       0.09       0.12         0.29       0.52*       0.61*
R        0.09       0.03         0.30       0.56*       0.56*
F        0.23*      0.19*        0.27*      0.33*       0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.
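As a rough illustration of the signal-vector scheme just described, the following sketch generates RPM-style sparse ternary environmental vectors and combines them with two fixed "direction" permutations. It is a minimal sketch in NumPy; the dimensionality, seed, vocabulary, and function names are illustrative assumptions rather than details of the original implementation.

import numpy as np

def sparse_ternary_vector(dim, n_plus=2, n_minus=2, rng=None):
    # RPM-style environmental vector: all 0s except two +1s and two -1s.
    rng = np.random.default_rng() if rng is None else rng
    v = np.zeros(dim)
    idx = rng.choice(dim, size=n_plus + n_minus, replace=False)
    v[idx[:n_plus]] = 1.0
    v[idx[n_plus:]] = -1.0
    return v

rng = np.random.default_rng(0)
dim = 10000  # illustrative; RPM is reported to perform best near 25,000
env = {w: sparse_ternary_vector(dim, rng=rng) for w in ["the", "ostrich", "runs", "fast"]}

# "Direction information": one fixed permutation for left neighbors, another for right.
perm_before = rng.permutation(dim)
perm_after = rng.permutation(dim)

def order_vector(left_words, right_words):
    # Sum permuted signal vectors of the words within the window (two on each side).
    o = np.zeros(dim)
    for w in left_words:
        o += env[w][perm_before]
    for w in right_words:
        o += env[w][perm_after]
    return o

# Order information for "ostrich" in "the ostrich runs fast":
o_ostrich = order_vector(["the"], ["runs", "fast"])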

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the corresponding order-information-only models from Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and again it achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.
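For concreteness, the accuracy and correlation measures reported in Tables 2 and 3 can be sketched as follows. This is a minimal illustration assuming NumPy/SciPy, hypothetical task items, and hypothetical word vectors; the actual test materials and any scoring details beyond those stated in the main text are not taken from the original study. Synonymy tests are scored as the proportion of items for which the most similar choice is the correct one, and the similarity tasks as Spearman correlations with human ratings.

import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def synonym_test_accuracy(items, vectors):
    # items: (probe, list_of_choices, index_of_correct_choice); pick the
    # choice whose memory vector is closest to the probe's.
    hits = 0
    for probe, choices, answer in items:
        sims = [cosine(vectors[probe], vectors[c]) for c in choices]
        hits += int(int(np.argmax(sims)) == answer)
    return hits / len(items)

def similarity_task_rho(word_pairs, human_ratings, vectors):
    # Spearman rank correlation between human ratings and model similarities.
    model_sims = [cosine(vectors[a], vectors[b]) for a, b in word_pairs]
    rho, _ = spearmanr(human_ratings, model_sims)
    return rho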



Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed. Thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).
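The two small networks in Figure 5 can be reproduced numerically. The sketch below (NumPy) uses index mappings that are one assignment consistent with the input and output patterns given in the caption; the figure itself does not specify the mappings explicitly.

import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# (a) Random permutation: each input node copies to a unique output node.
perm = np.array([4, 0, 3, 1, 2])       # output j takes its value from input perm[j]
print(x[perm])                         # [0.5 0.1 0.4 0.2 0.3]

# (b) Random connections: each input feeds one (not necessarily unique) output;
# multiple inputs arriving at the same node are summed, unreached nodes become 0.
targets = np.array([1, 4, 2, 0, 2])    # input i feeds output targets[i]
W = np.zeros((5, 5))
W[np.arange(5), targets] = 1.0         # exactly one 1 per row
print(x @ W)                           # [0.4 0.1 0.8 0.  0.2]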

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (i.e., sampling with replacement) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it. Nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule

y_j = \sum_i x_i w_{ij},

in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T^2 (cf. Sahlgren et al.'s generation of successive permutation functions Π^2, Π^3, etc. via reapplication of the same permutation function).


Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task     Full Wikipedia                        TASA
         RC Sim 1   RC Sim 2   RP              RC Sim 1   RC Sim 2   RP
ESL      0.50†      0.48†      0.50†           0.32       0.32       0.36†
TOEFL    0.66†      0.66†      0.66†           0.73†      0.74†      0.77†
RG       0.66*      0.64*      0.65*           0.53*      0.52*      0.53*
MC       0.63*      0.61*      0.61*           0.53*      0.51*      0.52*
R        0.58*      0.56*      0.56*           0.55*      0.54*      0.56*
F        0.39*      0.37*      0.39*           0.32*      0.33*      0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

T was applied to the signal vectors of the words appearing immediately before the encoded word, while T^2 was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.
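A compact sketch of this procedure is given below, assuming NumPy; the dimensionality, window handling, and function names are illustrative assumptions, not details of the original simulations. T is realized as multiplication by a 0/1 matrix with exactly one randomly placed 1 per row, and Simulation 2 obtains T^2 by applying the update rule twice.

import numpy as np

rng = np.random.default_rng(1)
dim = 1000  # illustrative

# Random-connection matrix w: all zeros except one randomly placed 1 per row
# (no constraint on how many 1s land in a single column).
targets = rng.integers(0, dim, size=dim)
W = np.zeros((dim, dim))
W[np.arange(dim), targets] = 1.0

def T(x):
    # y_j = sum_i x_i * w_ij
    return x @ W

def T2(x):
    # Applying the update rule twice yields the second transformation.
    return T(T(x))

def order_vector(left_signals, right_signals, simulation=2):
    # Window of two words on each side of the target position.
    o = np.zeros(dim)
    for s in left_signals:
        o += T(s)                                # words immediately before the target
    for s in right_signals:
        o += T2(s) if simulation == 2 else s     # Simulation 1 leaves the right side untransformed
    return o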

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions expressing general facts ("birds can generally fly") and implementing exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations


with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (i.e., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, demonstrating that probabilistic models that combine distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms), that is, the joint probability distribution of the two information sources, are more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding


and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
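The frequency-domain shortcut can be sketched briefly as follows (NumPy; the dimensionality is illustrative, and the N(0, 1/n) element distribution is the convention commonly used for holographic reduced representations rather than a detail taken from this paper). Binding is circular convolution computed as an elementwise product of FTs, and an approximate inverse is circular correlation, computed with the complex conjugate.

import numpy as np

dim = 2048
rng = np.random.default_rng(7)
a = rng.normal(0, 1 / np.sqrt(dim), dim)   # random "stimulus" vector
b = rng.normal(0, 1 / np.sqrt(dim), dim)   # random "response" vector

# Binding: circular convolution via the FFT (product of the transforms).
bound = np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=dim)

# Approximate unbinding: circular correlation (conjugate of one transform).
decoded = np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(bound), n=dim)

# decoded is a noisy reconstruction of b; a clean-up memory would return the
# stored item most similar to it. The cosine is typically around 0.7 here.
cos = decoded @ b / (np.linalg.norm(decoded) * np.linalg.norm(b))
print(round(float(cos), 2))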

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms: a random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, UK, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates Publishers, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale, biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias, et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo, et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex: executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.




also if they tend to occur in similar semantic contexts (even if they never directly cooccur).

More recent SSMs employ sophisticated learning mechanisms borrowed from probabilistic inference [18], holographic encoding [19], minimum description length [20], random indexing [21], and global memory retrieval [22]. However, all are still based on the fundamental notion that lexical semantics may be induced by observing word cooccurrences across semantic contexts [23, 24], and no single model has yet proven itself to be the dominant methodology [1].

Despite their successes both as tools and as psychological models, current SSMs suffer from several shortcomings. Firstly, the models have been heavily criticized in recent literature because they learn only from linguistic information and are not grounded in perception and action; for a review of this debate, see de Vega et al. [25]. The lack of perceptual grounding is clearly at odds with the current literature in embodied cognition, and it limits the ability of SSMs to account for human behavior on a variety of semantic tasks [26]. While the current paper does not address the issue of incorporating perceptual grounding into computational models trained on linguistic data, the issue is discussed at length in several recent papers (e.g., [15, 27–30]). Secondly, SSMs are often criticized as "bag of words" models because (with the exception of several models to be discussed in the next section) they encode only the contexts in which words cooccur, ignoring statistical information about the temporal order of word use within those contexts. Finally, many SSMs have difficulty scaling to quantities of linguistic data comparable to what humans experience. In this paper, we simultaneously address order and scalability in SSMs.

The Role of Word Order in Lexical Semantics. A wealth of evidence has emphasized the importance of domain-general sequential learning abilities in language processing (see [31, 32] for reviews), with recent evidence suggesting that individual differences in statistical sequential learning abilities may even partially account for variations in linguistic performance [33]. Bag-of-words SSMs are blind to word order information when learning, and this has been criticized as an "architectural failure" of the models [13], insofar as it was clear a priori that humans utilize order information in almost all tasks involving semantic cognition. For example, interpretation of sentence meaning depends on the sequential usage tendencies of the specific component words [34-40].

One common rebuttal to this objection is that order information is unimportant for many tasks involving discourse [2]. However, this seems to apply mostly to applied problems with large discourse units, such as automatic essay grading [41]. A second rebuttal is that SSMs are models of how lexical semantics are learned and represented, but not of how words are used to build sentence/phrase meaning [42, 43]. Hence, order is not typically thought of as a part of word learning or representation, but rather of how lexical representations are put together for comprehension of larger units of discourse. Compositional semantics is beyond the scope of SSMs and instead requires a process account of composition to build meaning from SSM representations, and this is the likely stage at which order plays a role [9, 11].

However, this explanation is now difficult to defend given a recent flurry of research in psycholinguistics demonstrating that temporal order information is used by humans when learning about words, and that order is a core information component of the lexical representation of the word itself. The role of statistical information about word order was traditionally thought to apply only to the rules of word usage (grammar) rather than to the lexical meaning of the word itself. However, temporal information is now taking a more prominent role in the lexical representation of a word's meaning. Elman [44] has recently argued that the lexical representations of individual words contain information about common temporal context, event knowledge, and habits of usage (cf. [4, 45-47]). In addition, recent SSMs that integrate word order information have seen greater success at fitting human data in semantic tasks than SSMs encoding only contextual information (e.g., [16, 19, 48-50]).

The Role of Data Scale in Lexical Semantics. SSMs have also been criticized due to their inability to scale to realistic sizes of linguistic data [51, 52]. The current corpora that SSMs such as LSA are frequently trained on contain approximately the number of tokens that children are estimated to have experienced in their ambient environment by age three (in the range of 10-30 million), not even including words produced by the child during this time [14, 53]. Given that SSMs are typically evaluated using benchmarks elicited from college-age participants, it would be ideal if they were trained upon a quantity of linguistic input approximating the experience of this age group.

However, SSMs that rely on computationally complex decomposition techniques to reveal the latent components in a word-by-document matrix (e.g., singular value decomposition) are not able to scale up to corpora of hundreds of millions of tokens, even with high-end supercomputing resources. Although new methods for scaling up singular value decomposition to larger input corpora have shown promise [54, 55], there will always be a practical upper limit to the amount of data that can be processed when compared to continuous vector accumulation techniques. The problem is exacerbated by the fact that as the size of the corpus increases, the numbers of rows and columns in the matrix both increase significantly: the number of columns grows linearly in proportion to the number of documents, and the number of rows grows approximately in proportion to the square root of the number of tokens (Heaps' law).

As the availability of text increases, it is an open question whether a better solution to semantic representation is to employ simpler algorithms that are capable of both integrating order information and scaling up to take advantage of large data samples, or whether time would be better spent optimizing decomposition techniques. Recchia and Jones [52] demonstrated that although an extremely simple method (a simplified version of pointwise mutual information) for assessing word pairs' semantic similarity was outperformed by more complex models such as LSA on small text corpora, the simple metric ultimately achieved better fits to human data when it was scaled up to an input corpus that was intractable for LSA. Similarly, Bullinaria and Levy [16] found that simple vector space representations achieved high performance on a battery of semantic tasks, with performance increasing monotonically with the size of the input corpus. In addition, Louwerse and Connell's [56] simulations indicated that first-order cooccurrence structure in text was sufficient to account for a variety of behavioral trends that had seemed to be indicative of a "latent" learning mechanism, provided that the text learned from was at a sufficiently large scale. These findings were one factor that led these authors to favor simple and scalable algorithms over more complex, nonscalable algorithms.

The issue of scalability is more than simply a practical concern of computing time. Connectionist models of semantic cognition (e.g., [57, 58]) have been criticized because they are trained on "toy" artificial languages that have desirable structure built in by the theorist. These small training sets do not contain the complex structure inherent in real natural language. To produce humanlike behavior with an impoverished training set, the models are likely to be positing overly complex learning mechanisms compared to humans, who learn from experience with much larger amounts of complex linguistic data. Hence, humans may be using considerably simpler learning mechanisms because much of the complexity required to produce their semantic structure is the result of large-scale sampling from a more complex dataset [14, 19]. A model of human learning should be able to learn from data at a scale comparable to what humans experience, or it risks being overly complex. As Onnis and Christiansen [59] have noted, many models of semantic learning "assume a computational complexity and linguistic knowledge likely to be beyond the abilities of developing young children" [59, abstract].

The same complexity criticism applies to most current SSMs. Although they learn from real-world linguistic data rather than artificial languages, the amount of data they learn from is only about 5% of what is likely experienced by the college-age participants who produce the semantic data that the models are fit to. Of course, a strict version of this argument assumes equal and unchanging attention to incoming tokens, which is unlikely to be true (see [7, 60]). Hence, to produce a good fit to the human data with impoverished input, we may be developing SSMs that have unnecessary complexity built into them. This suggestion explains why recent research with simple and scalable semantic models has found that simple models that scale to large amounts of data consistently outperform computationally complex models that have difficulty scaling (e.g., [51, 52]; cf. [61]).

2. Methods of Integrating Word Order into SSMs

Early work with recurrent neural networks [57, 62-64] demonstrated that paradigmatic similarity between words could be learned across a distributed representation by attending to the sequential surroundings of the word in the linguistic stream. However, this work was limited to small artificial languages and did not scale to natural language corpora. More recently, work by Howard and colleagues with temporal context models [65-68] has shown promise at applying neurally inspired recurrent networks of temporal prediction by the hippocampal system to large real-world language corpora. Tong and colleagues have demonstrated the utility of echo state networks in learning a grammar with long-distance dependencies [69], although their work focused on a corpus of an artificial language similar to that of Elman [70]. In a similar vein, liquid state machines have been successfully trained upon a corpus of conversations obtained from humans performing cooperative search tasks to recognize phrases unfolding in real time [71].

Other noteworthy work on distributional representations of word meaning includes "deep learning" methods [72], which have attracted increasing attention in the artificial intelligence and machine learning literature due to their impressive performance on a wide variety of tasks (see [73, 74] for reviews). Deep learning refers to a constellation of related methods for learning functions composed of multiple nonlinear transformations by making use of "deep" (i.e., highly multilayered) neural networks. Intermediate layers, corresponding to intermediate levels of representation, are trained one at a time with restricted Boltzmann machines, autoencoders, or other unsupervised learning algorithms [72, 75, 76]. These methods have been applied to construct distributed representations of word meaning [77-79] and compositional semantics [80]. Of particular relevance to the present work, recurrent neural networks, referred to as the "temporal analogue" of deep neural networks [81], have been successfully used to model sequential dependencies in language. By applying a variant of Hessian-free optimization to recurrent neural networks, Sutskever et al. [82] surpassed the previous state-of-the-art performance in character-level language modeling. Similarly, Mikolov et al. [80] achieved new state-of-the-art performance on the Microsoft Research Sentence Completion challenge with a weighted combination of an order-sensitive neural network language model and a recurrent neural network language model.

The improvements in performance achieved by deep learning methods over the past decade, and the variety of tasks on which these improvements have been realized, are such that deep learning has been referred to as a "breakthrough" in machine learning within academia and the popular press [74]. However, reducing the computational complexity of training deep networks remains an active area of research, and deep networks have not been compared with human performance on "semantic" behavioral tasks (e.g., semantic priming and replicating human semantic judgments) as thoroughly as have most of the SSMs described previously in this section. Furthermore, although deep learning methods have several properties that are appealing from a cognitive perspective [73], researchers in machine learning are typically more concerned with a method's performance and mathematical properties than with its cognitive plausibility. Given the similarity in the ultimate goals of both approaches (the development of unsupervised and semisupervised methods to compute vector representations of word meaning), cognitive scientists and machine learning researchers alike may benefit from increased familiarity with the most popular methods in each other's fields. This is particularly true given that both fields often settle on similar research questions, for example, how best to integrate distributional lexical statistics with information from other modalities. Similar to findings in cognitive science demonstrating that better fits to human data are achieved when a distributed model is trained simultaneously (rather than separately) on textual data and data derived from perceptual descriptions [27], performance with deep networks is improved when features for one modality (e.g., video) are learned simultaneously with features corresponding to a second modality (e.g., audio), rather than in isolation [83].

One of the earliest large-scale SSMs to integrate sequential information into a lexical representation was the Hyperspace Analogue to Language model (HAL [3]), and it has been proposed that HAL produces lexical organization akin to what a large-scale recurrent network would produce when trained on language corpora [84]. HAL essentially tabulates a word-by-word cooccurrence matrix in which cell entries are inversely weighted by distance within a moving window (typically 5-10 words) slid across a text corpus. A word's final lexical representation is the concatenation of its row (words preceding the target) and column (words succeeding the target) vectors from the matrix, normalized by length to reduce the effect of marginal frequency. Typically, columns with the lowest variance are removed prior to concatenation to reduce dimensionality. HAL has inspired several related models for tabulating context word distances (e.g., [50, 85, 86]), and this general class of model has seen considerable success at mimicking human data from sources as diverse as deep dyslexia [87], lexical decision times [88], semantic categorization [15, 89], and information flow [90].
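For concreteness, the tabulation scheme can be sketched as follows (a minimal illustration, not the original HAL implementation; the window size, linear weighting, and function names are our assumptions):

```python
import numpy as np

def hal_counts(tokens, window=5):
    """Distance-weighted cooccurrence counts: rows index targets, columns index
    the words that precede them within the window (closer words weigh more)."""
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, target in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            prior = tokens[pos - dist]
            counts[idx[target], idx[prior]] += window - dist + 1
    return vocab, counts

def hal_vector(word, vocab, counts):
    """Concatenate the word's row (preceding words) and column (succeeding words),
    normalized by vector length."""
    i = vocab.index(word)
    v = np.concatenate([counts[i, :], counts[:, i]])
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

vocab, counts = hal_counts("the quick brown fox jumps over the lazy dog".split())
print(hal_vector("fox", vocab, counts).shape)  # twice the vocabulary size
```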

Topic models (e.g., [18]) have seen a recent surge of popularity in modeling the semantic topics from which linguistic contexts could be generated. Topic models have been very successful at explaining high-level semantic phenomena such as the structure of word association norms, but they have also previously been integrated with hidden Markov models to simultaneously learn sequential structure [48, 91]. These models either independently infer a word's meaning and its syntactic category [91] or infer a hierarchical coupling of probability distributions for a word's topic context dependent on its sequential state. Although promising formal approaches, neither model has yet been applied to model behavioral data.

An alternative approach to encoding temporal information in vector representations is to use vector binding based on high-dimensional random representations (for a review, see [92]). Two random binding models that have been successfully applied to language corpora are the bound encoding of the aggregate language environment model (BEAGLE [19]) and the random permutation model (RPM [17]). BEAGLE and RPM can both be loosely thought of as noisy n-gram models. Each uses a dedicated function to associate two contiguous words in a corpus, but may recursively apply the same function to create vectors representing multiple chunks. For example, in the short phrase "Mary loves John," an associative operator can be used to create a new vector that represents the n-grams Mary-loves and (Mary-loves)-John. The continuous binding of higher-order n-grams from a single operator in this fashion is remarkably simple but produces very sophisticated vector representations that contain word transition information. In addition, the associative operations themselves may be inverted to retrieve from memory previously stored associations. Hence, given the probe "Mary ___ John," the operation can be inverted to retrieve plausible words that fit this temporal context from the training corpus, which are stored in a distributed fashion in the vector. The applications of BEAGLE and RPM to natural language processing tasks have been studied extensively elsewhere; the focus of the current set of experiments is to study their respective association operators in depth.

Rather than beginning with a word-by-document matrix, BEAGLE and RPM each maintain a static, randomly generated signal vector for each word in the lexicon. A word's signal vector is intended to represent the mental representation elicited by its invariant physical properties, such as orthography and phonology. In both models this signal structure is assumed to be randomly distributed across words in the environment, but vectors with realistic physical structure are also now possible and seem to enhance model predictions [93].

BEAGLE and RPM also maintain dynamic memory vectors for each word. A word's memory representation is updated each time it is experienced in a semantic context, as the sum of the signal vectors of the other words in the context. By this process, a word's context is a mixture of the other words that surround it (rather than a frequency tabulation of document cooccurrence), and words that appear in similar semantic contexts will come to have similar memory representations, as they have had many of the same random signal vectors summed into their memory representations. Thus, the dimensional reduction step in these models is implicitly achieved by superposition of signal vectors and seems to accomplish the same inductive results as those attained by dimensional reduction algorithms such as in LSA, but without the heavy computational requirements [49, 94]. Because they do not require either the overhead of a large word-by-document matrix or computationally intensive matrix decomposition techniques, both BEAGLE and RPM are significantly more scalable than traditional SSMs. For example, encoding with circular convolution in BEAGLE can be accomplished in O(k log k) time, where k is a constant representing the number of dimensions in the reduced representation [95], and in O(k) time with random permutation. By contrast, the complexity of LSA is O(z + k), where z is the number of nonzero entries in the matrix and k is the number of dimensions in the reduced representation [96]. Critically, z increases roughly exponentially with the number of documents [97]. Scalable and incremental random vector accumulation has been shown to be successful on a range of experimental tasks without being particularly sensitive to the choice of parameters such as dimensionality [21, 94, 98, 99].
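As an illustration of the superposition idea (a minimal sketch only; the dimensionality, document segmentation, and function names are assumptions rather than details of either model), context information can be accumulated by summing random signal vectors into memory vectors:

```python
import numpy as np

D = 1024
rng = np.random.default_rng(0)
signal = {}   # static environmental (signal) vectors
memory = {}   # dynamic memory vectors

def ensure(word):
    if word not in signal:
        signal[word] = rng.normal(0, 1 / np.sqrt(D), D)
        memory[word] = np.zeros(D)

def add_context(tokens):
    for w in tokens:
        ensure(w)
    for w in tokens:
        # Superpose the signal vectors of the other words in the context.
        memory[w] += sum(signal[c] for c in tokens if c != w)

add_context("a robin has wings".split())
add_context("a sparrow has wings".split())

# Words experienced in similar contexts accumulate similar memory vectors.
sim = memory["robin"] @ memory["sparrow"] / (
    np.linalg.norm(memory["robin"]) * np.linalg.norm(memory["sparrow"]))
print(round(float(sim), 2))
```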

To represent statistical information about the temporal order in which words are used, BEAGLE and RPM bind together n-gram chunks of signal vectors into composite order vectors that are added to the memory vectors during training. Integrating information about a word's sequential context (where words tend to appear around a target) in BEAGLE has produced greater fits to human semantic data than encoding only a word's discourse context (what words tend to appear around a target [19, 49]). Similarly, Sahlgren et al. [17] report superior performance when incorporating temporal information about word order. Hence, in both models, a word's representation becomes a pattern of elements that reflects both its history of cooccurrence with, and position relative to, other words in linguistic experience. Although BEAGLE and RPM differ in respects such as vector dimensionality and chunk size, arguably the most important difference between them is the binding operation used to create order vectors.

BEAGLE uses the operation of circular convolution to bind together signal vectors into a holographic reduced representation (HRR [95, 100]) of the n-gram chunks that contain each target word. Convolution is a binary operation (denoted by ⊛) performed on two vectors x and y such that every element z_i of z = (x ⊛ y) is given by

z_i = Σ_{j=0}^{D−1} x_{j mod D} · y_{(i−j) mod D},  (1)

where D is the dimensionality of x and y. Circular convolution can be seen as a modulo-D variation of the tensor product of two vectors x and y, such that z is of the same dimensionality as x and y. Furthermore, although z is dissimilar from both x and y by any distance metric, approximations of x and y can be retrieved via the inverse operation of correlation (not related to Pearson's r); for example, correlating x with z yields a noisy approximation of y. Hence, not only can BEAGLE encode temporal information together with contextual information in a single memory representation, but it can also invert the temporal encoding operation to retrieve grammatical information directly from a word's memory representation, without the need to store grammatical rules (see [19]). Convolution-based encoding and decoding have many precedents in memory modeling (e.g., [101-106]) and have played a key role in models of many other cognitive phenomena as well (e.g., audition [107], object perception [108], perceptual-motor skills [109], reasoning [110]).
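A minimal sketch of this binding and unbinding (an illustration only, using the standard FFT identity for circular convolution; variable names and the dimensionality are ours, not BEAGLE's):

```python
import numpy as np

def cconv(x, y):
    # Circular convolution (equation (1)), computed in O(D log D) time via FFTs.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def ccorr(x, z):
    # Circular correlation: the approximate inverse used to decode z = x (*) y.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)))

D = 1024
rng = np.random.default_rng(1)
x = rng.normal(0, 1 / np.sqrt(D), D)
y = rng.normal(0, 1 / np.sqrt(D), D)

z = cconv(x, y)                  # bound pair; dissimilar from both x and y
y_hat = ccorr(x, z)              # noisy approximation of y
cosine = y @ y_hat / (np.linalg.norm(y) * np.linalg.norm(y_hat))
print(round(float(cosine), 2))   # well above the ~0 expected for unrelated vectors
```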

In contrast to convolution, RPM employs the unary operation of random permutation (RP [17]) to encode temporal information about a word. RPs are functions that map input vectors to output vectors such that the outputs are simply randomly shuffled versions of the inputs,

Π: x → x*,  (2)

such that the expected correlation between x and x* is zero. Just as (x ⊛ y) produces a vector that differs from x and y but from which approximations of x and y can be retrieved, the sum of two RPs of x and y, z = Πx + Π²y, where Π²y is defined as Π(Πy), produces a vector z dissimilar from x and y but from which approximations of the original x and y can be retrieved via Π⁻¹ and Π⁻², respectively.
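The following sketch illustrates this binding and its inversion (an illustration under assumed parameter values, not RPM's implementation):

```python
import numpy as np

D = 1024
rng = np.random.default_rng(2)
perm = rng.permutation(D)            # a fixed random permutation Pi
inv = np.argsort(perm)               # its inverse Pi^-1

def P(v, times=1):                   # apply Pi repeatedly
    for _ in range(times):
        v = v[perm]
    return v

def P_inv(v, times=1):               # apply Pi^-1 repeatedly
    for _ in range(times):
        v = v[inv]
    return v

x = rng.normal(0, 1 / np.sqrt(D), D)
y = rng.normal(0, 1 / np.sqrt(D), D)
z = P(x) + P(y, 2)                   # z = Pi x + Pi^2 y

x_hat = P_inv(z)                     # x plus noise (the displaced y term)
y_hat = P_inv(z, 2)                  # y plus noise (the displaced x term)
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cos(x, x_hat)), 2), round(float(cos(y, y_hat)), 2))
```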

Both convolution and random permutation offer efficient storage properties, compressing order information into a single composite vector representation, and both encoding operations are reversible. However, RPs are much more computationally efficient. In language applications of BEAGLE, the computationally expensive convolution operation is what limits the size of a text corpus that the model can encode. As several studies [16, 17, 52] have demonstrated, scaling a semantic model to more data produces much better fits to human semantic data. Hence, both order information and magnitude of linguistic input have been demonstrated to be important factors in human semantic learning. If RPs prove comparable to convolution in terms of storage capacity, performance on semantic evaluation metrics, and cognitive plausibility, the scalability of RPs to large datasets may afford the construction of vector spaces that better approximate human semantic structure, while preserving many of the characteristics that have made convolution attractive as a means of encoding order information.

For scaling to large corpora, the implementation of RPs in semantic space models is more efficient than that of circular convolution. This is partly due to the higher computational complexity of convolution with respect to vector dimensionality. Encoding k-dimensional bindings with circular convolution can be accomplished in O(k log k) time [95] by means of the fast Fourier transform (FFT). The algorithm to bind two vectors a and b in O(k log k) time involves calculating the discrete Fourier transforms of a and b, multiplying them pointwise to yield a new vector c, and calculating the inverse discrete Fourier transform of c. In the BEAGLE model, storing a single bigram (e.g., updating the memory vector of "fox" upon observing "red fox") would require one such O(k log k) binding, as well as the addition of the resulting vector c to the memory vector of "fox."

In contrast, encoding with RPs can be accomplished in O(k) (i.e., linear) time, as permuting a vector only requires copying the value at every index of the original vector to a different index of another vector of the same dimensionality. For example, the permutation function may state that the first cell of the original vector should be copied to the 1040th cell of the new vector, that the next should be copied to the 239th cell of the new vector, and so on. Thus, this process yields a new vector that contains a shuffled version of the original vector in a number of steps that scales linearly with vector dimensionality. To update the memory vector of "fox" upon observing "red fox," RPM would need to apply this process to the environmental vector of "red," yielding a new shuffled version that would then be added to the memory vector of "fox."

In addition to the complexity difference, the calculations involved in the FFT implementation of convolution require more time to execute per vector element than the copy operations involved in random permutation. Combining these two factors means that circular convolution is considerably less efficient than random permutation in practice. In informal empirical comparisons using the FFT routines in a popular open-source mathematics library (http://mathnet), we found circular convolution to be over 70 times slower than random permutation at a vector dimensionality of 2048. Due to convolution's greater computational complexity, the gap widened even further as dimensionality increased. These factors made it impossible to perform our simulations with BEAGLE on the large corpus.


We conducted four experiments intended to compare convolution and RP as means of encoding word order information, with respect to both performance and scalability. In Experiment 1, we conducted an empirical comparison of the storage capacity and the probability of correct decoding under each method. In Experiment 2, we compared RP with convolution in the context of a simple vector accumulation model, equivalent to BEAGLE's "order space," on a battery of semantic evaluation tasks when trained on a Wikipedia corpus. The model was trained on both the full corpus and a smaller random subset; results improved markedly when RP was allowed to scale up to the full Wikipedia corpus, which proved to be intractable for the convolution-based HRR model. In Experiment 3, we specifically compared BEAGLE to RPM, which differs from BEAGLE in several important ways other than its binding operation, to assess whether using RP in the context of RPM improves performance further. Finally, Experiment 4 demonstrates that similar results can be achieved with random permutations when the constraint that every unit of the input must be mapped to a unique output node is removed. We conclude that RP is a promising and scalable alternative to circular convolution in the context of vector space models of semantic memory, and that it has properties of interest to computational modelers and researchers interested in memory processes more generally.

3. Experiment 1: Associative Capacity of HRR and RP

Plate [95] made a compelling case for the use of circular convolution in HRRs of associative memory, demonstrating its utility in constructing distributed representations with high storage capacity and high probability of correct retrieval. However, the storage capacity and probability of correct retrieval with RPs have not been closely investigated. This experiment compared the probability of correct retrieval of RPs with that of circular convolution and explored how the memory capacity of RPs varies with respect to dimensionality, the number of associations stored, and the nature of the input representation.

3.1. Method. As a test of the capacity of convolution-based associative memories, Plate [95, Appendix D] describes a simple paired-associative memory task in which a retrieval algorithm must select the vector x_i that is bound to its associate y_i out of a set E of m possible random vectors. The retrieval algorithm is provided with a memory vector of the form

t = Σ_{i=1}^{k} (x_i ⊛ y_i)  (3)

that stores a total of k bound vectors. All vectors are of dimensionality D, and each of x_i and y_i is a normally distributed random vector, i.i.d. with elements sampled from N(0, 1/√D). The retrieval algorithm is provided with the memory vector t and the probe y_i and works by first calculating a = (t # y_i), where # is the correlation operator described in detail in Plate [95, pp. 94-97], an approximate inverse of convolution. The algorithm then retrieves the vector in the "clean-up memory" set E that is the most similar to a. This is accomplished by calculating the cosine between a and each vector in the set E and retrieving the vector from E for which the cosine is highest. If this vector is not equal to x_i, this counts as a retrieval error. We replicated Plate's method to empirically derive retrieval accuracies for a variety of choices of k and D, keeping m fixed at 1000.
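A minimal sketch of this capacity test (our own illustration, not Plate's or the authors' code; the dimensionality, number of pairs, and seed are arbitrary):

```python
import numpy as np

def cconv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def convolution_trial(D=1024, k=8, m=1000, seed=3):
    rng = np.random.default_rng(seed)
    E = rng.normal(0, 1 / np.sqrt(D), (m, D))          # clean-up memory
    pairs = rng.choice(m, size=(k, 2), replace=False)   # indices of (x_i, y_i)
    t = sum(cconv(E[i], E[j]) for i, j in pairs)         # equation (3)
    correct = 0
    for i, j in pairs:
        a = ccorr(E[j], t)                               # decode with probe y_i
        sims = E @ a / (np.linalg.norm(E, axis=1) * np.linalg.norm(a))
        correct += int(np.argmax(sims) == i)
    return correct / k

print(convolution_trial())                               # proportion of correct retrievals
```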

Sahlgren et al. [17] bind signal vectors to positions by means of successive self-composition of a permutation function Π and construct memory vectors by superposing the results. In contrast to circular convolution, which requires normally distributed random vectors, random permutations support a variety of possible inputs. Sahlgren et al. employ random ternary vectors, so called because elements take on one of three possible values (+1, 0, or −1). These are sparse vectors, or "spatter codes" [111, 112], whose elements are all zero with the exception of a few randomly placed positive and negative values (e.g., two +1s and two −1s). In this experiment, we tested the storage capacity of an RP-based associative memory first with normally distributed random vectors (Gaussian vectors), to allow a proper comparison to convolution, and second with random ternary vectors (sparse vectors) with a varying number of positive and negative values in the input.
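A sparse ternary signal vector of this kind can be generated as follows (a sketch; the dimensionality and the number of nonzero elements are illustrative choices):

```python
import numpy as np

def ternary_vector(D=1024, n_plus=2, n_minus=2, rng=None):
    # Mostly zeros, with a few randomly placed +1s and -1s (a "spatter code").
    rng = rng if rng is not None else np.random.default_rng()
    v = np.zeros(D)
    idx = rng.choice(D, n_plus + n_minus, replace=False)
    v[idx[:n_plus]] = 1.0
    v[idx[n_plus:]] = -1.0
    return v

print(np.flatnonzero(ternary_vector()))   # indices of the four nonzero elements
```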

As for the choice of the permutation function itself, any function that maps each element of the input onto a different element of the output will do; vector rotation (i.e., mapping element i of the input to element i + 1 of the output, with the exception of the final element of the input, which is mapped to the first element of the output) may be used for the sake of efficiency [17]. Using the notation of function exponentiation employed in our previous work [17, 113], Πⁿ refers to Π composed with itself n times: Π² = Π(Π), Π³ = Π²(Π), and so forth. The notion of a memory vector of paired associations can then be recast in RP terms as follows:

t_RP = (Π y₁ + Π² x₁) + (Π³ y₂ + Π⁴ x₂) + (Π⁵ y₃ + Π⁶ x₃) + ⋯  (4)

where the task again is to retrieve some y_i's associate x_i when presented only with y_i and t_RP. A retrieval algorithm for accomplishing this can be described as follows: given a probe vector y_i, the algorithm applies the inverse of the initial permutation to the memory vector t_RP, yielding Π⁻¹ t_RP. Next, the cosine between Π⁻¹ t_RP and the probe vector y_i is calculated, yielding a value that represents the similarity between y_i and Π⁻¹ t_RP. The previous steps are then iterated: the algorithm calculates the cosine between y_i and Π⁻² t_RP, between y_i and Π⁻³ t_RP, and so forth, until this similarity value exceeds some high threshold, indicating that the algorithm has "found" y_i in the memory. At that point, t_RP is permuted one more time, yielding x′, a noisy approximation of y_i's associate x_i. This approximation x′ can then be compared with clean-up memory to retrieve the original associate x_i.


Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the number of inverse permutations for which cosine(Π⁻ⁿ t_RP, y_i) was the highest. We can then permute one more time to retrieve Π⁻ⁿ⁻¹ t_RP, that is, our noisy approximation x′. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x′ and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.
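The following sketch traces this storage and retrieval scheme end to end (an illustration under assumed parameter values, not the authors' implementation):

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

D, k, m = 1024, 8, 1000
rng = np.random.default_rng(4)
perm = rng.permutation(D)
inv = np.argsort(perm)
E = rng.normal(0, 1 / np.sqrt(D), (m, D))           # clean-up memory
pairs = rng.choice(m, size=(k, 2), replace=False)    # indices of (y_i, x_i)

# Equation (4): t_RP = Pi y_1 + Pi^2 x_1 + Pi^3 y_2 + Pi^4 x_2 + ...
t = np.zeros(D)
for j, (yi, xi) in enumerate(pairs):
    y_perm, x_perm = E[yi], E[xi]
    for _ in range(2 * j + 1):
        y_perm = y_perm[perm]
    for _ in range(2 * j + 2):
        x_perm = x_perm[perm]
    t += y_perm + x_perm

# Retrieve the associate of the first stored probe y_1.
yi, xi = pairs[0]
trace, sims = t, []
for _ in range(2 * k):                    # store cosine after each inverse permutation
    trace = trace[inv]
    sims.append(cos(trace, E[yi]))
n = int(np.argmax(sims)) + 1              # depth at which the probe was "found"
noisy = t
for _ in range(n + 1):                    # one further inverse permutation
    noisy = noisy[inv]
clean = int(np.argmax(E @ noisy / np.linalg.norm(E, axis=1)))   # most similar item in E
print(clean == xi)
```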

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval; an accuracy level of 0.1% would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = .001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < .0001. The interaction was not significant, F(3, 48) = 2.60, P = .06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of .05. All other comparisons were not significant.

Figure 1: Retrieval accuracies for convolution-based associative memories with Gaussian vectors, as a function of the number of pairs stored in the trace (2-14) and vector dimensionality (256, 512, 1024, 2048).

Figure 2: Retrieval accuracies for RP-based associative memories with Gaussian vectors, as a function of the number of pairs stored in the trace (2-14) and vector dimensionality (256, 512, 1024, 2048).

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors consisting of zeroes and an equal number of randomly placed −1s and +1s were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = .92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < .0001, and the interaction was not significant, F(3, 48) = 0.004, P = 1. As before, post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of .05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.


Figure 3: Retrieval accuracies for RP-based associative memories with sparse vectors, plotted against the number of nonzero elements in the index vectors. The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

Figure 4: Retrieval accuracies for RP-based associative memories with sparse vectors, as a function of the number of pairs stored in the trace (2-14) and vector dimensionality (256, 512, 1024, 2048).

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does that of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times, corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the resulting space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than by "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE would perform on a set of semantic tasks if RPs were used in place of circular convolution.

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE, each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

m_one = m_one + (Φ ⊛ e_two) + (Φ ⊛ e_two ⊛ e_three),  (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

m_one = m_one + Π e_two + Π² e_three.  (6)
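The two update rules can be sketched side by side as follows (an illustration only; the dimensionality and variable names are assumptions, and the full models include further machinery not shown here):

```python
import numpy as np

D = 1024
rng = np.random.default_rng(5)
e = {w: rng.normal(0, 1 / np.sqrt(D), D) for w in ("one", "two", "three")}
PHI = rng.normal(0, 1 / np.sqrt(D), D)      # placeholder signal vector
perm = rng.permutation(D)                   # the random permutation Pi

def cconv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def P(v, times=1):
    for _ in range(times):
        v = v[perm]
    return v

m_one = np.zeros(D)

# Equation (5): convolution-based order update for "one" given "one two three".
m_one_hrr = m_one + cconv(PHI, e["two"]) + cconv(cconv(PHI, e["two"]), e["three"])

# Equation (6): permutation-based order update for the same phrase.
m_one_rp = m_one + P(e["two"]) + P(e["three"], 2)
```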

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG [115]), Miller and Charles (MC [116]), Resnik (R [117]), and Finkelstein et al. (F [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant differences between the performance of convolution and RPs on the small corpus: both performed nearly identically on F and TOEFL, neither showed any significant correlations with human data on RG, MC, or R, and neither performed better than chance on ESL. Second, both models performed best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than for tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

                Wikipedia subset                        Full Wikipedia
Task     Convolution    Random permutation       Random permutation
ESL      0.20           0.26                     0.32
TOEFL    0.46†          0.46†                    0.63†
RG       0.07           −0.06                    0.32*
MC       0.08           −0.01                    0.33*
R        0.06           −0.04                    0.35*
F        0.13*          0.12*                    0.33*

*Significant correlation, P < .05, one-tailed. †Accuracy score differs significantly from chance, P < .05, one-tailed. Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only, to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data, model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as on TASA. Other than the incorporation of context information (as described in Experiment 2) into BEAGLE, all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.

Table 2: Comparison of BEAGLE and RPM by corpus.

             Wikipedia subset          TASA                   Full Wikipedia
Task       BEAGLE     RPM            BEAGLE     RPM           RPM
ESL        0.24       0.27           0.30       0.36†         0.50†
TOEFL      0.47†      0.40†          0.54†      0.77†         0.66†
RG         0.10       0.10           0.21       0.53*         0.65*
MC         0.09       0.12           0.29       0.52*         0.61*
R          0.09       0.03           0.30       0.56*         0.56*
F          0.23*      0.19*          0.27*      0.33*         0.39*

*Significant correlation, P < .05, one-tailed. †Accuracy score differs significantly from chance, P < .05, one-tailed. Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
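As a concrete sketch of the direction-information scheme described above (our own illustration under assumed parameters; RPM itself uses a far higher dimensionality and additional machinery):

```python
import numpy as np

D = 2048                                   # illustrative; RPM uses roughly 25,000
rng = np.random.default_rng(6)
before = rng.permutation(D)                # permutation for words preceding the target
after = rng.permutation(D)                 # permutation for words following the target

def ternary(D, rng, nnz=4):
    v = np.zeros(D)
    idx = rng.choice(D, nnz, replace=False)
    v[idx[: nnz // 2]], v[idx[nnz // 2:]] = 1.0, -1.0
    return v

def update_order(tokens, signal, memory, window=2):
    # Add the direction-permuted signal vectors of words within the window.
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            perm = before if j < i else after
            memory[w] += signal[tokens[j]][perm]

tokens = "the red fox ran".split()
signal = {w: ternary(D, rng) for w in tokens}
memory = {w: np.zeros(D) for w in tokens}
update_order(tokens, signal, memory)
```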

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the respective versions of BEAGLE trained on order information only in Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness rather than tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, they provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve performance similar to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.


Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer, but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed; thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120-122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (with replacement) can achieve fits to human data similar to those of the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule

y_j = Σ_i x_i w_ij,

in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc., via reapplication of the same permutation function).


Table 3 Comparison of RPM using random connections (RC)versus random permutations (RP)

Task Full Wikipedia TASARC Sim 1 RC Sim 2 RP RC Sim 1 RC Sim 2 RP

ESL 050dagger 048dagger 050dagger 032 032 036dagger

TOEFL 066dagger 066dagger 066dagger 073dagger 074dagger 077dagger

RG 066lowast 064lowast 065lowast 053lowast 052lowast 053lowast

MC 063lowast 061lowast 061lowast 053lowast 051lowast 052lowast

R 058lowast 056lowast 056lowast 055lowast 054lowast 056lowast

F 039lowast 037lowast 039lowast 032lowast 033lowast 033lowastlowastSignificant correlation 119875 lt 005 one-taileddaggerAccuracy score differs significantly from chance 119875 lt 005 one-tailedNote For synonymy tests (ESL TOEFL) values represent the percentage ofcorrect responsesFor all other tasks values represent Spearman rank correlations betweenhuman judgments of semantic similarity and those of the model Abbrevi-ations for tasks are defined in the main text of the paper

T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.
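A hedged sketch of how this encoding step could be realized (assuming NumPy; the dimensionality, window size, and toy phrase are illustrative choices, not the parameters or corpora used in the experiments): a single randomly placed 1 in each row of w implements T as a matrix product, applying it twice gives T², and the transformed signal vectors of neighboring words are summed into the target word's memory vector.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 2048  # illustrative dimensionality

# Random-connection transformation T: w is all zeros except one randomly placed
# 1 in every row (no constraint on columns), so (x @ w)[j] = sum_i x_i * w_ij,
# i.e., each node's new value is the sum of the values feeding into it.
cols = rng.integers(0, dim, size=dim)
w = np.zeros((dim, dim))
w[np.arange(dim), cols] = 1.0

def transform(x, times=1):
    """Apply the random-connection transformation `times` times (T, T^2, ...)."""
    for _ in range(times):
        x = x @ w
    return x

# Toy order encoding for the target "loves" with one word on either side
# (the reported simulations used a two-word window): T for the preceding word,
# T^2 for the following word, summed into the target's memory vector.
signal = {word: rng.standard_normal(dim) for word in ("mary", "loves", "john")}
memory_loves = transform(signal["mary"], times=1) + transform(signal["john"], times=2)
```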

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in the Method, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.
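As an illustrative back-of-the-envelope calculation (under the simplifying assumption that each node's single outgoing connection is chosen independently and uniformly at random; this figure is not reported in the experiments), the size of the active set can be estimated: the probability that a given node receives no incoming connection is

\[ \left(1 - \frac{1}{n}\right)^{n} \approx e^{-1} \approx 0.37, \]

so only about 63% of the n nodes would be expected to remain active after one application of the transformation, with the active set shrinking further under repeated applications such as T².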

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions expressing general facts ("birds can generally fly") and implementing exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations


with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, combining distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one. Spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns, even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding


and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transformations provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
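The frequency-domain identities just described can be written down in a few lines of NumPy (an illustrative sketch under HRR-style assumptions about the vectors, not code from BEAGLE): circular convolution is the inverse FFT of the product of the two FFTs, and circular correlation, which supplies an approximate inverse for decoding, is the inverse FFT of one FFT's complex conjugate multiplied by the other FFT.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1024
# HRR-style random signal vectors (elements drawn from N(0, 1/D))
x = rng.normal(0, 1 / np.sqrt(D), D)
y = rng.normal(0, 1 / np.sqrt(D), D)

# Circular convolution in the frequency domain: FT(x * y) = FT(x) . FT(y)
z = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

# Circular correlation in the frequency domain: FT(x # z) = conj(FT(x)) . FT(z)
y_hat = np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)))

# y_hat is a noisy approximation of y; in a full model it would be "cleaned up"
# by comparing it against the lexicon of known signal vectors.
similarity = np.dot(y_hat, y) / (np.linalg.norm(y_hat) * np.linalg.norm(y))
```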

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model. Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods. Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and Its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, UK, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale, biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and S. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex—executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


Levy [16] found that simple vector space representations achieved high performance on a battery of semantic tasks, with performance increasing monotonically with the size of the input corpus. In addition, Louwerse and Connell's [56] simulations indicated that first-order cooccurrence structure in text was sufficient to account for a variety of behavioral trends that had seemed to be indicative of a "latent" learning mechanism, provided that the text learned from was at a sufficiently large scale. These findings were one factor that led these authors to favor simple and scalable algorithms over more complex, nonscalable algorithms.

The issue of scalability is more than simply a practical concern of computing time. Connectionist models of semantic cognition (e.g., [57, 58]) have been criticized because they are trained on "toy" artificial languages that have desirable structure built in by the theorist. These small training sets do not contain the complex structure inherent in real natural language. To produce humanlike behavior with an impoverished training set, the models are likely to be positing overly complex learning mechanisms compared to humans, who learn from experience with much larger amounts of complex linguistic data. Hence, humans may be using considerably simpler learning mechanisms because much of the requisite complexity to produce their semantic structure is the result of large sampling from a more complex dataset [14, 19]. A model of human learning should be able to learn data at a comparable scale to what humans experience, or it risks being overly complex. As Onnis and Christiansen [59] have noted, many models of semantic learning "assume a computational complexity and linguistic knowledge likely to be beyond the abilities of developing young children" [59, abstract].

The same complexity criticism applies to most current SSMs. Although they learn from real-world linguistic data rather than artificial languages, the amount of data they learn from is only about 5% of what is likely experienced by the college-age participants who produce the semantic data that the models are fit to. Of course, a strict version of this argument assumes equal and unchanging attention to incoming tokens, which is unlikely to be true (see [7, 60]). Hence, to produce a good fit to the human data with impoverished input, we may be developing SSMs that have unnecessary complexity built into them. This suggestion explains why recent research with simple and scalable semantic models has found that simple models that scale to large amounts of data consistently outperform computationally complex models that have difficulty scaling (e.g., [51, 52]; cf. [61]).

2. Methods of Integrating Word Order into SSMs

Early work with recurrent neural networks [57, 62–64] demonstrated that paradigmatic similarity between words could be learned across a distributed representation by attending to the sequential surroundings of the word in the linguistic stream. However, this work was limited to small artificial languages and did not scale to natural language corpora. More recently, work by Howard and colleagues with temporal context models [65–68] has shown promise at applying neurally inspired recurrent networks of temporal prediction by the hippocampal system to large real-world language corpora. Tong and colleagues have demonstrated the utility of echo state networks in learning a grammar with long-distance dependencies [69], although their work focused on a corpus of an artificial language similar to that of Elman [70]. In a similar vein, liquid state machines have been successfully trained upon a corpus of conversations obtained from humans performing cooperative search tasks to recognize phrases unfolding in real time [71].

Other noteworthy works on distributional representations of word meaning include "deep learning" methods [72], which have attracted increasing attention in the artificial intelligence and machine learning literature due to their impressive performance on a wide variety of tasks (see [73, 74] for reviews). Deep learning refers to a constellation of related methods for learning functions composed of multiple nonlinear transformations by making use of "deep" (i.e., highly multilayered) neural networks. Intermediate layers, corresponding to intermediate levels of representation, are trained one at a time with restricted Boltzmann machines, autoencoders, or other unsupervised learning algorithms [72, 75, 76]. These methods have been applied to construct distributed representations of word meaning [77–79] and compositional semantics [80]. Of particular relevance to the present work, recurrent neural networks, referred to as the "temporal analogue" of deep neural networks [81], have been successfully used to model sequential dependencies in language. By applying a variant of Hessian-free optimization to recurrent neural networks, Sutskever et al. [82] surpassed the previous state-of-the-art performance in character-level language modeling. Similarly, Mikolov et al. [80] achieved new state-of-the-art performance on the Microsoft Research Sentence Completion challenge with a weighted combination of an order-sensitive neural network language model and a recurrent neural network language model.

The improvements in performance achieved by deep learning methods over the past decade, and the variety of tasks on which these improvements have been realized, are such that deep learning has been referred to as a "breakthrough" in machine learning within academia and the popular press [74]. However, reducing the computational complexity of training deep networks remains an active area of research, and deep networks have not been compared with human performance on "semantic" behavioral tasks (e.g., semantic priming and replicating human semantic judgments) as thoroughly as have most of the SSMs described previously in this section. Furthermore, although deep learning methods have several properties that are appealing from a cognitive perspective [73], researchers in machine learning are typically more concerned with a method's performance and mathematical properties than its cognitive plausibility. Given the similarity in the ultimate goals of both approaches, namely, the development of unsupervised and semisupervised methods to compute vector representations of word meaning, cognitive scientists and machine learning researchers alike may benefit from increased familiarity with the most popular methods in each other's fields. This is particularly true given that both fields often settle on similar research questions,


for example, how best to integrate distributional lexical statistics with information from other modalities. Similar to findings in cognitive science demonstrating that better fits to human data are achieved when a distributed model is trained simultaneously (rather than separately) on textual data and data derived from perceptual descriptions [27], performance with deep networks is improved when learning features for one modality (e.g., video) with features corresponding to a second modality (e.g., audio) simultaneously rather than in isolation [83].

One of the earliest large-scale SSMs to integrate sequential information into a lexical representation was the Hyperspace Analogue to Language model (HAL [3]), and it has been proposed that HAL produces lexical organization akin to what a large-scale recurrent network would when trained on language corpora [84]. HAL essentially tabulates a word-by-word cooccurrence matrix in which cell entries are inversely weighted by distance within a moving window (typically 5–10 words) slid across a text corpus. A word's final lexical representation is the concatenation of its row (words preceding the target) and column (words succeeding the target) vectors from the matrix, normalized by length to reduce the effect of marginal frequency. Typically, columns with the lowest variance are removed prior to concatenation to reduce dimensionality. HAL has inspired several related models for tabulating context word distances (e.g., [50, 85, 86]), and this general class of model has seen considerable success at mimicking human data from sources as diverse as deep dyslexia [87], lexical decision times [88], semantic categorization [15, 89], and information flow [90].
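A minimal sketch of the tabulation just described (illustrative code, not the original HAL implementation; the weighting scheme and toy input are assumptions for demonstration): a word-by-word matrix is filled by sliding a window across the text, each cooccurrence is weighted inversely by distance, and a word's representation is the concatenation of its row and column vectors.

```python
import numpy as np

def hal_vectors(tokens, window=5):
    """Toy HAL-style tabulation: distance-weighted word-by-word cooccurrences."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, target in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            context = tokens[pos - dist]
            # cell [target, context]: how strongly `context` precedes `target`,
            # weighted inversely by distance within the moving window
            counts[index[target], index[context]] += window - dist + 1
    # A word's representation: its row (words preceding it) concatenated with
    # its column (words succeeding it); length normalization is omitted here.
    return {w: np.concatenate([counts[i, :], counts[:, i]]) for w, i in index.items()}

vectors = hal_vectors("mary loves john and mary loves pizza".split())
```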

Topic models (e.g., [18]) have seen a recent surge of popularity in modeling the semantic topics from which linguistic contexts could be generated. Topic models have been very successful at explaining high-level semantic phenomena, such as the structure of word association norms, but they have also previously been integrated with hidden Markov models to simultaneously learn sequential structure [48, 91]. These models either independently infer a word's meaning and its syntactic category [91] or infer a hierarchical coupling of probability distributions for a word's topic context dependent on its sequential state. Although promising formal approaches, neither model has yet been applied to model behavioral data.

An alternative approach to encoding temporal information in vector representations is to use vector binding based on high-dimensional random representations (for a review, see [92]). Two random binding models that have been successfully applied to language corpora are the bound encoding of the aggregate language environment model (BEAGLE [19]) and the random permutation model (RPM [17]). BEAGLE and RPM can both be loosely thought of as noisy n-gram models. Each uses a dedicated function to associate two contiguous words in a corpus but may recursively apply the same function to create vectors representing multiple chunks. For example, in the short phrase "Mary loves John," an associative operator can be used to create a new vector that represents the n-grams Mary-loves and (Mary-loves)-John. The continuous binding of higher-order n-grams from a single operator in this fashion is remarkably simple but produces very sophisticated vector representations that contain word transition information. In addition, the associative operations themselves may be inverted to retrieve from memory previously stored associations. Hence, given the probe Mary ___ John, the operation can be inverted to retrieve plausible words that fit this temporal context from the training corpus that are stored in a distributed fashion in the vector. The applications of BEAGLE and RPM to natural language processing tasks have been studied extensively elsewhere. The focus of the current set of experiments is to study their respective association operators in depth.

Rather than beginning with a word-by-document matrix, BEAGLE and RPM each maintain a static, randomly generated signal vector for each word in the lexicon. A word's signal vector is intended to represent the mental representation elicited by its invariant physical properties, such as orthography and phonology. In both models, this signal structure is assumed to be randomly distributed across words in the environment, but vectors with realistic physical structure are also now possible and seem to enhance model predictions [93].

BEAGLE and RPM also maintain dynamic memory vectors for each word. A word's memory representation is updated each time it is experienced in a semantic context, as the sum of the signal vectors for the other words in the context. By this process, a word's context is a mixture of the other words that surround it (rather than a frequency tabulation of a document cooccurrence), and words that appear in similar semantic contexts will come to have similar memory representations, as they have had many of the same random signal vectors summed into their memory representations. Thus the dimensional reduction step in these models is implicitly achieved by superposition of signal vectors and seems to accomplish the same inductive results as those attained by dimensional reduction algorithms such as in LSA, but without the heavy computational requirements [49, 94]. Because they do not require either the overhead of a large word-by-document matrix or computationally intensive matrix decomposition techniques, both BEAGLE and RPM are significantly more scalable than traditional SSMs. For example, encoding with circular convolution in BEAGLE can be accomplished in O(k log k) time, where k is a constant representing the number of dimensions in the reduced representation [95], and in O(k) time with random permutation. By contrast, the complexity of LSA is O(z + k), where z is the number of nonzero entries in the matrix and k is the number of dimensions in the reduced representation [96]. Critically, z increases roughly exponentially with the number of documents [97]. Scalable and incremental random vector accumulation has been shown to be successful on a range of experimental tasks without being particularly sensitive to the choice of parameters such as dimensionality [21, 94, 98, 99].

To represent statistical information about the temporal order in which words are used, BEAGLE and RPM bind together n-gram chunks of signal vectors into composite order vectors that are added to the memory vectors during training. Integrating information about a word's sequential context (where words tend to appear around a target) in BEAGLE has produced greater fits to human semantic data than only encoding a word's discourse context (what words tend to appear around a target [19, 49]). Similarly, Sahlgren et al. [17] report superior performance when incorporating temporal information about word order. Hence, in both models a word's representation becomes a pattern of elements that reflects both its history of cooccurrence with, and position relative to, other words in linguistic experience. Although BEAGLE and RPM differ in respects such as vector dimensionality and chunk size, arguably the most important difference between them is the binding operation used to create order vectors.

BEAGLE uses the operation of circular convolution to bind together signal vectors into a holographic reduced representation (HRR [95, 100]) of n-gram chunks that contain each target word. Convolution is a binary operation (denoted by ⊛) performed on two vectors x and y such that every element z_i of z = (x ⊛ y) is given by

$$z_i = \sum_{j=0}^{D-1} x_{j \bmod D} \cdot y_{(i-j) \bmod D}, \tag{1}$$

where D is the dimensionality of x and y. Circular convolution can be seen as a modulo-n variation of the tensor product of two vectors x and y, such that z is of the same dimensionality as x and y. Furthermore, although z is dissimilar from both x and y by any distance metric, approximations of x and y can be retrieved from z via the inverse operation of correlation (not related to Pearson's r); for example, correlating z with x yields an approximation of y. Hence, not only can BEAGLE encode temporal information together with contextual information in a single memory representation, but also it can invert the temporal encoding operation to retrieve grammatical information directly from a word's memory representation, without the need to store grammatical rules (see [19]). Convolution-based encoding and decoding have many precedents in memory modeling (e.g., [101–106]) and have played a key role in models of many other cognitive phenomena as well (e.g., audition [107], object perception [108], perceptual-motor skills [109], reasoning [110]).
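As a concrete illustration, the following minimal sketch (ours, not from the original paper; NumPy is assumed, and all variable names are hypothetical) implements equation (1) directly and shows that circular correlation recovers a noisy but recognizable approximation of a bound item.

```python
import numpy as np

def circular_convolution(x, y):
    # Equation (1): z_i = sum_j x_(j mod D) * y_((i - j) mod D)
    D = len(x)
    return np.array([sum(x[j % D] * y[(i - j) % D] for j in range(D)) for i in range(D)])

def circular_correlation(x, z):
    # Plate's correlation, an approximate inverse: recovers a noisy y from z = x (*) y
    D = len(z)
    return np.array([sum(x[j] * z[(i + j) % D] for j in range(D)) for i in range(D)])

D = 512
rng = np.random.default_rng(0)
x, y = rng.normal(0, 1 / np.sqrt(D), (2, D))       # Gaussian signal vectors
z = circular_convolution(x, y)                      # the bound pair; dissimilar from x and y
y_hat = circular_correlation(x, z)                  # noisy approximation of y
cosine = y_hat @ y / (np.linalg.norm(y_hat) * np.linalg.norm(y))
print(round(float(cosine), 2))                      # typically well above 0 for a single pair
```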

In contrast to convolution, RPM employs the unary operation of random permutation (RP [17]) to encode temporal information about a word. RPs are functions that map input vectors to output vectors such that the outputs are simply randomly shuffled versions of the inputs:

$$\Pi \vec{x} \rightarrow \vec{x}^{*}, \tag{2}$$

such that the expected correlation between x and x* is zero.

Just as (x ⊛ y) produces a vector that differs from x and y but from which approximations of x and y can be retrieved, the sum of two RPs of x and y (z = Πx + Π^2 y, where Π^2 y is defined as Π(Πy)) produces a vector z dissimilar from x and y, but from which approximations of the original x and y can be retrieved via Π^-1 and Π^-2, respectively.
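A correspondingly minimal sketch (ours; NumPy assumed) of the random-permutation analogue: a pair is stored as z = Πx + Π²y, and approximations of x and y are recovered with the inverse permutations.

```python
import numpy as np

D = 1024
rng = np.random.default_rng(1)
perm = rng.permutation(D)                   # the permutation Pi
inv = np.argsort(perm)                      # its inverse, Pi^-1

def apply_perm(v, p, times=1):
    # Apply permutation p to v `times` times (Pi^2 v = Pi(Pi v), etc.)
    for _ in range(times):
        v = v[p]
    return v

x, y = rng.normal(0, 1 / np.sqrt(D), (2, D))
z = apply_perm(x, perm) + apply_perm(y, perm, 2)   # z = Pi x + Pi^2 y
cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(cos(apply_perm(z, inv, 1), x), 2))     # Pi^-1 z approximates x (cosine well above 0)
print(round(cos(apply_perm(z, inv, 2), y), 2))     # Pi^-2 z approximates y
```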

Both convolution and random permutation offer efficient storage properties, compressing order information into a single composite vector representation, and both encoding operations are reversible. However, RPs are much more computationally efficient to compute. In language applications of BEAGLE, the computationally expensive convolution operation is what limits the size of a text corpus that the model can encode. As several studies [16, 17, 52] have demonstrated, scaling a semantic model to more data produces much better fits to human semantic data. Hence, both order information and magnitude of linguistic input have been demonstrated to be important factors in human semantic learning. If RPs prove comparable to convolution in terms of storage capacity, performance on semantic evaluation metrics, and cognitive plausibility, the scalability of RPs to large datasets may afford the construction of vector spaces that better approximate human semantic structure while preserving many of the characteristics that have made convolution attractive as a means of encoding order information.

For scaling to large corpora, the implementation of RPs in semantic space models is more efficient than that of circular convolution. This is partly due to the higher computational complexity of convolution with respect to vector dimensionality. Encoding k-dimensional bindings with circular convolution can be accomplished in O(k log k) time [95] by means of the fast Fourier transform (FFT). The algorithm to bind two vectors a and b in O(k log k) time involves calculating the discrete Fourier transforms of a and b, multiplying them pointwise to yield a new vector c, and calculating the inverse discrete Fourier transform of c. In the BEAGLE model, storing a single bigram (e.g., updating the memory vector of "fox" upon observing "red fox") would require one such O(k log k) binding as well as the addition of the resulting vector c to the memory vector of "fox."
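A sketch of this FFT-based binding step (our own illustration, not the authors' code; the "red fox" update mirrors the example above and uses a placeholder vector as in equation (5) below):

```python
import numpy as np

def bind_fft(a, b):
    # O(k log k) circular convolution: DFT both vectors, multiply pointwise, inverse DFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

k = 2048
rng = np.random.default_rng(2)
e_red, placeholder = rng.normal(0, 1 / np.sqrt(k), (2, k))  # signal vector for "red" and the placeholder
m_fox = np.zeros(k)                                         # memory vector for "fox"
m_fox += bind_fft(e_red, placeholder)                       # one O(k log k) binding plus an O(k) addition
```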

In contrast, encoding with RPs can be accomplished in O(k) (i.e., linear) time, as permuting a vector only requires copying the value at every index of the original vector to a different index of another vector of the same dimensionality. For example, the permutation function may state that the first cell in the original vector should be copied to the 1040th cell of the new vector, that the next should be copied to the 239th cell of the new vector, and so on. Thus this process yields a new vector that contains a shuffled version of the original vector, in a number of steps that scales linearly with vector dimensionality. To update the memory vector of "fox" upon observing "red fox," RPM would need to apply this process to the environmental vector of "red," yielding a new shuffled version that would then be added to the memory vector of "fox."
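The corresponding O(k) update with a random permutation, again as a rough sketch (ours; the specific target indices produced by the permutation are arbitrary):

```python
import numpy as np

k = 2048
rng = np.random.default_rng(3)
perm = rng.permutation(k)        # e.g., perm[0] == 1040 would copy cell 0 of the input to cell 1040

def permute(v):
    # O(k): copy each cell of the input to its target cell in a fresh output vector
    out = np.empty_like(v)
    out[perm] = v                # out[perm[i]] = v[i]
    return out

e_red = rng.normal(0, 1 / np.sqrt(k), k)   # signal vector for "red"
m_fox = np.zeros(k)
m_fox += permute(e_red)                    # shuffled copy of "red" added to the memory of "fox"
```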

In addition to the complexity difference, the calculations involved in the FFT implementation of convolution require more time to execute on each vector element than the copy operations involved in random permutation. Combining these two factors means that circular convolution is considerably less efficient than random permutation in practice. In informal empirical comparisons using the FFT routines in a popular open-source mathematics library (Math.NET), we found circular convolution to be over 70 times slower than random permutation at a vector dimensionality of 2048. Due to convolution's greater computational complexity, the gap widened even further as dimensionality increased. These factors made it impossible to perform our simulations with BEAGLE on the large corpus.


We conducted four experiments intended to compare convolution and RP as means of encoding word order information with respect to performance and scalability. In Experiment 1, we conducted an empirical comparison of the storage capacity and the probability of correct decoding under each method. In Experiment 2, we compared RP with convolution in the context of a simple vector accumulation model equivalent to BEAGLE's "order space" on a battery of semantic evaluation tasks when trained on a Wikipedia corpus. The model was trained on both the full corpus and a smaller random subset; results improved markedly when RP was allowed to scale up to the full Wikipedia corpus, which proved to be intractable for the convolution-based HRR model. In Experiment 3, we specifically compared BEAGLE to RPM, which differs from BEAGLE in several important ways other than its binding operation, to assess whether using RP in the context of RPM improves performance further. Finally, Experiment 4 demonstrates that similar results can be achieved with random permutations when the constraint that every unit of the input must be mapped to a unique output node is removed. We conclude that RP is a promising and scalable alternative to circular convolution in the context of vector space models of semantic memory and has properties of interest to computational modelers and researchers interested in memory processes more generally.

3. Experiment 1: Associative Capacity of HRR and RP

Plate [95] made a compelling case for the use of circular convolution in HRRs of associative memory, demonstrating its utility in constructing distributed representations with high storage capacity and high probability of correct retrieval. However, the storage capacity and probability of correct retrieval with RPs have not been closely investigated. This experiment compared the probability of correct retrieval of RPs with that of circular convolution and explored how the memory capacity of RPs varies with respect to dimensionality, number of associations stored, and the nature of the input representation.

3.1. Method. As a test of the capacity of convolution-based associative memories, Plate [95, Appendix D] describes a simple paired-associative memory task in which a retrieval algorithm must select the vector x_i that is bound to its associate y_i out of a set E of m possible random vectors. The retrieval algorithm is provided with a memory vector of the form

$$t = \sum_{i=1}^{k} (x_i \circledast y_i) \tag{3}$$

that stores a total of k pairs of vectors. All vectors are of dimensionality D, and each of x_i and y_i is a normally distributed random vector with elements sampled i.i.d. from N(0, 1/√D). The retrieval algorithm is provided with the memory vector t and the probe y_i, and works by first correlating y_i with t to yield a vector a, where correlation is the operator described in detail in Plate [95, pp. 94–97], an approximate inverse of convolution. The algorithm then retrieves the vector in the "clean-up memory" set E that is the most similar to a. This is accomplished by calculating the cosine between a and each vector in the set E and retrieving the vector from E for which the cosine is highest. If this vector is not equal to x_i, this counts as a retrieval error. We replicated Plate's method to empirically derive retrieval accuracies for a variety of choices of k and D, keeping m fixed at 1000.
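The following sketch (ours; NumPy, with FFT-based convolution and correlation; parameter values are illustrative) reproduces the logic of this test on a small scale: k pairs are superposed into a single trace, a probe is decoded by correlation, and the result is cleaned up against the full candidate set.

```python
import numpy as np

def conv(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))                  # circular convolution

def corr(cue, trace):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(cue)) * np.fft.fft(trace)))   # approximate inverse

D, m, k = 1024, 1000, 8
rng = np.random.default_rng(4)
E = rng.normal(0, 1 / np.sqrt(D), (m, D))           # clean-up memory of m candidate vectors
pairs = rng.choice(m, size=(k, 2), replace=False)   # indices of the k stored (x_i, y_i) pairs
t = sum(conv(E[xi], E[yi]) for xi, yi in pairs)     # memory trace, equation (3)

xi, yi = pairs[0]                                   # probe with y_1, hoping to recover x_1
a = corr(E[yi], t)                                  # noisy approximation of x_1
sims = E @ a / np.linalg.norm(E, axis=1)            # proportional to the cosine with each candidate
print(sims.argmax() == xi)                          # True on most runs at this dimensionality
```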

Sahlgren et al. [17] bind signal vectors to positions by means of successive self-composition of a permutation function Π and construct memory vectors by superposing the results. In contrast to circular convolution, which requires normally distributed random vectors, random permutations support a variety of possible inputs. Sahlgren et al. employ random ternary vectors, so-called because elements take on one of three possible values (+1, 0, or −1). These are sparse vectors, or "spatter codes" [111, 112], whose elements are all zero with the exception of a few randomly placed positive and negative values (e.g., two +1s and two −1s). In this experiment, we tested the storage capacity of an RP-based associative memory first with normally distributed random vectors (Gaussian vectors), to allow a proper comparison to convolution, and second with random ternary vectors (sparse vectors) with a varying number of positive and negative values in the input.

As for the choice of the permutation function itself, any function that maps each element of the input onto a different element of the output will do; vector rotation (i.e., mapping element i of the input to element i + 1 of the output, with the exception of the final element of the input, which is mapped to the first element of the output) may be used for the sake of efficiency [17]. Using the notation of function exponentiation employed in our previous work [17, 113], Π^n x refers to Π composed with itself n times: Π^2 x = Π(Πx), Π^3 x = Π^2(Πx), and so forth. The notion of a memory vector of paired associations can then be recast in RP terms as follows:

$$t_{RP} = (\Pi y_1 + \Pi^{2} x_1) + (\Pi^{3} y_2 + \Pi^{4} x_2) + (\Pi^{5} y_3 + \Pi^{6} x_3) + \cdots, \tag{4}$$

where the task again is to retrieve some y_i's associate x_i when presented only with y_i and t_RP. A retrieval algorithm for accomplishing this can be described as follows: given a probe vector y_i, the algorithm applies the inverse of the initial permutation to the memory vector, yielding Π^-1 t_RP. Next, the cosine between Π^-1 t_RP and the probe vector y_i is calculated, yielding a value that represents the similarity between y_i and Π^-1 t_RP. The previous steps are then iterated: the algorithm calculates the cosine between y_i and Π^-2 t_RP, between y_i and Π^-3 t_RP, and so forth, until this similarity value exceeds some high threshold; this indicates that the algorithm has "found" y_i in the memory. At that point, t_RP is permuted one more time, yielding x′, a noisy approximation of y_i's associate x_i. This approximation x′ can then be compared with clean-up memory to retrieve the original associate x_i.


Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the inverse permutation exponent for which cosine(Π^-n t_RP, y_i) was the highest. We can permute one more time to retrieve Π^-(n+1) t_RP, that is, our noisy approximation x′. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x′ and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval, so an accuracy level of 0.1% would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = 0.001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < 0.001. The interaction was not significant, F(3, 48) = 2.60, P = 0.06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05. All other comparisons were not significant.

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors consisting of zeroes and an equal number of randomly placed −1s and +1s were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = 0.92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < 0.001, and the interaction was not significant, F(3, 48) = 0.004, P = 1.0.

Figure 1: Retrieval accuracies for convolution-based associative memories with Gaussian vectors (y-axis: retrieval accuracy, %; x-axis: number of pairs stored in trace; separate curves for dimensionalities 256, 512, 1024, and 2048).

Figure 2: Retrieval accuracies for RP-based associative memories with Gaussian vectors (y-axis: retrieval accuracy, %; x-axis: number of pairs stored in trace; separate curves for dimensionalities 256, 512, 1024, and 2048).

As before, post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.


Figure 3: Retrieval accuracies for RP-based associative memories with sparse vectors, plotted against the number of nonzero elements in the index vectors. The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

Figure 4: Retrieval accuracies for RP-based associative memories with sparse vectors containing twenty nonzero elements (y-axis: retrieval accuracy, %; x-axis: number of pairs stored in trace; separate curves for dimensionalities 256, 512, 1024, and 2048).

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE would perform on a set of semantic tasks if RPs were used in place of circular convolution.

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced


with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

$$m_{\text{one}} = m_{\text{one}} + (\Phi \circledast e_{\text{two}}) + (\Phi \circledast e_{\text{two}} \circledast e_{\text{three}}), \tag{5}$$

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

$$m_{\text{one}} = m_{\text{one}} + \Pi e_{\text{two}} + \Pi^{2} e_{\text{three}}. \tag{6}$$
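A minimal sketch (ours; NumPy) of both update rules for the phrase "one two three," following equations (5) and (6):

```python
import numpy as np

D = 1024
rng = np.random.default_rng(6)
e = {w: rng.normal(0, 1 / np.sqrt(D), D) for w in ["two", "three"]}   # environmental signal vectors
phi = rng.normal(0, 1 / np.sqrt(D), D)                                # placeholder vector Phi
perm = rng.permutation(D)                                             # the permutation Pi

conv = lambda a, b: np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Equation (5), original BEAGLE: convolve the n-grams containing "one" (bigram and trigram).
m_one = np.zeros(D)
m_one += conv(phi, e["two"]) + conv(conv(phi, e["two"]), e["three"])

# Equation (6), modified BEAGLE: successive permutations replace convolution.
m_one_rp = np.zeros(D)
m_one_rp += e["two"][perm] + e["three"][perm][perm]
```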

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG [115]), Miller and Charles (MC [116]), Resnik (R [117]), and Finkelstein et al. (F [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

Task  | Wikipedia subset, convolution | Wikipedia subset, random permutation | Full Wikipedia, random permutation
ESL   | 0.20   | 0.26   | 0.32
TOEFL | 0.46†  | 0.46†  | 0.63†
RG    | 0.07   | −0.06  | 0.32*
MC    | 0.08   | −0.01  | 0.33*
R     | 0.06   | −0.04  | 0.35*
F     | 0.13*  | 0.12*  | 0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

differences between the performance of convolution and RPs on the small corpus. Both performed nearly identically on F and TOEFL; neither showed significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed the best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than for tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only, to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well-controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data,


model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, resp.).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) into BEAGLE, all BEAGLE parameters were

Table 2: Comparison of BEAGLE and RPM by corpus.

Task  | Wikipedia subset, BEAGLE | Wikipedia subset, RPM | TASA, BEAGLE | TASA, RPM | Full Wikipedia, RPM
ESL   | 0.24  | 0.27  | 0.30  | 0.36† | 0.50†
TOEFL | 0.47† | 0.40† | 0.54† | 0.77† | 0.66†
RG    | 0.10  | 0.10  | 0.21  | 0.53* | 0.65*
MC    | 0.09  | 0.12  | 0.29  | 0.52* | 0.61*
R     | 0.09  | 0.03  | 0.30  | 0.56* | 0.56*
F     | 0.23* | 0.19* | 0.27* | 0.33* | 0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.
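To make the differences concrete, here is a rough sketch (ours, not the authors' code; dimensionality, vocabulary, and window handling are simplified and all names are hypothetical) of RPM-style sparse ternary signal vectors and the two "direction" permutations:

```python
import numpy as np

D = 25000
rng = np.random.default_rng(7)

def ternary_signal(dim, plus=2, minus=2):
    # Sparse ternary vector: all zeros except two +1s and two -1s at random positions
    v = np.zeros(dim)
    idx = rng.choice(dim, size=plus + minus, replace=False)
    v[idx[:plus]], v[idx[plus:]] = 1, -1
    return v

before, after = rng.permutation(D), rng.permutation(D)   # one permutation per direction

def add_order_info(memory, left_words, right_words, signal):
    # Window of 2 words on either side of the target; direction-permuted signals are summed in
    for w in left_words:
        memory += signal[w][before]
    for w in right_words:
        memory += signal[w][after]
    return memory

signal = {w: ternary_signal(D) for w in ["the", "quick", "fox", "jumps"]}
m_brown = add_order_info(np.zeros(D), ["the", "quick"], ["fox", "jumps"], signal)
```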

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data marginally higher than those of the corresponding order-only versions of BEAGLE from Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset, despite a similar quantity of text in both corpora. In some cases RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.


Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed; thus the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to using random sampling with replacement as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (sampling with replacement) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule

$$y_j = \sum_{i} x_i w_{ij},$$

in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T^2 (cf. Sahlgren et al.'s generation of successive permutation functions Π^2, Π^3, etc. via reapplication of the same permutation function).


Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task  | Full Wikipedia, RC Sim 1 | Full Wikipedia, RC Sim 2 | Full Wikipedia, RP | TASA, RC Sim 1 | TASA, RC Sim 2 | TASA, RP
ESL   | 0.50† | 0.48† | 0.50† | 0.32  | 0.32  | 0.36†
TOEFL | 0.66† | 0.66† | 0.66† | 0.73† | 0.74† | 0.77†
RG    | 0.66* | 0.64* | 0.65* | 0.53* | 0.52* | 0.53*
MC    | 0.63* | 0.61* | 0.61* | 0.53* | 0.51* | 0.52*
R     | 0.58* | 0.56* | 0.56* | 0.55* | 0.54* | 0.56*
F     | 0.39* | 0.37* | 0.39* | 0.32* | 0.33* | 0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

T was applied to the signal vectors of the words appearing immediately before the encoded word, while T^2 was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.
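A sketch (ours) of the random-connection transformation T and its reapplication to obtain T^2, differing from a true permutation only in the absence of the uniqueness constraint:

```python
import numpy as np

D = 1024
rng = np.random.default_rng(8)
targets = rng.integers(0, D, size=D)     # each input node feeds one random output node (with replacement)

def T(v):
    # y_j = sum_i x_i w_ij, where w has exactly one 1 per row (row i -> column targets[i])
    out = np.zeros_like(v)
    np.add.at(out, targets, v)           # inputs converging on the same node are summed
    return out                           # nodes with no incoming connection remain 0

x = rng.normal(0, 1, D)                  # a signal vector
Tx = T(x)                                # applied to words before the encoded word (Simulations 1 and 2)
T2x = T(T(x))                            # T^2, applied to words after the encoded word (Simulation 2)
```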

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution at low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element-mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions expressing general facts ("birds can generally fly") that implements exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations


with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other work has approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (i.e., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework to demonstrate that probabilistic models can take advantage of distributional information (from lexical cooccurrence) as well as experiential information (from human-generated semantic feature norms), showing that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals; the value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding


and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 5(b)). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohisson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias, et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohisson and S. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo, et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex – executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


for example, how best to integrate distributional lexical statistics with information from other modalities. Similar to findings in cognitive science demonstrating that better fits to human data are achieved when a distributed model is trained simultaneously (rather than separately) on textual data and data derived from perceptual descriptions [27], performance with deep networks is improved when learning features for one modality (e.g., video) with features corresponding to a second modality (e.g., audio) simultaneously rather than in isolation [83].

One of the earliest large-scale SSMs to integrate sequential information into a lexical representation was the Hyperspace Analogue to Language model (HAL [3]), and it has been proposed that HAL produces lexical organization akin to what a large-scale recurrent network would when trained on language corpora [84]. HAL essentially tabulates a word-by-word cooccurrence matrix in which cell entries are inversely weighted by distance within a moving window (typically 5–10 words) slid across a text corpus. A word's final lexical representation is the concatenation of its row (words preceding target) and column (words succeeding target) vectors from the matrix, normalized by length to reduce the effect of marginal frequency. Typically, columns with the lowest variance are removed prior to concatenation to reduce dimensionality. HAL has inspired several related models for tabulating context word distances (e.g., [50, 85, 86]), and this general class of model has seen considerable success at mimicking human data from sources as diverse as deep dyslexia [87], lexical decision times [88], semantic categorization [15, 89], and information flow [90].
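A toy sketch of the HAL-style tally described above is given below (assumed details: a 5-word window and a weight equal to the window size minus the distance plus one; the published model's exact weighting, length normalization, and variance-based column pruning are not reproduced):

from collections import defaultdict

def hal_matrix(tokens, window=5):
    # counts[target][context] accumulates distance-weighted cooccurrences of
    # context words appearing BEFORE the target; the transpose of this table
    # gives the "words succeeding target" counts.
    counts = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            counts[target][tokens[i - d]] += window + 1 - d
    return counts

tokens = "the quick brown fox jumps over the lazy dog".split()
m = hal_matrix(tokens)
print(dict(m["fox"]))  # weighted counts for words preceding "fox"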

Topic models (e.g., [18]) have seen a recent surge of popularity in modeling the semantic topics from which linguistic contexts could be generated. Topic models have been very successful at explaining high-level semantic phenomena such as the structure of word association norms, but they have also previously been integrated with hidden Markov models to simultaneously learn sequential structure [48, 91]. These models either independently infer a word's meaning and its syntactic category [91] or infer a hierarchical coupling of probability distributions for a word's topic context dependent on its sequential state. Although promising formal approaches, neither model has yet been applied to model behavioral data.

An alternative approach to encoding temporal information in vector representations is to use vector binding based on high-dimensional random representations (for a review, see [92]). Two random binding models that have been successfully applied to language corpora are the bound encoding of the aggregate language environment model (BEAGLE [19]) and the random permutation model (RPM [17]). BEAGLE and RPM can both be loosely thought of as noisy n-gram models. Each uses a dedicated function to associate two contiguous words in a corpus but may recursively apply the same function to create vectors representing multiple chunks. For example, in the short phrase "Mary loves John," an associative operator can be used to create a new vector that represents the n-grams Mary-loves and (Mary-loves)-John. The continuous binding of higher-order n-grams from a single operator in this fashion is remarkably simple but produces very sophisticated vector representations that contain word transition information. In addition, the associative operations themselves may be inverted to retrieve from memory previously stored associations. Hence, given the probe Mary ___ John, the operation can be inverted to retrieve plausible words that fit this temporal context from the training corpus and that are stored in a distributed fashion in the vector. The applications of BEAGLE and RPM to natural language processing tasks have been studied extensively elsewhere. The focus of this current set of experiments is to study their respective association operators in depth.

Rather than beginning with a word-by-document matrix, BEAGLE and RPM each maintain a static, randomly generated signal vector for each word in the lexicon. A word's signal vector is intended to represent the mental representation elicited by its invariant physical properties, such as orthography and phonology. In both models, this signal structure is assumed to be randomly distributed across words in the environment, but vectors with realistic physical structure are also now possible and seem to enhance model predictions [93].

BEAGLE and RPM also maintain dynamic memory vectors for each word. A word's memory representation is updated each time it is experienced in a semantic context as the sum of the signal vectors for the other words in the context. By this process, a word's context is a mixture of the other words that surround it (rather than a frequency tabulation of a document cooccurrence), and words that appear in similar semantic context will come to have similar memory representations, as they have had many of the same random signal vectors summed into their memory representations. Thus, the dimensional reduction step in these models is implicitly achieved by superposition of signal vectors and seems to accomplish the same inductive results as those attained by dimensional reduction algorithms such as in LSA, but without the heavy computational requirements [49, 94]. Because they do not require either the overhead of a large word-by-document matrix or computationally intensive matrix decomposition techniques, both BEAGLE and RPM are significantly more scalable than traditional SSMs. For example, encoding with circular convolution in BEAGLE can be accomplished in O(k log k) time, where k is a constant representing the number of dimensions in the reduced representation [95], and in O(k) time with random permutation. By contrast, the complexity of LSA is O(z + k), where z is the number of nonzero entries in the matrix and k is the number of dimensions in the reduced representation [96]. Critically, z increases roughly exponentially with the number of documents [97]. Scalable and incremental random vector accumulation has been shown to be successful on a range of experimental tasks without being particularly sensitive to the choice of parameters such as dimensionality [21, 94, 98, 99].

To represent statistical information about the temporal order in which words are used, BEAGLE and RPM bind together n-gram chunks of signal vectors into composite order vectors that are added to the memory vectors during training. Integrating information about a word's sequential context (where words tend to appear around a target) in


BEAGLE has produced greater fits to human semantic data than only encoding a word's discourse context (what words tend to appear around a target [19, 49]). Similarly, Sahlgren et al. [17] report superior performance when incorporating temporal information about word order. Hence, in both models, a word's representation becomes a pattern of elements that reflects both its history of cooccurrence with, and position relative to, other words in linguistic experience. Although BEAGLE and RPM differ in respects such as vector dimensionality and chunk size, arguably the most important difference between them is the binding operation used to create order vectors.

BEAGLE uses the operation of circular convolution to bind together signal vectors into a holographic reduced representation (HRR [95, 100]) of n-gram chunks that contain each target word. Convolution is a binary operation (denoted by ⊛) performed on two vectors such that every element z_i of z = (x ⊛ y) is given by

    z_i = \sum_{j=0}^{D-1} x_{j \bmod D} \cdot y_{(i-j) \bmod D},    (1)

where D is the dimensionality of x and y. Circular convolution can be seen as a modulo-n variation of the tensor product of two vectors x and y, such that z is of the same dimensionality as x and y. Furthermore, although z is dissimilar from both x and y by any distance metric, approximations of x and y can be retrieved via the inverse operation of correlation (not related to Pearson's r); for example, correlating x with z yields a noisy approximation of y. Hence, not only can BEAGLE encode temporal information together with contextual information in a single memory representation, but it can also invert the temporal encoding operation to retrieve grammatical information directly from a word's memory representation without the need to store grammatical rules (see [19]). Convolution-based encoding and decoding have many precedents in memory modeling (e.g., [101–106]) and have played a key role in models of many other cognitive phenomena as well (e.g., audition [107], object perception [108], perceptual-motor skills [109], reasoning [110]).
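For concreteness, a direct O(D^2) implementation of (1) can be written as follows and checked against the FFT-based form; this is an illustrative sketch rather than the BEAGLE source:

import numpy as np

def cconv_direct(x, y):
    # Literal transcription of (1): z_i = sum_j x_{j mod D} * y_{(i-j) mod D}
    D = len(x)
    z = np.zeros(D)
    for i in range(D):
        for j in range(D):
            z[i] += x[j % D] * y[(i - j) % D]
    return z

rng = np.random.default_rng(2)
x, y = rng.normal(0, 1 / 16, 256), rng.normal(0, 1 / 16, 256)
fft_version = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))
print(np.allclose(cconv_direct(x, y), fft_version))  # True: both compute the same binding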

In contrast to convolution, RPM employs the unary operation of random permutation (RP [17]) to encode temporal information about a word. RPs are functions that map input vectors to output vectors such that the outputs are simply randomly shuffled versions of the inputs:

    \Pi : x \rightarrow x^{*},    (2)

such that the expected correlation between x and x* is zero. Just as (x ⊛ y) produces a vector that differs from x and y but from which approximations of x and y can be retrieved, the sum of two RPs of x and y, z = Πx + Π²y, where Π²y is defined as Π(Πy), produces a vector z dissimilar from x and y but from which approximations of the original x and y can be retrieved via Π⁻¹ and Π⁻², respectively.
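The binding and unbinding described above can be sketched as follows (illustrative only; the permutation, seed, and dimensionality are arbitrary choices): z = Πx + Π²y is formed by index shuffling, and applying the inverse permutation once or twice recovers noisy approximations of x and y.

import numpy as np

D = 1024
rng = np.random.default_rng(3)
perm = rng.permutation(D)
inv = np.argsort(perm)                  # inverse permutation

x = rng.normal(0, 1 / np.sqrt(D), D)
y = rng.normal(0, 1 / np.sqrt(D), D)

z = x[perm] + y[perm][perm]             # z = Πx + Π²y, with Π²y = Π(Πy)
x_hat = z[inv]                          # Π⁻¹z = x + Πy, i.e., x plus noise
y_hat = z[inv][inv]                     # Π⁻²z = Π⁻¹x + y, i.e., y plus noise

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos(x_hat, x), cos(y_hat, y))     # both around 0.7, far above chance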

Both convolution and random permutation offer efficient storage properties, compressing order information into a single composite vector representation, and both encoding operations are reversible. However, RPs are much more computationally efficient to compute. In language applications of BEAGLE, the computationally expensive convolution operation is what limits the size of a text corpus that the model can encode. As several studies [16, 17, 52] have demonstrated, scaling a semantic model to more data produces much better fits to human semantic data. Hence, both order information and magnitude of linguistic input have been demonstrated to be important factors in human semantic learning. If RPs prove comparable to convolution in terms of storage capacity, performance on semantic evaluation metrics, and cognitive plausibility, the scalability of RPs to large datasets may afford the construction of vector spaces that better approximate human semantic structure while preserving many of the characteristics that have made convolution attractive as a means of encoding order information.

For scaling to large corpora, the implementation of RPs in semantic space models is more efficient than that of circular convolution. This is partly due to the higher computational complexity of convolution with respect to vector dimensionality. Encoding k-dimensional bindings with circular convolution can be accomplished in O(k log k) time [95] by means of the fast Fourier transform (FFT). The algorithm to bind two vectors a and b in O(k log k) time involves calculating the discrete Fourier transforms of a and b, multiplying them pointwise to yield a new vector c, and calculating the inverse discrete Fourier transform of c. In the BEAGLE model, storing a single bigram (e.g., updating the memory vector of "fox" upon observing "red fox") would require one such O(k log k) binding as well as the addition of the resulting vector c to the memory vector of "fox."

In contrast, encoding with RPs can be accomplished in O(k) (i.e., linear) time, as permuting a vector only requires copying the value at every index of the original vector to a different index of another vector of the same dimensionality. For example, the permutation function may state that the first cell in the original vector should be copied to the 1040th cell of the new vector, that the next should be copied to the 239th cell of the new vector, and so on. Thus, this process yields a new vector that contains a shuffled version of the original vector in a number of steps that scales linearly with vector dimensionality. To update the memory vector of "fox" upon observing "red fox," RPM would need to apply this process to the environmental vector of "red," yielding a new shuffled version that would then be added to the memory vector of "fox."
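A sketch of this O(k) update follows (hypothetical vectors and vocabulary, not the RPM source code):

import numpy as np

D = 2048
rng = np.random.default_rng(4)
perm = rng.permutation(D)                       # Π, marking "word one position before target"

env = {w: rng.normal(0, 1 / np.sqrt(D), D) for w in ["red", "fox"]}  # static signal vectors
mem = {w: np.zeros(D) for w in env}             # dynamic memory vectors

# Observing "red fox": permute the environmental vector of "red" and add it to
# the memory vector of "fox" -- one O(k) copy pass plus one O(k) addition.
mem["fox"] += env["red"][perm]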

In addition to the complexity difference, the calculations involved in the FFT implementation of convolution require more time to execute on each vector element than the copy operations involved in random permutation. Combining these two factors means that circular convolution is considerably less efficient than random permutation in practice. In informal empirical comparisons using the FFT routines in a popular open-source mathematics library (http://mathnet), we found circular convolution to be over 70 times slower than random permutation at a vector dimensionality of 2048. Due to convolution's greater computational complexity, the gap widened even further as dimensionality increased. These factors made it impossible to perform our simulations with BEAGLE on the large corpus.


We conducted four experiments intended to compare convolution and RP as means of encoding word order information with respect to performance and scalability. In Experiment 1, we conducted an empirical comparison of the storage capacity and the probability of correct decoding under each method. In Experiment 2, we compared RP with convolution in the context of a simple vector accumulation model equivalent to BEAGLE's "order space" on a battery of semantic evaluation tasks when trained on a Wikipedia corpus. The model was trained on both the full corpus and a smaller random subset; results improved markedly when RP is allowed to scale up to the full Wikipedia corpus, which proved to be intractable for the convolution-based HRR model. In Experiment 3, we specifically compared BEAGLE to RPM, which differs from BEAGLE in several important ways other than its binding operation, to assess whether using RP in the context of RPM improves performance further. Finally, Experiment 4 demonstrates that similar results can be achieved with random permutations when the constraint that every unit of the input must be mapped to a unique output node is removed. We conclude that RP is a promising and scalable alternative to circular convolution in the context of vector space models of semantic memory and has properties of interest to computational modelers and researchers interested in memory processes more generally.

3. Experiment 1: Associative Capacity of HRR and RP

Plate [95] made a compelling case for the use of circular convolution in HRRs of associative memory, demonstrating its utility in constructing distributed representations with high storage capacity and high probability of correct retrieval. However, the storage capacity and probability of correct retrieval with RPs have not been closely investigated. This experiment compared the probability of correct retrieval of RPs with that of circular convolution and explored how the memory capacity of RPs varies with respect to dimensionality, number of associations stored, and the nature of the input representation.

3.1. Method. As a test of the capacity of convolution-based associative memories, Plate [95, Appendix D] describes a simple paired-associative memory task in which a retrieval algorithm must select the vector x_i that is bound to its associate y_i out of a set E of m possible random vectors. The retrieval algorithm is provided with a memory vector of the form

    t = \sum_{i=1}^{k} (x_i \circledast y_i),    (3)

that stores a total of k vectors. All vectors are of dimensionality D, and each of x_i and y_i is a normally distributed random vector, i.i.d. with elements sampled from N(0, 1/\sqrt{D}). The retrieval algorithm is provided with the memory vector t and the probe y_i, and works by first calculating a = (t # y_i), where # denotes the correlation operator described in detail in Plate [95, pp. 94–97], an approximate inverse of convolution. The algorithm then retrieves the vector in the "clean-up memory" set E that is the most similar to a. This is accomplished by calculating the cosine between a and each vector in the set E and retrieving the vector from E for which the cosine is highest. If this vector is not equal to x_i, this counts as a retrieval error. We replicated Plate's method to empirically derive retrieval accuracies for a variety of choices of k and D, keeping m fixed at 1000.
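The procedure just described can be sketched as follows (our reconstruction of the task with arbitrary parameter choices; exact accuracies will of course differ from the reported simulations):

import numpy as np

def run_trial(k=8, D=1024, m=1000, rng=np.random.default_rng(5)):
    E = rng.normal(0, 1 / np.sqrt(D), (m, D))              # clean-up memory of m candidates
    pair_idx = rng.choice(m, size=(k, 2), replace=False)   # indices of the k stored (x_i, y_i) pairs
    t = np.zeros(D)
    for xi, yi in pair_idx:                                # t = sum_i (x_i * y_i), convolution via FFT
        t += np.real(np.fft.ifft(np.fft.fft(E[xi]) * np.fft.fft(E[yi])))
    correct = 0
    for xi, yi in pair_idx:
        # decode with correlation (approximate inverse of convolution), then clean up by cosine
        a = np.real(np.fft.ifft(np.conj(np.fft.fft(E[yi])) * np.fft.fft(t)))
        sims = E @ a / (np.linalg.norm(E, axis=1) * np.linalg.norm(a))
        correct += int(np.argmax(sims) == xi)              # anything else counts as a retrieval error
    return correct / k

print(run_trial())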

Sahlgren et al. [17] bind signal vectors to positions by means of successive self-composition of a permutation function Π and construct memory vectors by superposing the results. In contrast to circular convolution, which requires normally distributed random vectors, random permutations support a variety of possible inputs. Sahlgren et al. employ random ternary vectors, so-called because elements take on one of three possible values (+1, 0, or −1). These are sparse vectors, or "spatter codes" [111, 112], whose elements are all zero with the exception of a few randomly placed positive and negative values (e.g., two +1s and two −1s). In this experiment, we tested the storage capacity of an RP-based associative memory, first with normally distributed random vectors (Gaussian vectors) to allow a proper comparison to convolution, and second with random ternary vectors (sparse vectors) with a varying number of positive and negative values in the input.
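A sparse ternary signal vector of this kind can be generated as follows (the number of nonzero elements shown is illustrative):

import numpy as np

def ternary_vector(D, nonzero=20, rng=np.random.default_rng(6)):
    # All zeros except nonzero/2 randomly placed +1s and nonzero/2 randomly placed -1s.
    v = np.zeros(D)
    idx = rng.choice(D, size=nonzero, replace=False)
    v[idx[: nonzero // 2]] = 1.0
    v[idx[nonzero // 2 :]] = -1.0
    return v

print(ternary_vector(512)[:32])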

As for the choice of the permutation function itself, any function that maps each element of the input onto a different element of the output will do; vector rotation (i.e., mapping element i of the input to element i + 1 of the output, with the exception of the final element of the input, which is mapped to the first element of the output) may be used for the sake of efficiency [17]. Using the notation of function exponentiation employed in our previous work [17, 113], Πⁿ refers to Π composed with itself n times: Π² = Π(Π), Π³ = Π²(Π), and so forth. The notion of a memory vector of paired associations can then be recast in RP terms as follows:

    t_{RP} = (\Pi y_1 + \Pi^2 x_1) + (\Pi^3 y_2 + \Pi^4 x_2) + (\Pi^5 y_3 + \Pi^6 x_3) + \cdots,    (4)

where the task again is to retrieve some y_i's associate x_i when presented only with y_i and t_RP. A retrieval algorithm for accomplishing this can be described as follows: given a probe vector y_i, the algorithm applies the inverse of the initial permutation to the memory vector t_RP, yielding Π⁻¹t_RP. Next, the cosine between Π⁻¹t_RP and the probe vector y_i is calculated, yielding a value that represents the similarity between y_i and Π⁻¹t_RP. The previous steps are then iterated: the algorithm calculates the cosine between y_i and Π⁻²t_RP, between y_i and Π⁻³t_RP, and so forth, until this similarity value exceeds some high threshold; this indicates that the algorithm has "found" y_i in the memory. At that point, t_RP is permuted one more time, yielding x′, a noisy approximation of y_i's associate x_i. This approximation x′ can then be compared with clean-up memory to retrieve the original associate x_i.


Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the inverse permutation n for which cosine(Π⁻ⁿt_RP, y_i) was the highest. We can permute one more time to retrieve Π⁻ⁿ⁻¹t_RP, that is, our noisy approximation x′. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x′ and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval; an accuracy level of 0.1 would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = 0.001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < 0.001. The interaction was not significant, F(3, 48) = 2.60, P = 0.06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05. All other comparisons were not significant.

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors consisting of zeroes and an equal number of randomly placed −1s and +1s were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = 0.92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < 0.001, and the interaction was not significant, F(3, 48) = 0.004, P = 1.

Figure 1: Retrieval accuracies for convolution-based associative memories with Gaussian vectors.

Figure 2: Retrieval accuracies for RP-based associative memories with Gaussian vectors.

As before, post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.


Figure 3: Retrieval accuracies for RP-based associative memories with sparse vectors. The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

Figure 4: Retrieval accuracies for RP-based associative memories with sparse vectors.

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times, corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than by "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE would perform on a set of semantic tasks if RPs were used in place of circular convolution.

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

    m_one = m_one + (Φ ⊛ e_two) + (Φ ⊛ e_two ⊛ e_three),    (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

    m_one = m_one + Π e_two + Π² e_three.    (6)
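A short sketch may help clarify how updates (5) and (6) differ in practice. The code below is a simplified illustration rather than the full BEAGLE implementation (which sums over all n-grams up to length λ): it computes both versions of the order update for the memory vector of "one" given the phrase "one two three", with circular convolution computed via the FFT and the random permutation applied as an index gather.

    import numpy as np

    rng = np.random.default_rng(1)
    D = 1024
    sigma = 1.0 / np.sqrt(D)
    e = {w: rng.normal(0.0, sigma, D) for w in ("one", "two", "three")}
    phi = rng.normal(0.0, sigma, D)          # placeholder vector, Phi
    perm = rng.permutation(D)                # random permutation, Pi

    def cconv(a, b):                         # circular convolution via the FFT
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def Pi(v, times=1):                      # apply Pi repeatedly (Pi, Pi^2, ...)
        for _ in range(times):
            v = v[perm]
        return v

    m_one = np.zeros(D)

    # Equation (5): convolution-based order update (original BEAGLE)
    m_one_conv = m_one + cconv(phi, e["two"]) + cconv(cconv(phi, e["two"]), e["three"])

    # Equation (6): random-permutation order update (modified model)
    m_one_rp = m_one + Pi(e["two"], 1) + Pi(e["three"], 2)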

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG; [115]), Miller and Charles (MC; [116]), Resnik (R; [117]), and Finkelstein et al. (F; [118]). A detailed description of these measures can be found in Recchia and Jones [52].
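For readers wishing to replicate the evaluation step, the sketch below illustrates how fits of this kind are typically computed: cosine similarities between memory vectors for each word pair are correlated (Spearman's rho, via SciPy) with the corresponding human ratings. The word pairs, ratings, and random stand-in vectors here are purely illustrative placeholders rather than values from the benchmarks themselves.

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(2)
    D = 1024
    # Stand-in lexicon; in the experiment these would be the trained memory vectors.
    memory = {w: rng.normal(0.0, 1.0, D) for w in
              ("car", "automobile", "coast", "shore", "noon", "string")}

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    pairs = [("car", "automobile"), ("coast", "shore"), ("noon", "string")]
    human = [3.9, 3.6, 0.1]                              # illustrative similarity ratings
    model = [cosine(memory[a], memory[b]) for a, b in pairs]
    rho, p_two_tailed = spearmanr(human, model)          # halve p for a one-tailed test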

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant differences between the performance of convolution and RPs on the small corpus: both performed nearly identically on F and TOEFL, neither showed significant correlations with human data on RG, MC, or R, and neither performed better than chance on ESL. Second, both models performed the best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

Task     Wikipedia subset                          Full Wikipedia
         Convolution    Random permutation         Random permutation
ESL      0.20           0.26                       0.32
TOEFL    0.46†          0.46†                      0.63†
RG       0.07           −0.06                      0.32*
MC       0.08           −0.01                      0.33*
R        0.06           −0.04                      0.35*
F        0.13*          0.12*                      0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only, to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data, model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. [17] (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, and we reimplemented these details to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors consisting of all 0s, two +1s, and two −1s in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (i.e., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) into BEAGLE, all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.

Table 2: Comparison of BEAGLE and RPM by corpus.

Task     Wikipedia subset          TASA                      Full Wikipedia
         BEAGLE     RPM            BEAGLE     RPM            RPM
ESL      0.24       0.27           0.30       0.36†          0.50†
TOEFL    0.47†      0.40†          0.54†      0.77†          0.66†
RG       0.10       0.10           0.21       0.53*          0.65*
MC       0.09       0.12           0.29       0.52*          0.61*
R        0.09       0.03           0.30       0.56*          0.56*
F        0.23*      0.19*          0.27*      0.33*          0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
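The two features of RPM most relevant to this comparison, sparse ternary environmental vectors and "direction information" encoded by two distinct permutations over a two-word window, can be sketched as follows. This is a minimal illustration under our own simplifying assumptions (a tiny vocabulary, no frequency thresholding, and no context vectors), not the full RPM implementation.

    import numpy as np

    rng = np.random.default_rng(3)
    D = 25000                                    # dimensionality in the range reported for RPM

    def sparse_ternary(d, n_pos=2, n_neg=2):
        """Sparse ternary index vector: all zeros except two +1s and two -1s."""
        v = np.zeros(d)
        idx = rng.choice(d, size=n_pos + n_neg, replace=False)
        v[idx[:n_pos]] = 1.0
        v[idx[n_pos:]] = -1.0
        return v

    tokens = "the quick brown fox jumps".split()
    index = {w: sparse_ternary(D) for w in tokens}
    memory = {w: np.zeros(D) for w in tokens}

    perm_before = rng.permutation(D)             # permutation for preceding words
    perm_after = rng.permutation(D)              # permutation for following words

    def encode_direction(seq, window=2):
        """Add direction (order) information within a +/- 2-word window."""
        for i, w in enumerate(seq):
            for j in range(max(0, i - window), i):
                memory[w] += index[seq[j]][perm_before]
            for j in range(i + 1, min(len(seq), i + window + 1)):
                memory[w] += index[seq[j]][perm_after]

    encode_direction(tokens)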

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the corresponding order-only models from Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained in order space on the full Wikipedia corpus was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness rather than tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, they provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.


Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer, but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed; thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires positing some process by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (i.e., sampling with replacement) can achieve fits to human data similar to those of the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule

    y_k = Σ_i x_i w_ik,

in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc., via reapplication of the same permutation function); T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.

Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task     Full Wikipedia                        TASA
         RC Sim 1   RC Sim 2   RP              RC Sim 1   RC Sim 2   RP
ESL      0.50†      0.48†      0.50†           0.32       0.32       0.36†
TOEFL    0.66†      0.66†      0.66†           0.73†      0.74†      0.77†
RG       0.66*      0.64*      0.65*           0.53*      0.52*      0.53*
MC       0.63*      0.61*      0.61*           0.53*      0.51*      0.52*
R        0.58*      0.56*      0.56*           0.55*      0.54*      0.56*
F        0.39*      0.37*      0.39*           0.32*      0.33*      0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
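The random-connection transformation T and its reapplication T² can be expressed in a few lines of NumPy; the sketch below is an illustration of the update rule given above rather than our full simulation code. Sampling the target index of each node with replacement produces summed collisions and leaves nodes with no incoming connection at zero, exactly as described.

    import numpy as np

    rng = np.random.default_rng(4)
    D = 1024

    # Each input node feeds exactly one output node, chosen with replacement,
    # i.e., w has a single randomly placed 1 in every row but unconstrained columns.
    targets = rng.integers(0, D, size=D)

    def T(x):
        """y_k = sum_i x_i * w_ik; collisions are summed, unreached nodes stay 0."""
        y = np.zeros_like(x)
        np.add.at(y, targets, x)
        return y

    def T2(x):
        return T(T(x))                           # reapplication, analogous to Pi^2

    x = rng.normal(0.0, 1.0 / np.sqrt(D), D)
    before, after = T(x), T2(x)                  # Simulation 2: T for preceding words, T^2 for following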

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation as well as after any successive transformations; thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but are nonetheless capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model that is capable of representing propositions expressing general facts ("birds can generally fly") and that implements exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other work has approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, combining distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals; the value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
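In practice, the FT identities described above reduce binding and unbinding to elementwise products in the frequency domain, bringing the cost of circular convolution down to O(D log D). The sketch below is a generic illustration of this technique (circular convolution for binding, circular correlation as its approximate inverse), not code from the BEAGLE implementation itself.

    import numpy as np

    def cconv(a, b):
        """Circular convolution: FT of the result is FT(a) * FT(b)."""
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def ccorr(a, b):
        """Circular correlation (approximate inverse), using the complex conjugate."""
        return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

    rng = np.random.default_rng(5)
    D = 2048
    a = rng.normal(0.0, 1.0 / np.sqrt(D), D)
    b = rng.normal(0.0, 1.0 / np.sqrt(D), D)

    bound = cconv(a, b)                          # bind a and b
    b_hat = ccorr(a, bound)                      # decode a noisy version of b
    print(b_hat @ b / (np.linalg.norm(b_hat) * np.linalg.norm(b)))   # high cosine similarity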

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model. Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing and RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As Experiment 4 showed, this uniqueness constraint is not required: nearly identical results can be achieved with completely random connections (Figure 5(b)). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and Its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressable distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale, biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex - executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


BEAGLE has produced greater fits to human semantic data than only encoding a word's discourse context (what words tend to appear around a target [19, 49]). Similarly, Sahlgren et al. [17] report superior performance when incorporating temporal information about word order. Hence, in both models, a word's representation becomes a pattern of elements that reflects both its history of cooccurrence with, and position relative to, other words in linguistic experience. Although BEAGLE and RPM differ in respects such as vector dimensionality and chunk size, arguably the most important difference between them is the binding operation used to create order vectors.

BEAGLE uses the operation of circular convolution to bind together signal vectors into a holographic reduced representation (HRR [95, 100]) of n-gram chunks that contain each target word. Convolution is a binary operation (denoted by ⊛) performed on two vectors x and y such that every element z_i of z = (x ⊛ y) is given by

z_i = \sum_{j=0}^{D-1} x_{j \bmod D} \cdot y_{(i-j) \bmod D},  (1)

where D is the dimensionality of x and y. Circular convolution can be seen as a modulo-n variation of the tensor product of two vectors x and y, such that z is of the same dimensionality as x and y. Furthermore, although z is dissimilar from both x and y by any distance metric, approximations of x and y can be retrieved via the inverse operation of correlation (not related to Pearson's r); for example, correlating z with x yields an approximation of y. Hence, not only can BEAGLE encode temporal information together with contextual information in a single memory representation, but also it can invert the temporal encoding operation to retrieve grammatical information directly from a word's memory representation, without the need to store grammatical rules (see [19]). Convolution-based encoding and decoding have many precedents in memory modeling (e.g., [101–106]) and have played a key role in models of many other cognitive phenomena as well (e.g., audition [107], object perception [108], perceptual-motor skills [109], reasoning [110]).
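To make binding and unbinding concrete, the following is a minimal NumPy sketch (our own illustration rather than code from BEAGLE; function and variable names are hypothetical) that implements (1) directly and recovers an approximation of y from z and x via circular correlation:

```python
import numpy as np

def circular_convolution(x, y):
    """Bind x and y per Equation (1): z_i = sum_j x_j * y_((i-j) mod D)."""
    D = len(x)
    return np.array([sum(x[j] * y[(i - j) % D] for j in range(D)) for i in range(D)])

def circular_correlation(x, z):
    """Approximate inverse of convolution: recovers a noisy copy of y from z = x (*) y."""
    D = len(x)
    return np.array([sum(x[j] * z[(i + j) % D] for j in range(D)) for i in range(D)])

D = 1024
rng = np.random.default_rng(0)
x = rng.normal(0, 1 / np.sqrt(D), D)    # random signal vectors, elements ~ N(0, 1/D)
y = rng.normal(0, 1 / np.sqrt(D), D)
z = circular_convolution(x, y)          # the bound pair; dissimilar from both x and y
y_hat = circular_correlation(x, z)      # noisy approximation of y
cosine = y @ y_hat / (np.linalg.norm(y) * np.linalg.norm(y_hat))
print(round(float(cosine), 2))          # well above chance, so y is recoverable given x and z
```

The direct double loop costs O(D²); a faster FFT-based formulation of the same binding is discussed below.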

In contrast to convolution, RPM employs the unary operation of random permutation (RP [17]) to encode temporal information about a word. RPs are functions that map input vectors to output vectors such that the outputs are simply randomly shuffled versions of the inputs,

\Pi : x \rightarrow x^{*},  (2)

such that the expected correlation between x and x* is zero. Just as (x ⊛ y) produces a vector that differs from x and y but from which approximations of x and y can be retrieved, the sum of two RPs of x and y (z = Πx + Π²y, where Π²y is defined as Π(Πy)) produces a vector z dissimilar from x and y, but from which approximations of the original x and y can be retrieved via Π⁻¹ and Π⁻², respectively.
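A comparable sketch for this unary operation (again our own illustration, assuming a permutation stored as a NumPy index array): the pair is encoded as z = Πx + Π²y, and approximations of x and y are read back with the inverse permutation.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 1024
perm = rng.permutation(D)            # Pi: a random reordering of the indices 0..D-1
inv = np.argsort(perm)               # Pi^-1: undoes the reordering (perm[inv] == arange(D))

x = rng.normal(0, 1 / np.sqrt(D), D)
y = rng.normal(0, 1 / np.sqrt(D), D)

z = x[perm] + y[perm][perm]          # z = Pi x + Pi^2 y

x_hat = z[inv]                       # Pi^-1 z = x plus permuted (uncorrelated) noise
y_hat = z[inv][inv]                  # Pi^-2 z = y plus noise

cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(round(cos(x, x_hat), 2), round(cos(y, y_hat), 2))   # both well above zero
```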

Both convolution and random permutation offer efficient storage properties, compressing order information into a single composite vector representation, and both encoding operations are reversible. However, RPs are much more computationally efficient to compute. In language applications of BEAGLE, the computationally expensive convolution operation is what limits the size of a text corpus that the model can encode. As several studies [16, 17, 52] have demonstrated, scaling a semantic model to more data produces much better fits to human semantic data. Hence, both order information and magnitude of linguistic input have been demonstrated to be important factors in human semantic learning. If RPs prove comparable to convolution in terms of storage capacity, performance on semantic evaluation metrics, and cognitive plausibility, the scalability of RPs to large datasets may afford the construction of vector spaces that better approximate human semantic structure while preserving many of the characteristics that have made convolution attractive as a means of encoding order information.

For scaling to large corpora, the implementation of RPs in semantic space models is more efficient than that of circular convolution. This is partly due to the higher computational complexity of convolution with respect to vector dimensionality. Encoding k-dimensional bindings with circular convolution can be accomplished in O(k log k) time [95] by means of the fast Fourier transform (FFT). The algorithm to bind two vectors a and b in O(k log k) time involves calculating discrete Fourier transforms of a and b, multiplying them pointwise to yield a new vector c, and calculating the inverse discrete Fourier transform of c. In the BEAGLE model, storing a single bigram (e.g., updating the memory vector of "fox" upon observing "red fox") would require one such O(k log k) binding as well as the addition of the resulting vector c to the memory vector of "fox".
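Assuming NumPy's FFT routines, the bigram update just described might be sketched as follows (illustrative names only; this is not the BEAGLE codebase):

```python
import numpy as np

def fft_convolve(a, b):
    """Circular convolution of a and b computed via the FFT in O(k log k) time."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

k = 2048
rng = np.random.default_rng(2)
phi = rng.normal(0, 1 / np.sqrt(k), k)      # placeholder vector for the word being encoded
e_red = rng.normal(0, 1 / np.sqrt(k), k)    # environmental signal vector for "red"
m_fox = np.zeros(k)                         # memory vector for "fox"

# Observing "red fox": bind "red" with the placeholder and add the result to "fox"'s memory.
m_fox += fft_convolve(e_red, phi)
```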

In contrast, encoding with RPs can be accomplished in O(k) (i.e., linear) time, as permuting a vector only requires copying the value at every index of the original vector to a different index of another vector of the same dimensionality. For example, the permutation function may state that the first cell in the original vector should be copied to the 1040th cell of the new vector, that the next should be copied to the 239th cell of the new vector, and so on. Thus, this process yields a new vector that contains a shuffled version of the original vector in a number of steps that scales linearly with vector dimensionality. To update the memory vector of "fox" upon observing "red fox", RPM would need to apply this process to the environmental vector of "red", yielding a new shuffled version that would then be added to the memory vector of "fox".
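The corresponding RP-style update reduces to an index copy plus an addition; a self-contained sketch under the same assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 2048
e_red = rng.normal(0, 1 / np.sqrt(k), k)    # environmental signal vector for "red"
m_fox = np.zeros(k)                         # memory vector for "fox"
perm = rng.permutation(k)                   # fixed random permutation, generated once per model

# Observing "red fox": shuffle "red"'s signal vector and add it to "fox"'s memory vector.
m_fox += e_red[perm]                        # O(k): one copy per element plus one addition
```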

In addition to the complexity difference, the calculations involved in the FFT implementation of convolution require more time to execute on each vector element than the copy operations involved in random permutation. Combining these two factors means that circular convolution is considerably less efficient than random permutation in practice. In informal empirical comparisons using the FFT routines in a popular open-source mathematics library (Math.NET), we found circular convolution to be over 70 times slower than random permutation at a vector dimensionality of 2048. Due to convolution's greater computational complexity, the gap widened even further as dimensionality increased. These factors made it impossible to perform our simulations with BEAGLE on the large corpus.


We conducted four experiments intended to compare convolution and RP as means of encoding word order information with respect to performance and scalability. In Experiment 1, we conducted an empirical comparison of the storage capacity and the probability of correct decoding under each method. In Experiment 2, we compared RP with convolution in the context of a simple vector accumulation model equivalent to BEAGLE's "order space" on a battery of semantic evaluation tasks when trained on a Wikipedia corpus. The model was trained on both the full corpus and a smaller random subset; results improved markedly when RP was allowed to scale up to the full Wikipedia corpus, which proved to be intractable for the convolution-based HRR model. In Experiment 3, we specifically compared BEAGLE to RPM, which differs from BEAGLE in several important ways other than its binding operation, to assess whether using RP in the context of RPM improves performance further. Finally, Experiment 4 demonstrates that similar results can be achieved with random permutations when the constraint that every unit of the input must be mapped to a unique output node is removed. We conclude that RP is a promising and scalable alternative to circular convolution in the context of vector space models of semantic memory and has properties of interest to computational modelers and researchers interested in memory processes more generally.

3. Experiment 1: Associative Capacity of HRR and RP

Plate [95] made a compelling case for the use of circular convolution in HRRs of associative memory, demonstrating its utility in constructing distributed representations with high storage capacity and high probability of correct retrieval. However, the storage capacity and probability of correct retrieval with RPs have not been closely investigated. This experiment compared the probability of correct retrieval of RPs with that of circular convolution and explored how the memory capacity of RPs varies with respect to dimensionality, number of associations stored, and the nature of the input representation.

3.1. Method. As a test of the capacity of convolution-based associative memories, Plate [95, Appendix D] describes a simple paired-associative memory task in which a retrieval algorithm must select the vector x_i that is bound to its associate y_i out of a set E of m possible random vectors. The retrieval algorithm is provided with a memory vector t of the form

t = \sum_{i=1}^{k} (x_i ⊛ y_i),  (3)

that stores a total of k vectors. All vectors are of dimensionality D, and each of x_i and y_i is a normally distributed random vector, i.i.d. with elements sampled from N(0, 1/√D). The retrieval algorithm is provided with the memory vector t and the probe y_i, and works by first computing a noisy approximation a by applying to t and y_i the correlation operator described in detail in Plate [95, pp. 94–97], an approximate inverse of convolution. The algorithm then retrieves the vector in the "clean-up memory" set E that is the most similar to a. This is accomplished by calculating the cosine between a and each vector in the set E and retrieving the vector from E for which the cosine is highest. If this vector is not equal to x_i, this counts as a retrieval error. We replicated Plate's method to empirically derive retrieval accuracies for a variety of choices of k and D, keeping m fixed at 1000.
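One trial of this capacity test can be sketched as follows (our own reconstruction of the procedure, not Plate's code; the FFT-based correlation is used purely for speed):

```python
import numpy as np

def fft_convolve(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def fft_correlate(a, b):
    """Plate's correlation operator (approximate inverse of convolution), via the FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def capacity_trial(D=1024, k=8, m=1000, seed=0):
    rng = np.random.default_rng(seed)
    E = rng.normal(0, 1 / np.sqrt(D), (m, D))              # clean-up memory of m candidate vectors
    idx = rng.choice(m, size=2 * k, replace=False).reshape(k, 2)
    t = np.zeros(D)
    for xi, yi in idx:                                     # t = sum_i x_i (*) y_i  (Equation (3))
        t += fft_convolve(E[xi], E[yi])
    xi, yi = idx[0]                                        # probe with y_1, hoping to recover x_1
    a = fft_correlate(E[yi], t)                            # noisy approximation of x_1
    sims = E @ a / (np.linalg.norm(E, axis=1) * np.linalg.norm(a))
    return int(np.argmax(sims) == xi)                      # 1 if clean-up memory returns x_1

# Estimate retrieval accuracy over 100 independent trials.
print(np.mean([capacity_trial(seed=s) for s in range(100)]))
```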

Sahlgren et al. [17] bind signal vectors to positions by means of successive self-composition of a permutation function Π and construct memory vectors by superposing the results. In contrast to circular convolution, which requires normally distributed random vectors, random permutations support a variety of possible inputs. Sahlgren et al. employ random ternary vectors, so-called because elements take on one of three possible values (+1, 0, or −1). These are sparse vectors, or "spatter codes" [111, 112], whose elements are all zero with the exception of a few randomly placed positive and negative values (e.g., two +1s and two −1s). In this experiment, we tested the storage capacity of an RP-based associative memory, first with normally distributed random vectors (Gaussian vectors) to allow a proper comparison to convolution, and second with random ternary vectors (sparse vectors) with a varying number of positive and negative values in the input.
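A sparse ternary signal vector of this kind can be generated as in the following sketch (our own illustration; the counts of +1s and −1s are free parameters):

```python
import numpy as np

def ternary_vector(D, plus=2, minus=2, rng=np.random.default_rng(3)):
    """Sparse ternary 'spatter code': all zeros except a few randomly placed +1s and -1s."""
    v = np.zeros(D)
    idx = rng.choice(D, size=plus + minus, replace=False)   # distinct random positions
    v[idx[:plus]] = 1.0
    v[idx[plus:]] = -1.0
    return v

print(ternary_vector(20))   # e.g., a 20-dimensional vector with two +1s and two -1s
```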

As for the choice of the permutation function itself, any function that maps each element of the input onto a different element of the output will do; vector rotation (i.e., mapping element i of the input to element i + 1 of the output, with the exception of the final element of the input, which is mapped to the first element of the output) may be used for the sake of efficiency [17]. Using the notation of function exponentiation employed in our previous work [17, 113], Πⁿ refers to Π composed with itself n times: Π² = Π(Π), Π³ = Π²(Π), and so forth. The notion of a memory vector of paired associations can then be recast in RP terms as follows:

t_{RP} = (\Pi y_1 + \Pi^2 x_1) + (\Pi^3 y_2 + \Pi^4 x_2) + (\Pi^5 y_3 + \Pi^6 x_3) + \cdots,  (4)

where the task again is to retrieve some y_i's associate x_i when presented only with y_i and t_RP. A retrieval algorithm for accomplishing this can be described as follows: given a probe vector y_i, the algorithm applies the inverse of the initial permutation to the memory vector t_RP, yielding Π⁻¹ t_RP. Next, the cosine between Π⁻¹ t_RP and the probe vector y_i is calculated, yielding a value that represents the similarity between y_i and Π⁻¹ t_RP. The previous steps are then iterated: the algorithm calculates the cosine between y_i and Π⁻² t_RP, between y_i and Π⁻³ t_RP, and so forth, until this similarity value exceeds some high threshold; this indicates that the algorithm has "found" y_i in the memory. At that point, t_RP is permuted one more time, yielding x′, a noisy approximation of y_i's associate x_i. This approximation x′ can then be compared with clean-up memory to retrieve the original associate x_i.


Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the inverse permutation n for which cosine(Π⁻ⁿ t_RP, y_i) was the highest. We can permute one more time to retrieve Π^{−(n+1)} t_RP, that is, our noisy approximation x′. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x′ and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.
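The retrieval procedure can be sketched as follows (our own reconstruction of the scheme in (4), with hypothetical names; a fixed number of inverse permutations is searched, as described above, rather than a threshold):

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rp_retrieve(t_rp, probe, inv, E, max_steps=28):
    """Retrieve the associate of `probe` from t_RP built as in Equation (4)."""
    sims, v = [], t_rp
    for _ in range(max_steps):            # apply Pi^-1 repeatedly, tracking similarity to the probe
        v = v[inv]
        sims.append(cos(v, probe))
    n_star = int(np.argmax(sims)) + 1     # number of inverse permutations at which the probe was "found"
    x_noisy = t_rp
    for _ in range(n_star + 1):           # one further inverse permutation exposes the associate
        x_noisy = x_noisy[inv]
    scores = E @ x_noisy / (np.linalg.norm(E, axis=1) * np.linalg.norm(x_noisy))
    return int(np.argmax(scores))         # index of the best match in clean-up memory

# Demo: store three pairs and query the second pair's y to recover its associate x.
rng = np.random.default_rng(4)
D, m = 1024, 1000
E = rng.normal(0, 1 / np.sqrt(D), (m, D))
perm = rng.permutation(D)
inv = np.argsort(perm)
pairs = [(0, 1), (2, 3), (4, 5)]          # (x index, y index) into clean-up memory E
t_rp, depth = np.zeros(D), 0
for xi, yi in pairs:
    y_enc, x_enc = E[yi], E[xi]
    for _ in range(depth + 1):            # Pi^(2i-1) applied to y_i
        y_enc = y_enc[perm]
    for _ in range(depth + 2):            # Pi^(2i) applied to x_i
        x_enc = x_enc[perm]
    t_rp += y_enc + x_enc
    depth += 2
print(rp_retrieve(t_rp, E[3], inv, E))    # expected output: 2, the index of the probe's associate
```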

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval; an accuracy level of 0.1% would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = 0.001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < 0.001. The interaction was not significant, F(3, 48) = 2.60, P = 0.06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05. All other comparisons were not significant.

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors, consisting of zeroes and an equal number of randomly placed −1s and +1s, were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = 0.92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < 0.001, and the interaction was not significant, F(3, 48) = 0.004, P = 1.

Figure 1: Retrieval accuracies for convolution-based associative memories with Gaussian vectors. Retrieval accuracy (%) is plotted against the number of pairs stored in the trace (2–14) for dimensionalities 256, 512, 1024, and 2048.

Figure 2: Retrieval accuracies for RP-based associative memories with Gaussian vectors. Retrieval accuracy (%) is plotted against the number of pairs stored in the trace (2–14) for dimensionalities 256, 512, 1024, and 2048.

As before, post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.


Figure 3: Retrieval accuracies for RP-based associative memories with sparse vectors, plotted against the number of nonzero elements in the index vectors (2–38). The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

Figure 4: Retrieval accuracies for RP-based associative memories with sparse vectors. Retrieval accuracy (%) is plotted against the number of pairs stored in the trace (2–14) for dimensionalities 256, 512, 1024, and 2048.

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times, corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms". While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than "worms" or "wings", it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE would perform on a set of semantic tasks if RPs were used in place of circular convolution.

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

m_{one} = m_{one} + (\Phi ⊛ e_{two}) + (\Phi ⊛ e_{two} ⊛ e_{three}),  (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

m_{one} = m_{one} + \Pi e_{two} + \Pi^2 e_{three}.  (6)
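The two update rules can be set side by side in a few lines (a sketch with illustrative variable names; the FFT-based binding is the one sketched earlier):

```python
import numpy as np

def fft_convolve(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

D = 1024
rng = np.random.default_rng(5)
phi = rng.normal(0, 1 / np.sqrt(D), D)                  # placeholder for the word being encoded
e = {w: rng.normal(0, 1 / np.sqrt(D), D) for w in ("one", "two", "three")}
perm = rng.permutation(D)
m_conv, m_rp = np.zeros(D), np.zeros(D)                 # memory vectors for "one"

# Equation (5): convolution-based order update upon reading "one two three".
m_conv += fft_convolve(phi, e["two"]) + fft_convolve(fft_convolve(phi, e["two"]), e["three"])

# Equation (6): the corresponding random-permutation update for the same phrase.
m_rp += e["two"][perm] + e["three"][perm][perm]         # Pi e_two + Pi^2 e_three
```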

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and the Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG [115]), Miller and Charles (MC [116]), Resnik (R [117]), and Finkelstein et al. (F [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention.

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

            Wikipedia subset                      Full Wikipedia
Task        Convolution    Random permutation     Random permutation
ESL         0.20           0.26                   0.32
TOEFL       0.46†          0.46†                  0.63†
RG          0.07           −0.06                  0.32*
MC          0.08           −0.01                  0.33*
R           0.06           −0.04                  0.35*
F           0.13*          0.12*                  0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

First, there are no significant differences between the performance of convolution and RPs on the small corpus. Both performed nearly identically on F and TOEFL; neither showed any significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed the best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document, irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well-controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data, model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA.

Table 2: Comparison of BEAGLE and RPM by corpus.

            Wikipedia subset         TASA                    Full Wikipedia
Task        BEAGLE      RPM          BEAGLE      RPM         RPM
ESL         0.24        0.27         0.30        0.36†       0.50†
TOEFL       0.47†       0.40†        0.54†       0.77†       0.66†
RG          0.10        0.10         0.21        0.53*       0.65*
MC          0.09        0.12         0.29        0.52*       0.61*
R           0.09        0.03         0.30        0.56*       0.56*
F           0.23*       0.19*        0.27*       0.33*       0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

Other than the incorporation of context information (as described in Experiment 2), all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.
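A sketch of the "direction information" encoding described above (our own illustration; the toy sentence, window handling, and variable names are ours, and no frequency thresholding is shown):

```python
import numpy as np

rng = np.random.default_rng(6)
D = 25000
perm_before = rng.permutation(D)          # one permutation for words preceding the target
perm_after = rng.permutation(D)           # a second permutation for words following the target

def ternary(D, rng):                      # sparse ternary signal vector: two +1s and two -1s
    v = np.zeros(D)
    idx = rng.choice(D, 4, replace=False)
    v[idx[:2]], v[idx[2:]] = 1.0, -1.0
    return v

tokens = "the quick brown fox jumps over".split()
signal = {w: ternary(D, rng) for w in set(tokens)}
order = {w: np.zeros(D) for w in set(tokens)}

for i, w in enumerate(tokens):
    for j in range(max(0, i - 2), i):                   # window of two words to the left
        order[w] += signal[tokens[j]][perm_before]
    for j in range(i + 1, min(len(tokens), i + 3)):     # window of two words to the right
        order[w] += signal[tokens[j]][perm_after]
```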

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the versions of BEAGLE trained on order information only in Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.


Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed. Thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (i.e., sampling with replacement) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, for example, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it. Nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule

y_j = \sum_{i} x_i w_{ij},

in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc. via reapplication of the same permutation function).


Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

            Full Wikipedia                      TASA
Task        RC Sim 1    RC Sim 2    RP          RC Sim 1    RC Sim 2    RP
ESL         0.50†       0.48†       0.50†       0.32        0.32        0.36†
TOEFL       0.66†       0.66†       0.66†       0.73†       0.74†       0.77†
RG          0.66*       0.64*       0.65*       0.53*       0.52*       0.53*
MC          0.63*       0.61*       0.61*       0.53*       0.51*       0.52*
R           0.58*       0.56*       0.56*       0.55*       0.54*       0.56*
F           0.39*       0.37*       0.39*       0.32*       0.33*       0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.
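The random-connection transformation T and its reapplication T² can be sketched as follows (a minimal illustration; the connection matrix w is represented implicitly by an index array of target nodes):

```python
import numpy as np

rng = np.random.default_rng(7)
D = 1024
targets = rng.integers(0, D, size=D)    # node i sends its value to node targets[i]; no uniqueness constraint

def T(x):
    """Random-connection transform: each output node sums the inputs feeding into it
    (y_j = sum_i x_i w_ij); nodes with no incoming connection remain zero."""
    y = np.zeros_like(x)
    np.add.at(y, targets, x)            # scatter-add handles several inputs converging on one node
    return y

x = rng.normal(0, 1, D)
t1 = T(x)          # Simulation 1: applied to signal vectors of immediately preceding words
t2 = T(T(x))       # Simulation 2: T^2, applied to signal vectors of immediately following words
```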

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation as well as any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (e.g., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model that represents propositions expressing general facts ("birds can generally fly") and implements exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations


with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (i.e., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, using probabilistic models that take advantage of distributional information (from lexical cooccurrence) as well as experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one. Spike density over a time scale or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns, even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding


and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transformations provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
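To make this frequency-domain relationship concrete, the short sketch below (an illustration only, not the code used in the experiments reported here; the helper names cconv and ccorr are our own) binds two Gaussian vectors by multiplying their FFTs and then decodes the association with the conjugate product, that is, circular correlation.

```python
import numpy as np

def cconv(x, y):
    # Circular convolution via the convolution theorem:
    # the FT of the convolution is the elementwise product of the FTs.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def ccorr(x, y):
    # Circular correlation, an approximate inverse of circular convolution:
    # its FT is the conjugate of the FT of x times the FT of y.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)))

D = 1024
rng = np.random.default_rng(0)
x = rng.normal(0, 1 / np.sqrt(D), D)   # random signal vectors
y = rng.normal(0, 1 / np.sqrt(D), D)

t = cconv(x, y)        # bind x and y into a single trace
y_hat = ccorr(x, t)    # decode with x: a noisy reconstruction of y

cos = np.dot(y_hat, y) / (np.linalg.norm(y_hat) * np.linalg.norm(y))
print(round(cos, 2))   # well above chance, so y survives clean-up retrieval
```

Because each operation amounts to a pair of FFTs plus an elementwise product, encoding and decoding run in O(D log D) rather than the O(D²) required by the direct tensor product formulation.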

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods. Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we will show, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.
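As a simple illustration of this equivalence, the toy sketch below (our own example, with hypothetical names) applies the same random permutation once as an index remapping and once as multiplication by a 0/1 weight matrix, which is exactly the one-layer network of copy connections described above.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8
perm = rng.permutation(D)      # each input node copies to one unique output node

x = rng.normal(size=D)

# View 1: permutation as an index remapping (output[perm[i]] = input[i])
y_remap = np.zeros(D)
y_remap[perm] = x

# View 2: the same mapping as a one-layer network whose weight matrix has a
# single 1 in each row and column, i.e., randomly placed copy connections
W = np.zeros((D, D))
W[perm, np.arange(D)] = 1.0
y_network = W @ x

print(np.allclose(y_remap, y_network))   # True: the two formulations coincide
```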

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates Publishers, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressable distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale, biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias, et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo, et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex – executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


We conducted four experiments intended to compare convolution and RP as means of encoding word order information with respect to performance and scalability. In Experiment 1, we conducted an empirical comparison of the storage capacity and the probability of correct decoding under each method. In Experiment 2, we compared RP with convolution in the context of a simple vector accumulation model equivalent to BEAGLE's "order space" on a battery of semantic evaluation tasks when trained on a Wikipedia corpus. The model was trained on both the full corpus and a smaller random subset; results improved markedly when RP was allowed to scale up to the full Wikipedia corpus, which proved to be intractable for the convolution-based HRR model. In Experiment 3, we specifically compared BEAGLE to RPM, which differs from BEAGLE in several important ways other than its binding operation, to assess whether using RP in the context of RPM improves performance further. Finally, Experiment 4 demonstrates that similar results can be achieved with random permutations when the constraint that every unit of the input must be mapped to a unique output node is removed. We conclude that RP is a promising and scalable alternative to circular convolution in the context of vector space models of semantic memory and has properties of interest to computational modelers and researchers interested in memory processes more generally.

3. Experiment 1: Associative Capacity of HRR and RP

Plate [95] made a compelling case for the use of circular convolution in HRRs of associative memory, demonstrating its utility in constructing distributed representations with high storage capacity and high probability of correct retrieval. However, the storage capacity and probability of correct retrieval with RPs have not been closely investigated. This experiment compared the probability of correct retrieval of RPs with that of circular convolution and explored how the memory capacity of RPs varies with respect to dimensionality, number of associations stored, and the nature of the input representation.

3.1. Method. As a test of the capacity of convolution-based associative memories, Plate [95, Appendix D] describes a simple paired-associative memory task in which a retrieval algorithm must select the vector x_i that is bound to its associate y_i out of a set E of m possible random vectors. The retrieval algorithm is provided with a memory vector of the form

t = Σ_{i=1}^{k} (x_i ⊛ y_i)    (3)

that stores a total of k vectors. All vectors are of dimensionality D, and each of x_i and y_i is a normally distributed random vector, i.i.d., with elements sampled from N(0, 1/√D). The retrieval algorithm is provided with the memory vector t and the probe y_i, and works by first calculating a = (t # y_i), where # is the correlation operator described in detail in Plate [95, pp. 94–97], an approximate inverse of convolution. The algorithm then retrieves the vector in the "clean-up memory" set E that is the most similar to a. This is accomplished by calculating the cosine between a and each vector in the set E and retrieving the vector from E for which the cosine is highest. If this vector is not equal to x_i, this counts as a retrieval error. We replicated Plate's method to empirically derive retrieval accuracies for a variety of choices of k and D, keeping m fixed at 1000.
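The following sketch outlines this procedure under the assumptions above (Gaussian vectors, FFT-based convolution and correlation, cosine-based clean-up memory); it is a compact illustration rather than our exact simulation code, and helper names such as trial are our own.

```python
import numpy as np

def cconv(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def ccorr(x, y):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)))

def trial(D=1024, k=8, m=1000, seed=0):
    """Proportion of the k stored pairs that are correctly retrieved."""
    rng = np.random.default_rng(seed)
    E = rng.normal(0, 1 / np.sqrt(D), (m, D))   # clean-up memory of m candidates
    idx = rng.choice(m, size=2 * k, replace=False)
    xs, ys = E[idx[:k]], E[idx[k:]]             # the k paired associates
    t = np.sum([cconv(x, y) for x, y in zip(xs, ys)], axis=0)   # equation (3)

    correct = 0
    for i in range(k):
        a = ccorr(ys[i], t)                       # noisy approximation of x_i
        sims = E @ a / np.linalg.norm(E, axis=1)  # cosine up to a constant factor
        correct += int(np.argmax(sims) == idx[i]) # did clean-up return x_i?
    return correct / k

print(trial(D=1024, k=8))   # high at this dimensionality
```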

Sahlgren et al. [17] bind signal vectors to positions by means of successive self-composition of a permutation function Π and construct memory vectors by superposing the results. In contrast to circular convolution, which requires normally distributed random vectors, random permutations support a variety of possible inputs. Sahlgren et al. employ random ternary vectors, so-called because elements take on one of three possible values (+1, 0, or −1). These are sparse vectors, or "spatter codes" [111, 112], whose elements are all zero with the exception of a few randomly placed positive and negative values (e.g., two +1s and two −1s). In this experiment, we tested the storage capacity of an RP-based associative memory, first with normally distributed random vectors (Gaussian vectors) to allow a proper comparison to convolution, and second with random ternary vectors (sparse vectors) with a varying number of positive and negative values in the input.
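A ternary index vector of this kind can be generated as in the sketch below (illustrative only; the function name is our own, and 20 nonzero elements is simply one of the settings explored later).

```python
import numpy as np

def ternary_vector(D=2048, nonzero=20, rng=None):
    """Sparse ternary signal vector: all zeros except nonzero/2 +1s and nonzero/2 -1s."""
    rng = rng or np.random.default_rng()
    v = np.zeros(D)
    pos = rng.choice(D, size=nonzero, replace=False)  # distinct random positions
    v[pos[:nonzero // 2]] = 1.0
    v[pos[nonzero // 2:]] = -1.0
    return v

e = ternary_vector()
print(int((e == 1).sum()), int((e == -1).sum()), int((e == 0).sum()))  # 10 10 2028
```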

As for the choice of the permutation function itself, any function that maps each element of the input onto a different element of the output will do; vector rotation (i.e., mapping element i of the input to element i + 1 of the output, with the exception of the final element of the input, which is mapped to the first element of the output) may be used for the sake of efficiency [17]. Using the notation of function exponentiation employed in our previous work [17, 113], Πⁿ refers to Π composed with itself n times: Π² = Π(Π), Π³ = Π²(Π), and so forth. The notion of a memory vector of paired associations can then be recast in RP terms as follows:

t_RP = (Πy_1 + Π²x_1) + (Π³y_2 + Π⁴x_2) + (Π⁵y_3 + Π⁶x_3) + ⋯    (4)

where the task again is to retrieve some y_i's associate x_i when presented only with y_i and t_RP. A retrieval algorithm for accomplishing this can be described as follows: given a probe vector y_i, the algorithm applies the inverse of the initial permutation to memory vector t_RP, yielding Π⁻¹t_RP. Next, the cosine between Π⁻¹t_RP and the probe vector y_i is calculated, yielding a value that represents the similarity between y_i and Π⁻¹t_RP. The previous steps are then iterated: the algorithm calculates the cosine between y_i and Π⁻²t_RP, between y_i and Π⁻³t_RP, and so forth, until this similarity value exceeds some high threshold; this indicates that the algorithm has "found" y_i in the memory. At that point, t_RP is permuted one more time, yielding x′, a noisy approximation of y_i's associate x_i. This approximation x′ can then be compared with clean-up memory to retrieve the original associate x_i.


Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the inverse permutation n for which cosine(Π⁻ⁿt_RP, y_i) was the highest. We can permute one more time to retrieve Π⁻ⁿ⁻¹t_RP, that is, our noisy approximation x′. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x′ and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.
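A compact sketch of this finite-iteration variant appears below (an illustration under the same assumptions as before, with our own helper names); it encodes k pairs as in equation (4), unpermutes the trace until the probe is recognized, and then unpermutes once more before consulting clean-up memory.

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def perm_n(v, p, n):
    # Apply the permutation p (stored as an index array) to v, n times.
    for _ in range(n):
        v = v[p]
    return v

rng = np.random.default_rng(3)
D, k, m = 1024, 8, 1000
p = rng.permutation(D)           # the permutation function Pi
p_inv = np.argsort(p)            # its inverse, Pi^-1

E = rng.normal(0, 1 / np.sqrt(D), (m, D))     # clean-up memory
idx = rng.choice(m, size=2 * k, replace=False)
xs, ys = E[idx[:k]], E[idx[k:]]

# Equation (4): t_RP = (Pi y_1 + Pi^2 x_1) + (Pi^3 y_2 + Pi^4 x_2) + ...
t_rp = np.zeros(D)
for i in range(k):
    t_rp += perm_n(ys[i], p, 2 * i + 1) + perm_n(xs[i], p, 2 * i + 2)

# Retrieval: unpermute t_RP repeatedly, find the step most similar to the probe,
# unpermute once more, and then consult clean-up memory.
probe = 3                                      # look up the associate of y_4
sims = [cosine(perm_n(t_rp, p_inv, n), ys[probe]) for n in range(1, 2 * k + 1)]
n_star = int(np.argmax(sims)) + 1
x_noisy = perm_n(t_rp, p_inv, n_star + 1)
best = int(np.argmax(E @ x_noisy / np.linalg.norm(E, axis=1)))
print(best == idx[probe])                      # True when retrieval succeeds
```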

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval; an accuracy level of 0.1% would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = 0.001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < 0.001. The interaction was not significant, F(3, 48) = 2.60, P = 0.06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05. All other comparisons were not significant.

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors consisting of zeroes and an equal number of randomly placed −1s and +1s were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = 0.92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < 0.001, and the interaction was not significant, F(3, 48) = 0.004, P = 1. As before, post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.

Figure 1: Retrieval accuracies (%) for convolution-based associative memories with Gaussian vectors, plotted against the number of pairs stored in the trace, for dimensionalities 256, 512, 1024, and 2048.

Figure 2: Retrieval accuracies (%) for RP-based associative memories with Gaussian vectors, plotted against the number of pairs stored in the trace, for dimensionalities 256, 512, 1024, and 2048.


Figure 3: Retrieval accuracies (%) for RP-based associative memories with sparse vectors, plotted against the number of nonzero elements in the index vectors. The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

Figure 4: Retrieval accuracies (%) for RP-based associative memories with sparse vectors, plotted against the number of pairs stored in the trace, for dimensionalities 256, 512, 1024, and 2048.

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times, corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE would perform on a set of semantic tasks if RPs were used in place of circular convolution.

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced


with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE, each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

m_one = m_one + (Φ ⊛ e_two) + (Φ ⊛ e_two ⊛ e_three)    (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

m_one = m_one + Πe_two + Π²e_three    (6)
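The sketch below contrasts the two updates for the phrase "one two three" (illustrative only; cconv is an FFT-based circular convolution helper redefined here for self-containment, and names such as e_two are stand-ins for environmental vectors).

```python
import numpy as np

def cconv(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

rng = np.random.default_rng(4)
D = 1024
Phi = rng.normal(0, 1 / np.sqrt(D), D)      # placeholder for the word being encoded
e_two = rng.normal(0, 1 / np.sqrt(D), D)    # environmental vector for "two"
e_three = rng.normal(0, 1 / np.sqrt(D), D)  # environmental vector for "three"
p = rng.permutation(D)                      # the random permutation Pi

m_one = np.zeros(D)                         # memory vector for "one"

# Equation (5): convolution-based order update for "one" in "one two three"
m_one_conv = m_one + cconv(Phi, e_two) + cconv(cconv(Phi, e_two), e_three)

# Equation (6): the same update with random permutations replacing convolution
m_one_rp = m_one + e_two[p] + e_three[p][p]   # Pi e_two + Pi^2 e_three

print(m_one_conv.shape, m_one_rp.shape)
```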

In addition to order information, the complete BEAGLE model also updates memory vectors with context information. Each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and the Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG; [115]), Miller and Charles (MC; [116]), Resnik (R; [117]), and Finkelstein et al. (F; [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

Task     Wikipedia subset,    Wikipedia subset,       Full Wikipedia,
         convolution          random permutation      random permutation
ESL      0.20                 0.26                    0.32
TOEFL    0.46†                0.46†                   0.63†
RG       0.07                 −0.06                   0.32*
MC       0.08                 −0.01                   0.33*
R        0.06                 −0.04                   0.35*
F        0.13*                0.12*                   0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

differences between the performance of convolution and RPs on the small corpus. Both performed nearly identically on F and TOEFL; neither showed any significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed the best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well-controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data,


model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, resp.).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors consisting of all 0s, two +1s, and two −1s in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) to BEAGLE, all BEAGLE parameters were

Table 2: Comparison of BEAGLE and RPM by corpus.

Task   | Wikipedia subset   | TASA               | Full Wikipedia
       | BEAGLE    RPM      | BEAGLE    RPM      | RPM
-------+--------------------+--------------------+---------------
ESL    | 0.24      0.27     | 0.30      0.36†    | 0.50†
TOEFL  | 0.47†     0.40†    | 0.54†     0.77†    | 0.66†
RG     | 0.10      0.10     | 0.21      0.53*    | 0.65*
MC     | 0.09      0.12     | 0.29      0.52*    | 0.61*
R      | 0.09      0.03     | 0.30      0.56*    | 0.56*
F      | 0.23*     0.19*    | 0.27*     0.33*    | 0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.


5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the respective order-information-only versions from Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia corpus was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.



Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer, but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed. Thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).
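As a concrete check, the short sketch below reproduces the two example transformations from the caption using plain index arrays (our illustration, not code from the original study); the specific target indices are chosen only to match the figure's example.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# (a) True random permutation: every input node copies its value to a unique
# output node. The one-to-one target list below matches the figure's example.
targets_a = np.array([1, 3, 4, 2, 0])      # input i feeds output targets_a[i]
y_a = np.zeros_like(x)
np.add.at(y_a, targets_a, x)
print(y_a)                                  # [0.5 0.1 0.4 0.2 0.3]

# (b) Random connections without the uniqueness constraint: targets may repeat,
# colliding inputs are summed, and an output with no incoming connection stays 0.
targets_b = np.array([1, 4, 2, 0, 2])      # input i feeds output targets_b[i]
y_b = np.zeros_like(x)
np.add.at(y_b, targets_b, x)
print(y_b)                                  # [0.4 0.1 0.8 0.  0.2]
```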

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (with replacement) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule y_j = Σ_i x_i w_ij, in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc., via reapplication of the same permutation function): T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.
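A minimal sketch of the transformation T just defined is given below (our illustration, assuming NumPy; the dimensionality and seed are arbitrary). The matrix w has exactly one randomly placed 1 per row, colliding inputs are summed, nodes with no incoming connection stay at zero, and applying the rule twice gives the T² used in Simulation 2; a true permutation would instead draw the targets without replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024  # illustrative dimensionality

# One randomly placed 1 in every row of w; columns are unconstrained, so several
# rows may feed the same output node and some output nodes receive nothing.
targets = rng.integers(0, d, size=d)
w = np.zeros((d, d))
w[np.arange(d), targets] = 1.0

def T(x):
    """y_j = sum_i x_i * w_ij (random connections, no uniqueness constraint)."""
    return x @ w

def T2(x):
    """Second transformation used in Simulation 2: the update rule applied twice."""
    return T(T(x))

# A true random permutation would instead use unique targets, for example
# targets = rng.permutation(d), which yields an exactly invertible mapping.
x = rng.normal(0, 1.0 / np.sqrt(d), d)
y = T(x)
print((y != 0).sum() < d)  # True: nodes with no incoming connection stay at zero
```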


Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task   | Full Wikipedia                | TASA
       | RC Sim 1  RC Sim 2  RP        | RC Sim 1  RC Sim 2  RP
-------+-------------------------------+------------------------------
ESL    | 0.50†     0.48†     0.50†     | 0.32      0.32      0.36†
TOEFL  | 0.66†     0.66†     0.66†     | 0.73†     0.74†     0.77†
RG     | 0.66*     0.64*     0.65*     | 0.53*     0.52*     0.53*
MC     | 0.63*     0.61*     0.61*     | 0.53*     0.51*     0.52*
R      | 0.58*     0.56*     0.56*     | 0.55*     0.54*     0.56*
F      | 0.39*     0.37*     0.39*     | 0.32*     0.33*     0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.


6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions expressing general facts ("birds can generally fly") and implementing exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses.


Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior rest on assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other work has approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, combining distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns, even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model.


In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108] and Sutherland [137] and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli ([142], [143, p. 47], and [144]).
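The identity just described is easy to verify numerically. The sketch below is our illustration (not code from the original models), assuming NumPy and Plate-style Gaussian vectors with element variance 1/d; it binds two vectors by circular convolution computed with FFTs and then approximately recovers one of them by circular correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1024
x = rng.normal(0, 1 / np.sqrt(d), d)   # random vectors with element variance 1/d
y = rng.normal(0, 1 / np.sqrt(d), d)

# Binding by circular convolution, computed in the frequency domain:
# FFT(x circ-conv y) = FFT(x) * FFT(y) elementwise.
bound = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

# Approximate unbinding by circular correlation: multiply by the complex conjugate.
decoded = np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(bound)))

# The decoded vector is a noisy copy of y; its cosine to y is well above chance
# (roughly 0.7 at this dimensionality for a single stored pair).
cos = decoded @ y / (np.linalg.norm(decoded) * np.linalg.norm(y))
print(round(float(cos), 2))
```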

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required: nearly identical results can be achieved with completely random connections (Figure 5(b)). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates Publishers, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressable distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex - executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.




Alternatively, rather than selecting a threshold, t_RP may be permuted some finite number of times, having its cosine similarity to y_i stored after each permutation. In Plate's [95, p. 252] demonstration of the capacity of convolution-based associative memories, the maximal number of pairs stored in a single memory vector was 14; we likewise restrict the maximal number of pairs in a single memory vector to 14 (i.e., 28 vectors total). Let n be the number of inverse permutations for which cosine(Π⁻ⁿ t_RP, y_i) was the highest. We can then permute one more time to retrieve Π⁻ⁿ⁻¹ t_RP, that is, our noisy approximation x′. This method is appropriate if we always want our algorithm to return an answer (rather than, say, timing out before the threshold is exceeded) and is the method we used for this experiment.

The final clean-up memory step is identical to that used by Plate [95]: we calculate the cosine between x′ and each vector in the clean-up memory E and retrieve the vector in E for which this cosine is highest. As when evaluating convolution, we keep m (the number of vectors in E) fixed at 1000 while varying the number of stored vectors k and the dimensionality D.
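The sketch below puts the retrieval procedure and the clean-up step together. It is our own minimal reconstruction: the storage scheme (each successive item permuted by one additional application of a single permutation Π) and all parameter values are assumptions chosen to be consistent with the procedure described in the text, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_pairs, pool_size = 1024, 8, 1000

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Clean-up memory E: the pool of candidate item vectors.
E = rng.normal(0, 1 / np.sqrt(d), (pool_size, d))
pair_idx = rng.choice(pool_size, size=(n_pairs, 2), replace=False)

# One fixed random permutation and its inverse.
perm = rng.permutation(d)
inv = np.argsort(perm)

def permute(v, p, times=1):
    for _ in range(times):
        v = v[p]
    return v

# Assumed storage scheme: pair j's cue and associate are stored at consecutive
# powers of the permutation, so the power indexes the order of addition.
t_rp = np.zeros(d)
for j, (xi, yi) in enumerate(pair_idx):
    t_rp += permute(E[yi], perm, 2 * j + 1)   # cue y_j at power 2j+1
    t_rp += permute(E[xi], perm, 2 * j + 2)   # associate x_j at power 2j+2

def retrieve(cue_vec, max_power=2 * n_pairs + 2):
    """Unpermute t_rp repeatedly, keep the power with the highest cosine to the
    cue, unpermute once more, and clean up against E."""
    sims = [cos(permute(t_rp, inv, n), cue_vec) for n in range(1, max_power + 1)]
    n_best = int(np.argmax(sims)) + 1
    x_noisy = permute(t_rp, inv, n_best + 1)
    return int(np.argmax(E @ x_noisy))        # index of best match in clean-up memory

xi, yi = pair_idx[3]
print(retrieve(E[yi]) == xi)                  # True with high probability
```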

3.2. Results and Discussion. Five hundred pairs of normally distributed random vectors were sampled with replacement from a pool of 1000, and the proportion of correct retrievals was computed. All 1000 vectors in the pool were potential candidates for retrieval; an accuracy level of 0.1% would represent chance performance. Figure 1 reports retrieval accuracies for the convolution-based algorithm, while Figure 2 reports retrieval accuracies for the RP formulation of the task. A 2 (algorithm: convolution versus random permutations) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA with number of successful retrievals as the dependent variable revealed a main effect of algorithm, F(1, 48) = 11.85, P = 0.001, with more successful retrievals when using random permutations (M = 457, SD = 86) than when using circular convolution (M = 381, SD = 145). There was also a main effect of dimensionality, F(3, 48) = 18.9, P < 0.0001. The interaction was not significant, F(3, 48) = 2.60, P = 0.06. Post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of 0.05. All other comparisons were not significant.

Figure 3 reports retrieval accuracies for RPs when sparse (ternary) vectors consisting of zeroes and an equal number of randomly placed −1s and +1s were used instead of normally distributed random vectors. This change had no impact on performance. A 2 (vector type: normally distributed versus sparse) × 4 (dimensionality: 256, 512, 1024, 2048) ANOVA was conducted with number of successful retrievals as the dependent variable. The main effect of vector type was not significant, F(1, 48) = 0.011, P = 0.92, revealing a nearly identical number of successful retrievals when using normally distributed vectors (M = 457, SD = 86) as opposed to sparse vectors (M = 455, SD = 88). There was a main effect of dimensionality, F(3, 48) = 13.0, P < 0.0001, and the interaction was not significant, F(3, 48) = 0.004, P = 1.

Figure 1: Retrieval accuracies for convolution-based associative memories with Gaussian vectors (retrieval accuracy, %, as a function of the number of pairs stored in the trace, 2–14, for dimensionalities 256, 512, 1024, and 2048).

Figure 2: Retrieval accuracies for RP-based associative memories with Gaussian vectors (retrieval accuracy, %, as a function of the number of pairs stored in the trace, 2–14, for dimensionalities 256, 512, 1024, and 2048).

As before, post hoc Tukey's HSD tests showed a significantly lower number of successful retrievals with vectors of dimensionality 256 than with any other vector dimensionality at an alpha of .05, and all other comparisons were not significant. Figure 3 plots retrieval accuracies against the number of nonzero elements in the sparse vectors, demonstrating that retrieval accuracies level off after the sparse input vectors are populated with more than a handful of nonzero elements. Figure 4 parallels Figures 1 and 2, reporting retrieval accuracies for RPs at a variety of dimensionalities when sparse vectors consisting of twenty nonzero elements were employed.


Figure 3: Retrieval accuracies for RP-based associative memories with sparse vectors, plotted against the number of nonzero elements in the index vectors (2–38). The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

Figure 4: Retrieval accuracies for RP-based associative memories with sparse vectors (retrieval accuracy, %, as a function of the number of pairs stored in the trace, 2–14, for dimensionalities 256, 512, 1024, and 2048).

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times, corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than by "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model, akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE, would perform on a set of semantic tasks if RPs were used in place of circular convolution.
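The point about lost bindings can be made concrete with a small sketch. Assuming an order-only RP update of the kind used in Experiment 2 (the word appearing n positions after "bird" is added under the n-th power of Π) and hypothetical environmental vectors, an attested continuation and a recombined one receive the same score, while an order violation does not:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 2048
perm = rng.permutation(D)

def P(v, times=1):
    """Apply the random permutation Pi the given number of times."""
    for _ in range(times):
        v = v[perm]
    return v

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Environmental vectors for the words in the example.
e = {w: rng.standard_normal(D) for w in ["has", "wings", "eats", "worms"]}

# Order-only memory vector for "bird" after seeing "bird has wings" and
# "bird eats worms" equally often: the word n positions after "bird" is
# added under the n-th power of Pi.
m_bird = P(e["has"], 1) + P(e["wings"], 2) + P(e["eats"], 1) + P(e["worms"], 2)

def plausibility(w1, w2):
    """How strongly does m_bird support the continuation 'bird <w1> <w2>'?"""
    return cos(m_bird, P(e[w1], 1) + P(e[w2], 2))

print(round(plausibility("eats", "worms"), 2))   # attested continuation
print(round(plausibility("eats", "wings"), 2))   # unattested, but scored the same
print(round(plausibility("worms", "eats"), 2))   # wrong order: near zero
```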

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE, each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

m_one = m_one + (Φ ⊛ e_two) + (Φ ⊛ e_two ⊛ e_three), (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

m_one = m_one + Π e_two + Π^2 e_three. (6)

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.
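In code, the two update rules can be sketched as follows. This is a minimal illustration of (5) and (6) with randomly generated vectors, treating ⊛ as plain circular convolution computed in the frequency domain; the variable names are illustrative, not taken from the BEAGLE implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1024
phi = rng.standard_normal(D)                  # placeholder vector Phi
e_two, e_three = rng.standard_normal((2, D))  # environmental vectors
perm = rng.permutation(D)                     # random permutation Pi

def cconv(a, b):
    """Circular convolution, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

m_one = np.zeros(D)

# Equation (5): convolution-based order update upon reading "one two three".
m_one_conv = m_one + cconv(phi, e_two) + cconv(cconv(phi, e_two), e_three)

# Equation (6): the same update with random permutations in place of convolution.
m_one_rp = m_one + e_two[perm] + e_three[perm][perm]
```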

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG [115]), Miller and Charles (MC [116]), Resnik (R [117]), and Finkelstein et al. (F [118]). A detailed description of these measures can be found in Recchia and Jones [52].
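The scoring procedure can be summarized by a short sketch: synonymy items are scored by choosing the alternative whose vector has the highest cosine with the probe's, and the similarity benchmarks are scored by correlating model cosines with the human ratings. The `memory` dictionary and the rating tuples below are placeholders for a trained model and the published norms, not part of our implementation.

```python
import numpy as np
from scipy.stats import spearmanr

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def synonymy_item(memory, probe, alternatives):
    """TOEFL/ESL-style item: choose the alternative closest to the probe."""
    return max(alternatives, key=lambda w: cos(memory[probe], memory[w]))

def spearman_fit(memory, judgments):
    """Spearman rank correlation between model cosines and human ratings.
    `judgments` is a list of (word1, word2, human_rating) tuples."""
    model = [cos(memory[w1], memory[w2]) for w1, w2, _ in judgments]
    human = [rating for _, _, rating in judgments]
    return spearmanr(model, human).correlation
```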

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model.

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

            Wikipedia subset                          Full Wikipedia
Task        Convolution    Random permutation         Random permutation
ESL         0.20           0.26                       0.32
TOEFL       0.46†          0.46†                      0.63†
RG          0.07           −0.06                      0.32*
MC          0.08           −0.01                      0.33*
R           0.06           −0.04                      0.35*
F           0.13*          0.12*                      0.33*
*Significant correlation, P < .05, one-tailed. †Accuracy score differs significantly from chance, P < .05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the proportion of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

Three points about these results merit special attention. First, there are no significant differences between the performance of convolution and RPs on the small corpus. Both performed nearly identically on F and TOEFL; neither showed any significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than for tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data, model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) into BEAGLE, all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.

Table 2: Comparison of BEAGLE and RPM by corpus.

            Wikipedia subset        TASA                  Full Wikipedia
Task        BEAGLE      RPM         BEAGLE      RPM       RPM
ESL         0.24        0.27        0.30        0.36†     0.50†
TOEFL       0.47†       0.40†       0.54†       0.77†     0.66†
RG          0.10        0.10        0.21        0.53*     0.65*
MC          0.09        0.12        0.29        0.52*     0.61*
R           0.09        0.03        0.30        0.56*     0.56*
F           0.23*       0.19*       0.27*       0.33*     0.39*
*Significant correlation, P < .05, one-tailed. †Accuracy score differs significantly from chance, P < .05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the proportion of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
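A minimal sketch of the two implementation details most relevant here, sparse ternary signal vectors and direction-only permutations, is given below; the window handling and all names are simplifying assumptions for illustration rather than a faithful reimplementation of RPM (frequency thresholding and the context vectors are omitted).

```python
import numpy as np

rng = np.random.default_rng(3)
D = 25000                     # RPM-scale dimensionality

def sparse_ternary(d, n_plus=2, n_minus=2):
    """Sparse ternary signal vector: all zeros except two +1s and two -1s."""
    v = np.zeros(d)
    idx = rng.choice(d, size=n_plus + n_minus, replace=False)
    v[idx[:n_plus]] = 1.0
    v[idx[n_plus:]] = -1.0
    return v

# Two distinct permutations encode direction only ("before" versus "after").
perm_before = rng.permutation(D)
perm_after = rng.permutation(D)

def update_order_vector(order_vec, signal, position, window=2):
    """Add one neighbour's signal vector to the target word's order vector.
    `position` is the neighbour's offset from the target (negative = before)."""
    if -window <= position < 0:
        order_vec += signal[perm_before]
    elif 0 < position <= window:
        order_vec += signal[perm_after]
    return order_vec

# Example: update the order vector for "two" given the phrase "one two three".
signals = {w: sparse_ternary(D) for w in ["one", "two", "three"]}
order_two = np.zeros(D)
order_two = update_order_vector(order_two, signals["one"], position=-1)
order_two = update_order_vector(order_two, signals["three"], position=+1)
```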

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the corresponding order-only variants from Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.


Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer, but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed; thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (i.e., sampling with replacement) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule y_k = Σ_i x_i w_ik, in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T^2 (cf. Sahlgren et al.'s generation of successive permutation functions Π^2, Π^3, etc., via reapplication of the same permutation function); T was applied to the signal vectors of the words appearing immediately before the encoded word, while T^2 was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.

Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

            Full Wikipedia                     TASA
Task        RC Sim 1   RC Sim 2   RP           RC Sim 1   RC Sim 2   RP
ESL         0.50†      0.48†      0.50†        0.32       0.32       0.36†
TOEFL       0.66†      0.66†      0.66†        0.73†      0.74†      0.77†
RG          0.66*      0.64*      0.65*        0.53*      0.52*      0.53*
MC          0.63*      0.61*      0.61*        0.53*      0.51*      0.52*
R           0.58*      0.56*      0.56*        0.55*      0.54*      0.56*
F           0.39*      0.37*      0.39*        0.32*      0.33*      0.33*
*Significant correlation, P < .05, one-tailed. †Accuracy score differs significantly from chance, P < .05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the proportion of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
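The random-connection transformation is easy to state in code. The sketch below (with illustrative names, not our implementation) applies T via an index array sampled with replacement and reproduces the input/output example given in the caption of Figure 5(b).

```python
import numpy as np

rng = np.random.default_rng(4)
D = 1024

# Each input node feeds one randomly chosen output node (sampled with
# replacement), so some outputs receive several inputs and some receive none.
targets = rng.integers(0, D, size=D)

def rc_transform(x, targets):
    """The transformation T: each output sums the inputs mapped to it;
    outputs with no incoming connection stay at zero."""
    y = np.zeros_like(x, dtype=float)
    np.add.at(y, targets, x)   # y[targets[i]] += x[i], repeats are summed
    return y

# A true random permutation is the special case in which `targets` is a
# permutation of 0..D-1, so every output receives exactly one input.
x = rng.standard_normal(D)
T1 = rc_transform(x, targets)     # T, applied to "before" words
T2 = rc_transform(T1, targets)    # T^2, applied to "after" words in Simulation 2

# One mapping consistent with the Figure 5(b) example: the third and fifth
# inputs both feed the third output, and the fourth output receives nothing.
print(rc_transform(np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
                   np.array([1, 4, 2, 0, 2])))   # [0.4, 0.1, 0.8, 0.0, 0.2]
```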

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations; thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as of encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution at low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and in the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, as in traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix on the basis of serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions expressing general facts ("birds can generally fly") and of implementing exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior rely on assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and of the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other work has approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue of the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, demonstrating that probabilistic models can take advantage of distributional information (from lexical cooccurrence) as well as experiential information (from human-generated semantic feature norms), and that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals; the value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transformations provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
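These identities are straightforward to verify numerically. The following sketch (with illustrative random vectors whose elements are drawn from N(0, 1/D), as in HRRs) encodes a pair by convolution in the frequency domain and decodes it by correlation, yielding a noisy version of the stored item that a clean-up memory would then resolve.

```python
import numpy as np

rng = np.random.default_rng(5)
D = 1024
x, y = rng.standard_normal((2, D)) / np.sqrt(D)   # elements ~ N(0, 1/D)

fft, ifft = np.fft.fft, np.fft.ifft

# Convolution theorem: the FT of a circular convolution is the product of the FTs.
trace = np.real(ifft(fft(x) * fft(y)))            # x convolved with y

# Correlation theorem: multiplying by the complex conjugate of one FT computes
# circular correlation, the approximate inverse used for decoding.
y_noisy = np.real(ifft(np.conj(fft(x)) * fft(trace)))

# The decoded vector is only a noisy version of y, which is why convolution-based
# memories rely on a clean-up memory to resolve it to the nearest stored item.
print(np.corrcoef(y, y_noisy)[0, 1])
```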

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model: Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing and RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms: a random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As Experiment 4 showed, even this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 5(b)). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never before been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.
[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.

[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.
[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and Its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, UK, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.

[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.
[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.

[95] T PlateHolographic Reduced Representation Distributed Repre-sentation for Cognitive Structures CSLI Publications StanfordCalif USA 2003

[96] T K Landauer and S Dumais ldquoLatent semantic analysisrdquoScholarpedia vol 3 no 11 p 4356 2013

[97] D C van Leijenhorst and T P van der Weide ldquoA formalderivation of Heapsrsquo Lawrdquo Information Sciences vol 170 no 2ndash4 pp 263ndash272 2005

[98] M N Jones Learning semantics and word order from statisticalredundancies in language a computational approach [PhDdissertation] Queenrsquos University 2005

[99] J Karlgren and M Sahlgren ldquoFrom words to understandingrdquoin Foundations of Real-World Intelligence Y Uesaka P Kanervaand H Asoh Eds CSLI Publications Stanford Calif USA2001

[100] T A Plate ldquoHolographic reduced representationsrdquo IEEE Trans-actions on Neural Networks vol 6 no 3 pp 623ndash641 1995

[101] A Borsellino and T Poggio ldquoConvolution and correlationalgebrasrdquo Kybernetik vol 13 no 2 pp 113ndash122 1973

[102] J M Eich ldquoA composite holographic associative recall modelrdquoPsychological Review vol 89 no 6 pp 627ndash661 1982

[103] J M Eich ldquoLevels of processing encoding specificity elabora-tion andCHARMrdquo Psychological Review vol 92 no 1 pp 1ndash381985

[104] P Liepa Models of Content Addressable Distributed AssociativeMemory (CADAM) University of Toronto 1977

[105] B B Murdock ldquoA theory for the storage and retrieval of itemand associative informationrdquo Psychological Review vol 89 no6 pp 609ndash626 1982

[106] B B Murdock ldquoItem and associative information in a dis-tributed memory modelrdquo Journal of Mathematical Psychologyvol 36 no 1 pp 68ndash99 1992

[107] J C R Licklider ldquoThree auditory theoriesrdquo in Psychology AStudy of a Science S Koch Ed vol 1 pp 41ndash144 McGraw-HillNew York NY USA 1959

[108] K H Pribram ldquoConvolution and matrix systems as contentaddressible distributed brain processes in perception andmem-oryrdquo Journal of Neurolinguistics vol 2 no 1-2 pp 349ndash3641986

[109] W E Reichardt ldquoCybernetics of the insect optomotorresponserdquo in Cerebral Correlates of Conscious Experience PBuser Ed North Holland AmsterdamThe Netherlands 1978

[110] C Eliasmith ldquoCognition with neurons a large-scale biologi-cally realistic model of the Wason taskrdquo in Proceedings of the27th Annual Meeting of the Cognitive Science Society B G BaraL Barsalou and M Bucciarelli Eds pp 624ndash629 ErlbaumMahwah NJ USA 2005

[111] P Kanerva Sparse Distributed Memory MIT Press CambridgeMass USA 1988

[112] P Kanerva ldquoBinary spatter-coding of ordered k-tuplesrdquo inProceedings of the International Conference on Artificial NeuralNetworks C von der Malsburg W von Seelen J Vorbruggenand B Sendhoff Eds vol 1112 of Lecture Notes in ComputerScience pp 869ndash873 Springer Bochum Germany 1996

[113] G Recchia M Jones M Sahlgren and P Kanerva ldquoEncodingsequential information in vector space models of semanticsComparing holographic reduced representation and randompermutationrdquo in Proceedings of the 32nd Cognitive ScienceSociety S Ohlsson and R Catrambone Eds pp 865ndash870Cognitive Science Society Austin Tex USA 2010


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias, et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo, et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex – executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1," Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


[Figure 3: line plot; x-axis: number of nonzero elements in index vectors (2–38); y-axis: retrieval accuracy (%); curves for 6 and 12 stored pairs at dimensionalities 256 and 512.]
Figure 3: Retrieval accuracies for RP-based associative memories with sparse vectors. The first number reported in the legend (6 or 12) refers to the number of pairs stored in a single memory vector, while the other (256 or 512) refers to the vector dimensionality.

[Figure 4: line plot; x-axis: number of pairs stored in the trace (2–14); y-axis: retrieval accuracy (%); curves for dimensionalities 256, 512, 1024, and 2048.]
Figure 4: Retrieval accuracies for RP-based associative memories with sparse vectors.

Circular convolution has an impressive storage capacity and excellent probability of correct retrieval at high dimensionalities, and our results were comparable to those reported by Plate [95, p. 252] in his test of convolution-based associative memories. However, RPs seem to share these desirable properties as well. In fact, the storage capacity of RPs seems to drop off more slowly than does the storage capacity of convolution as dimensionality is reduced. The high performance of RPs is particularly interesting given that RPs are computationally efficient with respect to basic encoding and decoding operations.

An important caveat to these promising properties of RPs is that permutation is a unary operation, while convolution is binary. While the same convolution operation can be used to unambiguously bind multiple pairs in the convolution-based memory vector t, the particular permutation function employed essentially indexed the order in which an item was added to the RP-based memory vector t_RP. This is why the algorithm used to retrieve paired associates from t_RP necessarily took on the character of a sequential search, whereby vectors stored in memory were repeatedly permuted (up to some finite number of times corresponding to the maximum number of paired associates presumed to be stored in memory) and compared to memory. We do not mean to imply that this is a plausible model of how humans store and retrieve paired associates. Experiment 1 was intended merely as a comparison of the theoretical storage capacity of vectors in an RP-based vector space architecture to those in a convolution-based one, and the results do not imply that this is necessarily a superior method of storing paired associates. In RPM, RPs are not used to store paired associates but rather to differentiate cue words occurring in different locations relative to the word being encoded. As a result, pairs of associates cannot be unambiguously extracted from the ultimate space. For example, suppose the word "bird" was commonly followed by the words "has wings" and, in an equal number of contexts, by the words "eats worms." While the vector space built up by RPM preserves the information that "bird" tends to be immediately followed by "has" and "eats" rather than "worms" or "wings," it does not preserve the bindings: "bird eats worms" and "bird eats wings" would be rated as equally plausible by the model. Therefore, one might expect RPM to achieve poorer fits to human data (e.g., synonymy tests, Spearman rank correlations with human semantic similarity judgments) than a convolution-based model. In order to move from the paired-associates problem of Experiment 1 to a real language task, we next evaluate how a simple vector accumulation model, akin to Jones and Mewhort's [19] encoding of order-only information in BEAGLE, would perform on a set of semantic tasks if RPs were used in place of circular convolution.
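The sequential-search retrieval just described can be made concrete with a short sketch. The NumPy code below is an illustrative reconstruction rather than the exact procedure used in Experiment 1: each pair is added to the trace under the i-th power of a single random permutation Π, and retrieval repeatedly permutes the probe, compares the result to the trace to locate the storage position, and then cleans up the un-permuted trace against the item vectors. The dimensionality, vocabulary size, and number of pairs are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024          # vector dimensionality (illustrative)
N_PAIRS = 6       # number of paired associates stored in one trace
VOCAB = 50        # candidate items for cleanup memory

# Random Gaussian "environmental" item vectors, one per vocabulary item.
items = rng.normal(0, 1 / np.sqrt(D), size=(VOCAB, D))

# A single random permutation Pi; Pi^k is applied by permuting k times.
perm = rng.permutation(D)

def permute(v, k):
    """Apply the permutation k times."""
    for _ in range(k):
        v = v[perm]
    return v

def inverse_permute(v, k):
    """Apply the inverse permutation k times."""
    inv = np.argsort(perm)
    for _ in range(k):
        v = v[inv]
    return v

# Store pairs (a_i, b_i): the i-th pair is added under Pi^i, so the
# permutation power indexes the order in which the pair was stored.
pairs = [(2 * i, 2 * i + 1) for i in range(N_PAIRS)]
trace = np.zeros(D)
for i, (a, b) in enumerate(pairs, start=1):
    trace += permute(items[a] + items[b], i)

def retrieve(probe_idx):
    """Sequential search: repeatedly permute the probe, find the best-matching
    storage position, un-permute the trace, and clean up against the vocabulary."""
    probe = items[probe_idx]
    sims = [permute(probe, k) @ trace for k in range(1, N_PAIRS + 1)]
    k_best = int(np.argmax(sims)) + 1
    echo = inverse_permute(trace, k_best)     # approximately a_i + b_i + noise
    scores = items @ echo
    scores[probe_idx] = -np.inf               # exclude the probe itself
    return int(np.argmax(scores))

print(retrieve(0))   # expected: 1, the stored associate of item 0
```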

4. Experiment 2: HRR versus RP on Linguistic Corpora

In this study, we replaced the circular convolution component of BEAGLE with RPs so that we could quantify the impact that the choice of operation alone had on fits to human-derived judgments of semantic similarity. Due to the computational efficiency of RPs, we were able to scale them to a larger version of the same corpus and simultaneously explore the effect of scalability and the method by which order information was encoded.

4.1. Method. Order information was trained using both the BEAGLE model and a modified implementation of BEAGLE in which the circular convolution operation was replaced with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

m_one = m_one + (Φ ⊛ e_two) + (Φ ⊛ e_two ⊛ e_three), (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

m_one = m_one + Π e_two + Π² e_three. (6)
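A schematic NumPy rendering of updates (5) and (6) is given below. It is a simplified sketch rather than BEAGLE itself: BEAGLE's binding operator involves additional details (for example, operands are permuted before convolution to make the operation noncommutative), whereas here plain circular convolution computed via FFT stands in for ⊛, and the corpus is reduced to the single phrase used in the example.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 1024                                  # dimensionality used in this experiment

# Environmental signal vectors: Gaussian with mean 0 and sd 1/sqrt(D).
e = {w: rng.normal(0, 1 / np.sqrt(D), D) for w in ["one", "two", "three"]}
phi = rng.normal(0, 1 / np.sqrt(D), D)    # placeholder vector for the encoded word

def cconv(a, b):
    """Circular convolution via FFT (stand-in for the HRR binding operator)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Update (5): convolution-based order update for "one" given "one two three".
m_conv = {w: np.zeros(D) for w in e}
m_conv["one"] += cconv(phi, e["two"]) + cconv(cconv(phi, e["two"]), e["three"])

# Update (6): the same update with random permutations (Pi^2 = Pi applied twice).
perm = rng.permutation(D)
m_rp = {w: np.zeros(D) for w in e}
m_rp["one"] += e["two"][perm] + e["three"][perm][perm]
```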

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and the Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG; [115]), Miller and Charles (MC; [116]), Resnik (R; [117]), and Finkelstein et al. (F; [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant differences between the performance of convolution and RPs on the small corpus: both performed nearly identically on F and TOEFL, and neither showed any significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed the best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

Task     Wikipedia subset                         Full Wikipedia
         Convolution    Random permutation        Random permutation
ESL      0.20           0.26                      0.32
TOEFL    0.46†          0.46†                     0.63†
RG       0.07           −0.06                     0.32*
MC       0.08           −0.01                     0.33*
R        0.06           −0.04                     0.35*
F        0.13*          0.12*                     0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well-controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data, model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians (a minimal sketch of these index vectors and the direction permutations appears after Table 2). Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) to BEAGLE, all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.

Table 2: Comparison of BEAGLE and RPM by corpus.

Task     Wikipedia subset        TASA                  Full Wikipedia
         BEAGLE     RPM          BEAGLE     RPM        RPM
ESL      0.24       0.27         0.30       0.36†      0.50†
TOEFL    0.47†      0.40†        0.54†      0.77†      0.66†
RG       0.10       0.10         0.21       0.53*      0.65*
MC       0.09       0.12         0.29       0.52*      0.61*
R        0.09       0.03         0.30       0.56*      0.56*
F        0.23*      0.19*        0.27*      0.33*      0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
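To make the representational differences just listed concrete, the sketch below (NumPy) builds RPM-style sparse ternary index vectors and encodes direction information with one permutation for left neighbors and another for right neighbors. It is an illustration of the scheme under the stated parameters, not a reimplementation of RPM; the toy sentence, window handling, and lexicon are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 25000            # approximately RPM's reported optimal dimensionality

def ternary_index_vector():
    """Sparse ternary signal vector: all zeros except two +1s and two -1s."""
    v = np.zeros(D)
    pos = rng.choice(D, size=4, replace=False)
    v[pos[:2]] = 1.0
    v[pos[2:]] = -1.0
    return v

vocab = ["the", "bird", "eats", "worms"]
index = {w: ternary_index_vector() for w in vocab}
memory = {w: np.zeros(D) for w in vocab}

# Two permutations encode direction only: one for left neighbors, one for right.
perm_left = rng.permutation(D)
perm_right = rng.permutation(D)

def encode_order(tokens, window=2):
    """Add direction-marked index vectors of neighbors to each word's memory vector."""
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), i):                    # words before w
            memory[w] += index[tokens[j]][perm_left]
        for j in range(i + 1, min(len(tokens), i + window + 1)):  # words after w
            memory[w] += index[tokens[j]][perm_right]

encode_order(["the", "bird", "eats", "worms"])
```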

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the versions of BEAGLE trained on order information only in Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.


Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer, but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed. Thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).
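The two mappings in Figure 5 can be reproduced in a few lines; the specific wirings below are one choice consistent with the input and output patterns given in the caption, since the figure itself is not reproduced here.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# (a) Random permutation: output k copies input p[k]; every input is used exactly once.
p = np.array([4, 0, 3, 1, 2])
print(x[p])                      # [0.5 0.1 0.4 0.2 0.3]

# (b) Random connections: input i feeds output c[i]; collisions are summed,
# and outputs with no incoming connection stay at zero.
c = np.array([1, 4, 2, 0, 2])
y = np.zeros_like(x)
np.add.at(y, c, x)
print(y)                         # [0.4 0.1 0.8 0.  0.2]
```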

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (with replacement) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule y_k = Σ_i x_i w_ik, in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column); a minimal sketch of this transformation follows Table 3. As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc., via reapplication of the same permutation function); T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.

Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task     Full Wikipedia                       TASA
         RC Sim 1   RC Sim 2   RP             RC Sim 1   RC Sim 2   RP
ESL      0.50†      0.48†      0.50†          0.32       0.32       0.36†
TOEFL    0.66†      0.66†      0.66†          0.73†      0.74†      0.77†
RG       0.66*      0.64*      0.65*          0.53*      0.52*      0.53*
MC       0.63*      0.61*      0.61*          0.53*      0.51*      0.52*
R        0.58*      0.56*      0.56*          0.55*      0.54*      0.56*
F        0.39*      0.37*      0.39*          0.32*      0.33*      0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
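The transformation T and its reapplication T² can be written compactly. The following NumPy sketch is an illustrative rendering of the update rule above; it stores one target index per input node rather than the explicit matrix w, which is equivalent and far more practical at RPM's dimensionality.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 25000

# One unidirectional connection per input node: node i feeds node target[i].
# Sampling WITH replacement means some nodes receive several inputs and some none.
target = rng.integers(0, D, size=D)

def T(x):
    """Random-connection transform: y_k = sum_i x_i w_ik, with one 1 per row of w."""
    y = np.zeros_like(x)
    np.add.at(y, target, x)      # collisions are summed; unhit nodes stay 0
    return y

def T2(x):
    """Applying the update rule twice yields the second transformation T^2."""
    return T(T(x))

x = rng.normal(size=D)
print(np.count_nonzero(T(x) == 0))   # roughly D/e nodes receive no input after one step
```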

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, as in traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions expressing general facts ("birds can generally fly") and of implementing exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior rest on assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, developing probabilistic models that take advantage of distributional information (from lexical cooccurrence) as well as experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals; the value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transformations provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
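The convolution theorem described here is easy to verify numerically. The sketch below (NumPy) computes circular convolution and circular correlation in the frequency domain and checks that correlation approximately inverts convolution, which is the property HRRs exploit for decoding; it is a generic illustration of the mathematics, not code from the models compared above.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 1024
a = rng.normal(0, 1 / np.sqrt(D), D)      # cue vector
b = rng.normal(0, 1 / np.sqrt(D), D)      # target vector

def cconv(x, y):
    """Circular convolution: FT of the result is FT(x) * FT(y)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def ccorr(x, y):
    """Circular correlation: FT of the result is conj(FT(x)) * FT(y)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)))

bound = cconv(a, b)                        # bind cue and target
decoded = ccorr(a, bound)                  # approximate retrieval of the target

# The decoded vector is a noisy copy of b: its cosine with b is high.
cos = decoded @ b / (np.linalg.norm(decoded) * np.linalg.norm(b))
print(round(float(cos), 2))                # typically around 0.7 for random vectors
```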

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing and RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we will show, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. Macdonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, UK, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates Publishers, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale, biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex—executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1," Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


with RPs as they are described in Sahlgren et al. [17]. A brief example will illustrate how this replacement changes the algorithm. Recall that in BEAGLE each word w is assigned a static "environmental" signal vector e_w as well as a dynamic memory vector m_w that is updated during training. Recall also that the memory vector of a word w is updated by adding the sum of the convolutions of all n-grams (up to some maximum length λ) containing w. Upon encountering the phrase "one two three" in a corpus, the memory vector for "one" would normally be updated as follows:

m_one = m_one + (Φ ⊛ e_two) + (Φ ⊛ e_two ⊛ e_three),   (5)

where Φ is a placeholder signal vector that represents the word whose representation is being updated. In the modified BEAGLE implementation used in this experiment, the memory vector for "one" would instead be updated as

m_one = m_one + Π e_two + Π² e_three.   (6)

In addition to order information, the complete BEAGLE model also updates memory vectors with context information: each time a word w appears in a document, the sum of all of the environmental signal vectors of words cooccurring with w in that document is added to m_w. In this experiment, context information is omitted; our concern here is with the comparison between convolution and random permutations with respect to the encoding of order information only. Encoding only order information also allowed us to be certain that any differences observed between convolution and RPs were unaffected by the particular stop list or frequency threshold applied to handle high-frequency words, as BEAGLE uses no stop list or frequency thresholding for the encoding of order information.
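To make the contrast between (5) and (6) concrete, the following minimal sketch (Python with NumPy) shows how the memory vector for "one" could be updated under each scheme. The helper names (cconv, permute) and the toy setup are ours rather than taken from the original BEAGLE or RPM code, and the sketch uses plain circular convolution, omitting the directional machinery of the full BEAGLE model.

```python
import numpy as np

D = 1024                                   # vector dimensionality used in Experiment 2
rng = np.random.default_rng(0)

# Static environmental signal vectors drawn from N(0, 1/sqrt(D))
e = {w: rng.normal(0, 1 / np.sqrt(D), D) for w in ["one", "two", "three"]}
m = {w: np.zeros(D) for w in e}            # dynamic memory vectors
phi = rng.normal(0, 1 / np.sqrt(D), D)     # placeholder signal vector (Phi)

def cconv(a, b):
    """Circular convolution computed in the frequency domain."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

perm = rng.permutation(D)                  # a fixed random permutation (Pi)

def permute(v, n=1):
    """Apply the permutation n times, i.e., Pi^n applied to v."""
    for _ in range(n):
        v = v[perm]
    return v

# Equation (5): convolution-based order update for "one" in "one two three"
m["one"] += cconv(phi, e["two"]) + cconv(cconv(phi, e["two"]), e["three"])

# Equation (6): permutation-based order update
m["one"] += permute(e["two"], 1) + permute(e["three"], 2)
```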

The RP-based BEAGLE implementation was trained on a 2.33 GB corpus (418 million tokens) of documents from Wikipedia (from [114]). Training on a corpus this large proved intractable for the slower convolution-based approach. Hence, we also trained both models on a 35 MB, six-million-token subset of this corpus, constructed by sampling random 10-sentence documents from the larger corpus without replacement. The vector dimensionality D was set to 1024, the lambda parameter indicating the maximum length of an encoded n-gram was set to 5, and the environmental signal vectors were drawn randomly from a normal distribution with μ = 0 and σ = 1/√D. Accuracy was evaluated on two synonymy tests, the English as a Second Language (ESL) and Test of English as a Foreign Language (TOEFL) synonymy assessments. Spearman rank correlations to human judgments of the semantic similarity of word pairs were calculated using the similarity judgments obtained from Rubenstein and Goodenough (RG; [115]), Miller and Charles (MC; [116]), Resnik (R; [117]), and Finkelstein et al. (F; [118]). A detailed description of these measures can be found in Recchia and Jones [52].

4.2. Results and Discussion. Table 1 provides a comparison of each variant of the BEAGLE model. Three points about these results merit special attention. First, there are no significant differences between the performance of convolution and RPs on the small corpus. Both performed nearly identically on F and TOEFL; neither showed any significant correlations with human data on RG, MC, or R, nor performed better than chance on ESL. Second, both models performed best by far on the TOEFL synonymy test, supporting Sahlgren et al.'s [17] claim that order information may indeed be more useful for synonymy tests than tests of semantic relatedness, as paradigmatic rather than syntagmatic information sources are most useful for the former. It is unclear exactly why neither model did particularly well on ESL, as models often achieve scores on it comparable to their scores on TOEFL [52]. Finally, only RPs were able to scale up to the full Wikipedia corpus, and doing so yielded strong benefits for every task.

Table 1: Comparisons of variants of BEAGLE differing by binding operation.

           Wikipedia subset                      Full Wikipedia
Task       Convolution    Random permutation    Random permutation
ESL        0.20           0.26                  0.32
TOEFL      0.46†          0.46†                 0.63†
RG         0.07           −0.06                 0.32*
MC         0.08           −0.01                 0.33*
R          0.06           −0.04                 0.35*
F          0.13*          0.12*                 0.33*

*Significant correlation, P < 0.05, one-tailed. †Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

Note that the absolute performance of these models is irrelevant to the important comparisons. Because we wanted to encode order information only to ensure a fair comparison between convolution and RPs, context information (e.g., information about words cooccurring within the document irrespective of order) was deliberately omitted from the model, despite the fact that the combination of context and order information is known to improve BEAGLE's absolute performance. In addition, to keep the simulation as well controlled as possible, we did not apply common transformations (e.g., frequency thresholding) known to improve performance on synonymy tests [119]. Finally, despite the fact that our corpus and evaluation tasks differed substantially from those used by Jones and Mewhort [19], we kept all parameters identical to those used in the original BEAGLE model. Although these decisions reduced the overall fit of the model to human data, they allowed us to conduct two key comparisons in a well-controlled fashion: the comparison between the performance of circular convolution and RPs when all other aspects of the model are held constant, and the comparison in performance between large and small versions of the same corpus. The performance boost afforded by the larger corpus illustrates that, in terms of fits to human data, model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, resp.).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, which we reimplemented to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (e.g., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1024 or 2048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) into BEAGLE, all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.

Table 2: Comparison of BEAGLE and RPM by corpus.

           Wikipedia subset        TASA                  Full Wikipedia
Task       BEAGLE     RPM          BEAGLE     RPM        RPM
ESL        0.24       0.27         0.30       0.36†      0.50†
TOEFL      0.47†      0.40†        0.54†      0.77†      0.66†
RG         0.10       0.10         0.21       0.53*      0.65*
MC         0.09       0.12         0.29       0.52*      0.61*
R          0.09       0.03         0.30       0.56*      0.56*
F          0.23*      0.19*        0.27*      0.33*      0.39*

*Significant correlation, P < 0.05, one-tailed. †Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
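The two implementation details that most distinguish RPM's encoding, sparse ternary signal vectors and direction-only permutations, can be sketched as follows (Python with NumPy; the function names, window handling, and seed are ours and only approximate the scheme described above):

```python
import numpy as np

D = 25000                  # RPM performs optimally at roughly this dimensionality
rng = np.random.default_rng(1)

def sparse_ternary(d=D, n_plus=2, n_minus=2):
    """Sparse ternary signal vector: all zeros except two +1s and two -1s."""
    v = np.zeros(d)
    idx = rng.choice(d, n_plus + n_minus, replace=False)
    v[idx[:n_plus]] = 1.0
    v[idx[n_plus:]] = -1.0
    return v

# Direction information only: one permutation for left context, one for right.
pi_left = rng.permutation(D)
pi_right = rng.permutation(D)

def update_order_vector(order_vec, left_words, right_words, signal):
    """Add direction-permuted signal vectors of neighbors (window of 2 per side)."""
    for w in left_words[-2:]:
        order_vec += signal[w][pi_left]
    for w in right_words[:2]:
        order_vec += signal[w][pi_right]
    return order_vec
```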

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data marginally higher than those of the corresponding order-only versions of BEAGLE from Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness and not tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, our results provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.



Figure 5: (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each node to a unique node on the same layer via copy connections. The network at left would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.5, 0.1, 0.4, 0.2, 0.3⟩. (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer, but which lacks the uniqueness constraint of a random permutation function. Multiple inputs feeding into the same node are summed. Thus, the network at right would transform an input pattern of ⟨0.1, 0.2, 0.3, 0.4, 0.5⟩ to ⟨0.4, 0.1, 0.8, 0, 0.2⟩. At high dimensions, replacing the random permutation function in the vector space model of Sahlgren et al. [17] with an arbitrarily connected network such as this has minimal impact on fits to human semantic similarity judgments (Experiment 4).

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (i.e., without the uniqueness constraint) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it; nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element y_j of an output vector y is updated according to the rule y_j = Σ_i x_i w_ij, in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc., via reapplication of the same permutation function).

T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.

Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

           Full Wikipedia                      TASA
Task       RC Sim 1   RC Sim 2   RP            RC Sim 1   RC Sim 2   RP
ESL        0.50†      0.48†      0.50†         0.32       0.32       0.36†
TOEFL      0.66†      0.66†      0.66†         0.73†      0.74†      0.77†
RG         0.66*      0.64*      0.65*         0.53*      0.52*      0.53*
MC         0.63*      0.61*      0.61*         0.53*      0.51*      0.52*
R          0.58*      0.56*      0.56*         0.55*      0.54*      0.56*
F          0.39*      0.37*      0.39*         0.32*      0.33*      0.33*

*Significant correlation, P < 0.05, one-tailed. †Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: for synonymy tests (ESL, TOEFL), values represent the percentage of correct responses; for all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.
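Before turning to the results, here is a minimal sketch of the random-connection transformation T described above (Python with NumPy; the names and the use of a dense index array are ours, not taken from the original simulations):

```python
import numpy as np

D = 25000
rng = np.random.default_rng(2)

# Every input node feeds exactly one randomly chosen output node, with no
# uniqueness constraint: targets may repeat, and some outputs receive no input.
targets = rng.integers(0, D, size=D)       # column of the single 1 in each row of w

def T(x, targets=targets, d=D):
    """y_j = sum_i x_i w_ij for a 0/1 matrix w with one randomly placed 1 per row."""
    y = np.zeros(d)
    np.add.at(y, targets, x)               # inputs mapping to the same node are summed
    return y                               # nodes with no incoming connection stay 0

def T2(x):
    """Applying the update rule twice gives the second transformation of Simulation 2."""
    return T(T(x))
```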

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (e.g., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model capable of representing propositions that express general facts ("birds can generally fly") and of implementing exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and of the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (i.e., differentiate, calculus, derivative, etc.) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, combining distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one. Spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
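As an illustration of the Fourier identities just described, the short sketch below (Python with NumPy; names and the toy dimensionality are ours) binds two random vectors by multiplying their Fourier transforms and then approximately unbinds one of them via the complex conjugate:

```python
import numpy as np

D = 1024
rng = np.random.default_rng(3)
a = rng.normal(0, 1 / np.sqrt(D), D)
b = rng.normal(0, 1 / np.sqrt(D), D)

# Binding: the FT of a circular convolution is the product of the FTs.
bound = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Unbinding: the FT of a circular correlation is conj(FT(a)) times FT(a conv b),
# which recovers a noisy copy of b.
decoded = np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(bound)))

cosine = np.dot(decoded, b) / (np.linalg.norm(decoded) * np.linalg.norm(b))
print(f"cosine(decoded, b) = {cosine:.2f}")   # typically well above chance
```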

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and Its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.
[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.
[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.
[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.
[110] C. Eliasmith, "Cognition with neurons: a large-scale, biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.
[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.
[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.
[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.
[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.
[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.
[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.
[118] L. Finkelstein, E. Gabrilovich, Y. Matias, et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.
[119] M. Sahlgren, The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.
[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.
[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.
[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.
[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.
[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.
[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.
[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.
[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.
[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.
[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.
[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.
[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.
[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.
[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.
[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.
[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.
[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.
[139] C. Eliasmith, T. C. Stewart, X. Choo, et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.
[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.
[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Philadelphia, Pa, USA, 2010.
[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.
[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.
[144] K. H. Pribram, "The primate frontal cortex – executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.
[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.
[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


Model scalability may be a more important factor than the precise method by which order information is integrated into the lexicon. Experiment 3 explores whether the relatively low fits to human data reported in Experiment 2 are improved if circular convolution and random permutations are used within their original models (i.e., the original parametrizations of BEAGLE and RPM, respectively).

5. Experiment 3: BEAGLE versus RPM on Linguistic Corpora

In contrast with Experiment 2, our present aim is to compare convolution and RPs within the context of their original models. Therefore, this simulation uses both the complete BEAGLE model (i.e., combining context and order information, rather than order information alone as in Experiment 2) and the complete RPM, using the original parameters and implementation details employed by Sahlgren et al. [17] (e.g., sparse signal vectors rather than the normally distributed random vectors used in Experiment 2, window size, and vector dimensionality). In addition, we compare model fits across the Wikipedia corpus from Experiment 2 and the well-known TASA corpus of school reading materials from kindergarten through high school used by Jones and Mewhort [19] and Sahlgren et al. [17].

5.1. Method. Besides using RPs in place of circular convolution, the specific implementation of RPM reported by Sahlgren et al. [17] differs from BEAGLE in several ways, and we reimplemented it to match their implementation of RPM as closely as possible. A primary difference is the representation of environmental signal vectors: RPM uses sparse ternary vectors, consisting of all 0s, two +1s, and two −1s, in place of random Gaussians. Additionally, RPM uses a window size of 2 words on either side for both order and context vectors, contrasting with BEAGLE's window size of 5 for order vectors and entire sentences for context vectors. Sahlgren et al. also report optimal performance when the order-encoding mechanism is restricted to "direction information" (i.e., the application of only two distinct permutations, one for words appearing before the word being encoded and another for words occurring immediately after, rather than a different permutation for words appearing at every possible distance from the encoded word). Other differences included lexicon size (74,100 words to BEAGLE's 90,000), dimensionality (RPM performs optimally at a dimensionality of approximately 25,000, contrasted with common BEAGLE dimensionalities of 1,024 or 2,048), and the handling of high-frequency words: RPM applies a frequency threshold, omitting the 87 most frequent words in the training corpus when adding both order and context information to memory vectors, while BEAGLE applies a standard stop list of 280 function words when adding context information only. We trained our implementation of RPM and the complete BEAGLE model (context + order information) on the Wikipedia subset as well as TASA. Other than the incorporation of context information (as described in Experiment 2) into BEAGLE, all BEAGLE parameters were identical to those used in Experiment 2. Accuracy scores and correlations were likewise calculated on the same battery of tasks used in Experiment 2.
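To make the representational difference concrete, the sketch below builds RPM-style sparse ternary signal vectors and accumulates order information using only two fixed permutations (one for left neighbors, one for right neighbors), as described above. It is a minimal illustration under stated assumptions, not the authors' code; the dimensionality, window size, and toy corpus are hypothetical placeholders.

```python
import numpy as np

def sparse_ternary_vector(dim, rng, n_nonzero=4):
    """Signal vector of all 0s except two +1s and two -1s at random positions."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=n_nonzero, replace=False)
    v[idx[: n_nonzero // 2]] = 1.0
    v[idx[n_nonzero // 2:]] = -1.0
    return v

def train_order_vectors(corpus, dim=2000, window=2, seed=1):
    """Accumulate order information with two fixed permutations (direction information only)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    signal = {w: sparse_ternary_vector(dim, rng) for w in vocab}
    memory = {w: np.zeros(dim) for w in vocab}
    perm_before = rng.permutation(dim)   # applied to left-neighbor signal vectors
    perm_after = rng.permutation(dim)    # applied to right-neighbor signal vectors
    for sent in corpus:
        for i, target in enumerate(sent):
            for j in range(max(0, i - window), i):
                memory[target] += signal[sent[j]][perm_before]
            for j in range(i + 1, min(len(sent), i + window + 1)):
                memory[target] += signal[sent[j]][perm_after]
    return memory

# Toy usage with a hypothetical two-sentence corpus.
corpus = [["the", "dog", "chased", "the", "cat"], ["the", "cat", "slept"]]
memory = train_order_vectors(corpus, dim=1000)
```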

Table 2: Comparison of BEAGLE and RPM by corpus.

Task     Wikipedia subset         TASA                     Full Wikipedia
         BEAGLE      RPM          BEAGLE      RPM          RPM
ESL      0.24        0.27         0.30        0.36†        0.50†
TOEFL    0.47†       0.40†        0.54†       0.77†        0.66†
RG       0.10        0.10         0.21        0.53*        0.65*
MC       0.09        0.12         0.29        0.52*        0.61*
R        0.09        0.03         0.30        0.56*        0.56*
F        0.23*       0.19*        0.27*       0.33*        0.39*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

5.2. Results and Discussion. Performance of all models on all corpora and evaluation tasks is reported in Table 2. On the small Wikipedia subset, BEAGLE and RPM performed similarly across evaluation tasks, with fits to human data being marginally higher than those of the versions of BEAGLE trained on order information only in Experiment 2. As in Experiment 2, only the RP-based model proved capable of scaling up to the full Wikipedia corpus, and it again achieved much better fits to human data on the large dataset. Consistent with previous research [16], the choice of corpus proved just as critical as the amount of training data, with both models performing significantly better on TASA than on the Wikipedia subset despite a similar quantity of text in both corpora. In some cases, RPM achieved even better fits to human data when trained on TASA than on the full Wikipedia corpus. It is perhaps not surprising that a TASA-trained RPM would result in superior performance on the TOEFL, as RPM was designed with RPs in mind from the start and optimized with respect to its performance on the TOEFL [17]. However, it is nonetheless intriguing that the version of RPM trained on the full Wikipedia in order space was able to perform well on several tasks that are typically conceived of as tests of semantic relatedness rather than tests of synonymy per se. While these results do not approach the high correlations on these tasks achieved by state-of-the-art machine learning methods in computational linguistics, they provide a rough approximation of the degree to which vector space architectures based on neurally plausible sequential encoding mechanisms can approximate the high-dimensional similarity space encoded in human semantic memory. Our final experiment investigates the degree to which a particularly neurologically plausible approximation of a random permutation function can achieve similar performance to RPs with respect to the evaluation tasks applied in Experiments 2 and 3.



Figure 5 (a) Visual representation of a random permutation function instantiated by a one-layer recurrent network that maps each nodeto a unique node on the same layer via copy connections The network at left would transform an input pattern of ⟨01 02 03 04 05⟩ to⟨05 01 04 02 03⟩ (b) A one-layer recurrent network in which each node is mapped to a random node on the same layer but which lacksthe uniqueness constraint of a random permutation function Multiple inputs feeding into the same node are summed Thus the network atrightwould transform an input pattern of ⟨01 02 03 04 05⟩ to ⟨04 01 08 0 02⟩ At high dimensions replacing the randompermutationfunction in the vector space model of Sahlgren et al [17] with an arbitrarily connected network such as this has minimal impact on fits tohuman semantic similarity judgments (Experiment 4)
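As a concrete check on the caption's worked example, the following sketch (not from the paper; the index arrays are chosen here to match the figure) applies a permutation mapping and a non-unique random mapping to the five-element pattern above.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# (a) Permutation: each input node copies its value to a unique output node.
# targets_a[i] is the output node receiving input node i.
targets_a = [1, 3, 4, 2, 0]
y_a = np.zeros_like(x)
for i, t in enumerate(targets_a):
    y_a[t] += x[i]
# y_a -> [0.5, 0.1, 0.4, 0.2, 0.3]

# (b) Random connections: targets need not be unique; collisions are summed,
# and output nodes with no incoming connection stay at zero.
targets_b = [1, 4, 2, 0, 2]
y_b = np.zeros_like(x)
for i, t in enumerate(targets_b):
    y_b[t] += x[i]
# y_b -> [0.4, 0.1, 0.8, 0.0, 0.2] (up to floating point)
```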

6. Experiment 4: Simulating RP with Randomly Connected Nodes

Various models of cognitive processes for storing and representing information make use of a random element, including self-organizing feature maps, the Boltzmann machine, randomly connected networks of sigma-pi neurons, and "reservoir computing" approaches such as currently popular liquid state models of cerebellar function [92, 120–122]. Random states are employed by such models as a stand-in for unknown or arbitrary components of an underlying structure [92]. Models that make use of randomness in this way should therefore be seen as succeeding in spite of randomness, not because of it. While the use of randomness does not therefore pose a problem for the biological plausibility of mathematical operations proposed to have neural analogues, defending random permutations' status as proper functions, whereby each node in the input maps to a unique node in the output (Figure 5(a)), would require some biologically plausible mechanism for ensuring that this uniqueness constraint was met. However, this constraint may impose needless restrictions. In this experiment, we investigate replacing random permutation functions with simple random connections (RCs), mapping each input node to a random, not necessarily unique, node in the same layer (Figure 5(b)). This is equivalent to the use of random sampling with replacement, as opposed to random sampling without replacement. While random sampling without replacement requires some process to be posited by which a single element is not selected twice, random sampling with replacement requires no such constraint. If a model employing random connections (i.e., without the uniqueness constraint) can achieve similar fits to human data as the same model employing random permutations, this would eliminate one constraint that might otherwise be a strike against the neural plausibility of the general method.

6.1. Method. We replaced the random permutation function in RPM with random connections, that is, random mappings between input and output nodes with no uniqueness constraint (e.g., Figure 5(b)). For each node, a unidirectional connection was generated to a random node in the same layer. After one application of the transformation, the value of each node was replaced with the sum of the values of the nodes feeding into it. Nodes with no incoming connections had their values set to zero. The operation can be formally described as a transformation T whereby each element of an output vector y is updated according to the rule $y_j = \sum_i x_i w_{ij}$, in which w is a matrix consisting of all zeros except for a randomly placed 1 in every row (with no constraints on the number of nonzero elements present in a single column). As described in Experiment 3, Sahlgren et al. [17] reported the highest fits to human data when encoding "direction information" with a window size of two words on either side of the target (i.e., the application of one permutation to the signal vectors of the two words appearing immediately before the word being encoded and another to the signal vectors of the two words occurring immediately after it). We approximated this technique with random connections in two different ways. In Simulation 1, we trained by applying the transformation T only to the signal vectors of the words appearing immediately before the encoded word, applying no transformation to the signal vectors of the words appearing afterward. In Simulation 2, we trained by applying the update rule twice to generate a second transformation T² (cf. Sahlgren et al.'s generation of successive permutation functions Π², Π³, etc., via reapplication of the same permutation function); T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.
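A minimal sketch of this update rule, assuming NumPy and a hypothetical dimensionality: the matrix w has one randomly placed 1 per row, so applying it twice yields the T² mapping used in Simulation 2. This is an illustration of the rule as stated above, not the code used in the experiments.

```python
import numpy as np

def random_connection_matrix(dim, rng):
    """w: all zeros except a randomly placed 1 in every row (columns unconstrained)."""
    w = np.zeros((dim, dim))
    targets = rng.integers(0, dim, size=dim)  # random sampling with replacement
    w[np.arange(dim), targets] = 1.0
    return w

def T(x, w):
    """y_j = sum_i x_i w_ij: each output node sums the inputs mapped onto it."""
    return x @ w

rng = np.random.default_rng(0)
dim = 1000                       # hypothetical dimensionality
w = random_connection_matrix(dim, rng)

x = rng.normal(size=dim)         # stand-in for a signal vector
before = T(x, w)                 # transformation applied to left-neighbor vectors
after = T(T(x, w), w)            # T^2, applied to right-neighbor vectors (Simulation 2)

# A true random permutation is the special case in which every column of w
# also contains exactly one 1:
perm = rng.permutation(dim)
w_perm = np.zeros((dim, dim))
w_perm[np.arange(dim), perm] = 1.0
```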


Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task     Full Wikipedia                        TASA
         RC Sim 1    RC Sim 2    RP            RC Sim 1    RC Sim 2    RP
ESL      0.50†       0.48†       0.50†         0.32        0.32        0.36†
TOEFL    0.66†       0.66†       0.66†         0.73†       0.74†       0.77†
RG       0.66*       0.64*       0.65*         0.53*       0.52*       0.53*
MC       0.63*       0.61*       0.61*         0.53*       0.51*       0.52*
R        0.58*       0.56*       0.56*         0.55*       0.54*       0.56*
F        0.39*       0.37*       0.39*         0.32*       0.33*       0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors, or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and that a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, as in traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and a difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model that can represent propositions expressing general facts ("birds can generally fly") and implement exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior rest on assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].
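As a rough illustration of the incremental, Hebbian flavor of the approach Gorrell [55] applies to LSA, the sketch below implements the Generalized Hebbian Algorithm (Sanger's rule) on a stream of observation vectors. The learning rate, dimensionalities, and random data are hypothetical, and this is not the authors' or Gorrell's implementation.

```python
import numpy as np

def gha_update(W, x, lr=0.01):
    """One Generalized Hebbian Algorithm (Sanger's rule) step.

    W: (k, d) matrix whose rows converge toward the leading principal
       components of the input stream; x: a (d,) observation vector.
    """
    y = W @ x                                        # component activations
    # Sanger's rule: dW = lr * (y x^T - lower_triangular(y y^T) W)
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(0)
d, k = 50, 5                                         # hypothetical sizes
W = rng.normal(scale=0.1, size=(k, d))
for _ in range(10000):
    x = rng.normal(size=d)                           # stand-in for one streamed co-occurrence row
    W = gha_update(W, x)
```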

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and of the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative) and can be thought of as a more semantically transparent analogue of the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, demonstrating that probabilistic models can exploit distributional information (from lexical cooccurrence) as well as experiential information (from human-generated semantic feature norms), and that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.
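As a toy illustration of representing a word as a weighting over topics (the counts below are hypothetical and hand-picked, not fit to any corpus, and this is not the model of [18] or [135] itself), one can normalize a topic-word count matrix into P(topic | word) and compare words by their topic distributions:

```python
import numpy as np

# Hypothetical topic-word counts: counts[t, w] = times word w was assigned to topic t.
vocab = ["calculus", "derivative", "dog", "leash"]
counts = np.array([
    [40, 35,  1,  0],   # a "math" topic
    [ 1,  2, 50, 30],   # a "pets" topic
    [ 5,  3,  4,  6],   # a diffuse topic
], dtype=float)

# Word meaning as a distribution over topics: P(topic | word) is proportional to counts[:, w].
p_topic_given_word = counts / counts.sum(axis=0, keepdims=True)

def similarity(w1, w2):
    """Dot product of topic distributions; words loading on shared topics score higher."""
    i, j = vocab.index(w1), vocab.index(w2)
    return float(p_topic_given_word[:, i] @ p_topic_given_word[:, j])

print(similarity("calculus", "derivative"))  # relatively high
print(similarity("calculus", "dog"))         # relatively low
```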

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one. Spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs) [121], neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
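The identities described above can be made concrete in a few lines. This is a generic sketch of FFT-based circular convolution and correlation of the kind used for binding and (approximate) decoding in holographic reduced representations [95]; it is not code from the present study, and the 2048-dimensional random vectors are placeholders.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution via the FFT: FT(a * b) = FT(a) . FT(b)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    """Circular correlation: conjugate of FT(a) times FT(b), which approximately undoes cconv."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

rng = np.random.default_rng(0)
d = 2048
x = rng.normal(0, 1 / np.sqrt(d), d)   # random "stimulus" vector
y = rng.normal(0, 1 / np.sqrt(d), d)   # random "response" vector

trace = cconv(x, y)                    # bound pair stored in a single trace
y_hat = ccorr(x, trace)                # noisy reconstruction of y given x

# The decoded vector is far more similar to the original response than to an
# unrelated random vector, which is what a clean-up memory exploits.
cos = y @ y_hat / (np.linalg.norm(y) * np.linalg.norm(y_hat))
print(round(float(cos), 2))
```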

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing and RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and in numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods: Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].
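One way to make the correspondence noted by Olshausen and Field [146] concrete is the standard sparse coding energy (a textbook formulation, not an equation given in this paper), in which an input x is reconstructed from a dictionary D and coefficients a:

```latex
% Sparse coding energy: reconstruction error plus a sparseness penalty
E(\mathbf{a}) = \big\lVert \mathbf{x} - D\mathbf{a} \big\rVert^{2} + \lambda \sum_{i} S(a_{i})
% Reading exp(-E) as an unnormalized posterior, the reconstruction term plays
% the role of a (Gaussian) log-likelihood and the sparseness penalty S plays
% the role of a log-prior favoring codes with few active units.
```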

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 5). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, “Models of semantic memory,” in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.

[2] T. K. Landauer and S. T. Dumais, “A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge,” Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.

[3] K. Lund and C. Burgess, “Producing high-dimensional semantic spaces from lexical co-occurrence,” Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.

[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, “Activating event knowledge,” Cognition, vol. 111, no. 2, pp. 151–167, 2009.

[5] L. L. Jones, “Pure mediated priming: a retrospective semantic matching model,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.

[6] T. Warren, K. McConnell, and K. Rayner, “Effects of context on eye movements when reading about possible and impossible events,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.

[7] B. T. Johns and M. N. Jones, “Predicting word-naming and lexical decision times from a semantic space model,” in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.

[8] B. T. Johns and M. N. Jones, “False recognition through semantic amplification,” in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.

[9] W. Kintsch and P. Mangalath, “The construction of meaning,” Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.

[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.

[11] J. Mitchell and M. Lapata, “Composition in distributional models of semantics,” Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.

[12] A. Utsumi and M. Sakamoto, “Predicative metaphor comprehension as indirect categorization,” in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.

[13] C. A. Perfetti, “The limits of co-occurrence: tools and theories in language research,” Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.

[14] B. Riordan and M. N. Jones, “Comparing semantic space models using child-directed speech,” in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.

[15] B. Riordan and M. N. Jones, “Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation,” Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.

[16] J. A. Bullinaria and J. P. Levy, “Extracting semantic representations from word co-occurrence statistics: a computational study,” Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.

[17] M. Sahlgren, A. Holst, and P. Kanerva, “Permutations as a means to encode order in word space,” in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.

[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, “Topics in semantic representation,” Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.

[19] M. N. Jones and D. J. K. Mewhort, “Representing word meaning and order information in a composite holographic lexicon,” Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.

[20] S. Dennis, “A memory-based theory of verbal cognition,” Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.

[21] P. Kanerva, J. Kristoferson, and A. Holst, “Random indexing of text samples for latent semantic analysis,” in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.

[22] P. J. Kwantes, “Using context to build semantics,” Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.

[23] J. R. Firth, “A synopsis of linguistic theory 1930–1955,” in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.

[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.

[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.

[26] A. M. Glenberg and D. A. Robertson, “Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning,” Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.

[27] M. Andrews, G. Vigliocco, and D. Vinson, “Integrating experiential and distributional data to learn semantic representations,” Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.

[28] K. Durda, L. Buchanan, and R. Caron, “Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory,” Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.

[29] M. N. Jones and G. L. Recchia, “You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models,” in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.

[30] M. Steyvers, “Combining feature norms and text data with topic models,” Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.

[31] R. L. Gomez and L. A. Gerken, “Infant artificial language learning and language acquisition,” Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.

[32] J. R. Saffran, “Statistical language learning: mechanisms and constraints,” Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.

[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, “Implicit statistical learning in language processing: word predictability is the key,” Cognition, vol. 114, no. 3, pp. 356–371, 2010.

[34] D. J. Foss, “A discourse on semantic priming,” Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.

[35] M. Hare, K. McRae, and J. L. Elman, “Sense and structure: meaning as a determinant of verb subcategorization preferences,” Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.

[36] M. Hare, K. McRae, and J. L. Elman, “Admitting that admitting verb sense into corpus analyses makes sense,” Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.

[37] M. C. MacDonald, “The interaction of lexical and syntactic ambiguity,” Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.

[38] M. C. MacDonald, “Lexical representations and sentence processing: an introduction,” Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.

[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, “A basis for generating expectancies for verbs from nouns,” Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.

[40] R. Taraban and J. L. McClelland, “Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations,” Journal of Memory and Language, vol. 27, no. 6, pp. 597–632, 1988.

[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, “How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans,” in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.

[42] C. Burgess, “From simple associations to the building blocks of language: modeling meaning in memory with the HAL model,” Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.

[43] W. Kintsch, “An overview of top-down and bottom-up effects in comprehension: the CI perspective,” Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.

[44] J. L. Elman, “On the meaning of words and dinosaur bones: lexical knowledge without a lexicon,” Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.

[45] G. McKoon and R. Ratcliff, “Meaning through syntax: language comprehension and the reduced relative clause construction,” Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.

[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.

[47] P. G. O'Seaghdha, “The dependence of lexical relatedness effects on syntactic connectedness,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.

[48] M. Andrews and G. Vigliocco, “The hidden Markov topic model: a probabilistic model of semantic representation,” Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.

[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, “High-dimensional semantic space accounts of priming,” Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.

[50] C. Shaoul and C. Westbury, “Exploring lexical co-occurrence space using HiDEx,” Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.

[51] M. M. Louwerse, “Symbol interdependency in symbolic and embodied cognition,” Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.

[52] G. L. Recchia and M. N. Jones, “More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis,” Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.

[53] T. R. Risley and B. Hart, “Promoting early language development,” in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.

[54] M. Brand, “Fast low-rank modifications of the thin singular value decomposition,” Linear Algebra and Its Applications, vol. 415, no. 1, pp. 20–30, 2006.

[55] G. Gorrell, “Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing,” in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.

[56] M. Louwerse and L. Connell, “A taste of words: linguistic context and perceptual simulation predict the modality of words,” Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.

[57] J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.

[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.

[59] L. Onnis and M. H. Christiansen, “Lexical categories at the edge of the word,” Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.

[60] J. A. Legate and C. Yang, “Assessing child and adult grammar,” in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.

[61] H. A. Simon, “The architecture of complexity: hierarchic systems,” in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.

[62] J. L. Elman, “Learning and development in neural networks: the importance of starting small,” Cognition, vol. 48, no. 1, pp. 71–99, 1993.

[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, “Graded state machines: the representation of temporal contingencies in simple recurrent networks,” Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.

[64] M. F. St. John and J. L. McClelland, “Learning and applying contextual constraints in sentence comprehension,” Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.

[65] M. W. Howard, “Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function,” in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.

[66] M. W. Howard and M. J. Kahana, “A distributed representation of temporal context,” Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.

[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, “Constructing semantic representations from a gradually changing representation of temporal context,” Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.

[68] V. A. Rao and M. W. Howard, “Retrieved context and the discovery of semantic structure,” Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.

[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, “Learning grammatical structure with echo state networks,” Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.

[70] J. L. Elman, “Distributed representations, simple recurrent networks, and grammatical structure,” Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.

[71] R. Veale and M. Scheutz, “Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction,” in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.

[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[73] Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.

[74] Y. Bengio, “Deep learning of representations: looking forward,” in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.

[75] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.

[76] J. Martens and I. Sutskever, “Learning recurrent neural networks with Hessian-free optimization,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.

[77] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.

[78] A. Mnih and G. E. Hinton, “A scalable hierarchical distributed language model,” in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.

[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, “Deep learning via semi-supervised embedding,” in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.

[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.

[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.

[82] I. Sutskever, J. Martens, and G. E. Hinton, “Generating text with recurrent neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.

[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal deep learning,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.

[84] C. Burgess and K. Lund, “The dynamics of meaning in memory,” in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.

[85] K. Durda and L. Buchanan, “WINDSORS: Windsor improved norms of distance and similarity of representations of semantics,” Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.

[86] D. Rohde, L. Gonnerman, and D. Plaut, “An improved model of semantic similarity based on lexical co-occurrence,” in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.

[87] L. Buchanan, C. Burgess, and K. Lund, “Overcrowding in semantic neighborhoods: modeling deep dyslexia,” Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.

[88] L. Buchanan, C. Westbury, and C. Burgess, “Characterizing semantic space: neighborhood effects in word recognition,” Psychonomic Bulletin & Review, vol. 8, no. 3, pp. 531–544, 2001.

[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, “The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes,” in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.

[90] D. Song and P. Bruza, “Discovering information flow using a high dimensional conceptual space,” in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.

[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, “Integrating topics and syntax,” Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.

[92] P. Kanerva, “Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors,” Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.

[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, “Toward a scalable holographic word-form representation,” Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.

[94] M. Sahlgren, “An introduction to random indexing,” in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.

[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.

[96] T. K. Landauer and S. Dumais, “Latent semantic analysis,” Scholarpedia, vol. 3, no. 11, p. 4356, 2013.

[97] D. C. van Leijenhorst and T. P. van der Weide, “A formal derivation of Heaps' law,” Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.

[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.

[99] J. Karlgren and M. Sahlgren, “From words to understanding,” in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.

[100] T. A. Plate, “Holographic reduced representations,” IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.

[101] A. Borsellino and T. Poggio, “Convolution and correlation algebras,” Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.

[102] J. M. Eich, “A composite holographic associative recall model,” Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.

[103] J. M. Eich, “Levels of processing, encoding specificity, elaboration, and CHARM,” Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.

[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.

[105] B. B. Murdock, “A theory for the storage and retrieval of item and associative information,” Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.

[106] B. B. Murdock, “Item and associative information in a distributed memory model,” Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.

[107] J. C. R. Licklider, “Three auditory theories,” in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.

[108] K. H. Pribram, “Convolution and matrix systems as content addressible distributed brain processes in perception and memory,” Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.

[109] W. E. Reichardt, “Cybernetics of the insect optomotor response,” in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.

[110] C. Eliasmith, “Cognition with neurons: a large-scale, biologically realistic model of the Wason task,” in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.

[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.

[112] P. Kanerva, “Binary spatter-coding of ordered k-tuples,” in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.

[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, “Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation,” in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.

[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, “Distributional statistics and thematic role relationships,” in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.

[115] H. Rubenstein and J. Goodenough, “Contextual correlates of synonymy,” Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[116] G. A. Miller and W. G. Charles, “Contextual correlates of semantic similarity,” Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.

[117] P. Resnik, “Using information content to evaluate semantic similarity,” in Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.

[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., “Placing search in context: the concept revisited,” ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.

[119] M. Sahlgren, The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.

[120] M. Lukosevicius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.

[121] T. A. Plate, “Randomly connected sigma-pi neurons can form associator networks,” Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.

[122] T. Yamazaki and S. Tanaka, “The cerebellum as a liquid state machine,” Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.

[123] J. Garson, “Connectionism,” in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.

[124] K. Murakoshi and K. Suganuma, “A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses,” BioSystems, vol. 90, no. 3, pp. 903–910, 2007.

[125] C. Cuppini, E. Magosso, and M. Ursino, “A neural network model of semantic memory linking feature-based object representation and words,” BioSystems, vol. 96, no. 3, pp. 195–205, 2009.

[126] R. N. Shepard, “Neural nets for generalization and classification: comment on Staddon and Reid,” Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.

[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, “On the nature and scope of featural representations of word meaning,” Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.

[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, “Representing the meanings of object and action words: the featural and unitary semantic space hypothesis,” Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.

[129] M. Baroni and A. Lenci, “Concepts and properties in word spaces,” Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.

[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, “Organizing the space and behavior of semantic models,” in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.

[131] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[132] H. Schütze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.

[133] P. Gärdenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.

[134] W. Lowe, “Towards a theory of semantic space,” in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.

[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.

[137] J. G. Sutherland, “A holographic model of memory, learning and expression,” International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.

[138] F. Choo and C. Eliasmith, “A spiking neuron model of serial-order recall,” in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.

[139] C. Eliasmith, T. C. Stewart, X. Choo et al., “A large-scale model of the functioning brain,” Science, vol. 338, no. 6111, pp. 1202–1205, 2012.

[140] D. Rasmussen and C. Eliasmith, “A neural model of rule generation in inductive reasoning,” Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.

[141] T. C. Stewart, X. Choo, and C. Eliasmith, “Dynamic behaviour of a spiking model of action selection in the basal ganglia,” in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Philadelphia, Pa, USA, 2010.

[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, “Responses of striate cortex cells to grating and checkerboard patterns,” Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.

[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.

[144] K. H. Pribram, “The primate frontal cortex – executive of the brain,” in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.

[145] P. Földiák and D. Endres, “Sparse coding,” Scholarpedia, vol. 3, no. 1, article 2984, 2008.

[146] B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: a strategy employed by V1?” Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.


[53] T R Risley and B Hart ldquoPromoting early language develop-mentrdquo in The Crisis in Youth Mental Health Critical Issues andEffective Programs Volume 4 Early Intervention Programs andPolicies N F Watt C Ayoub R H Bradley J E Puma andW A LeBoeuf Eds pp 83ndash88 Praeger Westport Conn USA2006

[54] M Brand ldquoFast low-rank modifications of the thin singularvalue decompositionrdquo Linear Algebra and its Applications vol415 no 1 pp 20ndash30 2006

[55] G Gorrell ldquoGeneralized Hebbian algorithm for incrementalsingular value decomposition in natural language processingrdquoin Proceedings of the 11th Conference of the European Chapter ofthe Association for Computational Linguistics S Pado J Readand V Seretan Eds pp 97ndash104 April 2006

[56] M Louwerse and L Connell ldquoA taste of words linguisticcontext and perceptual simulation predict the modality ofwordsrdquo Cognitive Science vol 35 no 2 pp 381ndash398 2011

[57] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990

[58] T T Rogers and J LMcClelland Semantic Cognition A ParallelDistributed Processing Approach MIT Press Cambridge UK2004

[59] L Onnis andMH Christiansen ldquoLexical categories at the edgeof the wordrdquo Cognitive Science vol 32 no 1 pp 184ndash221 2008

[60] J A Legate andC Yang ldquoAssessing child and adult grammarrdquo inRich Languages from Poor Inputs M Piattelli-Palmarini and RC Berwick Eds pp 68ndash182 Oxford University Press OxfordUK 2013

[61] H A Simon ldquoThe architecture of complexity hierarchic sys-temsrdquo inThe Sciences of the Artificial H A Simon Ed pp 183ndash216 MIT Press Cambridge Mass USA 2nd edition 1996

[62] J L Elman ldquoLearning and development in neural networks theimportance of starting smallrdquoCognition vol 48 no 1 pp 71ndash991993

[63] D Servan-Schreiber A Cleeremans and J L McclellandldquoGraded state machines the representation of temporal contin-gencies in simple recurrent networksrdquoMachine Learning vol 7no 2-3 pp 161ndash193 1991

[64] M F St John and J L McClelland ldquoLearning and applyingcontextual constraints in sentence comprehensionrdquo ArtificialIntelligence vol 46 no 1-2 pp 217ndash257 1990

[65] M W Howard ldquoNeurobiology can provide meaningful con-straints to cognitive modeling the temporal context model asa description of medial temporal lobe functionrdquo in Proceedingsof the 37th Annual Meeting of the Society for MathematicalPsychology Ann Arbor Mich USA July 2004

[66] M W Howard and M J Kahana ldquoA distributed representationof temporal contextrdquo Journal of Mathematical Psychology vol46 no 3 pp 269ndash299 2002

[67] M W Howard K H Shankar and U K K Jagadisan ldquoCon-structing semantic representations from a gradually changingrepresentation of temporal contextrdquo Topics in Cognitive Sciencevol 3 no 1 pp 48ndash73 2011

[68] V A Rao and M W Howard ldquoRetrieved context and the dis-covery of semantic structurerdquo Advances in Neural InformationProcessing Systems vol 20 pp 1193ndash1200 2008

[69] M H Tong A D Bickett E M Christiansen and G WCottrell ldquoLearning grammatical structure with echo state net-worksrdquo Neural Networks vol 20 no 3 pp 424ndash432 2007

[70] J L Elman ldquoDistributed representations simple recurrentnetworks and grammatical structurerdquo Machine Learning vol7 no 2-3 pp 195ndash225 1991

[71] R Veale and M Scheutz ldquoNeural circuits for any-time phraserecognition with applications in cognitive models and human-robot interactionrdquo in Proceedings of the 34th Annual Conferenceof the Cognitive Science Society N Miyake D Peebles and R PCooper Eds pp 1072ndash1077 Cognitive Science Society AustinTex USA 2012

[72] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[73] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[74] Y Bengio ldquoDeep learning of representations looking forwardrdquoin Statistical Language and Speech Processing vol 7978 ofLecture Notes in Computer Science pp 1ndash37 Springer BerlinGermany 2013

[75] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[76] J Martens and I Sutskever ldquoLearning recurrent neural net-works with Hessian-free optimizationrdquo in Proceedings of the28th International Conference on Machine Learning (ICML 11)pp 1033ndash1040 Bellevue Wash USA July 2011

[77] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conferenceon Machine Learning (ICML rsquo08) pp 160ndash167 ACM HelsinkiFinland July 2008

[78] A Mnih and G E Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the Advances in NeuralInformation Processing Systems (NIPS rsquo09) pp 1081ndash1088 2009

[79] JWeston F Ratle HMobahi and R Collobert ldquoDeep learningvia semi-supervised embeddingrdquo in Neural Networks Tricks ofthe Trade vol 7700 of Lecture Notes in Computer Science pp639ndash655 Springer Berlin Germany 2012

Computational Intelligence and Neuroscience 17

[80] T Mikolov I Sutskever K Chen G S Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems 26 pp 3111ndash3119 2013

[81] I Sutskever J Martens G Dahl and G Hinton ldquoOn theimportance of initialization and momentum in deep learningrdquoin Proceedings of the 30th International Conference on MachineLearning (ICML rsquo13) pp 1139ndash1147 2013

[82] I Sutskever J Martens and G E Hinton ldquoGenerating textwith recurrent neural networksrdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp1017ndash1024 New York NY USA July 2011

[83] J Ngiam A Khosla M Kim J Nam H Lee and A YNg ldquoMultimodal deep learningrdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML 11) pp689ndash696 Bellevue Wash USA July 2011

[84] C Burgess andK Lund ldquoThedynamics ofmeaning inmemoryrdquoinCognitiveDynamics Conceptual andRepresentational Changein Humans andMachines E Dietrich Eric and A BMarkmanEds pp 117ndash156 Lawrence Erlbaum Associates PublishersMahwah NJ USA 2000

[85] K Durda and L Buchanan ldquoWINDSORS windsor improvednorms of distance and similarity of representations of seman-ticsrdquo Behavior Research Methods vol 40 no 3 pp 705ndash7122008

[86] D Rohde L Gonnerman and D Plaut ldquoAn improved model ofsemantic similarity based on lexical co-occurencerdquo in Proceed-ings of the AnnualMeeting of the Cognitive Science Society 2006

[87] L Buchanan C Burgess and K Lund ldquoOvercrowding insemantic neighborhoods modeling deep dyslexiardquo Brain andCognition vol 32 no 2 pp 111ndash114 1996

[88] L Buchanan C Westbury and C Burgess ldquoCharacterizingsemantic space neighborhood effects in word recognitionrdquoPsychonomic Bulletin andReview vol 8 no 3 pp 531ndash544 2001

[89] P D Siakaluk C Westbury and L Buchanan ldquoThe effects offiller type and semantic distance in naming evidence for changein time criterion rather than change in routesrdquo in Proceedingsof the 13th Annual Meeting of the Canadian Society for BrainBehaviour and Cognitive Science Hamilton CanadaMay 2003

[90] D Song and P Bruza ldquoDiscovering information flow using ahigh dimensional conceptual spacerdquo in Proceedings of the 24thAnnual International Conference on Research and Developmentin Information Retrieval W B Croft D J Harper D H Kraftand J Zobel Eds pp 327ndash333 Association for ComputingMachinery New York NY USA 2001

[91] T L Griffiths M Steyvers D M Blei and J B TenenbaumldquoIntegrating topics and syntaxrdquo Advances in Neural InformationProcessing Systems vol 17 pp 537ndash544 2005

[92] P Kanerva ldquoHyperdimensional computing an introduction tocomputing in distributed representationwith high-dimensionalrandom vectorsrdquo Cognitive Computation vol 1 no 2 pp 139ndash159 2009

[93] G E Cox G Kachergis G Recchia and M N Jones ldquoTowarda scalable holographic word-form representationrdquo BehaviorResearch Methods vol 43 no 3 pp 602ndash615 2011

[94] M Sahlgren ldquoAn introduction to random indexingrdquo in Pro-ceedings of the Methods and Applications of Semantic IndexingWorkshop at the 7th International Conference on Terminologyand Knowledge Engineering Copenhagen Denmark August2005

[95] T PlateHolographic Reduced Representation Distributed Repre-sentation for Cognitive Structures CSLI Publications StanfordCalif USA 2003

[96] T K Landauer and S Dumais ldquoLatent semantic analysisrdquoScholarpedia vol 3 no 11 p 4356 2013

[97] D C van Leijenhorst and T P van der Weide ldquoA formalderivation of Heapsrsquo Lawrdquo Information Sciences vol 170 no 2ndash4 pp 263ndash272 2005

[98] M N Jones Learning semantics and word order from statisticalredundancies in language a computational approach [PhDdissertation] Queenrsquos University 2005

[99] J Karlgren and M Sahlgren ldquoFrom words to understandingrdquoin Foundations of Real-World Intelligence Y Uesaka P Kanervaand H Asoh Eds CSLI Publications Stanford Calif USA2001

[100] T A Plate ldquoHolographic reduced representationsrdquo IEEE Trans-actions on Neural Networks vol 6 no 3 pp 623ndash641 1995

[101] A Borsellino and T Poggio ldquoConvolution and correlationalgebrasrdquo Kybernetik vol 13 no 2 pp 113ndash122 1973

[102] J M Eich ldquoA composite holographic associative recall modelrdquoPsychological Review vol 89 no 6 pp 627ndash661 1982

[103] J M Eich ldquoLevels of processing encoding specificity elabora-tion andCHARMrdquo Psychological Review vol 92 no 1 pp 1ndash381985

[104] P Liepa Models of Content Addressable Distributed AssociativeMemory (CADAM) University of Toronto 1977

[105] B B Murdock ldquoA theory for the storage and retrieval of itemand associative informationrdquo Psychological Review vol 89 no6 pp 609ndash626 1982

[106] B B Murdock ldquoItem and associative information in a dis-tributed memory modelrdquo Journal of Mathematical Psychologyvol 36 no 1 pp 68ndash99 1992



Table 3: Comparison of RPM using random connections (RC) versus random permutations (RP).

Task      Full Wikipedia                       TASA
          RC Sim 1   RC Sim 2   RP             RC Sim 1   RC Sim 2   RP
ESL       0.50†      0.48†      0.50†          0.32       0.32       0.36†
TOEFL     0.66†      0.66†      0.66†          0.73†      0.74†      0.77†
RG        0.66*      0.64*      0.65*          0.53*      0.52*      0.53*
MC        0.63*      0.61*      0.61*          0.53*      0.51*      0.52*
R         0.58*      0.56*      0.56*          0.55*      0.54*      0.56*
F         0.39*      0.37*      0.39*          0.32*      0.33*      0.33*

*Significant correlation, P < 0.05, one-tailed.
†Accuracy score differs significantly from chance, P < 0.05, one-tailed.
Note: For synonymy tests (ESL, TOEFL), values represent the percentage of correct responses. For all other tasks, values represent Spearman rank correlations between human judgments of semantic similarity and those of the model. Abbreviations for tasks are defined in the main text of the paper.

T was applied to the signal vectors of the words appearing immediately before the encoded word, while T² was applied to the signal vectors of the words appearing immediately after. All other details of RPM and all evaluation tasks were identical to those used in Experiment 3.

6.2. Results and Discussion. Table 3 reports simulation results across the full Wikipedia corpus and TASA. RC Sim 1 refers to the random connection simulation referred to as Simulation 1 in Methods, RC Sim 2 to the one referred to as Simulation 2, and RP to the original RPM simulation conducted in Experiment 3. Across tasks, no consistent advantage was observed for the version of RPM using RPs when contrasted with either of the simulations in which RPs were replaced with random connections. These results suggest that the uniqueness constraint is unnecessary for a network of randomly connected nodes to encode order information and that a more biologically plausible approximation of RPs can achieve similar results. However, it is also worth noting that the lack of a one-to-one correspondence between inputs and outputs confers certain disadvantages, such as the lack of an exact inverse. In addition, nodes with no incoming connections will have a value of zero after a single transformation, as well as after any successive transformations. Thus, in Simulation 2, the number of nodes actively engaged in representing any word is limited to those with at least one incoming connection. For applications of RPs that are restricted to low-dimensional vectors or which require many repetitions of a transformation (e.g., models that require the storage of very long chains of ordered elements without chunking), the removal of the uniqueness constraint may no longer produce comparable results.
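To make the point about unconnected nodes concrete, the following minimal sketch (in Python with NumPy; the dimensionality and mappings are illustrative assumptions, not the simulation code used in the experiments) contrasts a true random permutation with a non-unique random-connection map and counts how many units are left with no input:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    perm = rng.permutation(n)              # one-to-one mapping: a true permutation
    rand_map = rng.integers(0, n, size=n)  # arbitrary mapping: no uniqueness constraint

    def transform(v, mapping):
        """Route each input unit's value to the unit given by mapping[i]."""
        out = np.zeros(len(v))
        np.add.at(out, mapping, v)         # duplicate targets accumulate their inputs
        return out

    v = rng.normal(size=n)
    print((transform(v, perm) == 0).sum())      # 0: every unit receives exactly one input
    print((transform(v, rand_map) == 0).sum())  # roughly n/e, about 368 units receive no input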

7. General Discussion

The current study performed a number of controlled comparisons of circular convolution and random permutation (RP), namely, as means of encoding paired associates (Experiment 1) as well as encoding sequential information in word space, both in a single model that differed only in the use of RPs versus convolution (Experiment 2) and in the context of two different models in which each operation had been utilized in previous work (Experiment 3). Finally, a variant of random permutations was explored in which the constraint of a one-to-one mapping between input and output vectors was relaxed (Experiment 4). Experiment 1 showed that RPs are capable of high retrieval accuracy even when many paired associates are stored in a single memory vector, and their storage capacity appears to be better than that of circular convolution for low dimensionalities. Experiments 2 and 3 revealed that both methods achieved approximately equal performance on a battery of semantic tasks when trained on a small corpus, but that RPs were ultimately capable of achieving superior performance due to their higher scalability. Finally, Experiment 4 demonstrated that RPs' uniqueness constraint (i.e., mapping each element of the input to a unique element of the output) is not essential, and a completely random element mapping function can achieve similar fits to human data on the tasks from Experiments 2 and 3.

7.1. Cognitive Plausibility in Semantic Space Models. Computational models of human semantic representation vary widely in their goals and the strengths of their assumptions. The strongest claim for a model of semantic memory is that computational units in the algorithm implemented by the model have direct neural correlates, for example, traditional connectionist networks in which the nodes of the network are analogues of individual neurons or populations of neurons. Advances in neurology rendered several of the assumptions of early learning algorithms, such as the reverse connections that would be necessary for backpropagation, largely untenable [123]. Mismatches with human learning abilities, such as the lack of fast mapping, catastrophic interference, and difficulty with learning systematic rules that can be generalized beyond particular linguistic exemplars, remain problematic for many neural networks as well.

Although we have learned much more about the underpinnings of neural and synaptic function, efforts to make existing vector space models more neurally plausible are few. One notable exception is Gorrell's [55] implementation of an incremental version of LSA that makes use of the Generalized Hebbian Algorithm, a linear feedforward neural network model for unsupervised learning, to derive the decomposition of a mostly unseen word-by-document matrix based on serially presented observations. There are also neurally inspired models of semantic representation that do not attempt to construct a vector space but nonetheless are capable of accounting for some empirical phenomena. These include the work of Murakoshi and Suganuma [124], who present a neural circuit model that represents propositions expressing general facts ("birds can generally fly") and implements exceptions ("an ostrich is a bird but cannot fly") via a model of the spike-timing-dependent plasticity of inhibitory synapses. Cuppini et al. [125] invoke gamma-band synchronization of neural oscillators and a time-dependent Hebbian rule to link lexical representations with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior have assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].
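The Generalized Hebbian Algorithm mentioned above in connection with Gorrell [55] is straightforward to sketch. The following is a generic implementation of Sanger's rule on a stream of toy document vectors (a hedged illustration, not Gorrell's implementation; the corpus, dimensionality, and learning rate are invented for the example), whose weight rows move toward the leading principal directions of the input stream:

    import numpy as np

    def gha_update(W, x, lr=0.005):
        """One Generalized Hebbian Algorithm (Sanger's rule) step.
        Rows of W (k x n) move toward the top-k principal directions
        of the inputs; x is a single (n,) observation."""
        y = W @ x                                            # component activations
        W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
        return W

    rng = np.random.default_rng(0)
    n_words, k = 50, 3
    W = rng.normal(scale=0.1, size=(k, n_words))
    for _ in range(2000):                                    # serially presented "documents"
        doc = rng.poisson(0.2, size=n_words).astype(float)   # toy bag-of-words counts
        W = gha_update(W, doc - doc.mean())                  # center, then update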

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence and represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, combining distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms) and demonstrating that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.
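As a concrete illustration of topics as distributions over words, the sketch below fits a generic LDA model with scikit-learn on a four-document toy corpus (this is not the Griffiths et al. [18] model or its inference procedure; the corpus and parameter settings are invented for the example):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the derivative of the function is found by differentiating",
        "integrate the function to recover the original quantity",
        "the dog chased the cat across the yard",
        "a cat and a dog played in the yard",
    ]
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)                      # document-by-word count matrix
    vocab = vec.get_feature_names_out()

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    # each row of components_ is an (unnormalized) distribution over words: a "topic"
    for topic in lda.components_:
        print([vocab[i] for i in topic.argsort()[-3:]])

With a corpus this small the recovered topics are only suggestive, but at scale they tend to align with discourse themes in the way described above.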

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation. Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one. Spike density over a time scale or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals. The value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101–105, 137]) and in models of other cognitive phenomena [107–110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transformations provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
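These identities are easy to verify numerically. The sketch below binds a random cue and target with circular convolution computed via the FFT and decodes the target with circular correlation (the dimensionality and Gaussian element distribution follow Plate's [95] conventions, but the snippet is an illustration rather than the BEAGLE implementation):

    import numpy as np

    def cconv(a, b):
        """Circular convolution: the FT of the convolution is FT(a) * FT(b)."""
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def ccorr(a, b):
        """Circular correlation, an approximate inverse: conj(FT(a)) * FT(b)."""
        return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

    n = 1024
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0 / np.sqrt(n), n)   # random cue vector
    y = rng.normal(0.0, 1.0 / np.sqrt(n), n)   # random target vector
    trace = cconv(x, y)                        # bound pair stored in a single vector
    y_hat = ccorr(x, trace)                    # decode the target with the cue

    cos = y @ y_hat / (np.linalg.norm(y) * np.linalg.norm(y_hat))
    print(round(cos, 2))   # well above chance: a noisy but recognizable copy of y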

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model. Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods. Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].
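To ground the idea of sparse distributed signal vectors, here is a minimal random indexing sketch (the dimensionality, sparsity, vocabulary, and three-sentence "corpus" are illustrative assumptions rather than RPM's parameters): each word receives a fixed sparse ternary index vector, and its context vector accumulates the index vectors of the words it cooccurs with.

    import numpy as np

    rng = np.random.default_rng(2)
    DIM, NNZ = 2000, 20   # vector length and number of nonzero (+1/-1) entries

    def index_vector():
        """Sparse ternary signal vector: NNZ random +/-1 entries, zeros elsewhere."""
        v = np.zeros(DIM)
        pos = rng.choice(DIM, size=NNZ, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=NNZ)
        return v

    corpus = [["dog", "chased", "cat"], ["cat", "ate", "fish"], ["dog", "ate", "bone"]]
    vocab = {w for sentence in corpus for w in sentence}
    index = {w: index_vector() for w in vocab}     # fixed random signal vectors
    context = {w: np.zeros(DIM) for w in vocab}    # learned context vectors

    for sentence in corpus:
        for i, w in enumerate(sentence):
            for j, c in enumerate(sentence):
                if i != j:
                    context[w] += index[c]         # sum the signal vectors of neighbors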

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.
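A small sketch of this one-layer view (illustrative dimensionality and vocabulary; array indexing stands in for the copy connections): a word's left neighbor is added to its memory vector after being passed through either a true permutation or a completely random, non-unique map, and the two encodings behave almost identically.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2048
    perm = rng.permutation(n)            # copy connections mapping each unit to a unique unit
    noisy = rng.integers(0, n, size=n)   # completely random connections, no uniqueness

    def signal():
        return rng.choice([-1.0, 1.0], size=n)

    the, dog = signal(), signal()
    mem_perm = the[perm] + dog           # "the" encoded as a left neighbor of "dog"
    mem_noisy = the[noisy] + dog         # same construction with the non-unique map

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(round(cos(mem_perm, dog), 2), round(cos(mem_noisy, dog), 2))  # both about 0.7
    print(round(cos(mem_perm, the), 2), round(cos(mem_noisy, the), 2))  # both about 0.0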

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.
[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.
[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.
[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.
[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.
[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.

[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.
[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.
[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.
[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.
[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.
[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.
[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.
[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.
[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.
[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.
[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.
[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.
[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.
[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.
[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.
[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.
[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.
[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.
[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.
[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.
[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.
[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.
[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.
[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.
[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.
[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.
[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.
[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.
[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.
[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.
[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.
[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.
[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.
[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.

[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.
[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.
[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.
[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.
[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.
[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.
[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.
[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.
[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.
[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.
[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.
[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.
[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.
[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.
[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.
[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.
[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.
[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.
[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.
[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.
[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.
[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.
[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.
[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.
[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.
[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.
[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.
[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.
[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.
[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.
[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.

[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.
[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.
[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.
[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.
[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.
[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.
[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.
[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.
[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.
[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.
[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.
[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.
[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.
[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.
[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.
[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.
[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.
[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.
[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.
[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.
[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.
[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.
[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.
[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.
[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.
[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.

[107] J C R Licklider ldquoThree auditory theoriesrdquo in Psychology AStudy of a Science S Koch Ed vol 1 pp 41ndash144 McGraw-HillNew York NY USA 1959

[108] K H Pribram ldquoConvolution and matrix systems as contentaddressible distributed brain processes in perception andmem-oryrdquo Journal of Neurolinguistics vol 2 no 1-2 pp 349ndash3641986

[109] W E Reichardt ldquoCybernetics of the insect optomotorresponserdquo in Cerebral Correlates of Conscious Experience PBuser Ed North Holland AmsterdamThe Netherlands 1978

[110] C Eliasmith ldquoCognition with neurons a large-scale biologi-cally realistic model of the Wason taskrdquo in Proceedings of the27th Annual Meeting of the Cognitive Science Society B G BaraL Barsalou and M Bucciarelli Eds pp 624ndash629 ErlbaumMahwah NJ USA 2005

[111] P Kanerva Sparse Distributed Memory MIT Press CambridgeMass USA 1988

[112] P Kanerva ldquoBinary spatter-coding of ordered k-tuplesrdquo inProceedings of the International Conference on Artificial NeuralNetworks C von der Malsburg W von Seelen J Vorbruggenand B Sendhoff Eds vol 1112 of Lecture Notes in ComputerScience pp 869ndash873 Springer Bochum Germany 1996

[113] G Recchia M Jones M Sahlgren and P Kanerva ldquoEncodingsequential information in vector space models of semanticsComparing holographic reduced representation and randompermutationrdquo in Proceedings of the 32nd Cognitive ScienceSociety S Ohlsson and R Catrambone Eds pp 865ndash870Cognitive Science Society Austin Tex USA 2010

18 Computational Intelligence and Neuroscience

[114] J A Willits S K DMello N D Duran and A OlneyldquoDistributional statistics and thematic role relationshipsrdquo inProceedings of the 29th Annual Meeting of the Cognitive ScienceSociety pp 707ndash712 Nashville Tenn USA 2007

[115] H Rubenstein and J Goodenough ldquoContextual correlates ofsynonymyrdquo Communications of the ACM vol 8 no 10 pp 627ndash633 1965

[116] G A Miller and W G Charles ldquoContextual correlates ofsemantic similarityrdquo Language and Cognitive Processes vol 6no 1 pp 1ndash28 1991

[117] P Resnik ldquoUsing information content to evaluate semanticsimilarityrdquo in Proceedings of the Fourteenth International JointConference on Artificial Intelligence (IJCAI rsquo95) C S MellishEd pp 448ndash453 Morgan Kaufmann Montreal Canada 1995

[118] L Finkelstein E Gabrilovich YMatias et al ldquoPlacing search incontext the concept revisitedrdquo ACM Transactions on Informa-tion Systems vol 20 no 1 pp 116ndash131 2002

[119] M SahlgrenTheWord-Space model using distributional analy-sis to represent syntagmatic and paradigmatic relations betweenwords in high-dimensional vector spaces [PhD dissertation]Stockholm University 2006

[120] M Lukosevicius and H Jaeger ldquoReservoir computingapproaches to recurrent neural network trainingrdquo ComputerScience Review vol 3 no 3 pp 127ndash149 2009

[121] T A Plate ldquoRandomly connected sigma-pi neurons can formassociator networksrdquo Network Computation in Neural Systemsvol 11 no 4 pp 321ndash332 2000

[122] T Yamazaki and S Tanaka ldquoThe cerebellum as a liquid statemachinerdquo Neural Networks vol 20 no 3 pp 290ndash297 2007

[123] J Garson ldquoConnectionismrdquo in The Stanford Encyclopediaof Philosophy (Winter 2012 Edition) E N Zalta Ed 2012httpplatostanfordeduarchiveswin2012entriesconnec-tionism

[124] K Murakoshi and K Suganuma ldquoA neural circuit modelforming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapsesrdquoBioSystems vol 90no 3 pp 903ndash910 2007

[125] C Cuppini E Magosso and M Ursino ldquoA neural networkmodel of semantic memory linking feature-based object rep-resentation and wordsrdquo BioSystems vol 96 no 3 pp 195ndash2052009

[126] RN Shepard ldquoNeural nets for generalization and classificationcomment on Staddon and Reidrdquo Psychological Review vol 97no 4 pp 579ndash580 1990

[127] K McRae V R de Sa and M S Seidenberg ldquoOn the natureand scope of featural representations of wordmeaningrdquo Journalof Experimental Psychology General vol 126 no 2 pp 99ndash1301997

[128] G Vigliocco D P Vinson W Lewis and M F Garrett ldquoRep-resenting the meanings of object and action words the featuraland unitary semantic space hypothesisrdquo Cognitive Psychologyvol 48 no 4 pp 422ndash488 2004

[129] M Baroni and A Lenci ldquoConcepts and properties in wordspacesrdquo Italian Journal of Linguistics vol 20 no 1 pp 53ndash862008

[130] T N Rubin J A Willits B Kievit-Kylar and M N JonesldquoOrganizing the space and behavior of semantic modelsrdquo inProceedings of the 35th Annual Conference of the CognitiveScience Society Cognitive Science Society Austin Tex USA2014

[131] D J Watts and S H Strogatz ldquoCollective dynamics of lsquosmall-worldrsquo networksrdquoNature vol 393 no 6684 pp 440ndash442 1998

[132] H Schutze Dimensions of Meaning IEEE Computer SocietyPress Washington DC USA 1992

[133] P Gardenfors The Geometry of Meaning Semantics Based onConceptual Spaces MIT Press Cambridge Mass USA 2014

[134] W Lowe ldquoTowards a theory of semantic spacerdquo inProceedings ofthe 23rd Annual Conference of the Cognitive Science Society J DMoore and K Stenning Eds pp 576ndash581 Lawrence ErlbaumAssociates Mahwah NJ USA 2001

[135] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[136] C Eliasmith and C H Anderson Neural Engineering Com-putation Representation and Dynamics in Neurobiological Sys-tems MIT Press Cambridge Mass USA 2003

[137] J G Sutherland ldquoA holographicmodel ofmemory learning andexpressionrdquo International Journal of Neural Systems vol 1 no 3pp 259ndash267 1990

[138] F Choo and C Eliasmith ldquoA spiking neuron model of serial-order recallrdquo in Proceedings of the 32nd Annual Meeting of theCognitive Science Society S Ohisson and S Catrambone Edspp 2188ndash2193 Cognitive Science Society Austin TeXUSA2010

[139] C Eliasmith T C Stewart X Choo et al ldquoA large-scale modelof the functioning brainrdquo Science vol 338 no 6111 pp 1202ndash1205 2012

[140] D Rasmussen and C Eliasmith ldquoA neural model of rulegeneration in inductive reasoningrdquo Topics in Cognitive Sciencevol 3 no 1 pp 140ndash153 2011

[141] T C Stewart X Choo and C Eliasmith ldquoDynamic behaviourof a spiking model of action selection in the basal gangliardquo inProceedings of the 10th International Conference on CognitiveModeling DD Salvucci andGGunzelmann Eds pp 235ndash240Drexel University Austin Tex USA 2010

[142] K K de Valois R L de Valois and E W Yund ldquoResponsesof striate cortex cells to grating and checkerboard patternsrdquoJournal of Physiology vol 291 no 1 pp 483ndash505 1979

[143] M S Gazzaniga R B Ivry G R Mangun and E A PhelpsCognitive Neuroscience The Biology of the Mind W W Nortonamp Company New York NY USA 2002

[144] K H Pribram ldquoThe primate frontal cortexmdashexecutive of thebrainrdquo in Psychophysiology of the Frontal Lobes K H Pribramand A R Luria Eds pp 293ndash314 1973

[145] P Foldiak and D Endres ldquoSparse codingrdquo Scholarpedia vol 3no 1 article 2984 2008

[146] B A Olshausen and D J Field ldquoSparse coding with an over-complete basis set a strategy employed by V1rdquoVision Researchvol 37 no 23 pp 3311ndash3325 1997

with collections of multimodal semantic features in a neural network model of semantic memory. On the whole, however, Shepard's [126] observation that connectionist approaches to high-level behavior rely on assumptions that ignore (and in some cases contradict) many important properties of neural and synaptic organization continues to ring true today for most models of semantic representation [123].

A model of semantic memory capable of tracing out a path from biological models of synaptic activity to high-level semantic behavior is perhaps the most ambitious goal that a researcher in this field can pursue, and the difficulty of bridging these vastly different levels of analysis has attracted few attempts. A more common approach is to develop representational architectures that remain agnostic as to the underlying neural implementation but which represent information in more abstract ways that are argued to share important high-level properties with human semantic representations. Vector space models are one such approach. Although nodes in a traditional connectionist network for semantic memory often correspond to binary switches indicating the presence or absence of a particular high-level semantic feature, distributed vector representations are more neurally plausible as a basic unit of analysis. However, due to the ease of acquiring large text bases and the difficulty of acquiring reasonable proxies for the perceptual and motor representations that contribute to human semantic representations, vector space models of lexical representation are necessarily incomplete. Vector-based representations of semantic properties generated by human participants [127, 128] have been integrated into Bayesian and vector space models to help alleviate this problem [27, 28, 30, 129], but these have their limitations as well, most importantly the fact that they are mediated by unknown retrieval processes and as such should not be interpreted as providing a direct "snapshot" of a concept's property structure [129]. Thus, although vector space models trained on large text corpora have unavoidable limitations as models of human semantic representation, they at least provide a starting point for a computational approach.

Additionally, close investigation of the properties of vector spaces derived from language data, and the ways in which they change in response to different dimensionalities, corpora, and mathematical operations, has yielded insights into cooccurrence-based vector spaces that are independent of any particular model (e.g., [16, 119, 130]). Just as Watts and Strogatz's [131] high-level analysis of the fundamental properties of small-world networks has found application in many areas of cognition unanticipated by the original authors, high-level theoretical analyses of the conceptual underpinnings of co-occurrence models in the context of natural language processing (e.g., [132]) strongly influenced their adoption by cognitive modelers such as Andrews et al. [27]. Similarly, theoretical analyses of vector space models from a more cognitive perspective [92, 133, 134] have served to clarify the strengths and weaknesses of existing models and may find additional applications as well.

Finally, other works have approached the problem of representing word meaning from a Bayesian perspective, representing word meanings as weightings over a set of probabilistic topics [18] within the Latent Dirichlet Allocation (LDA) framework of Blei et al. [135]. Topics are latent variables inferred from observed patterns of word cooccurrence, and they represent probability distributions over words that tend to cooccur in similar contexts. An individual topic generally turns out to be composed of words that share a common discourse theme (e.g., differentiate, calculus, derivative) and can be thought of as a more semantically transparent analogue to the reduced dimensions that singular value decomposition yields in LSA. Although the Bayesian approach requires less theoretical commitment to the manner in which such models might be implemented on neurological hardware, it requires a stronger theoretical commitment to the notion that, whatever algorithm the brain uses, it is likely to be one that takes advantage of available probabilistic information in an optimal or near-optimal manner. Andrews et al. [27] also work within a Bayesian framework, using probabilistic models that combine distributional information (from lexical cooccurrence) with experiential information (from human-generated semantic feature norms) to demonstrate that the joint probability distribution of these two information sources is more predictive of human-based measures of semantic representation than either information source alone or their average.
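To make the generative picture concrete, the toy sketch below is our own illustration (the miniature vocabulary, topic count, and Dirichlet parameters are invented for this example and are not taken from any of the cited studies). It samples one short "document" under LDA's assumptions: each topic is a probability distribution over words, and each document is a mixture of topics.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["differentiate", "calculus", "derivative", "goal", "referee", "penalty"]
num_topics, alpha, beta = 2, 0.5, 0.5            # illustrative hyperparameters

# Each topic is a probability distribution over the vocabulary.
topics = rng.dirichlet([beta] * len(vocab), size=num_topics)
# Each document mixes topics; theta holds this document's mixture weights.
theta = rng.dirichlet([alpha] * num_topics)

doc = []
for _ in range(8):                               # sample eight tokens
    z = rng.choice(num_topics, p=theta)          # pick a topic for this token
    doc.append(str(rng.choice(vocab, p=topics[z])))  # then a word from that topic
print(doc)
```

Inference in LDA runs this process in reverse, recovering the topic distributions and document mixtures that best explain an observed corpus.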

7.2. Cognitive Implications of Holographic Reduced Representation and Random Permutation

Kanerva's [92] review of high-dimensional vector space models and the operations that are used to construct them highlights a number of biologically plausible properties of such models, including their use of lexical representations that are highly distributed, tolerant of noise in the input, and robust to error and component failure. In contrast to the large number of studies focused on LSA and HAL, relatively little work has investigated the properties of lexical representations constructed by means of circular convolution and random permutations, perhaps due in part to the relative youth of BEAGLE and RPM as models of semantic memory. However, there are several cognitively motivated reasons for modelers to be interested in these particular operators as psychologically plausible components of a theory of semantic representation. In traditional connectionist networks (TCNs), an artificial neuron's job is to integrate magnitude information over incoming connections into a scalar value to be forward propagated. While this is one plausible conception of how neurons represent information, it is not the only one: spike density over a time scale, or phase modulations, rather than the magnitude of the electrical signal, may be important factors in information transmission [136]. In holographic neural networks (HNNs [121]), neurons represent both the magnitude and phase of an incoming pattern with a complex value. HNNs can respond uniquely to different phase patterns even if they have the same magnitude. Each node contains information about the entire set of stimulus-response pairings, which produces a convolution of stimulus and response signals; the value of the complex node, not the connections, is what is stored in the model. In addition to the many precedents for the use of convolution-based encoding and decoding in memory modeling (e.g., [95, 101-105, 137]) and in models of other cognitive phenomena [107-110], there is evidence that the mathematics of convolution may reflect real operations in the brain, such as the work of Pribram [108], Sutherland [137], and the mathematical framework of neural coding developed by Eliasmith and Anderson [136]. For more recent applications of circular convolution in biologically plausible models of various cognitive phenomena, see Choo and Eliasmith [138], Eliasmith [110], Eliasmith et al. [139], Rasmussen and Eliasmith [140], and Stewart et al. [141].

When formulated as a modulo-n tensor product, convolution is computationally expensive, making it difficult to apply to larger models. However, fast Fourier transforms provide a reasonably efficient means of computing convolutions in the frequency domain [95]. The Fourier transform (FT) of the convolution of two functions is equal to the product of their individual FTs, and the product of the FT of one function with the complex conjugate of the FT of the other is equal to the FT of their correlation. Thus, HNNs can be created simply by multiplying the FTs of stimulus and response patterns, calling to mind the brain's tendency to respond to the Fourier components of auditory and spatial stimuli [142], [143, p. 47], and [144].
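These identities translate directly into a few lines of code. The sketch below is a minimal illustration of our own (the dimensionality and vector names are arbitrary, and it is not the original BEAGLE implementation): it binds two random vectors with circular convolution computed in the frequency domain and then recovers one of them, approximately, with circular correlation.

```python
import numpy as np

n = 1024                                   # dimensionality (arbitrary choice)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0 / np.sqrt(n), n)   # random vectors with elements drawn
y = rng.normal(0.0, 1.0 / np.sqrt(n), n)   # from N(0, 1/n), as in Plate's HRRs

def cconv(a, b):
    """Circular convolution: FT(a conv b) = FT(a) * FT(b)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    """Circular correlation: FT(a corr b) = conj(FT(a)) * FT(b)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

trace = cconv(x, y)       # a single vector storing the bound pair
y_hat = ccorr(x, trace)   # probing the trace with x returns a noisy copy of y
cos = y_hat @ y / (np.linalg.norm(y_hat) * np.linalg.norm(y))
print(cos)                # well above chance, so y can be identified by clean-up memory
```

Each FFT-based binding costs O(n log n) rather than the O(n^2) of the direct modulo-n formulation, which is what makes convolution-based encoding practical at the dimensionalities used by semantic space models.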

As previously described, the random permutation model of Sahlgren et al. [17] constitutes an extension to random indexing, which is identical to it in all respects other than the application of random permutations to signal vectors to represent order information. Like LSA, random indexing was not originally conceived of as a cognitive model; Karlgren and Sahlgren [99] emphasize that its impressive performance on evaluation benchmarks suggests that it captures a certain level of functional equivalence, but not necessarily representational equivalence, to the human semantic system. However, other work has investigated possible points of convergence between human semantic representations and the representations employed by random indexing/RPM in more detail. In particular, the sparse distributed representations employed by RPM and random indexing, which trace their lineage to Kanerva's Sparse Distributed Memory [111], are mathematically compatible with several known properties of neural circuitry. Foldiak and Endres [145] provide an excellent review of evidence for sparse codes as a pervasive encoding mechanism within the brain and numerous neural network models. Given recent advances in generative models of lexical representation, it is worth noting that vector space models employing sparse coding should not be interpreted as being antithetical to Bayesian methods. Olshausen and Field [146] demonstrate how measures of sparseness can be interpreted as prior probability distributions and how reconstruction error in clean-up memory can be interpreted in terms of likelihoods. These isomorphisms indicate potential points of convergence between sparse coding models and the Bayesian approach [145].
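For concreteness, a random-indexing-style sketch of this idea follows. It is our own simplification rather than the published RPM code: the toy sentence, dimensionality, number of nonzero elements, and the single "successor" permutation are all illustrative choices.

```python
import numpy as np

dim, nonzeros = 2048, 12                   # illustrative parameters
rng = np.random.default_rng(1)

def index_vector():
    """Sparse ternary signal vector: a few random +1/-1 entries, the rest zeros."""
    v = np.zeros(dim)
    slots = rng.choice(dim, size=nonzeros, replace=False)
    v[slots] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

sentence = ["a", "dog", "bit", "the", "mailman"]
index = {w: index_vector() for w in set(sentence)}   # static signal vectors
memory = {w: np.zeros(dim) for w in set(sentence)}   # accumulated memory vectors
succ = rng.permutation(dim)                          # one fixed permutation marks "follows"

for i, w in enumerate(sentence[:-1]):
    nxt = sentence[i + 1]
    memory[w] += index[nxt][succ]          # permuted copy encodes "nxt appeared after w"
    memory[nxt] += index[w]                # unpermuted copy encodes plain cooccurrence
```

Because the permutation is fixed and invertible, the same memory vector can later be probed for order information by comparing it against permuted index vectors, in the spirit of the order-encoding scheme of [17].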

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we have shown, this uniqueness constraint is not required: nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.
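As a minimal sketch of this network view (our own; the dimensionality and names are arbitrary), a true permutation copies each input unit to a unique target unit, whereas the relaxed variant draws target units with replacement, so several inputs may collide on one unit (and superpose) while other units receive nothing.

```python
import numpy as np

n = 1024
rng = np.random.default_rng(2)

true_targets = rng.permutation(n)              # one-to-one copy connections
noisy_targets = rng.integers(0, n, size=n)     # arbitrary targets: collisions allowed

def route(v, targets):
    """Send activation v[i] along a copy connection to unit targets[i]."""
    out = np.zeros_like(v)
    np.add.at(out, targets, v)                 # colliding inputs simply sum
    return out

v = rng.normal(size=n)
print(np.allclose(np.sort(route(v, true_targets)), np.sort(v)))  # True: a pure reordering
print(np.count_nonzero(route(v, noisy_targets) == 0))            # some units stay silent
```

Replacing the one-to-one mapping with the arbitrary mapping corresponds to the "noisy" permutation manipulation described above, which produced nearly identical results in our simulations.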

7.3. Conclusion

This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.

[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.

[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.

[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.

[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.

[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.

[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.

[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.


[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.

[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.

[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.

[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.

[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.

[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.

[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.

[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.

[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.

[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.

[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.

[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.

[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.

[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.

[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.

[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.

[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.

[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.

[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.

[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.

[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.

[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.

[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.

[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.

[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.

[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.

[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.

[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.

[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.

[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.

[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.

[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory & Language, vol. 27, no. 6, pp. 597–632, 1988.

[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.

[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.

[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.


[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.

[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.

[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.

[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.

[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.

[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.

[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.

[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.

[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.

[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.

[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.

[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.

[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.

[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.

[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, UK, 2004.

[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.

[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.

[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.

[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.

[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.

[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.

[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.

[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.

[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.

[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.

[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.

[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.

[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.

[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.

[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.

[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.

[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.

[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.

[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.


[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.

[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.

[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.

[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.

[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates Publishers, Mahwah, NJ, USA, 2000.

[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.

[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.

[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.

[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.

[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.

[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.

[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.

[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.

[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.

[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.

[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.

[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.

[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.

[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.

[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.

[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.

[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.

[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.

[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.

[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.

[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.

[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.

[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.

[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.

[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.

[110] C. Eliasmith, "Cognition with neurons: a large-scale biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.

[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.

[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.

[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.


[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.

[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.

[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.

[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.

[119] M. Sahlgren, The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.

[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.

[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.

[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.

[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.

[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.

[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.

[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.

[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.

[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.

[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.

[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.

[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.

[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.

[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.

[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.

[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.

[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.

[139] C. Eliasmith, T. C. Stewart, X. Choo et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.

[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.

[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.

[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.

[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.

[144] K. H. Pribram, "The primate frontal cortex – executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.

[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.

[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1," Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 14: Research Article Encoding Sequential Information in ...downloads.hindawi.com/journals/cin/2015/986574.pdf · Research Article Encoding Sequential Information in Semantic Space Models:

14 Computational Intelligence and Neuroscience

and decoding in memory modeling (eg [95 101ndash105 137])and in models of other cognitive phenomena [107ndash110] thereis evidence that the mathematics of convolution may reflectreal operations in the brain such as thework of Pribram [108]Sutherland [137] and the mathematical framework of neuralcoding developed by Eliasmith andAnderson [136] Formorerecent applications of circular convolution in biologicallyplausible models of various cognitive phenomena see Chooand Eliasmith [138] Eliasmith [110] Eliasmith et al [139]Rasmussen and Eliasmith [140] and Stewart et al [141]

When formulated as a modulo-119899 tensor product con-volution is computationally expensive making it difficult toapply to largermodels However fast Fourier transformationsprovide a reasonably efficient means of computing convolu-tions in the frequency domain [95] The Fourier transform(FT) of the convolution of two functions is equal to theproduct of their individual FTs and the product of the FTof one function with the complex conjugate of the FT of theother is equal to the FT of their correlation Thus HNNscan be created simply by multiplying the FTs of stimulus andresponse patterns calling to mind the brainrsquos tendency torespond to the Fourier components of auditory and spatialstimuli [142] [143 p 47] and [144]

As previously described the random permutation modelof Sahlgren et al [17] constitutes an extension to randomindexing which is identical to it in all respects other thanthe application of random permutations to signal vectors torepresent order information Like LSA random indexing wasnot originally conceived of as a cognitive model Karlgrenand Sahlgren [99] emphasize that its impressive perfor-mance on evaluation benchmarks suggests that it capturesa certain level of functional equivalence but not necessarilyrepresentational equivalence to the human semantic systemHowever the other work has investigated possible points ofconvergence between human semantic representations andthe representations employed by random indexingRPM inmore detail In particular the sparse distributed representa-tions employed by RPM and random indexing which tracetheir lineage to Kanervarsquos Sparse Distributed Memory [111]aremathematically compatiblewith several knownpropertiesof neural circuitry Foldiak and Endres [145] provide anexcellent review of evidence for sparse codes as a pervasiveencoding mechanism within the brain and numerous neuralnetwork models Given recent advances in generative modelsof lexical representation it is worth noting that vector spacemodels employing sparse coding should not be interpretedas being antithetical to Bayesian methods Olshausen andField [146] demonstrate how measures of sparseness can beinterpreted as prior probability distributions and how recon-struction error in clean-up memory can be interpreted interms of likelihoods These isomorphisms indicate potentialpoints of convergence between sparse coding models and theBayesian approach [145]

Random permutations constitute an extension to random indexing that incorporates word order information into what would otherwise be a "bag of words" model, improves performance on semantic tasks [17], and is extremely simple to implement in connectionist terms. A random permutation can simply be thought of as a recurrent one-layer network with randomly placed copy connections that map each input node to a unique node in the same layer. As we will show, this uniqueness constraint is not required; nearly identical results can be achieved with completely random connections (Figure 1). Given that random permutations can be approximated with such simple network structures, it does not seem particularly far-fetched to propose that some analogue of this process may take place within neural tissue.
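The following short check, a demo of our own rather than the paper's simulation, illustrates why relaxing the one-to-one constraint is tolerable: a true permutation merely reorders units and so preserves cosine similarity exactly, whereas an arbitrary random copy mapping, in which several output units may read from the same input unit, preserves it only approximately, but closely, at high dimensionality.

```python
# A small self-contained check (our demo, not the paper's simulation): a true
# permutation preserves cosine similarity exactly, while an arbitrary random
# copy mapping (outputs may share inputs) preserves it only approximately.
import numpy as np

rng = np.random.default_rng(2)
n = 2048
x = rng.normal(0.0, 1.0, n)
y = 0.7 * x + 0.3 * rng.normal(0.0, 1.0, n)       # a vector correlated with x

true_perm = rng.permutation(n)                    # one-to-one copy connections
noisy_map = rng.integers(0, n, size=n)            # arbitrary connections, not one-to-one

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(x, y))                                  # similarity before mapping
print(cos(x[true_perm], y[true_perm]))            # identical: permutation only reorders units
print(cos(x[noisy_map], y[noisy_map]))            # close, despite duplicated/dropped units
```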

7.3. Conclusion. This paper builds on the work of Kanerva [92] to present the first in-depth analysis of random permutations in the context of models of semantic representation, investigating basic properties such as their storage capacity and computational complexity in a manner analogous to Plate's [95] systematic investigation of holographic reduced representations constructed with circular convolution. In addition, comparing circular convolution and random permutations in the context of semantic memory models affords us a better understanding of two psychologically plausible operations for encoding semantic information that have never been systematically compared.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] M. N. Jones, J. Willits, and S. Dennis, "Models of semantic memory," in Oxford Handbook of Mathematical and Computational Psychology, J. R. Busemeyer and J. T. Townsend, Eds., 2015.

[2] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.

[3] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.

[4] M. Hare, M. N. Jones, C. Thomson, S. Kelly, and K. McRae, "Activating event knowledge," Cognition, vol. 111, no. 2, pp. 151–167, 2009.

[5] L. L. Jones, "Pure mediated priming: a retrospective semantic matching model," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 36, no. 1, pp. 135–146, 2010.

[6] T. Warren, K. McConnell, and K. Rayner, "Effects of context on eye movements when reading about possible and impossible events," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 34, no. 4, pp. 1001–1010, 2008.

[7] B. T. Johns and M. N. Jones, "Predicting word-naming and lexical decision times from a semantic space model," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 279–284, Cognitive Science Society, Austin, Tex, USA, 2008.

[8] B. T. Johns and M. N. Jones, "False recognition through semantic amplification," in Proceedings of the 31st Annual Meeting of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., pp. 2795–2800, Cognitive Science Society, Austin, Tex, USA, 2009.

[9] W. Kintsch and P. Mangalath, "The construction of meaning," Topics in Cognitive Science, vol. 3, no. 2, pp. 346–370, 2011.

[10] T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, Handbook of Latent Semantic Analysis, Erlbaum, Mahwah, NJ, USA, 2006.

[11] J. Mitchell and M. Lapata, "Composition in distributional models of semantics," Cognitive Science, vol. 34, no. 8, pp. 1388–1429, 2010.

[12] A. Utsumi and M. Sakamoto, "Predicative metaphor comprehension as indirect categorization," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 1034–1039, Cognitive Science Society, Austin, Tex, USA, 2010.

[13] C. A. Perfetti, "The limits of co-occurrence: tools and theories in language research," Discourse Processes, vol. 25, no. 2-3, pp. 363–377, 1998.

[14] B. Riordan and M. N. Jones, "Comparing semantic space models using child-directed speech," in Proceedings of the 29th Annual Cognitive Science Society, D. S. MacNamara and J. G. Trafton, Eds., pp. 599–604, Cognitive Science Society, Austin, Tex, USA, 2007.

[15] B. Riordan and M. N. Jones, "Redundancy in perceptual and linguistic experience: comparing feature-based and distributional models of semantic representation," Topics in Cognitive Science, vol. 3, no. 2, pp. 303–345, 2011.

[16] J. A. Bullinaria and J. P. Levy, "Extracting semantic representations from word co-occurrence statistics: a computational study," Behavior Research Methods, vol. 39, no. 3, pp. 510–526, 2007.

[17] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, B. C. Love, K. McRae, and V. M. Sloutsky, Eds., pp. 1300–1305, Cognitive Science Society, Austin, Tex, USA, 2008.

[18] T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum, "Topics in semantic representation," Psychological Review, vol. 114, no. 2, pp. 211–244, 2007.

[19] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.

[20] S. Dennis, "A memory-based theory of verbal cognition," Cognitive Science, vol. 29, no. 2, pp. 145–193, 2005.

[21] P. Kanerva, J. Kristoferson, and A. Holst, "Random indexing of text samples for latent semantic analysis," in Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, L. R. Gleitman and A. K. Joshi, Eds., pp. 103–106, Cognitive Science Society, Philadelphia, Pa, USA, 2000.

[22] P. J. Kwantes, "Using context to build semantics," Psychonomic Bulletin & Review, vol. 12, no. 4, pp. 703–710, 2005.

[23] J. R. Firth, "A synopsis of linguistic theory 1930–1955," in Studies in Linguistic Analysis, J. R. Firth, Ed., pp. 1–32, Blackwell, Oxford, UK, 1957.

[24] Z. Harris, Distributional Structure, Humanities Press, New York, NY, USA, 1970.

[25] M. de Vega, A. M. Glenberg, and A. C. Graesser, Symbols and Embodiment: Debates on Meaning and Cognition, Oxford University Press, Oxford, UK, 2008.

[26] A. M. Glenberg and D. A. Robertson, "Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning," Journal of Memory and Language, vol. 43, no. 3, pp. 379–401, 2000.

[27] M. Andrews, G. Vigliocco, and D. Vinson, "Integrating experiential and distributional data to learn semantic representations," Psychological Review, vol. 116, no. 3, pp. 463–498, 2009.

[28] K. Durda, L. Buchanan, and R. Caron, "Grounding co-occurrence: identifying features in a lexical co-occurrence model of semantic memory," Behavior Research Methods, vol. 41, no. 4, pp. 1210–1223, 2009.

[29] M. N. Jones and G. L. Recchia, "You can't wear a coat rack: a binding framework to avoid illusory feature migrations in perceptually grounded semantic models," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 877–882, Cognitive Science Society, Austin, Tex, USA, 2010.

[30] M. Steyvers, "Combining feature norms and text data with topic models," Acta Psychologica, vol. 133, no. 3, pp. 234–243, 2010.

[31] R. L. Gomez and L. A. Gerken, "Infant artificial language learning and language acquisition," Trends in Cognitive Sciences, vol. 4, no. 5, pp. 178–186, 2000.

[32] J. R. Saffran, "Statistical language learning: mechanisms and constraints," Current Directions in Psychological Science, vol. 12, no. 4, pp. 110–114, 2003.

[33] C. M. Conway, A. Bauernschmidt, S. S. Huang, and D. B. Pisoni, "Implicit statistical learning in language processing: word predictability is the key," Cognition, vol. 114, no. 3, pp. 356–371, 2010.

[34] D. J. Foss, "A discourse on semantic priming," Cognitive Psychology, vol. 14, no. 4, pp. 590–607, 1982.

[35] M. Hare, K. McRae, and J. L. Elman, "Sense and structure: meaning as a determinant of verb subcategorization preferences," Journal of Memory and Language, vol. 48, no. 2, pp. 281–303, 2003.

[36] M. Hare, K. McRae, and J. L. Elman, "Admitting that admitting verb sense into corpus analyses makes sense," Language and Cognitive Processes, vol. 19, no. 2, pp. 181–224, 2004.

[37] M. C. MacDonald, "The interaction of lexical and syntactic ambiguity," Journal of Memory and Language, vol. 32, no. 5, pp. 692–715, 1993.

[38] M. C. MacDonald, "Lexical representations and sentence processing: an introduction," Language and Cognitive Processes, vol. 12, no. 2-3, pp. 121–136, 1997.

[39] K. McRae, M. Hare, J. L. Elman, and T. Ferretti, "A basis for generating expectancies for verbs from nouns," Memory and Cognition, vol. 33, no. 7, pp. 1174–1184, 2005.

[40] R. Taraban and J. L. McClelland, "Constituent attachment and thematic role assignment in sentence processing: influences of content-based expectations," Journal of Memory and Language, vol. 27, no. 6, pp. 597–632, 1988.

[41] T. K. Landauer, D. Laham, B. Rehder, and M. E. Schreiner, "How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans," in Proceedings of the 19th Annual Meeting of the Cognitive Science Society, M. G. Shafto and P. Langley, Eds., pp. 412–417, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 1997.

[42] C. Burgess, "From simple associations to the building blocks of language: modeling meaning in memory with the HAL model," Behavior Research Methods, Instruments, and Computers, vol. 30, no. 2, pp. 188–198, 1998.

[43] W. Kintsch, "An overview of top-down and bottom-up effects in comprehension: the CI perspective," Discourse Processes, vol. 39, no. 2-3, pp. 125–129, 2005.

[44] J. L. Elman, "On the meaning of words and dinosaur bones: lexical knowledge without a lexicon," Cognitive Science, vol. 33, no. 4, pp. 547–582, 2009.

[45] G. McKoon and R. Ratcliff, "Meaning through syntax: language comprehension and the reduced relative clause construction," Psychological Review, vol. 110, no. 3, pp. 490–525, 2003.

[46] R. Jackendoff, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, Oxford, UK, 2002.

[47] P. G. O'Seaghdha, "The dependence of lexical relatedness effects on syntactic connectedness," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 1, pp. 73–87, 1989.

[48] M. Andrews and G. Vigliocco, "The hidden Markov topic model: a probabilistic model of semantic representation," Topics in Cognitive Science, vol. 2, no. 1, pp. 101–113, 2010.

[49] M. N. Jones, W. Kintsch, and D. J. K. Mewhort, "High-dimensional semantic space accounts of priming," Journal of Memory and Language, vol. 55, no. 4, pp. 534–552, 2006.

[50] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.

[51] M. M. Louwerse, "Symbol interdependency in symbolic and embodied cognition," Topics in Cognitive Science, vol. 3, no. 2, pp. 273–302, 2011.

[52] G. L. Recchia and M. N. Jones, "More data trumps smarter algorithms: comparing pointwise mutual information with latent semantic analysis," Behavior Research Methods, vol. 41, no. 3, pp. 647–656, 2009.

[53] T. R. Risley and B. Hart, "Promoting early language development," in The Crisis in Youth Mental Health: Critical Issues and Effective Programs, Volume 4: Early Intervention Programs and Policies, N. F. Watt, C. Ayoub, R. H. Bradley, J. E. Puma, and W. A. LeBoeuf, Eds., pp. 83–88, Praeger, Westport, Conn, USA, 2006.

[54] M. Brand, "Fast low-rank modifications of the thin singular value decomposition," Linear Algebra and its Applications, vol. 415, no. 1, pp. 20–30, 2006.

[55] G. Gorrell, "Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, S. Pado, J. Read, and V. Seretan, Eds., pp. 97–104, April 2006.

[56] M. Louwerse and L. Connell, "A taste of words: linguistic context and perceptual simulation predict the modality of words," Cognitive Science, vol. 35, no. 2, pp. 381–398, 2011.

[57] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.

[58] T. T. Rogers and J. L. McClelland, Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, Cambridge, Mass, USA, 2004.

[59] L. Onnis and M. H. Christiansen, "Lexical categories at the edge of the word," Cognitive Science, vol. 32, no. 1, pp. 184–221, 2008.

[60] J. A. Legate and C. Yang, "Assessing child and adult grammar," in Rich Languages from Poor Inputs, M. Piattelli-Palmarini and R. C. Berwick, Eds., pp. 68–182, Oxford University Press, Oxford, UK, 2013.

[61] H. A. Simon, "The architecture of complexity: hierarchic systems," in The Sciences of the Artificial, H. A. Simon, Ed., pp. 183–216, MIT Press, Cambridge, Mass, USA, 2nd edition, 1996.

[62] J. L. Elman, "Learning and development in neural networks: the importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.

[63] D. Servan-Schreiber, A. Cleeremans, and J. L. McClelland, "Graded state machines: the representation of temporal contingencies in simple recurrent networks," Machine Learning, vol. 7, no. 2-3, pp. 161–193, 1991.

[64] M. F. St. John and J. L. McClelland, "Learning and applying contextual constraints in sentence comprehension," Artificial Intelligence, vol. 46, no. 1-2, pp. 217–257, 1990.

[65] M. W. Howard, "Neurobiology can provide meaningful constraints to cognitive modeling: the temporal context model as a description of medial temporal lobe function," in Proceedings of the 37th Annual Meeting of the Society for Mathematical Psychology, Ann Arbor, Mich, USA, July 2004.

[66] M. W. Howard and M. J. Kahana, "A distributed representation of temporal context," Journal of Mathematical Psychology, vol. 46, no. 3, pp. 269–299, 2002.

[67] M. W. Howard, K. H. Shankar, and U. K. K. Jagadisan, "Constructing semantic representations from a gradually changing representation of temporal context," Topics in Cognitive Science, vol. 3, no. 1, pp. 48–73, 2011.

[68] V. A. Rao and M. W. Howard, "Retrieved context and the discovery of semantic structure," Advances in Neural Information Processing Systems, vol. 20, pp. 1193–1200, 2008.

[69] M. H. Tong, A. D. Bickett, E. M. Christiansen, and G. W. Cottrell, "Learning grammatical structure with echo state networks," Neural Networks, vol. 20, no. 3, pp. 424–432, 2007.

[70] J. L. Elman, "Distributed representations, simple recurrent networks, and grammatical structure," Machine Learning, vol. 7, no. 2-3, pp. 195–225, 1991.

[71] R. Veale and M. Scheutz, "Neural circuits for any-time phrase recognition with applications in cognitive models and human-robot interaction," in Proceedings of the 34th Annual Conference of the Cognitive Science Society, N. Miyake, D. Peebles, and R. P. Cooper, Eds., pp. 1072–1077, Cognitive Science Society, Austin, Tex, USA, 2012.

[72] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[73] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.

[74] Y. Bengio, "Deep learning of representations: looking forward," in Statistical Language and Speech Processing, vol. 7978 of Lecture Notes in Computer Science, pp. 1–37, Springer, Berlin, Germany, 2013.

[75] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[76] J. Martens and I. Sutskever, "Learning recurrent neural networks with Hessian-free optimization," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1033–1040, Bellevue, Wash, USA, July 2011.

[77] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 160–167, ACM, Helsinki, Finland, July 2008.

[78] A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Proceedings of the Advances in Neural Information Processing Systems (NIPS '09), pp. 1081–1088, 2009.

[79] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, "Deep learning via semi-supervised embedding," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 639–655, Springer, Berlin, Germany, 2012.

[80] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26, pp. 3111–3119, 2013.

[81] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proceedings of the 30th International Conference on Machine Learning (ICML '13), pp. 1139–1147, 2013.

[82] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 1017–1024, New York, NY, USA, July 2011.

[83] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 689–696, Bellevue, Wash, USA, July 2011.

[84] C. Burgess and K. Lund, "The dynamics of meaning in memory," in Cognitive Dynamics: Conceptual and Representational Change in Humans and Machines, E. Dietrich and A. B. Markman, Eds., pp. 117–156, Lawrence Erlbaum Associates Publishers, Mahwah, NJ, USA, 2000.

[85] K. Durda and L. Buchanan, "WINDSORS: Windsor improved norms of distance and similarity of representations of semantics," Behavior Research Methods, vol. 40, no. 3, pp. 705–712, 2008.

[86] D. Rohde, L. Gonnerman, and D. Plaut, "An improved model of semantic similarity based on lexical co-occurrence," in Proceedings of the Annual Meeting of the Cognitive Science Society, 2006.

[87] L. Buchanan, C. Burgess, and K. Lund, "Overcrowding in semantic neighborhoods: modeling deep dyslexia," Brain and Cognition, vol. 32, no. 2, pp. 111–114, 1996.

[88] L. Buchanan, C. Westbury, and C. Burgess, "Characterizing semantic space: neighborhood effects in word recognition," Psychonomic Bulletin and Review, vol. 8, no. 3, pp. 531–544, 2001.

[89] P. D. Siakaluk, C. Westbury, and L. Buchanan, "The effects of filler type and semantic distance in naming: evidence for change in time criterion rather than change in routes," in Proceedings of the 13th Annual Meeting of the Canadian Society for Brain, Behaviour and Cognitive Science, Hamilton, Canada, May 2003.

[90] D. Song and P. Bruza, "Discovering information flow using a high dimensional conceptual space," in Proceedings of the 24th Annual International Conference on Research and Development in Information Retrieval, W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, Eds., pp. 327–333, Association for Computing Machinery, New York, NY, USA, 2001.

[91] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum, "Integrating topics and syntax," Advances in Neural Information Processing Systems, vol. 17, pp. 537–544, 2005.

[92] P. Kanerva, "Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors," Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.

[93] G. E. Cox, G. Kachergis, G. Recchia, and M. N. Jones, "Toward a scalable holographic word-form representation," Behavior Research Methods, vol. 43, no. 3, pp. 602–615, 2011.

[94] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Copenhagen, Denmark, August 2005.

[95] T. Plate, Holographic Reduced Representation: Distributed Representation for Cognitive Structures, CSLI Publications, Stanford, Calif, USA, 2003.

[96] T. K. Landauer and S. Dumais, "Latent semantic analysis," Scholarpedia, vol. 3, no. 11, p. 4356, 2013.

[97] D. C. van Leijenhorst and T. P. van der Weide, "A formal derivation of Heaps' Law," Information Sciences, vol. 170, no. 2–4, pp. 263–272, 2005.

[98] M. N. Jones, Learning semantics and word order from statistical redundancies in language: a computational approach [Ph.D. dissertation], Queen's University, 2005.

[99] J. Karlgren and M. Sahlgren, "From words to understanding," in Foundations of Real-World Intelligence, Y. Uesaka, P. Kanerva, and H. Asoh, Eds., CSLI Publications, Stanford, Calif, USA, 2001.

[100] T. A. Plate, "Holographic reduced representations," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995.

[101] A. Borsellino and T. Poggio, "Convolution and correlation algebras," Kybernetik, vol. 13, no. 2, pp. 113–122, 1973.

[102] J. M. Eich, "A composite holographic associative recall model," Psychological Review, vol. 89, no. 6, pp. 627–661, 1982.

[103] J. M. Eich, "Levels of processing, encoding specificity, elaboration, and CHARM," Psychological Review, vol. 92, no. 1, pp. 1–38, 1985.

[104] P. Liepa, Models of Content Addressable Distributed Associative Memory (CADAM), University of Toronto, 1977.

[105] B. B. Murdock, "A theory for the storage and retrieval of item and associative information," Psychological Review, vol. 89, no. 6, pp. 609–626, 1982.

[106] B. B. Murdock, "Item and associative information in a distributed memory model," Journal of Mathematical Psychology, vol. 36, no. 1, pp. 68–99, 1992.

[107] J. C. R. Licklider, "Three auditory theories," in Psychology: A Study of a Science, S. Koch, Ed., vol. 1, pp. 41–144, McGraw-Hill, New York, NY, USA, 1959.

[108] K. H. Pribram, "Convolution and matrix systems as content addressible distributed brain processes in perception and memory," Journal of Neurolinguistics, vol. 2, no. 1-2, pp. 349–364, 1986.

[109] W. E. Reichardt, "Cybernetics of the insect optomotor response," in Cerebral Correlates of Conscious Experience, P. Buser, Ed., North Holland, Amsterdam, The Netherlands, 1978.

[110] C. Eliasmith, "Cognition with neurons: a large-scale biologically realistic model of the Wason task," in Proceedings of the 27th Annual Meeting of the Cognitive Science Society, B. G. Bara, L. Barsalou, and M. Bucciarelli, Eds., pp. 624–629, Erlbaum, Mahwah, NJ, USA, 2005.

[111] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, Mass, USA, 1988.

[112] P. Kanerva, "Binary spatter-coding of ordered k-tuples," in Proceedings of the International Conference on Artificial Neural Networks, C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff, Eds., vol. 1112 of Lecture Notes in Computer Science, pp. 869–873, Springer, Bochum, Germany, 1996.

[113] G. Recchia, M. Jones, M. Sahlgren, and P. Kanerva, "Encoding sequential information in vector space models of semantics: comparing holographic reduced representation and random permutation," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 865–870, Cognitive Science Society, Austin, Tex, USA, 2010.

[114] J. A. Willits, S. K. D'Mello, N. D. Duran, and A. Olney, "Distributional statistics and thematic role relationships," in Proceedings of the 29th Annual Meeting of the Cognitive Science Society, pp. 707–712, Nashville, Tenn, USA, 2007.

[115] H. Rubenstein and J. Goodenough, "Contextual correlates of synonymy," Communications of the ACM, vol. 8, no. 10, pp. 627–633, 1965.

[116] G. A. Miller and W. G. Charles, "Contextual correlates of semantic similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1–28, 1991.

[117] P. Resnik, "Using information content to evaluate semantic similarity," in Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), C. S. Mellish, Ed., pp. 448–453, Morgan Kaufmann, Montreal, Canada, 1995.

[118] L. Finkelstein, E. Gabrilovich, Y. Matias et al., "Placing search in context: the concept revisited," ACM Transactions on Information Systems, vol. 20, no. 1, pp. 116–131, 2002.

[119] M. Sahlgren, The Word-Space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. dissertation], Stockholm University, 2006.

[120] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127–149, 2009.

[121] T. A. Plate, "Randomly connected sigma-pi neurons can form associator networks," Network: Computation in Neural Systems, vol. 11, no. 4, pp. 321–332, 2000.

[122] T. Yamazaki and S. Tanaka, "The cerebellum as a liquid state machine," Neural Networks, vol. 20, no. 3, pp. 290–297, 2007.

[123] J. Garson, "Connectionism," in The Stanford Encyclopedia of Philosophy (Winter 2012 Edition), E. N. Zalta, Ed., 2012, http://plato.stanford.edu/archives/win2012/entries/connectionism/.

[124] K. Murakoshi and K. Suganuma, "A neural circuit model forming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapses," BioSystems, vol. 90, no. 3, pp. 903–910, 2007.

[125] C. Cuppini, E. Magosso, and M. Ursino, "A neural network model of semantic memory linking feature-based object representation and words," BioSystems, vol. 96, no. 3, pp. 195–205, 2009.

[126] R. N. Shepard, "Neural nets for generalization and classification: comment on Staddon and Reid," Psychological Review, vol. 97, no. 4, pp. 579–580, 1990.

[127] K. McRae, V. R. de Sa, and M. S. Seidenberg, "On the nature and scope of featural representations of word meaning," Journal of Experimental Psychology: General, vol. 126, no. 2, pp. 99–130, 1997.

[128] G. Vigliocco, D. P. Vinson, W. Lewis, and M. F. Garrett, "Representing the meanings of object and action words: the featural and unitary semantic space hypothesis," Cognitive Psychology, vol. 48, no. 4, pp. 422–488, 2004.

[129] M. Baroni and A. Lenci, "Concepts and properties in word spaces," Italian Journal of Linguistics, vol. 20, no. 1, pp. 53–86, 2008.

[130] T. N. Rubin, J. A. Willits, B. Kievit-Kylar, and M. N. Jones, "Organizing the space and behavior of semantic models," in Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, Tex, USA, 2014.

[131] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[132] H. Schutze, Dimensions of Meaning, IEEE Computer Society Press, Washington, DC, USA, 1992.

[133] P. Gardenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, Cambridge, Mass, USA, 2014.

[134] W. Lowe, "Towards a theory of semantic space," in Proceedings of the 23rd Annual Conference of the Cognitive Science Society, J. D. Moore and K. Stenning, Eds., pp. 576–581, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2001.

[135] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.

[136] C. Eliasmith and C. H. Anderson, Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems, MIT Press, Cambridge, Mass, USA, 2003.

[137] J. G. Sutherland, "A holographic model of memory, learning and expression," International Journal of Neural Systems, vol. 1, no. 3, pp. 259–267, 1990.

[138] F. Choo and C. Eliasmith, "A spiking neuron model of serial-order recall," in Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, S. Ohlsson and R. Catrambone, Eds., pp. 2188–2193, Cognitive Science Society, Austin, Tex, USA, 2010.

[139] C. Eliasmith, T. C. Stewart, X. Choo et al., "A large-scale model of the functioning brain," Science, vol. 338, no. 6111, pp. 1202–1205, 2012.

[140] D. Rasmussen and C. Eliasmith, "A neural model of rule generation in inductive reasoning," Topics in Cognitive Science, vol. 3, no. 1, pp. 140–153, 2011.

[141] T. C. Stewart, X. Choo, and C. Eliasmith, "Dynamic behaviour of a spiking model of action selection in the basal ganglia," in Proceedings of the 10th International Conference on Cognitive Modeling, D. D. Salvucci and G. Gunzelmann, Eds., pp. 235–240, Drexel University, Austin, Tex, USA, 2010.

[142] K. K. de Valois, R. L. de Valois, and E. W. Yund, "Responses of striate cortex cells to grating and checkerboard patterns," Journal of Physiology, vol. 291, no. 1, pp. 483–505, 1979.

[143] M. S. Gazzaniga, R. B. Ivry, G. R. Mangun, and E. A. Phelps, Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company, New York, NY, USA, 2002.

[144] K. H. Pribram, "The primate frontal cortex: executive of the brain," in Psychophysiology of the Frontal Lobes, K. H. Pribram and A. R. Luria, Eds., pp. 293–314, 1973.

[145] P. Foldiak and D. Endres, "Sparse coding," Scholarpedia, vol. 3, no. 1, article 2984, 2008.

[146] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1," Vision Research, vol. 37, no. 23, pp. 3311–3325, 1997.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 15: Research Article Encoding Sequential Information in ...downloads.hindawi.com/journals/cin/2015/986574.pdf · Research Article Encoding Sequential Information in Semantic Space Models:

Computational Intelligence and Neuroscience 15

[9] W Kintsch and P Mangalath ldquoThe construction of meaningrdquoTopics in Cognitive Science vol 3 no 2 pp 346ndash370 2011

[10] T Landauer D McNamara S Dennis and W Kintsch Hand-book of Latent Semantic Analysis Erlbaum Mahwah NJ USA2006

[11] J Mitchell and M Lapata ldquoComposition in distributionalmodels of semanticsrdquo Cognitive Science vol 34 no 8 pp 1388ndash1429 2010

[12] A Utsumi and M Sakamoto ldquoPredicative metaphor compre-hension as indirect categorizationrdquo in Proceedings of the 32ndAnnual Meeting of the Cognitive Science Society S Ohlsson andR Catrambone Eds pp 1034ndash1039 Cognitive Science SocietyAustin Tex USA 2010

[13] C A Perfetti ldquoThe limits of cominusoccurrence tools and theoriesin language researchrdquo Discourse Processes vol 25 no 2-3 pp363ndash377 1998

[14] B Riordan andMN Jones ldquoComparing semantic spacemodelsusing child-directed speechrdquo in Proceedings of the 29th AnnualCognitive Science Society D S MacNamara and J G TraftonEds pp 599ndash604 Cognitive Science Society Austin Tex USA2007

[15] B Riordan andMN Jones ldquoRedundancy in perceptual and lin-guistic experience comparing feature-based and distributionalmodels of semantic representationrdquo Topics in Cognitive Sciencevol 3 no 2 pp 303ndash345 2011

[16] J A Bullinaria and J P Levy ldquoExtracting semantic represen-tations from word co-occurrence statistics a computationalstudyrdquo Behavior Research Methods vol 39 no 3 pp 510ndash5262007

[17] M Sahlgren A Holst and P Kanerva ldquoPermutations as ameans to encode order in word spacerdquo in Proceedings of the30th Annual Meeting of the Cognitive Science Society B C LoveK McRae and V M Sloutsky Eds pp 1300ndash1305 CognitiveScience Society Austin Tex USA 2008

[18] T L Griffiths M Steyvers and J B Tenenbaum ldquoTopics insemantic representationrdquo Psychological Review vol 114 no 2pp 211ndash244 2007

[19] M N Jones andD J KMewhort ldquoRepresenting wordmeaningand order information in a composite holographic lexiconrdquoPsychological Review vol 114 no 1 pp 1ndash37 2007

[20] S Dennis ldquoA memory-based theory of verbal cognitionrdquoCognitive Science vol 29 no 2 pp 145ndash193 2005

[21] P Kanerva J Kristoferson and A Holst ldquoRandom indexingof text samples for latent semantic analysisrdquo in Proceedings ofthe 22nd Annual Meeting of the Cognitive Science Society L RGleitman and A K Joshi Eds pp 103ndash106 Cognitive ScienceSociety Philadelphia Pa USA 2000

[22] P J Kwantes ldquoUsing context to build semanticsrdquo PsychonomicBulletin amp Review vol 12 no 4 pp 703ndash710 2005

[23] J R Firth ldquoA synopsis of linguistic theory 1930ndash1955rdquo inStudies in Linguistic Analysis J R Firth Ed pp 1ndash32 BlackwellOxford UK 1957

[24] ZHarrisDistributional Structure Humanities Press NewYorkNY USA 1970

[25] M de Vega A M Glenberg and A C Graesser Symbolsand Embodiment Debates on Meaning and Cognition OxfordUniversity Press Oxford UK 2008

[26] A M Glenberg and D A Robertson ldquoSymbol grounding andmeaning a comparison of high-dimensional and embodiedtheories of meaningrdquo Journal of Memory and Language vol 43no 3 pp 379ndash401 2000

[27] M Andrews G Vigliocco and D Vinson ldquoIntegrating experi-ential and distributional data to learn semantic representationsrdquoPsychological Review vol 116 no 3 pp 463ndash498 2009

[28] K Durda L Buchanan and R Caron ldquoGrounding co-occurrence identifying features in a lexical co-occurrencemodel of semantic memoryrdquo Behavior Research Methods vol41 no 4 pp 1210ndash1223 2009

[29] M N Jones and G L Recchia ldquoYou canrsquot wear a coat racka binding framework to avoid illusory feature migrations inperceptually grounded semantic modelsrdquo in Proceedings ofthe 32nd Annual Meeting of the Cognitive Science Society SOhisson and R Catrambone Eds pp 877ndash882 CognitiveScience Society Austin Tex USA 2010

[30] M Steyvers ldquoCombining feature norms and text data with topicmodelsrdquo Acta Psychologica vol 133 no 3 pp 234ndash243 2010

[31] R L Gomez and L A Gerken ldquoInfant artificial languagelearning and language acquisitionrdquoTrends in Cognitive Sciencesvol 4 no 5 pp 178ndash186 2000

[32] J R Saffran ldquoStatistical language learning mechanisms andconstraintsrdquo Current Directions in Psychological Science vol 12no 4 pp 110ndash114 2003

[33] C M Conway A Bauernschmidt S S Huang and D BPisoni ldquoImplicit statistical learning in language processingword predictability is the keyrdquoCognition vol 114 no 3 pp 356ndash371 2010

[34] D J Foss ldquoA discourse on semantic primingrdquoCognitive Psychol-ogy vol 14 no 4 pp 590ndash607 1982

[35] M Hare K McRae and J L Elman ldquoSense and structuremeaning as a determinant of verb subcategorization prefer-encesrdquo Journal of Memory and Language vol 48 no 2 pp 281ndash303 2003

[36] M Hare K McRae and J L Elman ldquoAdmitting that admittingverb sense into corpus analyses makes senserdquo Language andCognitive Processes vol 19 no 2 pp 181ndash224 2004

[37] M C Macdonald ldquoThe interaction of lexical and syntacticambiguityrdquo Journal of Memory and Language vol 32 no 5 pp692ndash715 1993

[38] M C MacDonald ldquoLexical representations and sentence pro-cessing an introductionrdquo Language andCognitive Processes vol12 no 2-3 pp 121ndash136 1997

[39] K McRae M Hare J L Elman and T Ferretti ldquoA basis forgenerating expectancies for verbs from nounsrdquo Memory andCognition vol 33 no 7 pp 1174ndash1184 2005

[40] R Taraban and J L McClelland ldquoConstituent attachment andthematic role assignment in sentence processing influences ofcontent-based expectationsrdquo Journal of Memory amp Languagevol 27 no 6 pp 597ndash632 1988

[41] T K Landauer D Laham B Rehder and M E SchreinerldquoHow well can passage meaning be derived without using wordorder A comparison of latent semantic analysis and humansrdquoinProceedings of the 19thAnnualMeeting of the Cognitive ScienceSociety M G Shafto and P Langley Eds pp 412ndash417 LawrenceErlbaum Associates Mahwah NJ USA 1997

[42] C Burgess ldquoFrom simple associations to the building blocks oflanguage modeling meaning in memory with the HALmodelrdquoBehavior Research Methods Instruments and Computers vol30 no 2 pp 188ndash198 1998

[43] W Kintsch ldquoAn overview of top-down and bottom-up effectsin comprehension the CI perspectiverdquo Discourse Processes vol39 no 2-3 pp 125ndash129 2005

16 Computational Intelligence and Neuroscience

[44] J L Elman ldquoOn the meaning of words and dinosaur boneslexical knowledge without a lexiconrdquo Cognitive Science vol 33no 4 pp 547ndash582 2009

[45] G McKoon and R Ratcliff ldquoMeaning through syntax languagecomprehension and the reduced relative clause constructionrdquoPsychological Review vol 110 no 3 pp 490ndash525 2003

[46] R Jackendoff Foundations of Language Brain Meaning Gram-mar Evolution Oxford University Press Oxford UK 2002

[47] P G OrsquoSeaghdha ldquoThe dependence of lexical relatednesseffects on syntactic connectednessrdquo Journal of ExperimentalPsychology Learning Memory and Cognition vol 15 no 1 pp73ndash87 1989

[48] M Andrews and G Vigliocco ldquoThe hidden Markov topicmodel a probabilistic model of semantic representationrdquoTopicsin Cognitive Science vol 2 no 1 pp 101ndash113 2010

[49] M N Jones W Kintsch and D J K Mewhort ldquoHigh-dimensional semantic space accounts of primingrdquo Journal ofMemory and Language vol 55 no 4 pp 534ndash552 2006

[50] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

[51] M M Louwerse ldquoSymbol interdependency in symbolic andembodied cognitionrdquo Topics in Cognitive Science vol 3 no 2pp 273ndash302 2011

[52] G L Recchia and M N Jones ldquoMore data trumps smarteralgorithms comparing pointwise mutual information withlatent semantic analysisrdquoBehavior ResearchMethods vol 41 no3 pp 647ndash656 2009

[53] T R Risley and B Hart ldquoPromoting early language develop-mentrdquo in The Crisis in Youth Mental Health Critical Issues andEffective Programs Volume 4 Early Intervention Programs andPolicies N F Watt C Ayoub R H Bradley J E Puma andW A LeBoeuf Eds pp 83ndash88 Praeger Westport Conn USA2006

[54] M Brand ldquoFast low-rank modifications of the thin singularvalue decompositionrdquo Linear Algebra and its Applications vol415 no 1 pp 20ndash30 2006

[55] G Gorrell ldquoGeneralized Hebbian algorithm for incrementalsingular value decomposition in natural language processingrdquoin Proceedings of the 11th Conference of the European Chapter ofthe Association for Computational Linguistics S Pado J Readand V Seretan Eds pp 97ndash104 April 2006

[56] M Louwerse and L Connell ldquoA taste of words linguisticcontext and perceptual simulation predict the modality ofwordsrdquo Cognitive Science vol 35 no 2 pp 381ndash398 2011

[57] J L Elman ldquoFinding structure in timerdquo Cognitive Science vol14 no 2 pp 179ndash211 1990

[58] T T Rogers and J LMcClelland Semantic Cognition A ParallelDistributed Processing Approach MIT Press Cambridge UK2004

[59] L Onnis andMH Christiansen ldquoLexical categories at the edgeof the wordrdquo Cognitive Science vol 32 no 1 pp 184ndash221 2008

[60] J A Legate andC Yang ldquoAssessing child and adult grammarrdquo inRich Languages from Poor Inputs M Piattelli-Palmarini and RC Berwick Eds pp 68ndash182 Oxford University Press OxfordUK 2013

[61] H A Simon ldquoThe architecture of complexity hierarchic sys-temsrdquo inThe Sciences of the Artificial H A Simon Ed pp 183ndash216 MIT Press Cambridge Mass USA 2nd edition 1996

[62] J L Elman ldquoLearning and development in neural networks theimportance of starting smallrdquoCognition vol 48 no 1 pp 71ndash991993

[63] D Servan-Schreiber A Cleeremans and J L McclellandldquoGraded state machines the representation of temporal contin-gencies in simple recurrent networksrdquoMachine Learning vol 7no 2-3 pp 161ndash193 1991

[64] M F St John and J L McClelland ldquoLearning and applyingcontextual constraints in sentence comprehensionrdquo ArtificialIntelligence vol 46 no 1-2 pp 217ndash257 1990

[65] M W Howard ldquoNeurobiology can provide meaningful con-straints to cognitive modeling the temporal context model asa description of medial temporal lobe functionrdquo in Proceedingsof the 37th Annual Meeting of the Society for MathematicalPsychology Ann Arbor Mich USA July 2004

[66] M W Howard and M J Kahana ldquoA distributed representationof temporal contextrdquo Journal of Mathematical Psychology vol46 no 3 pp 269ndash299 2002

[67] M W Howard K H Shankar and U K K Jagadisan ldquoCon-structing semantic representations from a gradually changingrepresentation of temporal contextrdquo Topics in Cognitive Sciencevol 3 no 1 pp 48ndash73 2011

[68] V A Rao and M W Howard ldquoRetrieved context and the dis-covery of semantic structurerdquo Advances in Neural InformationProcessing Systems vol 20 pp 1193ndash1200 2008

[69] M H Tong A D Bickett E M Christiansen and G WCottrell ldquoLearning grammatical structure with echo state net-worksrdquo Neural Networks vol 20 no 3 pp 424ndash432 2007

[70] J L Elman ldquoDistributed representations simple recurrentnetworks and grammatical structurerdquo Machine Learning vol7 no 2-3 pp 195ndash225 1991

[71] R Veale and M Scheutz ldquoNeural circuits for any-time phraserecognition with applications in cognitive models and human-robot interactionrdquo in Proceedings of the 34th Annual Conferenceof the Cognitive Science Society N Miyake D Peebles and R PCooper Eds pp 1072ndash1077 Cognitive Science Society AustinTex USA 2012

[72] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[73] Y Bengio ldquoLearning deep architectures for AIrdquo Foundationsand Trends in Machine Learning vol 2 no 1 pp 1ndash127 2009

[74] Y Bengio ldquoDeep learning of representations looking forwardrdquoin Statistical Language and Speech Processing vol 7978 ofLecture Notes in Computer Science pp 1ndash37 Springer BerlinGermany 2013

[75] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006

[76] J Martens and I Sutskever ldquoLearning recurrent neural net-works with Hessian-free optimizationrdquo in Proceedings of the28th International Conference on Machine Learning (ICML 11)pp 1033ndash1040 Bellevue Wash USA July 2011

[77] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conferenceon Machine Learning (ICML rsquo08) pp 160ndash167 ACM HelsinkiFinland July 2008

[78] A Mnih and G E Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of the Advances in NeuralInformation Processing Systems (NIPS rsquo09) pp 1081ndash1088 2009

[79] JWeston F Ratle HMobahi and R Collobert ldquoDeep learningvia semi-supervised embeddingrdquo in Neural Networks Tricks ofthe Trade vol 7700 of Lecture Notes in Computer Science pp639ndash655 Springer Berlin Germany 2012

Computational Intelligence and Neuroscience 17

[80] T Mikolov I Sutskever K Chen G S Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems 26 pp 3111ndash3119 2013

[81] I Sutskever J Martens G Dahl and G Hinton ldquoOn theimportance of initialization and momentum in deep learningrdquoin Proceedings of the 30th International Conference on MachineLearning (ICML rsquo13) pp 1139ndash1147 2013

[82] I Sutskever J Martens and G E Hinton ldquoGenerating textwith recurrent neural networksrdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp1017ndash1024 New York NY USA July 2011

[83] J Ngiam A Khosla M Kim J Nam H Lee and A YNg ldquoMultimodal deep learningrdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML 11) pp689ndash696 Bellevue Wash USA July 2011

[84] C Burgess andK Lund ldquoThedynamics ofmeaning inmemoryrdquoinCognitiveDynamics Conceptual andRepresentational Changein Humans andMachines E Dietrich Eric and A BMarkmanEds pp 117ndash156 Lawrence Erlbaum Associates PublishersMahwah NJ USA 2000

[85] K Durda and L Buchanan ldquoWINDSORS windsor improvednorms of distance and similarity of representations of seman-ticsrdquo Behavior Research Methods vol 40 no 3 pp 705ndash7122008

[86] D Rohde L Gonnerman and D Plaut ldquoAn improved model ofsemantic similarity based on lexical co-occurencerdquo in Proceed-ings of the AnnualMeeting of the Cognitive Science Society 2006

[87] L Buchanan C Burgess and K Lund ldquoOvercrowding insemantic neighborhoods modeling deep dyslexiardquo Brain andCognition vol 32 no 2 pp 111ndash114 1996

[88] L Buchanan C Westbury and C Burgess ldquoCharacterizingsemantic space neighborhood effects in word recognitionrdquoPsychonomic Bulletin andReview vol 8 no 3 pp 531ndash544 2001

[89] P D Siakaluk C Westbury and L Buchanan ldquoThe effects offiller type and semantic distance in naming evidence for changein time criterion rather than change in routesrdquo in Proceedingsof the 13th Annual Meeting of the Canadian Society for BrainBehaviour and Cognitive Science Hamilton CanadaMay 2003

[90] D Song and P Bruza ldquoDiscovering information flow using ahigh dimensional conceptual spacerdquo in Proceedings of the 24thAnnual International Conference on Research and Developmentin Information Retrieval W B Croft D J Harper D H Kraftand J Zobel Eds pp 327ndash333 Association for ComputingMachinery New York NY USA 2001

[91] T L Griffiths M Steyvers D M Blei and J B TenenbaumldquoIntegrating topics and syntaxrdquo Advances in Neural InformationProcessing Systems vol 17 pp 537ndash544 2005

[92] P Kanerva ldquoHyperdimensional computing an introduction tocomputing in distributed representationwith high-dimensionalrandom vectorsrdquo Cognitive Computation vol 1 no 2 pp 139ndash159 2009

[93] G E Cox G Kachergis G Recchia and M N Jones ldquoTowarda scalable holographic word-form representationrdquo BehaviorResearch Methods vol 43 no 3 pp 602ndash615 2011

[94] M Sahlgren ldquoAn introduction to random indexingrdquo in Pro-ceedings of the Methods and Applications of Semantic IndexingWorkshop at the 7th International Conference on Terminologyand Knowledge Engineering Copenhagen Denmark August2005

[95] T PlateHolographic Reduced Representation Distributed Repre-sentation for Cognitive Structures CSLI Publications StanfordCalif USA 2003

[96] T K Landauer and S Dumais ldquoLatent semantic analysisrdquoScholarpedia vol 3 no 11 p 4356 2013

[97] D C van Leijenhorst and T P van der Weide ldquoA formalderivation of Heapsrsquo Lawrdquo Information Sciences vol 170 no 2ndash4 pp 263ndash272 2005

[98] M N Jones Learning semantics and word order from statisticalredundancies in language a computational approach [PhDdissertation] Queenrsquos University 2005

[99] J Karlgren and M Sahlgren ldquoFrom words to understandingrdquoin Foundations of Real-World Intelligence Y Uesaka P Kanervaand H Asoh Eds CSLI Publications Stanford Calif USA2001

[100] T A Plate ldquoHolographic reduced representationsrdquo IEEE Trans-actions on Neural Networks vol 6 no 3 pp 623ndash641 1995

[101] A Borsellino and T Poggio ldquoConvolution and correlationalgebrasrdquo Kybernetik vol 13 no 2 pp 113ndash122 1973

[102] J M Eich ldquoA composite holographic associative recall modelrdquoPsychological Review vol 89 no 6 pp 627ndash661 1982

[103] J M Eich ldquoLevels of processing encoding specificity elabora-tion andCHARMrdquo Psychological Review vol 92 no 1 pp 1ndash381985

[104] P Liepa Models of Content Addressable Distributed AssociativeMemory (CADAM) University of Toronto 1977

[105] B B Murdock ldquoA theory for the storage and retrieval of itemand associative informationrdquo Psychological Review vol 89 no6 pp 609ndash626 1982

[106] B B Murdock ldquoItem and associative information in a dis-tributed memory modelrdquo Journal of Mathematical Psychologyvol 36 no 1 pp 68ndash99 1992

[107] J C R Licklider ldquoThree auditory theoriesrdquo in Psychology AStudy of a Science S Koch Ed vol 1 pp 41ndash144 McGraw-HillNew York NY USA 1959

[108] K H Pribram ldquoConvolution and matrix systems as contentaddressible distributed brain processes in perception andmem-oryrdquo Journal of Neurolinguistics vol 2 no 1-2 pp 349ndash3641986

[109] W E Reichardt ldquoCybernetics of the insect optomotorresponserdquo in Cerebral Correlates of Conscious Experience PBuser Ed North Holland AmsterdamThe Netherlands 1978

[110] C Eliasmith ldquoCognition with neurons a large-scale biologi-cally realistic model of the Wason taskrdquo in Proceedings of the27th Annual Meeting of the Cognitive Science Society B G BaraL Barsalou and M Bucciarelli Eds pp 624ndash629 ErlbaumMahwah NJ USA 2005

[111] P Kanerva Sparse Distributed Memory MIT Press CambridgeMass USA 1988

[112] P Kanerva ldquoBinary spatter-coding of ordered k-tuplesrdquo inProceedings of the International Conference on Artificial NeuralNetworks C von der Malsburg W von Seelen J Vorbruggenand B Sendhoff Eds vol 1112 of Lecture Notes in ComputerScience pp 869ndash873 Springer Bochum Germany 1996

[113] G Recchia M Jones M Sahlgren and P Kanerva ldquoEncodingsequential information in vector space models of semanticsComparing holographic reduced representation and randompermutationrdquo in Proceedings of the 32nd Cognitive ScienceSociety S Ohlsson and R Catrambone Eds pp 865ndash870Cognitive Science Society Austin Tex USA 2010

18 Computational Intelligence and Neuroscience

[114] J A Willits S K DMello N D Duran and A OlneyldquoDistributional statistics and thematic role relationshipsrdquo inProceedings of the 29th Annual Meeting of the Cognitive ScienceSociety pp 707ndash712 Nashville Tenn USA 2007

[115] H Rubenstein and J Goodenough ldquoContextual correlates ofsynonymyrdquo Communications of the ACM vol 8 no 10 pp 627ndash633 1965

[116] G A Miller and W G Charles ldquoContextual correlates ofsemantic similarityrdquo Language and Cognitive Processes vol 6no 1 pp 1ndash28 1991

[117] P Resnik ldquoUsing information content to evaluate semanticsimilarityrdquo in Proceedings of the Fourteenth International JointConference on Artificial Intelligence (IJCAI rsquo95) C S MellishEd pp 448ndash453 Morgan Kaufmann Montreal Canada 1995

[118] L Finkelstein E Gabrilovich YMatias et al ldquoPlacing search incontext the concept revisitedrdquo ACM Transactions on Informa-tion Systems vol 20 no 1 pp 116ndash131 2002

[119] M SahlgrenTheWord-Space model using distributional analy-sis to represent syntagmatic and paradigmatic relations betweenwords in high-dimensional vector spaces [PhD dissertation]Stockholm University 2006

[120] M Lukosevicius and H Jaeger ldquoReservoir computingapproaches to recurrent neural network trainingrdquo ComputerScience Review vol 3 no 3 pp 127ndash149 2009

[121] T A Plate ldquoRandomly connected sigma-pi neurons can formassociator networksrdquo Network Computation in Neural Systemsvol 11 no 4 pp 321ndash332 2000

[122] T Yamazaki and S Tanaka ldquoThe cerebellum as a liquid statemachinerdquo Neural Networks vol 20 no 3 pp 290ndash297 2007

[123] J Garson ldquoConnectionismrdquo in The Stanford Encyclopediaof Philosophy (Winter 2012 Edition) E N Zalta Ed 2012httpplatostanfordeduarchiveswin2012entriesconnec-tionism

[124] K Murakoshi and K Suganuma ldquoA neural circuit modelforming semantic network with exception using spike-timing-dependent plasticity of inhibitory synapsesrdquoBioSystems vol 90no 3 pp 903ndash910 2007

[125] C Cuppini E Magosso and M Ursino ldquoA neural networkmodel of semantic memory linking feature-based object rep-resentation and wordsrdquo BioSystems vol 96 no 3 pp 195ndash2052009

[126] RN Shepard ldquoNeural nets for generalization and classificationcomment on Staddon and Reidrdquo Psychological Review vol 97no 4 pp 579ndash580 1990

[127] K McRae V R de Sa and M S Seidenberg ldquoOn the natureand scope of featural representations of wordmeaningrdquo Journalof Experimental Psychology General vol 126 no 2 pp 99ndash1301997

[128] G Vigliocco D P Vinson W Lewis and M F Garrett ldquoRep-resenting the meanings of object and action words the featuraland unitary semantic space hypothesisrdquo Cognitive Psychologyvol 48 no 4 pp 422ndash488 2004

[129] M Baroni and A Lenci ldquoConcepts and properties in wordspacesrdquo Italian Journal of Linguistics vol 20 no 1 pp 53ndash862008

[130] T N Rubin J A Willits B Kievit-Kylar and M N JonesldquoOrganizing the space and behavior of semantic modelsrdquo inProceedings of the 35th Annual Conference of the CognitiveScience Society Cognitive Science Society Austin Tex USA2014

[131] D J Watts and S H Strogatz ldquoCollective dynamics of lsquosmall-worldrsquo networksrdquoNature vol 393 no 6684 pp 440ndash442 1998

[132] H Schutze Dimensions of Meaning IEEE Computer SocietyPress Washington DC USA 1992

[133] P Gardenfors The Geometry of Meaning Semantics Based onConceptual Spaces MIT Press Cambridge Mass USA 2014

[134] W Lowe ldquoTowards a theory of semantic spacerdquo inProceedings ofthe 23rd Annual Conference of the Cognitive Science Society J DMoore and K Stenning Eds pp 576ndash581 Lawrence ErlbaumAssociates Mahwah NJ USA 2001

[135] D M Blei A Y Ng and M I Jordan ldquoLatent dirichletallocationrdquoThe Journal ofMachine Learning Research vol 3 no4-5 pp 993ndash1022 2003

[136] C Eliasmith and C H Anderson Neural Engineering Com-putation Representation and Dynamics in Neurobiological Sys-tems MIT Press Cambridge Mass USA 2003

[137] J G Sutherland ldquoA holographicmodel ofmemory learning andexpressionrdquo International Journal of Neural Systems vol 1 no 3pp 259ndash267 1990

[138] F Choo and C Eliasmith ldquoA spiking neuron model of serial-order recallrdquo in Proceedings of the 32nd Annual Meeting of theCognitive Science Society S Ohisson and S Catrambone Edspp 2188ndash2193 Cognitive Science Society Austin TeXUSA2010

[139] C Eliasmith T C Stewart X Choo et al ldquoA large-scale modelof the functioning brainrdquo Science vol 338 no 6111 pp 1202ndash1205 2012

[140] D Rasmussen and C Eliasmith ldquoA neural model of rulegeneration in inductive reasoningrdquo Topics in Cognitive Sciencevol 3 no 1 pp 140ndash153 2011

[141] T C Stewart X Choo and C Eliasmith ldquoDynamic behaviourof a spiking model of action selection in the basal gangliardquo inProceedings of the 10th International Conference on CognitiveModeling DD Salvucci andGGunzelmann Eds pp 235ndash240Drexel University Austin Tex USA 2010

[142] K K de Valois R L de Valois and E W Yund ldquoResponsesof striate cortex cells to grating and checkerboard patternsrdquoJournal of Physiology vol 291 no 1 pp 483ndash505 1979

[143] M S Gazzaniga R B Ivry G R Mangun and E A PhelpsCognitive Neuroscience The Biology of the Mind W W Nortonamp Company New York NY USA 2002

[144] K H Pribram ldquoThe primate frontal cortexmdashexecutive of thebrainrdquo in Psychophysiology of the Frontal Lobes K H Pribramand A R Luria Eds pp 293ndash314 1973

[145] P Foldiak and D Endres ldquoSparse codingrdquo Scholarpedia vol 3no 1 article 2984 2008

[146] B A Olshausen and D J Field ldquoSparse coding with an over-complete basis set a strategy employed by V1rdquoVision Researchvol 37 no 23 pp 3311ndash3325 1997


