
Supplement to DARPA Quarterly Report Q2

Task 1.2 Biologically motivated spike-timing-dependent plasticity (STDP) learning

1. Spiking Neural Network Model and STDP Rules

We first re-implemented the spiking neural network (SNN) model for MNIST handwritten digit classification from [1]. We tried models of various sizes (100 and 400 neurons), and found that classification results were similar to those reported in [1]. To learn the synaptic weights of the SNN, we implemented four different spike-timing-dependent plasticity (STDP) rules. All were implemented in an "online" fashion, where we only keep track of a "trace", or a memory, of the most recent spike (temporal nearest neighbor) that has arrived at the post- and presynaptic neurons. The first was online STDP without presynaptic weight modification, the second added an exponential dependence on the previous synaptic weight, the third included a rule for decreasing synaptic weight on presynaptic spikes, and the fourth was a combination of the second and third rules.

The SNN model from [1] is illustrated below (Figure 1). The training of the model is as follows: At first, all neuron activations in the excitatory layer are set below their resting potentials. For 350 ms (or 700 computation steps, with a simulation time step of 0.5 ms), a single training image is presented to the network. For the MNIST digit dataset, each data sample is a 28 x 28 grayscale pixelated image of a hand-drawn digit from 0 to 9. The grayscale value of each pixel of the image determines the mean firing rate of the Poisson spike train to which it corresponds. We include a synapse from all input Poisson spike trains to each neuron in the excitatory layer. There are then 784 x n synapses from the input to the excitatory layer, if there are n neurons in the excitatory layer. The goal of the training procedure is to learn a setting of these weights that will cause distinct excitatory neurons to fire for distinct digit classes. Those neurons which fire most for a certain digit in the training phase we label with that digit's value. In the test phase, we classify new data examples by taking the majority vote of the labels of those neurons that fire for a test data sample.
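As a concrete illustration of the input encoding, the following NumPy sketch converts a single MNIST image into per-pixel Poisson spike trains. The maximum firing rate (max_rate_hz) and the direct probability-per-step sampling are illustrative assumptions, not the exact constants from [1]; our actual implementation generates these spike trains inside the BRIAN simulation.

    import numpy as np

    def poisson_spike_trains(image, duration_ms=350.0, dt_ms=0.5, max_rate_hz=63.75):
        """Convert a 28x28 grayscale image (values in [0, 255]) into Poisson spike
        trains, one per pixel.  max_rate_hz is a placeholder scaling constant: the
        brightest pixel fires at this mean rate.

        Returns a boolean array of shape (n_steps, 784): True where a spike occurs."""
        rates = image.reshape(-1).astype(float) / 255.0 * max_rate_hz  # mean rate per pixel (Hz)
        n_steps = int(duration_ms / dt_ms)
        p_spike = rates * (dt_ms / 1000.0)          # spike probability per time step
        return np.random.rand(n_steps, rates.size) < p_spike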

2. Spiking Neural Network for MNIST Handwritten Digit Classification

2.1 - Description of ETH SNN Model

The excitatory neurons are connected in a one-to-one fashion with an equally-sized layer of inhibitory neurons, each of which laterally inhibits all excitatory neurons except the one from which it receives its synaptic connection. The lateral inhibition mechanism typically causes a single neuron to fire strongly for a given data sample in a "winner take all" fashion, allowing the neuron which fires to adjust its weights to remember data samples of a similar shape and pixel intensity.


Figure 1: ETH Spiking Neural Network Model for MNIST Digit Recognition

2.2 - Online STDP Rules

To implement STDP in an online fashion, a pre- and postsynaptic trace is kept as a brief record of spike activity. On a spike from a presynaptic neuron, the presynaptic trace, $x_{pre}$, is set to 1, and on a spike from a postsynaptic neuron, the postsynaptic trace, $x_{post}$, is set to 1. Both exponentially decay towards zero otherwise, as given by:

$$\tau_{pre} \frac{dx_{pre}}{dt} = -x_{pre}, \qquad \tau_{post} \frac{dx_{post}}{dt} = -x_{post},$$

where $\tau_{pre}$, $\tau_{post}$ are time constants given as 1 ms and 2 ms, respectively.
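The trace bookkeeping reduces to a few lines of NumPy, sketched below; the decay-then-reset ordering is an implementation detail we assume, and the actual model performs these updates inside the BRIAN simulation.

    import numpy as np

    tau_pre, tau_post, dt = 1.0, 2.0, 0.5        # ms, as given above

    def update_traces(x_pre, x_post, pre_spiked, post_spiked):
        """pre_spiked / post_spiked are boolean arrays marking which neurons fired
        this time step; traces are updated in place and returned."""
        x_pre  *= np.exp(-dt / tau_pre)          # exponential decay toward zero
        x_post *= np.exp(-dt / tau_post)
        x_pre[pre_spiked]   = 1.0                # reset to 1 on a presynaptic spike
        x_post[post_spiked] = 1.0                # reset to 1 on a postsynaptic spike
        return x_pre, x_post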

The online STDP rule without presynaptic weight modification is given by the following equation, where $\Delta w$ denotes the relative weight change on the affected synapse, $\eta$ denotes a learning rate (typically set to 0.01), $w_{max}$ indicates the maximum value a weight can attain, and $\mu$ denotes a weight dependence term, which is set to 1 for best results:

$$\Delta w = \eta \, x_{pre} \, (w_{max} - w)^{\mu}.$$

This learning rule is illustrated below (Figure 2).


Figure 2: STDP without presynaptic modification

The online STDP rule without presynaptic weight modification and with exponential weight dependence is given by

$$\Delta w = \eta \, x_{pre} \, e^{-\beta w},$$

where $\beta$ determines the strength of the weight dependence, and is typically chosen to be 1. The online STDP rule with presynaptic weight modification is given by the following two rules:

$$\Delta w_{post} = \eta_{post} \, x_{pre} \, (w_{max} - w)^{\mu}, \qquad \Delta w_{pre} = -\eta_{pre} \, x_{post},$$

where $\eta_{post}$ and $\eta_{pre}$ are post- and presynaptic learning rates, typically set to 0.01 and 0.0001, respectively. Finally, the online STDP rule which combines the second and third STDP rules is given by the following two rules:

$$\Delta w_{post} = \eta_{post} \, x_{pre} \, e^{-\beta w}, \qquad \Delta w_{pre} = -\eta_{pre} \, x_{post}.$$
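For reference, the four update rules can be summarized in the sketch below. The functional forms mirror the equations above (which are themselves reconstructions of the report's formulas), so the constants and expressions should be read as illustrative rather than definitive.

    import numpy as np

    eta_post, eta_pre = 0.01, 1e-4      # post-/presynaptic learning rates
    w_max, mu, beta   = 1.0, 1.0, 1.0   # max weight and weight-dependence parameters

    def dw_on_post_spike(w, x_pre, rule):
        """Weight change applied when the postsynaptic neuron spikes."""
        if rule in ("standard", "presynaptic"):        # power-law weight dependence
            return eta_post * x_pre * (w_max - w) ** mu
        if rule in ("exponential", "both"):            # exponential weight dependence
            return eta_post * x_pre * np.exp(-beta * w)
        raise ValueError(rule)

    def dw_on_pre_spike(w, x_post, rule):
        """Weight change applied when the presynaptic neuron spikes
        (only the third and fourth rules modify weights on presynaptic spikes)."""
        if rule in ("presynaptic", "both"):
            return -eta_pre * x_post                   # depress on a presynaptic spike
        return np.zeros_like(w)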

After each training step, the synaptic weights of each excitatory neuron are normalized so that they do not grow so large as to fire indiscriminately for every possible input sample. Furthermore, the network is run for 150 ms (300 computation steps with a 0.5 ms simulation time step) after each input sample to allow the activations of the excitatory neurons to decay back to their resting potential. We also implement an adaptive membrane threshold potential for each excitatory neuron, which stipulates that the neuron's threshold is determined not only by a fixed threshold voltage $v_{thresh}$, but by the sum $v_{thresh} + \theta_i$, where $\theta_i$ increases by a small fixed constant each time the $i$-th neuron fires and decays exponentially otherwise.
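A sketch of these two mechanisms is given below; the normalization target and the theta parameters (tau_theta, theta_plus) are placeholder values, not the constants used in our experiments.

    import numpy as np

    def normalize_weights(W, target_sum=78.0):
        """W has shape (784, n_exc); rescale each neuron's incoming weights so that
        they sum to a fixed constant (target_sum is a hypothetical value)."""
        col_sums = W.sum(axis=0)
        return W * (target_sum / np.maximum(col_sums, 1e-12))

    def update_theta(theta, spiked, dt=0.5, tau_theta=1e4, theta_plus=0.05):
        """theta_i grows by a small constant when neuron i spikes and decays
        exponentially otherwise; the effective threshold is v_thresh + theta_i."""
        theta = theta * np.exp(-dt / tau_theta)
        theta[spiked] += theta_plus
        return theta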

On the next page, we show a plot of the activations of a spiking neural network as above, with only 4 excitatory and inhibitory neurons, and a plot of the change in the weights of the synapses from the input to the "0"-indexed excitatory neuron, both over a single input presentation and rest period (500 ms, or 1,000 time steps at a 0.5 ms simulation time step). As you may observe, during this iteration, only the "0"-th neuron ever reaches its threshold and spikes, laterally inhibiting all other excitatory neurons in the process.

Notice that the increase in synaptic weight corresponds in time to the firing of the excitatory neuron (in this case, the postsynaptic neuron), and is larger when the difference between presynaptic and postsynaptic spike times is smaller. Those synaptic weights that do not change correspond to presynaptic neurons that haven't fired; i.e., their mean firing rate is either zero or close enough to zero that they did not emit a spike during the training iteration. Thanks to the adaptive membrane potential mechanism, one can also observe the increase in the threshold parameter each time the "0"-th neuron fires a spike. After the 350 ms training portion (or 700 time steps), the 150 ms resting portion (or 300 time steps) allows the activations of each of the excitatory neurons to settle to their resting potential.

Figure 3: Neuron activations (left) and corresponding weight changes (right)

We proposed the training techniques of input smoothing, in which the mean firing rate of each of the input Poisson spike trains is averaged with those of its neighbors to the immediate left, right, top, and bottom, and weight smoothing, in which, during the training phase, after each input example has been presented to the network and the network weights have been modified, we average the weights from the input to a single neuron in the excitatory layer using the neighborhood smoothing described above.
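Input smoothing amounts to a four-neighbor average over the 28 x 28 grid of firing rates, sketched below in plain NumPy under the assumption that edge pixels simply average over the neighbors that exist; weight smoothing applies the same operation to a neuron's 784 incoming weights reshaped to 28 x 28.

    import numpy as np

    def smooth_rates(rates_2d):
        """rates_2d: (28, 28) array of mean firing rates; returns the smoothed array."""
        h, w = rates_2d.shape
        out = np.zeros_like(rates_2d, dtype=float)
        for i in range(h):
            for j in range(w):
                neighborhood = [rates_2d[i, j]]
                if i > 0:     neighborhood.append(rates_2d[i - 1, j])   # top
                if i < h - 1: neighborhood.append(rates_2d[i + 1, j])   # bottom
                if j > 0:     neighborhood.append(rates_2d[i, j - 1])   # left
                if j < w - 1: neighborhood.append(rates_2d[i, j + 1])   # right
                out[i, j] = np.mean(neighborhood)
        return out

    # Weight smoothing applies the same operation to the weights feeding one
    # excitatory neuron n after each training example, e.g.:
    #     W[:, n] = smooth_rates(W[:, n].reshape(28, 28)).reshape(-1)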

To compare the four STDP rules and the smoothing operations, we trained a 400 excitatory-inhibitory neuron SNN on the MNIST handwritten digit training dataset (60K examples), and tested it on the corresponding test dataset (10K examples), for all possible combinations of STDP rule and smoothing operation. The results are below (Table 1).

Table 1: Classification results (test accuracy, %): ETH model and modifications

                              Standard       Exponential Weight   With Presynaptic       Both
                              Online STDP    Dependence           Weight Modification
    Diehl & Cook Algorithm    89.35          89.28                90.54                  88.51
    With Input Smoothing      88.42          88.75                90.23                  89.23
    With Weight Smoothing     88.90          89.85                90.02                  89.54


3. Novel Approaches: Convolutional Spiking Neural Networks

3.1 - Proposed Network Architecture

To extend the spiking neural network (SNN) model from [1], we first implemented a single layer of convolution "features" or "patches". This layer is adapted from the widely used convolutional neural network architecture originating in deep learning research [2]. To accomplish this, we use convolution windows of size $k \times k$, with a stride of size $s$. In our case, a convolution window corresponds to a function which maps a section of the input to a single excitatory neuron. A single convolution feature maps regularly spaced subsections of the input to a "patch" of excitatory neurons, all via shared synaptic weights (a single set of weights is learned for a single convolution feature). We use multiple of these convolution features, mapping to distinct excitatory neuron patches, in order to learn multiple different features of the input data. Our goal is to learn a setting of these weights such that we may separate the input data (in the current study, the MNIST handwritten digit dataset) into salient and frequent categories of features, as evidenced by the firing pattern of the excitatory neurons, so that we might later compose them in order to classify the digits with their respective categorical labels.

As with the SNN from [1], the weights from the input (convolution windows) to the excitatory layer (subdivided into convolution patches) are learned in an unsupervised manner using spike-timing-dependent plasticity (STDP) in the training phase. In the test phase, these weights are held constant, and we use them to predict the labels of a held-out test dataset. The labels of new data are determined first by assigning labels to the excitatory neurons based on their spiking activity on training data, and then using these labels in a majority "vote" for the classification of new data samples.

The number of excitatory neurons per convolutional patch can be determined via the constants $k$ and $s$, and is given by

$$n_{patch} = \left( \frac{\sqrt{m} - k}{s} + 1 \right)^2,$$

where $m$ is the dimensionality of the input, assuming $m$ is a square number. The number of convolution features is the network designer's choice, which we denote by $F$, but with more features come greater representation capacity and computational cost. As a first pass, we chose to use exactly $F$ inhibitory neurons, each of which corresponds to a single convolution patch. Each neuron in a convolution patch connects to this inhibitory neuron, which is connected to all other neurons of all other patches but the one it receives its connections from. This has the effect of allowing a particular convolution patch to spike while damping out the activity of all others, creating competition to represent certain features of the input data. Below, we have provided a diagram of the network architecture (Figure 4).
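A small helper makes the patch-size calculation concrete; the example values correspond to the 27 x 27, stride-1 configuration used in our first experiments below (Section 3.2).

    import math

    def neurons_per_patch(m, k, s):
        """Number of excitatory neurons in one convolution patch for an input of
        dimensionality m (assumed square), window size k, and stride s."""
        side = (int(math.sqrt(m)) - k) // s + 1
        return side * side

    # Example: a 28x28 MNIST input (m = 784) with 27x27 windows and stride 1
    # gives a 2x2 patch of 4 excitatory neurons per convolution feature.
    print(neurons_per_patch(784, 27, 1))   # -> 4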


Figure 4: Convolutional Spiking Neural Network (Single Convolutional Layer)

3.2 - Experiments

We experiment with different choices of $k$ and $s$ and observe how the classification performance of the network degrades from that of the SNN model from [1]. We use the same training and test procedure from [1], but use only a single iteration through the entire training dataset (60,000 examples), because training the network is rather slow. We chose to set $k = 27$ and $s = 1$ for our first experiments, causing the convolution windows to cover almost the entire input space, and we therefore expect that the classification performance of the network will not degrade too much from the model from [1] given a sufficient number of features $F$. We then gradually decreased $k$, keeping $s$ and $F$ constant, expecting to see a gradual degradation in classification performance. The results are below in Figure 5; for reference, the test classification performance of the SNN model from [1] with 100 excitatory neurons is also plotted.

Figure 5: Classification accuracy by convolution size, number of convolution patches


3.3 - Weight Smoothing

We experimented with a weight smoothing technique, which can be described as follows: after each training sample has been run through the network (for the 500 ms, or 1,000 time steps at 0.5 ms each), for each convolution patch we take a weighted sum of the weights of the neuron which spiked the most during the training iteration and those of its neighbors in a horizontal/vertical lattice in the convolution patch, and copy this across all neurons in the same patch. For example, if the $i$-th neuron is the "winner" (having the most spikes within its patch), we compute

$$\hat{w} = \alpha \, w_i + \frac{1 - \alpha}{|N(i)|} \sum_{j \in N(i)} w_j,$$

where $w_j$ denotes the set of weights connecting a convolution window (a portion of the input) to the $j$-th neuron in the convolution patch, $N(i)$ is the set of indices of the neurons neighboring the $i$-th neuron in the lattice, in the same convolution patch, and $\alpha$ weights the winner's contribution relative to its neighbors'. See Figure 6 for an illustration of this procedure. This approach does not appear to improve the algorithm's performance; indeed, it often harms the classification performance of the original algorithm, so we do not report performance results from the networks we tested.
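The smoothing step can be sketched as follows under the weighted-average form given above; the mixing coefficient alpha is an assumed placeholder, not a value we report.

    import numpy as np

    def smooth_patch_weights(W_patch, winner, neighbors, alpha=0.5):
        """W_patch: (n_neurons_in_patch, k*k) window weights per neuron in one patch;
        winner: index of the most-active neuron; neighbors: its lattice neighbors."""
        w_hat = alpha * W_patch[winner]
        if len(neighbors) > 0:
            w_hat = w_hat + (1.0 - alpha) * W_patch[neighbors].mean(axis=0)
        W_patch[:] = w_hat            # copy the smoothed weights across the whole patch
        return W_patch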

Figure 6: Weight smoothing after each training example

3.4 - Convolutional Spiking Neural Network With Between-Patch Connectivity

We describe a modification to our convolutional spiking neural network (SNN) model which includes a lattice connectivity between pairs of neighboring convolutional patches. The weights connecting pairs of convolutional patches are learned via the same STDP rule which is used to learn the weights from the input to the patches. We also describe a new procedure for inhibiting excitatory neurons and assigning labels to new data which allows the network to quickly learn features of the input data.

Added Connectivity


Suppose there are $F$ convolutional patches (or "features") of convolution neurons, and let $P_i$ be the patch indexed by the integer $i$. If $i$ is even, then we connect $P_i$ to $P_{i+1}$; otherwise, we connect it to $P_{i-1}$. The connectivity pattern is as follows: for every neuron in a patch, we connect it to the neurons at the neighboring lattice positions in the paired patch, as long as the indices remain within the patch. This gives a horizontal and vertical lattice connectivity pattern between neighboring convolution patches. Note that there are only $F/2$ such lattice connections; the first and second patches are connected, the third and fourth are connected, and so on through the final pair. The connectivity pattern is illustrated, for a single neuron in a single pair of patches, in Figure 7. The rest of the synapses are omitted for ease of visualization.
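The lattice synapses between paired patches might be enumerated as in the sketch below; the 0-based patch indexing, the assumption of an even number of patches, and the four-neighbor offset set are our conventions for illustration.

    def paired_patch(i):
        """Even-indexed patch i pairs with i + 1, odd-indexed with i - 1
        (assumes an even number of patches)."""
        return i + 1 if i % 2 == 0 else i - 1

    def between_patch_synapses(n_patches, side):
        """Yield (patch_a, (r, c), patch_b, (r2, c2)) synapse endpoints, where each
        patch is a side x side lattice of excitatory neurons."""
        for i in range(n_patches):
            j = paired_patch(i)
            for r in range(side):
                for c in range(side):
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        r2, c2 = r + dr, c + dc
                        if 0 <= r2 < side and 0 <= c2 < side:   # stay inside the patch
                            yield (i, (r, c), j, (r2, c2))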

Figure 7: Connectivity between convolutional patches ("i" is even)

This is a first attempt at adding connectivity between patches. We may experiment with different lattice neighborhoods (to encompass more of the nearby input space), and with a different overarching connectivity pattern; e.g., instead of connecting certain neighboring patches, we could connect all neighboring patches, or connect each patch to all others.

Inhibition

Since we wish to learn weights between neighboring neurons in neighboring patches, it no longer makes sense for a single inhibitory neuron to be fired by all neurons in a single patch, which would then inhibit all other patches totally. Instead, to allow for the learning of correlations between the patches, each excitatory neuron projects to its own inhibitory neuron. This inhibitory neuron, when it reaches its threshold and spikes, inhibits the excitatory neuron, in each of the other excitatory patches, that resides in the same spatial location as the neuron from which it receives its synapse. This inhibition scheme is depicted in Figure 8.
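The inhibitory fan-out under this scheme is simple to state in code; the sketch below assumes neurons are addressed by (patch index, lattice position) pairs, which is our convention for illustration.

    def inhibitory_targets(p, q, n_patches):
        """Return the (patch, position) pairs inhibited when excitatory neuron (p, q)
        spikes: the neuron at the same lattice position q in every *other* patch."""
        return [(other, q) for other in range(n_patches) if other != p]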


Figure 8: Convolutional SNN with between-patch connectivity

Labeling New Data

For this architecture modification, we implemented a different labeling scheme for the classification of new input data. Recall that, for the original SNN model, we label a new input datum with the label which corresponds to the highest number of spikes from the excitatory layer. The neurons, in turn, are assigned digit labels based on the digit class for which they spike the most. Since each convolution patch learns a single "feature" of the input (the weights on the connections from the input to all neurons in the same patch are identical), we choose to include only the spike counts, on a new input datum, from the neuron in each patch which spiked the most while the datum was presented to the network. The justification for this is that many of the neurons in the network spike infrequently on any given input, but only a few neurons spike consistently. Removing the counts from the infrequently spiking neurons allows us to remove the "junk" counts from the labeling scheme.
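The per-patch "winner only" vote can be sketched as follows; the array shapes and the helper name classify are illustrative assumptions rather than the exact structure of our implementation.

    import numpy as np

    def classify(spike_counts, neuron_labels, n_classes=10):
        """spike_counts: (n_patches, neurons_per_patch) spike counts for one test input;
        neuron_labels: same shape, digit label assigned to each neuron during training."""
        votes = np.zeros(n_classes)
        for patch_counts, patch_labels in zip(spike_counts, neuron_labels):
            winner = np.argmax(patch_counts)                 # most active neuron in the patch
            votes[patch_labels[winner]] += patch_counts[winner]
        return int(np.argmax(votes))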

3.5 - Preliminary Results

We are currently running a number of simulations of the above-described network, for a variety of settings of convolution size, stride size, and number of convolution patches. We include in Figures 9, 10, and 11 plots of the accuracy of the network with the added pairwise lattice connectivity, evaluated every 100 training examples. We set the labels of the neurons using the spikes resulting from the last 100 examples, and use these labels to classify the next 100 examples. Since we are evaluating on so few data at each iteration, the accuracy curve exhibits high variance. Each of these networks has 50 convolution features.


Figure 9: 27x27 convolutions, stride 1, 50 features (62.3% average accuracy vs. 61.4% without lattice connections)

Figure 10: 18x18 convolutions, stride 2, 50 features (50.3% average accuracy vs. 43.25% without lattice connections)

Figure 11: 10x10 convolutions, stride 6, 50 features (40.1% average accuracy vs. 21.08% without lattice connections)


4. Discussion on SNN Computational Complexity

We describe the computational requirements of our spiking neural network models and compare them to those of a convolutional neural network from the deep learning literature. Our spiking neural network models are programmed in Python, using the BRIAN neural network simulator [3].

4.1 - Spiking Neural Network Computation - MNIST

We consider a single training "iteration" of the spiking neural network model from [1] (see Figure 1). Recall that, for each input example, we "present" the input to the network for 350 ms and allow the network to relax back to equilibrium for 150 ms, for a total of 500 ms per iteration (or 1,000 integration steps with a 0.5 ms time step). The input is encoded as Poisson spike trains [3], as described in Section 1, "Spiking Neural Network Model and STDP Rules", and each spike train is connected to every neuron in an arbitrarily-sized layer of excitatory neurons. These Poisson spike trains emit a spike with a certain probability at each time step, determined by the mean firing rate assigned to the corresponding pixel. So, for the MNIST digit dataset, we consider 28 * 28 = 784 independent Poisson processes, each of which must be solved for in each time step of each training iteration.

The neurons in the excitatory and inhibitory layers of the network are modeled as leaky integrate-and-fire neurons, whose membrane potentials are described by

$$\tau \frac{dV}{dt} = (E_{rest} - V) + g_e (E_{exc} - V) + g_i (E_{inh} - V),$$

where $E_{rest}$ is the resting potential of the neuron, $E_{exc}$ and $E_{inh}$ are the equilibrium potentials of excitatory and inhibitory synapses, respectively, $\tau$ is a biologically plausible time constant, and $g_e$ and $g_i$ are the conductances of excitatory and inhibitory synapses, respectively. The excitatory and inhibitory conductance update rules are given by

$$\tau_{g_e} \frac{dg_e}{dt} = -g_e, \qquad \tau_{g_i} \frac{dg_i}{dt} = -g_i,$$

where $\tau_{g_e}$, $\tau_{g_i}$ are biologically plausible time constants. For each excitatory and inhibitory neuron, we solve the above equations at each time step. Assuming there are $n_e$ excitatory neurons and $n_i$ inhibitory neurons, we must solve on the order of $784 + 3(n_e + n_i)$ equations at each time step, accounting for all input Poisson spike train spikes and the excitatory and inhibitory neurons' membrane potentials and synapse conductances.
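The per-time-step work can be made concrete with a forward-Euler sketch of these equations; the parameter values below are placeholders rather than the constants from [1], and the real model integrates these dynamics within BRIAN.

    import numpy as np

    dt, tau = 0.5, 100.0                       # ms
    E_rest, E_exc, E_inh = -65.0, 0.0, -100.0  # mV (illustrative values)
    tau_ge, tau_gi = 1.0, 2.0                  # ms

    def step(v, g_e, g_i):
        """Advance membrane potentials and synaptic conductances by one time step."""
        dv = ((E_rest - v) + g_e * (E_exc - v) + g_i * (E_inh - v)) / tau
        v   = v + dt * dv
        g_e = g_e * np.exp(-dt / tau_ge)       # conductances decay exponentially between
        g_i = g_i * np.exp(-dt / tau_gi)       # spikes; they jump when a spike arrives
        return v, g_e, g_i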

At the beginning of the training phase, the weights on the synapses from the input to the layer of excitatory neurons are chosen uniformly at random in the range [0.01, 0.3], and throughout the training phase, these are updated using our chosen STDP rule. For each input Poisson spike train and excitatory neuron, a "trace" tells us how recently that neuron has fired. Its general form is given by

$$\tau_x \frac{dx}{dt} = -x,$$


where $x$ is the trace, which is set to 1 when the neuron fires and updates according to the equation above otherwise, and $\tau_x$ is a time constant, which may be chosen to be different for presynaptic (input Poisson spike train) and postsynaptic (excitatory layer) neurons. When an input spike train emits a spike, we update the synapse conductance of the neurons it is connected to (all excitatory neurons) by the weight along its connection to each. When an excitatory neuron emits a spike, we apply the STDP update rule to the connections from the input layer to itself, and update the synapse conductance of the inhibitory neuron to which it connects. Since there are approximately 15 spikes emitted from the excitatory layer per training iteration, there are approximately 15 * 784 ≈ 11,760 weight updates per iteration.

All together, over one training iteration (in which a single training example is presented to the network), the total number of operations is roughly the number of equations solved per time step multiplied by the number of time steps, plus the STDP weight updates (we only calculate weight updates during the active input period; the network is at "rest" for the last 300 time steps). For example, with a network of 400 excitatory and inhibitory neurons (which gives approximately 90% classification accuracy, depending on the STDP rule utilized), this amounts to approximately 10.6 million operations per training example.

Since the STDP learning rule is applied only locally (i.e., between any two given neurons), we do not have to wait for signals to propagate through the network before computing a backward pass, as in gradient-based deep learning methods. Additionally, very little memory is required to simulate these networks. We keep track of matrices of weights, membrane potentials, synapse conductances, and synaptic traces only, and even with large networks, the memory overhead is manageable on laptop machines.

4.2 - Convolutional Neural Network Computation - MNIST

We estimate the computational burden imposed by a simple convolutional network used to classify the MNIST handwritten digit dataset. The described network achieves approximately 99% classification accuracy on the test dataset after training on some 500,000 images (we subsample 50 images from the training dataset 10,000 times). All convolutional layers of the network utilize zero-padding of size 2 on all edges of the images. The network consists of a convolutional layer, with patch size 5x5 and stride 1, with 32 convolution features (giving 28x28x32 dimensionality), then a max-pooling layer with patch size 2x2 and stride 2 (giving 14x14x32 dimensionality), another convolutional layer, again with patch size 5x5 and stride 1, with 64 convolution features (giving 14x14x64 dimensionality), another max-pooling layer, again with patch size 2x2 and stride 2 (giving 7x7x64 dimensionality), a fully-connected layer of size 1024, and a logistic regression classifier of dimensionality 10, from which we extract the digit label using the softmax function.

For a single training example, the first convolutional layer of this network requires the computation of 28*28*32 ReLU (rectified linear unit) functions, each on the sum of 5*5 entries in the input layer multiplied by their connection weights, for a total of on the order of 627,200 operations. The first max-pooling layer simply requires computing the max operation over some 14*14*32 2x2 matrices, for a total of 6,272 operations. Similarly, the second convolutional layer and the second max-pooling layer require some 313,600 and 3,136 operations, respectively. The fully-connected layer requires computing 1,024 ReLU functions, each on the sum of products of the entries in the previous layer by their edge weights, resulting in on the order of 7*7*64*1,024 = 3,211,264 operations. Finally, the logistic regression requires another 1,024*10 operations, and then simply takes the argmax() of the output 10-dimensional vector. Therefore, the forward pass of the network takes approximately 4,171,712 operations. The backward pass, on the other hand, requires computing partial derivatives for each parameter in each of these functions, but its complexity is on the same order as that of the forward pass. As an underestimate, we say that the backward pass requires, again, 4,171,712 operations, and so we estimate that this network requires approximately 8.3 million operations per training example.
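The per-layer counts above can be checked with a few lines of arithmetic, which reproduce the forward-pass total quoted in the text:

    # Worked operation count for the forward pass, following the text's own
    # per-layer accounting.
    conv1 = 28 * 28 * 32 * (5 * 5)     # 627,200
    pool1 = 14 * 14 * 32               # 6,272
    conv2 = 14 * 14 * 64 * (5 * 5)     # 313,600
    pool2 = 7 * 7 * 64                 # 3,136
    fc    = (7 * 7 * 64) * 1024        # 3,211,264
    clf   = 1024 * 10                  # 10,240
    print(conv1 + pool1 + conv2 + pool2 + fc + clf)   # -> 4171712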

4.3 - Classification Results

We include a plot of the performance of the spiking neural network model from [1] on the MNIST handwritten digit test dataset in Figure 12. This plot gives the classification accuracy of the model as a function of the number of excitatory and inhibitory neurons in the network. Each network was trained using a single pass through the entire 60,000-digit dataset. We have also included the classification accuracy of the above-described convolutional neural network, also on the test dataset. We trained and tested the SNN model using excitatory and inhibitory layer sizes of 25, 50, 100, 200, 400, 800, 1600, 3200, and 6400. The classification accuracies for these particular runs increase monotonically, as can be seen in the figure.

Figure 12: Classification accuracy of ETH SNN model by network size

4.4 - Possible Improvements - Spiking Neural Network

One simple way to reduce the time complexity of the spiking neural network model from [1] is to reduce the number of time steps used per iteration of the training and test phases. Currently, we "present" the network with an input for 350 ms (700 time steps at a 0.5 ms time increment), and then "turn off" the input for 150 ms (300 time steps), allowing the membrane potentials and synapse conductances of the excitatory and inhibitory neurons to decay back towards their resting values. Reducing the number of time steps would require adjusting the parameters of the equations which govern the neurons in the network in order for the neurons to respond to the input data more quickly.


A current severe limitation of the BRIAN neural network simulation software (we use version 1.4.3) is that it works most efficiently as a single-core process. Upgrading to version 2.2.x should allow us to take advantage of multi-core and GPU processing in order to speed up our experiments, but we must be careful to replicate the code correctly and consider all the parallel processing options available.

5. References

[1] Diehl, P., & Cook, M. (2015). Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 9. doi:10.3389/fncom.2015.00099

[2] Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.

[3] Goodman, D. F., & Brette, R. (2009). The Brian simulator. Frontiers in Neuroscience, 3. doi:10.3389/neuro.01.026.2009

[4] http://www.cns.nyu.edu/~david/handouts/poisson.pdf