AER protocol

ASYNCHRONOUS EVENT REDIRECTING IN BIO-INSPIRED COMMUNICATION

Ph. Hafliger

Institute of Informatics, University of Oslo, Norway, e-mail: [email protected]

ABSTRACT

The paper presents the FPGA implementation of a programmable asynchronous digital circuit (henceforth called AE-map) that remaps ‘address events’. Address event representation (AER) is an event-driven communication protocol originally used in VLSI implementations of neural networks to transfer action potentials (neural voltage pulses) between neurons. More generally, it is suited to transmitting a number of analog values that are coded in the frequency of events over an asynchronous digital bus. The AE-map allows such events to be redirected between an AE sender and an AE receiver, thereby, for instance, programming the connection scheme of a neural network. Earlier approaches for redirecting AEs have used digital synchronous devices such as DSPs or microcontrollers. The simpler and more dedicated asynchronous solution presented here is more energy efficient, does not impose a discretization on the time axis, and achieves a much higher throughput. In the present implementation, AEs (9-bit input, 7-bit output) can be processed at intervals of no more than 84 ns per output AE.

1. INTRODUCTION

1.1. Address Event Representation

Address event representation (AER) is an event-driven communication protocol that was originally put forward within the field of ‘neuromorphic aVLSI’ [6, 5, 9]. Neuromorphic engineering tries to incorporate operating principles of the nervous system into technical devices [8]. AER was first used to approach the massive connectivity of biological neural networks, but in general it is suited to conveying a large number of analog values (e.g. sensory data) through a low-capacity channel (an asynchronous digital bus).

It works as follows: AER is used to transmit ‘events’. Events are characterized by a location (address) and a time. For example, in a network of neurons the address identifies one particular neuron, and the time is the time at which the neuron fires an action potential (AP = nerve pulse). To transmit a number of analog values (e.g. pixels in a camera), one codes the intensity in the frequency of such events (rate coding). (This transformation of an intensity, e.g. a photodiode current, into an event rate can be achieved quite easily by placing a simple integrate-and-fire neuron circuit (6 transistors, 2 small capacitors) into the pixel.) An asynchronous digital bus is used for the actual transmission. The event’s location is encoded digitally as an ‘address’, which is placed on the bus at the time of the event. At the receiver end of the bus this address is decoded again into a receiving location. For neural networks that location would be a particular synapse (input site) of a particular neuron on the receiver chip; for rate-coded analog values it could be some integrator that reconstructs the analog value (e.g. a pixel on a screen). Alternatively, these addresses can be used directly by a digital device without the effort of an AD conversion.
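The rate coding and address encoding described above can be illustrated with a small simulation. This is a minimal software sketch, not a model of the 6-transistor pixel circuit: it assumes idealized integrate-and-fire pixels with a hypothetical threshold of 1.0 and a fixed simulation time step, and it represents each event as an (address, time) pair. The receiver side simply counts events per address to recover the rates.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, float]  # (address, time in seconds)


def integrate_and_fire(intensities: List[float], threshold: float = 1.0,
                       dt: float = 1e-3, t_end: float = 1.0) -> List[Event]:
    """Sender side: each pixel integrates its input intensity and emits an
    address event each time its potential crosses the (hypothetical) threshold."""
    potentials = [0.0] * len(intensities)
    events: List[Event] = []
    for step in range(1, int(round(t_end / dt)) + 1):
        t = step * dt
        for address, intensity in enumerate(intensities):
            potentials[address] += intensity * dt
            if potentials[address] >= threshold:
                potentials[address] -= threshold  # reset by subtraction
                events.append((address, t))       # address placed on the bus
    return events


def reconstruct_rates(events: List[Event], n_addresses: int,
                      t_end: float = 1.0) -> List[float]:
    """Receiver side: decode each address and estimate its event rate."""
    counts = Counter(address for address, _ in events)
    return [counts[a] / t_end for a in range(n_addresses)]


events = integrate_and_fire([0.0, 10.0, 50.0])
print(reconstruct_rates(events, 3))  # approximately [0.0, 10.0, 50.0]
```

Only the address and the moment it appears on the bus are transmitted; the intensity is recovered from the event rate alone.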

This event-driven strategy is more energy efficient than scanning (as, for example, in video connections) if the data is sparse, i.e. if only a few sender locations tend to be very active at a time. An example of such data would be the output of a silicon retina [6]. This is an ‘intelligent camera’ inspired by the biological retina (the photosensitive tissue in the back of the eye). It performs some processing on an image already in the recording pixel. One variant of a silicon retina, for example, is only sensitive to changes. Since natural scenes tend to be rather static, with fast changes happening only around the edges of moving objects, a scanning strategy wastes a lot of energy on reading pixels where nothing is happening. In the worst case, the detection of a change might be delayed for the time it takes to scan through the whole image, or the change might even be missed if it is synchronous with the frame rate. An AER strategy, in contrast, reports a change in a pixel immediately.
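The sparseness argument can be made concrete by counting bus transfers as a rough proxy for readout energy. The sketch below uses hypothetical numbers (a 64 × 64 pixel retina scanned at 100 frames per second, versus 50 active pixels firing at 200 Hz each); they are illustrative only and are not measurements from this work.

```python
def scan_transfers(n_pixels: int, frame_rate_hz: float, duration_s: float) -> float:
    """Scanning readout: every pixel is read once per frame, active or not."""
    return n_pixels * frame_rate_hz * duration_s


def aer_transfers(active_rates_hz: list, duration_s: float) -> float:
    """AER readout: one bus transfer per event; silent pixels cost nothing."""
    return sum(rate * duration_s for rate in active_rates_hz)


def worst_case_scan_delay_s(frame_rate_hz: float) -> float:
    """A change may wait up to one full frame before its pixel is read."""
    return 1.0 / frame_rate_hz


print(scan_transfers(64 * 64, 100, 1.0))   # 409600.0
print(aer_transfers([200.0] * 50, 1.0))    # 10000.0
print(worst_case_scan_delay_s(100))        # 0.01
```

The advantage shrinks as activity grows: once most pixels fire at rates comparable to the frame rate, AER transfers approach the scanning count and collisions on the bus become the limiting factor.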

The drawback is a risk of overrunning the bus and the need for collision handling. Other publications deal with these issues [5, 9, 1, 2, 4] (the AER map implementation presented in this paper assumes that collisions are resolved before AEs are placed on the bus). In general, when transmitting analog data there is a trade-off between temporal resolution, intensity resolution, size of the address space, and the expected occupation of that address space.

To come back to the example of the change-sensitive retina: given the timing of the AEs, not only can the rate of change be reconstructed (rate coding), but the onset of the change (the first event in a burst of activity) is also quite evident and undisturbed by a frame rate. The order of those onsets in neighbouring pixels indicates a direction of motion, and by measuring the intervals between the onsets (temporal coding) the speed of that motion becomes evident. The asynchronous, unclocked implementation of the AE bus avoids introducing a discretization error into this temporal code.
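The temporal decoding described above can be sketched as follows. This is a hypothetical illustration, assuming one burst per pixel within the recording window, time-ordered events, integer time stamps in microseconds, and a pixel pitch of 1; the function names and event values are invented for the example.

```python
from typing import Dict, Iterable, Tuple


def first_onsets(events: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Onset = time of the first event of each address (events time-ordered)."""
    onsets: Dict[int, int] = {}
    for address, t in events:
        onsets.setdefault(address, t)
    return onsets


def motion_from_onsets(onsets: Dict[int, int], addr_a: int, addr_b: int,
                       pitch: float = 1.0) -> Tuple[int, float]:
    """Direction (+1: edge moved from a to b, -1: from b to a) and speed in
    pitch units per time unit, from the onset delay between two neighbours."""
    delay = onsets[addr_b] - onsets[addr_a]
    if delay == 0:
        raise ValueError("simultaneous onsets: direction undefined")
    return (1 if delay > 0 else -1), pitch / abs(delay)


# Times in microseconds; pixel 3 starts bursting 20 us before its neighbour 4.
events = [(3, 10), (3, 11), (4, 30), (3, 12), (4, 31)]
print(motion_from_onsets(first_onsets(events), 3, 4))  # (1, 0.05)
```

Any quantization of the time stamps (for example by a frame clock) would directly distort `delay` and hence the speed estimate, which is the discretization error the unclocked bus avoids.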

1.2. Asynchronous Devices

In asynchronous designs, as opposed to synchronous ones, each component works at its own pace. In sequential processes each component has to know when the data it is supposed to process is ready, and it has to obtain that information from the component that provides the data. Pipelining, which considerably sped up synchronous processors two decades or so ago, is a natural result of this approach. However, in a pipelined operation the slowest component limits the overall speed of a sequence of operations. Still, an asynchronous design can gain an edge over even optimally pipelined synchronous solutions for two reasons. Firstly, the slowest component (which dictates the clock rate in the synchronous approach) might not be part of every operation. Secondly, in synchronous designs the next clock cycle starts only after all local operations are completed, whereas in asynchronous designs the next step in a sequential operation ideally starts immediately when the previous operation is completed. The word ‘ideally’ refers to the fact that it is not always easy to compute locally whether a component has finished its operation. As a workaround, the unit can simply indicate that it is finished after a fixed delay, in which case the second argument yields no real advantage.
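The first argument can be illustrated by comparing the completion time of a short sequence of operations in a clocked and in a self-timed pipeline. This is an idealized sketch under stated assumptions: handshake overhead and back-pressure between stages are ignored, and the delay values are hypothetical.

```python
from typing import List


def synchronous_time(delays: List[List[int]]) -> int:
    """Clocked pipeline: every stage advances on a common clock whose period
    must cover the worst-case stage delay of any operation."""
    n_ops, n_stages = len(delays), len(delays[0])
    period = max(max(row) for row in delays)
    return (n_ops + n_stages - 1) * period


def asynchronous_time(delays: List[List[int]]) -> int:
    """Idealized self-timed pipeline: a stage starts an operation as soon as
    the operation has left the previous stage and the stage itself has
    finished the previous operation (handshake overhead ignored)."""
    n_ops, n_stages = len(delays), len(delays[0])
    finish = [[0] * n_stages for _ in range(n_ops)]
    for i in range(n_ops):
        for s in range(n_stages):
            input_ready = finish[i][s - 1] if s > 0 else 0
            stage_free = finish[i - 1][s] if i > 0 else 0
            finish[i][s] = max(input_ready, stage_free) + delays[i][s]
    return finish[-1][-1]


# delays[i][s] in ns: 4 operations, 3 stages; only operation 2 needs the
# slow 20 ns path in stage 1.
delays = [[10, 10, 10], [10, 10, 10], [10, 20, 10], [10, 10, 10]]
print(synchronous_time(delays), asynchronous_time(delays))  # 120 70
```

With uniform, fixed delays (every entry 10 ns) both functions return 60 ns, mirroring the remark that a fixed completion delay removes the advantage of the second argument.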

Concerning energy efficiency, asynchronous circuits have the advantage that they do not actively consume current when they are idle and that they do not need a clock, which consumes a considerable percentage of the total power in fast, highly integrated circuits.

These arguments have even convinced researchers to start developing asynchronous microprocessors [7], and an increasing number of commercial asynchronous devices are now available.

Finally, as previously mentioned, if the AER map is to be used in a system that relies on temporal codes, implementing it asynchronously avoids introducing a discretization error in the time domain.

2. ARCHITECTURE OF AN ASYNCHRONOUS ADDRESS EVENT MAP

In a neural net structure, one neuron is normally connected to many other neurons. In AER this means that the sender address has to be mapped to several receiver addresses. This mapping could be hardwired on the sending and receiving IC chips, such that an address on the AE bus would correspond to one sending and several receiving sites (or vice versa), or it can be handled by a separate component that maps addresses on a sender bus to addresses on a receiver bus. Such an AE map can be designed to be programmable, so that arbitrary network structures (mappings) can be investigated. A synchronous programmable AE map based on a DSP has been presented in [3], and others have used microcontrollers (unpublished). In the following, a much simpler asynchronous device that is more dedicated to this particular task is presented.

[Figure 2: The ‘HS_propagate’ circuit (used in figure 1) synchronizes a pipelining stage with its neighbours. Gate-level schematic (NOR2, AND2 and NOT gates) with inputs req_in, ack_in, processed and consumed, and outputs ack_out, req_out, process and consume.]

[Figure 3: A recording from the FPGA by a logic analyzer that illustrates the minimal output interval and the latency of processed AEs. Annotated intervals: 72 ns and 52 ns; latency: 88 ns; time base: 50 ns/div.]
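In software, the role of such a programmable map can be sketched as a lookup table from sender addresses to lists of receiver addresses. This is a hypothetical illustration of the mapping concept only, not of the DSP system in [3] or of the hardware described below; time stamps are carried explicitly here, whereas on the asynchronous bus the time of an event is implicit in when it appears.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[int, float]  # (address, time)


def remap(events: Iterable[Event], table: Dict[int, List[int]]) -> Iterator[Event]:
    """Programmable AE map: fan each incoming event out to every receiver
    address listed for its sender address. Unlisted senders are dropped."""
    for address, t in events:
        for target in table.get(address, []):
            yield target, t  # the real device adds latency; ignored here


# Hypothetical connection scheme: sender 0 excites receivers 10, 11 and 12,
# sender 1 only receiver 12; sender 2 is unconnected.
table = {0: [10, 11, 12], 1: [12]}
print(list(remap([(0, 1.0), (1, 2.0), (2, 3.0)], table)))
# [(10, 1.0), (11, 1.0), (12, 1.0), (12, 2.0)]
```

Reprogramming `table` changes the network structure without touching the sender or receiver chips, which is the point of a separate, programmable map.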

Figure 1 shows the block diagram of the asynchronous implementation. The sizes of the input and output address spaces were chosen to connect a particular retina chip to an array of artificial neurons. The whole design is implemented on an ALTERA Flex FPGA (EPF10K20RC208-3). Communication with a Sun Ultra 5 workstation is achieved by a ‘PCI 16D’ card from EDT, which provides fast 16-bit parallel handshake-controlled communication. The FPGA is placed on a simple PCB that contains additional bus drivers to connect the FPGA to the 16D bus. Some additional blocks on the FPGA (not shown) make it possible to configure the AE map via the 16D bus. AEs can be sent to the map over this 16D bus or through two other connectors on the PCB. Additional circuitry on the FPGA (not shown) performs an asynchronous arbitration between these three sources of input. AEs from the map are put out on a fourth connector.

[Figure 1: The schematics of the AER map. Three pipelining stages, each controlled by an ‘HS_propagate’ circuit, with delay elements of 4 ns, 12 ns, 12 ns, 16 ns and 20 ns. Blocks: DECODE, RAM1, RAM2, LATCH, DOWN COUNTER, ADD, BUSMUX and RAM3. Signals include req_in, ack_out, access_mode[1..0], we[3..0], we_glob, A[8..0], D[8..0], cnt_en, cnt_clk, eq0 and AE_out[6..0].]

[Figure 4: A recording from the AER map depicting the minimal output interval in case of a ‘one to one’ mapping. Annotated intervals: 84 ns and 84 ns; time base: 50 ns/div.]

The circuit of the AE map (figure 1) is subdivided into three pipelining stages (separated by dashed lines). The ‘HS_propagate’ circuits (described in figure 2) and the surrounding logic at the bottom of the figure control the timing and the sequence of events in the asynchronous computation. Each ‘HS_propagate’ circuit is in control of one pipelining stage. The ‘delay’ elements contain an appropriate number of RS-flipflops in series to achieve the indicated delays. The first stage reads in the incoming AE, which addresses the contents of the two left-hand RAMs (RAM1 and RAM2). RAM1 contains pointers to memory blocks in RAM3 on the right of the figure. These blocks in RAM3 contain all the outgoing AEs to which the incoming AEs are to be mapped. RAM2 contains the sizes of these blocks. (Note that the blocks for different incoming AEs can overlap. This can be exploited to save memory: for instance, two incoming AEs that are supposed to produce the same outputs can simply point to the same block.) When the pointer and the block size are stable, the first ‘HS_propagate’ circuit issues a request to the next pipelining stage, which latches the pointer and loads the block size into a counter. The logic to the right of the second ‘HS_propagate’ block then generates as many handshakes as the block size specifies. The pointer to the block and the counter value are added, and the resulting pointer into the memory block is handed to the last pipelining stage, which merely controls the memory access to RAM3 and handles the handshake with the external receiver of the outgoing AEs. The signal ‘access_mode[1..0]’ distinguishes between an AE input (access_mode = 0) and write accesses to the three RAMs (access_mode = 1, 2 or 3).
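The data path of the three pipelining stages can be summarized in a behavioural sketch. It models only what the lookups compute, not the handshaking or timing. The class and method names are hypothetical, and several details are assumptions rather than readings of the schematic: access_mode values 1, 2 and 3 are taken to select RAM1, RAM2 and RAM3 respectively, all RAMs are given a depth of 512 words, and the counter is assumed to be loaded with the block size minus one and to count down to zero, which is consistent with the descending output order recorded in figure 3.

```python
from typing import Iterator


class AEMapModel:
    """Behavioural model of the AE map data path (hypothetical names).
    RAM1[ae_in] -> pointer into RAM3, RAM2[ae_in] -> block size,
    RAM3[pointer + count] -> outgoing 7-bit AE."""

    def __init__(self, depth: int = 512) -> None:
        self.ram1 = [0] * depth  # block pointers
        self.ram2 = [0] * depth  # block sizes
        self.ram3 = [0] * depth  # outgoing AEs

    def write(self, access_mode: int, address: int, data: int) -> None:
        # Assumption: access_mode 1, 2, 3 select RAM1, RAM2, RAM3.
        rams = {1: self.ram1, 2: self.ram2, 3: self.ram3}
        if access_mode not in rams:
            raise ValueError("access_mode 0 is AE input, not a write access")
        rams[access_mode][address] = data

    def process(self, ae_in: int) -> Iterator[int]:
        # Stage 1: the incoming AE addresses RAM1 and RAM2.
        pointer, size = self.ram1[ae_in], self.ram2[ae_in]
        # Stage 2: the down counter produces one handshake per output AE
        # (assumed to start at size - 1 and stop after reaching 0).
        for count in range(size - 1, -1, -1):
            # Stage 3: pointer + count addresses RAM3; emit a 7-bit AE.
            yield self.ram3[pointer + count] & 0x7F


# Configuration reproducing the mapping of figure 3 (block locations 16 and
# 32 in RAM3 are arbitrary). Hypothetical input 6 reuses part of input 5's block.
m = AEMapModel()
for i, ae in enumerate([0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F]):
    m.write(3, 16 + i, ae)
for i in range(6):
    m.write(3, 32 + i, i)
m.write(1, 4, 16)
m.write(2, 4, 6)
m.write(1, 5, 32)
m.write(2, 5, 6)
m.write(1, 6, 32)
m.write(2, 6, 3)
print([hex(ae) for ae in m.process(4)])  # ['0x7f', '0x7e', '0x7d', '0x7c', '0x7b', '0x7a']
print(list(m.process(5)))                # [5, 4, 3, 2, 1, 0]
print(list(m.process(6)))                # [2, 1, 0]
```

The overlapping entry for input 6 shows the memory saving mentioned above: it costs one pointer and one size word, but no additional RAM3 locations.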

The ‘HS_propagate’ circuit depicted in figure 2 synchronizes a pipelining stage with the previous and the next stage using a 4-phase handshake. If the circuit is not already busy (i.e. the ‘process’ signal is not active), an incoming request is acknowledged as soon as the incoming data is ‘consumed’, and the ‘process’ signal is set. When the processing is completed (‘processed’), an outgoing handshake is initiated, at the completion of which ‘process’ is reset. Thereafter new incoming requests are accepted again. This circuit has the important advantage that it does not hang even if the causality rules of a handshake are not followed by the providing and the receiving partner, i.e. when an acknowledge or a request is withdrawn too soon.
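The behaviour described above can be sketched as a state machine. This is our interpretation of the prose, not a transcription of the gate-level circuit in figure 2: the signal names follow the figures, the internal flag `_acked` is hypothetical, and signals are sampled in discrete steps only so that the sequence can be simulated; the real circuit is unclocked.

```python
class HSPropagate:
    """Behavioural sketch of one 'HS_propagate' stage controller.
    Both ports use a 4-phase (return-to-zero) handshake."""

    def __init__(self) -> None:
        self.process = False   # stage busy
        self.consume = False   # ask the stage to latch the incoming data
        self.ack_out = False   # acknowledge to the previous stage
        self.req_out = False   # request to the next stage
        self._acked = False    # next stage has acknowledged req_out

    def step(self, req_in: bool, consumed: bool, processed: bool,
             ack_in: bool) -> tuple:
        idle = not self.process and not self.ack_out
        # Input port: latch data only when idle; acknowledge once consumed.
        self.consume = req_in and idle
        if req_in and consumed and idle:
            self.process = True
            self.ack_out = True
        if not req_in:
            self.ack_out = False  # return-to-zero; a withdrawn request is dropped
        # Output port: request the next stage once processing is done.
        if self.process and processed and not self._acked and not ack_in:
            self.req_out = True
        if self.req_out and ack_in:
            self.req_out = False
            self._acked = True
        if self._acked and not ack_in:
            self._acked = False
            self.process = False  # outgoing handshake complete: stage free again
        return self.ack_out, self.req_out


hs = HSPropagate()
print(hs.step(True, False, False, False))   # (False, False): consume raised
print(hs.step(True, True, False, False))    # (True, False): consumed, acknowledged
print(hs.step(False, False, False, False))  # (False, False): upstream released
print(hs.step(True, False, False, False))   # (False, False): busy, request waits
print(hs.step(True, False, True, False))    # (False, True): request next stage
print(hs.step(True, False, True, True))     # (False, False): next stage acknowledged
print(hs.step(True, False, False, False))   # (False, False): handshake done, free
print(hs.step(True, True, False, False))    # (True, False): waiting request accepted
```

Because every transition only depends on the current levels, a request that is withdrawn before it is consumed simply leaves the controller idle instead of stranding it in an intermediate state.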

3. PERFORMANCE

If the AER map is overrun and the request and acknowledge signals of the output port are short-circuited on the PCB, the circuit puts out AEs at intervals between 52 ns and 84 ns, depending on the nature of the mapping (see figures 3 and 4).

In order to overrun the map with varying input from the 16D bus (figure 3), it had to be programmed to map every incoming AE to at least 6 outgoing AEs, since in our setup we could only provide changing input AEs with a minimal interval of 300 ns. This minimal interval for transmission from the 16D bus to the AER map was given by the delays on the bus, in the drivers that connected the bus to the PCB, and in the arbitration circuits on the FPGA (not shown) that allow three different sources of input. Figure 3 shows a logic analyzer recording of such a scenario. It also illustrates the latency of processed AEs. The circuit is programmed to map an incoming 4 (on bus P16OUT all) to (7F, 7E, 7D, 7C, 7B, 7A) (bus AEOUT all) and a 5 to (5, 4, 3, 2, 1, 0). An incoming AE sequence of (5, 4) is processed. The latency of the AER map, 88 ns, is measured between the onset of the incoming request signal (MAPREQ) and the onset of the first outgoing request (REQOUT). The signals MAPREQ and MAPACK are measured directly at the input to the AER map (nodes ‘req_in’ and ‘ack_out’ in figure 1). The latency from off-board (not shown) was 156 ns. The interval between subsequent outputs caused by two different inputs is 72 ns, and the interval between two subsequent outputs caused by the same input is 52 ns.
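The requirement of at least 6 outgoing AEs per input can be checked with a little arithmetic, assuming the map runs at the minimal intervals measured in figure 3 (52 ns between outputs of the same input, 72 ns when switching to the next input). The function names are hypothetical.

```python
def time_per_input_ns(n_outputs: int, same_input_ns: int = 52,
                      new_input_ns: int = 72) -> int:
    """Time the map needs per input AE mapped to n_outputs outgoing AEs,
    at the minimal output intervals of figure 3."""
    return (n_outputs - 1) * same_input_ns + new_input_ns


def min_fanout_to_overrun(input_interval_ns: int) -> int:
    """Smallest fan-out for which the map, not the sender, is the bottleneck."""
    n = 1
    while time_per_input_ns(n) < input_interval_ns:
        n += 1
    return n


print(time_per_input_ns(5), time_per_input_ns(6))  # 280 332
print(min_fanout_to_overrun(300))                  # 6
```

With 5 outputs per input the map would finish after 280 ns and then wait for the 300 ns sender; with 6 it needs 332 ns, so the input side is always ahead and the recorded intervals reflect the map itself.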

In order to overrun the map when it was programmed to map one incoming AE to only one outgoing AE (figure 4), a faster sender was ‘simulated’ by inverting the outgoing request and feeding it back as the acknowledge. The input address was held constant. One of the input connectors that did not go through bus drivers was used. For the recording in figure 4 the map was programmed to put out an F for an incoming 4. The delays caused by the signals going to and coming from off-chip, plus the additional circuitry that arbitrates between the three possible sources of input, use up slightly more time than the map needs to process two subsequent inputs. Therefore the output interval increases to 84 ns in this scenario.

84 ns is about two orders of magnitude faster than the 10 µs published for a DSP-based solution [3], although the DSP solution offers a bigger address space, and its authors hope to optimize their software further to achieve a shorter transmission interval of the order of 1 µs (private communication).

Unfortunately, the energy consumption of the FPGA could not be measured directly on the PCB, since there was no separate power line going to it and most of the power of the board goes into the bus drivers. Nevertheless, we maintain the claim that the asynchronous solution fares better than a comparable synchronous implementation on the same FPGA. A synchronous solution with the same output rate would need to be clocked at roughly 12 MHz (84 ns cycles) or faster and would constantly consume the current necessary to drive that clock line. The asynchronous implementation fares better especially while there are no AEs to process.
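Assuming a synchronous design that emits one output AE per clock cycle, the clock-rate requirement and the comparison with the 10 µs DSP interval both follow directly from the 84 ns output interval:

```latex
f_{\mathrm{clk}} \geq \frac{1}{84\,\mathrm{ns}} \approx 11.9\,\mathrm{MHz},
\qquad
\frac{10\,\mu\mathrm{s}}{84\,\mathrm{ns}} \approx 119 \approx 10^{2}.
```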

4. CONCLUSION

A simple and dedicated architecture that performs address event mapping, and its implementation on an FPGA, are presented. It is an asynchronous design that is simpler, faster and cheaper than systems based on DSPs or microcontrollers. The asynchronous implementation saves the power that would go into driving the clock in a synchronous design, and its current consumption is minimal when no events are processed. Tested on an FPGA, the implementation presented here processes address events at intervals of no more than 84 ns per output event. Since the architecture is asynchronous, no discretization is imposed on the time axis, and therefore discretization errors in continuous-time computations on address events are avoided.

5. REFERENCES

[1] A. Abusland, T. S. Lande, and M. Høvin. A aVLSI communication architecture for stochastically pulse-encoded analog signals. ISCAS, III:401–404, 1996.

[2] K. Boahen. A throughput-on-demand address-event transmitter for neuromorphic chips. In Adv. Res. in VLSI, pages 72–86. IEEE Comp. Soc. Press, 1999.

[3] S. R. Deiss, R. J. Douglas, and A. M. Whatley. A pulse-coded communications infrastructure for neuromorphic systems. In Pulsed Neural Networks, pages 157–178. The MIT Press, 1999.

[4] P. Hafliger. A spike based learning rule and its implementation in analog hardware. PhD thesis, ETH Zurich, Switzerland, 2000. http://www.ifi.uio.no/~hafliger.

[5] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie. Silicon auditory processors as computer peripherals. IEEE Trans. on Neural Networks, 4(3):523–528, 1993.

[6] M. Mahowald. VLSI analogs of neuronal visual processing: A synthesis of form and function. PhD thesis, Cal. Inst. of Tech., Pasadena, California, 1992.

[7] A. J. Martin, A. Lines, R. Manohar, M. Nyström, P. Penez, R. Southworth, U. Cummings, and T. Kwan Lee. The design of an asynchronous MIPS R3000 microprocessor. Adv. Res. in VLSI, (17), September 1997.

[8] C. A. Mead. Neuromorphic electronic systems. Proc. IEEE, 78:1629–1636, 1990.

[9] A. Mortara and E. A. Vittoz. A communication architecture tailored for analog VLSI artificial neural networks: intrinsic performance and limitations. IEEE Trans. on Neural Networks, 5:459–466, 1994.
