
Rebooting Neuromorphic Design - A Complexity Engineering Approach

Natesh Ganesh
App. & Comp. Mathematics Division, NIST
& Dept. of Physics, CU Boulder
Email: [email protected]

Abstract—As the compute demands for machine learning and artificial intelligence applications continue to grow, neuromorphic hardware has been touted as a potential solution. New emerging devices like memristors, atomic switches, etc. have shown tremendous potential to replace CMOS-based circuits but have been hindered by multiple challenges with respect to device variability, stochastic behavior and scalability. In this paper we will introduce a Description ↔ Design framework to analyze past successes in computing, understand current problems and identify solutions moving forward. Engineering systems with these emerging devices might require the modification of both the type of descriptions of learning that we design for, and the design methodologies we employ in order to realize these new descriptions. We will explore ideas from complexity engineering and analyze the advantages and challenges they offer over traditional approaches to neuromorphic design with novel computing fabrics. A reservoir computing example is used to understand the specific changes that would accompany a move towards a complexity engineering approach. The time is ideal for a significant reboot of our design methodologies; success will represent a radical shift in how neuromorphic hardware is designed and pave the way for a new paradigm.

I. INTRODUCTION

The compute resources required to train state of the art (SOTA) machine/deep learning/artificial intelligence (ML/DL/AI) models are increasing at a 'super-Moore' rate - doubling every 3.5 months [1], massively increasing the amount of energy required to generate these models [2]. The proposed solutions have been centered around improved co-design of architecture and algorithms, as seen in CMOS-based TPUs, FPGAs, ASICs and spike-based hardware. The use of transistors for efficient analog computing has regained some popularity but is not mainstream. There is also growing interest in exploring the use of emerging devices like memristors, photonics and atomic switch networks to build a new generation of AI hardware. While these novel devices show great promise of energy efficiency, high density and non-linearity, they have often been hindered by stochastic device behavior, manufacturing variability and challenges of large scale implementation relative to traditional CMOS. Successful realization of neuromorphic systems with these emerging devices is key to building more efficient hardware to meet the growing demands for compute.

The goal of this paper is to identify the fundamental problems in the current framework that hinder the successful integration of these novel devices for AI hardware. If we are able to successfully address these problems, we would then be able to engineer a novel paradigm of complex systems with the potential to realize faster, more robust and more efficient information processing [3]. We will start by analyzing the exponential success we have achieved over the last six decades under a description ↔ design framework in Sec. II. In Sec. III, we will use the same framework to explain why the time is ideal to completely reboot some of our fundamental ideas in both description and design in order to make progress. Ideas of complexity, complexity engineering and self-organization will be introduced in Sec. IV and V, and will pave the way towards discussing a complexity engineering approach to neuromorphic design in Sec. VI. We will discuss the changes necessary to the descriptive framework with respect to the reservoir computing framework and provide a path forward using non-equilibrium thermodynamics in Sec. VII. We will summarize the paper and conclude in Sec. VIII.

II. DESCRIPTION ↔ DESIGN

The title of this section represents one of the central ideas of this paper. In our field, the description of computation both influences and is influenced by elements of design. The modern computing technology stack is complex, with multiple interdependent components. We will break it down into 4 fundamental parts that will be our focus (Fig. 1):

(a) Task for which the system is being built.
(b) Theoretical framework used to describe how computational states are represented and the algorithm used to achieve the task.
(c) System architecture, which describes how the different parts of the system are connected.
(d) Physical computing devices, which correspond to how computation is physically realized, i.e. the hardware of the system.

The task and theoretical framework components correspond to Description - How is the task described computationally, what is the algorithm, and how are inputs and outputs represented? What is considered as achieving the task in a computational manner? The latter two - architecture and devices - correspond to Design - How are the different blocks necessary to achieve the computation arranged efficiently, and what are the physical devices that can realize the specific inputs and outputs? These 2 categories and the 4 components constantly influence each other, which we will explore further in the next section.


Fig. 1. Description ↔ Design - The computing stack divided into four fundamental interconnected components: Description, consisting of the Task and the Theoretical Framework, and Design, consisting of the System Architecture and Physical Devices.

The components during the era of digital computing are -

(a) Tasks - Performing large mathematical operations.
(b) Theoretical framework - Boolean algebra, finite state automata and Turing machines.
(c) System architecture - General purpose computing has been built on a variant of the von Neumann architecture.
(d) Physical devices - CMOS devices in binary digital mode.

The relative stability of these factors represents a perfect storm that drove the digital computing revolution. Let us explore the description-design relationship further with respect to these components.

At the heart of modern-day computing is Turing's seminal work in 1936, in which he established a very general model for computation using the idea of Turing machines, showing that 'any function computable by an algorithm, can be computed on a Turing machine' by manipulating binary symbols like '0' and '1' on the machine tape [4]. Modern computers are not replicas of Turing machines but are based on the idea of manipulating symbols based on efficient algorithms in order to achieve computations. Claude Shannon's work proving the equivalence between the behavior of networks of electrical switches and Boolean logic functions is another fundamental building block of digital computing [5]. The former established the theoretical framework and the latter indicated the type of physical systems that can implement the framework - together they drove the search for the switching devices required to instantiate the binary symbols.

The primary task of building computers to perform large mathematical calculations was influenced by early digital computers built around the Second World War for performing the calculations needed in artillery firing tables, cryptanalysis, etc., which utilized electromechanical switches. The ENIAC machine completed in 1945 utilized vacuum tubes and was the first general purpose, Turing-complete digital computer; it is also historically important because the design of its successor introduced the stored-program architecture (also known as the von Neumann architecture) [6], which allowed a system to be reprogrammed by storing both the data and the program in memory. Before the stored-program architecture, we had fixed-program systems in which the program was hardwired in the system for a particular task and could not be reprogrammed - similar to our design of modern day ASICs, albeit a lot less efficient and flexible. Modern day computer architecture is a lot more advanced and complicated, but is built on top of the original von Neumann architecture.

Transistor technologies (BJT, MOS, CMOS, etc.), given their smaller size, faster speeds, lower power consumption, better SNR and ability to be combined with integrated circuit (IC) technology, became the preferred devices to realize 0's and 1's, and quickly replaced bulkier vacuum tubes in the 1950s. A decade later in 1965, Gordon Moore made his famous observation about the number of transistors on an integrated chip doubling about every two years, i.e. Moore's law [7]. With the powerful Turing machine theoretical framework, a von Neumann stored-program architecture, Shannon's work on digital switches, the exponential increase in transistor density to realize it, the decrease in cost per compute and the growing interest in the scientific study of computers, efficient algorithms, etc., the digital technological revolution was well underway. As the decades passed, more and more problems across different fields of science, engineering, medicine, economics, etc. were made tractable by casting them as computational problems, and computers became ubiquitous in our everyday lives. Given this exponential progress, it is reasonable to question why the time is right for another revolution of ideas.

III. VIVA LA REVOLUTION!!

The unintended consequence of the incredible success of computing has been a streetlight effect, i.e. continuing to do what we have already been very successful at. We live in a period where the availability of cheap and powerful compute encourages us to cast all problems in a manner that can be solved by our existing computers, and then look to optimize both the system hardware and software to improve the implementation efficiency. This has also served to discourage a number of ideas to replace conventional systems.

CMOS devices are considered near irreplaceable in the computing stack, with billions of dollars invested in their continued development and in the construction of SOTA fabrication facilities. Moore's law has been both the tip of the spear for our progress, and a shield for CMOS transistor devices (against possible novel replacements) while components (a)-(c) have remained relatively unchanged. Over the decades, there have been a number of research programs focused on identifying devices like spintronics, carbon nanotubes, graphene, quantum dots, molecular cellular automata, etc. (sometimes referred to as unconventional computing [8]). While some of these have been able to match and even surpass CMOS devices in terms of device speed and power dissipation, critics of these novel approaches often point towards their inability to match device robustness, signal-to-noise ratios, scalability and integration with IC design processes. The ability of these devices to construct robust logic gates at scale, which is central to the current computational paradigm, is also seen as a major roadblock to their adoption. However with Moore's law slowing down (and Dennard scaling having completely stopped) as we approach the physical limits of device scaling, now is the time to invest heavily in the research and development of these new emerging devices at levels comparable to CMOS technology [9]. This should help us both extend our current progress, as well as identify suitable devices for new tasks of interest.

The architecture of the system has been the more flexible component when compared to physical devices. FPGAs, ASICs and systems on a chip (SoC) for parallel processing, scientific computing, high performance computing, graphics processing, etc. are perfect examples of modifying (c) the system architecture according to (a) the specific task of interest while keeping the fundamental (b) theoretical computing framework (though they use specialized algorithms) and (d) CMOS devices unchanged. Increasingly the focus of the field has shifted away from general purpose computing and towards AI tasks - a set of tasks that are associated with intelligence and cognitive abilities. With this shift has come the increasing demand for compute in the field of ML and AI to realize these tasks. The backpropagation algorithm, central to ML, was introduced in 1986 by Rumelhart, Hinton and Williams [10], but it was not feasible at scale until the availability of GPUs with increased parallelism to perform the large number of computations required [11]. The lesson here is that the value of an algorithm is dependent on the availability of existing hardware to execute it feasibly. Thus the design of computational algorithmic descriptions (of learning and intelligence) is undoubtedly influenced by the type of operations that are feasible on existing hardware, i.e. existing design driving description.

The hardware solutions to provide the necessary support for ML have mainly focused on architectural improvements, which have been influenced by the machine learning algorithms that need to be executed. Learning is generally described as weight changes, using gradient descent techniques on a suitable loss function E, given by the equation below:

$w_{t+1} = w_t - \eta \frac{dE}{dw}$    (1)

Learning is achieved during the training phase by performing the operation in Eq. (1) on billions of parameters using large amounts of training data. This requires the hardware to perform an extremely large number of matrix multiplication and addition operations. The shift towards more parallel architectures, crossbar structures for more efficient matrix operations, reduced precision arithmetic and improved data movement to combat the memory bottleneck represent significant changes to the system architecture, influenced by descriptions of what learning entails, i.e. a case of description driving design. These have been adopted by both industry giants (like Intel, NVIDIA, AMD, ARM, Google) and startups (Cerebras, Mythic, Graphcore, SambaNova) alike to improve the efficiency of the hardware implementing these compute intensive algorithms. Of course, more radical descriptions of learning will drive the search for novel hardware (e.g. Shor's algorithm [12] for prime factorization was a major driving factor for quantum computing).
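To make Eq. (1) concrete, here is a minimal sketch of the update applied to a single linear layer under a squared-error loss; the layer, loss, dimensions and all names are illustrative assumptions, not a reference design from any of the systems discussed above.

```python
import numpy as np

# Minimal sketch of Eq. (1) for a linear layer y = W x trained on a
# squared-error loss E = 0.5 * ||W x - t||^2 (illustrative assumptions).

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))   # weights w
eta = 0.01                               # learning rate

def step(W, x, t, eta):
    y = W @ x                  # forward pass
    err = y - t                # dE/dy for the squared-error loss
    dE_dW = np.outer(err, x)   # dE/dW: gradient with respect to the weights
    return W - eta * dE_dW     # Eq. (1): w_{t+1} = w_t - eta * dE/dw

for _ in range(1000):          # training loop over random toy data
    x = rng.normal(size=8)
    t = np.tanh(x[:4])         # toy target function
    W = step(W, x, t, eta)
```

Note that both the forward pass and the update reduce to matrix-vector products and additions, which is precisely why the crossbar and parallel architectures mentioned above, performing these multiply-accumulate operations in place, are attractive.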

While we have focused on the use of transistors as switches for digital computation, they can also function as analog computational elements when used in appropriate device modes (interestingly, the use of transistors in this analog manner, exploiting the richer device physics, is reminiscent of ideas employed in unconventional computing). An important hardware paradigm that has re-emerged is the field of neuromorphic computing. The term was coined in 1990 by Carver Mead [13], who defined "neuromorphic systems" as the use of very large scale integration (VLSI) techniques with analog components that mimicked computation in biological neural systems (and digital blocks for communication). However the use of this term has evolved to become much broader, meaning different things to different researchers. Systems are often defined to be neuromorphic at various levels of the computing stack - algorithm, architecture and device. The term includes a wide range of implementations based on both spike-based biologically-inspired algorithms as well as deep-learning based artificial neural networks. A detailed survey of neuromorphic systems in [14] illustrates this very point. Many of these systems have shown tremendous improvements in terms of energy efficiency, but much work is needed in improving these algorithms to compete with SOTA deep learning techniques. It might serve the field to clearly define where this neuromorphic boundary lies in order for the term to be meaningful in a useful sense. In any case, hybrid digital-analog systems built based on Mead's original definition can be seen as a natural co-design extension of the fully digital CMOS systems discussed above. In addition to the architectural changes to the system, the transistor devices have been used in an unconventional but natural analog manner to mimic neuronal and synaptic behavior to achieve the tasks in the AI suite. The task, architecture and physical device components have changed to learning tasks, crossbar/parallel architectures and analog computation to efficiently implement the learning algorithms, while the theoretical computing framework, i.e. describing learning as the computation of weight changes using Hebbian or gradient descent based techniques, remains consistent across the various systems.

Of particular interest in this paper is the design of neuromorphic hardware using novel emerging devices like memristors [15], photonics [16], spintronics [17], atomic switch networks, etc. While we will mainly refer to memristors in this paper (given their increasing popularity for use in memory, in-memory compute and artificial neural networks), the underlying ideas can be extended to other novel devices as well. Both on-chip and off-chip learning have been achieved in these systems using mainly gradient descent-based algorithms (while some systems have utilized more biologically inspired local Hebbian mechanisms as well). These devices, given their small sizes (and thus large density), energy efficiency, speed, non-linearity, etc., have shown great promise, but device variability, sneak path currents, self-discharge [18] and latency due to the necessary control circuitry in dense crossbar structures have hindered their progress with respect to scalability and stackability [19]. As incremental advances continue to be made to improve the realization of existing algorithms, as well as to tune algorithms to account for device variations [20], it is necessary to ask whether the problem is not one algorithm versus another, but rather the underlying computational description and engineering methodology itself. We must be willing to ask if we need to fully rethink description and design in a manner that maximizes the potential of these novel devices. Changing the description framework alongside the task, architecture and devices will represent a change in all four components concurrently for the first time in over six decades - a big reason why the time might be ripe to make fundamental changes. We will explore this in further detail over the next few sections.

IV. COMPLEX SYSTEMS, COMPLEXITY SCIENCE & ENGINEERING

The goal of building neuromorphic hardware is to identify properties of the human brain that are useful for intelligence and emulate them using a different hardware substrate. The human brain is a complex system, and it is necessary to clearly understand what the term complex entails as we look to engineer systems that mimic it. Systems like the human brain, social networks, ant colonies and the internet are a few examples of complex systems. Complexity is roughly defined as being situated between order and disorder. Complex systems are usually identified using some properties that are common to systems that we identify as complex [21]:

(a) Large number of simple components.
(b) Non-linear interactions among parts.
(c) No central control.
(d) Emergent behaviors like hierarchical organization, robustness, phase transitions, complex dynamics, information processing capabilities, evolution and learning.

Here emergence corresponds to properties that are seen in the system as a whole but not in the components, arising due to the interactions between them (colloquially referred to as the 'whole being greater than the sum of the parts'). The author in [21] also distinguishes between disorganized and organized complexity. The first involves billions and billions of parameters and assumes very little interaction between those variables. Our focus, however, will be on the latter, which involves a moderate number of strongly interacting variables and exhibits the properties listed above. The burgeoning science of complexity seeks to identify a unified theory across multiple disciplines.

The main research direction in complexity is to understand its emergence in natural and engineered systems by identifying and studying the properties of networks. Critical to this task is defining measurable quantities suitable for characterization and analysis [22]. An important aspect of this in engineered systems is to address complexity as a problem that needs to be tackled and to augment the system to cope accordingly [23]. Another option, as proposed by the authors in [24], is to engineer systems that look to take advantage of the complexity rather than suppressing or managing it. This is sometimes referred to as emergent [25] or complexity engineering [26]. It is important to differentiate here between complex and complicated systems. Complicated systems have a large number of components, are predictable, do not show emergent properties and are ultimately reducible. Given the difference in properties between complicated and complex systems, the engineering of complex systems will look very different from traditional classical engineering and requires a significant shift in our design thinking. Let us explore this difference further.

Classical engineering is what is usually taught in universities everywhere and corresponds to applying methods and techniques to solve problems using a reductionist approach, characterized by intuitive analysis, detailed understanding, determinism and predictability [28]. It requires the systems being designed to be well-defined, and engineers make the reasonable hypothesis that the parts of a system interact in some well-known and well defined ways without further influencing each other. An example of this is the divide and conquer strategy [29] - the problem is cut into its simplest components, each is analyzed separately, and detailed descriptions are generated. The parts are then connected together as required (ideally the components are modular in nature) and the entire problem is solved.

Complexity engineering is currently less formalized and is akin to a search of the design space to produce a robust complex system to be situated in a dynamic environment. These systems, by definition, are not reducible into their various components. They have emergent functionality, which means that the required function is not instantiated by a single component or restricted to a part of the system, but is instead distributed across the entire system, arising at the macroscale due to the interaction of many microscale components (here macro and microscale are relative). It is thus necessary to engineer the right type of interactions between the different components so that the overall system dynamics produces the function of interest. Unlike classical engineering, where the dynamics and functions of every component are fully understood and specified, we will have to relax this constraint in the design of complex systems. We replace it with an approximate understanding of the overall behavior of the system, and the ability to control and predict some aspects of the system output, even though it might be difficult (and computationally expensive) to understand how the system produced the output. It is important to understand that it is not a question of which of the two - classical vs complexity engineering - is better, but rather a question of the situations for which one might be more suited than the other.

Classical and complexity engineering also imply different roles for the engineer. Rather than specifying the performance of different components and controlling it, the engineer must now act more as a facilitator to guide and enable the system's self-organizing process to produce the results of value, as discussed in [25]. A loss of complete control and predictability over the systems we design might seem alien to engineers, but it is something we must be willing to explore moving forward. We are, however, in the early stages of this discipline of complexity engineering. Moving forward we need to expand the theoretical base from complexity science, develop a framework to describe and translate concepts such as emergence, evolvability and learning from natural systems for use in the engineering of technological systems, and establish a solid methodology to obtain 'design' protocols to engineer complex systems with the properties we desire.

V. SELF-ORGANIZATION - SPECIFICATION TRADE-OFF

In addition to being a complex system, the brain (like all biological systems) is a self-organized system. Self-organized systems share some properties with complex systems: they are dynamical, robust, adaptive and autonomous (with no external or centralized control) [30]. For the purposes of this paper, we will use a more rigorous definition of self-organization [27]: the 'process that produces dissipative non-equilibrium order at macroscopic levels, because of collective, nonlinear interactions between multiple microscopic components. This order is induced by interplay between intrinsic and extrinsic factors, and decays upon removal of the energy source.' It is not to be confused with self-assembly, which is non-dissipative order at equilibrium, persisting without the need for energy. Though we use a more thermodynamics-based definition, there are others based on measures that relate self-organization to the increase in statistical complexity of the system [31]. It is also important in the context of ML/AI to distinguish between self-organization as defined above and self-organizing or Kohonen maps [32], which are unsupervised learning algorithms for weights in a neural network. A very useful way to think about classical vs complexity engineering is the self-organization - specification trade-off introduced in [24]. The author states: "On one hand we need certain functionality in the systems, i.e. we have to be able to specify what the system should do (or a part of it). On the other hand, if we specify "every" detail of the system, if we design it by decomposing it, "linearizing" the problem, then no self-organization will happen and no emergent phenomena can be harnessed. Thus, there is a trade-off between self-organization on one hand and specification or controllability on the other: If you increase the control over your system you will suppress self-organization capabilities. If you do not suppress the self-organization processes by avoiding constraining and controlling many variables, it is difficult to specify what the system should do".

We can discuss this trade-off in a more concrete manner in terms of the number of variables N used to describe a system at some suitable level of abstraction [24]. Let $N_C$ be the number of constrained variables, i.e. the variables or parameters that the engineer can control in order to extract the required functionality from the system. This makes the rest of the N variables the unconstrained variables $N_U$, which are not under the control of the engineer and are influenced by the system dynamics alone (evolving as allowed by physical law). By definition we have $N_C + N_U = N$. The two limits are a fully engineered system with $N_C = N$, no self-organization and no emergent phenomena at one end, and a fully self-organized system with $N_U = N$ at the other. In complexity engineering, the goal is to produce systems with $N_U \gg N_C$ - to not exert control over most variables, except for a small number of constrained variables that guide the system evolution in directions that will produce efficient solutions. This does not imply that we can achieve self-organization by taking an existing system and removing its controls to make $N_U \gg N_C$. Instead we have to take into account the different components and the interactions between them, so that the self-organization of the system over time, under the constraints of $N_C$, produces the results we want. By increasing the number of unconstrained degrees of freedom $N_U$ that self-organize, we are looking to exploit the intrinsic physics of the system to a greater extent to achieve computing goals. The work presented in this paper can be seen as a natural extension of Mead's original idea to further exploit CMOS device physics to build analog elements of neuromorphic systems, as well as of recent work looking to leverage a greater amount of physics in algorithms and nanoscale materials for neuromorphic computing [33].

VI. COMPLEXITY ENGINEERING APPROACH TO NEUROMORPHIC DESIGN

The use of complexity engineering approaches to engineer emergence in computing has been limited to unconventional [34] and intrinsic [35] computing, and there is no literature on its use in neuromorphic hardware, to the extent of the author's knowledge. In this section, we will study the advantages of designing neuromorphic hardware using a complexity engineering approach. We start by analyzing neuromorphic design under the traditional engineering approach, which starts with a specific learning task and an algorithm to solve it efficiently. Most learning algorithms - local Hebbian-type or global backpropagation-based rules - are of the form given in Eq. (1) and operate at the level of weights. This level of description, which we will refer to as fine-grained or microscale, influences the design of the circuits that implement the algorithm, as mentioned earlier. Irrespective of transistor or memristor-based synaptic circuits at the crossbar junctions, we need to build read, write and control circuitry at this microscale level in order to be able to change the weights (constrained variables) - this corresponds to having a system with $N_C \gg N_U$. As we scale up, the additional circuitry required to overcome sneak path currents becomes increasingly harder and more expensive to achieve. Furthermore, the issue of variability in memristor devices and their behavior is also problematic if we require very specific weight changes. For these reasons, traditional approaches to neuromorphic design using memristors face many sizeable challenges. As we continue to make improvements under traditional approaches, we must ask whether it is feasible to engineer such complex systems with the required emergent properties using novel devices in this paradigm [36].

We explore the complexity engineering approach by first analyzing a set of conditions a system would need to satisfy in order to be engineered through self-organization [23] - (a) autonomous and interacting units, (b) minimal external control, (c) positive and negative feedback, (d) fluctuations/variations and (e) a flat dynamic internal architecture. We now map these conditions onto a neuromorphic hardware system built through self-organization. Such a system will be made of a large number of non-linear neurons interacting through their weights. Currently we build external control into the circuit at a fine-grained (microscale) individual neuron and weight level (i.e. $N_C \gg N_U$) to correctly realize the learning algorithm. In order to allow for self-organization, we will have to use macroscale descriptions of learning (explored in further detail in the next section) that would reduce the number of constrained variables and allow the system to evolve freely under its native dynamics ($N_U \gg N_C$), exploiting the rich intrinsic behavior of these new devices. Note that by reducing $N_C$, we also make the system evolution more autonomous, satisfying (b) from above. It is also necessary to ensure that as we control the small number of $N_C$, we prevent the system from entering no-go regions of its state-space that are of no value to the users. The use of recurrent architectures and external error signals to influence system evolution can provide the necessary feedback signals to the system. Variations in device manufacture and behavior are now preferred, as we seek a diversity-enabled sweet spot in self-organized networks [37]. Fluctuations in the microscale (weight) components are tolerable as long as the overall system can realize the necessary functionality, and can even make the system more robust. Noise corresponds to unexpected variations in the constrained variables $N_C$; variations in the unconstrained variables, on the other hand, are acceptable, and 'noise' is now a resource we can exploit (as done in simulated annealing [38]; see the sketch below). A flat internal architecture is achieved by using a large number of neurons of similar behavior, with system connectivity that evolves dynamically based on the input stimuli presented to the system. For these reasons, neuromorphic hardware systems based on emerging non-linear devices (like memristors) would be very promising candidates for being engineered through self-organization. These systems can further exploit the existing bottom-up fabrication techniques that are already used to build systems with these devices. It is important to understand the exact conditions under which self-organization and complexity engineering approaches will trump traditional ideas. We are not proposing that all neuromorphic systems be built this way. If we are using digital CMOS devices, in which we have engineered away most of the device physics, then traditional techniques will continue to be much better suited.
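As a small illustration of noise as a resource rather than a defect, the sketch below runs plain simulated annealing [38] on a toy one-dimensional landscape: random fluctuations let the search escape a local minimum that greedy descent would be stuck in. The cost function, schedule and all parameters are illustrative assumptions.

```python
import math
import random

# Simulated-annealing sketch: thermal 'noise' is exploited to escape
# local minima instead of being suppressed as an error source.
# Cost landscape and cooling schedule are illustrative assumptions.

def cost(x):
    return x**4 - 3 * x**2 + 0.5 * x   # two minima; the lower one is at x < 0

x, T = 1.2, 1.0                        # start near the shallower minimum
while T > 1e-3:
    x_new = x + random.gauss(0.0, 0.5) # random perturbation of the state
    dE = cost(x_new) - cost(x)
    # Always accept downhill moves; accept uphill moves with
    # Boltzmann probability exp(-dE/T), which shrinks as T cools.
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
    T *= 0.999                         # slow cooling schedule
print(f"final state x = {x:.3f}, cost = {cost(x):.3f}")
```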

VII. MACROSCALE DESCRIPTIONS OF LEARNING

A macroscale description of learning corresponds to the change in the theoretical framework that we discussed the need for in Sec. III. For tasks that fall under the AI suite to be realized more effectively, using emerging devices like memristors and brain-inspired architectures requires a change in the description from microscale to macroscale (design driving description), which in turn will also require a change from traditional to complexity engineering methodologies (description driving design).

In computational neuroscience, Marr's levels of analysis [39] were inspired by how information systems have traditionally been built and understood, and there has been a continuous overlap of ideas between the two fields over the decades. The change in the description of learning can be seen as a technological extension of recent ideas being discussed in computational and systems neuroscience [40]. In this paper, the authors describe the classical framework in systems neuroscience as observing and developing a theory of how individual neurons compute, and then assembling a circuit-level picture of how the neurons combine their operations. They note how this approach has worked well for simple computations but not for more complicated functions that require a very large number of neurons. They propose to replace it with a deep-learning based framework that shifts the focus from computations realized by individual neurons to a description comprising (a) objective functions that describe the goals of the system, (b) learning rules that specify the dynamics of neurons and weights and (c) architectures that constrain how the different units are connected together. They further state: "This optimization framework has an added benefit: as with ANNs, the architectures, learning rules and objective functions of the brain are likely relatively simple and compact, at least in comparison to the list of computations performed by individual neurons." This idea of understanding artificial neural networks at a macroscale or coarse-grained level of objective functions, learning rules and the architecture of the network overall, as opposed to studying the system at the microscale or fine-grained level of individual neurons (where it is often hard to intelligibly describe a system comprising billions of parameters), has been explored in [41]. We are simply suggesting that as we move towards adopting these macroscale descriptions of neural networks, we must look towards complexity engineering methodologies to design and build systems based on these newer descriptions.

We will now introduce a simple example system, reservoir computing (Fig. 2), to better clarify the ideas discussed and identify the types of problems that need to be addressed. The reservoir computing paradigm is an umbrella term that includes techniques like liquid state machines and echo state networks [42]. They have been physically implemented with non-linear dynamic elements [44], [43], and provide a much simpler way to train recurrent neural networks (RNN). A reservoir computing system consists of an input layer, an RNN-based reservoir and a single output layer. The weights in the reservoir remain fixed during training, and the network generates a non-linear transformation of the inputs in its states. The output signal is generated at the output layer as a combination of the reservoir signals. Only the weights in the single output layer are trained using gradient descent with a teacher signal as the target.

Fig. 2. Reservoir computing with an input layer feeding input signals u(t) to the static reservoir. The reservoir generates a higher order non-linear transformation of the input signals in its states x(t). The output layer is trained to generate the outputs y(t) using the reservoir states [43].

In order for the system to function properly and approximate the target signal, the weights in the reservoir (generated prior to training using evolutionary algorithms) are chosen such that the connection matrix W of the entire network satisfies the echo state or fading memory property [44]. The property is characterized in different ways - a simple and popular one is having the spectral radius of W < 1 (a sufficient condition that depends upon input statistics was proposed in [45]). This is an example of a macroscale condition for learning and inference on the entire reservoir network (that allows for multiple weight-level solutions), as opposed to a microscale condition on the individual weights. We can view the static weights as unconstrained variables, while the macroscale condition on W's spectral radius or Schur stability is a constraint. The extension of the static reservoir is an adaptive reservoir, in which we have a new macroscale reservoir condition to achieve the echo state property. This condition accommodates changes to the microscale weights of the network over time depending upon the input statistics, in order to adapt to and learn new inputs. Unlike traditional RNNs, we do not train the reservoir using an algorithm at the microscale weight level and instead let the weights evolve (self-organize) as unconstrained variables. Thus successful identification and implementation of this macroscale condition will result in an RNN with weights that evolve without external control at the microscale level, while producing the required network functionality.
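A minimal echo state network sketch makes the macroscale constraint explicit in code: the reservoir weights below are random and never trained; the only intervention is rescaling W so its spectral radius is below 1, and only the linear readout is fit (here with ridge regression rather than gradient descent, for brevity). Network sizes, the toy task and all names are illustrative assumptions.

```python
import numpy as np

# Echo state network sketch. The reservoir weights are unconstrained,
# random microscale variables; the single macroscale constraint imposed
# is a spectral radius of W below 1 (echo state property). Sizes, task
# and the ridge-regression readout are illustrative assumptions.

rng = np.random.default_rng(1)
n_in, n_res = 1, 200

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # enforce spectral radius 0.9 < 1

def run_reservoir(u_seq):
    """Collect reservoir states x(t) for an input sequence u(t)."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
u = np.sin(0.2 * np.arange(2000))
X, y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
print("training MSE:", np.mean((X @ W_out - y) ** 2))
```

Any of the many random weight matrices with the same spectral radius would serve equally well, which is exactly the sense in which the macroscale condition admits multiple microscale (weight-level) solutions.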

We propose a (non-exhaustive) list of properties any macroscale description of learning would need to satisfy in order to be useful in a complexity engineering approach to design. These include the ability to:

(a) Address system evolution and self-organization.
(b) Be implementation independent, like computation.
(c) Quantify information processing and computation. Mapping to existing work in ML is a bonus.
(d) Address questions of accuracy and efficiency.
(e) Be studied experimentally and in simulation.
(f) Be tied to physical law and generate no-go results.

The author proposes the use of thermodynamics as a possible macroscale descriptive framework for learning. The field of thermodynamics was invented in the 19th century to address questions of efficiency in engines and has evolved to address the same in information engines [46]; like computational descriptions, it is universally applicable to all systems. The physical costs of information processing have been widely studied in the thermodynamics of information field [47]. There is also a rich history of thermodynamics-based ideas in machine learning. Early energy-based models like Hopfield networks and Boltzmann machines [48], and free-energy based Helmholtz machines [49], are based on equilibrium thermodynamics - evolving the network towards a state of maximum entropy/minimum free-energy at equilibrium.
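As a concrete instance of this equilibrium picture, the sketch below runs standard Hopfield dynamics [48]: patterns are stored Hebbian-style, and each asynchronous update never raises the energy E(s) = -0.5 s^T W s, so the network relaxes from a corrupted pattern to a stored one at a local energy minimum. The pattern set and sizes are illustrative assumptions.

```python
import numpy as np

# Hopfield network sketch: Hebbian storage plus asynchronous updates
# that never raise the energy E(s) = -0.5 * s^T W s, illustrating the
# equilibrium 'relax to minimum energy' picture. Patterns and sizes
# are illustrative assumptions.

rng = np.random.default_rng(2)
n = 64
patterns = rng.choice([-1, 1], size=(3, n))     # stored memories
W = sum(np.outer(p, p) for p in patterns) / n   # Hebbian weight matrix
np.fill_diagonal(W, 0.0)                        # no self-connections

def energy(s):
    return -0.5 * s @ W @ s

s = patterns[0].copy()
s[: n // 4] *= -1                               # corrupt a stored pattern
for _ in range(10 * n):                         # asynchronous updates
    i = rng.integers(n)
    s[i] = 1 if W[i] @ s >= 0 else -1           # update never raises E
print("recovered:", np.array_equal(s, patterns[0]), "energy:", energy(s))
```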

However, self-organized complex systems of interest (like the human brain) are open systems that continuously exchange matter and energy with an external dynamic environment, show a wider range of behavior and are more accurately described by non-equilibrium thermodynamics. In the last few decades, the field of non-equilibrium thermodynamics has undergone a revolution, bringing tremendous improvements in the theoretical tools available to characterize systems far from equilibrium [50], [51], [52]. While equilibrium thermodynamics focuses on the distributions of states at static equilibrium (in the infinite time limit), non-equilibrium fluctuation theorems characterize the associated work and heat distributions of dynamical trajectories of states as the system is driven by external inputs over finite time (the two canonical relations are stated after this paragraph). There is a growing body of work on understanding the relationship between non-equilibrium thermodynamics and learning in physical systems [54], [55], [56], [57]. In [58], the author discusses the relationship between minimizing heat dissipation and information-bottleneck based predictive inference in Markov-chain models. There is also very interesting work generalizing the generative model of the Boltzmann machine, in which learning is characterized via distributions of the work done by varying the energy landscape of a non-equilibrium system [59]. The authors in [59] cast the contrastive divergence algorithm for training restricted Boltzmann machines as a physical process that approximately minimizes the difference between entropy variation and average heat, and discuss the relationship between annealed importance sampling [60] and the thermodynamic Jarzynski equality [50]. In [61], the authors establish an equivalence between thermodynamics and machine learning by showing that agents that maximize work production also maximize their environmental model's log-likelihood. The relationships between computational descriptions of learning processes and non-equilibrium thermodynamic descriptions of the same as a physical process are what we are looking to establish. These descriptions can serve as the basis for developing non-equilibrium control protocols, using ideas from thermodynamic control [62], [63], to design physical systems that realize these protocols. Such systems will realize the corresponding thermodynamic condition and, by the aforementioned equivalence, the computational learning as well.
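For reference, the two canonical fluctuation relations invoked above can be stated compactly, with W the work done along a driven trajectory, ΔF the equilibrium free-energy difference between the initial and final parameter values, and β the inverse temperature:

```latex
% Jarzynski equality [50]: equilibrium free-energy differences from an
% ensemble of finite-time, non-equilibrium work measurements.
\left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F}

% Crooks fluctuation theorem [51]: ratio of the forward and reverse
% work distributions of a driven process.
\frac{P_F(W)}{P_R(-W)} = e^{\beta (W - \Delta F)}
```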

Fig. 3. (a) An optical image of an SiO2 coated wafer with 120 Pt electrodes (top) & SEM of a self-assembled Ag+ network after submersion in AgNO3 (bottom). (b) The silver nanowire network (top) takes the form of a tiny square of mesh at the center of the device (bottom) [70].

Engineering hardware based on these thermodynamic descriptions might seem alien when compared to our existing top-down standard protocols for design using computational descriptions. However, designing physical systems based on thermodynamic considerations is very common in the fields of molecular and bio-engineering, and unconventional computing. A good example of this is the chemical Boltzmann machine [64], in which the authors construct a mapping from Boltzmann machines to chemical reactions - and such reactions are usually designed based on kinetic and thermodynamic factors [65].

A list of some of the considerations when trying to engineer a self-organized system is provided in [66]: (a) identifying suitable interactions, (b) choosing competing interactions and potentials, (c) choosing the proper scale and (d) synthesis - moving from small systems with minimal components to larger systems. A detailed review of the principles behind directed self-organization by manipulating the energy and entropy landscape of the system is available in [67]. This change in our design approach to computing hardware is going to need a restructuring of our philosophies and significant interdisciplinary work. We do not have to start from scratch though, as we can build upon work in nanoarchitectonics [68], computational matter and in-materio evolution [69]. We also have a good idea of the properties that we want in the final engineered self-organized systems - a network with a large number of heterogeneous components, sparse connections with synaptic behavior, scale-free/small-world topology, criticality, etc. Examples of self-organized networks that satisfy such properties and have been shown to be capable of learning include [70] (Fig. 3), [71], [72], [73] and [74]. Fabrication of these networks is not based on computational descriptions, and they provide the ideal base to experiment on and build a framework to understand novel macroscale descriptions of learning and the relationship between choices in the design process and the functional capabilities of the self-organized system.

VIII. DISCUSSION & CONCLUSION

In this paper, we studied the connections between the Description and Design components of the computing stack. This framework allowed us to understand the exponential success that we have achieved over the years, and also to identify the issues facing us moving forward in the design of neuromorphic hardware. If our goal is to engineer energy efficient neuromorphic hardware that can mimic the abilities of a complex self-organized brain using emerging stochastic devices, we must be willing to replace the traditional reductionist engineering approach with complexity engineering methodologies. This will require a significant shift in how we understand computation and describe learning in physical systems, in our design philosophy on what it means for a complex system to be 'designed,' and in the role of the engineer in these systems. The author identifies two challenges that need to be addressed to achieve progress: (a) identifying new macroscale descriptions of learning that would leverage the intrinsic physics of the system to a greater degree, with non-equilibrium thermodynamics suggested as a possible path forward, and (b) taking inspiration from other fields to develop new design protocols using the above descriptions that would have better synergy with these emerging fabrics.

The author recognizes the tremendous challenges and work that lie ahead of us, but views these as a unique set of opportunities that are not often available to the research community. Complexity engineering is a relatively new field that requires a lot of research to be formalized, expanded and brought into the mainstream of hardware design. One can look back at history and point to traditional hardware design facing the exact same challenges at the start of the digital computing paradigm in the 1940s. Eight decades later, with the appropriate investments in research, we have made great strides in understanding traditional design and have built a large number of tools and infrastructure to make it feasible and profitable. The author hopes that, given the massive benefits that we could reap from efficient self-organizing hardware for AI applications, the community will increase its focus on these new ideas. Success would allow us to meet the growing compute demand in the short term and usher in a technological revolution in the long term.

REFERENCES

[1] Amodei, Dario, et al., "AI and Compute," OpenAI blog, 2019.
[2] Strubell, Emma, Ananya Ganesh, and Andrew McCallum, "Energy and policy considerations for deep learning in NLP," arXiv preprint, arXiv:1906.02243 (2019).
[3] Teuscher, Christof, "The weird, the small, and the uncontrollable: Redefining the frontiers of computing," Computer, 50.8 (2017): 52-58.
[4] Turing, Alan Mathison, "On computable numbers, with an application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, 2.42 (1937): 230-265.
[5] Shannon, Claude E, "A symbolic analysis of relay and switching circuits," Electrical Engineering, 57.12 (1938): 713-723.
[6] Von Neumann, John, "First Draft of a Report on the EDVAC," IEEE Annals of the History of Computing, 15.4 (1993): 27-75.
[7] Moore, Gordon E, "Cramming more components onto integrated circuits," Proceedings of the IEEE, 86.1 (1998): 82-85.
[8] Stepney, Susan, and Simon J. Hickinbotham, "UCOMP Roadmap: Survey, Challenges, Recommendations," Computational Matter, Springer, Cham, 2018. 9-31.
[9] Apte, Pushkar, and George Scalise, "The recession's silver lining," IEEE Spectrum, 46.10 (2009): 46-51.
[10] Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams, "Learning representations by back-propagating errors," Nature, 323.6088 (1986): 533-536.
[11] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[12] Shor, Peter W, "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer," SIAM Review, 41.2 (1999): 303-332.
[13] Mead, Carver, "Neuromorphic electronic systems," Proceedings of the IEEE, 78.10 (1990): 1629-1636.
[14] Schuman, Catherine D., et al., "A survey of neuromorphic computing and neural networks in hardware," arXiv preprint, arXiv:1705.06963 (2017).
[15] Li, Can, et al., "Efficient and self-adaptive in-situ learning in multilayer memristor neural networks," Nature Communications, 9.1 (2018): 1-8.
[16] De Lima, Thomas Ferreira, et al., "Progress in neuromorphic photonics," Nanophotonics, 6.3 (2017): 577-599.
[17] Grollier, Julie, et al., "Neuromorphic spintronics," Nature Electronics, (2020): 1-11.
[18] Upadhyay, Navnidhi K, et al., "Emerging memory devices for neuromorphic computing," Advanced Materials Technologies, 4.4 (2019): 1800589.
[19] Adam, Gina C., Ali Khiat, and Themis Prodromakis, "Challenges hindering memristive neuromorphic hardware from going mainstream," Nature Communications, 9.1 (2018): 1-4.
[20] Querlioz, Damien, et al., "Immunity to device variations in a spiking neural network with memristive nanodevices," IEEE Transactions on Nanotechnology, 12.3 (2013): 288-295.
[21] Mitchell, Melanie, Complexity: A Guided Tour, Oxford University Press, 2009.
[22] Lloyd, Seth, "Measures of complexity: a nonexhaustive list," IEEE Control Systems Magazine, 21.4 (2001): 7-8.
[23] Frei, Regina, and Giovanna Di Marzo Serugendo, "Concepts in complexity engineering," International Journal of Bio-Inspired Computation, 3.2 (2011): 123-139.
[24] Buchli, Jonas, and Cristina Costa Santini, "Complexity engineering: Harnessing emergent phenomena as opportunities for engineering," Santa Fe Institute, 2005.
[25] Ulieru, Mihaela, and René Doursat, "Emergent engineering: a radical paradigm shift," International Journal of Autonomous and Adaptive Communications Systems, 4.1 (2011): 39-60.
[26] Wolfram, Stephen, "Approaches to complexity engineering," Physica D: Nonlinear Phenomena, 22.1-3 (1986): 385-399.
[27] Halley, Julianne D., and David A. Winkler, "Consistent concepts of self-organization and self-assembly," Complexity, 14.2 (2008): 10-17.
[28] Gershenson, Carlos, "Design and control of self-organizing systems," PhD thesis, Faculty of Science and Center Leo Apostel for Interdisciplinary Studies, Vrije Universiteit Brussel, Belgium.
[29] Heylighen, Francis, "Complexity and self-organization," (2008).
[30] De Wolf, Tom, and Tom Holvoet, "Emergence versus self-organisation: Different concepts but promising when combined," International Workshop on Engineering Self-Organising Applications, Springer, Berlin, Heidelberg, 2004.
[31] Shalizi, C. R., "Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata," PhD thesis, University of Wisconsin at Madison (2001).
[32] Kohonen, Teuvo, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, 43.1 (1982): 59-69.
[33] Markovic, Danijela, et al., "Physics for Neuromorphic Computing," arXiv preprint, arXiv:2003.04711 (2020).
[34] Stepney, Susan, Fiona A. C. Polack, and Heather R. Turner, "Engineering emergence," 11th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS'06), IEEE, 2006.
[35] Feldman, David P., Carl S. McTague, and James P. Crutchfield, "The organization of intrinsic computation: Complexity-entropy diagrams and the diversity of natural information processing," Chaos: An Interdisciplinary Journal of Nonlinear Science, 18.4 (2008): 043106.
[36] Johnson, Christopher W, "What are emergent properties and how do they affect the engineering of complex systems?," Reliability Engineering and System Safety, 91.12 (2006): 1475-1481.
[37] Nakahira, Yorie, et al., "Diversity-enabled sweet spots in layered architectures and speed-accuracy trade-offs in sensorimotor control," arXiv preprint, arXiv:1909.08601 (2019).
[38] Johnson, David S., et al., "Optimization by simulated annealing: An experimental evaluation; part I, graph partitioning," Operations Research, 37.6 (1989): 865-892.
[39] Marr, David, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.
[40] Richards, Blake A, et al., "A deep learning framework for neuroscience," Nature Neuroscience, 22.11 (2019): 1761-1770.
[41] Lillicrap, Timothy P., and Konrad P. Kording, "What does it mean to understand a neural network?," arXiv preprint, arXiv:1907.06374 (2019).
[42] Lukoševičius, Mantas, and Herbert Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, 3.3 (2009): 127-149.
[43] Du, Chao, et al., "Reservoir computing using dynamic memristors for temporal information processing," Nature Communications, 8.1 (2017): 2204.
[44] Tanaka, Gouhei, et al., "Recent advances in physical reservoir computing: A review," Neural Networks, (2019).
[45] Manjunath, Gandhi, and Herbert Jaeger, "Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks," Neural Computation, 25.3 (2013): 671-696.
[46] Um, Jaegon, et al., "Total cost of operating an information engine," New Journal of Physics, 17.8 (2015): 085001.
[47] Parrondo, Juan M. R., Jordan M. Horowitz, and Takahiro Sagawa, "Thermodynamics of information," Nature Physics, 11.2 (2015): 131-139.
[48] Rojas, Raúl, Neural Networks: A Systematic Introduction, Springer Science & Business Media, 2013.
[49] Dayan, Peter, et al., "The Helmholtz machine," Neural Computation, 7.5 (1995): 889-904.
[50] Jarzynski, Christopher, "Nonequilibrium equality for free energy differences," Physical Review Letters, 78.14 (1997): 2690.
[51] Crooks, Gavin E, "Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences," Physical Review E, 60.3 (1999): 2721.
[52] England, Jeremy L, "Statistical physics of self-replication," The Journal of Chemical Physics, 139.12 (2013): 09B623-1.
[53] Barlow, Horace B, "Possible principles underlying the transformation of sensory messages," Sensory Communication, 1 (1961): 217-234.
[54] Still, Susanne, "Thermodynamic cost and benefit of memory," Physical Review Letters, 124.5 (2020): 050601.
[55] Ganesh, Natesh, "A Thermodynamic Treatment of Intelligent Systems," 2017 IEEE International Conference on Rebooting Computing (ICRC), IEEE, 2017.
[56] Sohl-Dickstein, Jascha, et al., "Deep unsupervised learning using nonequilibrium thermodynamics," arXiv preprint, arXiv:1503.03585 (2015).
[57] Bahri, Yasaman, et al., "Statistical mechanics of deep learning," Annual Review of Condensed Matter Physics (2020).
[58] Still, Susanne, "Information bottleneck approach to predictive inference," Entropy, 16.2 (2014): 968-989.
[59] Salazar, Domingos S. P., "Nonequilibrium thermodynamics of restricted Boltzmann machines," Physical Review E, 96.2 (2017): 022131.
[60] Neal, Radford M, "Annealed importance sampling," Statistics and Computing, 11.2 (2001): 125-139.
[61] Boyd, Alexander B., James P. Crutchfield, and Mile Gu, "Thermodynamic Machine Learning through Maximum Work Production," arXiv preprint, arXiv:2006.15416 (2020).
[62] Deffner, Sebastian, and Marcus V. S. Bonança, "Thermodynamic control - An old paradigm with new applications," Europhysics Letters, 131.2 (2020): 20001.
[63] Rotskoff, Grant M., Gavin E. Crooks, and Eric Vanden-Eijnden, "Geometric approach to optimal nonequilibrium control: Minimizing dissipation in nanomagnetic spin systems," Physical Review E, 95.1 (2017): 012148.
[64] Poole, William, et al., "Chemical Boltzmann machines," International Conference on DNA-Based Computers, Springer, Cham, 2017.
[65] Prausnitz, John M, "Molecular thermodynamics for chemical process design," Angewandte Chemie International Edition in English, 29.11 (1990): 1246-1255.
[66] Fialkowski, Marcin, et al., "Principles and implementations of dissipative (dynamic) self-assembly," The Journal of Physical Chemistry B, 110.6 (2006): 2482-2496.
[67] Grzelczak, Marek, et al., "Directed self-assembly of nanoparticles," ACS Nano, 4.7 (2010): 3591-3605.
[68] Ariga, Katsuhiko, et al., "Nanoarchitectonics for dynamic functional materials from atomic/molecular-level manipulation to macroscopic action," Advanced Materials, 28.6 (2016): 1251-1286.
[69] Konkoli, Zoran, et al., "Reservoir computing with computational matter," Computational Matter, Springer, Cham, 2018. 269-293.
[70] Avizienis, Audrius V., et al., "Neuromorphic atomic switch networks," PLoS One, 7.8 (2012).
[71] Diaz-Alvarez, Adrian, et al., "Emergent dynamics of neuromorphic nanowire networks," Scientific Reports, 9.1 (2019): 1-13.
[72] Manning, Hugh G., et al., "Emergence of winner-takes-all connectivity paths in random nanowire networks," Nature Communications, 9.1 (2018): 1-9.
[73] Bose, Saurabh K., et al., "Stable self-assembled atomic-switch networks for neuromorphic applications," IEEE Transactions on Electron Devices, 64.12 (2017): 5194-5201.
[74] Milano, Gianluca, et al., "Self-organizing memristive nanowire networks with structural plasticity emulate biological neuronal circuits," arXiv preprint, arXiv:1909.02438 (2019).