
Eindhoven University of Technology

MASTER

Design and implementation of training in hardware for efficient artificial intelligence

Shubham, P.

Award date:2021

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.


February 21, 2020

Declaration concerning the TU/e Code of Scientific Conduct

I have read the TU/e Code of Scientific Conduct (i). In carrying out research, design and educational activities, I shall observe the five central values of scientific integrity, namely: trustworthiness, intellectual honesty, openness, independence and societal responsibility, as well as the norms and principles which follow from them.

Date …………………………………………………..…………..
Name …………………………………………………..…………..
ID-number …………………………………………………..…………..
Signature …………………………………………………..…………..

Submit the signed declaration to the student administration of your department.

(i) See: https://www.tue.nl/en/our-university/about-the-university/organization/integrity/scientific-integrity/ The Netherlands Code of Conduct for Scientific Integrity, endorsed by six umbrella organizations including the VSNU, can also be found there. More information about scientific integrity is published on the websites of TU/e and the VSNU.

11/08/2020

PRANSHU SHUBHAM

1297155


Department of Mechanical Engineering
Microsystems Research Group

Design and implementation of training in hardware for efficient artificial intelligence

Master’s Thesis

2020-2021

Student Name: Pranshu Shubham
Student ID: 1297155
Department: Mechanical Engineering
Master's Program: Mechanical Engineering
Research Group: Microsystems
Supervisor(s): E.R.W. van Doremaele

Y.B. van de Burgt


Abstract

Machine Learning applications train on datasets with numerous attributes to be able to perform forecasting and classification operations on unfamiliar data. These applications are built by programming artificial neural network (ANN) software, which consists of multiple layers of tunable weights that take different values during the training step. This software runs on the existing computer hardware architecture known as the von Neumann architecture. The von Neumann architecture relies on back-and-forth information transfer between memory and CPU and is well suited to computing well-defined problems. Since the computing and memory blocks are separate and require a communication channel between them, this architecture is prone to an operational bottleneck. This results in longer computation times and higher energy requirements when working with large datasets and neural networks consisting of several layers and inputs. To aid the existing architecture, low-powered edge devices that can perform data analysis independent of software have been proposed. These devices are inspired by the functioning of the brain and are essentially hardware realizations of artificial neural networks. Training hardware neural networks is possible by using a state-retaining device called the memristor, which is regarded as the fourth fundamental electrical device and is utilized as the physical realization of the weight. Current research shows that hardware neural networks with memristors are functional but have limited working capacity due to the memristor's inherent volatility and limited tunability and the complexity of the circuit. The ENODe, a proposed three-terminal organic tunable memory device, is a possible alternative that overcomes the volatility and tunability limitations. To realize a hardware neural network with ENODes as the weight elements, a training circuit is needed that can update the ENODe on hardware. The aim of this thesis is to implement this training scheme (backpropagation learning) using the ENODe as the memory element. This architecture would then be extended to complex classification problems with larger inputs and more layers in the neural network.


Acknowledgement

This report is the culmination of my engineering master's education in the Netherlands, where I spent the last three years. I would like to first thank Prof. dr. ir. Jaap den Toonder and the faculty and researchers at the Microsystems Group for providing me with the opportunity to study and work on various projects in the field of microsystems. The subjects taught me about micro-scale physics as well as the ability to appreciate their macro-level impact on society, in terms of life science applications, medical device technology, consumer electronics and energy.

For my research in the multi-domain topic of neuromorphic computing, I would like to begin by thanking my supervisors, dr. ir. Yoeri van de Burgt and ir. Eveline van Doremaele. Yoeri provided me with an opportunity, in the form of this project, to explore the field of neuromorphic computing and technology as well as to learn from the researchers in the group. I would like to thank Eveline, my supervisor with whom I was in direct contact throughout the project, who patiently supported me, clarified my doubts, discussed ideas and motivated me simply with her dedication and perseverance. Through our communication, I tried to absorb her research mindset and improve my ability to deliver results backed by solid understanding. The bi-weekly meetings not only gave an overall view but also motivated me when I listened to the researchers talk about their work.

I would like to acknowledge and thank my parents, who have supported me through my studies and through the times when I needed the strength to persevere. I would also like to thank my friends (the ones I made in Eindhoven and the ones back home), who in their own ways inspired me and expanded my world view.

Finally and most importantly, I would like to thank all the essential workers, IT engineers, doctors, researchers, food producers and concerned citizens who sprang into action to assist each other through this pandemic with all the means available to them. Today, if I sit comfortably at my table while I write this report, with food in my fridge, clean water to drink, means of communication to stay in touch with my family and the vaccine doses on their way, it is because of them.


Abbreviations

Abbreviation   Full Form
ANN            Artificial Neural Network
CMOS           Complementary Metal Oxide Semiconductor
ENODe          Electrochemical Neuromorphic Organic Device
MNIST          Modified National Institute of Standards and Technology database
PEDOT:PSS      poly(3,4-ethylenedioxythiophene):polystyrene sulfonate
PEI            poly(ethylenimine)
SPYDER         Scientific Python Development Environment
IDE            Integrated Development Environment
ReLU           Rectified Linear Unit


List of Figures

1.1 Architecture of the ENODe with circuit showing the gate, source and drain inputs. [1] 2

2.1 LTSpice workspace with a circuit diagram . . . . . . . . . . . . . . . . . . . . . . . . . 3

3.1 Structure of a perceptron. Starting from the left, multiple inputs are multiplied with their respective weights (arrows) and summed up (square box labelled 'net'). They are then sent to an activation function 'f'. The output from the activation function is the output of the perceptron. . . . . . . . . . . 5

3.2 Sigmoid Activation Function [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.3 ReLU Activation Function [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.4 Threshold Activation Function [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.5 A neural network with a hidden layer. This network will be used to explain the mathematics behind updating the weights in each layer. . . . . . . . . . . 7

3.6 Updating the hidden layer weight w1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.7 Updating the output layer weight w5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.8 Forward Propagation circuit using the ENODe . . . . . . . . . . . . . . . . . . . . . . 9

3.9 Summing Op-Amp to translate current values into voltage values . . . . . . . . . . . . 10

3.10 XOR truth table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.11 XOR logic gate to calculate a high output if the inputs are different, based on its truth table . . . . . . . . . . 11

3.12 Error estimation layer to train the device . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.13 Backpropagation circuit for output weights . . . . . . . . . . . . . . . . . . . . . . . . 12

3.14 Backpropagation circuit for hidden weights . . . . . . . . . . . . . . . . . . . . . . . . 12

3.15 Structure of the ENODe. [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.16 Waveforms of (a) the ENODe being potentiated over a period of 1200s and (b) the gate pulse voltage . . . . . . . . . . 13

3.17 Zoomed-in waveforms of (a) the ENODe conductance state for the negative gate pulses shown in (b) . . . . . . . . . . 13

3.18 Zoomed-in waveforms of (a) the ENODe conductance state for the positive gate pulses shown in (b) . . . . . . . . . . 13

3.19 State retention comparison for 3 conductance states after 300s of potentiation. . . . . 14

3.20 ENODe model in LTSpice which is utilized in modelling the hardware neural network circuit . . . . . . . . . . 14

3.21 ENODe conductance state output waveform in the LTSpice model . . . . . . . . . . . 15

4.1 Model of a Single Perceptron Classifier. There are two inputs and a bias input, a single activation function, and one output. . . . . . . . . . . 16

4.2 Forward Propagation and Summation circuit . . . . . . . . . . . . . . . . . . . . . . . 17

4.3 Circuit to estimate error from output . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.4 Back Propagation circuit to change ENODe conductance value . . . . . . . . . . . . . 18

4.5 Backpropagation circuit using programmable element . . . . . . . . . . . . . . . . . . . 19

4.6 Backpropagation circuit modified to work without programmable elements . . . . . . . 20

4.7 XOR logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.8 XOR network models. (a) Three perceptrons are combined to model the XOR gate. (b) A network that uses a hidden layer to model the XOR gate. . . . . . . . . . . 21

4.9 XOR as single network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.10 XOR python script logic flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5.1 The weight values (first panel), the error between predicted and expected output (middle panel) and the input signals (bottom panel) for the single layer perceptron that models the AND gate. . . . . . . . . . . 23

5.2 Practical implementation to multiply signals, and the signal outputs for the practical circuit and programmable circuits. . . . . . . . . . . 24

5.3 Backpropagation circuit with behavioural voltage source for product operation . . . . 24


5.4 Backpropagation circuit with signal inputs modified to multiply using Kirchhoff's voltage laws . . . . . . . . . . 25

5.5 XOR as a single network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.6 Error and weight value waveforms for Single network XOR circuit . . . . . . . . . . . 26

5.7 XOR Python script outputs describing: (a) the oscillating weights and (b) the associated error . . . . . . . . . . 27

5.8 Graphs comparing the effect of input order on training. (a) shows training without random inputs and (b) shows training with random inputs . . . . . . . . . . 28

5.9 XOR Python script outputs for training with randomized inputs. (a) shows the weight convergence and (b) shows the associated error. . . . . . . . . . . 28

5.10 Error waveform and weight convergence for Single network XOR circuit . . . . . . . . 29

5.11 Error waveform and weight convergence for the single network XOR circuit. (NOTE: The labelled waveforms are the same as the ones shown in Figure 5.10, with the weight number repeated twice for labelling purposes in LTSpice.) . . . . . . . . . . 30

5.12 XOR Python script outputs for XOR modelled using a sigmoid function. (a) shows the weights and (b) shows the epochs. . . . . . . . . . . 31

C.1 Updating the hidden layer weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

C.2 Updating the hidden layer weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

C.3 Updating the output layer weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

C.4 Updating the output layer weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

D.1 Sigmoid Activation Function [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

D.2 ReLU Activation Function [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

D.3 Threshold Activation Function [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

E.1 Structure of the ENODe device used for measurement [3] . . . . . . . . . . . . . . . . 46

E.2 The setup of the ENODe device to measure the data . . . . . . . . . . . . . . . . . . . 46

E.3 ENODe data measurement setup depicting the positions of the gate, drain and source probes at the three corners. A resistance of 1 MOhm (not visible) is connected to the gate probe. . . . . . . . . . . 47

G.1 XOR as a combination of individual perceptrons. The weights in red boxes model the OR gate, the weights in yellow boxes model the NAND gate and the weights in green boxes model the AND gate. When all three perceptrons are trained, the overall architecture behaves as an XOR gate. . . . . . . . . . . 50

G.2 Error waveform and weight convergence for the combination XOR. Weights 1, 2 and bh1 are for the OR gate perceptron, weights 3, 4 and bh2 are for the NAND gate perceptron and weights 5, 6 and bo are for the AND gate perceptron. . . . . . . . . . . 51

List of Tables

3.1 Fundamental neural network equations (left column) and their electrical analogies (right column) . . . . . . . . . . 9

F.1 LTSpice Electrical Component List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


Contents

List of Figures iv

List of Tables v

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 Outline of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Research Question and Goals 2

2.1 Scope and project goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2.2 Simulation setup and work flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2.2.1 Measurement and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3 Theory 4

3.1 Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.1.1 Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.1.2 Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.1.3 Neural Network Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.1.4 Hidden Layer Weight update . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.1.5 Output Layer Weight update . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.2 Electronic Realization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.2.1 Forward Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.2.2 Error Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.2.3 Backward Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.3 ENODe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4 Circuit Models 16

4.1 Perceptron Circuit Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.2 Forward Propagation Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.3 Error Estimation Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.4 Back Propagation Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.5 Product circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.6 Modelling the XOR network on hardware . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.7 XOR with hidden layer updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.8 Python Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

5 Results & Evaluation 23

5.1 Single Layer Perceptron Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5.2 Comparison of product circuit outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5.3 Evaluation of XOR circuit performance and outputs . . . . . . . . . . . . . . . . . . . 25

5.3.1 XOR as a single network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.3.2 Insights from Python Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5.3.3 XOR as a single network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.4 Evaluation of the capabilities of the proposed circuit . . . . . . . . . . . . . . . . . . . 30

5.4.1 Unity Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.4.2 Binary Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.4.3 Extension to larger arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6 Conclusions and Future Recommendations 32

6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32


6.2 Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

References 33

Appendix 35

A XOR Neural Network Python script 35

B XOR Neural Network using Sigmoid function 37

C Neural Network Mathematics 40

C.1 Hidden Layer Weight Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

C.2 Hidden Layer Bias Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

C.3 Output Layer Weight Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

C.4 Output Layer Bias Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

D Activation Functions 44

D.1 Sigmoid Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

D.2 Linear Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

D.3 Threshold Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

E ENODe Data Logging 46

F LTSpice Electrical Elements 48

G XOR as a combination of single layer perceptrons 50


1 Introduction

In many applications such as speech recognition, pattern recognition [4] and cancer detection [5],[6], artificial intelligence and deep learning algorithms play an important role. While these algorithms are similar to those found in the human brain, they are typically implemented as software rather than hardware.

Artificial neural networks are used in deep learning. These are programmed and run on computers whose hardware is based on the von Neumann architecture. The von Neumann architecture allows programs to solve well-defined problems and to perform data analysis and classification based on multiple variables. In terms of energy efficiency, however, the von Neumann architecture is prone to a performance bottleneck. The architecture has a sequential working mechanism in which data is communicated between a memory unit and a computing unit, so the number of instructions that can be communicated limits performance [7]. This leads to longer computation times and higher energy requirements. The brain's architecture, on the other hand, is a massively parallel and tightly interconnected network of neurons and is able to perform complex functions with less energy.

Taking inspiration from the brain, devices can be designed that behave similarly to the brain and find application in edge sensor devices (devices that can perform data analysis at the point of data collection). They can aid the von Neumann system by distributing the analysis operations to the sensor and leaving the other calculations to the von Neumann system, thereby making it more efficient. A future application would involve creating integrated circuit chips with a large number of interconnected hardware neurons to perform efficient learning and prediction operations.

Implementation of artificial neural networks in hardware is currently realized using the crossbar architecture and state-retaining devices called memristors. Pattern recognition and machine learning systems are being built using neural architectures based on Complementary Metal Oxide Semiconductor (CMOS) technology, ion-gated and floating-gate devices, and memristors, which are inspired by the brain's performance.

To design and implement such hardware networks, Krestinskaya et al. [8] explored different architectures for on-chip implementation of deep neural networks, binary neural networks and multilayer neural networks. They simulated their proposed models, based on multiple memory devices, to classify the XOR, MNIST and Yale Face Recognition datasets. Prezioso et al. [9] experimentally implemented a single-perceptron classifier with metal oxide memristors to create transistor-free arrays. They were able to classify 3x3 pixel images into 3 groups.

The authors, however, noted that the circuit becomes complicated when implementing the sign changes needed to update the weights in different layers. Also, the modular architecture required to overcome leakage currents adds further complexity and power consumption. These factors limited the hardware realization, as they posed a limit to scalability. The memristors modelled by the authors only took two states (switched on and off), thereby limiting the performance of the proposed model. Memristors themselves pose a problem in hardware implementation due to non-linearities and hysteresis. Lim et al. [10] noted the issue of non-linearities and proposed a learning rule to update the memristors considering their non-ideal properties. Ueda et al. [11] proposed a learning rule to include the hysteresis characteristics of memristors when they are applied.

Agarwal et al. [12] evaluated the operating parameters to understand the effect of noise and non-linearities in write and read operations as well as the physical effects of resistance values on the output of a resistive memory crossbar array. They also determined the parameters required to implement backpropagation on a chip, benchmarked for implementation in commercial chips. An array based on a 2-layer neural network was simulated in Python to analyze 3 different datasets: 8x8 pixel images of handwritten digits, the MNIST dataset (28x28 pixel) and a Sandia file classification dataset.


Through these studies, the authors were able to determine a set of limits for the read and write noise within which the crossbar array would not have its output degraded by more than 1%. They suggested further research in resistive memory development to optimize the energy consumption of such devices so as to meet all the requirements with minimum trade-offs.

Fuller et al. [13] studied the strengths and limitations of existing memristor-based arrays. To meet the efficiency and computational requirements, a large array (1000×1000 synapses) would be required to perform in-memory learning. Existing hardware had been studied by Agarwal et al. [12] for operational non-linearities and instabilities that hindered accurate learning owing to voltage and current dependence. Hence, Fuller et al. [13] introduced an array based on a stable, low-voltage electrochemical memory device and demonstrated learning on the XOR dataset using a 3-by-3 layer network. The authors implemented a parallel weight update design and were able to demonstrate improved weight updates and 100% classification accuracy for the XOR dataset. Parallel weight updates were also demonstrated by Prezioso et al. [9] on CMOS-based arrays.

Despite these successful implementations, it was noted that the volatility, design complexity and high supply voltages of CMOS architectures, and the energy-costly switching of memristors, complicate the path to achieving the interconnectivity, information density and energy efficiency of the brain using either approach.

As observed by the authors, a limitation to the hardware realization of neural networks is the memristor device itself, in particular its limited tunability and its volatility. What is needed is a device that is tunable and can retain its memory (non-volatile). The Electrochemical Neuromorphic Organic Device (ENODe), with non-volatile characteristics, was first demonstrated by van de Burgt and colleagues in 2017. The ENODe (Figure 1.1) is composed of a layer of electrolyte sandwiched between two layers of a polymer, one consisting of PEDOT:PSS and the other of PEDOT:PSS partially reduced with PEI (poly(ethylenimine)), as illustrated. When a positive voltage is applied to the gate electrode the conductivity is increased, and when a negatively biased voltage is applied to the gate the conductivity is reduced. van de Burgt et al. [1],[14] also demonstrated, with a simulation based on experimentally measured properties of the ENODe, a neural network that could classify a dataset of hand-written digits.

The ENODe offers the following advantages compared with the devices mentioned in the literature review:

1. Bio-compatibility, which allows them to be implemented in biological systems [15].

2. Low operating power, which makes them energy efficient and suitable for standalone battery-powered devices [1].

Figure 1.1: Architecture of the ENODe with circuit showing the gate, source and drain inputs. [1]

In this graduation project, the objective is to create a learning circuit that allows a hardware neural network to train itself on hardware. The ENODe device will be utilized to function as the hardware


realization of the neural network weights. These networks will be able to perform on-chip classification operations independent of any software (external computer).

1.1 Motivation

Hardware implementation of an ANN would allow the system to train itself independent of any kind of software. Such devices have the potential to be used for the following:

1. Active, re-trainable learning devices to classify diseases based on bio-markers (non-invasive biosensor applications).

2. Hardware-based training that can be extended to an array architecture, allowing energy-efficient computation to be carried out and overcoming the shortcomings of the von Neumann bottleneck.

1.2 Objectives

1. Study and develop a 1-layer hardware neural network (in-situ hardware training) to classify analog/digital signals.

2. Extend the network to implement a 2-layer application that mimics the XOR gate.

3. Further extend, or propose, a design to implement an array for larger systems consisting of multiple inputs and layers.

1.3 Outline of Thesis

In Chapter 2, the scope of the project and the experimental tools are explained. Next, Chapter 3 delves into the theory of neural networks, electronics and the ENODe device. This is followed by Chapter 4, which provides a description of the circuit models developed to create a hardware neural network. In Chapter 5, an evaluation of the developed models is presented. Finally, Chapter 6 concludes with a summary of the entire thesis and provides recommendations for future work.


2 Research Question and Goals

2.1 Scope and project goals

In this project, a hardware model of an ANN will be developed to create a hardware learning architecture for a 2×2 problem that will later be extended to an N×N problem. The initial model of a single layer perceptron classifier built by my thesis supervisor is taken as a starting point to study and extend in order to accomplish the project goals, which are as follows:

1. To study single perceptron neural network, understand its working and hardware realization.

A literature review on existing implementations of ANNs in hardware would be done. This would be followed by studying the single perceptron classifier built by my supervisor to analyze how the architecture can be extended to multiple layers.

2. Model this network and implement backpropagation training circuit.

The feedforward and backpropagation circuit design would be the particular focus. Initially, the hardware feedforward circuit can be trained and tested by modifying the ENODe values based on the weight values of a trained software ANN. In addition to this, the aim is to implement a backpropagation circuit that can update the ENODe weight values without the need to train a software ANN to obtain the trained weights.

3. Develop a 2 layer hardware model to model 2x2 array input data.

Following the implementation of a backpropagation circuit for a single perceptron device, the same would be done for a 2x2 XOR device. The difference here is that the data is now a two-dimensional array instead of one-dimensional. A Python script would be implemented based on the algorithm implemented on the hardware. The insights from the Python model will allow modifications to the hardware model to improve its ability to converge and predict.

2.2 Simulation setup and work flow

To model the circuit that implements the training algorithm on hardware, the LTSpice software is utilized (Figure 2.1). LTSpice has a library of general electronic components (Appendix F) as programs that can be simulated. The training algorithm will also be implemented in Python using the SPYDER IDE to study its performance and validate its working.


Figure 2.1: LTSpice workspace with a circuit diagram

2.2.1 Measurement and Evaluation

1. LTSpice simulations log the input and output waveforms and the current and voltage data for the components at individual circuit nodes.

2. The Python script (Appendix A) is developed to implement the training algorithm used in the circuit, providing evaluation and insights to improve the circuit.


3 Theory

In this section, the theory of neural networks, their components and the training algorithm will be described. Following that, the use of electronic circuits and electrical components to realize this mathematics in physics will be discussed, followed by an overview of the ENODe device that is used in the circuit.

3.1 Neural Network

A neural network is a computational algorithm that has the capability to detect patterns and predict outcomes or new insights based on the data sent to it. It comprises an input layer where data is sent in, a hidden layer for processing the data and an output layer to give the desired output from the data. The layers consist of nodes, the units of a neural network, which are termed neurons. The layers are connected to each other at these nodes by weights. Weights are values that change, depending on the type of training involved, to detect patterns or perform classification of the data.

There are three types of training methods for neural network [16].

1. Supervised Learning: In supervised learning, the neural network tries to find the values of the weights that classify the known inputs to give the known outputs. This type of learning is useful for classification problems.

2. Unsupervised Learning: In unsupervised learning, the neural network only has access to theknown input values. Such a learning scheme is useful for exploratory analysis.

3. Semi-supervised Learning: Semi-supervised learning lies between supervised and unsupervised learning. Here, the neural network has access to only some known outputs for the known inputs. This method is useful for problem sets which have a large amount of data, where hand labelling the outputs for all the inputs is impractical. It reduces the burden of data quality and allows the creation of models that are able to perform prediction and classification without the need for high-quality data.

In this project, supervised learning is used, as the problem being implemented to realize the prototype is an XOR gate, which has a small set of accessible labelled data.

3.1.1 Perceptron

A perceptron is an algorithm for binary classification. It can either allow or not allow the output for a given input based on its activation behaviour. A perceptron can classify data which are linearly separable.

A perceptron consists of multiple inputs which are multiplied by weights and summed up (Figure 3.1). The sum is then sent as an input to an activation function, σ, which gives an output (Equation 3.1). Based on the given output, the weights can be trained and the perceptron linearly classifies the input values [17].


Figure 3.1: Structure of a perceptron. Starting from the left, multiple inputs are multiplied with their respective weights (arrows) and summed up (square box labelled 'net'). They are then sent to an activation function 'f'. The output from the activation function is the output of the perceptron.

\text{Output} = \sigma\left(\sum_{i=1}^{n} \text{input}_i \cdot \text{weight}_i\right) \qquad (3.1)
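As a minimal illustration of Equation 3.1, the Python sketch below computes a single perceptron output; the input values, weights and the choice of a threshold activation are purely illustrative and not taken from the circuit.

```python
import numpy as np

def perceptron_output(inputs, weights, activation):
    """Forward pass of a single perceptron (Equation 3.1):
    weighted inputs are summed and passed through an activation function."""
    net = np.dot(inputs, weights)          # sum_i input_i * weight_i
    return activation(net)

# Threshold activation, as used later in the hardware circuit (Section 3.1.2.3).
threshold = lambda net: 1.0 if net >= 0 else 0.0

x = np.array([1.0, 0.0, 1.0])              # two inputs plus a bias input
w = np.array([0.4, -0.3, -0.2])            # illustrative weight values
print(perceptron_output(x, w, threshold))  # net = 0.2, so the output is 1.0
```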

3.1.2 Activation Function

The activation function takes in the summed value of the products of inputs and weights and, based on its type, gives an output that is linearly or non-linearly related to the input. Given below are some examples of activation functions [2]. The mathematical derivation of the derivatives of the activation functions can be found in Appendix D.

3.1.2.1 Sigmoid Activation Function

Sigmoid activation is the logistic function, which is defined in Equation 3.2. Figure 3.2 shows the sigmoid function curve.

f(x) = \frac{1}{1 + e^{-x}} \qquad (3.2)

Figure 3.2: Sigmoid Activation Function [2]

The derivative of the logistic curve is a value between [0, 1] and is given in Equation 3.3

f'(x) = \sigma(x)\,(1 - \sigma(x)) \qquad (3.3)


3.1.2.2 Linear Activation Function

The linear activation function is a linear curve whose output for a non-negative input equals that input. A common form of the linear activation function is the ReLU (Rectified Linear Unit) function, which is defined in Equation 3.4. Figure 3.3 shows the curve of the ReLU activation function.

f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (3.4)

Figure 3.3: ReLU Activation Function [2]

The derivative of the ReLU curve is a binary function that takes the values 0 or 1 and is given in Equation 3.5.

f'(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (3.5)

3.1.2.3 Threshold Activation Function

The threshold activation function is a step function whose value is either 0 or 1. It is defined in Equation 3.6, with its curve depicted in Figure 3.4.

H(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (3.6)

Figure 3.4: Threshold Activation Function [2]

The derivative of a threshold function is 0 everywhere except at x = 0, where it is not differentiable (Equation 3.7).

H'(x) = \begin{cases} 0, & x < 0 \text{ or } x > 0 \\ \text{not defined}, & x = 0 \end{cases} \qquad (3.7)


In the development of the circuit, the threshold function is modelled using a Schmitt trigger. Since the derivative of the threshold function is either 0 or not defined, for practical application it is kept as 1 so as to produce non-zero values in the backpropagation update calculation, where a product is required. This will be described in Section 3.1.3.
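For reference, a minimal Python sketch of the three activation functions and their derivatives as used in this work is given below; the replacement of the threshold derivative by 1 mirrors the practical choice described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # Equation 3.2

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)                   # Equation 3.3

def relu(x):
    return np.maximum(0.0, x)              # Equation 3.4

def relu_derivative(x):
    return np.where(x >= 0.0, 1.0, 0.0)    # Equation 3.5

def threshold(x):
    return np.where(x >= 0.0, 1.0, 0.0)    # Equation 3.6 (Schmitt trigger analogue)

def threshold_derivative(x):
    # The true derivative is 0 (or undefined at x = 0); in the hardware
    # backpropagation it is replaced by 1 so that weight updates stay non-zero.
    return np.ones_like(x)
```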

3.1.3 Neural Network Training

The process of changing the weights in the perceptron algorithm based on the desired output is called training the network. In this project, the backpropagation algorithm, which is based on the gradient descent optimization algorithm, is used.

In the gradient descent algorithm, the error between the network output and the expected output is calculated. The weights are then adjusted based on this error, with the aim of reducing the error to zero. The method is so named because the gradient of the error with respect to the weights is used to adjust the weight values in order to reduce the error.

To study the mathematics of backpropagation training, the neural network architecture shown in Figure 3.5 is considered. One hidden layer is present and the bias input values are set to 1. The activation function for each perceptron is considered to be the sigmoid activation function (Equation 3.2), and its derivative (Equation 3.3) will be used to calculate the updates for the hidden and output weights.

Figure 3.5: A neural network with a hidden layer. This network will be used to explain the mathematics behind updating the weights in each layer.

In the following sections, the mathematics [18] behind updating the hidden and output weights will be described in brief. A detailed explanation of the derivation can be found in Appendix C.

3.1.4 Hidden Layer Weight update

The mathematical equations for hidden weight update are as follows:

To change the value of the hidden weight, here w1, we need to understand the amount by which the error changes due to a change in that weight. The error, also called the loss function (Equation 3.8), is calculated at the end of the forward propagation for the predicted output. Consider the network shown in Figure 3.6.


Figure 3.6: Updating the hidden layer weight w1

E = \frac{1}{2}\,(\text{Target} - \text{Predicted})^2 \qquad (3.8)

The hidden weight w1 is updated using Equation 3.9. The other weights, including the bias weights, can be updated in a similar way.

w_1 = w_1 - \eta\,\frac{\partial E}{\partial w_1} \qquad (3.9)

\frac{\partial E}{\partial w_1} = -(\text{Target} - \text{Predicted}) \cdot \text{out}_o\,(1 - \text{out}_o) \cdot w_5 \cdot \text{out}_{h1}\,(1 - \text{out}_{h1}) \cdot x_1 \qquad (3.10)

In the circuit, a threshold function using a Schmitt trigger is implemented. Since the derivative of this function with a constant output would be zero, it would cause the neural network to decay to only zero outputs. Hence, the derivative is taken to be 1.

Therefore, the final update to be implemented is a modified form of Equation 3.10, where the derivatives of the activation functions are replaced by 1.

\frac{\partial E}{\partial w_1} = -(\text{Target} - \text{Predicted}) \cdot w_5 \cdot x_1 \qquad (3.11)

3.1.5 Output Layer Weight update

For the output layer weight update (Figure 3.7), again the loss function (Equation 3.8) is calculated, followed by the partial derivative of the loss function with respect to the output layer weight, here w5 (Equation 3.12).

Figure 3.7: Updating the output layer weight w5


\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial \text{out}_o} \cdot \frac{\partial \text{out}_o}{\partial \text{net}_o} \cdot \frac{\partial \text{net}_o}{\partial w_5} \qquad (3.12)

w_5 = w_5 - \eta\,\frac{\partial E}{\partial w_5} \qquad (3.13)

\frac{\partial E}{\partial w_5} = -(\text{Target} - \text{Predicted}) \cdot \text{out}_o\,(1 - \text{out}_o) \cdot \text{out}_{h1} \qquad (3.14)

Modifying Equation 3.14 by replacing the activation function derivative with 1, the equation for the error derivative is as follows:

\frac{\partial E}{\partial w_5} = -(\text{Target} - \text{Predicted}) \cdot \text{out}_{h1} \qquad (3.15)
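A minimal Python sketch of the two simplified update rules (Equations 3.11 and 3.15, with the activation derivatives replaced by 1) is given below; the variable names follow Figure 3.5 and the learning rate and example values are illustrative.

```python
def output_weight_update(w5, target, predicted, out_h1, eta=0.1):
    """Output-layer update, Equations 3.13 and 3.15 (activation derivative set to 1)."""
    dE_dw5 = -(target - predicted) * out_h1
    return w5 - eta * dE_dw5

def hidden_weight_update(w1, target, predicted, w5, x1, eta=0.1):
    """Hidden-layer update, Equations 3.9 and 3.11 (activation derivatives set to 1)."""
    dE_dw1 = -(target - predicted) * w5 * x1
    return w1 - eta * dE_dw1

# Example: the target is 1 but the network predicted 0, so both weights increase
# (for positive out_h1, w5 and x1).
print(output_weight_update(w5=0.2, target=1, predicted=0, out_h1=1.0))      # -> 0.3
print(hidden_weight_update(w1=0.1, target=1, predicted=0, w5=0.2, x1=1.0))  # -> 0.12
```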

3.2 Electronic Realization

To realize the mathematics of forward and backward propagation in a neural network, Kirchhoff's voltage and current laws [19] and the standard electronic components available in the LTSpice library (Appendix F) are used. To model the weights, the electronic model of the ENODe (Figure 3.20) is used. Table 3.1 shows the fundamental neural network equations and their electrical analogies.

Neural network                        Electronics
\sum_{i=1}^{n} w_i x_i                \sum_{i=1}^{n} I_i = \sum_{i=1}^{n} V_i G_i
Expected - Predicted                  XOR(V_{expected}, V_{predicted})

Table 3.1: Fundamental neural network equations (left column) and their electrical analogies (right column)

3.2.1 Forward Propagation

For the forward propagation, the circuit shown in Figure 3.8 is created in LTSpice.

Figure 3.8: Forward Propagation circuit using the ENODe


The voltage inputs are sent to a summing op-amp. The ENODe acts as a resistance; however, its reciprocal value, the conductance (G), is used to calculate the product of the inputs and the weights (Equation 3.16).

I_{\text{net}} = \sum_{i=1}^{n} I_i = \sum_{i=1}^{n} V_i \, G_i \qquad (3.16)

The summing op-amp (Figure 3.9) is used to transform the output from a current into a voltage, since in practice voltages are more convenient to measure. The output voltage is a one-to-one translation of the current value, as given by Equation 3.17.

V_{\text{out}} = -R_f \sum_{n=1}^{N} \frac{V_n}{R_n} = -R_f \sum_{n=1}^{N} V_n \, G_n = -R_f \sum_{n=1}^{N} I_n \qquad (3.17)

Figure 3.9: Summing Op-Amp to translate current values into voltage values

The signal output from Equation 3.17 is then sent to a Schmitt trigger that acts as the threshold function for the forward propagation.

The Schmitt trigger is a threshold device that gives a high output signal if the input signal is above the switching voltage (Vt) of the Schmitt trigger (Equation 3.18).

\text{Predicted Output} = \begin{cases} 5\,\text{V}, & V_{\text{in}} \ge V_t \\ 0\,\text{V}, & V_{\text{in}} < V_t \end{cases} \qquad (3.18)

At the input, the ENODe is coupled with a bias resistor whose input voltage is the negative of the input sent to the ENODe. The role of this bias resistor is to generate a negative value of conductance through the current-law arithmetic. This is needed because a conductance cannot have a negative value, whereas the weights of a neural network can. Therefore, to model negative weight values, the bias resistor is used. Consider the input branch with input 'V1' in Figure 3.8. The current out of that branch can be written as shown in Equation 3.19 using Kirchhoff's current law and the current-voltage relation.

I_i = I_1 + I_2 = V_1 \, G_{\text{ENODe}} - V_1 \, G_{R_b} = V_1 \, (G_{\text{ENODe}} - G_{R_b}) \qquad (3.19)

The output can then be summed up using Equation 3.17, translated into a voltage and then sent to the Schmitt trigger.
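The Python sketch below combines Equations 3.16 to 3.19: each input branch contributes a current proportional to the difference between the ENODe conductance and the bias-resistor conductance, so negative effective weights can be represented. All component values are illustrative, not measured device parameters.

```python
import numpy as np

def forward_voltage(V_in, G_enode, G_bias, R_f):
    """Summing op-amp output for one perceptron (Equations 3.17 and 3.19):
    each branch current is V_i * (G_ENODe,i - G_Rb,i), so the effective weight
    can be negative even though each individual conductance is positive."""
    branch_currents = V_in * (G_enode - G_bias)   # Equation 3.19 per input branch
    return -R_f * np.sum(branch_currents)         # Equation 3.17

V_in    = np.array([1.0, 0.0, 1.0])               # two inputs and a bias input (V)
G_enode = np.array([1.0e-4, 2.0e-4, 1.5e-4])      # ENODe conductances (S)
G_bias  = np.array([1.5e-4, 1.5e-4, 1.5e-4])      # bias-resistor conductances (S)
print(forward_voltage(V_in, G_enode, G_bias, R_f=10e3))  # summed output voltage (V)
```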

3.2.2 Error Estimation

The error in the circuit is calculated using an XOR gate. The Predicted Output signal is sent to an XOR gate, where the other input is the expected value. Based on the truth table for an


XOR gate (Figure 3.10), an update signal is generated in the cases where the inputs are not the same.

Figure 3.10: XOR truth table

Figure 3.11: XOR logic gate to calculate a high output if the inputs are different, based on its truth table

The error calculated from the XOR gate is then latched with an update signal to determine how often the update pulse has to be sent to update the ENODe. This update signal does so by operating a switch in the backpropagation circuit, as shown in Figure 3.13 and Figure 3.14.
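A condensed Python sketch of this error layer is shown below: the XOR of the predicted and expected outputs is combined with a timing (clock) pulse to decide whether an update pulse is issued. The function and signal names are illustrative.

```python
def error_update_signal(predicted, expected, clock_pulse):
    """Error-estimation sketch: the XOR of predicted and expected output is high
    only when they differ; latching it with a timing pulse decides whether an
    update pulse is sent to the backpropagation circuit."""
    error = int(predicted) ^ int(expected)     # XOR truth table (Figure 3.10)
    return 1 if (error and clock_pulse) else 0

print(error_update_signal(predicted=1, expected=0, clock_pulse=1))  # -> 1 (update)
print(error_update_signal(predicted=1, expected=1, clock_pulse=1))  # -> 0 (no update)
```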

Figure 3.12: Error estimation layer to train the device

3.2.3 Backward Propagation

For backward propagation, two types of circuits are required, depending on the layer of weights. Following Equation 3.13 for the output layer weight update and Equation 3.9 for the hidden layer weight update, the circuits


shown in Figure 3.13 and Figure 3.14 are created in LTSpice. To allow both a positive and a negative update, 2.5V is subtracted from the update signal to scale the signal from -2.5V to 2.5V.

Figure 3.13: Backpropagation circuit for output weights

Figure 3.14: Backpropagation circuit for hidden weights

The switch only allows the gate to receive an update voltage at the ENODe depending on the switch voltage. When the update signal is 0V, the switch is open, and when it is 5V, it is closed, which allows the ENODe to be updated.
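The sketch below captures this pulse logic in Python under the voltage levels quoted above (a 0-5 V backpropagation signal shifted by -2.5 V, gated by the latched update signal); it is a behavioural illustration, not a description of the LTSpice netlist.

```python
def enode_gate_pulse(backprop_signal_volts, update_switch_on):
    """Backpropagation pulse sketch: the 0-5 V backpropagation signal is shifted
    by -2.5 V so the ENODe gate can receive both positive (potentiating) and
    negative (depressing) pulses; the latched update signal closes the switch
    (5 V) to pass the pulse, or opens it (0 V) to block it."""
    if not update_switch_on:                  # switch open: no update reaches the gate
        return 0.0
    return backprop_signal_volts - 2.5        # centre the pulse around 0 V

print(enode_gate_pulse(5.0, update_switch_on=True))   # -> +2.5 V, increases conductance
print(enode_gate_pulse(0.0, update_switch_on=True))   # -> -2.5 V, decreases conductance
print(enode_gate_pulse(5.0, update_switch_on=False))  # -> 0.0 V, no update
```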

3.3 ENODe

The ENODe (Figure 3.15) is composed of a layer of electrolyte sandwiched between two layers of a polymer, one consisting of PEDOT:PSS and the other of PEDOT:PSS partially reduced with PEI (poly(ethylenimine)), as illustrated in Figure 3.15. When a positive voltage is applied to the gate electrode the conductivity is increased, and when a negatively biased voltage is applied to the gate the conductivity is reduced. van de Burgt et al. [1],[14] also demonstrated, with a simulation based on experimentally measured properties of the ENODe, a neural network that could classify a dataset of hand-written digits.

Figure 3.15: Structure of the ENODe. [1]


(a) ENODe conductance states (b) Gate voltage pulses

Figure 3.16: Waveforms of (a) the ENODe being potentiated over a period of 1200s and (b) the gate pulse voltage

(a) ENODe conductance states for -2V pulses (b) Gate pulses of -2V

Figure 3.17: Zoomed-in waveforms of (a) the ENODe conductance state for the negative gate pulses shown in (b)

(a) ENODe conductance states for 2V pulses (b) Gate pulses of 2V

Figure 3.18: Zoomed-in waveforms of (a) the ENODe conductance state for the positive gate pulses shown in (b)

The ENODe retains a charge state after the application of a voltage pulse at the gate. This is the conductance state, represented by ∆G. The value of ∆G depends on the length and the voltage value of the pulse. The ON/OFF ratio of the ENODe is the range between the minimum and maximum conductance within the linear region of the ENODe. Both properties, ∆G and the ON/OFF ratio, are shown in Figure 3.17a. Figure 3.16a shows the potentiation of the ENODe, which follows a shark-fin pattern, where the maximum and minimum of the pattern indicate the ON/OFF ratio.
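The following is a minimal behavioural sketch of such a conductance state in Python: each gate pulse changes the conductance by a step ∆G, clipped to an ON/OFF range. The step size and conductance limits are illustrative placeholders, not the fitted values of the LTSpice model described below.

```python
class ENODeSketch:
    """Behavioural sketch of the ENODe conductance state: each gate pulse changes
    the conductance by a fixed step dG, clipped to the ON/OFF range.
    All parameter values are illustrative, not fitted device parameters."""

    def __init__(self, G=1.0e-4, dG=2.0e-6, G_min=5.0e-5, G_max=2.0e-4):
        self.G, self.dG, self.G_min, self.G_max = G, dG, G_min, G_max

    def pulse(self, gate_voltage):
        # Positive gate pulses increase the conductance, negative pulses decrease it.
        if gate_voltage > 0:
            self.G = min(self.G + self.dG, self.G_max)
        elif gate_voltage < 0:
            self.G = max(self.G - self.dG, self.G_min)
        return self.G

enode = ENODeSketch()
for _ in range(10):
    enode.pulse(+2.0)        # ten +2 V pulses potentiate the device
print(enode.G)               # -> 1.2e-4 S
```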


Figure 3.19 depicts a graph of conductance states over time. The plots in orange, green and red are plots where the gate is disconnected during a pulse retention state to depict the retention stability of the ENODe. The gate voltages were square pulses with values of 2V and -2V, and the drain voltage was set to -0.02V. It can be noted that the graphs do not start at the same point or have exactly the same shape, due to operational non-linearities of the fabricated device used to log the data. The procedure to log the data can be found in Appendix E.

Figure 3.19: State retention comparison for 3 conductance states after 300s of potentiation.

The ENODe device used for logging the data above is also modelled in LTSpice, as shown in Figure 3.20. The output waveform from this circuit for the same gate voltage, drain voltage and pulse pattern is shown in Figure 3.21.

Figure 3.20: ENODe model in LTSpice which is utilized in modelling the hardware neural network circuit


Figure 3.21: ENODe conductance state output waveform in the LTSpice model


4 Circuit Models

To implement the backpropagation algorithm, a perceptron is first modelled for a single layer classification problem. This model is then modified to include practical components that realize the mathematics of backpropagation.

The LTSpice software is utilized to design the circuit. LTSpice is a simulation tool that allows users to create circuits using fundamental electronic components and proprietary models built by Analog Devices, Inc. (formerly Linear Technology).

4.1 Perceptron Circuit Model

To begin implementing a backpropagation algorithm, a classification problem is defined that can be solved linearly. For instance, the AND gate is an example of a problem that has a linearly classifiable solution. A single perceptron can perform the classification of two binary inputs by training itself to behave as an AND gate. Figure 4.1 shows the model of a perceptron based on which the circuit will be modelled and realized.

Figure 4.1: Model of a Single Perceptron Classifier. There are two inputs and a bias input, a single activation function, and one output.

To realize the forward propagation, error estimation and backpropagation training of the perceptron, a circuit was designed. The circuit was originally created by my thesis supervisor and a previous student to predict cystic fibrosis from a sweat sensor [20]. Here, the same circuit is modified so that it trains itself to act as a simple AND gate classifier.

The circuit is divided into 3 parts, each performing the mathematics of forward propagation, error estimation and backpropagation for the weight update.

4.2 Forward Propagation Circuit

Figure 4.2 shows the input layer where the binary signals and bias signals are sent in. The signals are summed using an inverting summing op-amp and then sent through a Schmitt trigger. The Schmitt trigger acts as a threshold that sends out only binary outputs. The outputs from the Schmitt trigger are the predicted outputs of the circuit.


Figure 4.2: Forward Propagation and Summation circuit

4.3 Error Estimation Circuit

In the error estimation circuit shown in Figure 4.3, the output from the forward propagation is compared with the expected output to generate an error. The expected output, labelled "Y verwacht" (Dutch for "expected Y"), is obtained from the AND gate (A9) which the circuit is training itself to be. In the case of a different logic gate or combination, this AND gate (A9) is replaced. The output is then sent to a latch element. A push button acts as the timing element, whose value, combined with the error, causes the backpropagation circuit to send an update signal to the ENODe.


Figure 4.3: Circuit to estimate error from output

4.4 Back Propagation Circuit

Figure 4.4 shows the backpropagation layer, where the error and the inputs are multiplied according to the gradient descent algorithm to generate a signal that changes the conductance value of the ENODe.

Figure 4.4: Back Propagation circuit to change ENODe conductance value


4.5 Product circuit

In the backpropagation segment of the circuit, a behavioural voltage source is utilized to generate the product of the error signal and the input signal to the network. This programmable element is modelled using practical elements in the same way as the forward propagation is modelled.

The signals are sent in as an input voltage and as a resistance value to the ENODe. Using the current-voltage relation, the product between voltage and conductance can be calculated. This product is a current value that is translated to a voltage using an op-amp, as described in Section 3.2.1. Figure 4.5 shows the backpropagation circuit with the programmable elements B4, B5 and B6. Figure 4.6 shows the modified circuit.
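In Python terms, replacing the behavioural multiplier by the current-voltage product can be sketched as below: one signal is applied as a branch voltage and the other sets the branch conductance, and a feedback resistor R_f (value illustrative) converts the resulting current back into a voltage.

```python
def product_via_conductance(v_signal, g_signal, R_f=1.0):
    """Product-circuit sketch: one signal drives the branch voltage and the other
    sets the branch conductance, so the branch current I = V * G is their product.
    An inverting op-amp with feedback resistor R_f turns the current into a voltage."""
    current = v_signal * g_signal
    return -R_f * current

print(product_via_conductance(v_signal=2.0, g_signal=0.5))   # -> -1.0
```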

Figure 4.5: Backpropagation circuit using programmable element


Figure 4.6: Backpropagation circuit modified to work without programmable elements

4.6 Modelling the XOR network on hardware

The aim of creating the single layer perceptron is to extend it to a two-layer classification problem and, further ahead, towards generalizing the architecture for larger arrays. To begin with a two-layer problem, modelling the XOR gate (Figure 4.7a) is a particular case in the field of machine learning, since it cannot be solved with a single layer perceptron. This is because the XOR classification requires a non-linear classification operation that a single perceptron is not able to perform [21]. This is shown in Figure 4.7b.

(a) XOR truth table (b) XOR non linear classification

Figure 4.7: XOR logic

There are two ways in which the XOR circuit can be modelled:

1. An XOR circuit as a combination of NAND, AND and OR gates. The details of this circuit can be found in Appendix G, as it is a test case application. Figure 4.8a shows the network of an XOR ANN as a combination of 3 individual perceptrons.


2. An XOR circuit with hidden weight training: In this model, the entire network is trained as a whole entity. The backpropagation circuit involves extra variables based on the gradient descent mathematical equations (Figure 4.8b).

(a) XOR as a combination of independently trained networks

(b) XOR as a single network

Figure 4.8: XOR network models. (a) The network in which 3 perceptrons are combined to model the XOR gate. (b) The network which uses a hidden layer to model the XOR gate.

4.7 XOR with hidden layer updates

In this section the XOR problem is modelled entirely as a single network with one point of error to train both the output layer and the hidden weights (Figure 4.9).

Figure 4.9: XOR as single network

4.8 Python Model

The circuits described above implement binary neural networks to classify linear and non-linear (XOR) problems. But to analyze the algorithm, the circuit can be complex. Validating the algorithm by simulating the same problem using a Python script allows one to find areas of improvement within the algorithm and thereby the circuit. Hence, a Python script (Appendix A) is described that implements the XOR neural network described in section 4.7. Figure 4.10 shows the flowchart used to implement the script.
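A condensed, slightly simplified version of the loop that the flowchart describes is sketched below; it is not a line-for-line copy of the Appendix A script (for instance, the hidden-weight update is written as an outer product here), and convergence for a particular random initialization is not guaranteed.

import numpy as np
import random

def thrsh(x):
    # Threshold activation used in the circuit model: outputs -1 or +1
    return np.where(np.asarray(x) > 0.0, 1.0, -1.0)

inputs = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
targets = [-1, 1, 1, -1]
lr = 0.6
w_h = np.random.randint(-5, 5, size=(2, 2)) / 10   # hidden weights
b_h = np.random.randint(-5, 5, size=2) / 10
w_o = np.random.randint(-5, 5, size=2) / 10        # output weights
b_o = np.random.randint(-5, 5, size=1) / 10

for epoch in range(400):
    for _ in inputs:
        x = random.choice(inputs)                  # randomized presentation order
        h = thrsh(np.dot(x, w_h) + b_h)            # hidden layer forward pass
        y = thrsh(np.dot(h, w_o) + b_o)            # output layer forward pass
        err = y - targets[inputs.index(x)]         # error at the single output
        w_o -= lr * err * h                        # output layer update
        b_o -= lr * err
        w_h -= lr * np.outer(x, err * w_o)         # hidden layer update (unity gradient)
        b_h -= lr * err * w_o

# Predictions for all four input patterns after training
print(thrsh(np.dot(thrsh(np.dot(inputs, w_h) + b_h), w_o) + b_o))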


Figure 4.10: XOR Python script logic flow


5 Results & Evaluation

To evaluate the circuits and their performance, the signal outputs for the predicted output and the convergence of the weights are studied. The model implemented in Python is also studied to derive insights that improve convergence in the circuit as well as to verify the algorithm's performance in the case of binary classification.

5.1 Single Layer Perceptron Output

Figure 5.1 shows the outputs obtained from the single layer perceptron that models the AND gate. The circuit was originally created to be trained to classify sweat sensor data. In this circuit the input signals are square wave pulses sent in a repeating pattern.

Figure 5.1: The weight values (first panel), error between predicted and expected output (middle panel) and input signals (bottom panel) for the single layer perceptron that models the AND gate.

5.2 Comparison of product circuit outputs

The backpropagation circuit shown in Figure 5.3 makes use of the behavioural voltage elements B4, B5 and B6. These are programmable elements used to generate a certain function. To implement the circuit in practice, these elements need to be replaced by a product circuit as shown in Figure 5.4. This circuit is modelled in the same way as the forward propagation circuit. Instead of the ENODe's conductance being used to store a weight value, it takes the value of the input signal.


Figure 5.2: Practical implementation to multiply signals, and the signal outputs for the practical and programmable circuits.

Figure 5.3: Backpropagation circuit with behavioural voltage source for product operation


Figure 5.4: Backpropagation circuit with signal inputs modified to multiply using Kirchhoff's voltage laws

5.3 Evaluation of XOR circuit performance and outputs

Extending the single layer perceptron to two layers, the XOR gate is modelled. The XOR model is implemented in two ways: one as a combination of single layer perceptrons that simultaneously update with their individual errors, and the other with hidden weights that are updated using the error from the final predicted output value.

5.3.1 XOR as a single network

The XOR gate modelled as a single network is the next, and required, step towards general multiple layer models. By implementing the backpropagation circuit for the hidden layers using the weight values of the output layer, the circuit can train and perform as an XOR gate. Figure 5.6 shows the panels for the error and weight update signals from initialization to convergence.

Figure 5.5: XOR as a single network


The above model suffers from slow convergence in hardware. A possible cause is that the input signals are sent in a repeating pattern and not in a random order. Analyzing the Python script provides insights to improve this.

Figure 5.6: Error and weight value waveforms for Single network XOR circuit

5.3.2 Insights from Python Model

The Python model used to build the backpropagation algorithm for XOR classification provided insights to improve the convergence ability of the model. Given the lack of a varying derivative for the activation function, very little information is provided to the weights as to how much they should change to bring the error closer to zero. This lack of information can cause two problems:

1. If the value of the weights is close to the solution but not exactly equal to it, sending in updates with a constant derivative (which implies a constant value of ∆w) causes the weights to oscillate around the solution and never converge.

2. If the weights diverge and encounter a situation with a zero update value, they would not change at all and hence the solution would not converge.

Figure 5.7 illustrates the oscillation problem described above.


(a) Weights vs epochs. The weights are oscillating about a mean value and cannot reach that exact value.

(b) Error vs epochs for the script with the oscillation problem.

Figure 5.7: XOR Python script outputs describing (a) the oscillating weights and (b) the associated error

To eliminate the above issues, the following modifications are introduced to allow the algorithm to overcome the limitations of a constant derivative and ∆w value (a condensed sketch of these modifications follows the list):

1. Sending the input cases and their respective solutions in a random order alleviates the issue of the weight update getting stuck around the optimal value due to the chosen learning rate (Figure 5.8). When the inputs were sent in a fixed order, the weights never encountered an unknown, out-of-order input case. Since the ∆w value remained the same, the weights would be stuck between two ends whose mean is the solution. Sending the inputs in random order causes the weight values to change in random directions rather than oscillate about a mean value. Hence they pass the point where they would otherwise oscillate about the solution, and converge.

2. In the circuit, the inputs sent are either 0 V or 5 V and the update is scaled between -2.5 V and 2.5 V. The use of 0s and 1s in the Python script without such scaling can cause the phenomenon of the network dying. Defining the input cases and corresponding solutions as 0s and 1s, as well as the threshold output to be 0 or 1, causes the output from the first layer to be a matrix of 0s and 1s, depending on the initial random values of the weights. If all the weights are initialized to be negative, then the hidden layer output will be 0. These are the inputs to the second layer, whose output will then also be 0. Output layer weights depend on the hidden layer output, and hidden layer weights depend on the output layer weight values. Since the hidden layer updates are zero, the output layer weights become zero in the update step, and since these become zero, the hidden layer weights become zero in the following epoch. Over the entire training period, this lack of training/updates accumulates, causing the network to carry only zero signals. Changing the inputs and threshold functions to be -1 and 1 eliminates all the 0s from the system, thereby allowing only non-zero numbers to exist in the network. Hence decay to zero is eliminated.

3. Another method to improve convergence is to optimize the learning rate. This, however, is an optimisation trick used in software ANNs. The learning rate is multiplied by a factor of 0.999 each epoch, i.e., it is reduced to 0.999 times its previous value, and so is the update. Therefore, when the value of the weights is close to the solution, the update is also decreased. This results in smaller changes in the weights, which allows them to eventually reach convergence. This is the opposite of the result obtained with a constant learning rate and ∆w as mentioned in the first point.
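A condensed sketch of how these three modifications look in the script is given below; only the parts relevant to the modifications are shown, the rest of the training step is abbreviated, and the decay factor follows the 0.999 value used in the script.

import random

lr = 0.6
inputs = [[-1, -1], [-1, 1], [1, -1], [1, 1]]   # modification 2: -1/+1 encoding, no zeros
targets = {(-1, -1): -1, (-1, 1): 1, (1, -1): 1, (1, 1): -1}

for epoch in range(400):
    lr *= 0.999                       # modification 3: shrink the learning rate each epoch
    for _ in inputs:
        x = random.choice(inputs)     # modification 1: randomized presentation order
        target = targets[tuple(x)]
        # ... forward pass, error and weight update as in Appendix A ...

print(round(lr, 3))                   # the learning rate after 400 epochs of decay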


(a) Training without randomized inputs (b) Training with random inputs

Figure 5.8: Graphs comparing the effect of input order on training. (a) shows training without random inputs and (b) shows training with random inputs

Implementing the above changes allows the Python script to achieve convergence easily for any combination of initial random weights. Similar changes can be applied to the LTSpice circuits to allow faster convergence; one such change is to randomize the input signals.

(a) Weights vs epochs with randomized inputs

(b) Error vs epochs for the randomized input model

Figure 5.9: XOR Python script outputs for training with randomized inputs. (a) shows the weight convergence and (b) shows the associated error.

5.3.3 XOR as a single network with randomized inputs

The XOR gate modelled as a single network is modified using the insights from the Python script. The main modification is to send randomised inputs instead of steady, ordered inputs. It was observed that for any random initial condition of the ENODe, the circuit was able to achieve convergence within the first 100 seconds, unlike the ordered input case where it took the circuit more than 100 seconds to train.

Looking into the output waveforms for the conductance values of the ENODes (without the bias resistor, as described in subsubsection 3.2.1) in Figure 5.11, it can be noted that the weights take at most 20 distinct states (waveform V(weight11)), measured in increments of 0.1. For the device mentioned in Figure 3.16a, which has 100 distinct states, the circuit model therefore makes use of 20% of the overall range of conductance states available. However, it cannot be said with assurance whether these states lie in the linear regime without knowing the entire range of conductance states available in the given device.


Figure 5.10: Error waveform and weight convergence for Single network XOR circuit


Figure 5.11: Error waveform and weight convergence for the single network XOR circuit. (NOTE: The labeled waveforms are the same as the ones shown in Figure 5.10, with the weight number repeated twice for labelling purposes in LTSpice.)

5.4 Evaluation of the capabilities of the proposed circuit

The circuits designed and implemented have certain capabilities and limitations. Even though the XOR circuit is able to train itself and behave as an XOR gate, the circuit currently has the following limitations which keep it from being fully robust:

5.4.1 Unity Gradient

The Schmitt trigger, which is implemented to behave as a step activation function, has a gradient which is either 0 or not defined. For the implementation, this implies that the change applied to a weight is fixed and not variable based on how far it is from the correct value. This is a limitation as well as an advantage. In software applications, continuous functions such as the sigmoid function are utilized. These functions have gradients which can take values other than 1 based on how much the weight must change to reduce the error. For multi-class classification, the sigmoid function is preferred since the approximate output generated by the function helps in deciding how probable it is that the output belongs to a given class. In the case of a binary classification network, the gradients of these functions do not take a whole number and therefore the prediction is a close approximation, since the weights never truly take a constant final value, as shown in Figure 5.9a; the problem keeps optimizing itself. This limits the ability to obtain absolutely certain classification performance. Here, the threshold function does a better job since it can only take absolute states (Chapter 2 Summary [2]).
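The difference between the fixed (unity) gradient of the threshold activation and the varying gradient of the sigmoid can be made concrete with the small comparison below, which uses arbitrary activation values.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

net = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])    # summed activations of a unit
error = 1.0                                     # same error in both cases

# Threshold activation: the derivative is taken as 1, so the update magnitude
# does not depend on how far the activation is from the switching point
update_threshold = error * np.ones_like(net)

# Sigmoid activation: the update is scaled by sigma'(net) = sigma*(1 - sigma),
# which is at most 0.25 and shrinks towards zero for saturated units
s = sigmoid(net)
update_sigmoid = error * s * (1.0 - s)

print(update_threshold)
print(np.round(update_sigmoid, 4))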


Also, in terms of performance, the Python script [Appendix B] was utilized to train an XOR gate with the sigmoid activation function. In that case (Figure 5.12a), the number of epochs required to converge to the solution (1000 epochs) was significantly higher than when a threshold function was utilized (50 epochs), as shown in Figure 5.9a.

(a) Weights vs epochs (b) Error vs epochs

Figure 5.12: XOR Python script outputs for XOR modelled using a sigmoid function. (a) shows the weights and (b) shows the error.

5.4.2 Binary Output

The circuit can make use of binary inputs, or analog inputs converted to binary, for binary classification. The outputs it gives are also binary, i.e., a yes-or-no form of classification. The circuit's ability to perform multi-class classification is therefore limited; to go beyond binary classification, circuits with continuous activation functions (for example, the sigmoid function) would be required so that the circuit can behave as an approximate classifier.

5.4.3 Extension to larger arrays

The circuit utilized can be separated into modular blocks of input, error estimation and backpropagation networks. [8], in their research, made use of a modular approach to model the circuits and commented on the advantage of the modular approach for extending the circuit to complex classification applications. Still, given the use of a threshold function, the extension of the circuit to multiple inputs and layers is presently only useful for binary classification. In that case, a single layer would be enough for all the logic gates except the XOR; to have an XOR-type classifier, at least one hidden layer is required in the network.


6 Conclusions and Future Recommendations

6.1 Conclusions

In this report the method of realizing a hardware training circuit for a neural network is discussed along with the relevant theory.

Before starting the project, an understanding of training a neural network, as well as of the theory of the electronic components used to model a physical neural network, is required. For this, a literature review was done to understand the current work on the problem. The literature review provided the theoretical foundation to progress with the project.

The XOR problem, which is an interesting case to study multi-layer perceptron implementation, is utilized to test the circuit. This was followed by comparing the circuit mechanism with a software model to validate its working and to find insights from the efficient learning that the software model provides. These insights were then implemented in the circuit.

6.2 Recommendations

The major goal is to create a training circuit that can run the gradient descent algorithm to train the ENODe and that can be extended to arrays comprising multiple inputs and hidden layers. The outcomes of this project are limited to a single hidden layer circuit that performs binary classification by training as an XOR gate.

It was noted that the XOR circuit makes use of ideal circuit elements and is limited to binary classification. Considering these limitations and the insights obtained from the evaluation of the circuits, the following recommendations are applicable to carry out further improvements and research in this project:

1. The circuit needs to be modelled with elements that have practical, non-ideal characteristics. The circuit should be built and studied in practice to understand the presence and effect of sneak currents and how they can be eliminated. Also, a system of randomized input signal generation can be implemented to obtain a quicker and more consistently trainable circuit.

2. Continuous activation functions can be modelled and utilized so that the circuit can behave as an approximating classifier rather than solely functioning as a binary classifier.

3. Advanced circuit design methods (VLSI) can be utilized to model the circuit with greater efficiency.


References

[1] Y. Van De Burgt, E. Lubberman, E. J. Fuller, S. T. Keene, G. C. Faria, S. Agarwal, M. J. Marinella, A. Alec Talin, and A. Salleo, "A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing," Nature Materials, vol. 16, no. 4, pp. 414–418, 2017.

[2] O. Calin, Deep Learning Architectures: A Mathematical Approach. Springer Nature Switzerland, 2020.

[3] S. Dankers, "Optimal tuning of Electrochemical Neuromorphic Organic Devices for on-chip neural network applications," Master's thesis, TU Eindhoven, 2020.

[4] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[5] D. Ardila, A. P. Kiraly, S. Bharadwaj, B. Choi, J. J. Reicher, L. Peng, D. Tse, M. Etemadi, W. Ye, G. Corrado, D. P. Naidich, and S. Shetty, "End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography," Nature Medicine, vol. 25, no. 6, pp. 954–961, 2019. [Online]. Available: http://dx.doi.org/10.1038/s41591-019-0447-x

[6] I. Sechopoulos, J. Teuwen, and R. Mann, "Artificial intelligence for breast cancer detection in mammography and digital breast tomosynthesis: State of the art," Seminars in Cancer Biology, vol. 72, pp. 214–225, 2021. [Online]. Available: https://doi.org/10.1016/j.semcancer.2020.06.002

[7] I. Arikpo, F. Ogban, and I. Eteng, "Von Neumann architecture and modern computers," Global Journal of Mathematical Sciences, vol. 6, no. 2, pp. 97–103, 2007.

[8] O. Krestinskaya, K. N. Salama, and A. P. James, "Learning in memristive neural network architectures using analog backpropagation circuits," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 2, pp. 719–732, 2019.

[9] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B. Strukov, "Training and operation of an integrated neuromorphic network based on metal-oxide memristors," Nature, vol. 521, no. 7550, pp. 61–64, 2015.

[10] S. Lim, J. H. Bae, J. H. Eum, S. Lee, C. H. Kim, D. Kwon, B. G. Park, and J. H. Lee, "Adaptive learning rule for hardware-based deep neural networks using electronic synapse devices," Neural Computing and Applications, vol. 31, no. 11, pp. 8101–8116, 2019. [Online]. Available: https://doi.org/10.1007/s00521-018-3659-y

[11] M. Ueda, Y. Nishitani, Y. Kaneko, and A. Omote, "Back-propagation operation for analog neural network hardware with synapse components having hysteresis characteristics," PLoS ONE, vol. 9, no. 11, 2014.

[12] S. Agarwal, S. J. Plimpton, D. R. Hughart, A. H. Hsia, I. Richter, J. A. Cox, C. D. James, and M. J. Marinella, "Resistive memory device requirements for a neural algorithm accelerator," Proceedings of the International Joint Conference on Neural Networks, pp. 929–938, 2016.

[13] E. J. Fuller, S. T. Keene, A. Melianas, Z. Wang, S. Agarwal, Y. Li, Y. Tuchman, C. D. James, M. J. Marinella, J. J. Yang, A. Salleo, and A. A. Talin, "Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing," Science, vol. 364, no. 6440, pp. 570–574, 2019.

[14] Y. Van De Burgt, A. Melianas, S. T. Keene, G. Malliaras, and A. Salleo, "Organic electronics for neuromorphic computing," Nature Electronics, vol. 1, no. 7, pp. 386–397, 2018. [Online]. Available: http://dx.doi.org/10.1038/s41928-018-0103-3

[15] E. J. Roemer, K. L. West, J. B. Northrup, and J. Iverson, "NeuroGrid: recording action potentials from the surface of the brain," Physiology & Behavior, vol. 176, no. 12, pp. 139–148, 2016.

[16] J. Brownlee, Master Machine Learning Algorithms: Discover How They Work and Implement Them From Scratch. Jason Brownlee, 2016. [Online]. Available: https://books.google.nl/books?id=n--oDwAAQBAJ

[17] M. A. Nielsen, Neural Networks and Deep Learning. Determination Press, 2015. [Online]. Available: http://neuralnetworksanddeeplearning.com/index.html

[18] M. Mazur, "A Step by Step Backpropagation Example," 2015. [Online]. Available: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

[19] R. Rojas, Neural Networks: A Systematic Introduction. Springer, 1996.

[20] R. Huisman, "Circuit design of an ENODe based neural network for supervised learning," Bachelor end project, Department of Mechanical Engineering, TU Eindhoven, 2020.

[21] R. Bland, "Learning XOR: exploring the space of a classic problem," University of Stirling, Department of Computing Science and Mathematics, Tech. Rep., June 1998. [Online]. Available: http://www.maths.stir.ac.uk/~kjt/techreps/pdf/TR148.pdf


Appendix

A XOR Neural Network Python script

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import random

# threshold activation function
def thrsh(x):
    s = np.asarray(np.shape(x))
    y = np.zeros(shape=s)
    for i in range(0, s[0]):
        if x[i] <= 0.0:
            y[i] = -1.0
        elif x[i] > 0.0:
            y[i] = 1.0
    return y

# def d_thrsh(x): 1, hence excluded everywhere

# 2 Inputs, 1 Hidden Layer, 1 Output layer. Shape:
#
#       b1 --\
#   i1---O
#     \ / \    b3 -\
#      X    ------O--y
#     / \ /
#   i2---O
#       b2 --/

inputs = ([[-1, -1], [-1, 1], [1, -1], [1, 1]])
expected_output = ([[-1], [1], [1], [-1]])

epochs = 400
lr = 0.6
inputLayerNeurons, hiddenLayerNeurons, outputLayerNeurons = 2, 2, 1
# hidden_weights = np.array([[3.0, 1.0], [-1.8, 3.2]])

hidden_weights = np.random.randint(-5, 5, size=(2, 2)) / 10
# hidden_bias = np.array([1., 1.])

hidden_bias = np.random.randint(-5, 5, size=(2)) / 10
output_weights = np.random.randint(-5, 5, size=(2)) / 10
# output_bias = np.array([1.])

output_bias = np.random.randint(-5, 5, size=(1)) / 10

print("hidden_weights=", end='')
print(*hidden_weights)
print("hidden_biases=", end='')
print(*hidden_bias)
print("output_weights=", end='')
print(*output_weights)
print("output_biases=", end='')
print(*output_bias)

i = 0
h1 = np.zeros(shape=(epochs, 9))
# w = np.zeros(shape=(epochs, 9))
er = np.zeros(shape=(epochs, 4), dtype=object)
hla = np.zeros(shape=(epochs), dtype=object)
hlo = np.zeros(shape=(epochs), dtype=object)
ola = np.zeros(shape=(epochs), dtype=object)
olo = np.zeros(shape=(epochs), dtype=object)
pdh = np.zeros(shape=(epochs), dtype=object)
pdo = np.zeros(shape=(epochs), dtype=object)

# Training algorithm
er1 = []
for i in range(epochs):
    for j in range(len(inputs)):
        er = []
        # lr *= 0.999
        inputs_fp = random.choice(inputs)
        hidden_layer_activation = np.dot(inputs_fp, hidden_weights)
        hidden_layer_activation += np.transpose(hidden_bias)
        # hla[i] = hidden_layer_activation
        hidden_layer_output = thrsh(hidden_layer_activation)
        # hlo[i] = hidden_layer_output

        output_layer_activation = np.dot(hidden_layer_output, output_weights)
        output_layer_activation += output_bias
        # ola[i] = output_layer_activation
        predicted_output = thrsh(output_layer_activation)
        # olo[i] = predicted_output

        # backprop
        # output layer
        d_error = (-expected_output[inputs.index(inputs_fp)][0] + predicted_output)
        er = np.append(er, d_error)
        d_predicted_output = d_error
        # pdo[i] = d_predicted_output
        error_hidden_layer = (d_error * (output_weights))
        d_hidden_layer = error_hidden_layer
        # pdh[i] = d_hidden_layer

        # weight update
        output_weights -= ((d_predicted_output) * (hidden_layer_output)) * lr
        output_bias -= (d_error) * lr
        hidden_weights -= ((np.transpose(inputs_fp) * (d_hidden_layer))) * lr
        hidden_bias -= (d_hidden_layer) * lr

    # store values for plots
    er1 = np.append(er1, np.sum(er, axis=0))
    h1[i][0] = hidden_weights[0][0]
    h1[i][1] = hidden_weights[0][1]
    h1[i][2] = hidden_weights[1][0]
    h1[i][3] = hidden_weights[1][1]
    h1[i][4] = output_weights[0]
    h1[i][5] = output_weights[1]
    h1[i][6] = hidden_bias[0]
    h1[i][7] = hidden_bias[1]
    h1[i][8] = output_bias

ex = list(range(1, epochs + 1))
fig, axs = plt.subplots(1)
fig.suptitle('Weights')
a = plt.plot(ex, h1[:, 0])
b = plt.plot(ex, h1[:, 1])
c = plt.plot(ex, h1[:, 2])
d = plt.plot(ex, h1[:, 3])
e = plt.plot(ex, h1[:, 4])
f = plt.plot(ex, h1[:, 5])
g = plt.plot(ex, h1[:, 6])
h = plt.plot(ex, h1[:, 7])
i = plt.plot(ex, h1[:, 8])

plt.legend(('w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'b1', 'b2', 'b3'))

# plt.axis([0, 1000, -0.5, 0.5])
fig, axs = plt.subplots(1)
fig.suptitle('Error')
axs.plot(ex, 0.5 * np.square(er1))

# print final values
print("Final hidden weights: ", end='')
print(hidden_weights)
print("Final hidden bias: ", end='')
print(*hidden_bias)
print("Final output weights: ", end='')
print(*output_weights)
print("Final output bias: ", end='')
print(*output_bias)

# TEST CASE FOR FINAL WEIGHTS
po = []
for k in range(len(inputs)):
    hidden_layer_activation = np.dot(inputs[k], hidden_weights)
    hidden_layer_activation += np.transpose(hidden_bias)
    hidden_layer_output = thrsh(hidden_layer_activation)

    output_layer_activation = np.dot(hidden_layer_output, output_weights)
    output_layer_activation += output_bias
    predicted_output = thrsh(output_layer_activation)
    po = np.append(po, predicted_output)

# TEST CASE OUTPUT
print(*po)

B XOR Neural Network using Sigmoid function

# -*- coding: utf-8 -*-
"""
Created on Wed Feb 10 12:05:24 2021

@author: 20181046
"""

import numpy as np
# np.random.seed(0)
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import random

def sigmoid(x):
    number = 1 / (1 + np.exp(-x))
    f = np.around(number, 3)
    return f

def sigmoid_derivative(x):
    number = x * (1 - x)
    f = np.around(number, 3)
    return f

# Input datasets
inputs = ([[0, 0], [0, 1], [1, 0], [1, 1]])
expected_output = ([[0], [1], [1], [0]])
epochs = 5000
lr = 0.6
inputLayerNeurons, hiddenLayerNeurons, outputLayerNeurons = 2, 2, 1

# 2 Inputs, 1 Hidden Layer, 1 Output layer. Shape:
#
#       b1 --\
#   i1---O
#     \ / \    b3 -\
#      X    ------O--y
#     / \ /
#   i2---O
#       b2 --/

hidden_weights = np.random.randint(-5, 5, size=(2, 2)) / 10
# hidden_bias = np.array([1., 1.])

hidden_bias = np.random.randint(-5, 5, size=(2)) / 10
output_weights = np.random.randint(-5, 5, size=(2)) / 10
# output_bias = np.array([1.])

output_bias = np.random.randint(-5, 5, size=(1)) / 10

print((inputLayerNeurons * hiddenLayerNeurons))
print((1 * hiddenLayerNeurons))
print((hiddenLayerNeurons * outputLayerNeurons))
print((1 * outputLayerNeurons))
print("Initial hidden weights: ", end='')
print(*hidden_weights)
# print("Initial hidden biases: ", end='')
# print(*hidden_bias)
print("Initial output weights: ", end='')
print(*output_weights)
# print("Initial output biases: ", end='')
# print(*output_bias)

i = 0
# Training algorithm

h1 = np.zeros(shape=(epochs, 9))
er1 = np.empty((0, 4))
hla = np.empty((0, 4))
hlo = np.zeros(shape=(epochs), dtype=object)
ola = np.zeros(shape=(epochs), dtype=object)
olo = np.zeros(shape=(epochs), dtype=object)
pdh = np.zeros(shape=(epochs), dtype=object)
pdo = np.zeros(shape=(epochs), dtype=object)
er1 = []
for i in range(epochs):
    for j in range(len(inputs)):
        er = []
        # lr *= 0.999
        inputs_fp = random.choice(inputs)

        hidden_layer_activation = np.dot(inputs_fp, hidden_weights)
        hidden_layer_activation += np.transpose(hidden_bias)
        # hla[i] = hidden_layer_activation
        hidden_layer_output = sigmoid(hidden_layer_activation)
        # hlo[i] = hidden_layer_output

        output_layer_activation = np.dot(hidden_layer_output, output_weights)
        output_layer_activation += output_bias
        # ola[i] = output_layer_activation
        predicted_output = sigmoid(output_layer_activation)

        # backprop
        # output layer
        d_error = (-expected_output[inputs.index(inputs_fp)][0] + predicted_output)
        er = np.append(er, d_error)
        d_predicted_output = d_error

        d_op_net_opl = sigmoid_derivative(predicted_output)
        d_predicted_output = d_error * sigmoid_derivative(predicted_output)
        error_hidden_layer = (d_error * (output_weights))
        # d_hidden_layer = error_hidden_layer
        d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

        # weight update
        output_weights -= ((d_predicted_output) * (hidden_layer_output)) * lr
        output_bias -= (d_error) * lr
        hidden_weights -= ((np.transpose(inputs_fp) * (d_hidden_layer))) * lr
        hidden_bias -= (d_hidden_layer) * lr

    er1 = np.append(er1, np.sum(er, axis=0))
    h1[i][0] = hidden_weights[0][0]
    h1[i][1] = hidden_weights[0][1]
    h1[i][2] = hidden_weights[1][0]
    h1[i][3] = hidden_weights[1][1]
    h1[i][4] = output_weights[0]
    h1[i][5] = output_weights[1]
    h1[i][6] = hidden_bias[0]
    h1[i][7] = hidden_bias[1]
    h1[i][8] = output_bias

ex = list(range(1, epochs + 1))
fig, axs = plt.subplots(1)
fig.suptitle('Weights')
a = plt.plot(ex, h1[:, 0])
b = plt.plot(ex, h1[:, 1])
c = plt.plot(ex, h1[:, 2])
d = plt.plot(ex, h1[:, 3])
e = plt.plot(ex, h1[:, 4])
f = plt.plot(ex, h1[:, 5])
g = plt.plot(ex, h1[:, 6])
h = plt.plot(ex, h1[:, 7])
i = plt.plot(ex, h1[:, 8])

plt.legend(('w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'b1', 'b2', 'b3'))

# plt.axis([0, 1000, -0.5, 0.5])
fig, axs = plt.subplots(1)
fig.suptitle('Error')
axs.plot(ex, 0.5 * np.square(er1))

# print final values
print("Final hidden weights: ", end='')
print(hidden_weights)
print("Final hidden bias: ", end='')
print(*hidden_bias)
print("Final output weights: ", end='')
print(*output_weights)
print("Final output bias: ", end='')
print(*output_bias)

# TEST CASE FOR FINAL WEIGHTS
po = []
for k in range(len(inputs)):
    hidden_layer_activation = np.dot(inputs[k], hidden_weights)
    hidden_layer_activation += np.transpose(hidden_bias)
    hidden_layer_output = sigmoid(hidden_layer_activation)

    output_layer_activation = np.dot(hidden_layer_output, output_weights)
    output_layer_activation += output_bias
    predicted_output = sigmoid(output_layer_activation)
    po = np.append(po, predicted_output)

# TEST CASE OUTPUT
print(*po)


C Neural Network Mathematics

In the following appendix, the mathematics behind updating the hidden and output weights is described [18].

C.1 Hidden Layer Weight Update

The mathematical equations for hidden weight update are as follows:

To change the value of the hidden weight, here w1, we need to find the amount by which the error changes due to a change in this particular weight. The error, also called the loss function (Equation C.1), is calculated at the end of the forward propagation step from the predicted output. Consider the network shown in Figure C.1.

Figure C.1: Updating the hidden layer weights

E = \frac{1}{2}(Target - Predicted)^2    (C.1)

Equation C.2 shows the derivative of the loss function with respect to one of the hidden weights, w1, which is calculated using the chain rule.

\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}    (C.2)

\frac{\partial E}{\partial out_{h1}} = \frac{\partial E}{\partial out_o} \cdot \frac{\partial out_o}{\partial net_o} \cdot \frac{\partial net_o}{\partial out_{h1}} = -(Target - Predicted) \cdot out_o (1 - out_o) \cdot w_5    (C.3)

\frac{\partial net_o}{\partial out_{h1}} = \frac{\partial (out_{h1} \cdot w_5 + out_{h2} \cdot w_6 + 1 \cdot b)}{\partial out_{h1}} = w_5    (C.4)

\frac{\partial out_{h1}}{\partial net_{h1}} = \frac{\partial \left( \frac{1}{1 + e^{-net_{h1}}} \right)}{\partial net_{h1}} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} - \frac{1}{(1 + e^{-x})^2} = out_{h1} - out_{h1}^2 = out_{h1}(1 - out_{h1})    (C.5)

\frac{\partial net_{h1}}{\partial w_1} = \frac{\partial (x_1 \cdot w_1 + x_2 \cdot w_2 + 1 \cdot b)}{\partial w_1} = x_1    (C.6)

Solving the partial derivatives in the chain rule (Equation C.3, Equation C.4, Equation C.5 and Equation C.6), the final equation after simplification is shown in Equation C.7.

\frac{\partial E}{\partial w_1} = -(Target - Predicted) \cdot out_o (1 - out_o) \cdot w_5 \cdot out_{h1}(1 - out_{h1}) \cdot x_1    (C.7)

The hidden weight w1 is updated using Equation C.8. The other weights, including the bias weights, can be updated similarly.

w_1 = w_1 - \eta \frac{\partial E}{\partial w_1}    (C.8)
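A small numerical sanity check of Equation C.7 against a finite-difference estimate is sketched below; it assumes sigmoid activations as in the derivation, and all the numeric values are arbitrary.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w1, x1=0.5, x2=-0.3, w2=0.2, w5=0.7, w6=-0.4, b=0.1, b3=0.05, target=1.0):
    out_h1 = sigmoid(x1 * w1 + x2 * w2 + b)
    out_h2 = sigmoid(0.0)                       # second hidden unit held fixed for the check
    out_o = sigmoid(out_h1 * w5 + out_h2 * w6 + b3)
    E = 0.5 * (target - out_o) ** 2
    return E, out_h1, out_o

w1, x1, w5, target = 0.3, 0.5, 0.7, 1.0
E, out_h1, out_o = forward(w1)

# Analytical gradient from Equation C.7
grad_c7 = -(target - out_o) * out_o * (1 - out_o) * w5 * out_h1 * (1 - out_h1) * x1

# Central finite-difference estimate of dE/dw1
eps = 1e-6
grad_fd = (forward(w1 + eps)[0] - forward(w1 - eps)[0]) / (2 * eps)

print(grad_c7, grad_fd)   # the two values agree to several decimal places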

C.2 Hidden Layer Bias Update

The mathematical equations for hidden bias update are as follows:

To change the value of the hidden bias, we need to find the amount by which the error changes due to a change in this particular bias. The error, also called the loss function (Equation C.9), is calculated at the end of the forward propagation step from the predicted output. Consider the network shown in Figure C.2.

Figure C.2: Updating the hidden layer weights

E = \frac{1}{2}(Target - Predicted)^2    (C.9)

Equation C.10 shows the derivative of the loss function with respect to the hidden bias, b_{h1}, which is calculated using the chain rule.

\frac{\partial E}{\partial b_{h1}} = \frac{\partial E}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial b_{h1}}    (C.10)

\frac{\partial E}{\partial out_{h1}} = \frac{\partial E}{\partial out_o} \cdot \frac{\partial out_o}{\partial net_o} \cdot \frac{\partial net_o}{\partial out_{h1}} = -(Target - Predicted) \cdot out_o (1 - out_o) \cdot w_5    (C.11)

\frac{\partial net_o}{\partial out_{h1}} = \frac{\partial (out_{h1} \cdot w_5 + out_{h2} \cdot w_6 + 1 \cdot b)}{\partial out_{h1}} = w_5    (C.12)

\frac{\partial out_{h1}}{\partial net_{h1}} = \frac{\partial \left( \frac{1}{1 + e^{-net_{h1}}} \right)}{\partial net_{h1}} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} - \frac{1}{(1 + e^{-x})^2} = out_{h1} - out_{h1}^2 = out_{h1}(1 - out_{h1})    (C.13)

\frac{\partial net_{h1}}{\partial b_{h1}} = \frac{\partial (x_1 \cdot w_1 + x_2 \cdot w_2 + 1 \cdot b_{h1})}{\partial b_{h1}} = 1    (C.14)

Solving the partial derivatives in the chain rule (Equation C.11, Equation C.12, Equation C.13 and Equation C.14), the final equation after simplification is shown in Equation C.15.

\frac{\partial E}{\partial b_{h1}} = -(Target - Predicted) \cdot out_o (1 - out_o) \cdot w_5 \cdot out_{h1}(1 - out_{h1}) \cdot 1    (C.15)

The hidden bias b_{h1} is updated using Equation C.16.

b_{h1} = b_{h1} - \eta \frac{\partial E}{\partial b_{h1}}    (C.16)

C.3 Output Layer Weight Update

For the output layer weight update (Figure C.3), again, the loss function (Equation C.9) is calculated, followed by the partial derivative of the loss function with respect to the output layer weight, here w5 (Equation C.17).

Figure C.3: Updating the output layer weights

\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial out_o} \cdot \frac{\partial out_o}{\partial net_o} \cdot \frac{\partial net_o}{\partial w_5}    (C.17)

\frac{\partial out_o}{\partial net_o} = \frac{\partial \left( \frac{1}{1 + e^{-net_o}} \right)}{\partial net_o} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} - \frac{1}{(1 + e^{-x})^2} = out_o - out_o^2 = out_o(1 - out_o)    (C.18)

\frac{\partial net_o}{\partial w_5} = \frac{\partial (out_{h1} \cdot w_5 + out_{h2} \cdot w_6 + 1 \cdot b)}{\partial w_5} = out_{h1}    (C.19)

\frac{\partial E}{\partial w_5} = -(Target - Predicted) \cdot out_o (1 - out_o) \cdot out_{h1}    (C.20)

Simplifying the partial derivatives in the chain rule (Equation C.18 and Equation C.19), the final equation after simplification is shown in Equation C.20. The output weight w5 is updated as shown in Equation C.21.

w_5 = w_5 - \eta \frac{\partial E}{\partial w_5}    (C.21)

C.4 Output Layer Bias Update

For the output layer bias update (Figure C.4), again, the loss function (Equation C.9) is calculated, followed by the partial derivative of the loss function with respect to the output layer bias, here b_o (Equation C.22).


Figure C.4: Updating the output layer weights

\frac{\partial E}{\partial b_o} = \frac{\partial E}{\partial out_o} \cdot \frac{\partial out_o}{\partial net_o} \cdot \frac{\partial net_o}{\partial b_o}    (C.22)

\frac{\partial out_o}{\partial net_o} = \frac{\partial \left( \frac{1}{1 + e^{-net_o}} \right)}{\partial net_o} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} - \frac{1}{(1 + e^{-x})^2} = out_o - out_o^2 = out_o(1 - out_o)    (C.23)

\frac{\partial net_o}{\partial b_o} = \frac{\partial (out_{h1} \cdot w_5 + out_{h2} \cdot w_6 + 1 \cdot b_o)}{\partial b_o} = 1    (C.24)

\frac{\partial E}{\partial b_o} = -(Target - Predicted) \cdot out_o (1 - out_o) \cdot 1    (C.25)

Simplifying the partial derivatives in the chain rule (Equation C.23 and Equation C.24), the final equation after simplification is shown in Equation C.25. The output bias b_o is updated as shown in Equation C.26.

b_o = b_o - \eta \frac{\partial E}{\partial b_o}    (C.26)


D Activation Functions

The activation function takes in the summed value of the products of inputs and weights and, based on its type, gives an output that is linearly or non-linearly related to the input. Given below are some examples of activation functions [2].

D.1 Sigmoid Activation Function

The sigmoid activation function (Equation D.1) is a smooth curve whose value lies in the range [0, 1]. A form of the sigmoid activation function, where c = 1, is the logistic function, which is defined in Equation D.2. Figure D.1 shows the sigmoid curve.

f(x) = \frac{1}{1 + e^{-cx}}    (D.1)

f(x) = \frac{1}{1 + e^{-x}}    (D.2)

Figure D.1: Sigmoid Activation Function [2]

The derivative of the logistic curve is a value in the range [0, 1] and is given in Equation D.3.

f'(x) = \frac{\partial \frac{1}{1 + e^{-x}}}{\partial x} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1 + e^{-x} - 1}{(1 + e^{-x})^2} = \sigma(x) - (\sigma(x))^2 = \sigma(x)(1 - \sigma(x))    (D.3)

D.2 Linear Activation Function

The linear activation function is a linear curve whose value lies in the range [0, ∞). A form of the linear activation function is the ReLU (Rectified Linear Unit) function, which is defined in Equation D.4. Figure D.2 shows the curve of the ReLU activation function.

f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases}    (D.4)


Figure D.2: ReLU Activation Function [2]

The derivative of the ReLU curve is a binary function that takes the values 0 or 1 and is given in Equation D.5.

f'(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases}    (D.5)

D.3 Threshold Activation Function

The threshold activation function is a step function whose value is either 0 or 1. It is defined in Equation D.6, with its curve depicted in Figure D.3.

H(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases}    (D.6)

Figure D.3: Threshold Activation Function [2]

The derivative of a threshold function is 0 everywhere except at x = 0, where it is not differentiable (Equation D.7).

H'(x) = \begin{cases} 0, & \text{if } x \neq 0 \\ \text{not defined}, & \text{if } x = 0 \end{cases}    (D.7)

In the development of the circuit, the threshold function is modelled using a Schmitt trigger. Since the derivative of the threshold function is either 0 or not defined, for practical application it is kept as 1 so as to produce non-zero values in the backpropagation update calculation. A sketch of these functions is given below.
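For reference, the three activation functions and their derivatives can be written as the following Python sketch; the constant derivative returned for the threshold function reflects the practical choice described in this appendix rather than the mathematical definition.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # Equation D.2

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)                        # Equation D.3

def relu(x):
    return np.maximum(0.0, x)                   # Equation D.4

def d_relu(x):
    return np.where(x >= 0.0, 1.0, 0.0)         # Equation D.5

def threshold(x):
    return np.where(x >= 0.0, 1.0, 0.0)         # Equation D.6

def d_threshold(x):
    # Equation D.7 is 0 away from x = 0 and undefined at x = 0; in the circuit
    # implementation it is replaced by a constant 1 for the update calculation
    return np.ones_like(x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), relu(x), threshold(x), d_threshold(x))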


E ENODe Data Logging

To log the ENODe conductance state data, the device is set up as shown in Figure E.2 and the lab setup is shown in Figure E.3. Figure E.1 shows the structure of the device used in the lab [3]. The following procedure is carried out to log the data (a small post-processing sketch follows the list):

1. First set up the ENODe device. Clean the glass ITO (Indium Tin Oxide) slide with the edge of an absorbent fabric tissue to which isopropyl alcohol solution has been applied. This step is performed to remove dust particles and clean the surface to create a clean contact area for the electrode probes. Pour 100 mM NaCl solution (electrolyte) into the cavities.

2. Connect the Gate, Drain and Source probes to the ENODe device. A 1 MΩ resistor is connected at the gate to retain the charges after the gate pulse becomes zero.

3. Using a conductor, short the Gate and Source probes to remove any charges.

4. Set the desired voltage input values for the gate and drain in the LabVIEW model and start the measurement.

5. Once the data is logged, short the gate and drain probes again to reset the device state for thenext measurement.
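Once a measurement has been logged, the conductance states can be extracted from the drain current and drain voltage; the sketch below assumes a hypothetical three-column log (time, drain voltage, drain current), since the exact LabVIEW export format is not specified here.

import numpy as np

# Hypothetical logged samples: time [s], drain voltage [V], drain current [A]
log = np.array([
    [0.0, 0.1, 1.0e-6],
    [1.0, 0.1, 1.2e-6],
    [2.0, 0.1, 1.4e-6],
])

t, v_d, i_d = log[:, 0], log[:, 1], log[:, 2]
conductance = i_d / v_d                  # channel conductance G = I_D / V_D
print(np.round(conductance * 1e6, 2))    # conductance states in microsiemens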

Figure E.1: Structure of the ENODe device used for measurement [3]

Figure E.2: The setup of the ENODe device to measure the data


Figure E.3: ENODe data measurement setup depicting the positions of the Gate, Drain and Source probes at the three corners. A resistance of 1 MΩ (not visible) is connected to the gate probe.


F LTSpice Electrical Elements

Table F.1: LTSpice Electrical Component List

Capacitor: A device that stores electric charge. The unit of capacitance is the Farad [F].

Ground: The voltage reference point whose value is 0 [V].

Opamp: An operational amplifier; a voltage amplifier.

OR Logic Gate.

Resistor: A device having resistance to the passage of current. The resistance of a resistor is measured in Ohm [Ω].

Schmitt Trigger: A device that gives a constant output for an input based on its threshold value.

Voltage-triggered Switch: A switch that opens or closes based on the voltage value.

Voltage Element: Provides a constant/pulsed/sinusoidal/function voltage output [V].

XOR Logic Gate.

Behavioural Flipflop: A device that stores a single bit of data.

Sample: Behavioral a-device Sample and Hold. It has two modes of operation: the output may follow the input whenever the S/H input is true, or the output may latch to the input when the CLK (clock) input goes true.

AND Logic Gate.

LED: Light Emitting Diode.

Label: An LTSpice label is a tool to give a name to a circuit node. It also allows one to use the voltage output of the node elsewhere.


G XOR as a combination of single layer perceptrons

The XOR as a combination of single layer perceptrons is a way of modelling a neural network with a hidden layer that trains to work as an XOR. Unlike using the output from the final output layer to calculate the error that updates all the weights, in this network each perceptron has an error calculated at its own output. The respective errors are then utilized to update the weights of the corresponding perceptrons that model the OR, NAND and AND gates. This follows from the expression in Equation G.1, which is the Boolean representation of the XOR. Figure G.1 shows the structure of the implemented neural network; the colour scheme depicts which weights function to model a particular logic gate. The AND, OR and NAND logic gates can easily be modelled with a single perceptron, and their combination, which is similar to a two layer neural network, can model the XOR gate. Convergence of all three perceptrons models the XOR gate when the network is considered as a single entity. This network, however, depends on the slowest perceptron, which converges last. The purpose of this circuit was to serve as a trial application of a multiple layer hardware network.

XOR(A, B) = (A \cdot \bar{B}) + (\bar{A} \cdot B) = (A + B) \cdot (\bar{A} + \bar{B})    (G.1)
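The decomposition in Equation G.1 can be checked over the full truth table with the short sketch below; it is a logic check only and does not involve the trained perceptrons.

for a in (0, 1):
    for b in (0, 1):
        xor_direct = a ^ b
        # XOR composed from the three gates: AND of (A OR B) and NOT(A AND B)
        xor_composed = (a | b) & (1 - (a & b))
        print(a, b, xor_direct, xor_composed, xor_direct == xor_composed)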

Figure G.1: XOR as a combination of individual perceptrons. The weights in red boxes model the OR gate, the weights in yellow boxes model the NAND gate and the weights in green boxes model the AND gate. When all three perceptrons are trained, the overall architecture behaves as an XOR gate.


Figure G.2: Error waveform and weight convergence for the combination XOR. Weights 1, 2 and bh1 are for the OR gate perceptron, weights 3, 4 and bh2 are for the NAND gate perceptron, and weights 5, 6 and bo are for the AND gate perceptron.
