
UNIVERSITY OF CALGARY

Fault-tolerant Architectures for Nanowire and Quantum Array Devices

by

Tamer S. Mohamed

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

CALGARY, ALBERTA

April, 2013

© Tamer S. Mohamed 2013

Abstract

This work investigates techniques for building fault-tolerant digital circuits at the nano-scale. It provides an overview of nano-scale technology candidates for the next generation of digital circuits based on nanoelectronic logic fabrics, and focuses on the fault tolerance of such circuits at both the circuit and the architecture level. A case study based on pass-transistor logic using wrap-gate nanowire devices is presented. Such circuits implement logic computation in the form of binary decision diagrams (BDDs); however, they are not fault-immune. In this thesis, BDD-based nanowire circuits that incorporate error-correction coding are proposed. In addition, a planarization algorithm is presented and implemented in order to synthesize planar error-correcting circuits using such devices. An alternative architecture, the crossbar nano-FPGA, is considered as another candidate for fault tolerance. Simulation and modeling of all the presented architectures are performed using the developed software (the "BDD processing tool"), the CUDD package, and SPICE.


Acknowledgements

Alhamdu lillahi rabbi al-'alamin (All praise is due to God, Lord of the worlds).

I would like to thank Dr. S. Yanushkevich, my supervisor, for her help, her great patience

and support in finishing this work. I would also like to thank Dr. Graham Jullien and Dr.

Vassil Dimitrov for their help, support and very enlightening discussions. My wife and my

parents provided me with love, encouragement and faith. I hope I will be able to fulfil my

promises to them. My friends were always by my side encouraging me and I am indebted

to them in many ways. My friend, Hazem Gomaa, gave me very helpful comments and

feedback about my presentation. I would also like to thank Dr. Anton Zeilinger who, during

his visit to Calgary, patiently answered my questions about quantum entanglement. Dr.

D. Michael Miller, my external examiner, gave me encouraging remarks and inspiring ideas

about future research. I would also like to thank the University of Calgary, and the funding

agencies; NSERC, AIF and iCore for the financial support. Many thanks are also due to

the most helpful and cheerful staff working in the Electrical Engineering department at the

University. Thank you Lisa Bensmiller, Judy Trumble and Ella Lok.


Table of Contents

Abstract i

Acknowledgements ii

Table of Contents iii

List of Tables vi

List of Figures and Illustrations vii

List of Symbols, Abbreviations and Nomenclature x

1 Introduction 1
  1.1 Research Motivation 1
  1.2 Research Objectives 2
  1.3 Research Outcomes 2

2 Nanoelectronic Logic Fabric 4
  2.1 Introduction 4
  2.2 Types of materials for nano electronics 6
    2.2.1 Carbon in nano electronics 6
    2.2.2 Unimolecular compound materials 7
  2.3 Devices not modeled by conventional charge transport 9
    2.3.1 III-V Quantum devices 10
    2.3.2 Quantum cellular automata 10
    2.3.3 Quantum computation 11
  2.4 Device assembly techniques 12
  2.5 Circuit architectures and defect tolerance 13
  2.6 Conclusion 14

3 Overview of Fault Tolerance 15
  3.1 Introduction 15
  3.2 Definitions 16
  3.3 Construction of fault tolerant systems 17
  3.4 Fault tolerance via hardware redundancy 17
  3.5 Fault tolerance via information redundancy 20
  3.6 Fault tolerance via probabilistic computing 21
  3.7 Fault tolerance via algorithmic/approximate computing 22
  3.8 Fault tolerance via time redundancy 22
  3.9 Fault tolerance via energy minimization 23
  3.10 Fault tolerance via reconfiguration 23
  3.11 Fault tolerance via dynamic routing 24
  3.12 Performance measures 25
    3.12.1 Kullback-Leibler divergence 25
    3.12.2 Signal-to-noise ratio (SNR) 26
    3.12.3 Bit error rate (BER) 26
  3.13 Performance analysis techniques 26
  3.14 Conclusion 27

4 BDD-based Nanowire Error Correcting Circuits 28
  4.1 Introduction 28
  4.2 Background 29
  4.3 Gate reliability without error correction 31
  4.4 Probabilistic error model in a binary decision diagram 35
    4.4.1 Input error probability and SNR 43
  4.5 Error-correction coding 45
    4.5.1 Shortened codes 47
  4.6 BDD model with error correction 48
  4.7 Reliability of the error-correcting BDD 56
  4.8 Experiments 60
  4.9 Conclusion 66

5 Synthesis of Planar Nano-Circuits 67
  5.1 Introduction 67
  5.2 Algorithm 1: Linear-time node processing 71
  5.3 Algorithm 2: Multi-pass diagram processing 73
  5.4 Results 75
  5.5 Conclusions 76

6 Crossbar Latch-based Combinational and Sequential Logic for nano FPGA 78
  6.1 Introduction 78
  6.2 Device modeling 79
  6.3 Operation model of the crossbar latch 81
  6.4 Combinational circuit models 87
  6.5 Sequential circuits 89
  6.6 Organization of a nano FPGA using crossbar arrays 91
  6.7 Area and timing of the nano FPGA 95
  6.8 Fault and defect tolerance in nano FPGA 99
  6.9 Conclusion 101

7 Quantum Computing Alternative 103
  7.1 Introduction 103
    7.1.1 The qubit 105
    7.1.2 A system of more than one qubit 109
    7.1.3 Entanglement 110
    7.1.4 Quantum gates 112
    7.1.5 Matrix expansion and refactoring for quantum gates 114
    7.1.6 Quantum algorithms and the realization of quantum computers 116
  7.2 Simulation of quantum computers 118
  7.3 Emulating quantum computation using classical resources 119
    7.3.1 Approximate storage requirement for emulating a qubit 120
    7.3.2 Qubit representation using algebraic integers 121
    7.3.3 Emulating superposition of states 122
    7.3.4 Emulating entanglement 122
  7.4 Conclusion 125

8 Conclusions and Future Work 127

Appendix A BDD processor tool 129

Appendix B SPICE net listings for crossbar circuits 139

Appendix C Matlab code for the simulations 149

Bibliography 154


List of Tables

4.1 Input probabilities for a 2-input gate 32
4.2 Gate reliability given the input error probability and the input signal probabilities 35
4.3 Reliability of the gates implemented using BDDs, given the input probabilities are X1 and X2 42
4.4 Probability of error vs SNR, and the value of the noise power for VDD = 0.3V 44
4.5 Standard decoding array for the Hamming(5,2) shortened code 49
4.6 Standard decoding array for the Hamming(6,3) shortened code 49
4.7 Error-correcting BDDs of the elementary gates 56
4.8 Noise tolerance in an error-correcting 2x2 bit adder with uncorrelated noise added at all 4 inputs for various SNR levels 64
4.9 Performance comparison of noise-tolerant NAND gate models for different SNR levels (16nm predictive transistor simulation model) 66
5.1 Planarization results (variable ordering is performed using the SIFT algorithm unless the exact ordering, denoted (exact), is used) 76
6.1 Comparison of nanoelectronic architectures 99
7.1 Sin/Cos lookup table reduced by exploiting Sin/Cos octant symmetry 121


List of Figures and Illustrations

2.1 Carbon molecules. Top row: C60 buckyball and graphene sheet. Bottom three: armchair, zigzag, chiral single-walled carbon nanotubes (adapted from [1]) 8
2.2 Quantum cellular automata arranged as a wire 11
3.1 Dynamic fault tolerant system 17
3.2 R-Modular Redundancy configuration 18
3.3 Cascaded Triple Modular Redundancy 19
3.4 NAND multiplexing scheme for a NAND operation with N = 4 20
4.1 A BDD node is equivalent to a 2x1 multiplexer 29
4.2 Implementation and simulation models of a BDD node: two NMOS transistors, two transmission gates, and bi-directional hysteresis switches 30
4.3 BDD node circuit using a hexagonal nanowire controlled by WPG (from [155], with permission from the second author) 31
4.4 Probabilistic output error model for a NAND gate 31
4.5 Probabilistic output error model for a node in a BDD 35
4.6 Example BDD for probabilistic calculation 36
4.7 BDD of a buffer 39
4.8 BDD of a 2-input NAND gate 39
4.9 Reliability of a 2-input NAND gate implemented as a BDD 41
4.10 Probability of input error vs input signal SNR 45
4.11 Error-correcting NAND gate BDD with indicator for unmapped vector values 51
4.12 An error-correcting multi-valued decision diagram for a generic 2-input function. In binary representation, the values of the terminal nodes are 0 or 1, and the nodes are merged accordingly 52
4.13 A parity bit generator for the shortened Hamming(5,2) 53
4.14 Error-correcting 2x2 bit adder 57
4.15 Reliability of the error-correcting BDD for the buffer/inverter 60
4.16 An error-correcting BDD node used in TMR simulations 61
4.17 Reliability of the error-correcting BDD for the AND/NAND gate 61
4.18 Reliability of the error-correcting BDD for the XOR/XNOR gate 62
4.19 Reliability of the error-correcting BDD for a 2-bit adder 62
4.20 Average reliability of the error-correcting BDD for a 2-bit adder 63
4.21 SPICE simulation of the EC buffer with different random noise applied at each level 64
4.22 (a) Simulation of the 2x2 adder without error-correction at SNR = 9 dB. (b) Simulation of the adder with error-correction. BER values are averaged for all 3 output bits 65
5.1 Planarized EC-BDD NAND gate. Nodes with a single vertical branch are dummy nodes. Shaded nodes are duplicate nodes 69
5.2 Planarized BDD implementing the output s2 of the EC-BDD 2-bit adder 70
5.3 Two adjacent parent nodes with no common child nodes 72
5.4 Two adjacent parent nodes with one common child node 72
5.5 Two adjacent parent nodes with two common child nodes 73
5.6 Arbitrary-position coupled nodes with a common child node in the fourth level 74
6.1 (a) Crossbar with molecular devices. (b) Basic logic operations requiring only passive components. (c) Implementation of the basic logic operations (black arrows represent enabled diode junctions) 82
6.2 Crossbar latch hysteresis-based operation 84
6.3 Crossbar latch hysteresis characteristics 85
6.4 (a) 3-D structure of a crossbar latch. (b) The PSPICE model of the crossbar latch using hysteresis switches 86
6.5 A PSPICE model of a nano architecture model of a full adder, utilizing the crossbar latches for signal restoration and inversion 88
6.6 4-to-1 multiplexer model using the crossbar latches in decoding the selection signal 89
6.7 (a) A 4-bit shift register from D-latches. (b) Modifications to the basic shift register to make it suitable for crossbar implementation (two-phase control signals and rectifier junctions to force signal direction). (c) Crossbar implementation of the 4-bit shift register (solid black arrows represent rectifier junctions, forcing signal direction) 91
6.8 A PSPICE model of the shift register using 2 pairs of out-of-phase control signals 92
6.9 (a) Waveforms of two out-of-phase control voltage pairs for latching the input signal. (b) SPICE simulation of the operation of the crossbar-based shift register at steady state 93
6.10 (a) A generic synchronous counter architecture with an arbitrary counting sequence. (b) Crossbar implementation of the generic counter requires only one control signal pair. (c) Floorplan of a generic counter 94
6.11 A PSPICE model of a T-flipflop using a 2-to-1 MUX 94
6.12 Shared routing/device plane 95
6.13 Example for the organization of the nano FPGA 96
6.14 Dynamic fault tolerant system 101
7.1 Possible physical realizations of a qubit as a physical subsystem of a certain phenomenon. (a) The photon direction of travel is restricted to one of two values, as in the Mach-Zehnder interferometer with one photon entering the apparatus. (b) Single-photon direction of travel in the Michelson interferometer, with the directions not necessarily perpendicular but the system states nevertheless orthogonal. (c) The Stern-Gerlach apparatus with the electron spin (up or down) as the qubit 106
7.2 Bloch sphere representation of possible states of a single qubit 108
7.3 CNOT gate 114
7.4 Direct implementation of a quantum emulator using registers and matrix operations represented by gates 120
7.5 Complex number representation using algebraic integers. (a) R=4, using 2 variables. (b) R=12, using 6 variables 123
7.6 Orthogonality of dilated Haar wavelets. The translation is zero 124
7.7 Dilated Daubechies-2 wavelet 124
A.1 Software main window 130
A.2 Open file type pla or blif 131
A.3 BDD variable reordering choices 132
A.4 Tools menu 132
A.5 Planar layout generation 133
A.6 Planar layout without connections to a zero terminal 134
A.7 Planar layout export options 134
A.8 Spice netlist generator and simulator window 136
A.9 Error correction PLA generation window 137


List of Symbols, Abbreviations and Nomenclature

Symbol Definition

ADD Arithmetic decision diagram

BDD Binary decision diagram

BER Bit error rate

BIST Built in self test

CMOS Complementary metal-oxide-semiconductor

CMOL CMOS-molecular Electronics

CTMR Cascaded Triple modular redundancy

CUDD Colorado University Decision Diagram package

EC-BDD Binary decision diagram with error correction

EP Error probability

FPGA Field Programmable Gate Array

HDL Hardware description language

high-K high-permittivity (ε = κε0) materials

KLD Kullback-Leibler Distance

MC Monte Carlo simulation

MVL Multi-valued logic

NMR Nuclear magnetic resonance

PTL Pass transistor logic

QCA Quantum-dot Cellular Automata

R Reliability

RMR R-fold modular redundancy

SNR Signal to noise ratio

SPICE Simulation program with integrated circuit emphasis


TMR Triple modular redundancy

WPG Wrap gate device

Xi probability that a gate input xi takes the value 1

εi probability of error at a gate input

εg probability of erroneous output value inversion in a gate

εn probability of incorrect switching at a BDD node


Chapter 1

Introduction

1.1 Research Motivation

In this research, we investigate the architecture and fault tolerance of two technological alternatives to the prevalent Complementary Metal-Oxide-Semiconductor (CMOS) technology.

CMOS technology has been increasingly successful for several decades, and it is currently

the dominant technology for all state-of-the-art microprocessors, digital signal processors

and analog integrated circuits. One of the reasons for the success of CMOS is its scalability,

which translates to successful effort in shrinking the device area, voltage supply and power

consumption. This effort, while successful for several decades, has hit a hard wall as the

device dimensions are scaled down to atomic dimensions. For example, the gate of a CMOS

transistor, without employing high-permittivity materials, could be just 5 atoms thick. At

this scale, the devices begin to behave according to quantum mechanical principles, and may

exhibit large charge tunneling through the gate oxide. This is in contrast to the classical

assumptions that were able to explain and predict the device behaviour in a circuit.

Although the main drive in the technological advance of the capabilities of electronic

circuits and systems has been the ability to integrate more devices in the same area, it is

no longer viable to simply add more devices, because the combined power consumption of billions of devices translates into impractical power-generation requirements and impractical cooling demands.

Therefore, it is imperative to find new paradigms that can successfully replace CMOS.

In this research, we investigate the feasibility of some recently proposed candidates for

replacing CMOS. One candidate is the GaAs devices that can be built using Quantum dots

and wrap-gate nanowires. The other is the crossbar architecture using molecular rectifying


devices. These devices require at least a semi-classical approach to explain their behaviour.

This means that we also need to develop new simulation tools that can predict the behaviour

of such devices in a circuit or a system.

We investigate several issues regarding these devices, including operating temperatures,

manufacturing tolerances, fault tolerance, redundancy and interfacing. This investigation should provide a guideline for both industry and design engineers. Since the development of a

design automation tool is also crucial to the success of the new technology, we investigate how

much this transition can be facilitated using an automated CAD tool that can take a classical

design, automatically incorporate fault-tolerance, and generate a nano-circuit layout.

1.2 Research Objectives

• Study and develop semi-classical simulation models that can accurately predict

the behaviour of circuits based on nano devices.

• Study and develop fault-tolerance techniques that can be incorporated at the

nano-scale to increase yield and reliability in the presence of both circuit defects and low signal-to-noise ratio (SNR) at the input and control signals.

• Develop a library of “standard cells”, which represents a number of logic cores,

typically found in a commercial design entry and placement tool.

• Integrate the design flow with industry standard simulation and synthesis

tools. We mainly target SPICE BSIM4 simulation and binary decision diagram synthesis using a BDD tool, which is based on the Colorado University

Decision Diagram (CUDD) package.

1.3 Research Outcomes

This research has resulted in:


• Contribution to a theory for fault-tolerant BDD-based circuits using error correction.

• Development of a tool to automate the design of error correcting BDD circuits.

• Development of a tool to automate the generation of planar layouts of the

BDD based circuits.

• Development of sequential and combinational circuit models using crossbar

molecular models.

• Evaluation of emulation models of quantum computation with FPGA-based

hardware acceleration.

The research outcomes have been published in the following papers [83–85, 133, 151]:

T. Mohamed, G. A. Jullien, and W. Badawy. Crossbar latch-based combinational and

sequential logic for nano-FPGA. In IEEE Int. Symposium on Nanoscale Architectures (NANOARCH), pages 117–122, 2007.

T. Mohamed, W. Badawy, and G. Jullien. On using FPGAs to accelerate the emulation of quantum computing. In Proc. Canadian Conference on Electrical and Computer Engineering (CCECE), pages 175–179, 2009.

T. Mohamed, S. N. Yanushkevich, and S. Kasai. Fault-tolerant nanowire BDD circuits.

In Proc. Int. Workshop on Physics and Computing in nano-scale Photonics and Materials,

2012.

G. Tangim, T. Mohamed, S. N. Yanushkevich, and S. E. Lyshevski. Comparison of noise-tolerant architectures of logic gates for nanoscaled CMOS. In Proc. Int. Conference on High Performance Computing (HPC-UA), 2012.

S. N. Yanushkevich, S. Kasai, G. Tangim, A. H. Tran, T. Mohamed, and V. P. Shmerko.

Introduction to Noise-Resilient Computing. Morgan and Claypool, 2013.


Chapter 2

Nanoelectronic Logic Fabric

2.1 Introduction

In this chapter, we discuss the main principles behind the technology migration towards

nanoscale electronics. We provide a brief overview of some of the nano-scale structures

considered as the basis for new devices. This chapter also presents issues such as device

assembly, circuit architectures, defect tolerance and scalability. We focus on programmable

logic implementations using crossbar arrays and BDD-based nanowire networks.
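The BDD-based nanowire networks mentioned above compute a function by steering a signal through a chain of nodes, each of which acts as a 2-to-1 multiplexer controlled by one input variable (the equivalence used throughout Chapter 4). The following minimal sketch illustrates that evaluation model; the `Node` structure and `evaluate` helper are our own illustration, not the data format of the thesis's BDD processing tool:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    var: str                  # input variable controlling this node
    low: Union["Node", int]   # child followed when var = 0
    high: Union["Node", int]  # child followed when var = 1

def evaluate(node, inputs):
    """Walk the diagram from the root to a 0/1 terminal, one mux per node."""
    while not isinstance(node, int):
        node = node.high if inputs[node.var] else node.low
    return node

# BDD of a 2-input NAND gate: the output is 0 only when x1 = x2 = 1.
nand = Node("x1", low=1, high=Node("x2", low=1, high=0))

assert evaluate(nand, {"x1": 1, "x2": 1}) == 0
assert evaluate(nand, {"x1": 0, "x2": 1}) == 1
```

Each `Node` here plays the role of one wrap-gate nanowire branch point: the control variable selects which of the two outgoing paths the signal follows.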

In order to discuss nanoscale electronics, we need to define what is meant by nanoscale

and what is meant by having a new material. A definition of nanotechnology provided by

the U.S. National Nanotechnology Initiative (NNI) is as follows [2]:

"Nanotechnology is the understanding and control of matter

at dimensions of roughly 1 to 100 nanometers, where unique

phenomena enable novel applications. Encompassing nanoscale

science, engineering and technology, nanotechnology involves

imaging, measuring, modelling, and manipulating matter at this

length scale.”

State-of-the-art integrated circuits have reached an integration level of 10^12 transistors. A 64 GB SD card (the equivalent of 256 billion multi-value storage cells) is commonplace at the time of this writing and sells for under $20. The 32 nm transistor feature size is already being phased out in favour of more aggressively scaled-down transistors with even smaller feature sizes. Avogadro's number is 6 × 10^23, which implies that current IC manufacturing capacity is almost at the molecular level.
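The SD-card figure can be sanity-checked: 256 billion cells each storing two bits amount to exactly 64 GB. The 2-bit-per-cell assumption is ours (it is what makes the quoted numbers consistent, and matches common multi-level-cell flash):

```python
cells = 256e9          # "256 billion multi-value storage cells"
bits_per_cell = 2      # assumed MLC: 2 bits per cell
gigabytes = cells * bits_per_cell / 8 / 1e9
print(gigabytes)  # → 64.0
```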


For example, the capacitance of the CMOS gate is given by C = εA/d. A scaled-down conventional transistor requires scaling both the area A and the oxide thickness d in order to retain the same capacitance value. This would lead to a gate oxide that is just 5 atoms thick, making the conventional transistor unusable due to the dominance of quantum effects at this scale. The oxide could not have retained its desired insulating properties had it not been for the introduction of high-K (higher-permittivity) materials, which allow the gate oxide to remain thick while the gate length (and hence the area) shrinks [13]. The introduction of more effective high-K materials gave the conventional CMOS industry a life boost for at least 10 more years. Nevertheless, a paradigm shift in technology is inevitable in order to keep up with the trend set by Moore's law.
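The role of high-K dielectrics follows directly from C = εA/d: raising κ lets the physical thickness d grow while the capacitance, and hence the gate control, stays fixed. A rough numerical illustration (the κ values and dimensions are typical textbook figures, not taken from this thesis):

```python
EPS0 = 8.854e-12           # vacuum permittivity, F/m

def gate_capacitance(kappa, area, thickness):
    """Parallel-plate approximation: C = kappa * eps0 * A / d."""
    return kappa * EPS0 * area / thickness

area = (32e-9) ** 2        # a 32 nm x 32 nm gate
c_target = gate_capacitance(3.9, area, 1.2e-9)   # SiO2 (kappa ~ 3.9) at 1.2 nm

# A hafnium-based dielectric (kappa ~ 20) reaches the same capacitance
# with a much thicker, and therefore far less leaky, film:
d_highk = 20 * EPS0 * area / c_target
print(f"equivalent high-K thickness: {d_highk * 1e9:.1f} nm")  # → 6.2 nm
```

The thickness simply scales by the ratio of permittivities (20/3.9 ≈ 5.1), which is exactly the "life boost" described above.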

We need to address several issues with regard to the future development of electronics. The first issue is whether new types of materials should be used other than the mainstream semiconductor, namely silicon. Such materials include the various types of carbon molecules and other chemical compounds that have favourable electrical characteristics.

The second issue is whether different physics is required to describe device operation.

Carrier transport through a device, with the associated resistance, inductance, capacitance

and current dissipation, is the conventional physics that engineers usually use in designing

electronic circuits. As devices shrink and quantum effects become dominant, it is plausible to take advantage of those effects in building new types of devices that do not rely on charge transport.

The third issue addresses the device assembly techniques, ranging from lithography to

self-assembly and DNA scaffolding. Each of the assembly techniques has certain limitations.

Lithography is limited optically and is difficult to scale, while self-assembly can provide

circuits with limited topology.

The fourth issue is related to circuit architectures that can be built given the limitations

on the types of devices available and on the device assembly.


The fifth issue is fault tolerance. The question is whether a system with a significant

number of defects is still capable of performing its function. Defect-free systems at the

nanoscale are not possible, due to the manufacturing tolerances.

The remainder of this chapter briefly elaborates on these issues and how they are addressed in the literature.

2.2 Types of materials for nano electronics

The choice of a material for electronic devices can start from examining the properties of

the materials in the periodic table. However, a few points must be taken into consideration.

A conventional CMOS switch is a device that performs its function based on the ability to control its conductivity between two states by applying a control (gate) voltage. Silicon

has been the material of choice for such switches because it is a semiconductor that can be

easily obtained in crystalline form and its conductivity can be changed by the introduction

of controlled amounts of impurities. The conductivity of a silicon channel is controlled via

an electric field due to voltage applied on top of the channel separated by an insulator (the

Metal-Oxide-Semiconductor arrangement). In addition to silicon, germanium and galium-

arsenide (GaxAs1−x) are accepted bases for the conventional semiconductor industry. The

metal of choice for semiconductor solutions was aluminium, and then copper, with the advent

of the damascene process. Most of the recent work in the industry has been focused on the

insulator properties. The gate insulation is made from a high-K compound and the insulation

below the metallization is made from a low-K compound.
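The field-effect control described above, conductivity switched between two states by a gate voltage, is often summarized by the long-channel square-law model. The sketch below uses that textbook approximation with illustrative parameter values; it is not a model used in this thesis:

```python
def drain_current(vgs, vds, vth=0.5, k=2e-4):
    """Long-channel square-law MOSFET model (k = mu*Cox*W/L, in A/V^2)."""
    if vgs <= vth:
        return 0.0                             # cutoff: no channel is formed
    vov = vgs - vth                            # overdrive voltage
    if vds < vov:                              # triode (linear) region
        return k * (vov * vds - vds ** 2 / 2)
    return 0.5 * k * vov ** 2                  # saturation region

# Raising the gate voltage above threshold turns the silicon channel on:
print(drain_current(0.3, 1.0))  # → 0.0 (switch off)
print(drain_current(1.0, 1.0))  # saturation current (switch on)
```

This two-state (cutoff versus conducting) behaviour is precisely the switching property that nanoscale replacements for the CMOS transistor must reproduce.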

2.2.1 Carbon in nano electronics

Carbon is as abundant as silicon. However, it was not considered in electronics for a very

long time. The first reason is that, until recently, the only known crystalline form of carbon (diamond) was hard to manufacture. High pressure and heat could only yield microscopic diamond crystals. A crystalline silicon ingot, in comparison, is much easier to produce.

Crystalline carbon, unlike silicon, is also an excellent insulator. This perspective of carbon

greatly changed with the discovery of buckyballs in 1985. A buckyball is a single molecule of carbon composed of 60 atoms (C60). This started the investigation of new nano-scale

carbon molecules and resulted in the discovery of carbon nanotubes in 1991 and more recently

graphene, which is a single atomic layer of the more common graphite. Carbon nanotubes are classified into single-walled nanotubes (SWNTs) and multi-walled nanotubes (MWNTs). They are also classified according to their chirality: zigzag, chiral, and armchair. The chirality of a carbon nanotube depends on the angle at which a planar graphene sheet is rolled to

form the tube. Figure 2.1 shows the structure of the common types of carbon molecules.

Carbon nanotubes are of great interest because they are easy to produce, their properties are reproducible, and they have attractive electrical properties. One of the most interesting properties is their ability to sustain ballistic electron transport. A field-effect transistor whose channel is a carbon nanotube has gate-voltage-independent transport characteristics. The current-carrying capability of carbon nanotubes is much larger than that of metals, and they are highly resistant to electromigration. This is why nanotubes are considered as conducting channels in FET-like switches and in interconnects. The

porosity of nanotubes also makes them desirable candidates for electrodes in supercapacitors, because of the large surface area exposed to an electrolyte inside the capacitor.

The great potential of carbon nanotubes has led to the study of inorganic nanotubes.

This type of nanotube is composed of compounds possessing structures comparable to that of graphite, such as metal halides, oxides, hydroxides and dichalcogenides [109].

2.2.2 Unimolecular compound materials

Carbon nanotubes and buckyballs are examples of single-molecule structures composed of atoms of one element. Many research groups are investigating unimolecular compound


Figure 2.1: Carbon molecules. Top row: C60 buckyball and graphene sheet. Bottom three:armchair, zigzag, chiral single walled carbon nanotubes. (adapted from [1])


materials for use in electronic circuits. The goal is to synthesize a molecule that exhibits two distinct electrical states (ON/OFF) and that can be switched between these states in a circuit. It is desirable that the two states have a very large conductivity ratio so that the ON/OFF states are easy to distinguish. There are many published experimental

results on such molecules in the literature. One problem with unimolecular switches, which is seldom mentioned, is their limited switching endurance. After a few thousand state changes, the characteristics of the molecule degrade to the point that the ON/OFF states are no longer distinguishable. If we assume 1 GHz operation, the computing device will fail permanently after a few microseconds. This problem, however, does not exist

if we want to program the device only once, which can be the case for read-only memories and programmable logic arrays. A fixed switch can be modeled electrically as a diode. Digital circuits cannot be built solely from diodes, because diodes provide neither signal regeneration nor inversion. Unimolecular switches are mainly organic compounds, which raises another question about the temperature stability of such devices [79].

Unimolecular organic compounds are candidates not only for rectifier junctions: reference [79] lists resistors, rectifiers, bi-stable switches, capacitors, NDR oscillators, single-electron transistors (SETs), bipolar transistors, and interconnects. A molecular flash memory

device is mentioned in [20]. The characteristics of such unimolecular devices are interesting; however, they operate at temperatures close to absolute zero, are difficult to assemble, and lack long-term stability.

2.3 Devices not modeled by conventional charge transport

Classical devices are generally described using current flow equations. Maxwell’s equations

along with charge continuity and charge distribution equations are used to solve for conduc-

tion, induction and displacement currents. A classical device, accordingly, has an associated

I-V characteristic. Quantum effects corrupt the traditional I-V characteristics and lead to


difficulties in designing or evaluating the performance of a classical circuit. Quantum effects,

however, can be exploited to produce new types of devices.

2.3.1 III-V quantum devices

Quantum devices include quantum wire transistors (QWRTrs), resonant tunneling diodes (RTDs), single-electron transistors (SETs) and various spintronic devices.

These devices utilize quantum transport such as conductance quantization and single

electron tunneling in a double barrier structure. These types of devices are hard to integrate

in a circuit because of assembly problems and low current-driving capabilities. Kasai et al. proposed using such devices in a hexagonal layout based on BDDs [49, 60]. A practical hexagonal network of nodes representing a BDD is used to demonstrate a working 4-bit ALU in [155]. These circuits require a planar layout and are very prone to noise. In the next chapters, we discuss the construction of such layouts with error correction.

2.3.2 Quantum cellular automata

A quantum dot is a physical structure that confines a charge in all spatial dimensions such

that the confined charge cloud cannot overcome the potential barrier in any of the three directions.

The charge (electron) cloud can tunnel between two quantum dots in close proximity if the

barrier is intentionally lowered. Several cells, with each cell composed of two closely packed

quantum dots, can interact with each other by Coulomb (electrostatic) effects. Although

there is no actual charge transport, a change in the state of one cell can propagate through all

the cells in its neighbourhood. This is a form of cellular automata (CA). Such structures are

called Quantum-dot cellular automata (QCA). An example arrangement of QCA is shown

in Figure 2.2.

Quantum-dot cellular automata are distinct from quantum cellular automata, but confusingly both have the acronym "QCA". In quantum-dot CA, the only quantum effect in

play is the 3D confinement. There is no quantum computation involved as would be in a true


Figure 2.2: Quantum cellular automata arranged as a wire.

quantum system. It is also not truly a cellular automaton, because the only similarity between quantum-dot CA and von Neumann's cellular automata [17] is the propagation of the excitation/response through neighbouring cells. Quantum-dot CA are arranged to form classic structures such as majority gates and wires [71, 108, 147]. These classic structures perform conventional binary computations. A cellular automaton, however, implies that the collective evolution of the system of cells is interpreted as the computation.

Quantum-dot CA are the subject of criticism due to several problems. The first problem

is in the layout of the quantum dots. Cells that are not supposed to interact have to be

far apart. This creates huge gaps in the layout. The second problem is that a multi-phase

clock has to be used in order to induce the interaction across adjacent cells [5]. The third problem is footprint: a quantum dot grown on a semiconductor substrate looks like a pyramid, with the charge confined at its tip, and this pyramid has a footprint equivalent to several state-of-the-art conventional transistors [52]. The fourth problem is the stringent manufacturing tolerance,

less than a fraction of an angstrom, required to achieve accurate interaction [74]. The fifth

problem is the operating temperature of such circuits, which is very near absolute zero.

2.3.3 Quantum computation

Spintronics refers to electronics that exploit the quantum spin of electrons in addition to their charge. There are several devices described in the literature that exploit spin.


One of the spintronic devices is the spin-based quantum computer in solid-state structures (see Appendix 7 for more details).

A quantum computer can be built using any system that exhibits two distinct quantum

states. Other than the spin of electrons, there is also the polarization of photons and several

other phenomena that can be used to build a quantum device that does not rely on charge

transport. The two distinct states of the system can be used to represent a quantum bit

(qubit). A qubit is not restricted to one or zero as a classical bit but can exist in a state

of superposition of both states. This leads to the power of quantum computers which can

perform calculations on all the possible states of the system simultaneously by exploiting

superposition and entanglement. This approach requires complete quantum analysis of the

system as compared to the classical computation/quantum transport mechanisms in the

previous sections. Simulating quantum computation on a classical computer requires, in general, exponential time. Appendix 7 discusses quantum computation in more detail.

2.4 Device assembly techniques

Conventional device assembly is carried out by lithography techniques. Lithography is a

planar technique that is becoming limited at the nanoscale. This is mainly due to the optical

effects that come into play when the wavelength of the light used in processing becomes comparable to the feature size. In this case, either optical proximity correction (OPC) or shorter wavelengths, as in electron-beam lithography or even X-ray lithography, are required. As the cost of lithography becomes increasingly prohibitive and reaches

physical limits, alternative techniques should be considered for assembly. Device assembly

by humans is possible using a scanning tunneling microscope (STM). By varying the electric

field at the tip of the STM it is possible to manipulate individual molecules on a metallic

surface and arrange them in place. Human assembly is used to build single devices for research purposes; even if the process were automated, it would be too slow to be used



for large-scale production. Controlled crystallization/crystal growth can be used to produce certain structures such as quantum dots [12] and nanowires [58]. Crystal growth is controlled by varying the solution concentration, applied electric field, substrate geometry, temperature, etc.

A different technique for device assembly is using deoxyribonucleic acid (DNA) scaf-

folding. DNA strands bind and fold according to specific rules and thus form in space a

certain geometric structure. If nanoscale components or molecules are attached to locations

on the DNA strands then DNA can be used to organize these molecules into a nanostruc-

ture [40, 103]. The advantages of this technique include the ability to use CAD tools to

automatically generate the DNA sequence that would produce the geometrical structure.

Another advantage is the ability to build 3D structures right from the start without further

processing.

2.5 Circuit architectures and defect tolerance

Given the limitations in device and interconnect assembly, very simple circuit architectures

must be expected. A unimolecular circuit will most probably use a single type of molecule, which limits the types of devices available in the circuit. Errors in self-assembly will lead to high defect rates, since control over the assembly process is diminished. The challenge is to design circuit architectures that are functional despite being formed of at most one or two types of devices and containing defective nodes or interconnects at rates on the order of 10^-2. In conventional technology, defects affect the manufacturing yield. In nanoscale technology, a workaround must be incorporated in the circuit design to accommodate the inevitable defects. Techniques for fault tolerance are discussed in Chapter 3. The simplified device types and arrangements direct researchers towards

simple regular arrays such as the BDD-based hexagonal circuits in Chapter 4 and the nano

FPGA that we discuss in Chapter 6.


2.6 Conclusion

In this chapter, we briefly introduced the topic of nanoelectronics and compared it with

conventional CMOS technology. The reasoning behind the technology migration towards

nanoscale electronics was illustrated. A brief discussion on electronic molecular elements

(namely carbon) and experimental compounds was presented. The chapter also briefly dis-

cussed the issues of device types, device assembly, circuit architectures, fault tolerance and

scalability.


Chapter 3

Overview of Fault Tolerance

3.1 Introduction

Techniques to overcome the incorrect operation of circuits have been studied since the time of

the early computers that were built using unreliable components [10,113,148,149]. There is a

renewed interest in fault tolerance for several reasons. One reason is the shrinking of CMOS devices, with the accompanying reduction of threshold and supply voltages, which makes circuit operation increasingly susceptible to noise; probabilistic techniques then become necessary to analyze and enhance the performance of such circuits [11, 91, 92, 119].

Another reason is the investigation of new technologies other than CMOS to build digital

circuits. Such technologies aim to build circuits using molecular devices and self assembly.

The reliability of such molecular devices is projected to be low, and without a high degree of defect and fault tolerance, working circuits cannot be built from such devices [34, 66, 67, 81, 119, 131].

Fault tolerant techniques at the circuit level range from simple circuit redundancy to high-

level performance analysis and circuit control [38]. There are also techniques for masking

hardware faults at the software level [57].

Fault tolerance is important in some conventional applications that include critical, long-

life, delayed-maintenance, and high-availability applications. Typical examples for these are

in aircraft control and space applications, where maintenance is not possible, and long-life

availability is required. In new technologies, fault tolerance is important because of the

expected low reliability of nanoscale components, and because of the effects of noise on their

performance due to very low supply voltage levels.


3.2 Definitions

The following definitions are generally used in the literature and we repeat them in this

section [39, 136].

Definition 1 Fault is defined as a physical defect, imperfection or flaw that occurs in hard-

ware or software. Faults can result in errors. Faults are caused by specification mistakes,

implementation mistakes, component/manufacturing defects or external factors such as cos-

mic radiation or human error. Faults can be permanent, transient or intermittent [27].

Definition 2 Error is defined as a deviation from correctness or accuracy and is represented

as incorrect values in the system state. Errors can lead to system failures.

Definition 3 Failure is a non-performance of some action that is due or expected.

Definition 4 Defect tolerance is defined as the ability to operate correctly in the presence of permanent hardware defects introduced during the manufacturing process.

Definition 5 Dependability is the ability of a system to deliver its intended level of service

to its users. Its attributes are reliability, availability and safety.

Definition 6 Reliability R(t), is the conditional probability that a system operates without

failure in the time interval [0, t], given that it worked at time 0. Reliability can be increased

by either using reliable components or by using fault tolerance.

Definition 7 Fault tolerance is defined as the development of a system which functions

correctly in the presence of faults. It is achieved by some kind of redundancy and a sys-

tem architecture that allows error masking, fault detection, fault location and recovery or

autonomous repair.


Figure 3.1: Dynamic fault tolerant system

3.3 Construction of fault tolerant systems

There are three main system architectures for fault tolerance. These are the static (passive),

dynamic (active) and the hybrid systems [136]. Static (passive) systems do not detect or

perform any action to control the source of the error. Their operation relies on error masking

only. This technique is based on a majority voter as discussed in the next section. Dynamic

(or active) systems use fault detection followed by diagnosis and reconfiguration. Masking

is not used in dynamic redundancy. The errors are handled by actively isolating/replacing

faulty components. Figure 3.1 shows an example of a dynamic (active) fault tolerant system,

consisting of two pairs of modules. Each pair is self-checking and if an error is detected in

the primary pair (A pair), the system switches to the spare (B pair). In hybrid systems,

masking is used to prevent the propagation of errors, while error detection, diagnosis, and

reconfiguration are used to isolate/replace faulty components. All these systems rely on

redundancy, that can be achieved by duplicating resources.

3.4 Fault tolerance via hardware redundancy

Figure 3.2: R-Modular Redundancy configuration

Redundancy can serve both defect and fault tolerance. Circuit redundancy is usually constructed using an odd number of identical copies of the same circuit (R-fold modular redundancy) and a majority voter. R-fold modular redundancy (RMR) is also referred to as

N-tuple modular redundancy, NMR. In RMR, a group of R modules works correctly if at

least (R + 1)/2 modules and the majority voter work correctly. This is shown in Figure 3.2.

If R equals 3, this technique is called triple modular redundancy (TMR). The reliability

of such a TMR system, in terms of the per-module probability of failure pf, is given as a summation of all the possibilities in which the system still operates correctly: either all units are working, or one out of the 3 is faulty.

R = (1 − pf)^3 + 3 pf (1 − pf)^2    (3.1)
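Equation (3.1) generalizes to any odd R: the system survives as long as at most (R − 1)/2 modules fail. A small Python sketch of this binomial sum (illustrative only; a perfect voter and independent module failures are assumed):

```python
from math import comb

def rmr_reliability(r: int, p_fail: float) -> float:
    """Reliability of an R-fold modular redundant system with a perfect
    majority voter: the system works if at most (R - 1)//2 modules fail."""
    assert r % 2 == 1, "R must be odd for a majority vote"
    return sum(comb(r, k) * p_fail**k * (1 - p_fail)**(r - k)
               for k in range((r - 1) // 2 + 1))

# TMR (R = 3) reproduces equation (3.1): (1 - pf)^3 + 3*pf*(1 - pf)^2
pf = 0.01
print(rmr_reliability(3, pf))              # ≈ 0.999702
print((1 - pf)**3 + 3 * pf * (1 - pf)**2)  # same value
```

Increasing R improves the reliability further, provided each module's failure probability is below 0.5.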

In the case where the majority voters themselves may fail, cascaded modular redundancy is used. By combining the outputs of three TMR units with a majority gate at a second level, and so on in a hierarchy of levels, we obtain cascaded triple modular redundancy (CTMR), whose reliability increases higher in the hierarchy. An example of CTMR is shown

in Figure 3.3.

NAND multiplexing is another technique proposed by von Neumann in 1956 [145]. This

technique is similar to RMR, but instead of a majority gate, the output is carried on a bundle of wires: a bundle of N wires conveys the value of every bit to the next stage. A multiplexing unit consists of two stages. The first stage is the executive stage, which includes parallel copies of the processing unit. The second stage is the restorative stage, whose function is to reduce


Figure 3.3: Cascaded Triple Modular redundancy

the degradation caused by the executive stage. An example of a NAND function with N = 4

is shown in Figure 3.4. Each input and output is repeated 4 times. The first stage is the

executive unit, which is simply the desired function repeated 4 times. Because of errors, the

outputs of the repeated function units may not be the same. The restorative unit takes care

of that. The outputs of the executive stage are duplicated and fed to the restorative stage.

The rectangle U is supposed to perform a permutation of the signal wires such that each

signal from the first group is randomly paired with a signal from the second group in order to

form the input pair of one of the NANDs in the restorative section. There are two groups in

the restorative section to overcome the signal inversion by the first group. The final output

is considered to be 1 if more than (1 − α)N lines are stimulated, and 0 if less than αN

lines are stimulated, where α is a critical level that is predefined (0 < α < 0.5). Anything

in between these two values is undefined, and results in an error. This output result is a

function of the value representation in both input bundles and the gate error probability.

Figure 3.4: NAND multiplexing scheme for a NAND operation with N = 4

For large values of N, von Neumann's theory states that the output is stochastic with a Gaussian distribution. As the NAND gate is universal and can be used to build any logic

circuit, each gate can be replaced by the equivalent executive/restoration blocks shown in

Figure 3.4.

Although modular redundancy (including NAND multiplexing) is still being considered

as a viable method [42, 142], Nikolic et al. argued otherwise [93, 94] because of the huge redundancy requirement, on the order of 10^3 to 10^5, for defect rates on the order of 0.01, which are expected in nanoscale devices.
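The multiplexing scheme can be illustrated with a small Monte Carlo sketch (an illustrative model with assumed parameter names, not von Neumann's original analysis): each NAND inverts its output with probability eps, the permutation unit U is modeled by random pairing, and bundles are interpreted against the threshold α.

```python
import random

def noisy_nand(a: int, b: int, eps: float) -> int:
    """NAND gate that inverts its output with probability eps."""
    out = 1 - (a & b)
    return out ^ (random.random() < eps)

def nand_stage(x_bundle, y_bundle, eps):
    """One NAND layer: randomly pair lines of the two bundles (the
    permutation unit U) and feed each pair to a noisy NAND."""
    xs = random.sample(x_bundle, len(x_bundle))
    ys = random.sample(y_bundle, len(y_bundle))
    return [noisy_nand(a, b, eps) for a, b in zip(xs, ys)]

def multiplex_nand(x_bundle, y_bundle, eps):
    """Executive stage followed by a two-layer restorative stage
    (the second NAND layer cancels the extra inversion of the first)."""
    z = nand_stage(x_bundle, y_bundle, eps)
    r1 = nand_stage(z, z, eps)
    return nand_stage(r1, r1, eps)

def decide(bundle, alpha=0.1):
    """Interpret a bundle: 1 if more than (1-alpha)*N lines are stimulated,
    0 if fewer than alpha*N, otherwise undefined (an error)."""
    n, ones = len(bundle), sum(bundle)
    if ones > (1 - alpha) * n:
        return 1
    if ones < alpha * n:
        return 0
    return None

random.seed(1)
N = 100
out = multiplex_nand([1] * N, [0] * N, eps=0.005)  # NAND(1, 0) should be 1
print(decide(out))
```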

3.5 Fault tolerance via information redundancy

Error correction coding is an example of information redundancy. Redundant information is

added to enable fault detection and fault tolerance by correcting the affected bits. Information redundancy includes repetition codes, parity bits or checksums, cyclic codes, Hamming codes, etc. Error coding techniques require time, hardware and extra storage. This involves tradeoffs in the design and is highly dependent on the system abstraction level at

which coding is to be utilized. At the gate level, coding becomes very expensive but as

the circuit size increases, the cost decreases. One example in [98] describes an abstract

asynchronous cellular array in conjunction with error correction coding. Incorporating error

correction in circuit design, using binary decision diagrams, is discussed in Chapter 4.
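As a concrete illustration of error correction coding, the following sketch implements the classic Hamming(7,4) code, which corrects any single bit flip in a 7-bit codeword (a generic textbook construction, not the specific codes developed in Chapter 4):

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits with 3 parity bits.
    Layout: [p1, p2, d1, p3, d2, d3, d4] (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct a single flipped bit (if any) and return the 4 data bits.
    The syndrome gives the 1-based position of the flipped bit."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:                  # nonzero syndrome: flip the faulty bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[4] ^= 1                      # inject a single-bit fault
print(hamming74_decode(code))     # recovers [1, 0, 1, 1]
```

Three redundant bits per four data bits thus buy tolerance of any single bit flip, at the cost of the encoder/decoder hardware.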

Another form of using information is the sparseness of a signal in a certain representation

domain. This is exploited in compressed-sensing techniques, where information recovery from a small number of samples is achieved using a greedy algorithm under the assumption that the signal is highly sparse in a certain domain [6, 37]. There is no literature covering this specific application of compressed sensing; thus, it is the subject of future work.

3.6 Fault tolerance via probabilistic computing

Probabilistic design methodologies are based on Markov random fields (MRF). The

MRF technique is used to express arbitrary logic circuits using interactions between a system

of nodes which correspond to the inputs and outputs of the logic function. A subset of

graph nodes, also called a clique, represents their functional dependency. The computation

proceeds via probabilistic propagation of states through the circuit and a logic function is

correctly evaluated by maximizing the probability of correct state configurations in the logic

network [91, 92].

In the MRF-based model, each input or output is treated as a random variable (a node in the graphical representation), whose value varies in the range between 0 V (logic 0) and VDD

(logic 1). That is, instead of a correct logic signal (0 or 1), the MRF model operates with

the probability of correct logic signal. Given the observed logic signal, correct logic values

are those that maximize the joint probability distribution of all the logic variables. The

probability of state at a given node can be determined by marginalizing (summing) over the


joint probabilities for the states of neighborhood nodes [133, 153]. Probabilistic computing

trades area and power for noise tolerance. The area is consumed in the probabilistic nodes

and the feedback network that incorporates them.
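The marginalization step can be illustrated with a toy single-clique example (a simplified sketch with assumed weights, not the full MRF formulation of [91, 92]): valid NAND input/output configurations receive high clique compatibility, and the output state is chosen by marginalizing over the noisy input evidence.

```python
from itertools import product
from math import exp

def nand_compat(a, b, out, strength=5.0):
    """Clique compatibility for a NAND: valid configurations get weight
    e^strength, invalid ones weight 1 (a soft, energy-based preference)."""
    return exp(strength) if out == 1 - (a & b) else 1.0

def output_marginal(p_a, p_b):
    """Marginal P(out), obtained by summing the joint weights over the
    input states, with noisy input evidence P(a=1)=p_a and P(b=1)=p_b."""
    weights = {0: 0.0, 1: 0.0}
    for a, b, out in product((0, 1), repeat=3):
        w = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        weights[out] += w * nand_compat(a, b, out)
    z = weights[0] + weights[1]
    return {s: w / z for s, w in weights.items()}

# Noisy evidence close to a = b = 1: the most probable output is NAND(1,1) = 0
print(output_marginal(0.9, 0.85))
```

Even with imperfect (noisy) inputs, the configuration maximizing the joint probability is the logically correct one, which is the essence of the noise tolerance claimed for this approach.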

3.7 Fault tolerance via algorithmic/approximate computing

In some applications, such as signal processing, graphics and wireless communications, exact

computation is not required [141]. The data processing in these applications involves a lot

of information redundancy which is usually corrupted by significant noise. The processing

uses computations that are statistical, probabilistic or qualitative in nature. This relaxes the requirement of numerical exactness on the underlying circuit, an approach referred to as stochastic processing or algorithmic noise tolerance [116, 121]. This means that software

does not really need the hardware to be defect and fault free. The solution lies in the

algorithm being used which can be at the hardware level or the software level. This solution

is not universal because many applications require exact computations.

3.8 Fault tolerance via time redundancy

Time redundancy attempts to reduce the hardware overhead of the other techniques. The extra time is used to repeat a certain computation more than once. If there

are differences in the results, the computation can be repeated until results match. This can

mask transient faults. For hardware faults, operand coding can be used in conjunction with

time redundancy in order to mask the effect of the faulty blocks. Operand coding includes shifting, complementing and swapping.
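A small sketch of time redundancy combined with operand coding, in the style of recomputing with shifted operands (the faulty-adder model and function names here are illustrative): a permanent fault on one output line corrupts different result bits in the two runs, so comparing the runs exposes it.

```python
def faulty_add(a, b, stuck_bit=None):
    """Adder model with an optional output line stuck at 0 (a permanent fault)."""
    s = a + b
    if stuck_bit is not None:
        s &= ~(1 << stuck_bit)
    return s

def reso_add(a, b, stuck_bit=None):
    """Compute twice: once plainly, once with operands shifted left by one
    (and the result shifted back).  A stuck line hits different result
    bits in the two runs, so a mismatch reveals the fault."""
    r1 = faulty_add(a, b, stuck_bit)
    r2 = faulty_add(a << 1, b << 1, stuck_bit) >> 1
    if r1 != r2:
        raise RuntimeError("mismatch: permanent fault detected")
    return r1

print(reso_add(5, 3))                 # fault-free: returns 8
try:
    reso_add(5, 3, stuck_bit=3)       # 8 = 0b1000, so the stuck line hits it
except RuntimeError as e:
    print(e)
```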


3.9 Fault tolerance via energy minimization

Neuromorphic models have been reported in [138, 139]. These biologically inspired circuits

utilize the concept of neurons or threshold gates and arrange them in a network. In this

network of nodes, there is a node that represents each of the inputs and the outputs. The

nodes calculate an energy minimization function that converges after multiple iterations.

The weights/thresholds in the nodes are programmed such that any erroneous outcome does not represent a minimum-energy state and is thus rejected. This approach is fault tolerant

and is resistant to noise. It can be used to implement robust elementary gate functions.

The disadvantage is the complexity of such circuits and the requirement for calculating the

thresholds.

3.10 Fault Tolerance via reconfiguration

Signal routing is one of the techniques that can be used to go around defects. Defect tolerance

is different from fault tolerance. In DRAM, defect tolerance is achieved by having a backup

set of memory cells that are address mapped to the defective cells. Fault tolerance, on

the other hand, is achieved by incorporating error correction algorithms and storing excess

CRC or parity bits. In both cases, the solution to the problem is based on redundancy. If

a digital design is mapped onto a nano-FPGA whose bad cells are known, then the place-and-route tool solves the problem by using the extra available resources while avoiding the marked bad blocks. In order to detect the bad blocks, all blocks are scanned in a way similar to a DRAM self-scan, and the locations of the non-responsive blocks are marked in a database [28, 78, 135]. The drawback of this method of operation is that it relies on similarity between circuit blocks and on the availability of test vectors and wiring resources. Also, central control and global signal routing are required, which introduces complexity into the system. One solution is to have two types of circuits: a complex microscale circuit for

control and global signal propagation and another simple nanoscale circuit for performing


the computations. The two circuits have to be interfaced, which introduces another set of

complexities. In [34], seven strategies are outlined to address this problem. The strategies

include lightweight configurable cross points, a reliable support superstructure, individual

wire sparing, M-choose-N sparing on large sets of interchangeable resources, matching to use

wires with defective cross points, transformations to guarantee cross-point sparseness that

matches defect rates, and on-chip test and configuration support.

Real-time reconfiguration is the technique used in active (dynamic) systems, which are capable of bypassing faulty components as faults arise during operation. One example is

in [82] which describes an autonomous system capable of self repair. The system consists of

a coupled pair of FPGAs with built-in soft microcontrollers. Each microcontroller monitors

and assesses the health of the other FPGA and, if necessary, reconfigures it. The health

assessment is based on error detection in each logic function implemented on the FPGA.

3.11 Fault Tolerance via dynamic routing

One of the major tasks in chip design is wire routing. Wires account for most of the performance delay in state-of-the-art technology. Global signals such as clocks are usually the most difficult type of signals, requiring the synthesis of clock trees and the addition of buffers and delay-locked loops to prevent clock skew. Part of the problem can be alleviated using asynchronous logic. The problem can be eliminated entirely if it is not necessary to route

wires at all but route packets instead, using an on-chip network [29]. This means that the

iterations between placement and routing to reach timing closure become unnecessary. The

other advantage is the possibility to dynamically avoid bad or defective structures inside the

chip. This is similar to routers failing on the Internet: the connectionless service keeps working by finding an alternative path.

While packet routing requires sophisticated macro blocks on the chip, simple cellular structures are also viable. In such a scheme, each cell is capable of performing


a simple calculation, and of routing the data. Data routing is simple: a cell only needs to avoid a defective neighbour cell and adjust the address accordingly. Since the cells are arranged in a grid, the address is simply a number of shifts in the x-direction and the y-direction. As the data passes by each cell, it decrements the number of shifts required until

it reaches the destination. Assuming that the x-shifts are carried out first, if a cell wants

to avoid a defective neighbour cell, it passes the data to a different row and adjusts the

number of y-shifts accordingly. This solution is simple to implement, and it avoids global

wiring requirements. The handshake between cells can be asynchronous, in order to avoid

synchronized clock signals as well. A fault tolerant cellular structure with six rules was

proposed in [98]. The asynchronous structures are key in avoiding global signals in the nano

device such as clocks and the associated clock trees.
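The relative-address scheme described above can be sketched as follows (illustrative helper names; the sketch assumes isolated defective cells along the x-path and does not handle clustered defects):

```python
def route(start, dest, defective, grid_h):
    """Route hop by hop on a grid from start to dest.  The address is the
    remaining (dx, dy) shift count; x-shifts are carried out first.  If the
    next cell in x is defective, detour one row and adjust the y-shift
    count accordingly.  Returns the path taken."""
    x, y = start
    dx, dy = dest[0] - x, dest[1] - y
    path = [(x, y)]
    while dx != 0:                       # x-shifts first
        step = 1 if dx > 0 else -1
        if (x + step, y) in defective:   # avoid a defective neighbour:
            y2 = y + 1 if y + 1 < grid_h else y - 1
            y, dy = y2, dy + (y - y2)    # move a row over, fix y-shifts
        else:
            x, dx = x + step, dx - step
        path.append((x, y))
    while dy != 0:                       # then y-shifts
        step = 1 if dy > 0 else -1
        y, dy = y + step, dy - step
        path.append((x, y))
    return path

p = route((0, 0), (3, 0), defective={(2, 0)}, grid_h=4)
print(p)   # detours around the bad cell at (2, 0)
```

No global addresses, tables or clocks are needed: each hop only inspects its immediate neighbour and the remaining shift counts, which is what makes the scheme attractive for simple nanoscale cells.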

3.12 Performance measures

In the experimental study of fault tolerant models of logic functions, the following metrics

are useful: (a) Kullback-Leibler divergence (KLD), (b) signal-to-noise ratio (SNR), and (c)

bit error rate (BER).

3.12.1 Kullback-Leibler divergence

Given a stochastic system with a set of known states, let p(x) and q(x) be the probabilities

that a random variable X is in state x under two different operating conditions.

KLD in terms of probability distributions. The Kullback-Leibler Distance or Diver-

gence (KLD) between the two probability distribution functions p(x) and q(x) is defined as

follows [69]:

KLD = Σ_{states x} p(x) ln( p(x) / q(x) ),    (3.2)

where the sum is over all possible states of the random variable X and q(x) plays the role of

a reference measure. If the two distributions are identical then the KLD is zero; the closer they are, the smaller the value of the KLD.

KLD in terms of mutual information. The KLD can be defined in terms of mutual information. The mutual information I(X; Y) between X and Y is equal to the KLD between the joint probability function f(X, Y) and the product f(X)f(Y) of the probability distribution functions f(X) and f(Y).

KLD in experimental study. In our experimental study of probabilistic models, equa-

tion (3.2) is used, where p(x) and q(x) are the probability distributions of the noise-free

output (ideal discrete signal) and the noisy output (real discrete signal), respectively.
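For discrete distributions, equation (3.2) can be computed directly from the two estimated distributions; a small sketch, using the convention that states where p(x) = 0 contribute zero:

```python
from math import log

def kld(p, q):
    """Kullback-Leibler divergence between two discrete distributions,
    given as dicts mapping state -> probability (q is the reference)."""
    total = 0.0
    for x, px in p.items():
        if px > 0:                 # 0 * ln(0/q) = 0 by convention
            total += px * log(px / q[x])
    return total

ideal = {0: 0.5, 1: 0.5}           # noise-free output distribution
noisy = {0: 0.6, 1: 0.4}           # observed (noisy) output distribution
print(kld(ideal, noisy))           # zero only when the distributions match
```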

3.12.2 Signal-to-noise ratio (SNR)

The SNR, measured in decibels (dB), is calculated as:

SNR = 10 log10( σ_y^2 / σ_e^2 )  (dB),    (3.3)

where σ_y^2 and σ_e^2 are the variances of the desired signal y and the noise e, respectively.

3.12.3 Bit error rate (BER)

The BER is the fraction of information bits in error; it is defined as follows:

BER = (# errors) / (total # bits)    (3.4)

The number of errors due to signal delay (both rise and fall time) is also considered along

with errors due to bit flips while counting the total error in the output.
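Both metrics can be computed directly from simulation traces; a short sketch following equations (3.3) and (3.4) (assuming equal-length sequences, with the noise taken as the difference between the noisy and desired signals):

```python
from math import log10

def snr_db(desired, noisy):
    """SNR per equation (3.3): 10*log10 of the variance of the desired
    signal over the variance of the noise (noise = noisy - desired)."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    noise = [a - b for a, b in zip(noisy, desired)]
    return 10 * log10(var(desired) / var(noise))

def ber(sent_bits, received_bits):
    """BER per equation (3.4): fraction of bits in error."""
    errors = sum(a != b for a, b in zip(sent_bits, received_bits))
    return errors / len(sent_bits)

print(ber([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0, 1, 1]))  # 2/8 = 0.25
```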

3.13 Performance analysis techniques

Performance analysis can be performed analytically or by experiment (simulation) [129].

The analytical methods are usually viable for small circuits, and are used to provide an

insight of the parameters that can be tuned to enhance a system’s performance.


Experimental methods are used to analyze the performance of a circuit implicitly, by observing

the results obtained from many simulation runs. This technique in general is referred to as a

Monte Carlo simulation. The simulation relies on random number generators that affect one

or more of the parameters of the system. After conducting many sample runs, a conclusion

is drawn about the behavior of the system.

In the Monte Carlo approach, a subset of states (sample) is randomly chosen from the

set of all possible states. The points in this subset space are simulated, and the ratio of

states with correct behaviour over all the states in the sample is used as an estimate of the

reliability in the complete set. The accuracy (or error bound) of the estimate depends on

the sample size (the number of Monte Carlo iterations).
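The sampling procedure described above can be sketched as follows; the per-run success probability of 0.9 is a hypothetical stand-in for an actual circuit simulation, and the trial count and seed are arbitrary.

```python
import random

def mc_reliability(simulate_once, trials=100_000, seed=1):
    """Monte Carlo estimate of reliability: the fraction of randomly
    sampled runs that exhibit correct behaviour."""
    rng = random.Random(seed)
    correct = sum(simulate_once(rng) for _ in range(trials))
    return correct / trials

# Toy system: a unit that behaves correctly with probability 0.9 per run.
est = mc_reliability(lambda rng: rng.random() < 0.9)
print(est)  # close to 0.9; the error bound shrinks as trials grows
```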

3.14 Conclusion

In this chapter we gave an overview of the approaches to fault tolerance, and how they

affect a circuit structure. The main techniques include hardware, information and time

redundancy. These types of redundancy are classified as static, or passive: they can only mask errors in the system but cannot diagnose faulty units. Dynamic, or active, systems isolate faulty units and switch in spares via fault detection and dynamic reconfiguration. Another example is data packet routing, a candidate for replacing conventional wire routing. The main advantages of this technique are that global clock signals are not required, and that dynamic fault tolerance can be achieved, albeit at

a higher system level. Fault tolerance is important in some conventional applications that

include critical, long-life, delayed-maintenance, and high-availability applications. In new

technologies, fault tolerance is important because of the expected low reliability of nanoscale

components, as well as the effects of noise on their performance.


Chapter 4

BDD-based Nanowire Error Correcting Circuits

4.1 Introduction

Decision diagrams are an efficient way of representing switching functions. Such diagrams

can be mapped directly to the synthesized circuit by replacing each switching node with a

multiplexer circuit [16]. A node in a binary decision diagram is equivalent to a multiplexer,

as shown in Figure 4.1. The cost of implementing a multiplexer circuit is quite low in certain

technologies; in particular, it requires only a pair of pass transistors in CMOS. In

[49,60,155], a mapping of binary decision diagrams to nano-scale technology was introduced

through the hexagonal BDD quantum node devices. Correct operation of such devices at

nanoscale requires mitigation of two distinct sources of faults. The first source is noise, as

the signal levels are extremely low. The second source is incorrect switching at the nodes or

missing wiring due to defects [67].

Recently, error-correcting techniques have been revived; in particular, Astola et al. [7]

suggested incorporating block error correcting codes in decision diagrams. However, this

approach has not been implemented at circuit level. The advantage of using the block error

correcting codes is that the code rate is usually high, which translates to a small constant

overhead in designing the circuit. The second advantage is that such systems can cope well

with all of the aforementioned types of faults.

In this chapter, we present the results of incorporating error correction into a pass-transistor based BDD circuit, as well as simulations of the behaviour of these circuits under the effect of both noise and random signal propagation errors. The structure of the circuits

corresponds to the hexagonal BDD quantum nanowire devices [155], thus, the next step is

manufacturing the error-correcting BDDs on these nanowire devices.


Figure 4.1: A BDD node is equivalent to a 2×1 multiplexer

4.2 Background

A BDD is a rooted directed graph, derived from a binary decision tree, representing a logic

function via the Shannon expansion, f = x̄i·f|xi=0 ∨ xi·f|xi=1, where f|xi=0 is the function after substituting the constant zero for all occurrences of the variable xi, and f|xi=1 is the function after substituting the constant one for all occurrences of the same

variable. A BDD is ordered (OBDD) if on all paths through the graph, the variables respect

a given linear order x1 < x2 < ... < xn. Reduction rules are used to reduce the OBDD size, in terms of the number of nodes, such that it becomes canonical and more compact than

the representation by a full binary tree [16]. There are two reduction rules that are applied

recursively to a decision tree. The first rule is to merge any two nodes that are terminal

and have the same label, or are internal and have the same children. The second rule is to

remove any internal node that has the same (if, then) children, and route its incoming edges

to its child node. The result of the reduction depends on the order of the variables. In this

chapter we use the term BDD to refer to the reduced ordered BDD.
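The two reduction rules can be sketched as a small program. The following is a minimal illustration only (not the CUDD package used elsewhere in this work): `mk` applies both rules, a unique table merging isomorphic nodes and a test removing nodes whose two children coincide, during a recursive Shannon expansion.

```python
def robdd(f, n):
    """Build a reduced ordered BDD of boolean function f over n variables.
    Nodes are (var, low, high) tuples; terminals are the integers 0 and 1."""
    unique = {}                       # rule 1: merge isomorphic nodes
    def mk(var, low, high):
        if low == high:               # rule 2: drop a redundant test node
            return low
        return unique.setdefault((var, low, high), (var, low, high))
    def build(i, bits):
        if i == n:
            return int(f(*bits))
        return mk(i, build(i + 1, bits + (0,)), build(i + 1, bits + (1,)))
    return build(0, ()), unique

# x1 AND x2 reduces to 2 decision nodes; a full binary tree would need 3.
root, table = robdd(lambda a, b: a and b, 2)
print(len(table))  # 2
```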

BDDs are easily mapped into technology, since the layout of a circuit can be directly

determined by the structure of the BDD, and each node is substituted by a 2-to-1 multiplexer

circuit. In conventional CMOS technology, the implementation cost is low, if the multiplexers

are realized as pass-gates (as shown in Figure 4.2). Without level restoration, a pass-gate

CMOS design requires just a pair of pass transistors.

At the nanoscale level, BDD quantum nanowire devices have been manufactured at the


Figure 4.2: Implementation and simulation models of a BDD node: two NMOS transistors, two transmission gates, and bi-directional hysteresis switches

Research Center for Integrated Quantum Electronics at Hokkaido University [49, 155]; a

fragment of such a circuit is shown in Figure 4.3. In Figure 4.3, the control voltages and

their complements represent the binary variables, and they are used to direct the messenger

electron along a specific path by lowering the barrier for electron tunneling in one direction

only. The wrap gate device (WPG) represents the tunneling site for the electron.

Correct operation of such devices at the nanoscale requires mitigation of two distinct

sources of faults. The first source is the incorrect switching at the nodes due to defects. The

second source is noise, as the signal levels are extremely low. In the remainder of this chapter

we investigate noise tolerance at the switching nodes using error correction techniques.

Such models borrow some ideas from communication theory, in particular, the use of the

block error correcting codes. In such codes, the code rate is usually high, which translates to

a small constant overhead in the circuit design. The second advantage is that such systems

can cope well with signal propagation errors and noise.


Figure 4.3: BDD node circuit using a hexagonal nanowire controlled by WPG (from [155] with permission from the second author).

4.3 Gate reliability without error correction

Gate reliability is defined as the probability that the gate will correctly perform its operation.

In other words,

R = 1−EP

where EP is the probability of error. There are two sources of error. The first source is the

gate error (εg). The gate error effect is modeled as the gate itself, followed by a probabilistic

inverter. This model is shown in Figure 4.4.

Figure 4.4: Probabilistic output error model for a NAND gate (a noise-free gate followed by a probabilistic channel model mapping the noise-free output to a noisy output)

This means that the reliability of any gate as a function of the gate error will always be

given as Rgate = 1− εg, regardless of the gate type.


The second source of error is due to noise superimposed on the input signal resulting

in a wrong (inverted) interpretation of the input signal. The reliability of the gate in this

case will depend on the number of inputs of the gate and the gate truth table [24, 65]. The

reason for dependence on the truth table is that not all errors in the inputs result in an incorrect output. We need to consider only errors that change the output

from 0 (1) to 1 (0). To account for error at the gate inputs, we denote the probability that

an input signal is erroneously inverted as εi, and the probability for it to stay correct as

1 − εi. The effect on the output of the gate has to be derived in accordance with the truth

table of the function. This type of analysis was first studied in [96, 97], and has since been

revisited multiple times due to renewed interest in reliability calculations [44].

If we assume that the probability that the first input takes the binary value 1 is given

by X1, and the probability that the second input is 1 is given by X2, then we can define the

probabilities of each pattern in the truth table as shown in Table 4.1.

Input   Probability

00      P00 = (1−X1)(1−X2)
01      P01 = (1−X1)X2
10      P10 = X1(1−X2)
11      P11 = X1X2

Table 4.1: Input probabilities for a 2-input gate

For a buffer gate, erroneous inversion of the input with a probability ε results in an error

probability at the output to be ε. The same argument can be applied to the inverter and,

therefore, the error probability of a buffer/inverter due to input inversion is given by:

EPbuffer/inverter = ε (4.1)

In the case of a NAND gate, the output is not affected if one of the inputs stays at 0, regardless of whether the other value is correct or not. The probability of getting an incorrect

output from a NAND gate, when the inputs are 00, is (1 − X1)(1 − X2)ε1ε2. This means


that we are measuring the probability that both inputs are erroneously inverted (changed to

’1’), which will result in the output going from the correct value of ’1’ to the incorrect value

of ’0’. Since all the four events are assumed to be independent, the probability of this error

event is the product of the individual probabilities of the signal values and their erroneous

inversion. Note that the exact same argument can be applied to an AND gate because the

output values are the exact inverse of the NAND gate. Therefore, we compute the error

probability due to erroneous input inversion for the elementary gates in pairs; AND/NAND,

OR/NOR, XOR/XNOR. The total error probability of an AND/NAND is given by:

EP_AND/NAND = (1−X1)(1−X2)ε1ε2 + (1−X1)X2ε1(1−ε2)
            + X1(1−X2)(1−ε1)ε2 + X1X2(ε1 + ε2 − ε1ε2)
            = X1ε2 + X2ε1 + (1 − 2X1 − 2X2 + 2X1X2)ε1ε2   (4.2)

The last term, (ε1 + ε2 − ε1ε2), corresponds to the union probability ε(1 or 2), representing a change in the first or the second input, which leads to a change in the output

from the correct value 0 to the erroneous value 1 for this input. Assuming ε1 = ε2 = εi, and

X1 = X2 = 0.5, the gate reliability is given by:

R_NAND,inputs = 1 − EP_NAND,inputs = 1 − εi + 0.5εi²   (4.3)

Similarly, from the truth table of the OR gate, we find that if the inputs are 00 then the

probability to get an error in the output arises from erroneously inverting either of the inputs.

This can be written as (1−X1)(1−X2)(ε1 + ε2 − ε1ε2). Continuing the same argument for

the remaining 3 rows of the truth table, the total error probability of an OR/NOR gate is


given by:

EP_OR/NOR = (1−X1)(1−X2)(ε1 + ε2 − ε1ε2) + (1−X1)X2(1−ε1)ε2
          + X1(1−X2)ε1(1−ε2) + X1X2ε1ε2
          = (1−X1)ε2 + (1−X2)ε1 + (2X1X2 − 1)ε1ε2   (4.4)

For the XOR/XNOR gates, to get an incorrect value at the output from the first row in

the truth table, either value should be in error. However, a double error does not result in

an incorrect output. This is expressed as (1− ε1)ε2 + ε1(1− ε2), which can be simplified to

ε1 + ε2 − 2ε1ε2. The same can be said for the remaining three rows of the truth table and,

therefore, the error probability due to input signal erroneous inversion is given by:

EP_XOR/XNOR = (1−X1)(1−X2)(ε1 + ε2 − 2ε1ε2) + (1−X1)X2(ε1 + ε2 − 2ε1ε2)
            + X1(1−X2)(ε1 + ε2 − 2ε1ε2) + X1X2(ε1 + ε2 − 2ε1ε2)
            = ε1 + ε2 − 2ε1ε2   (4.5)

If, again, all the input probabilities are taken equal to 0.5, then the reliability of the

XOR due to error in the inputs is given by:

R_XOR,inputs = 1 − 2εi + 2εi²   (4.6)

Table 4.2 summarizes these results.

To account for both error in the inputs and the gate [86], we define the total error

probability as follows:

EP_total = EP_inputs(1 − EP_gate) + (1 − EP_inputs)EP_gate
         = EP_gate + (1 − 2EP_gate)EP_inputs   (4.7)


Gate              Reliability (R)

Buffer/Inverter   R = 1 − εi
AND/NAND          R = 1 − [X1ε2 + X2ε1 + (1 − 2X1 − 2X2 + 2X1X2)ε1ε2]
OR/NOR            R = 1 − [(1−X1)ε2 + (1−X2)ε1 + (2X1X2 − 1)ε1ε2]
XOR/XNOR          R = 1 − [ε1 + ε2 − 2ε1ε2]

Table 4.2: Gate reliability given the input error probability and the input signal probabilities
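The closed-form entries above can be cross-checked by the kind of Monte Carlo run used throughout this work. The sketch below checks the NAND row (equation (4.3)) for X1 = X2 = 0.5; the error probability, trial count, and seed are hypothetical parameter choices.

```python
import random

def nand_input_reliability_mc(eps, trials=200_000, seed=7):
    """Monte Carlo check of NAND reliability under input errors only: each
    input is 1 with probability 0.5 and is independently inverted with
    probability eps before the gate evaluates it."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        a, b = rng.random() < 0.5, rng.random() < 0.5
        a_n = a ^ (rng.random() < eps)      # noisy copies of the inputs
        b_n = b ^ (rng.random() < eps)
        ok += (not (a and b)) == (not (a_n and b_n))
    return ok / trials

eps = 0.1
theory = 1 - eps + 0.5 * eps**2             # equation (4.3)
print(abs(nand_input_reliability_mc(eps) - theory) < 0.01)  # True
```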

4.4 Probabilistic error model in a binary decision diagram

In the previous section, we stated that the first source of error is the gate error (εg). The

gate error effect is modeled as the gate itself followed by a probabilistic inverter. This means

that the reliability of any gate (as a function of the gate error) is expressed as Rgate = 1−εg,

regardless of the gate type. The same argument is applied to a BDD representation of a

gate.

Figure 4.5: Probabilistic output error model for a node in a BDD.

The output probability of a switching function is defined as the probability that the function assumes the value 1, given the probabilities with which each of the input variables is assigned the value 1. This definition and the subsequent analysis are independent of the circuit realization of the

function [97].


The BDD realization of the function is advantageous in evaluating the probability of the

function, because it can be computed directly by following every path from the root node to

the terminal node 1 [152].

As an example, consider the diagram in Figure 4.6.

Figure 4.6: Example BDD for probabilistic calculation

Let the input probabilities be X1, X2 and X3. In a bottom-up approach, we assume that

the probability of the constant terminal node 1 is 1.0. For the terminal node 0, the probability

is 0.0. Thus, the probability at the lower decision node is X3 × 1 + (1 −X3)× 0 = X3. At

the middle decision node, the probability is X2X3 + (1 − X2). And, at the root node the

probability is:

P (f) = X1X3 + (1−X1)[X2X3 + (1−X2)]

Given the input probabilities X1 = X2 = X3 = 0.5, the probability of the output being

1 is P (f) = 0.625.
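The bottom-up pass can be sketched in a few lines, assuming the node/children structure implied by the expression for P(f) above:

```python
def node_prob(x, p_low, p_high):
    """Probability that a decision node outputs 1: the child probabilities
    weighted by the probability x of the control signal being 1."""
    return x * p_high + (1 - x) * p_low

X1 = X2 = X3 = 0.5
p_x3 = node_prob(X3, 0.0, 1.0)    # lower node: children are terminals 0 and 1
p_x2 = node_prob(X2, 1.0, p_x3)   # middle node: terminal 1 and the x3 node
p_f  = node_prob(X1, p_x2, p_x3)  # root node
print(p_f)  # 0.625
```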

Errors in the output value may occur for various reasons, such as:

1. Stuck-at-fault errors.

2. Probabilistic inversion of the inputs due to faults or noise.

3. Probabilistic inversion of the output of some of the gates due to faults or noise.


The ability to detect stuck-at-fault errors can be quantified and enhanced as a direct

consequence of the output probability calculation in terms of input probabilities. We can

manipulate the input probabilities in a way to increase the possibility of detection of stuck-

at-fault errors [97]. The study of the effect of the input signal inversion is independent of

the circuit realization of the function, and only depends on its truth table as some error

combinations may not affect the correct output of the function. It may, however, depend

on the circuit structure in case of long fan-in wiring that may experience different values of

noise along different segments of the same wire.

When we consider errors in the output due to probabilistic inversion of the gate output,

analysis of the circuit structure and gate realization becomes necessary. If the structural

unit in our circuit is a BDD node, then the binary symmetric channel noise model can be

considered at each node as shown in Figure 4.5.

Let the output of a single BDD node be f , and after the probabilistic inverter representing

the error, the output is f ′ where:

f′ = f       with probability 1 − εf,
     1 − f   with probability εf.

The error probability (EP) at the output of a multiplexer (BDD node) is represented by the following equations, which take into account the propagation of errors through the diagram:

EP_f = (1 − Xi)·EP_f′0 + Xi·EP_f′1,

EP_f′ = (1 − εf)·EP_f + εf·(1 − EP_f)   (4.8)

where EP_f′0 and EP_f′1 represent the error probabilities arriving at the multiplexer node from its faulty child nodes. The outputs of these faulty nodes are assumed to be f0 and f1, and after their respective probabilistic inverters, the outputs become f′0 and f′1. The probability of error at the BDD terminal nodes is always assumed to be 0. Here Xi stands for the probability


that the input control signal xi = 1. It is only useful to consider error probabilities in the range [0, 0.5]; hence, if the value of EP_f′1 is greater than 0.5, we subtract it from 1 in accordance with the binary symmetric channel model.

The symbolic computation of the error in Figure 4.6 proceeds from bottom to top as

follows. At the bottom node:

EP(f_X3) = X3 × 0 + (1 − X3) × 0 = 0

EP(f′_X3) = (1 − ε3) × 0 + ε3 × (1 − 0) = ε3   (4.9)

At the middle node:

EP(f_X2) = X2 × ε3 + (1 − X2) × 0 = X2ε3

EP(f′_X2) = (1 − ε2)X2ε3 + ε2(1 − X2ε3) = X2ε3 − 2X2ε2ε3 + ε2   (4.10)

At the top node:

EP(f_X1) = X1ε3 + (1 − X1)(X2ε3 − 2X2ε2ε3 + ε2)
         = X1ε3 + X2ε3 − 2X2ε2ε3 + ε2 − X1X2ε3 + 2X1X2ε2ε3 − X1ε2   (4.11)

and EP(f′_X1) = (1 − ε1)·EP(f_X1) + ε1·(1 − EP(f_X1)).

If we assume ε1 = ε2 = ε3 = ε then the expression can be simplified to:

EP(f′_X1) = (2 + X2 − X1X2)ε − 2(1 + 2X2 − 2X1X2)ε² + 4X2(1 − X1)ε³   (4.12)

This analysis can be performed in a bottom-up fashion, like the function probability analysis. It requires only one traversal, and can be carried out during the construction of the BDD itself. The results remain valid in the case of shared BDDs.
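A minimal sketch of this bottom-up propagation on the BDD of Figure 4.6, assuming an equal error probability ε at every decision node (the numeric values of ε and the input probabilities are hypothetical):

```python
def noisy(ep, eps):
    """Apply the node's probabilistic inverter: EP_f' = (1-eps)EP_f + eps(1-EP_f)."""
    return (1 - eps) * ep + eps * (1 - ep)

def propagate(x, ep_low, ep_high, eps):
    """EP_f = (1-X)*EP_f'0 + X*EP_f'1, followed by the node's own error."""
    return noisy((1 - x) * ep_low + x * ep_high, eps)

X1 = X2 = X3 = 0.5
eps = 0.05
ep_x3 = propagate(X3, 0.0, 0.0, eps)    # children are error-free terminals
ep_x2 = propagate(X2, 0.0, ep_x3, eps)  # low child: terminal 1, high child: x3 node
ep_x1 = propagate(X1, ep_x2, ep_x3, eps)
print(round(ep_x1, 6))  # 0.105125
```

At X1 = X2 = X3 = 0.5 and ε = 0.05, this matches the symbolic result of the derivation above.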

The BDD of a buffer, shown in Figure 4.7, is a single node connected directly to the terminal nodes. The calculation starts by taking the probability of error at the terminal node to be exactly EP_S = X1 × 0 + (1 − X1) × 0 = 0. Therefore, the error probability at the output of the buffer node due to the node error is EP_S′ = (1 − ε) × 0 + ε × (1 − 0) = ε, and

R_BDD,buffer = 1 − EP = 1 − ε   (4.13)

Thus, its reliability is the same as that obtained for the buffer/inverter gate.

Figure 4.7: BDD of a buffer

Figure 4.8: BDD of a 2-input NAND gate.

For a single NAND gate representation shown in Figure 4.8, the probability of error of

the lower node is the same as the one we found from the analysis of the buffer/inverter:

ε. The reliability at the top (output) node is calculated in two steps. We assume that the

input signal probability at the top node is X1, and it is X2 at the bottom node. The error


probability at the bottom node is equal to ε and is independent of the signal probability X2.

At the top node,

EP_Sx1 = X1 × ε + (1 − X1) × 0 = X1ε,

where Sx1 represents the output of the switching node controlled by the variable x1.

Then,

EP_S′x1 = (1 − ε) × X1ε + ε × (1 − X1ε) = (1 + X1)ε − 2X1ε²,

where S′x1 represents the output of the node after the probabilistic inverter that models its

error probability. Thus, the reliability of a NAND gate implemented as a BDD is expressed

by:

R_BDD,AND/NAND = 1 − EP = 1 − (1 + X1)ε + 2X1ε²   (4.14)

The results are the same for both the AND and the NAND functions because their BDDs differ only in the values of the terminal nodes. The switching patterns (dependent on the input signals) that define a path from the terminal nodes to the root node are the same for both BDDs.

Figure 4.9 illustrates the different reliability calculations for a NAND gate by means of

a Monte Carlo simulation that takes into account either gate error or input error, given the

input probabilities X1 = X2 = 0.5. This yields R = 1 − 1.5ε + ε², which is used to

plot the theoretical value in the figure. The results are the same for the AND/NAND, and

OR/NOR gates. This should be expected, as the BDDs for all 4 gates have similar structure

in terms of the number of nodes and their connectivity.
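The node-error curve in Figure 4.9 can be reproduced with a short Monte Carlo run over the two-node NAND BDD of Figure 4.8. In the sketch below (hypothetical trial count and seed), each node's output is independently inverted with probability ε and the estimate is compared against R = 1 − 1.5ε + ε².

```python
import random

def bdd_nand_node_error_mc(eps, trials=200_000, seed=3):
    """Monte Carlo reliability of the 2-node NAND BDD when each decision
    node's output is independently inverted with probability eps."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        x1, x2 = rng.random() < 0.5, rng.random() < 0.5
        # bottom node (controlled by x2): terminal 1 for x2=0, terminal 0 for x2=1
        low = (0 if x2 else 1) ^ (rng.random() < eps)
        # top node (controlled by x1): terminal 1 for x1=0, else the bottom node
        out = (low if x1 else 1) ^ (rng.random() < eps)
        ok += out == (not (x1 and x2))
    return ok / trials

eps = 0.1
print(abs(bdd_nand_node_error_mc(eps) - (1 - 1.5*eps + eps**2)) < 0.01)  # True
```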

For the OR/NOR, at the top node of the diagram, the error probability is

EP_Sx1 = (1 − X1) × ε + X1 × 0.

Then,

EP_S′x1 = (1 − ε) × (1 − X1)ε + ε × [1 − (1 − X1)ε], and

R_BDD,OR/NOR = 1 − EP = 1 − (2 − X1)ε + 2(1 − X1)ε²   (4.15)


For the special case of X1 = X2 = 0.5, the result is the same as before for the NAND gate, with R = 1 − 1.5ε + ε².

The XOR/XNOR diagrams, on the other hand, are composed of 3 nodes and give a different result. At the top node,

EP_Sx1 = X1 × ε + (1 − X1) × ε = ε.

Then,

EP_S′x1 = (1 − ε) × ε + ε × (1 − ε) = 2ε − 2ε².

This result is completely independent of the control signals. Thus, the reliability is expressed as:

R_BDD,XOR = 1 − EP = 1 − 2ε + 2ε²   (4.16)

Table 4.3 provides a summary of the reliability functions of the various elementary gates

implemented using BDDs, given the error probability ε in the range between 0 and 0.5.

Figure 4.9: Reliability of a 2-input NAND gate implemented as a BDD


Gate              BDD Reliability (R = 1 − EP)

Buffer/Inverter   R = 1 − ε
AND/NAND          R = 1 − (1 + X1)ε + 2X1ε²
OR/NOR            R = 1 − (2 − X1)ε + 2(1 − X1)ε²
XOR/XNOR          R = 1 − 2ε + 2ε²

Table 4.3: Reliability of the gates implemented using BDDs, given the input probabilities are X1 and X2.


4.4.1 Input Error Probability and SNR

For simulation purposes, we add noise to the input signals (according to a certain SNR

value or noise power), simulate the circuit for a long simulation run, observe the outputs

and count the errors to obtain a reliability measure. The relationship between the input

error probability and the SNR of a Binary Non-Return to Zero (On-Off keyed) input signal

is given by [106]:

Pe = Q(d / 2σ)   (4.17)

where Q(x) = (1/2) erfc(x/√2), σ is the noise standard deviation, and d is the separation between the '1' value and the '0' value.

The noise power is defined in terms of the noise power spectral density by the following

equation:

σ² = No / 2   (4.18)

The average energy per bit is given as:

Eb = Ta/(Ta + Tb) ∫₀^Ta Va² dt + Tb/(Ta + Tb) ∫₀^Tb Vb² dt = Va² T / 2   (4.19)

where Va is the amplitude of the ’1’ value, Vb is the amplitude of the ’0’ value. Here, we

assume Vb = 0 and T is the bit period such that Ta = Tb = T . Below, we will use a

normalized value for the bit period. Since Va is the separation between the ’1’ and ’0’ values,

then, in equation (4.17), the value of d is Va. Using equation (4.19), d = √(2Eb). Substituting

it in equation (4.17) we obtain:

Pe = Q( √(2Eb) / (2√(No/2)) ) = Q( √(Eb/No) )   (4.20)

The value Eb/No is equal to the SNRbit, which is the same as the SNR, if there is no

modulation or no combination of multiple bits per transmission symbol. To find the noise


SNR (dB)   Pe       Noise power (σ²)

-1.5       0.2      0.0318
 0         0.1587   0.0225
 2         0.1040   0.0142
 3         0.0789   0.0113
 4         0.0565   0.0090
 5         0.0377   0.0071
 7         0.0126   0.0045
 9         0.0024   0.0028
 10        0.0008   0.0022
 12        0.0000   0.0014

Table 4.4: Probability of error vs SNR, and the value of the noise power for VDD = 0.3V

power in terms of SNR, we use equation (4.17).

√(2Eb) / (2σ) = √SNR

σ² = (E/2)(1/SNR)   (4.21)

Figure 4.10 shows the probability of error as a function of the SNR. It follows from the graph that a probability of error of 0.1 corresponds to an SNR of just 2 dB. Around an SNR of 12 dB, the probability of error is almost non-existent. The Monte Carlo (MC) simulation is

carried out by means of a normally distributed random variable with zero mean and standard

deviation given by equation (4.21). The value of the random variable is added to the value of

the signal and then a hard decision decoding (using a threshold at the middle of the voltage

range) is used to evaluate whether the noisy bit voltage corresponds to ’0’ or ’1’.

The process is repeated a large number of times (the number is increased until the results

from multiple simulations converge). The number of correct (incorrect) bits is averaged

to obtain a measure of error probability (EP). We use this type of simulation in all the

reliability estimates. Table 4.4 shows the list of SNR values and the corresponding noise

power, used in our calculations. The noise power is calculated based on the assumption that

V (1) = VDD = 0.3V , and the bit time is normalized.
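The Pe and noise-power columns of Table 4.4 follow directly from equations (4.20) and (4.21). A minimal sketch, assuming VDD = 0.3 V and a normalized bit period:

```python
import math

def pe(snr_db):
    """Probability of error Pe = Q(sqrt(SNR)) for the On-Off keyed signal,
    with Q(x) = 0.5*erfc(x/sqrt(2)) (equations 4.17 and 4.20)."""
    snr = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(snr / 2))

def noise_power(snr_db, vdd=0.3):
    """Noise variance from equation (4.21), with E = Vdd^2 * T / 2 and T = 1."""
    e = vdd ** 2 / 2
    return (e / 2) / (10 ** (snr_db / 10))

# Reproduces the 2 dB row of Table 4.4: Pe close to 0.104, noise power close to 0.0142.
print(pe(2), noise_power(2))
```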


Figure 4.10: Probability of input error vs input signal SNR

4.5 Error-correction coding

In this section, we investigate enhancing the circuit reliability via error correction. A block

code, denoted as (n, k) with n > k, consists of code words of length n digits that map to

a smaller set of words of k digits. The code rate is defined as k/n, and is always less than 1. A block code is linear if the modulo-2 sum of any two codewords is also a codeword.

The Hamming distance between two codewords is the number of digits in which they differ.

This can be used in either error detection, or error detection and correction, if the minimum

Hamming distance gives enough separation to determine which codeword is the most likely

one. If the minimum Hamming distance is 3, then it is possible to correct one error and detect

2 errors. Codewords are constructed using a generator matrix, and the original codeword is

restored after errors using a parity check matrix (see [87, 106]). The Hamming code is one

of the block codes that have a minimum Hamming distance of 3. For Hamming codes, the

number of parity bits is given as m, and the code word length and message length are given by n = 2^m − 1 and k = 2^m − m − 1. This introduces a family of block codes: Hamming(7,4), Hamming(15,11), Hamming(31,26), etc. The parity check matrix for the Hamming code can be constructed easily by having each column in the matrix represent a number from 1 to n in m-bit binary representation. For example, given m = 3 (Hamming(7,4)), we write the numbers from 1 to 7 in 3-bit binary to generate H.

H =

1 0 1 0 1 0 1

0 1 1 0 0 1 1

0 0 0 1 1 1 1

The systematic form of the generator and the parity check matrices can be written as

G = [P | Ik] and H = [Im | P^T] (over GF(2), where −P^T = P^T), with Ik a k × k identity matrix. Therefore, we rearrange the matrix H to bring the columns with one bit set to the left. These columns correspond to the values 1, 2 and 4. Thus, H has the identity matrix in the m leftmost columns, as shown below:

H =

1 0 0 1 1 0 1

0 1 0 1 0 1 1

0 0 1 0 1 1 1

The corresponding parity matrix is given by:

P =

0 1 1

1 0 1

1 1 0

1 1 1

From this parity matrix, the generator matrix is constructed as G = [P | I4]. It is possible to form different generator matrices by means of arbitrary addition of

different rows, and arbitrary ordering of the rows. The properties of the code will stay the

same.


To generate a code word, we multiply a data word "a" of k bits by the generator matrix

as follows:

c = aG = [aP |a]

Therefore, a code word consists of parity bits (aP ), and the message bits a. This overhead

in the number of bits is linear, and the code rate r = k/n approaches 1 as both k and n

increase.
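Encoding with the systematic generator matrix amounts to a GF(2) matrix-vector product. The sketch below uses the Hamming(7,4) parity matrix P derived above; it is a minimal illustration, not a full encoder library.

```python
# Systematic Hamming(7,4) encoding, c = aG = [aP | a], over GF(2).
P = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0],
     [1, 1, 1]]

def encode(a):
    """Map a 4-bit message to a 7-bit codeword: 3 parity bits, then the message."""
    parity = [sum(a[i] * P[i][j] for i in range(4)) % 2 for j in range(3)]
    return parity + list(a)

print(encode([1, 0, 1, 1]))  # [0, 1, 0, 1, 0, 1, 1]
```

Linearity is easy to confirm: the bitwise XOR of any two codewords is again a codeword.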

4.5.1 Shortened codes

The Hamming codes are defined in terms of the parity bits, such that the code is given as (2^m − 1, 2^m − m − 1). For arbitrary message lengths, we can use the next higher value of 2^m − m − 1 as the message length, and encode all the extra bits as zeros. The position of the extra zeros is arbitrary, and we can thus truncate any rows of choice in the parity matrix.

For example, if we want to define a Hamming code for a message of length k = 2, we start by

choosing the nearest Hamming code: Hamming(7,4). Next, we shorten the code by encoding

each message in the form [0 0 a1 a0], [a1 a0 0 0], [0 a1 a0 0], or [a1 0 a0 0]. This will effectively

remove two rows from the parity matrix making it P2×3. Two rows and two columns will

be removed from the generator matrix, where the identity matrix will lose two columns, so

that G = [P2×3|I2]. This way we obtain the code words of the shortened Hamming code

Hamming(5,2). This code is still capable of correcting only 1 error. The number of vectors

that the code can correct is 2^k(1 + n). However, the number of vectors in the code space is 2^n. This means that a shortened Hamming code is not a perfect code, as it does not map 2^n − 2^k(1 + n) input vectors. In the case of the non-perfect Hamming(5,2), the number of non-mapped vectors is 32 − 4 × 6 = 8. These non-mapped vectors can be dealt with by

assigning them to an error indicator or by assigning them to one of the nearest code words

according to the following definitions [87].


Definition 8 A maximum-likelihood error-correcting decoder is a decoder that, given

the received word r, selects the code word c which minimizes the Hamming distance dH(r, c).

Definition 9 A bounded-distance error-correcting decoder is a decoder that can se-

lect the correct codeword if the number of errors satisfies dH(r, c) ≤ t. Otherwise, it signals a decoder failure.

The maximum likelihood decoder uses a standard array to achieve complete decoding.

The standard array for the shortened code H(5, 2) with parity matrix [1 1 0; 1 0 1] is shown in Table 4.5. The standard array is constructed by writing the code words (the 2^k n-bit words) in the first row. In rows 2 to 6 of the first column, we place the least-weight error vectors (weight = 1). In rows 7 and 8 we put the next higher weight (weight = 2) error vectors. We choose these particular vectors because their values do not appear anywhere in rows 1 to 6. A bounded-distance error-correcting decoder only uses rows 1 to 6, while a maximum-likelihood decoder uses the whole standard array, which covers all the 2^n possibilities. It can decode any single-bit error, or 2 patterns of double-bit errors. The standard array for the shortened code H(6, 3) with parity matrix [1 1 0; 1 0 1; 0 1 1] is shown in Table 4.6. In rows 2 to 7 of the first column, we place the least-weight error vectors (weight = 1). The remaining possibilities are 2^6 − 2^3(1 + 6) = 8, which is enough for only one extra row. The pattern in the last row, with double errors, is again chosen based on uniqueness.
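A bounded-distance decoder for the shortened Hamming(5,2) code can be sketched by searching the distance-1 neighbourhood of the received word, which is equivalent to using only the weight-1 coset rows of the standard array. The codewords below follow from G = [P2×3 | I2]; this is an illustrative sketch, not the circuit-level implementation.

```python
from itertools import product

# Parity matrix of the shortened Hamming(5,2) code.
P = [[1, 1, 0], [1, 0, 1]]

def encode(a):
    """Codeword [aP | a] for a 2-bit message a, over GF(2)."""
    return [sum(a[i] * P[i][j] for i in range(2)) % 2 for j in range(3)] + list(a)

CODEWORDS = {tuple(encode(a)) for a in product([0, 1], repeat=2)}

def decode(r):
    """Return the codeword within Hamming distance 1 of r, or None on failure."""
    for c in CODEWORDS:
        if sum(x != y for x, y in zip(r, c)) <= 1:
            return list(c)
    return None   # more than one error: decoder failure

c = encode([1, 0])
c[4] ^= 1                             # inject a single-bit error
print(decode(c) == encode([1, 0]))    # True
```

Since the minimum distance is 3, the distance-1 neighbourhoods of the codewords do not overlap, so the returned codeword is unique whenever one is found.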

4.6 BDD model with error correction

Given a function f of k variables, its BDD includes k levels. If a switching error happens

at a BDD node, it cannot be corrected. The corresponding error-correcting BDD can be

designed as follows: a binary code (n, k) is constructed, and the function f is mapped into a

function f ′, which is implemented using another BDD with n levels [7]. Note that we are not

interested in mapping a code word c into an original message word a, but rather in mapping


        00     01     10     11
row 1   00000  10101  11010  01111
row 2   00001  10100  11011  01110
row 3   00010  10110  11000  01101
row 4   00100  10001  11110  01011
row 5   01000  11101  10010  00111
row 6   10000  00101  01010  11111
row 7   00110  10011  11100  01001
row 8   01100  11001  10110  00011

Table 4.5: Standard decoding array for the Hamming(5,2) shortened code

        000     001     010     011     100     101     110     111
row 1   000000  011001  101010  110011  110100  101101  011110  000111
row 2   000001  011000  101011  110010  110101  101100  011111  000110
row 3   000010  011011  101000  110001  110110  101111  011100  000101
row 4   000100  011101  101110  110111  110000  101001  011010  000011
row 5   001000  010001  100010  111011  111100  100101  010110  001111
row 6   010000  001001  111010  100011  100100  111101  001110  010111
row 7   100000  111001  001010  010011  010100  001101  111110  100111
row 8   001100  010101  100110  111111  111000  100001  010010  001011

Table 4.6: Standard decoding array for the Hamming(6,3) shortened code

49

it to the target value of the binary function (c → f(a)). This means that no decoding is

required, and the code words are mapped to binary 0 or 1.
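To make this direct mapping concrete, here is a small sketch (illustrative code, not the thesis tool) that takes a 2-input NAND as an example f, uses the shortened (5,2) codewords of Table 4.5 (reading each codeword as the three parity bits followed by the two message bits, which is how the table columns line up), and maps every codeword and each of its single-bit corruptions straight to f(a):

```python
# Map codewords (and their 1-bit corruptions) directly to f(a) for a
# 2-input NAND, with no intermediate decoding of the message a.
def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def nand(a):
    return 0 if a == (1, 1) else 1

# Encoder taken from row 1 of Table 4.5: message -> 5-bit codeword
# (three parity bits first, then the two message bits).
encode = {
    (0, 0): (0, 0, 0, 0, 0),
    (0, 1): (1, 0, 1, 0, 1),
    (1, 0): (1, 1, 0, 1, 0),
    (1, 1): (0, 1, 1, 1, 1),
}

# Bounded-distance map: each codeword and every single-bit corruption
# of it goes directly to the target output value f(a).
target = {}
for a, c in encode.items():
    target[c] = nand(a)
    for j in range(5):
        e = tuple(1 if i == j else 0 for i in range(5))
        target[xor(c, e)] = nand(a)

assert len(target) == 24          # 4 codewords x 6 vectors, all distinct
assert target[(1, 1, 0, 1, 0)] == 1   # clean codeword for a = (1, 0)
assert target[(1, 1, 1, 1, 0)] == 1   # same word with one bit flipped
```

The remaining 32 − 24 = 8 vectors are the unmapped patterns discussed below in the context of shortened codes.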

Applying the theory of block codes in this context is straightforward. Cyclic codes generally have the advantage over other block codes of easier encoding and decoding, but that is not our target. Convolutional codes are suitable for continuous data streams, and they target retrieving the message. This is against the requirement that groups of binary messages representing the function inputs are applied one at a time. There is also no direct relation between BDDs and trellis decoders. Dealing with input streams would require a new type of decoder that generates a corresponding correct output stream. It would not be a BDD, but a trellis decoder.

Hamming codes are a subset of block codes, and they have the desired property of correcting a single error using the minimum number of extra parity bits. They are suitable for small binary messages with minimum coding overhead. In this study, Hamming codes are used. This means that the new BDD will be able to withstand a single decision error due to signal noise. In an ordered BDD composed of branches, only one branch is used at a time. A variable appears on a branch only once. A branch will have just one node controlled by this input variable and, therefore, an error in any other branch is irrelevant. The fault model assumes that the errors in the inputs are independent. This is not realistic if the error is caused by noise, which may affect the closely packed levels of nodes. It is realistic if the source of error is the switching node itself, when an electron tunnels across the wrong barrier in a wrap-gate device due to manufacturing tolerance and temperature sensitivity. The probability that this happens in more than one gate in a branch simultaneously is assumed to be low.

A shortened Hamming code will be used in the case of an arbitrary number of inputs that cannot be written as 2^m − m − 1. With shortened codes, a number of input vectors will have an undefined target, as we mentioned earlier. We choose to map these vectors to the binary value '0', and optionally add an extra error indicator as shown in Figure 4.11.


Figure 4.11: Error-correcting NAND gate BDD with indicator for unmapped vector values.

Consider the implementation of a buffer gate using a one-level decision diagram. This function of one variable corresponds to the (3, 1) error-correcting code, in which 0 is encoded by 000 and 1 is encoded by 111. The decoder of such a code represents a majority-vote function: it decodes the received codewords 000, 001, 010, and 100 as 0.
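This majority-vote behaviour can be sketched in a couple of lines (illustrative, not the thesis implementation):

```python
# Majority-vote decoding of the (3, 1) repetition code used by the
# error-correcting buffer: the output is whichever bit value occurs
# at least twice among the three received bits.
def buffer_out(received):
    return 1 if sum(received) >= 2 else 0

assert buffer_out((0, 0, 0)) == 0
assert buffer_out((0, 1, 0)) == 0    # single error corrected
assert buffer_out((1, 1, 0)) == 1
assert buffer_out((1, 1, 1)) == 1
```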

A generic 2-input binary function augmented with parity bits using the code (5,2) is shown in Figure 4.12 as a diagram with multiple values in the terminal nodes. The value 'E'

represents the error value for the non-mapped vectors in the shortened code. This decision

diagram can be the basis for generating the error-correcting binary decision diagram of any

elementary gate by replacing the terminal nodes with the values ’0’ and ’1’ and merging the

diagram nodes accordingly. Generation of the parity bits is done by multiplying the message

bits by the parity matrix (aP ), using modulo-2 addition. Since this operation is exclusively

51

binary, we can use a BDD to generate the parity bits from the message bits instead of

explicitly carrying out the matrix multiplication. For the shortened code (5,2), the parity generation diagram is shown in Figure 4.13 for the parity matrix

P = [1 1 0
     1 0 1].
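For reference, the parity computation p = aP (mod 2) that the diagram replaces can be sketched as follows (illustrative code; the thesis generates these bits with a BDD instead):

```python
# Parity-bit generation for the shortened code (5, 2): p = a . P (mod 2),
# with P taken from the text.
P = [[1, 1, 0],
     [1, 0, 1]]

def parity(a):
    # modulo-2 matrix-vector product over the two message bits
    return tuple(sum(a[i] * P[i][j] for i in range(2)) % 2
                 for j in range(3))

assert parity((1, 0)) == (1, 1, 0)   # first row of P
assert parity((1, 1)) == (0, 1, 1)   # sum of both rows, mod 2
```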

Figure 4.12: An error-correcting multi-valued decision diagram for a generic 2-input function. In binary representation, the values of the terminal nodes are 0 or 1, and the nodes are merged accordingly.

Table 4.7 illustrates the error-correcting diagrams of the elementary gates. To generate the diagrams in this table, we used a custom tool that can be downloaded from the author's website1. The tool uses a slightly modified CUDD package [126]. The modifications are in the output functions dumpDot and dumpBlif. The first modification optionally changes the node labels in the dot file from hexadecimal numbers to a literal constant, which is more visually appealing. The original dumpBlif function does not output the correct input variable order if variable reordering has been called. This modification was made because the output blif is used to construct the structure of the BDD in memory for planarization, and it simplifies the code by removing the need for a separate variable order file.

1http://people.ucalgary.ca/~tsemoham/bdd


Figure 4.13: A parity bit generator for the shortened Hamming(5,2).

The upper limit on complexity with block codes is doubling the number of the circuit

elements while achieving the coding gain. The doubling is an upper limit based on the code rate, which approaches 1/2 when the message length is very large, according to [105].

The increase in the BDD size will be of linear complexity. In the worst case, however,

the unreduced BDD has a size that is exponential in the number of variables. Augmenting

the function variables by the parity bits will result in an exponential increase in complexity,

unless function decomposition is considered. Consider the error-correcting diagram of a 2-

bit adder implemented using the standard Hamming(7,4) shown in Figure 4.14. The adder

adds the 2 binary words a1a0 and b1b0. The original shared diagram has 11 nodes, while the

error-correcting shared diagram has 61 nodes. Another alternative is to replace each node by 4 nodes representing the noise-tolerant diagram of the buffer/inverter circuit; the number of nodes is then 4 × n_nodes, but the number of diagram levels becomes 3 × k. Block coding theory, however, tells us that repeating each bit n times is inferior to block coding. In the next section, we will elaborate further on this statement.


[Table 4.7 shows, for each elementary gate (Buffer/Inverter, AND, OR, NAND, NOR, XOR, XNOR), the original BDD alongside its error-correcting BDD.]

Table 4.7: Error-correcting BDDs of the elementary gates

4.7 Reliability of the error-correcting BDD

The usage of error-correcting diagrams and their ability to correct single errors should reflect

positively on the reliability of the gate. We found that the theoretical estimate of reliability

in [8] does not match the results from a Monte Carlo simulation. For an error-correcting

buffer gate shown in Table 4.7, we can find the probability of reaching the correct value assuming that the error probability at any node due to incorrect switching is ε_n. If there are no errors, a decision path consists of two decision nodes, and the reliability for this path is (1 − ε_n)^2. If, however, there is a single error, then the decision path is one of two possible paths that have three nodes. The probability of obtaining a reliable output from these two paths is 2[ε_n(1 − ε_n)^2]. Therefore, the total reliability is:

R = (1 − ε_n)^2 + 2ε_n(1 − ε_n)^2
  = 1 − (3ε_n^2 − 2ε_n^3)
  = 1 − E_P                                  (4.22)
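Equation (4.22) can be sanity-checked with a small Monte Carlo run, assuming each of the three received bits flips independently with probability ε_n and the output is the majority vote (an illustrative sketch, not the thesis's VHDL simulation):

```python
import random

# Monte Carlo check of the buffer reliability: flip each of the three
# repetition-coded bits independently with probability eps, then take
# the majority vote and compare against the transmitted bit.
def reliability_theory(eps):
    return 1 - (3 * eps**2 - 2 * eps**3)

def reliability_mc(eps, trials=200_000, seed=1):
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        sent = rng.randint(0, 1)
        received = [sent ^ (rng.random() < eps) for _ in range(3)]
        if (sum(received) >= 2) == (sent == 1):
            correct += 1
    return correct / trials

for eps in (0.05, 0.1, 0.2):
    assert abs(reliability_mc(eps) - reliability_theory(eps)) < 0.01
```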


Figure 4.14: Error-correcting 2x2-bit adder.

We can reach the same result if we consider the properties of the error-correcting code itself.

The reliability of the code is found as the probability that there is no error in any bit in

addition to the probability that a single error occurs. Thus, the reliability is given by the

following expression:

R = (1 − ε_n)^3 + C(3,1) ε_n(1 − ε_n)^2
  = 1 − (3ε_n^2 − 2ε_n^3)                    (4.23)

which is exactly the same result.

Figure 4.15 shows the simulation of the reliability of the error-correcting BDD for the

buffer/inverter gate and the theoretical estimation using equation (4.23). It also confirms

that the estimate in [8] is too large and does not match the results of the simulation. For

gates, other than the buffer, the reliability becomes a function of the truth table, not just

the error correction capability of the code. The reason for this is that we are not interested

in restoring the inputs of the gate function, but rather in getting the correct output. For

this reason, when we calculate the gate reliability, we have to incorporate the possibility

that a change in the inputs does not result in a change in the output value. To analyze the

performance of an error-correcting BDD NAND, we refer to the standard array in Table 4.5.

For a NAND gate, we assign the output value ’1’ to the first 3 columns in the table, and

the value ’0’ to the last column. The error is defined as a change in the output value from

0 (1) to 1 (0). To go from a value in any column to another in any other column, we need

to change at least 2 bit positions, since the minimum Hamming distance of this code is 3.

The reliability of the error-correcting AND/NAND gate has 3 parts. The first part is the

probability of staying within the same column (for the first six rows). The second part is

the probability that one of the first three columns (also within the first six rows) changes to

one another. The third part is the probability of the value in the fourth column, changing

to any of the values in the seventh and eighth rows as we chose to map these rows to the

value 0. The reason for this third part of the calculation is that we do not have a complete


mapping. In incomplete mapping (bounded-distance error-correcting decoder), we assume

that the unmapped control signal sequence (data + parity) results in always choosing a

path to the ’0’ terminal node. Otherwise, with complete mapping, there is no need to have

a distinction between rows in the table. One alternative is to have an extra terminal node

(’E’) in the diagram to represent the unmapped sequence as shown in Figure 4.12. The other

alternative is to have an extra root node to indicate decoder failure as shown in Figure 4.11.

R_EC_NAND = R_EC + (1/2)(R_00↔01 + R_00↔10) + (1/2)R_10↔01 + (1/4)R_unmapped
          = (1 − ε_n)^5 + 5ε_n(1 − ε_n)^4
            + (3/4)[3ε_n^3(1 − ε_n)^2 + 6ε_n^2(1 − ε_n)^3 + 3ε_n^4(1 − ε_n)]
            + (1/4)ε_n^2(1 − ε_n)^3                        (4.24)

It must be emphasized that the derivation of the reliability in equation (4.24) does not take

into account the structure of the diagram itself, but only the properties of error-correction

coding, in conjunction with the truth table of the gate.
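For plotting or numerical checks, the closed form of equation (4.24) can be evaluated directly (an illustrative sketch; the function name is ours):

```python
# Closed-form reliability of the error-correcting NAND, equation (4.24).
def r_ec_nand(eps):
    q = 1 - eps
    return (q**5 + 5 * eps * q**4
            + 0.75 * (3 * eps**3 * q**2 + 6 * eps**2 * q**3
                      + 3 * eps**4 * q)
            + 0.25 * (eps**2 * q**3))

assert r_ec_nand(0.0) == 1.0        # no noise: always correct
assert 0.9 < r_ec_nand(0.1) < 1.0   # small noise: near-unity reliability
```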

Figure 4.17 illustrates the theoretical value of the reliability from equation (4.24) and

the result of a Monte Carlo (MC) simulation of the error-correcting decision diagram. The

simulation is carried out by means of random errors applied independently at each of the

signal levels. Figure 4.17 also compares this result to the effect of replacing each node of

the NAND BDD by an error-correcting BDD node based on the TMR buffer/inverter. This

means that each node in the original BDD of a gate is simply replaced by an error-correcting

node (composed internally of 4 nodes) such as the one shown in Figure 4.16. This means that the TMR NAND will have 6 signal levels and 4 × 2 = 8 nodes, as opposed to 5 levels and 10 nodes for the EC NAND. Again, for the MC simulation, we apply independent noise at

each of the 6 levels. The reliability of the error-correcting XOR gate is shown in Figure 4.18.

For a 2-bit adder, the original BDD has 3 outputs and 11 nodes; the TMR adder has 44 nodes; and the Hamming code adder has 61 nodes. The reliability performance simulation of


[Plot: reliability versus probability of error for the EC buffer/inverter; curves: Monte Carlo simulation, theoretical estimate, non-EC, and Astola's estimate.]

Figure 4.15: Reliability of the error-correcting BDD for the buffer/inverter

each of them is shown in Figure 4.19 for each of the individual outputs s2s1s0 of the adder.

The results show that the performance with error coding is better than the original BDD

circuit. It should be noted that for s2 and s0, the TMR performance is superior. However,

it is not possible to consider individual results because the diagram is shared and can be of

only one type, either TMR or Hamming code based. The overall performance is shown in Figure 4.20 as the average reliability over all the outputs. It shows that the overall performance is comparable.

4.8 Experiments

Several simulation models for the switching node in a BDD can be considered as in Figure 4.2.

A high-level simulation using VHDL utilizes the multiplexer model of a node. The results

obtained in the previous section for reliability, used this high-level abstract model, which

only models the node as an ideal selection switch. This ideal model is used to simulate


Figure 4.16: An error-correcting BDD node used in TMR simulations

[Plot: reliability versus probability of error for the EC NAND gate; curves: MC Hamming, theoretical estimate, MC TMR, and non-EC.]

Figure 4.17: Reliability of the error-correcting BDD for the AND/NAND gate

[Plot: reliability versus probability of error for the EC XOR gate; curves: MC Hamming, MC TMR, and non-EC.]

Figure 4.18: Reliability of the error-correcting BDD for the XOR/XNOR gate

[Plots: reliability versus probability of error for each output (s2, s1, s0) of the EC 2-bit adders; curves: Hamming, TMR, and non-EC.]

Figure 4.19: Reliability of the error-correcting BDD for a 2-bit adder

[Plot: average reliability of the EC 2-bit adder versus probability of error; curves: Hamming, TMR, and non-EC.]

Figure 4.20: Average reliability of the error-correcting BDD for a 2-bit adder

the performance of a 2-bit adder and assess the bit-error rate averaged over the 3 adder

outputs as the SNR is degraded. An instance of the simulation at SNR = 9dB is shown

in Figure 4.22, and results for other SNR values are given in Table 4.8. The point of this

simulation is to illustrate that error-correction allows one to achieve almost an order of

magnitude of performance enhancement.

To model circuit delays, we may use the simplified bidirectional hysteresis switch cir-

cuit model to find the results in less simulation time. This is important for large circuits

when running the simulation using the complete set of model parameters requires a long

time. In the simulation experiments, we use the dual transmission gate representation of the

switching node (2 pairs of pass-transistors) along with LP 16nm predictive CMOS technol-

ogy model from [134]. The simulator used is Ngspice [90], which has a feature that allows

adding noise on the control signals. The noise mean is zero, and the standard deviation is

calculated from equation (4.21). The circuit description is automatically generated based

on the diagram. Table 4.8 shows a comparison between the error correcting version of the

NAND gate implemented as a BDD and other implementations simulated using the same

[Plot: Spice simulation of the EC/TMR buffer circuit at SNR = 3 dB; traces: p1, p0, a, and output voltage versus time.]

Figure 4.21: Spice simulation of EC buffer with different random noise applied at each level.

predictive 16nm technology. The gate voltage in the simulations is above threshold and is

set to VDD. Figure 4.21 shows how the transistor-level circuit handles large superimposed

noise at all the switching levels.

SNR (dB)   uncorrected BER   error-correction BER
3          0.4126            0.1650
5          0.3236            0.0871
7          0.2080            0.0283
9          0.0933            0.0052
10         0.0510            0.0016
12         0.0107            0

Table 4.8: Noise tolerance in error-correcting 2x2 bit adder with uncorrelated noise added at all 4 inputs for various SNR levels

[Plots: (a) input/output waveforms of the 2x2 adder without error correction at SNR = 9 dB, BER = 0.0926; (b) with error correction, BER = 0.0049.]

Figure 4.22: (a) Simulation of the 2x2 adder without error-correction at SNR = 9 dB. (b) Simulation of the adder with error-correction. BER values are averaged for all 3 output bits.


SNR   Conventional CMOS   MRF model [92]    MRF model [148]   MRF-BDD [153]    BDD with EC
      KLD      BER        KLD     BER       KLD     BER       KLD     BER      KLD     BER
3     2.2144   0.1028     1.2714  0.0275    0.8618  0.0463    1.2463  0.0356   0.1051  0.0998
5     2.1714   0.0478     0.969   0.0157    0.5076  0.0177    0.9130  0.0169   0.0444  0.0434
7     1.9847   0.0160     0.7225  0.0074    0.1759  0.0079    0.5659  0.0121   0.0136  0.0136
9     1.9224   0.0069     0.4091  0.0067    0.0321  0.0054    0.2031  0.0093   0.0027  0.0027
10    1.6038   0.0048     0.1400  0.0052    0.0248  0.0055    0.1235  0.0067   0.0011  0.0011
12    1.4000   0.0027     0.096   0.0043    0.0073  0.0051    0.1031  0.0053   0       0

Table 4.9: Performance comparison of noise-tolerant NAND gate models for different SNR levels (16nm predictive transistor simulation model).

4.9 Conclusion

In this work, we proposed a simulation model of an error-correcting BDD based circuit, and

compared its performance to the non-error correcting BDD. The main performance metric is

the reliability as a function of the input signal error probability. This type of error-correcting

circuits can be used efficiently at the nano-scale to mitigate the effect of noise. An order-of-magnitude improvement in bit-error rate, at the expense of extra hardware, is shown. The cost of the extra nodes in the diagram-based circuit is linear, though in the worst case it is exponential, unless TMR is used. For implementation at the nano-scale, a direct mapping

approach between the diagram and the actual circuit construction is desirable. This is done

through synthesizing a planar representation of the BDD. This is the topic of Chapter 5.


Chapter 5

Synthesis of Planar Nano-Circuits

5.1 Introduction

Planar diagrams have the advantage that they can be directly mapped to the device level

without any effort in placement and routing. Interconnections are always short and local.

Thus, timing and area estimates are accurate and the design constraints can be easily sat-

isfied. They are also an important requirement in circuits built using nanowire wrap gate

devices proposed in [49, 60]. They have the disadvantage that their structure grows exponentially in size, and in one direction only. For large diagrams, circuit decomposition is often necessary because of the exponential increase in size and the signal degradation as the signal crosses multiple switching levels, especially in a pass-transistor logic implementation. Processing a node graph is an NP-hard problem because it requires the consideration of all possible combinations of all the nodes. There are several approaches to this problem in the literature [72].

In [117], the authors address the planarization of multiple-valued logic diagrams and investigate specific types of functions (symmetric and monotonic) that can lead to regular layouts. Such planar diagrams cannot be designed for arbitrary functions. In [88], the authors synthesize regular triangular structures by repeating the control variables across multiple levels. In the work by Perkowski et al., the authors synthesize regular lattice structures using various expansion types, namely the Shannon, positive Davio, and negative Davio expansions [25, 99–102]. The resulting lattice diagrams are triangular. They are not composed of simple decision nodes, but rather of complex unit cells.

In the decision diagram called YADD (Yet another decision diagram), triangular diagrams

are also generated, but the unit cell is a simple BDD multiplexer node [88]. The control

signals, however, have to be repeated across multiple levels. This affects the area and power


requirements. Delays in signal propagation between levels are dealt with by pipelining the

control signals.

In [21], the authors describe a technique to eliminate crossings in QCA layouts. In this case, the circuit is neither hierarchical nor arranged into levels as in BDDs, and the goal is to

remove edge crossings by means of a crossing elimination algorithm.

In this chapter, we describe two algorithms for the generation of planar binary decision diagrams that assume only one type of node and require no repetition of the control signal levels. This is achieved by the insertion of dummy nodes, node swapping, node duplication, and control signal duplication [18, 19]. Dummy nodes are just routing nodes that have no logic functionality. They are used to meet the planarization algorithm's requirement that parent nodes and child nodes are always in adjacent levels. The first algorithm has linear

parent nodes and child nodes are always in adjacent levels. The first algorithm has linear

time complexity with respect to the number of the nodes while the second algorithm has

exponential time complexity. While the second algorithm achieves better results in general,

its exponential time requirement may render it prohibitive for analyzing large diagrams that

have 10 input control variables or more. Heuristics are used to search for a solution within a

given set instead of evaluating all possibilities to find the best solution. The generated planar

diagrams are not canonical nor optimum. To find an optimal planar diagram, its nodes in all

the signal levels must be considered. The results reported in [18] are questionable, because

the reported results show a linear time performance for all circuits, regardless of their size

despite arguing otherwise. Also, the number of nodes reported is either too small or too

large for some circuits. This was verified by running the two algorithms on the same circuits

and tabulating the results. The results reported in [21] cannot be verified because they are

based on multiple tools that are not available. In this thesis work, we developed a tool1 that

can also handle the planarization of BDDs that have complemented edges.

Two examples for a planarized error correcting BDD of a NAND gate and the s2 output

1See Appendix A for more details. The tool is available for download at http://people.ucalgary.ca/~tsemoham/bdd


Figure 5.1: Planarized EC-BDD NAND gate. Nodes with a single vertical branch are dummy nodes. Shaded nodes are duplicate nodes.

from the error-correcting 2-bit adder (Figure 4.14) are shown in Figures 5.1 and 5.2, respectively. In those graphs, a dummy node is indicated by the letter d. A duplicate node is indicated by its colour filling. Wiring to the terminal node is ignored because the terminal node is a constant and can be connected to the power supply terminal. We assume it can be repeated any number of times. A node's one outgoing branch is drawn in a dark colour (blue) and the other branch in red. In the case of complemented edges, a green colour is used to draw the branch.

The proposed BDD tool performs the following operations. First, it generates a reduced

ordered BDD using the CUDD package. The second step is common to both algorithms and

involves analyzing this BDD and importing it to memory. In this step, dummy nodes are

inserted such that a child node is directly in the next level to the parent node. For example,

if a node at level 3 has a child node at level 5, then a dummy node is inserted at level 4 and

routing is passed from the parent node to the dummy node and then to the child node at

level 5. The dummy node does not have any logic, and is just a wiring node. Its outgoing

branches are merged into a single vertical wire. This complicates the processing, because

different types of nodes have to be accounted for. In the third step, one of the following two

algorithms is run in order to produce a planar BDD.
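The dummy-node insertion step can be sketched as follows (a simplified illustration with an assumed dictionary-based node record, not the tool's actual data structures):

```python
# Dummy-node insertion (step 2 of the tool, simplified sketch): make every
# edge span exactly one level by chaining routing-only nodes through the
# skipped levels. Node records are dicts: {"level": int, "children": [ids]}.
def insert_dummies(nodes):
    next_id = max(nodes) + 1
    for nid in list(nodes):
        node = nodes[nid]
        for i, child in enumerate(list(node["children"])):
            prev = nid
            for lvl in range(node["level"] + 1, nodes[child]["level"]):
                dummy = next_id
                next_id += 1
                nodes[dummy] = {"level": lvl, "children": [child],
                                "dummy": True}
                if prev == nid:
                    node["children"][i] = dummy   # reroute parent to dummy
                else:
                    nodes[prev]["children"] = [dummy]
                prev = dummy
    return nodes

# A parent at level 3 with a child at level 5 gets one dummy at level 4.
g = insert_dummies({0: {"level": 3, "children": [1]},
                    1: {"level": 5, "children": []}})
d = g[0]["children"][0]
assert g[d]["level"] == 4 and g[d]["children"] == [1]
```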


Figure 5.2: Planarized BDD implementing the output s2 of the EC-BDD 2-bit adder


5.2 Algorithm 1: Linear-time node processing

The algorithm carries out the generation of planar error-correcting diagrams by examining

the nodes of the diagram, one level at a time. At each level, nodes are arranged according

to the position of their parent nodes. We always assume that the parent nodes have fixed

positions within a level (locked in place) and cannot be moved or duplicated. There are 3

possible scenarios when only adjacent parent nodes are processed.

The first scenario, shown in Figure 5.3, assumes that there are no common child nodes between the nodes S_k and S_{k+1}, where k is the loop variable for the parent-node level. In

this case, the child nodes of each parent node are arranged according to the position of their

respective parent nodes. The choice whether to place a child node to the left or to the right

of the parent node is arbitrary. If a node has more than one parent and the parents are

not adjacent in position then this node is duplicated. We lock the children of the node Sk

in place so that they cannot be moved by a future operation. A possible future operation

involves the same child node but with a parent far away. In this case the parent would see

its child its locked in place and request a duplicate. In the next loop iteration, the nodes

Sk+1 and Sk+2 are considered.

In the second scenario, shown in Figure 5.4, there is one common child for the parent nodes. In this case, the child is simply placed in the middle and locked in place. The other

children are placed to the right of the right parent node and to the left of the left parent

node. They are also locked in place. We then increment the loop variable twice, such that

the next parent nodes to consider are S_{k+2} and S_{k+3}.

In the third scenario shown in Figure 5.5, two adjacent parents share two child nodes. In

this case, we arbitrarily choose a child node and place it in a middle position with respect

to the positions of the parent nodes. We duplicate the other child node and place it to the

right and to the left of the parents as before. We lock all these child nodes in place and

increment the loop variable twice such that the next parent nodes to consider are S_{k+2} and

Figure 5.3: Two adjacent parent nodes with no common child nodes.

Figure 5.4: Two adjacent parent nodes with one common child node.

S_{k+3}.

This process is repeated until all child nodes are processed in a single pass. Then, we

proceed to the next level so that the locked child nodes become the parents of the new child

level. We finish processing when we reach the last level before the terminal nodes. We do

not consider ordering of the terminal nodes and assume there is an infinite supply of these terminals, which are either 0 or 1. The algorithm thus assumes that connections to the terminal nodes need no routing and always treats them as local connections. In a real circuit, a terminal node is simply one of the supply rails. A connection

to a terminal node is a connection to one of the supply rails (or one of the supply planes).

One supply plane can be above the plane of the circuit and the other supply plane below it.

The first parent level with fixed-position nodes is that of the root nodes. The complexity and the processing time are linear in the number of nodes, and the whole diagram is processed

in one pass.
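A much-simplified, single-level sketch of this pass is given below (illustrative only: it handles the no-shared-child and shared-child scenarios for adjacent parents, and omits the locking, dummy-node, and far-away-parent bookkeeping of the real tool):

```python
# Simplified single-level pass of Algorithm 1: given the fixed
# left-to-right order of parent nodes and each parent's children, place
# the children so every edge stays local. A child shared by two adjacent
# parents is placed once, between them (scenarios 2 and 3); any other
# sharing is resolved by duplication (scenario 1).
def place_children(parents, children_of):
    placed = []
    k = 0
    while k < len(parents):
        left = parents[k]
        right = parents[k + 1] if k + 1 < len(parents) else None
        shared = (set(children_of[left]) & set(children_of[right])
                  if right is not None else set())
        if shared:
            s = sorted(shared)[0]              # arbitrary pick
            placed += [c for c in children_of[left] if c != s]
            placed.append(s)                   # shared child in the middle
            placed += [c for c in children_of[right] if c != s]
            k += 2                             # both parents are done
        else:
            placed += list(children_of[left])  # duplicates allowed
            k += 1
    return placed

order = place_children(["P1", "P2"],
                       {"P1": ["A", "C"], "P2": ["C", "B"]})
assert order == ["A", "C", "B"]   # shared child C placed once, in the middle
```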


Figure 5.5: Two adjacent parent nodes with two common child nodes.

5.3 Algorithm 2: Multi-pass diagram processing

In order to maximize node sharing, parent nodes can be shuffled if their order is arbitrary. This shuffling is repeated until the optimum amount of node sharing is achieved. This can lead to an exponential time requirement and is best illustrated by the example in Figure 5.6. Here, we proceed in the same way as in the first algorithm, but we keep track of the arbitrary selections made in the first scenario, when there are no common nodes. In this case, the two

children are labelled as coupled nodes. Coupled nodes, albeit locked in place, can exchange

their positions if this would result in a smaller number of duplicated nodes. Two coupled

nodes have four children which can be similarly coupled. In Figure 5.6, we show four levels

in which coupled nodes are found on the first three. In the fourth level, however, we show

only one common child node. In order to achieve minimum node duplication, we need to

rearrange the nodes in level 3. In this case, we consider rearranging the parent nodes if they are coupled nodes. The rearrangement is not only that two coupled nodes exchange their places; a whole couple can also exchange its place with an adjacent couple if their parents are also coupled. This means that we keep backtracking upwards until we no

longer see coupled nodes. For the case in the example, the parent node shuffling will move

the two groups in level 3 that share a common child to the middle. Also the nodes in these

two groups will be uncoupled because a lone child has been located. Then the algorithm

will backtrack again upwards through all the levels of the involved parents and lock them in

place and uncouple them.


Figure 5.6: Arbitrary position coupled nodes with a common child node in the fourth level.


There is no guarantee how often this operation will have to be repeated. To find the required shuffling, when we process a child level, we iterate through all the possible arrangements of the parent nodes and keep a minimum score that identifies a certain arrangement. This arrangement is then kept, and we proceed to the next child level. The number of possible arrangements increases exponentially as deeper levels of coupled nodes are generated. The final solution may not require certain parent nodes to be uncoupled and, thus, the final solution is arbitrary. The exponential increase

in the number of parent-position possibilities is responsible for the exponential increase in the time required by the algorithm. When this number is too large, the tool is instructed to pick some possible parent-location combinations randomly, choose the best one, and proceed to the next level. Otherwise, the run time is prohibitively large for large circuits, as shown in the results section.
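The shuffling search can be illustrated with a deliberately brute-force sketch (hypothetical names; the real tool interleaves this with backtracking and locking). Every coupled pair either keeps or swaps its order, all 2^pairs combinations are scored by the duplicates they force, and the minimum is kept, which is where the exponential cost comes from:

```python
from itertools import product

# Score an ordering: a child shared by parents that end up non-adjacent
# must be duplicated.
def duplicates(parent_order, children_of):
    dup = 0
    pos = {p: i for i, p in enumerate(parent_order)}
    first_seen = {}
    for p in parent_order:
        for c in children_of[p]:
            if c in first_seen and abs(pos[p] - first_seen[c]) > 1:
                dup += 1
            first_seen.setdefault(c, pos[p])
    return dup

# Try every keep/swap combination of the coupled pairs and keep the best.
def best_arrangement(pairs, children_of):
    best, best_score = None, None
    for flips in product([False, True], repeat=len(pairs)):
        order = []
        for (a, b), f in zip(pairs, flips):
            order += [b, a] if f else [a, b]
        score = duplicates(order, children_of)
        if best_score is None or score < best_score:
            best, best_score = order, score
    return best, best_score

pairs = [("P1", "P2"), ("P3", "P4")]
kids = {"P1": ["a"], "P2": ["x"], "P3": ["y"], "P4": ["x"]}
order, score = best_arrangement(pairs, kids)
assert score == 0                                    # swap brings P2, P4 together
assert abs(order.index("P2") - order.index("P4")) == 1
```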

5.4 Results

We implemented both algorithms for multiple MCNC benchmark circuits as well as using

our own error correcting BDDs proposed in the previous chapter. The run time for each

algorithm is calculated by averaging the time of 10 consecutive runs. The fluctuation in the measured run time is mainly due to caching and to the CLI garbage collector kicking in while the algorithm is running. When the run time reaches several hours, it is calculated

only once. The C880 benchmark was not tested with the second algorithm because of its

very large size, especially after the insertion of dummy nodes. This suggests incorporating

function decomposition before attempting to generate planar layouts. The results show that

for small circuits, the run time of the second algorithm is acceptable. The reduction in

the number of duplicate nodes is around 35% on average for larger circuits. Table 5.1 also

shows the results for different variable re-orderings. We tested exact and SIFT reordering.

Although, exact reordering, in general, is guaranteed to produce the least number of nodes

75

in a BDD, this advantage is not carried over when planarization is performed. A significant

effect of the variable reordering is apparent in some circuits like misex3.

Benchmark        #inputs  #outputs  #nodes  A1 #nodes  A1 time   A2 #nodes  A2 time
ECadder(exact)      7        3        61       +77      0.6 ms      +59     413 ms
ECadder             7        3        68       +58      0.6 ms      +53      44 ms
9sym                9        1        33       +36      0.1 ms      +16      27 ms
cm138a              6        8        17       +11      0.2 ms      +11      16 ms
rd73                7        3        43       +73      6.5 ms      +34   30.24 ms
alu2               10        6       188      +245     15.2 ms     +185    19.47 s
apex4(exact)        9       19       970     +2204      153 ms    +1919    72.18 s
apex4               9       19       988     +2094      137 ms    +1875  34.92 min
misex3(exact)      14       14       545     +3977      317 ms    +2760     10 min
misex3             14       14       672     +2248      150 ms    +1824    32.37 s
alu4               14        8       859     +2587      283 ms    +1852  17.73 min
alu4(exact)        14        8       699     +1112       79 ms    +1005      2 min
C880               60       26      7090   +837219    12.85 hr       —         —

Table 5.1: Planarization results (variable ordering is performed using the SIFT algorithm unless the exact ordering, denoted (exact), is used)

5.5 Conclusions

In this chapter, we introduced two algorithms for generating BDDs with planar layouts. The algorithms are implemented in the form of a software tool2 and the results of both are analyzed. In both algorithms, the number of levels of the planarized BDD equals the number of levels in the original BDD, and dummy nodes are inserted so that routing is only necessary between adjacent levels. The first algorithm guarantees linear-time processing at the expense of an increased number of duplicate nodes in each level. The second algorithm keeps track of all possible combinations of node placements and examines them all to minimize the number of duplicate nodes at each level; if more than one combination has the same minimum duplicate score, all are kept for further consideration using information from deeper levels. This may lead to an exponential time requirement in the worst case. The reduction in the number of duplicated nodes achieved by the second algorithm is estimated at 35% on average.

2http://people.ucalgary.ca/~tsemoham/bdd


Chapter 6

Crossbar Latch-based Combinational and Sequential

Logic for nano FPGA

6.1 Introduction

Molecular devices can exhibit desirable current-voltage (I-V) characteristics, which make them possible candidates to replace conventional CMOS devices. Molecular devices have a much smaller footprint than CMOS transistors and therefore provide packing-density capabilities that would allow the continuation of the trend set by Moore's law [23, 30, 128].

Using assemblies of nanowires and molecular devices, it is possible to implement simple

logic gates, digital computation and simple memories [89]. Although the fabrication of

working molecular scale devices has been successful, it is necessary to develop techniques

to integrate these devices to form large scale digital circuits with densities that surpass the

densities attained by common lithographic techniques and transistor scaling.

Previous work focused on using the crossbar array to implement programmable logic

arrays in which hybrid CMOS/nano approaches rely on lithographic scale electronics for

signal restoration and inversion. In this chapter, the crossbar latch, which is an integral part

of the crossbar array, is used to model full combinational logic circuits with signal restoration

and inversion. Nano architectures of primary combinational and sequential building blocks

are presented since they constitute the basis of a homogeneously structured nano processor

or nano FPGA.

The proposed nano system is assumed to interact with lithographic (microscale) devices

only for programming and signal I/O.


6.2 Device modeling

Designing electronic circuits with molecular devices entails the consideration of several key

issues. The first issue is concerned with obtaining device models that can be used to simulate

circuits built using the new devices. The device models are usually based on categorizing the

measured behaviour of molecular devices. The most prominent categories are the programmable resistor and the programmable rectifier. These devices exhibit an orders-of-magnitude difference in conductivity between their programmed 'ON' and 'OFF' states and are thus

suitable for implementing digital circuits. The devices can be modeled using the level-1 diode

and resistor equations or by using more elaborate model fitting of the device characteristics,

as shown in [157]. Other types of molecular devices include those with negative differential

resistance characteristics, and field effect transistor (FET) like devices. Nanowire based

FETs are examples of active devices at the nanoscale. Active devices form an integral part

in the operation of digital circuits because there is a continuous need for signal restoration,

buffering and inversion. Signal restoration deals with the fact that the voltage levels of

logic values continuously degrade as the signal passes through passive components. This

is due to the voltage drop across the passive devices which cause the voltage level of logic

’1’ to decrease and the voltage level of logic ’0’ to increase. Signal restoration should take

place after the signal has traversed just a few passive components, otherwise it becomes

difficult to disambiguate the logic value associated with that voltage. Signal buffering is

necessary when the loading effect due to the fan out of a given signal would degrade its value.

Signal inversion is an important function since most of the building control logic in basic

multiplexers, demultiplexers, encoders, decoders, adders and all other types of combinational

and sequential logic rely on assimilating both a logic signal and its inverse.
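The need for restoration after only a few passive stages can be illustrated with a toy degradation model. The rail values and the per-stage drop of 0.2 V below are illustrative assumptions, not measured device data:

```python
# Illustrative only: assumed per-stage level degradation; real values depend
# on the device I-V curves and loading.
V1, V0, VDD = 1.0, 0.0, 1.0
DROP = 0.2                 # assumed voltage lost (gained) per passive stage
THRESHOLD = 0.5 * VDD      # level above/below which a logic value is read


def propagate(v_logic1, v_logic0, stages):
    """Degrade both logic levels through `stages` passive components."""
    for _ in range(stages):
        v_logic1 -= DROP   # logic '1' sags through series voltage drops
        v_logic0 += DROP   # logic '0' rises through leakage/divider paths
    return v_logic1, v_logic0


for n in range(4):
    hi, lo = propagate(V1, V0, n)
    ok = hi > THRESHOLD > lo
    print(f"after {n} stages: V('1')={hi:.1f}, V('0')={lo:.1f}, distinguishable={ok}")
```

With these assumed numbers the two levels cross the decision threshold after three stages, which is why restoration must occur after a signal traverses only a few passive components.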

The second issue with using molecular devices is that the design of digital circuits must

assume non-lithographic bottom-up assembly of components. This bottom-up assembly

paradigm presents a major departure from the common design view point of digital logic


which does not consider the difficulty of integrating non-homogeneous components in the

same circuit. The design of nanoscale electronics must clearly take into consideration that

the circuit architecture must be regular and consists of homogeneous components in order

to account for the limitations of bottom-up assembly. The regular crossbar array presents

one of the most successful deployments of bottom-up assembly. The crossbar array is a two-

dimensional array of nanowires assembled at right angles to each other. Molecular devices

in the crossbar array are laid out in an intermediate step of the assembly of the top and

bottom plane nanowires and these molecules connect to the wiring at the cross points of the

nanowires. The assembly of the crossbar array is illustrated in Figure 6.1a which shows that

the device area is small compared to the wiring; this wiring-to-device area ratio is much greater than the corresponding ratio in conventional microscale circuits.

The devices used at the crossbar crossings are two terminal devices and they are ei-

ther programmable resistive or programmable rectifying devices and are thus suitable for

making memories and programmable logic arrays. The implementation of basic logic func-

tionality using the crossbar array is illustrated in Figure 6.1c. Such implementation of

logic functionality raises again the design issues of signal buffering/restoration/inversion, as

well as signal differentiation and lithographic interfacing. Signal differentiation and litho-

graphic interfacing deal with the problem of addressing individual nanowires for both signal

conveying (I/O) and for initial programming of the molecular devices at the cross points.

Hybrid CMOS/Molecular (CMOL) architectures [73] utilize microscale circuits in I/O inter-

facing, and for signal buffering and inversion. The CMOL architecture has the drawback

that the density of logic functionality is limited by the integration density of the microscale

CMOS buffers and inverters. Using nanowire FETs within the crossbar array as active de-

vices [32, 35], has the drawback of complicating the crossbar array bottom-up assembly due

to inhomogeneous components. An alternative to nanowire FETs and microscale buffers is

the recently proposed crossbar latch [68]. The crossbar latch is a two terminal device that


can be used for signal restoration and inversion. Integrating the crossbar latch, within the

crossbar array, does not require modifications to the construction process of the array, or

use of non-homogeneous molecular devices.

6.3 Operation model of the crossbar latch

Crossbar arrays are used for implementing combinational logic and memory functions. How-

ever, passive device components such as resistors and diodes degrade the voltage levels of

the logic values as the signal propagates through them. Techniques in the literature for signal restoration and inversion either perform them in microscale circuits or use FET-like structures, which makes the structure inhomogeneous [32, 73]. Nanowire FETs have the disadvantage that they require manufacturing steps

that are not identical to those used in assembling the crossbar array. This same drawback

is also true for techniques that suggest using devices with negative differential resistance

(NDR). Using CMOS microscale circuits for signal restoration and inversion precludes the

ability to have complete processing capability at the nanoscale. Using microscale circuits has

the disadvantage that it limits the device packing-density, since the packing-density that can

be achieved by molecular components, becomes limited by the number of microscale buffers

that can be integrated on chip.

The crossbar latch was proposed by Kuekes et al. in [68] as a technique for implementing

signal restoration and inversion within the crossbar array. Crossbar latches are implemented

in the same way as molecular devices within crossbar arrays and they provide the capability

to store logic signals. The latch is basically a two-terminal device and its operation is based

on programming a pair of molecular switches according to the value present on the signal

line. The programmable switch pair is used to connect the signal line to one of the supply

rails. This has no effect on preceding logic since it is composed of rectifying junctions that

allow the current to pass in one direction only towards the load. In this scheme, two control


Figure 6.1: (a) Crossbar with molecular devices. (b) Basic logic operations requiring only passive components. (c) Implementation of the basic logic operations. (Black arrows represent enabled diode junctions.)


lines are used in conjunction with the switch pair in a three-step procedure. Step 1 is to apply a large signal on the control lines that exceeds the threshold voltage for turning 'OFF' the switches; this step is referred to as unconditional opening of the switches. Step 2 is to close one of the switches according to the voltage level on the signal line, by applying a voltage on the control line that is sufficient to change the state of only one of the switches, depending on the value on the input signal line; this step is referred to as conditional closing of the switches. Step 3 is to apply voltages corresponding to strong logical values on the control lines. After this final step, the closed switch effectively connects the signal line to one of the two supply rails. If the original value on the signal line corresponds to a weak logical 0 (1) and the switch connects the signal line to a strong logical 0 (1), then signal restoration takes place. If, on the other hand, the original value corresponds to a weak logical 0 (1) and the switch connects the signal line to a strong logical 1 (0), then signal inversion has taken place. In both cases, the signal value is stored in the switch, which corresponds to a latching effect. This three-step operation is illustrated in Figure 6.2 and the associated hysteretic response in Figure 6.3. In [125] and [124], logic values are represented by different impedance paths that connect the signal to ground. In the crossbar latch, however, the logic values are represented by having a switch in the lower impedance state, connecting the signal line to one of the supply rails.
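The three-step cycle can be captured in a small behavioral model. This is a logic-level sketch, not the thesis's SPICE model; the decision threshold at VDD/2 and the class interface are our assumptions:

```python
# Behavioral sketch of the crossbar latch's three-step cycle, assuming a
# pair of programmable switches tying the signal line to VDD or to GND.
VDD, GND = 1.0, 0.0


class CrossbarLatch:
    def __init__(self, invert=False):
        self.invert = invert             # restoration vs. inversion mode
        self.sw_hi = self.sw_lo = False  # switch to VDD rail / GND rail

    def open_all(self):
        """Step 1: unconditional opening of both switches."""
        self.sw_hi = self.sw_lo = False

    def conditional_close(self, v_signal):
        """Step 2: close exactly one switch according to the (possibly weak)
        level on the signal line."""
        bit = v_signal > 0.5 * VDD
        if self.invert:
            bit = not bit
        self.sw_hi, self.sw_lo = bit, not bit

    def latch(self):
        """Step 3: drive the control lines to strong levels; the closed
        switch now ties the signal line to a supply rail."""
        return VDD if self.sw_hi else GND


def restore(v_weak, invert=False):
    latch = CrossbarLatch(invert)
    latch.open_all()
    latch.conditional_close(v_weak)
    return latch.latch()
```

Here `restore(0.7)` returns the strong level 1.0 (restoration of a weak '1'), while `restore(0.7, invert=True)` returns 0.0 (inversion); in both cases the value remains stored in the switch pair, which is the latching effect.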

The addition of control lines for programming the latch can be in the same plane as

the main crossbar array or in the planes on the top and bottom of the crossbar array.

This is possible using three-dimensional self assembly. The main crossbar array is a simple

example of this fabrication capability. 3-D structures are also beneficial in the sense that

the orientation of all diode molecules can be the same, by having one control line on top of

the nano-array, and the other control line at the bottom, as shown in Figure 6.4a. This is

advantageous, since it is difficult in device assembly at the nanoscale to have one molecule

oriented in one direction, and another oriented in an opposite direction as required by the


Figure 6.2: Crossbar latch hysteresis-based operation


Figure 6.3: Crossbar latch hysteresis characteristics

circuit in Figure 6.2a.

In our proposed simulation model of the crossbar latch, we utilize PSPICE hysteresis

switches that exhibit the conditional change of ON/OFF state. The model includes a rough

estimate of the switch resistance and capacitance, as shown in Figure 6.4b. The model uses

diodes because the signal flow can be only in one direction, which is inherent in the actual

device but not modeled by the hysteresis switch. We have used this simplified simulation

model to obtain the results in the following section.

The main issues associated with the crossbar latch are the time required to change the state of the switch, and the number of write cycles before the resistivity difference (gap) between the 'ON' state and the 'OFF' state degrades. The current performance of experimental unpackaged devices shows that they are capable of switching state for hundreds of cycles, which is not enough for implementing continuously switching sequential logic. However, these issues are expected to be resolved with future advances in device fabrication and sealing. The multiplexing and generation of the control signals can be done using microscale circuits. This does not represent a major overhead, since the control


Figure 6.4: (a) 3-D structure of a crossbar latch. (b) The PSPICE model of the crossbar latch using hysteresis switches.


lines are shared among all the adjacent latches that are required to operate in the same

phase. Also, microscale wires are expected to be the medium for global signals in a wiring hierarchy, and the control signals can be considered global signals.


6.4 Combinational circuit models

The regular structure of the crossbar array with passive devices resembles a PLA with AND

planes and OR planes. However, implementing the simplest sum of products functions

requires signal inversion. For example, consider a basic digital component as the full adder.

In the full adder, the carry out signal is given by C = xy + xz + yz which can be directly

mapped to the nano PLA. On the other hand, the sum signal S = x′y′z + x′yz′ + xy′z′ + xyz

requires inversion of all three input signals. The CMOL architecture provides a solution

to this by using microscale inverters and buffers [73]. This limits the integration density

advantage that can be achieved using nano scale electronics. The other approach that we

utilize here is based on the crossbar model discussed in the previous section. We use a

set of four control signals VCP1, VCM0, VCP0, VCM1. The control signals VCP1, VCM0 are used

for signal restoration while VCP0, VCM1 are used for signal inversion. The main difference

between the two pairs is the application of a voltage of opposite logic value to the original

signal value. The simulation model of a nanoscale crossbar full adder is shown in Figure 6.5.

This model is aimed at the integration of complete functionality at the nano scale. An

n-bit adder built using this model can be assumed to be pipelined if we utilize another set

of control voltages similar to the first set, but shifted in time. The shift is determined by

two factors: the circuit delay and the requirement to avoid control-voltage-induced spikes


Figure 6.5: A PSPICE model of a nano architecture of a full adder, utilizing the crossbar latches for signal restoration and inversion.

that feed through the switches. Phase-shifted control voltages allow every subsequent stage

to sample its input after it reaches a stable state. In this simulation, we chose that the

control signals sample the input to the circuit every 1µs, which is also the rate of change of

the least significant bit. Figure 6.6 shows another combinational example which is a 4-to-1

multiplexer built using the same idea.

This section demonstrates how it is feasible to implement any type of combinational logic

circuits using homogeneous nano-devices. In the next section, we will discuss the usage of

out-of-phase control signals for implementing sequential logic circuits. Out-of-phase control

signals are also used in combinational logic circuits in order to separate evaluation and

storage of different parts of the logic array. Part of the logic circuit evaluates its inputs,

when they are stored and stabilized in a prior stage. This is consistent with the concept of

pipelining. Isolation of each stage and its output is provided through the rectifying cross

connects. These cross connects are made of the same type of molecules but no control voltage

is applied to them.


Figure 6.6: 4-to-1 multiplexer model using the crossbar latches in decoding the selection signal

6.5 Sequential circuits

The crossbar latch behaviour described in Section 6.3 is analogous to a clocked D-type latch with a single inverting or non-inverting output. This type of latch is suitable for implementing shift registers and counters, which are necessary building blocks in a finite state machine. The shift-register effect requires isolation between the stages because, in an n-bit shift register, the next state of each stage is a function of the preceding stage's output.

In microscale circuits, this isolation is accomplished by the finite gate delay between the input and output. In the crossbar circuits, the latches are taps on the signal line; thus, a shift register based on crossbar latches represents a single signal line with multiple latches. This would disrupt the shift-register operation, because when all the latches are unconditionally opened the signal at the end of the line is essentially the same as the input. To overcome this, our design is based on a technique similar to the operation of charge coupled


devices (CCD). In one type of CCD operation, every other device is connected to the same

clock phase controlling the transfer. Also, to force the signal to propagate in a single direction

and prevent shorting out the supplies, diode junctions are necessary to separate the stages.

Figures 6.7 and 6.8 illustrate a shift register and its simulation model. The arrows

between the latches on the signal lines represent isolation diodes, forcing signal propagation

in one direction without shorting supply rails, as would happen in a direct cascade of two

crossbar latches. Figure 6.9 shows the out-of-phase control signals used in clocking the shift

register and the simulation results of the crossbar-based shift register. The isolation diodes

are simply part of the wiring that interconnects two latches together and successive latches

can be arranged at 90° angles to each other.
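The two-phase transfer can be sketched behaviorally. The function names and the four-stage configuration are illustrative, and the model abstracts the isolation diodes as "each latch sees only its upstream neighbour":

```python
# Behavioral sketch of the CCD-style two-phase shift register: even-indexed
# latches sample on phase A, odd-indexed latches on phase B, so a stored bit
# advances stage by stage and cannot race down the shared signal line.
def shift_register_step(stages, new_input, phase):
    """Advance one clock phase. `stages` holds each latch's stored bit."""
    out = list(stages)
    start = 0 if phase == 'A' else 1
    for i in range(start, len(stages), 2):
        # The isolation diode lets stage i see only its upstream neighbour.
        out[i] = new_input if i == 0 else stages[i - 1]
    return out


def run(bits, n_stages=4):
    """Feed `bits` through the register, one full A/B cycle per bit."""
    stages = [0] * n_stages
    history = []
    for b in bits:
        stages = shift_register_step(stages, b, 'A')
        stages = shift_register_step(stages, b, 'B')
        history.append(stages[-1])   # output of the last latch
    return history
```

With four latches (two latch pairs), a bit advances two stages per full A/B cycle, so the output stream is the input delayed accordingly: `run([1, 0, 1, 1])` yields `[0, 1, 0, 1]`.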

Arbitrary sequence counters can be implemented by inserting combinational logic in

between the latches, in order to generate the appropriate input for the latches. The passive

combinational logic conducts the signals in one direction, and, thus it is not necessary to add

extra diode junctions. The structure of a generic counter and its crossbar implementation

are shown in Figure 6.10. In this architecture, the latching elements are parallel to each

other, and they all follow the same clocking sequence, as in the case of synchronous digital

logic. The layout, shown for the counter, can be improved in a straightforward manner,

using inspection or by using an automated tool that enforces a minimization of the layout

area.

Figure 6.9 shows the out-of-phase control signals used in clocking the shift register and the flip-flop; these control signals exceed the assumed hysteresis switching thresholds of the proposed latch model.

Another example is a circuit model for a T-flip-flop as shown in Figure 6.11. The proposed

model utilizes two-phase control signals and inversion control signals. The sum-of-products

is implemented using the diode array modeling a 2-to-1 multiplexer which selects either Q

or Q′ as the next state to be stored in the latch. The out-of-phase control signals are used to


Figure 6.7: (a) A 4-bit shift register from D-latches. (b) Modifications to the basic shift register to make it suitable for crossbar implementation (two-phase control signals and rectifier junctions to force signal direction). (c) Crossbar implementation of the 4-bit shift register. (Solid black arrows represent rectifier junctions, forcing signal direction.)

separate the "evaluate" and "store" steps. The "store" step finishes when the signal is stable and time-delayed from the feed-through spikes at the transitions of the hysteresis switches. This architecture of the T flip-flop can be cascaded, because it does not require any extra inverters to produce the T and T′ signals used by the multiplexer portion. This idea is shown in Figure 6.10c, in which intelligent mapping and reuse of resources results in a very compact implementation of the counter.
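The next-state behaviour of this latch-plus-multiplexer arrangement reduces to a two-line model (a behavioral sketch with names of our choosing, not the SPICE netlist):

```python
# Next-state sketch of the T flip-flop: a 2-to-1 multiplexer in the diode
# plane selects Q or Q' as the value the crossbar latch stores next.
def mux2(sel, a, b):
    """Diode-plane 2-to-1 MUX: output = a when sel = 0, b when sel = 1."""
    return b if sel else a


def t_flipflop_next(t, q):
    q_bar = 1 - q
    return mux2(t, q, q_bar)   # hold when T = 0, toggle when T = 1


# With T held at 1 the flip-flop divides the clock by two:
q = 0
trace = []
for _ in range(4):
    q = t_flipflop_next(1, q)
    trace.append(q)
# trace == [1, 0, 1, 0]
```

Because the toggle is produced by the inversion control signals rather than by extra inverter stages, cascading such flip-flops into a counter needs no additional hardware for T′.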

6.6 Organization of a nano FPGA using crossbar arrays

The implementation of sequential circuits is necessary for building configurable logic blocks

for nano FPGAs. FPGAs can be used to implement various nano-processors, as they are


Figure 6.8: A PSPICE model of the shift register using two pairs of out-of-phase control signals.

field programmable and have regular structures, which are the two main characteristics of

crossbar arrays. An FPGA slice can be composed of a lookup table, a latch and a wiring

matrix. The lookup table, the latch and the wiring matrix are all special forms of PLAs.

Thus the nano FPGA is an array of configurable logic tiles composed of patterns of these

simple structures. With the highly regular homogeneous organization of resources, the task of

organizing a nano PLA becomes greatly simplified. The same fabric is capable of performing

logic operations and, in addition, signal routing. A simple form of signal routing

is shown in Figure 6.12. Placement and routing have always been two distinct jobs, with

routing usually taking significantly more time than placement. In FPGAs, the tool may run

out of routing resources to implement the design even if the utilization of logic resources is

significantly less than 100%. This situation can be improved in a nano FPGA, since logic

resources can be exchanged with routing resources, and the placement can be dynamically

coupled with routing. This type of versatility, in conjunction with non-lithographic bottom-

up assembly, suggests that nanoelectronics in the very near future will not only have superior packing density to microelectronics, but will also require much less cost in terms of fabrication and design time.

Since the whole structure is homogeneous, a place and route tool experiences great flex-


Figure 6.9: (a) Waveforms of two out-of-phase control voltage pairs for latching the input signal. (b) SPICE simulation of the operation of the crossbar-based shift register at steady state.


Figure 6.10: (a) A generic synchronous counter architecture with an arbitrary counting sequence. (b) Crossbar implementation of the generic counter requires only one control signal pair. (c) Floorplan of a generic counter.

Figure 6.11: A PSPICE model of a T flip-flop using a 2-to-1 MUX.


Figure 6.12: Shared routing/device plane

ibility in mapping a design. The only restriction that limits this flexibility is having to

periodically interface with the microscale circuits. This interfacing is required to provide

access to the input and output signals of the circuit, and to program each nanowire junction. The current solutions for addressing each junction use either stochastic decoders or arrays tilted at an angle; this angle is a function of the nanowire pitch as compared

to the microscale wire pitch [33]. Figure 6.13 is an example of the organization of a nano

FPGA capable of both combinational and sequential logic. In this architecture, the building

component is a small PLA formed of AND/OR planes and buffering arrays. This building

component is placed on both sides of the vertical and horizontal axes, and is used to build

the pattern shown in Figure 6.13b. Larger PLA blocks can be incorporated to form macro

memory blocks. Thus, the architecture of a nano FPGA can be considered as inhomogeneous

from the macro level point of view (composed of different macro blocks), but it is homo-

geneous from a micro level point of view in which all the building blocks are side-by-side

crossbar arrays.

6.7 Area and timing of the nano FPGA

The area of a unit cell in a nano FPGA depends on the number of passive devices, number of

signal restoration devices and area overhead due to interfacing with lithographic scale wiring


Figure 6.13: Example for the organization of the nano FPGA


and devices. In [35], the nanoarray is programmed by interfacing the nanowires with lithographic wires via stochastic decoders, and this interfacing represents the major area overhead. The

second contributor to the area other than the logic itself is the stochastic buffering/inversion

devices within the array. In [35], the buffering/inversion is done by nanowire FETs. In [73],

the overhead due to interfacing with lithographic wires is overcome by tilting the nanoar-

ray relative to the lithographic array. This technique avoids using stochastic decoders and

guarantees the addressing of every cross point. However, in [73], signal restoration is carried

out by CMOS inverters and this greatly reduces the gain from using molecular devices for

the sake of higher packing density. In our circuit, we propose the use of the tilted array

technique in conjunction with cross bar latches. The 3D crossbar latch occupies one extra

cross point. In order to force signal isolation, a diode cross point is also necessary. Thus,

two extra cross points are required per buffer/inverter. The area of the nano crossbar for

building a nano FPGA slice can be approximately given by:

LUT_nrows = 2 × N_inputs + 1                                             (6.1)

LUT_ncols = 2^N_inputs                                                   (6.2)

BuffInv_crosspoints = 2 × LUT_nrows + 4                                  (6.3)

MUX_(S-by-1) = (S × 2 + 2^S + 1) × 2^S + S × 4                           (6.4)

SliceArea ≈ F_nano^2 × (2 × (LUT_nrows × LUT_ncols + BuffInv_crosspoints)
            + 2 × MUX_(2-by-1) + 2 × MUX_(4-by-1))                       (6.5)

where F_nano is the nanowire pitch. The areas of the multiplexers, MUX_(4-by-1) and MUX_(2-by-1), are given by (6.4). The '+1' in LUT_nrows stands for the single row used in the OR plane, based on the assumption that the PLA-based LUT has just one output; the remaining input rows account for every signal and its complement as inputs to the PLA. The extra 4 in BuffInv_crosspoints


accounts for the assumption that a slice contains two latches. We also assume that a slice contains

four different multiplexers. Thus, a four-input LUT can be considered as a PLA with 16 × 9

cross points to account for the 16 possible minterms and the four inputs plus their inverted

counterparts and a ninth row for the OR function. If 8 crossbar latches are used at the LUT

input, and one at its output, then we need 18 extra crosspoints. This gives a total of 162

crosspoints per LUT plus latch. The typical FPGA slice contains two LUTs, some 2-input

multiplexers and 4-input multiplexers. Thus, the rough estimate of cross points per FPGA

slice is around 400. The area of a slice is, therefore, of the order of 40,000 nm², assuming that the nanowire pitch is 10 nm. The overhead due to lithographic wires is at a minimum, based on the assumption that lithographic devices and nano devices are not in the same plane. The typical area of CMOS logic at 22 nm technology (state of the art at the time of

writing) is about two orders of magnitude higher.
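Equations (6.1)–(6.5) can be evaluated numerically to reproduce the estimates above; this is a straightforward transcription, with helper names of our choosing:

```python
# Numerical check of Eqs. (6.1)-(6.5) for a slice built from 4-input LUTs,
# using the nanowire pitch F_nano = 10 nm assumed in the text.
def slice_area(n_inputs=4, f_nano_nm=10):
    lut_nrows = 2 * n_inputs + 1      # (6.1) inputs + complements + OR row
    lut_ncols = 2 ** n_inputs         # (6.2) one column per minterm
    buffinv = 2 * lut_nrows + 4       # (6.3) latch crosspoints + 2 slice latches

    def mux(s):                       # (6.4) crosspoints of a 2^S-to-1 MUX
        return (s * 2 + 2 ** s + 1) * 2 ** s + s * 4

    # (6.5): two LUTs with buffering, two 2-to-1 and two 4-to-1 multiplexers.
    crosspoints = (2 * (lut_nrows * lut_ncols + buffinv)
                   + 2 * mux(1) + 2 * mux(2))
    return crosspoints, crosspoints * f_nano_nm ** 2   # area in nm^2


xpoints, area_nm2 = slice_area()
print(xpoints, area_nm2)
```

This gives 448 crosspoints and 44,800 nm² per slice at a 10 nm pitch, consistent with the "around 400" and "order of 40,000 nm²" figures quoted above.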

The timing of the circuit depends on two main delay sources. The first delay is due to

the capacitance and resistance of the nanowires and cross points which affects the evaluation

time of the signal on the nanoarray. The second delay is associated with the time required

to program the molecular latch. Currently, the key delay component is due to programming

the molecular latch. The other component is not significant, since we do not use nanowires

for global signals. The approximate delay equations are given by:

T_clock = T_evalp + T_opens + T_programs + T_evaln                       (6.6)

T_eval = N_array × (C_crosspoint + C_wire) × (R_crosspoint + R_wire)     (6.7)

where T_evalp is the evaluation time of the PLA prior to the crossbar latch, and T_evaln is the delay due to the resistance and capacitance of the interconnect following the crossbar latch up to the next PLA block. If we assume the capacitance of a cross point to be 10⁻¹⁸ F, the contact resistance 1 MΩ, and neglect the resistance of the nanowire and its capacitance to the substrate, then an array with 8 inputs will have a delay estimate of about 10 ps. The


Table 6.1: Comparison of nanoelectronic architectures

Feature              Nano PLA [35]            CMOL [73]               This work
CMOS to nanowire     Stochastic decoders      Crossbar tilt           Tilt, as suggested in [73]
Restoration          Nanowire FET             CMOS inverters          Crossbar latch
Timing               Precharge + evaluation   RC delay                RC delay + latching
Area                 Addressing limited       CMOS inverter limited   Least area
Sequential circuits  FETs + out-of-phase      N/A                     Latches + out-of-phase
                     clocks                                           clocks

programming time of the crossbar latch is still on the order of several milliseconds. As the technology for fabricating molecular devices advances, this bottleneck will be removed.
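The delay estimate of Eq. (6.7) can be reproduced numerically. The sketch below plugs in the illustrative component values quoted above; the variable names are our own shorthand, not from the thesis tooling:

```python
# Evaluation-delay sketch for Eq. (6.7), using the illustrative values
# quoted in the text: an 8-input array with 1e-18 F crosspoints and
# 1 Mohm contacts, neglecting nanowire R and C.

N_array = 8             # number of inputs to the nanoarray
C_crosspoint = 1e-18    # crosspoint capacitance (F)
C_wire = 0.0            # nanowire capacitance to substrate (neglected)
R_crosspoint = 1e6      # contact resistance (ohm)
R_wire = 0.0            # nanowire resistance (neglected)

T_eval = N_array * (C_crosspoint + C_wire) * (R_crosspoint + R_wire)
print(T_eval)           # 8e-12 s, i.e. on the order of the quoted 10 ps
```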

Table 6.1 shows a comparative summary of the main features of our crossbar latch based

architecture and the architectures proposed in [35] and [73]. The mapping of finite state

machines using our circuit architecture is straightforward, and the mapped circuit can be

easily simulated using the models presented in the previous sections.

Since the full structure is homogeneous, a place-and-route tool has great flexibility

in mapping a design. The only restriction that limits this flexibility is having to periodically

interface with the microscale circuits. This interfacing is required to provide access to the

input and output signals of the circuit and to program each nanowire junction.

6.8 Fault and Defect Tolerance in Nano FPGA

The defect tolerance in FPGAs or in a nano FPGA follows the same ideas that are used

in manufacturing memories. Instead of throwing away the whole part, a scheme is used

to mark the bad blocks so that a higher-level software tool avoids using them. In hard drives, a table of bad sectors is kept to avoid using them. In memories, address remapping is usually used such that the module externally appears as fault free, with the addresses


remapped to redundant resources. FPGAs by definition are composed of redundant similar

blocks. The configuration tool can be used to map around the defects as in the Teramac

project [28,36,132]. The merit of such techniques in cross bar arrays is analyzed in [50]. Fault

tolerance is different because it has to deal with errors during operation, such as the failure of a block or faults due to transient errors and noise [118]. The Dual-FPGA architecture

developed in the ROAR (Reliability Obtained by Adaptive Reconfiguration) project at the

Stanford Center for Reliable Computing is an example of a reconfigurable system design with the capability of "on-line" error detection, recovery and self-repair. The techniques

for fault detection use error masking techniques, triple modular redundancy and software

reconfiguration [82]. These techniques follow what we discussed in Chapter 3 and can be adapted

easily to the nano-scale because they are all high level and are not directly associated with

the technology itself.

One example of error detection and autonomous repair can be built using a dual self-checking pair (DSCP) system, shown in Figure 3.1. This figure is repeated here as Figure 6.14

for convenience. In this system, each one of the four blocks, forming the two pairs, is a

reconfigurable array. Initially, a defect scanning system maps all the defects in the four

blocks. A configuration system then configures all four blocks to do the same functionality.

The second pair (the "B" pair) is placed in standby. Calculations are performed in parallel by

the first pair. When a discrepancy occurs between their outputs, the high level configuration

system switches the calculations to the second pair. The defect scanning system analyzes the first pair to find a new map of defects and then reconfigures it using the redundant

available resources in each block. This is the repair step. Once the pair of blocks is repaired,

they are placed in standby until an error is detected in the active pair, and then the process

is repeated. This can repeat any number of times as long as there are available unused

resources in each one of the blocks.
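The switch-and-repair cycle described above can be sketched as a small control loop. Everything below is a hypothetical simulation harness (the pair names, the error schedule and the repair hook are ours), not the actual configuration system:

```python
# Hypothetical sketch of the DSCP switch-and-repair loop: on a detected
# discrepancy, computation moves to the standby pair while the faulty
# pair is rescanned for defects and reconfigured around them.

def dscp_run(steps, error_at):
    """Run `steps` cycles; `error_at` lists the steps at which the
    active pair's two blocks disagree."""
    active, standby = "A", "B"
    events = []
    for step in range(steps):
        if step in error_at:                   # outputs of the active pair disagree
            active, standby = standby, active  # switch to the standby pair
            events.append((step, "switch_to_" + active))
            events.append((step, "repair_" + standby))  # remap around new defects
    return active, events

active, events = dscp_run(steps=10, error_at={3, 7})
print(active)   # "A": two errors mean two switches, back to the first pair
```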


Figure 6.14: Dynamic fault tolerant system

6.9 Conclusion

The regular structure of the crossbar array has been utilized in the literature to implement

simple PLA-like structures. However, these structures are not capable of integrating homogeneous devices for signal buffering, restoration and inversion. In this chapter, we utilize

the crossbar latch to present the implementation of a complete nano scale system that is

independent of the microscale electronics except for initial programming and signal I/O.

The proposed circuit model was used in simulations of simple combinational and sequential

circuits in order to verify the concept of operation. Utilizing this full adder in an accumula-

tor is straightforward since the proposed circuit model is inherently pipelined and supports

signal latching. The crossbar latch is a homogeneous device within the crossbar array used

to implement the passive combinational circuits described so far in the literature. The sequential circuits proposed are based on the crossbar latch and, thus, are more feasible to

implement compared to the previously described non-homogeneous structures. Such logic

structures are necessary for implementing finite state machines which are major building

blocks in a true nano FPGA and nano processor systems. As a proof-of-concept, we showed

successful simulation results for a simple shifter, and the structure and simulation of a T-flip-flop.

We illustrated a possible organization of a nano FPGA that is capable of performing both sequential and combinational logic and is composed of regular repetitions of one type of device


resources. The architecture of a nano FPGA or a nano processor can be inhomogeneous

from a macro level point of view in the sense that it is composed of finite state machines,

memories, shift registers, adders, etc. However, all the blocks of a nano processor are composed of the same building fabric (the crossbar array) and, thus, can be implemented

using the simple bottom-up assembly fabrication techniques. An automation tool can di-

rectly map complex macro level functionality onto the crossbar fabric and convert parts of

it into crossbar latches. The latches or buffering arrays can be either inserted at the points

where the signal levels are degraded due to passing through a sequence of passive devices, or

simply inserted periodically. Defect tolerance is carried out by defect mapping and routing

around defects. Fault tolerance requires dynamic reconfiguration, and it can be implemented

at a higher level of design hierarchy, which monitors the performance of the circuit.


Chapter 7

Quantum Computing Alternative

In the previous chapters, we studied classical circuits that rely on quantum effects for their

operation such as the WPG device used in the hexagonal array and the hysteresis switch

used in the crossbar array. In this chapter, we look at using the quantum effects directly in

the form of quantum computation. We then investigate the possibility of implementing a

conventional computing architecture that can practically emulate the functionality of a hy-

pothetical quantum computer. Quantum computing algorithms are superior to conventional

algorithms because they can be used to evaluate multiple possibilities for the input values

in parallel using one set of hardware resources.

7.1 Introduction

Quantum mechanics has its roots in describing physical phenomena that cannot be described by classical Newtonian physics. In quantum mechanics, energy is quantized, i.e., it

takes discrete values. The black body radiation catastrophe is the failure of classical mechan-

ics to predict the radiation energy from a black body at increasingly shorter wavelengths.

The classical prediction was that the energy would indefinitely increase. Infinite energy ra-

diation is impossible and nonsensical. Max Planck’s proposal that the energy is quantized

produced a theory that perfectly matches the measured phenomenon and solves the prob-

lem. Einstein described light to consist of particles (photons), instead of just a classical

electromagnetic wave, in order to correctly explain the photo-electric effect. Electrons are

particles that exhibit wave like interferences as they pass through a double slit apparatus.

De Broglie, later, proposed that every particle is associated with a wave. The Schrodinger

wave equation describes the evolution of the wave associated with the particle and can be


used to calculate the energy of the particle and its position.

Matrix mechanics is an alternative way of describing the same physical phenomena by

using matrix algebra instead of solving Schrodinger's equation. With time, wave mechanics and matrix mechanics became one and the same. Due to certain properties of

matrix algebra, new properties of the physical systems have to be inferred. For example, the

Heisenberg uncertainty principle is a consequence of the non-commutative property of matrix multiplication: AB − BA ≠ 0. Observables are physical properties of the system (like

position, momentum, energy, ...etc.). In matrix (quantum) mechanics, they are described by

Hermitian 1 operators in Hilbert space 2. Since matrix multiplication is non-commutative,

a measurement of position followed by a measurement of momentum is not equivalent to

measurement of momentum first and then of position. This leads to uncertainty, or error, and thus to Heisenberg's uncertainty principle.

According to matrix (quantum) mechanics, a measurement on a system would give one

of the eigenvalues of the system, where the system state is described as a superposition of

eigenvectors with different complex weights. The Max Born rule states that the probability

of measuring such an eigenvalue is equal to the squared magnitude of the corresponding complex weight.

Quantum mechanics is now referred to as quantum physics. Quantum physics is the general theory that describes the world, and classical physics is considered an approximation of quantum physics when particles and their energies are described by quantities much larger

than Planck’s constant. The classical approximation in this case will be adequate because

the difference between energy quantum levels will not be measurable and continuum of

energy becomes a valid assumption for this class of particles. Also the wave properties

become impossible to observe as the particle size becomes much larger than its associated wavelength, which is given by De Broglie's equation λ = h/p.

1. A Hermitian matrix is a self-adjoint matrix, i.e., a matrix equal to the complex conjugate transpose of itself: U = U†.
2. A Hilbert space is a vector space with a norm (i.e., the length of a vector in that space is defined).


7.1.1 The qubit

Quantum computers are computers that utilize quantum bits arranged in a quantum register

as opposed to classical computers which use regular bits. In physical reality, a qubit is a

physical subsystem that can be described by two states. For example, a photon is a physical

system which we can describe in terms of multiple physical subsystems. One possible

subsystem is the photon polarization, which can take two states: either horizontal or vertical.

Another physical subsystem of the photon is its direction of travel inside a Mach-Zehnder interferometer (or even the Michelson interferometer), which restricts the direction of travel to one of two possible values. Another physical system is the electron. A two-state subsystem

of the electron is its spin which can be either up or down as in the Stern-Gerlach apparatus.

There are two main similarities between a classical bit and a quantum bit (qubit): the initial state of a qubit is always set to either zero or one, and the final measured state of a qubit is either zero or one, just as for a classical bit. A

qubit, however, during computation is a superposition of the states one and zero and the

amplitudes of the superposition components are an indication of the probabilities of measuring

the outcome of the computation at a certain level in the quantum circuit. A qubit in either

one state or a superposition of a state is described as being in a pure state. The mixed state

describes an ensemble of particles that follow a statistical distribution and the constituents

of that ensemble can take multiple values, as in unpolarized light composed of an ensemble of photons. The mixed state is not the superposition state, which describes the state of one

particle alone.

The qubit using Dirac’s notation is usually written as

|ψ〉 = α0 |0〉+ α1 |1〉 (7.1)

where the ket operators are vectors representing the two states of the system. These two

states are usually denoted as the state −1 (or spin down) represented by |1〉 and the state


Figure 7.1: Possible physical realizations of a qubit as a physical subsystem of a certain phenomenon. (a) The photon direction of travel is restricted to one of two values, as in the Mach-Zehnder interferometer with one photon entering the apparatus. (b) Single-photon direction of travel in the Michelson interferometer, with the directions not necessarily perpendicular but the system states nevertheless orthogonal. (c) The Stern-Gerlach apparatus with the electron spin (up or down) as the qubit.


+1 (or spin up) represented by |0〉.3 In column-vector form, |0〉 = (1, 0)ᵀ and |1〉 = (0, 1)ᵀ.

The rule that governs the coefficients α0 and α1 is that the sum of their squared magnitudes must be equal to one, since the squared magnitudes give the probabilities of measuring each state according to

the Max Born rule. Thus, apart from the initial set state and the final measured state,

the in-between computation internal state (or hidden state) of a qubit must be stored by

a classical computer in two complex fixed point registers because in general the coefficients

are complex and the qubit representation is on the surface of a Bloch sphere.

The maximum value of either coefficient is just one and thus we may assume that a

floating point representation is not necessary. If we assume that a 10-bit fixed point number

representation provides sufficient precision for the coefficients, then this translates to 40 bits of

classical storage just to represent the internal state of one qubit. However, since we still

have the constraint that the sum of the squares of the amplitudes must equal one then we

need to store only 3 real numbers. The internal qubit state may also be represented as a

density matrix given by

ρ = 0.5(I + βxσx + βyσy + βzσz) (7.2)

where βx, βy, βz are real numbers and I, σx, σy, σz are the Pauli matrices. The elements

of the density matrix directly represent the probabilities of the measurements performed on

the system. The Pauli matrices represent rotations around the Cartesian axes and they are

3. It may be confusing to say that +1 is represented by a 0, but this is not the only context in which such notation is used.


Figure 7.2: Bloch sphere representation of possible states of a single qubit

given by:

σ0 = I = [1 0; 0 1] (7.3a)

σx = X = [0 1; 1 0] (7.3b)

σy = Y = [0 −i; i 0] (7.3c)

σz = Z = [1 0; 0 −1] (7.3d)
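Equation (7.2) can be checked numerically. The short sketch below (our own variable names) builds ρ for an example unit Bloch vector and verifies the pure-state properties:

```python
import numpy as np

# Build the density matrix of Eq. (7.2) for a unit Bloch vector and check
# that it is Hermitian, has unit trace, and has purity tr(rho^2) = 1.

I  = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

bx, by, bz = 0.6, 0.0, 0.8                     # example unit Bloch vector
rho = 0.5 * (I + bx * sx + by * sy + bz * sz)

print(np.trace(rho).real)                      # 1.0
print(bool(np.allclose(rho, rho.conj().T)))    # True: Hermitian
print(np.trace(rho @ rho).real)                # 1.0: pure state on the sphere surface
```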

Since β² = βx² + βy² + βz² = 1 on the surface of a sphere of radius 1, the spherical coordinates r, θ, φ can be used with r = 1. Thus, actually only two parameters may be needed to represent a qubit. The general representation for a qubit is

α0 = e^(iγ) cos(θ/2) (7.4)

α1 = e^(iγ) e^(iφ) sin(θ/2) (7.5)

γ is an overall phase factor that can be ignored because it is not observable [61, 77]. We

only need θ and φ. This means that two complex numbers which are actually four values are

represented in this case by only two values. One value is omitted because of the constraint

and the other is omitted because it is not observable. The Bloch sphere representation is

shown in Figure 7.2.
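As a quick numerical check of Eqs. (7.4) and (7.5), with γ dropped, the two angles θ and φ fully determine a normalized qubit; the example angles below are arbitrary:

```python
import numpy as np

# Recover the qubit amplitudes from the Bloch angles (theta, phi),
# ignoring the unobservable global phase gamma.

theta, phi = np.pi / 3, np.pi / 4    # arbitrary example point on the sphere
alpha0 = np.cos(theta / 2)
alpha1 = np.exp(1j * phi) * np.sin(theta / 2)

# Normalization holds by construction (the Born-rule constraint):
print(abs(alpha0) ** 2 + abs(alpha1) ** 2)   # 1.0 (up to rounding)
```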


7.1.2 A system of more than one qubit

There is not much computation that can be done using one qubit. Since the initial state of

a qubit is set as 0 or 1 and the final measured state is also either 0 or 1, it is possible to say

that a single qubit computer cannot be more useful than a single bit computer that can only

do four elementary operations. These operations are: store 0, store 1, invert and leave as is.

If we consider the case of having two qubits, we begin to realize that there is a big difference

between two qubits and the case of two classical bits. If the state of the first qubit is Q1 and

the second qubit is Q2 then the state of the system of the two qubits is given by the tensor

product Q2 ⊗ Q1. For example, if Q1 = |0〉 and Q2 = |1〉 then the system state is4

(0, 1)ᵀ ⊗ (1, 0)ᵀ = (0, 0, 1, 0)ᵀ

The alternative ways of writing this state are |1〉 ⊗ |0〉, |1〉 |0〉, or |10〉.
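The footnoted Matlab call has a direct NumPy equivalent; this small sketch reproduces the tensor product above:

```python
import numpy as np

# Tensor (Kronecker) product of two single-qubit states: Q2 = |1>,
# Q1 = |0>, giving the two-qubit basis state |10>.

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

state = np.kron(ket1, ket0)   # Q2 (x) Q1
print(state)                  # [0 0 1 0]: all weight on |10>
```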

The state of two qubits is in general a superposition of the states |00〉, |01〉, |10〉, |11〉, where |00〉 is given as the vector (1, 0, 0, 0)ᵀ, |01〉 = (0, 1, 0, 0)ᵀ, etc.

ψ = α00 |00〉 + α01 |01〉 + α10 |10〉 + α11 |11〉 (7.6)

The initial interpretation of equation 7.6 in terms of hardware comparison between a

classical computer with two bits and a quantum computer with two qubits is that while

the two bits have a space of 4 values in which they can take only one value at a time,

the quantum bits are actually represented in a Hilbert space where they take all possible 4

values simultaneously at any step of the computation. The only restriction is that the sum of

the squares of the probability coefficients is 1. Thus, the internal (hidden) value of two bits

occupies just 2 bit locations. However, the hidden value of two qubits requires representation

by 4 complex fixed-point registers (or 7 floating-point numbers due to the constraint). In general, the hidden state of a quantum register with n qubits requires 2 × 2ⁿ − 1 classical

4. In Matlab: kron([0;1],[1;0]).


fixed-point registers. To illustrate this further, consider that we need 128 qubits to solve a certain problem. While 128 classical bits can take any value in a 2¹²⁸ space, they can only take one of these values at a time, and to know the state of 128 bits we need to read exactly 128 bits. The state of a quantum register, however, may take the 2¹²⁸ states simultaneously

and to store that in a classical computer we need to store the hidden value in a memory

with 128 address bits. This is a very large memory address space by today’s standards.

It is also required to update all the memory space at every step in the computation due

to entanglement. This example illustrates why the quantum computer is inherently more

powerful than a classical computer and why it would take a classical computer exponential

time and exponential resource requirements to emulate a quantum computer. The trivial

case, in which a qubit is stored in a single memory location capable of holding two complex

values and our 128-qubit example would translate to requiring only 128 memory locations instead of 2¹²⁸, is only valid if all the qubits are isolated from each other and not considered

in a single closed system [55].

7.1.3 Entanglement

Entanglement is a state of the quantum system in which one operation on one qubit will

instantaneously affect the outcome of other qubits. This happens regardless of the distance separating the two qubits, and so instantaneously that if we were to assume communication taking place between the two qubits, the information transfer would have to happen at a

speed greater than the speed of light [41]. One interesting interpretation of entanglement

from reference [77] is that Bob and Alice got married and thus their lives became entangled.

After Alice became pregnant, Bob left on a trip at a speed close to the speed of light to the

edge of the galaxy. When Alice gave birth, Bob’s state instantaneously changed into a father

without any delay in communication or even requirement of communication. Assuming that

the state change required communication to take place then the special relativity theory is

violated.


In the Bell state, two qubits are called an EPR pair after Einstein, Podolsky and Rosen

who considered this interesting behaviour. In this state, a measurement of the first qubit

may yield either a zero or a one with equal probabilities. A subsequent measurement of the

second qubit, however, will yield the exact same outcome as the measurement of the first

qubit. The state of a system can be calculated by the tensor product of individual qubit

states. Inversely, we can factorize a state vector into a tensor product in order to arrive at

the contribution of each particle in the system. However, it is not always possible to make

this factorization. For example, the Bell state given by α00 = α11 = 1/√2 and α10 = α01 = 0, or

ψ = (1/√2)(|0〉 |0〉 + |1〉 |1〉) (7.7)

cannot be factorized into a tensor product. This means that the system state cannot be

described in terms of the contributions of its individual constituents. Another way to look

at it is the following: If the starting state of the system is

ψ = α00 |00〉+ α01 |01〉+ α10 |10〉+ α11 |11〉

and we make a measurement on the rightmost qubit alone and find out it is a zero, then the system we are left with is described as ψ′q0 = α′00 |00〉 + α′10 |10〉. If the measurement yields a one in the rightmost qubit, then the system state becomes ψ′q0 = α′01 |01〉 + α′11 |11〉.

If from the start, the states |10〉 and |01〉 did not exist, then measuring a zero in the rightmost

qubit will lead to a definite measurement of zero for the leftmost qubit and measuring a one


leads to a definite measurement of a one. The four Bell states for two qubits are:

ψ00 = (1/√2)(|0〉 |0〉 + |1〉 |1〉) (7.8a)

ψ01 = (1/√2)(|0〉 |0〉 − |1〉 |1〉) (7.8b)

ψ10 = (1/√2)(|0〉 |1〉 + |1〉 |0〉) (7.8c)

ψ11 = (1/√2)(|0〉 |1〉 − |1〉 |0〉) (7.8d)

The four Bell states for two qubits are linked by unitary transformations. Unitary transfor-

mations do not change the entanglement because they are reversible [104].
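The perfect correlation of the EPR pair described above can be reproduced numerically. The sketch below builds ψ00 by a standard construction not spelled out in the text (a Hadamard on the left qubit of |00〉 followed by a CNOT with the left qubit as control), then samples joint measurements by the Born rule:

```python
import numpy as np

# Build the Bell state psi00 = (|00> + |11>)/sqrt(2) and sample joint
# measurements: the two qubits always agree, never |01> or |10>.

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

psi = CNOT @ np.kron(H, I2) @ np.array([1.0, 0, 0, 0])
print(np.round(psi, 3))       # [0.707 0 0 0.707]

rng = np.random.default_rng(0)
outcomes = rng.choice(4, size=1000, p=np.abs(psi) ** 2)
print(set(outcomes.tolist()) <= {0, 3})   # True: outcomes 00 or 11 only
```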

Multi-particle entanglement is defined using the GHZ state5 and the W state, where for N particles [46]:

|GHZ〉 = (1/√2)(|0〉⊗N + |1〉⊗N) (7.9)

|W〉 = (1/√N)(|100...0〉 + |010...0〉 + · · · + |000...1〉) (7.10)
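For a concrete case, the two states can be built explicitly for N = 3 (a sketch; the helper kron_all is ours):

```python
import numpy as np
from functools import reduce

# Build |GHZ> and |W> for N = 3 qubits as explicit state vectors,
# Eqs. (7.9) and (7.10), and locate their nonzero amplitudes.

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def kron_all(kets):
    return reduce(np.kron, kets)

ghz = (kron_all([ket0] * 3) + kron_all([ket1] * 3)) / np.sqrt(2)
w = (kron_all([ket1, ket0, ket0]) +
     kron_all([ket0, ket1, ket0]) +
     kron_all([ket0, ket0, ket1])) / np.sqrt(3)

print(np.nonzero(ghz)[0])   # [0 7]: only |000> and |111>
print(np.nonzero(w)[0])     # [1 2 4]: the three single-excitation states
```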

7.1.4 Quantum gates

The gates are a mathematical abstraction of the possible mathematical operations that can

be carried out on one or more qubits. In digital logic, all logic operations can be defined in

terms of a universal gate which is the Nand gate. In quantum computation, it is also possible

to define a set of universal gates that can be used to describe the computational steps of any

algorithm. One such set of gates is composed of single qubit operations (rotations) and two

qubit operations (controlled-Not or the quantum XOR).

5. The Greenberger-Horne-Zeilinger state.


The Hadamard gate H, for example, transforms a single qubit as

|φ〉 = H |ψ〉 = (1/√2) [1 1; 1 −1] (α0, α1)ᵀ = (1/√2) (α0 + α1, α0 − α1)ᵀ

= (α0/√2)(|0〉 + |1〉) + (α1/√2)(|0〉 − |1〉) (7.11)

Other single-qubit gates are the S, T, phase shift Pθ and Rθ gates, and the rotation operators Rx, Ry, Rz and their complex conjugates.

S = [1 0; 0 i] (7.12)

T = [1 0; 0 e^(iπ/4)] (7.13)

Pθ = [e^(iθ) 0; 0 e^(iθ)] (7.14)

Rθ = [1 0; 0 e^(iθ)] (7.15)

Rx(θ) = [cos(θ/2) −i sin(θ/2); −i sin(θ/2) cos(θ/2)] (7.16)

Ry(θ) = [cos(θ/2) −sin(θ/2); sin(θ/2) cos(θ/2)] (7.17)

Rz(θ) = [e^(−iθ/2) 0; 0 e^(iθ/2)] (7.18)
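All of the gates above are unitary (U U† = I), which is what preserves the Born-rule normalization from step to step. A quick numerical check for two of them (a sketch; the identity Rx(π) = −iX is standard, not from the text):

```python
import numpy as np

# Verify unitarity for the T gate and an R_x rotation, and check the
# identity R_x(pi) = -i X.

def Rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])
X = np.array([[0, 1], [1, 0]])

for U in (T, Rx(0.7)):
    print(bool(np.allclose(U @ U.conj().T, np.eye(2))))   # True, True

print(bool(np.allclose(Rx(np.pi), -1j * X)))              # True
```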

Two qubit quantum gates are in general called a controlled-U operation, where U is

a rotation operation carried on one qubit. The other qubit is used as the control of the

operation such that the rotation operation is carried out if the control qubit is equal to

|1〉, while the target qubit is left as is if the control qubit is |0〉. An example is the CNOT gate (controlled-NOT), whose classical equivalent is the XOR gate. The input to the

CNOT is a two-qubit system whose state is given by |ψφ〉. This state can be found using a tensor product, and thus |ψφ〉 = (α0β0, α0β1, α1β0, α1β1)ᵀ. The operation of the


Figure 7.3: CNOT gate

CNOT (XOR) is represented by the matrix

GCNOT =
[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 0 1 ]
[ 0 0 1 0 ] (7.19)

The symbol for the CNOT is shown in Figure 7.3.
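Applying the matrix of Eq. (7.19) to the four basis states makes the controlled behaviour explicit (a small sketch):

```python
import numpy as np

# GCNOT acting on the computational basis: the target (LSB) qubit flips
# only when the control (MSB) qubit is 1.

GCNOT = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])

for i in range(4):
    basis = np.zeros(4)
    basis[i] = 1.0
    out_index = int(np.argmax(GCNOT @ basis))
    print(f"|{i:02b}> -> |{out_index:02b}>")
# |00> -> |00>, |01> -> |01>, |10> -> |11>, |11> -> |10>
```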

Gates that operate on three qubits are the Fredkin and Toffoli gates. The Toffoli gate is

a controlled-controlled-Not gate or CCNOT. It flips the third bit if both the first two bits

are 1. The Fredkin gate is a controlled-swap gate. It swaps bits two and three if the first bit

is 1.

GToffoli =
[ 1 0 0 0 0 0 0 0 ]
[ 0 1 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 ]
[ 0 0 0 1 0 0 0 0 ]
[ 0 0 0 0 1 0 0 0 ]
[ 0 0 0 0 0 1 0 0 ]
[ 0 0 0 0 0 0 0 1 ]
[ 0 0 0 0 0 0 1 0 ] (7.20)

GFredkin =
[ 1 0 0 0 0 0 0 0 ]
[ 0 1 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 ]
[ 0 0 0 0 0 1 0 0 ]
[ 0 0 0 0 1 0 0 0 ]
[ 0 0 0 1 0 0 0 0 ]
[ 0 0 0 0 0 0 1 0 ]
[ 0 0 0 0 0 0 0 1 ] (7.21)

7.1.5 Matrix expansion and refactoring for quantum gates

The elementary gate operations are usually defined in terms of the exact number of qubits

that they operate on. Single qubit gates are described using 2 by 2 matrices. A two qubit


gate such as the CNOT has a 4 by 4 matrix which represents an operation on two qubits

where the leftmost (MSB) is the control qubit and the LSB qubit is the target. Fredkin and

Toffoli gates are described using 8 by 8 matrices with the target being the LSB qubit. Since

the size of a quantum register is arbitrary, say 8 qubits, and a CNOT gate may be used

with qubit 5 as the control and qubit 3 as the target, for example, then how can the unitary 2⁸ × 2⁸ operation matrix be generated from the 4 × 4 CNOT matrix in equation 7.19? For

single qubit operations such as the Hadamard gate for instance, it is easy to generate the

required unitary matrix. For example, if we want to apply the Hadamard gate to the 6th

qubit in an 8 bit quantum register described as |Q7Q6Q5Q4Q3Q2Q1Q0〉 then the operation

is given by I ⊗ H ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I, where I is the 2 × 2 identity matrix.

The general algorithm to generate the operation matrix, from any arbitrary one/two/three

qubit gates, is the following [127]:

1. Let matrix M (2ⁿ × 2ⁿ) be the desired operation matrix generated from the gate matrix G (2ᵐ × 2ᵐ), where m < n.

2. Let Q be a set of the indices of the qubits in the n qubit register that G should

operate on. Q’ is the set of the remaining indices.

3. Let Mij = 0 if the binary representation of i differs from the binary represen-

tation of j at the bit positions identified by the numbers in the set Q’.

4. Otherwise, let Mij = Gi∗j∗ where i* is the number constructed by concatenat-

ing the binary bits of i at the bit positions identified by the numbers in the

set Q. Same for j*.

It should be noted that we can avoid memory issues and not store matrix M at all. Every

generated element of M row-wise can be multiplied by the system state and the accumulated

result gives the corresponding row element in the new system state vector.
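The four steps above translate directly into code. The sketch below is our own NumPy rendering of the algorithm (qubit k of the register corresponds to bit k of the row/column index, LSB first; the gate's own qubit ordering follows the order of the `qubits` list):

```python
import numpy as np

def expand_gate(G, qubits, n):
    """Expand gate matrix G (2^m x 2^m), acting on the register qubits
    listed in `qubits`, into the full 2^n x 2^n operation matrix M."""
    m = len(qubits)
    assert G.shape == (2 ** m, 2 ** m)
    others = [q for q in range(n) if q not in qubits]   # the set Q'
    dim = 2 ** n
    M = np.zeros((dim, dim), dtype=complex)
    for i in range(dim):
        for j in range(dim):
            # Step 3: zero if i and j differ at any untouched qubit position
            if any(((i >> q) & 1) != ((j >> q) & 1) for q in others):
                continue
            # Step 4: index into G with the bits at the target positions
            i_star = sum(((i >> q) & 1) << k for k, q in enumerate(qubits))
            j_star = sum(((j >> q) & 1) << k for k, q in enumerate(qubits))
            M[i, j] = G[i_star, j_star]
    return M

# Hadamard on qubit 6 of an 8-qubit register, as in the example above.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
print(expand_gate(H, [6], 8).shape)   # (256, 256)
```

For a CNOT with register qubit c as control and qubit t as target, one would pass `qubits=[t, c]`, so that the list order matches the gate matrix's own convention (target in the LSB, control in the MSB).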


7.1.6 Quantum algorithms and the realization of quantum computers

In [123], Peter Shor classifies the known quantum algorithms into three categories. The first

category is based on using the Fourier transform to find periodicity. Under this category are the factoring and discrete logarithm algorithms [122]. The second category contains Grover's search algorithm, which can perform an exhaustive search of N items in time of the order √N [47]. The third category consists of algorithms for simulating or solving problems in

quantum physics as proposed by Feynman in 1982.

The basic implementation of a real quantum computer relies on trapping a particle that

exhibits quantum behaviour and manipulating this particle or group of particles using an

external stimulus. The last step is to read the result of the manipulation (calculation)

before the decoherence time has elapsed. Decoherence time limits the number of faultless

gate operations that a quantum circuit can perform.

Some of the known prototype implementations of quantum computers use techniques

such as the ion trap method, linear optical manipulation, the magnetic resonance technique

and superconductivity.

In summary, the steps needed in a quantum computer are [150]:

• Initialize qubits to a known state.

• Implement one-qubit gate operations.

• Implement two-qubit gate operations.

• Measure the state.

• Isolate the system from the environment during the gate operations and the

measurement.


Optical quantum computers

These computers are based on manipulating photons generated from a laser source using

optical devices such as mirrors, beam splitters and optical filters [64,111]. Single-qubit gate

operations are performed using mirrors, beam splitters and phase shifters. Measurement

is done by single-photon detectors. Photons tend not to interact with each other, and thus two-qubit gates are very difficult to implement and require many devices and extra photons.

Entangled photons can be produced from a single source using a process called parametric

down conversion [154].

One-way quantum computers

A one-way quantum computer, or cluster state quantum computer (QCC), is a model of a

universal quantum computer [22,110,146]. One possible implementation is based on photons.

The one-way quantum computer does not perform quantum logic on the individual qubits of

the cluster state. In this structure, a highly entangled state, called the cluster state, allows

for quantum computation by single-qubit measurements only. Because of the central role of measurement, the one-way quantum computer is irreversible: measurement destroys

the state. The computation is performed in the following way:

• Qubits of the QCC are brought into the cluster state, which is independent of the algorithm and input, i.e., independent of the computational problem.

• Information is put in, processed and read out by single-qubit projective mea-

surements in directions depending on algorithm, input and sometimes on pre-

vious measurement results.

In the QCC, entangling the whole cluster once and subsequently performing all the measurements is equivalent to simulating a quantum logic network gate by gate. The order and

choices of measurements determine the algorithm computed.


NMR based quantum computers

In NMR quantum computers, molecules in a fluid are manipulated by a magnetic field in a

fashion similar to magnetic resonance imaging. Operations are performed on the ensemble

of molecules through magnetic pulses applied perpendicular to a strong, static field, created

by a large magnet.

Ion traps based quantum computers

In ion traps, an atom is converted into an ion. The ion is confined in a trap by means of

an electromagnetic field acting on the charge. A laser is used to perform four tasks based

on the location where it hits, its energy and pulse width. The first task is setting the value

of the qubit by pumping the atom into a higher energy state. The second task is to apply

one-qubit operations, and the third is applying two-qubit gate operations. The fourth

task is measuring the state of the ion. Entanglement arises when ions are coupled together in the field by their common vibrational modes. The main drawback of this technique is its

lack of scalability and slow operation.

The RezQu architecture

The Resonator/zeroQubit (RezQu) architecture is a recent development: one of the enhancements to quantum states built out of superconducting circuits using Josephson junctions.

7.2 Simulation of quantum computers

By simulation of a quantum computer we mean the implementation of the circuit model of

quantum computation, which was introduced by Deutsch, on a classical computer. The circuit

model is the model that describes the step by step evolution of the state of the quantum

register due to the application of reversible gates. Most of the simulation techniques involve

implementing matrix multiplications in one way or another. The operations involve applying


2ⁿ × 2ⁿ matrices representing unitary operators to a 2ⁿ-element register holding all possible superposition states of an n-qubit system [107, 120]. Reducing the complexity of the matrix multiplication resource requirement is the main issue in most of the literature, as pointed out

in reference [56]. Quantum Information Decision Diagrams (QuIDD) is a technique that uses

Algebraic decision diagrams (ADD) in order to reduce the memory requirements for storing

the matrices [143, 144]. In this technique, the fact that the unitary operators involved in circuit simulations are highly regular and highly sparse is used to build decision diagrams

that represent the matrix. The benefits of such approach diminish quickly with increased

entanglement. Quantum multi-valued decision diagrams (QMDD) is another approach for

storing the matrices using decision diagrams to achieve similar goals with the benefits of less

memory requirements and faster simulations than QuIDD. These benefits can be attributed

to the difference in the implementation of the decision diagram simulator in QuIDD and

QMDD [43, 80]. The worst case, however, still requires exponential resources. The reason behind

this dead end is the mathematics involved in describing quantum systems. It is not possible

to describe a system in terms simpler than a mathematical formulation and in this case the

mathematical formulation requires exponential resources. The only way around this is to

invent new mathematics or a new mathematical formulation. The existing matrix and tensor

based formulation does not map to physical space. In the case of 2 qubits, the space is at

least 6 dimensional and has no intuitive interpretation.
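The exponential cost described above can be made concrete with a short state-vector sketch of the circuit model. This is an illustrative outline only, not one of the simulators cited above; the function full_operator, the choice of NumPy, and the three-qubit example are assumptions made for this sketch.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # Hadamard gate
I = np.eye(2)

def full_operator(gate, k, n):
    """Embed a one-qubit gate on qubit k into a 2^n x 2^n operator."""
    op = np.array([[1.0]])
    for i in range(n):
        op = np.kron(op, gate if i == k else I)  # I x ... x U x ... x I
    return op

n = 3
state = np.zeros(2**n)
state[0] = 1.0                                   # register starts in |000>
for k in range(n):                               # Hadamard on every qubit
    state = full_operator(H, k, n) @ state
# state is now the uniform superposition: all 2^n amplitudes equal 1/sqrt(8),
# yet each embedded one-qubit operator already needed 2^n x 2^n = 64 entries.
```

Even at n = 3, each embedded one-qubit gate is an 8 × 8 matrix; adding one qubit doubles the dimension, which is the exponential wall discussed above.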

7.3 Emulating quantum computation using classical resources

The direct method to emulate quantum computation is to build a system that computes

matrix operations and stores the intermediate results in registers. This is shown in Figure 7.4.

These intermediate results represent the hidden state of the quantum register. Measurement

can be emulated using pseudo-random number generators. This requires value thresholds

that are calculated in accordance with the squares of the amplitudes of the complex numbers


Figure 7.4: Direct implementation of a quantum emulator using registers and matrix operations represented by gates

as stated by Born's rule. There have been many attempts to implement this direct approach in the literature [45, 62, 140]. This direct approach suffers from the problem of exponential

resource requirements, and it is limited to very simple operations. In the next sections, we

investigate the requirements of a quantum computer emulator and whether it is possible to

fully implement it using classical resources.
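The measurement emulation described above can be sketched as follows; the function name measure and the use of Python's pseudo-random generator are illustrative assumptions, not a prescribed implementation.

```python
import random

def measure(amplitudes, rng=random.random):
    """Pick a basis state: thresholds are cumulative |amplitude|^2 (Born rule)."""
    r = rng()
    cumulative = 0.0
    for state, a in enumerate(amplitudes):
        cumulative += abs(a) ** 2
        if r < cumulative:
            return state
    return len(amplitudes) - 1       # guard against floating-point round-off

# Equal superposition of |0> and |1>: each outcome has probability 0.5.
amps = [2 ** -0.5, 2 ** -0.5]
random.seed(1)                       # reproducible pseudo-random draws
counts = [0, 0]
for _ in range(10000):
    counts[measure(amps)] += 1
# counts ends up close to [5000, 5000]
```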

7.3.1 Approximate storage requirement for emulating a qubit

The storage requirements for the values of θ and φ from equation 7.5 depend on the practical

choice of angle resolution. The angle resolution can be a fixed parameter or a dynamic

parameter as we will show in the practical quantum emulator. The representation of the

angle projections of a point on the sphere requires lookup tables for the sin and cos functions.

The lookup tables are indispensable since the operations on more than one qubit will involve

the manipulation of α0 and α1 (or sin(θ/2), sin(φ), cos(θ/2) and cos(φ) ) not the angles

themselves unless the operation is a pure rotation carried on one qubit only. In fact all

the elementary operations that can be performed on a single qubit are only rotations. The

number of entries in the lookup table is a function of the angle resolution while the number

of bits per entry is a function of the required precision of the representation of the sin and

cos functions. It is sufficient to store sin / cos pairs of 1/8 of the angles between 0 and 360


Table 7.1: Sin/Cos reduced lookup table by exploiting Sin/Cos octant symmetry

Angle        sin             cos
0 : 45       sin(0 : 45)     cos(0 : 45)
45 : 90      cos(45 : 0)     sin(45 : 0)
90 : 135     cos(0 : 45)     −sin(0 : 45)
135 : 180    sin(45 : 0)     −cos(45 : 0)
180 : 360    −sin(0 : 180)   −cos(0 : 180)

or just up to the 45° angle. This is because the values of the sin function for the angles from 45 to 90 are identical to the values of the cos function for the angles decrementing from 45 to 0, while the cos of the angles from 45 to 90 is identical to the values of the sin function for the angles decrementing from 45 to 0. Thus, for the angles 45 to 90, we only need to reverse the reading of the lookup table and swap the values of the sin and cos. By symmetry, we get the values of the sin and cos in the other three quadrants by just manipulating a sign bit, as shown in Table 7.1.
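The reduced lookup of Table 7.1 can be sketched in code. The 1-degree angle resolution below is an arbitrary assumption (the emulator treats resolution as a fixed or dynamic parameter), and only the 0–45 degree octant is stored.

```python
import math

# sin/cos pairs for 0..45 degrees only (1-degree resolution assumed here)
TABLE = [(math.sin(math.radians(a)), math.cos(math.radians(a)))
         for a in range(46)]

def sincos(angle):
    """(sin, cos) of an integer angle in degrees, read from the 0..45 table."""
    a = angle % 360
    if a >= 180:                 # lower half-plane: negate both values
        s, c = sincos(a - 180)
        return -s, -c
    if a >= 90:                  # second quadrant: shift by 90, flip cos sign
        s, c = sincos(a - 90)
        return c, -s
    if a > 45:                   # reversed read of the table with a swap
        s, c = TABLE[90 - a]
        return c, s
    return TABLE[a]
```

Every other angle is recovered by reversing the table read, swapping sin and cos, or flipping a sign bit, exactly as the table prescribes.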

7.3.2 Qubit representation using algebraic integers

Storing tables of trigonometric function values using finite precision leads to accumulating

errors in the emulation of quantum calculations. To avoid this problem, algebraic integers

can be used to represent a complex number [75]. A complex number can be represented by

the following equation:

Z(ω) = Σ_{j=0}^{R/2−1} a_j ω^j                                  (7.22)

where a_j ∈ Z (the set of integers), and ω = e^{2πi/R}. When R = 4, this represents the usual way of

representing a complex number using a real part and an imaginary part, albeit the numbers

used are restricted to being integers. These integer values are an interpretation of the

finite number of bits used to represent a number in memory. When R = 12, which means

six integers are used to represent a single complex number, the representation becomes

dense and greater accuracy can be achieved in the computation as shown in Figure 7.5.


Mathematical operations such as addition and multiplication are defined on the algebraic

integer representation [76].
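A minimal sketch of arithmetic in this representation follows. It assumes that products are reduced using ω^{R/2} = −1 (which holds since ω = e^{2πi/R}); the helper names ai_mul and ai_eval are illustrative, not taken from [75, 76].

```python
import cmath

def ai_mul(a, b, R):
    """Multiply two coefficient vectors of length R//2 exactly in integers."""
    n = R // 2
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] += ai * bj
            else:
                out[i + j - n] -= ai * bj    # reduce using w**n = -1
    return out

def ai_eval(a, R):
    """Map a coefficient vector back to an ordinary complex number."""
    w = cmath.exp(2j * cmath.pi / R)
    return sum(aj * w ** j for j, aj in enumerate(a))

# R = 4 reduces to Gaussian integers: (1 + 2i)(3 + i) = 1 + 7i
p = ai_mul([1, 2], [3, 1], R=4)      # p == [1, 7]
```

The coefficient arithmetic stays exact; rounding enters only at the final ai_eval conversion, which is the point of the representation.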

7.3.3 Emulating superposition of states

Systems that deal with more than two logic values usually use multiple wires to represent a single value. Digital communication systems deal with binary signals, but may transmit several data streams simultaneously in a single channel (with space as the typical single communication channel). These transmit/receive systems are digital computers that represent information in an analog form and superimpose multiple data sources in a single channel.

a single wire and thus greatly reduce the number of circuit elements required to handle the

information. This system would store the internal (hidden) state of the quantum system

using a system of orthogonal signal generators. These signal generators can be random noise

generators, or sinusoidal generators, or even generators of dilated wavelet bases. A single hardware FFT-calculator block, as found in an OFDM system, can be used to generate all the sinusoids required to emulate the state of a quantum register in superposition.

Wavelets can produce similar results with better spectral performance than time-gated sinusoids, because of their inherently fast spectral decay. Orthogonality is inherent in constructed wavelets such as the Daubechies family of wavelets, because it is a condition in their construction. Dilated/translated versions of a Haar wavelet are clearly orthogonal, as shown in Figure 7.6. A similar diagram for a Daubechies-2 wavelet has a similar characteristic, although it may not be clear from the plots in Figure 7.7.
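The orthogonality illustrated in Figure 7.6 can be verified numerically with a small sketch; the 1024-point sampling grid and the unit support are arbitrary assumptions made here.

```python
# Discrete check that dilated Haar wavelets (zero translation) are orthogonal.
N = 1024                       # samples over the base support [0, 1)

def haar(dilation):
    """Sampled Haar wavelet psi(2**dilation * t) on [0, 1), zero elsewhere."""
    out = []
    for k in range(N):
        t = (2 ** dilation) * k / N
        if 0 <= t < 0.5:
            out.append(1.0)
        elif 0.5 <= t < 1.0:
            out.append(-1.0)
        else:
            out.append(0.0)
    return out

def inner(u, v):
    """Discrete inner product approximating the integral over [0, 1)."""
    return sum(x * y for x, y in zip(u, v)) / N

psi0, psi1, psi2 = haar(0), haar(1), haar(2)
# inner(psi0, psi1) == inner(psi0, psi2) == inner(psi1, psi2) == 0
```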

7.3.4 Emulating entanglement

Although entanglement is different from classical correlation, in the sense that it does not require physical communication or interaction between the entities, we may still try to emulate it using classical methods. The method we investigated is to have control signals between


Figure 7.5: Complex number representation using algebraic integers. (a) R = 4, using 2 variables. (b) R = 12, using 6 variables.


Figure 7.6: Orthogonality of dilated Haar wavelets. The translation is zero.


Figure 7.7: Dilated Daubechies-2 wavelet.


the qubit signal generators. If a qubit is represented as a phase locked loop (PLL) with

its control voltage coming from another PLL, then by manipulating one PLL (one qubit),

we immediately change the output (state) of the other. We tried to extend this idea to

more than two qubits with no success because it does not scale. The number of connections

required will still be exponential and the system will take time to reach a stable state. The

model for such connectivity and control is the vector representing the state of the quantum

register. This vector is a simple representation, and it is not possible to use a more complex

representation (such as PLLs and control voltages) in order to reach a simpler representation than the starting point. In fact, a mathematical representation is the simplest representation, and is a terminal end: it is used to model a physical system, but a physical system cannot be used to efficiently model a mathematical representation. Although this result may seem obvious, the large number of attempts to build a quantum emulator proves otherwise. The reason, in my opinion, is that the mathematics used in quantum mechanics

seems very simple (simple when compared to modern communication systems, for instance),

and this makes it difficult to understand why it is not possible to build a classical system

that can operate in a similar way to the quantum system.

7.4 Conclusion

In this chapter we gave an introduction to qubits and how they are different from classical

bits and the concept of entanglement. We illustrated the difficulty of simulating a quantum

computation step using classical computation which suffers from bottlenecks in memory size,

memory access and control of massively parallel flow of information. The technique used in

all the simulators is to calculate a series of matrix tensor products to find the final proba-

bilistic state of the system. The size of the matrices involved grows exponentially with the

number of qubits. A hardware-accelerated emulator may use a coprocessor to accelerate computing matrix products, or use various algebraic techniques for matrix factorization and sparse matrices to reduce the amount of computation required. This, however, does not account for worst-case scenarios, which are typical in the case of Shor's algorithm due to

entanglement. We explored the option of emulating the quantum computation using orthogonal signal representations such as orthonormal wavelets or OFDM. However, it is impossible to describe the gate model of quantum computation in terms that are simpler than the matrix tensor products. Orthogonal systems can be used to describe superposition of states,

but they cannot be used to emulate entanglement even using control signals interconnected

between the signal sources. The reason is the complexity required without any benefit for

simplifying the system representation.


Chapter 8

Conclusions and Future Work

In this thesis, the following contributions are made:

1. Nanoscale devices, such as the wrap-gate quantum wire devices in hexagonal BDD-based arrays fabricated at Hokkaido University by the group of Professor Kasai, are prone to errors due to noise and manufacturing defects. A technique for fault tolerance in BDD-based circuits using error correction has been developed in this thesis to address this problem. A tool to automate the generation of such circuits was created.

2. The hexagonal nanowire arrays are implemented in a planar technology. Since the error-correcting BDDs have complicated structures with multiple crossings, they cannot be directly mapped to a planar layout. A tool that automates the generation of planar layouts for such circuits has been created in this research. This tool is planned to be used by the group of Professor Kasai.

3. A typical molecular circuit assembly, which is the crossbar nanoarray, has

been investigated, and simulation results and performance evaluation for such

architectures are derived. Fault tolerance for the crossbar circuit relies on its regular structure, composed of identical devices with ample routing resources. These resources are evaluated in the thesis for use in dynamically routing around defects.

4. An alternative nanocomputing model, namely, the gate model of quantum

computation has been investigated. It was found that it is not feasible to

emulate it using orthogonal signal representations such as orthonormal wavelets or OFDM, since it was shown to be impossible to describe the gate model of

quantum computation in terms that are simpler than the matrix operations.

Orthogonal systems can be used to describe superposition of states, but they

are unable to account for interaction between states and entanglement.

For future work, we propose the following:

1. Add the capability of BDD decomposition to our BDD tool such that it pro-

duces better results for synthesizing large planar diagrams. Decomposition of

large binary circuits is a familiar topic and has been implemented in the ABC

academic tool [15].

2. Study the memristor crossbar array in comparison to the crossbar latch array [14, 59, 63].

3. Implement the autonomous self repair mechanism discussed in section 6.8 using

dual self checking pairs. An FPGA partitioned into 4 zones will be used to

represent the 4 blocks. A CPU running the error checking and repair algorithm

will be used to monitor the performance of the active blocks and it will carry

out the task of reconfiguring the standby faulty blocks.

4. Investigate new candidates for quantum computing, other than the difficult-to-implement gate model, such as adiabatic quantum computation and topological quantum computation [70, 115].


Appendix A

BDD processor tool

In this thesis work, software called the BDD processor tool was developed. The software is written in C++ and C#. It is possible to recompile the code on a Linux box using Mono. The software uses Graphviz internally to generate a PostScript file for the reduced BDD (visit the Graphviz website to download and install it). The tool's "bin" folder contains the executable "BDDprocessor.exe", which launches the GUI of the tool shown in Figure A.1.

From the File menu, select Open file. The current version accepts two file formats, PLA

and BLIF, as shown in Figure A.2. Click the "Process File" button. (This step generates a reduced shared BDD and a variable ordering file, and uses the dot program from Graphviz to produce a graphical BDD in both JPEG and PostScript formats. All the generated files will be

located in the same folder as the input file.)

You can select variable ordering for the BDD diagram and then click ”Process” again.

It is possible to add more ordering options by editing the source code of the software. The

ordering is executed by the CUDD library. This version includes the most popular reordering methods: SIFT, exact reordering (for the best possible result, but at the expense of processing time) and manual reordering.

The Tools menu, shown in Figure A.4, presents the currently available tools to process

the diagram. The first tool is to generate a planar BDD for the generation of a planar layout.

The second tool generates a SPICE netlist and simulates it using Ngspice. The third

tool is to construct an error correcting PLA description based on the logic function of the

BDD.

Click ”Planarize” to start the planar layout generation program shown in Figure A.5.

Select the algorithm and whether to connect nodes to the terminal zero or leave them floating.


Figure A.1: Software main window


Figure A.2: Open file type pla or blif


Figure A.3: BDD variable reordering choices

Figure A.4: Tools menu


Figure A.5: Planar layout generation

Click "Process File". Dummy nodes will be created and inserted so that routing occurs only between adjacent levels. The log window will report the number of duplicated nodes required

and produce a planar BDD.

If you unselect "Connect to zero", you will get an output similar to the one in

Figure A.6.

There are several options to export the generated layout, as shown in Figure A.7. There is a text description that can be fed to any of the popular tools, and it is possible to

export images. The number of images and their sizes are controlled by the grid size and the

bitmap width in the layout generation window.

For SPICE simulations, we launch the tool shown in Figure A.8 and supply a BDD node


Figure A.6: Planar layout without connections to a zero terminal

Figure A.7: Planar layout export options


description, an inverter description and a transistor models file.

The inverter description has to use the following node names: Nvdd, Nin, Nout. Example:

MQ1 Nvdd Nin Nout Nvdd pmosmodel (L=16nm)

MQ2 Nout Nin 0 0 nmosmodel (L=16nm)

The BDD node description has to use the following node names:

Nvdd, Ns, Ns_bar, Nin1, Nin2, Nout

Example:

MQ1 Nin1 Ns Nout Nin1 nmosmodel (L=16nm)

MQ2 Nin1 Ns_bar Nout Nout pmosmodel (L=16nm)

MQ3 Nin2 Ns_bar Nout Nin2 nmosmodel (L=16nm)

MQ4 Nin2 Ns Nout Nout pmosmodel (L=16nm)

The transistor model file includes the model description for BSIM4 (Spice level 54).

The simulation window has options to define complement signals in terms of inverters or voltage sources, or to ignore them (short to ground) if a BDD node generates its own control signal complement internally. The control signal grid allows the definition of constant or

periodic voltage sources with added noise. The absolute noise power is required as an input

in the grid. For a specific SNR, refer to the discussion on calculating the noise power. Click on

run simulation and this will launch Ngspice and simulate the circuit. The output log file will

be displayed in the tab called ”Sim Log”. To view the waveforms, there are two options. The

first option is to use the waveform plotter with the software. Click on the button ”waveform

plotter”. The second option is to use Matlab and the script ”ngspice.m”, included with this

software. This script will parse the Ngspice output and place corresponding variables in the

Matlab workspace.


Figure A.8: Spice netlist generator and simulator window


Figure A.9: Error Correction PLA generation window


The third tool shown in Figure A.9 is the error correction generation tool. In this tool,

the diagram is analyzed to produce a complete truth table of the logic functions in the shared

diagram. A Hamming-code parity generator matrix is automatically created and truncated

if necessary in the case of shortened codes. Shortened codes are Hamming codes that do not satisfy the general rule for code parameters, (2^m − 1, 2^m − m − 1), where m is the number of parity bits; for example, the (5, 2) code is used to encode two bits. It is possible

to edit the parity generator matrix before applying it. The tool will then generate a new

PLA file that describes the diagram with error correction. The number of inputs in the file

is increased due to the addition of the parity bits.
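What follows is a hedged sketch of such a parity-matrix construction and shortening; the helper names, the systematic codeword layout and the choice of dropped rows are assumptions for illustration, not the tool's actual code.

```python
def hamming_parity_matrix(m):
    """Parity submatrix P of a (2^m - 1, 2^m - 1 - m) Hamming code.
    Rows correspond to data bits, columns to the m parity bits."""
    rows = []
    for v in range(3, 2 ** m):
        if v & (v - 1):             # skip 1, 2, 4, ... (parity positions)
            rows.append([(v >> b) & 1 for b in range(m)])
    return rows

def encode(data, P):
    """Systematic codeword: data bits followed by XOR-accumulated parity."""
    parity = [0] * len(P[0])
    for bit, row in zip(data, P):
        if bit:
            parity = [p ^ r for p, r in zip(parity, row)]
    return data + parity

P74 = hamming_parity_matrix(3)      # 4 data rows x 3 parity columns
P52 = P74[:2]                       # shortened: keep 2 data rows -> (5, 2)
codeword = encode([1, 0], P52)      # 2 data bits + 3 parity bits
```

Here P74 is the parity submatrix of the (7, 4) Hamming code, and keeping only two of its data rows yields the shortened (5, 2) code mentioned above.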


Appendix B

SPICE net listings for crossbar circuits

The following are SPICE net listings used in the simulations of the crossbar-based nanocircuits.

Nano-adder

* source NANOADDER

V_VCp1 N226723 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

V_VCm0 N226742 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (3.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

V_Vx N226175 0

+PULSE 5 0 0 1n 1n 1u 2u

R_R1 0 N226983 100Meg

V_VCp0 N226777 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

V_VCm1 N226801 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (3.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

R_R2 0 N227012 100Meg

V_Vy N226179 0

+PULSE 5 0 0 1n 1n 2u 4u

D_Adder_D3 Adder_N01927 Adder_N00766 Dbreak

D_Adder_D15 Adder_N04509 Adder_N51462 Dbreak

D_Adder_D5 Adder_N01927 Adder_N84146 Dbreak

D_Adder_D12 Adder_N04325 N226983 Dbreak

D_Adder_D16 Adder_N04509 Adder_N09858 Dbreak

R_Adder_R4 Adder_N04325 VCC_BAR 100k

R_Adder_R9 0 N226983 1Meg

D_Adder_D9 Adder_N02599 Adder_N84146 Dbreak

R_Adder_R6 Adder_N04349 VCC_BAR 100k

D_Adder_D4 Adder_N00476 Adder_N51462 Dbreak

D_Adder_D14 Adder_N09474 N226983 Dbreak

R_Adder_R8 0 N227012 1Meg

D_Adder_D8 Adder_N02599 Adder_N51462 Dbreak

D_Adder_D20 Adder_N04325 Adder_N09858 Dbreak

D_Adder_D1 Adder_N00476 Adder_N00766 Dbreak

D_Adder_D7 Adder_N02599 N227012 Dbreak

D_Adder_D10 Adder_N01927 N227012 Dbreak

D_Adder_D13 Adder_N04349 N226983 Dbreak

D_Adder_D17 Adder_N04349 Adder_N33511 Dbreak

D_Adder_D37 Adder_N09474 Adder_N00766 Dbreak

R_Adder_R1 Adder_N00476 VCC_BAR 100k

D_Adder_D11 Adder_N04509 N226983 Dbreak


X_Adder_clatch6_S1 N226801 Adder_clatch6_N180534 N226801

+ Adder_clatch6_N175838 SCHEMATIC3_Adder_clatch6_S1

R_Adder_clatch6_R10 Adder_clatch6_N175838 Adder_N37004 1k

D_Adder_clatch6_D18 N226645 Adder_clatch6_N180534 Dbreak

C_Adder_clatch6_C1 0 Adder_N37004 10p

R_Adder_clatch6_R7 Adder_N37004 Adder_clatch6_N175820 1k

X_Adder_clatch6_S3 Adder_clatch6_N180534 N226777 Adder_clatch6_N175820

+ N226777 SCHEMATIC3_Adder_clatch6_S3

X_Adder_clatch5_S1 N226742 Adder_clatch5_N180534 N226742

+ Adder_clatch5_N175838 SCHEMATIC3_Adder_clatch5_S1

R_Adder_clatch5_R10 Adder_clatch5_N175838 Adder_N84146 1k

D_Adder_clatch5_D18 N226645 Adder_clatch5_N180534 Dbreak

C_Adder_clatch5_C1 0 Adder_N84146 10p

R_Adder_clatch5_R7 Adder_N84146 Adder_clatch5_N175820 1k

X_Adder_clatch5_S3 Adder_clatch5_N180534 N226723 Adder_clatch5_N175820

+ N226723 SCHEMATIC3_Adder_clatch5_S3

D_Adder_D22 Adder_N04325 Adder_N33511 Dbreak

X_Adder_clatch4_S1 N226801 Adder_clatch4_N180534 N226801

+ Adder_clatch4_N175838 SCHEMATIC3_Adder_clatch4_S1

R_Adder_clatch4_R10 Adder_clatch4_N175838 Adder_N33511 1k

D_Adder_clatch4_D18 N226179 Adder_clatch4_N180534 Dbreak

C_Adder_clatch4_C1 0 Adder_N33511 10p

R_Adder_clatch4_R7 Adder_N33511 Adder_clatch4_N175820 1k

X_Adder_clatch4_S3 Adder_clatch4_N180534 N226777 Adder_clatch4_N175820

+ N226777 SCHEMATIC3_Adder_clatch4_S3

X_Adder_clatch3_S1 N226742 Adder_clatch3_N180534 N226742

+ Adder_clatch3_N175838 SCHEMATIC3_Adder_clatch3_S1

R_Adder_clatch3_R10 Adder_clatch3_N175838 Adder_N51462 1k

D_Adder_clatch3_D18 N226179 Adder_clatch3_N180534 Dbreak

C_Adder_clatch3_C1 0 Adder_N51462 10p

R_Adder_clatch3_R7 Adder_N51462 Adder_clatch3_N175820 1k

X_Adder_clatch3_S3 Adder_clatch3_N180534 N226723 Adder_clatch3_N175820

+ N226723 SCHEMATIC3_Adder_clatch3_S3

D_Adder_D18 Adder_N09474 Adder_N51462 Dbreak

X_Adder_clatch2_S1 N226801 Adder_clatch2_N180534 N226801

+ Adder_clatch2_N175838 SCHEMATIC3_Adder_clatch2_S1

R_Adder_clatch2_R10 Adder_clatch2_N175838 Adder_N09858 1k

D_Adder_clatch2_D18 N226175 Adder_clatch2_N180534 Dbreak

C_Adder_clatch2_C1 0 Adder_N09858 10p

R_Adder_clatch2_R7 Adder_N09858 Adder_clatch2_N175820 1k

X_Adder_clatch2_S3 Adder_clatch2_N180534 N226777 Adder_clatch2_N175820

+ N226777 SCHEMATIC3_Adder_clatch2_S3

D_Adder_D19 Adder_N09474 Adder_N84146 Dbreak

X_Adder_clatch1_S1 N226742 Adder_clatch1_N180534 N226742

+ Adder_clatch1_N175838 SCHEMATIC3_Adder_clatch1_S1

R_Adder_clatch1_R10 Adder_clatch1_N175838 Adder_N00766 1k

D_Adder_clatch1_D18 N226175 Adder_clatch1_N180534 Dbreak

C_Adder_clatch1_C1 0 Adder_N00766 10p

R_Adder_clatch1_R7 Adder_N00766 Adder_clatch1_N175820 1k

X_Adder_clatch1_S3 Adder_clatch1_N180534 N226723 Adder_clatch1_N175820

+ N226723 SCHEMATIC3_Adder_clatch1_S3

D_Adder_D23 Adder_N04325 Adder_N84146 Dbreak

R_Adder_R2 Adder_N02599 VCC_BAR 100k

R_Adder_R3 Adder_N01927 VCC_BAR 100k

D_Adder_D35 Adder_N04509 Adder_N37004 Dbreak

D_Adder_D36 Adder_N04349 Adder_N37004 Dbreak

R_Adder_R5 Adder_N04509 VCC_BAR 100k

D_Adder_D6 Adder_N00476 N227012 Dbreak

D_Adder_D21 Adder_N04349 Adder_N00766 Dbreak

V_Adder_V1 VCC_BAR 0 5Vdc

R_Adder_R7 Adder_N09474 VCC_BAR 100k

V_Vz N226645 0

+PULSE 5 0 0 1n 1n 4u 8u

.subckt SCHEMATIC3_Adder_clatch6_S1 1 2 3 4

S_Adder_clatch6_S1 3 4 1 2 _Adder_clatch6_S1

RS_Adder_clatch6_S1 1 2 1G

.MODEL _Adder_clatch6_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0


.ends SCHEMATIC3_Adder_clatch6_S1

.subckt SCHEMATIC3_Adder_clatch6_S3 1 2 3 4

S_Adder_clatch6_S3 3 4 1 2 _Adder_clatch6_S3

RS_Adder_clatch6_S3 1 2 1G

.MODEL _Adder_clatch6_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch6_S3

.subckt SCHEMATIC3_Adder_clatch5_S1 1 2 3 4

S_Adder_clatch5_S1 3 4 1 2 _Adder_clatch5_S1

RS_Adder_clatch5_S1 1 2 1G

.MODEL _Adder_clatch5_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch5_S1

.subckt SCHEMATIC3_Adder_clatch5_S3 1 2 3 4

S_Adder_clatch5_S3 3 4 1 2 _Adder_clatch5_S3

RS_Adder_clatch5_S3 1 2 1G

.MODEL _Adder_clatch5_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch5_S3

.subckt SCHEMATIC3_Adder_clatch4_S1 1 2 3 4

S_Adder_clatch4_S1 3 4 1 2 _Adder_clatch4_S1

RS_Adder_clatch4_S1 1 2 1G

.MODEL _Adder_clatch4_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch4_S1

.subckt SCHEMATIC3_Adder_clatch4_S3 1 2 3 4

S_Adder_clatch4_S3 3 4 1 2 _Adder_clatch4_S3

RS_Adder_clatch4_S3 1 2 1G

.MODEL _Adder_clatch4_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch4_S3

.subckt SCHEMATIC3_Adder_clatch3_S1 1 2 3 4

S_Adder_clatch3_S1 3 4 1 2 _Adder_clatch3_S1

RS_Adder_clatch3_S1 1 2 1G

.MODEL _Adder_clatch3_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch3_S1

.subckt SCHEMATIC3_Adder_clatch3_S3 1 2 3 4

S_Adder_clatch3_S3 3 4 1 2 _Adder_clatch3_S3

RS_Adder_clatch3_S3 1 2 1G

.MODEL _Adder_clatch3_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch3_S3

.subckt SCHEMATIC3_Adder_clatch2_S1 1 2 3 4

S_Adder_clatch2_S1 3 4 1 2 _Adder_clatch2_S1

RS_Adder_clatch2_S1 1 2 1G

.MODEL _Adder_clatch2_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch2_S1

.subckt SCHEMATIC3_Adder_clatch2_S3 1 2 3 4

S_Adder_clatch2_S3 3 4 1 2 _Adder_clatch2_S3

RS_Adder_clatch2_S3 1 2 1G

.MODEL _Adder_clatch2_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch2_S3

.subckt SCHEMATIC3_Adder_clatch1_S1 1 2 3 4

S_Adder_clatch1_S1 3 4 1 2 _Adder_clatch1_S1

RS_Adder_clatch1_S1 1 2 1G

.MODEL _Adder_clatch1_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch1_S1

.subckt SCHEMATIC3_Adder_clatch1_S3 1 2 3 4

S_Adder_clatch1_S3 3 4 1 2 _Adder_clatch1_S3

RS_Adder_clatch1_S3 1 2 1G

.MODEL _Adder_clatch1_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_Adder_clatch1_S3


4-1 nano Multiplexer

* source NANOADDER

.EXTERNAL INPUT A

.EXTERNAL INPUT B

.EXTERNAL INPUT C

.EXTERNAL INPUT D

.EXTERNAL INPUT S0

.EXTERNAL INPUT S1

.EXTERNAL INPUT VCm0

.EXTERNAL INPUT VCm1

.EXTERNAL INPUT VCp0

.EXTERNAL INPUT VCp1

.EXTERNAL OUTPUT Mux

D_D20 N258177 N258545 Dbreak

D_D3 N256495 N256357 Dbreak

D_D16 N256495 B Dbreak

D_D21 N256295 D Dbreak

D_D1 N256295 N256357 Dbreak

V_VCp1 N559517 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

D_D17 N258177 A Dbreak

V_Vz N559996 0

+PULSE 5 0 0 1n 1n 4u 8u

X_clatch1_S1 VCM0 clatch1_N180534 VCM0 clatch1_N175838 SCHEMATIC3_clatch1_S1

+

R_clatch1_R7 N256357 clatch1_N175820 1k

R_clatch1_R10 clatch1_N175838 N256357 1k

C_clatch1_C1 0 N256357 1n

D_clatch1_D18 S0 clatch1_N180534 Dbreak

X_clatch1_S3 clatch1_N180534 VCP1 clatch1_N175820 VCP1 SCHEMATIC3_clatch1_S3

+

V_VCm0 N559523 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (3.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

D_D8 N256535 N258555 Dbreak

D_D25 N258177 MUX Dbreak

X_clatch2_S1 VCM1 clatch2_N180534 VCM1 clatch2_N175838 SCHEMATIC3_clatch2_S1

+

R_clatch2_R7 N258545 clatch2_N175820 1k

R_clatch2_R10 clatch2_N175838 N258545 1k

C_clatch2_C1 0 N258545 1n

D_clatch2_D18 S0 clatch2_N180534 Dbreak

X_clatch2_S3 clatch2_N180534 VCP0 clatch2_N175820 VCP0 SCHEMATIC3_clatch2_S3

+

V_V1 VCC_BAR 0 5Vdc

V_Vx N559860 0

+PULSE 5 0 0 1n 1n 1u 2u

D_D14 N258177 N258583 Dbreak

X_clatch3_S1 VCM0 clatch3_N180534 VCM0 clatch3_N175838 SCHEMATIC3_clatch3_S1

+

R_clatch3_R7 N258555 clatch3_N175820 1k

R_clatch3_R10 clatch3_N175838 N258555 1k

C_clatch3_C1 0 N258555 1n

D_clatch3_D18 S1 clatch3_N180534 Dbreak

X_clatch3_S3 clatch3_N180534 VCP1 clatch3_N175820 VCP1 SCHEMATIC3_clatch3_S3


+

V_VCp0 N559529 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

X_clatch4_S1 VCM1 clatch4_N180534 VCM1 clatch4_N175838 SCHEMATIC3_clatch4_S1

+

R_clatch4_R7 N258583 clatch4_N175820 1k

R_clatch4_R10 clatch4_N175838 N258583 1k

C_clatch4_C1 0 N258583 1n

D_clatch4_D18 S1 clatch4_N180534 Dbreak

X_clatch4_S3 clatch4_N180534 VCP0 clatch4_N175820 VCP0 SCHEMATIC3_clatch4_S3

+

D_D13 N256495 N258583 Dbreak

D_D24 N256495 MUX Dbreak

D_D4 N256295 N258555 Dbreak

V_VCm1 N559535 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (3.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

R_R1 N256295 VCC_BAR 100k

R_R3 N256495 VCC_BAR 100k

D_D23 N256535 MUX Dbreak

R_R4 N258177 VCC_BAR 100k

D_D12 N256535 N258545 Dbreak

R_R2 N256535 VCC_BAR 100k

R_R8 0 MUX 1Meg

V_Vy N559866 0

+PULSE 5 0 0 1n 1n 2u 4u

D_D22 N256295 MUX Dbreak

D_D15 N256535 C Dbreak

.subckt SCHEMATIC3_clatch1_S1 1 2 3 4

S_clatch1_S1 3 4 1 2 _clatch1_S1

RS_clatch1_S1 1 2 1G

.MODEL _clatch1_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch1_S1

.subckt SCHEMATIC3_clatch1_S3 1 2 3 4

S_clatch1_S3 3 4 1 2 _clatch1_S3

RS_clatch1_S3 1 2 1G

.MODEL _clatch1_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch1_S3

.subckt SCHEMATIC3_clatch2_S1 1 2 3 4

S_clatch2_S1 3 4 1 2 _clatch2_S1

RS_clatch2_S1 1 2 1G

.MODEL _clatch2_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch2_S1

.subckt SCHEMATIC3_clatch2_S3 1 2 3 4

S_clatch2_S3 3 4 1 2 _clatch2_S3

RS_clatch2_S3 1 2 1G

.MODEL _clatch2_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch2_S3

.subckt SCHEMATIC3_clatch3_S1 1 2 3 4

S_clatch3_S1 3 4 1 2 _clatch3_S1

RS_clatch3_S1 1 2 1G

.MODEL _clatch3_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch3_S1

.subckt SCHEMATIC3_clatch3_S3 1 2 3 4

S_clatch3_S3 3 4 1 2 _clatch3_S3


RS_clatch3_S3 1 2 1G

.MODEL _clatch3_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch3_S3

.subckt SCHEMATIC3_clatch4_S1 1 2 3 4

S_clatch4_S1 3 4 1 2 _clatch4_S1

RS_clatch4_S1 1 2 1G

.MODEL _clatch4_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch4_S1

.subckt SCHEMATIC3_clatch4_S3 1 2 3 4

S_clatch4_S3 3 4 1 2 _clatch4_S3

RS_clatch4_S3 1 2 1G

.MODEL _clatch4_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch4_S3

.model Dbreak d

+ is=1e-006

+ cjo=1e-013

+ rs=0.1

+ vj=0.2

Shift Register

* source NANOADDER

X_clatch1_S1 N226983 clatch1_N180534 N226983 clatch1_N175838

+ SCHEMATIC3_clatch1_S1

R_clatch1_R10 clatch1_N175838 N232294 1k

D_clatch1_D18 N227088 clatch1_N180534 Dbreak

C_clatch1_C1 0 N232294 10p

R_clatch1_R7 N232294 clatch1_N175820 1k

X_clatch1_S3 clatch1_N180534 N228190 clatch1_N175820 N228190

+ SCHEMATIC3_clatch1_S3

X_clatch4_S1 N227267 clatch4_N180534 N227267 clatch4_N175838

+ SCHEMATIC3_clatch4_S1

R_clatch4_R10 clatch4_N175838 N229783 1k

D_clatch4_D18 N238779 clatch4_N180534 Dbreak

C_clatch4_C1 0 N229783 10p

R_clatch4_R7 N229783 clatch4_N175820 1k

X_clatch4_S3 clatch4_N180534 N227279 clatch4_N175820 N227279

+ SCHEMATIC3_clatch4_S3

V_Vx N227088 0

+PULSE 5 0 0 1n 1n 2u 3u

R_R5 0 N229783 100k

V_VCp11 N228190 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

V_VCm01 N226983 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (3.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

X_clatch2_S1 N227267 clatch2_N180534 N227267 clatch2_N175838

+ SCHEMATIC3_clatch2_S1

R_clatch2_R10 clatch2_N175838 N239754 1k

D_clatch2_D18 N232294 clatch2_N180534 Dbreak

C_clatch2_C1 0 N239754 10p

R_clatch2_R7 N239754 clatch2_N175820 1k

X_clatch2_S3 clatch2_N180534 N227279 clatch2_N175820 N227279

+ SCHEMATIC3_clatch2_S3

V_VCp12 N227279 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (50,5) (51,15) (53.3,15)

+ (53.5,-7.5) (57,-7.5)

+ (57.5,5) (100,5)

+ ENDREPEAT

V_VCm02 N227267 0 PWL TIME_SCALE_FACTOR=1e-8 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (50,0) (51,-15) (53.3,-15)

+ (53.5,12.5) (57,12.5)

+ (57.5,0) (100,0)

+ ENDREPEAT

X_clatch3_S1 N226983 clatch3_N180534 N226983 clatch3_N175838

+ SCHEMATIC3_clatch3_S1

R_clatch3_R10 clatch3_N175838 N238779 1k

D_clatch3_D18 N239754 clatch3_N180534 Dbreak

C_clatch3_C1 0 N238779 10p

R_clatch3_R7 N238779 clatch3_N175820 1k

X_clatch3_S3 clatch3_N180534 N228190 clatch3_N175820 N228190

+ SCHEMATIC3_clatch3_S3

.subckt SCHEMATIC3_clatch1_S1 1 2 3 4

S_clatch1_S1 3 4 1 2 _clatch1_S1

RS_clatch1_S1 1 2 1G

.MODEL _clatch1_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch1_S1

.subckt SCHEMATIC3_clatch1_S3 1 2 3 4

S_clatch1_S3 3 4 1 2 _clatch1_S3

RS_clatch1_S3 1 2 1G

.MODEL _clatch1_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch1_S3

.subckt SCHEMATIC3_clatch4_S1 1 2 3 4

S_clatch4_S1 3 4 1 2 _clatch4_S1

RS_clatch4_S1 1 2 1G

.MODEL _clatch4_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch4_S1

.subckt SCHEMATIC3_clatch4_S3 1 2 3 4

S_clatch4_S3 3 4 1 2 _clatch4_S3

RS_clatch4_S3 1 2 1G

.MODEL _clatch4_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch4_S3

.subckt SCHEMATIC3_clatch2_S1 1 2 3 4

S_clatch2_S1 3 4 1 2 _clatch2_S1

RS_clatch2_S1 1 2 1G

.MODEL _clatch2_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch2_S1

.subckt SCHEMATIC3_clatch2_S3 1 2 3 4

S_clatch2_S3 3 4 1 2 _clatch2_S3

RS_clatch2_S3 1 2 1G

.MODEL _clatch2_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch2_S3

.subckt SCHEMATIC3_clatch3_S1 1 2 3 4

S_clatch3_S1 3 4 1 2 _clatch3_S1

RS_clatch3_S1 1 2 1G

.MODEL _clatch3_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch3_S1

.subckt SCHEMATIC3_clatch3_S3 1 2 3 4

S_clatch3_S3 3 4 1 2 _clatch3_S3

RS_clatch3_S3 1 2 1G

.MODEL _clatch3_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch3_S3

Toggle (T) Flip-Flop

* source T-flipflop

.EXTERNAL OUTPUT Qo

.EXTERNAL INPUT Toggle

V_VCp4 VCP1Q 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (50,5) (51,15) (53.3,15)

+ (53.5,-7.5) (57,-7.5)

+ (57.5,5) (100,5)

+ ENDREPEAT

R_R3 0 QO 1MEG

V_VCm4 VCM0Q 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (50,0)(51,-15) (53.3,-15)

+ (53.5,12.5) (57,12.5)

+ (57.5,0) (100,0)

+ ENDREPEAT

D_D8 N229934 N413328 Dbreak

V_VCp2 VCP1T 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

V_VCp5 VCP0Q 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (50,0) (51,15) (53.3,15)

+ (53.5,-7.5) (57,-7.5)

+ (57.5,0) (100,0)

+ ENDREPEAT

V_VCm2 VCM0T 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (3.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

V_VCm5 VCM1Q 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (50,5)(51,-15) (53.3,-15)

+ (53.5,12.5) (57,12.5)

+ (57.5,5) (100,5)

+ ENDREPEAT

D_D9 N229934 N229624 Dbreak

D_D11 N229934 QO Dbreak

V_VCp3 VCP0T 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,15) (3.3,15)

+ (3.5,-7.5) (7,-7.5)

+ (7.5,0) (100,0)

+ ENDREPEAT

D_D1 N229644 N292674 Dbreak

V_VCm3 VCM1T 0 PWL TIME_SCALE_FACTOR=1e-7 VALUE_SCALE_FACTOR=1

+ REPEAT FOREVER

+ (1,-15) (1.3,-15)

+ (3.5,12.5) (7,12.5)

+ (7.5,5) (100,5)

+ ENDREPEAT

X_clatch3_S1 VCM0T clatch3_N180534 VCM0T clatch3_N175838

+ SCHEMATIC3_clatch3_S1

R_clatch3_R10 clatch3_N175838 N258593 1k

D_clatch3_D18 N258563 clatch3_N180534 Dbreak

C_clatch3_C1 0 N258593 1n

R_clatch3_R7 N258593 clatch3_N175820 1k

X_clatch3_S3 clatch3_N180534 VCP1T clatch3_N175820 VCP1T

+ SCHEMATIC3_clatch3_S3

D_D4 N229644 N258593 Dbreak

R_R1 N229644 VCC_BAR 10K

C_C1 0 QO 1P

R_R2 N229934 VCC_BAR 10K

D_D10 N229644 QO Dbreak

X_clatch5_S1 VCM0Q clatch5_N180534 VCM0Q clatch5_N175838

+ SCHEMATIC3_clatch5_S1

R_clatch5_R10 clatch5_N175838 N258563 1k

D_clatch5_D18 N638755 clatch5_N180534 Dbreak

C_clatch5_C1 0 N258563 1n

R_clatch5_R7 N258563 clatch5_N175820 1k

X_clatch5_S3 clatch5_N180534 VCP1Q clatch5_N175820 VCP1Q

+ SCHEMATIC3_clatch5_S3

R_R4 QO N6349720 1k

X_clatch4_S1 VCM1T clatch4_N180534 VCM1T clatch4_N175838

+ SCHEMATIC3_clatch4_S1

R_clatch4_R10 clatch4_N175838 N413328 1k

D_clatch4_D18 N258563 clatch4_N180534 Dbreak

C_clatch4_C1 0 N413328 1n

R_clatch4_R7 N413328 clatch4_N175820 1k

X_clatch4_S3 clatch4_N180534 VCP0T clatch4_N175820 VCP0T

+ SCHEMATIC3_clatch4_S3

V_Vx TOGGLE 0

+PULSE 0 5 50u 1n 1n 8u 50u

X_clatch1_S1 VCM0T clatch1_N180534 VCM0T clatch1_N175838

+ SCHEMATIC3_clatch1_S1

R_clatch1_R10 clatch1_N175838 N229624 1k

D_clatch1_D18 TOGGLE clatch1_N180534 Dbreak

C_clatch1_C1 0 N229624 1n

R_clatch1_R7 N229624 clatch1_N175820 1k

X_clatch1_S3 clatch1_N180534 VCP1T clatch1_N175820 VCP1T

+ SCHEMATIC3_clatch1_S3

D_D12 N6349720 N638755 Dbreak

V_V1 VCC_BAR 0 5Vdc

X_clatch2_S1 VCM1T clatch2_N180534 VCM1T clatch2_N175838

+ SCHEMATIC3_clatch2_S1

R_clatch2_R10 clatch2_N175838 N292674 1k

D_clatch2_D18 TOGGLE clatch2_N180534 Dbreak

C_clatch2_C1 0 N292674 1n

R_clatch2_R7 N292674 clatch2_N175820 1k

X_clatch2_S3 clatch2_N180534 VCP0T clatch2_N175820 VCP0T

+ SCHEMATIC3_clatch2_S3

.subckt SCHEMATIC3_clatch3_S1 1 2 3 4

S_clatch3_S1 3 4 1 2 _clatch3_S1

RS_clatch3_S1 1 2 1G

.MODEL _clatch3_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch3_S1

.subckt SCHEMATIC3_clatch3_S3 1 2 3 4

S_clatch3_S3 3 4 1 2 _clatch3_S3

RS_clatch3_S3 1 2 1G

.MODEL _clatch3_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch3_S3

.subckt SCHEMATIC3_clatch5_S1 1 2 3 4

S_clatch5_S1 3 4 1 2 _clatch5_S1

RS_clatch5_S1 1 2 1G

.MODEL _clatch5_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch5_S1

.subckt SCHEMATIC3_clatch5_S3 1 2 3 4

S_clatch5_S3 3 4 1 2 _clatch5_S3

RS_clatch5_S3 1 2 1G

.MODEL _clatch5_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch5_S3

.subckt SCHEMATIC3_clatch4_S1 1 2 3 4

S_clatch4_S1 3 4 1 2 _clatch4_S1

RS_clatch4_S1 1 2 1G

.MODEL _clatch4_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch4_S1

.subckt SCHEMATIC3_clatch4_S3 1 2 3 4

S_clatch4_S3 3 4 1 2 _clatch4_S3

RS_clatch4_S3 1 2 1G

.MODEL _clatch4_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch4_S3

.subckt SCHEMATIC3_clatch1_S1 1 2 3 4

S_clatch1_S1 3 4 1 2 _clatch1_S1

RS_clatch1_S1 1 2 1G

.MODEL _clatch1_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch1_S1

.subckt SCHEMATIC3_clatch1_S3 1 2 3 4

S_clatch1_S3 3 4 1 2 _clatch1_S3

RS_clatch1_S3 1 2 1G

.MODEL _clatch1_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch1_S3

.subckt SCHEMATIC3_clatch2_S1 1 2 3 4

S_clatch2_S1 3 4 1 2 _clatch2_S1

RS_clatch2_S1 1 2 1G

.MODEL _clatch2_S1 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch2_S1

.subckt SCHEMATIC3_clatch2_S3 1 2 3 4

S_clatch2_S3 3 4 1 2 _clatch2_S3

RS_clatch2_S3 1 2 1G

.MODEL _clatch2_S3 VSWITCH Roff=1e6 Ron=1.0 VH=10 VT=0 TD=0

.ends SCHEMATIC3_clatch2_S3

Appendix C

MATLAB code for the simulations

BDD node and TMR error-correcting BDD node

function y=bddnode(s,x0,x1)

if s==1

y=x1;

else

y=x0;

end

function y=ecbddnode(p1,p0,s,x0,x1)

cs = [p1 p0 s];

n1out = bddnode(cs(1),x0,x1);

n2out = bddnode(cs(2),x0,n1out);

n3out = bddnode(cs(2),n1out,x1);

n4out = bddnode(cs(3),n2out,n3out);

y=n4out;
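The two MATLAB functions above can also be read as a plain multiplexer and a voting structure built from four such multiplexers. The following Python port (illustrative only; the function names simply mirror the MATLAB ones) checks the key property: with the control signal triplicated as (p1, p0, s), any single corrupted copy is masked.

```python
def bddnode(s, x0, x1):
    # 2:1 multiplexer: a BDD node selects the high branch when s == 1.
    return x1 if s == 1 else x0

def ecbddnode(p1, p0, s, x0, x1):
    # Error-correcting BDD node: three copies of the control variable
    # drive a four-node BDD that votes out a single corrupted copy.
    n1 = bddnode(p1, x0, x1)
    n2 = bddnode(p0, x0, n1)
    n3 = bddnode(p0, n1, x1)
    return bddnode(s, n2, n3)

# Any single error in the triplicated control is masked.
for true_s in (0, 1):
    expected = bddnode(true_s, 0, 1)
    for flip in range(3):
        copies = [true_s] * 3
        copies[flip] ^= 1
        assert ecbddnode(copies[0], copies[1], copies[2], 0, 1) == expected
```

With distinct terminal values x0 and x1, the structure therefore behaves like triple modular redundancy on the control input of a single BDD node.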

Reliability Simulation of the Error-Correcting 2-Bit Adder

%test reliability of EC-adder

clear,clc,close all

m=3; % n=2^m-1, k=n-m, parity bits=m

[h,g,n,k] = hammgen(m);

b=(0:3)’;

addersum = [];

inputmat=double(dec2bin((0:2^k-1)'))-'0';

for a=0:3

addersum = [addersum;a+b];

end

outputvector=dec2bin(addersum)-'0';

inputvectors=mod(inputmat(:,:)*g,2);

%%

%error analysis

niterations = 500;

p=0.0:0.05:0.5;

ecreliability=zeros(length(p),3);

xx = rand(n,niterations);

for kp = 1:length(p)

disp(p(kp));

nerrors = [0 0 0];

for kit = 1:niterations

sigerror = xx(:,kit)<=p(kp);

for kinp=1:2^k

inval=xor(inputvectors(kinp,:),sigerror');

msg = decode(inval,n,k,'hamming');

%nout = [-1 -1 -1];

nout=outputvector(1+bin2dec(char(msg+'0')),:);

nerrors = nerrors+(nout~=outputvector(kinp,:));

end

end

ecreliability(kp,:) = 1-nerrors/niterations/2^k;

end
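The script above relies on the Communications Toolbox functions hammgen and decode. As a stand-alone illustration of the same Hamming(7,4) encode/syndrome-decode step, the sketch below uses one particular systematic matrix pair (chosen for clarity; not necessarily the matrices hammgen returns) and verifies that every single-bit error on every codeword is corrected.

```python
import numpy as np

# Systematic Hamming(7,4): H = [A | I3], G = [I4 | A^T] (mod-2 arithmetic).
A = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]])
H = np.hstack([A, np.eye(3, dtype=int)])    # 3 x 7 parity-check matrix
G = np.hstack([np.eye(4, dtype=int), A.T])  # 4 x 7 generator matrix

def encode(msg):
    return msg @ G % 2

def decode(word):
    # The syndrome equals the column of H at the error position (if any).
    synd = H @ word % 2
    word = word.copy()
    if synd.any():
        pos = np.argmax((H.T == synd).all(axis=1))
        word[pos] ^= 1
    return word[:4]                         # systematic code: message first

# Every single-bit error on every one of the 16 codewords is corrected.
for m in range(16):
    msg = np.array([int(b) for b in format(m, '04b')])
    cw = encode(msg)
    for pos in range(7):
        bad = cw.copy(); bad[pos] ^= 1
        assert (decode(bad) == msg).all()
```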

%% test TMR using ecbddnode

tmreliability=zeros(length(p),3);

xx = rand(k*3,niterations);

for kp = 1:length(p)

disp(p(kp));

nerrors = [0 0 0];

for kit = 1:niterations

sigerror = xx(:,kit)<=p(kp);

for kinp=1:2^k

inval=xor([inputmat(kinp,1),inputmat(kinp,1),inputmat(kinp,1),...

inputmat(kinp,2),inputmat(kinp,2),inputmat(kinp,2),...

inputmat(kinp,3),inputmat(kinp,3),inputmat(kinp,3),...

inputmat(kinp,4),inputmat(kinp,4),inputmat(kinp,4)],sigerror');

n1=ecbddnode(inval(10),inval(11),inval(12),0,1);

n22=n1;

n6=ecbddnode(inval(10),inval(11),inval(12),1,0);

n23=n6;

n2=ecbddnode(inval(4),inval(5),inval(6),0,n1);

n20=n2;

n7=ecbddnode(inval(4),inval(5),inval(6),1,n6);

n19=n2;

n11=ecbddnode(inval(4),inval(5),inval(6),n22,n23);

n3=ecbddnode(inval(7),inval(8),inval(9),n2,1);

n4=ecbddnode(inval(7),inval(8),inval(9),0,n2);

n8=ecbddnode(inval(7),inval(8),inval(9),n7,n20);

n9=ecbddnode(inval(7),inval(8),inval(9),n19,n7);

n5=ecbddnode(inval(1),inval(2),inval(3),n4,n3);

n10=ecbddnode(inval(1),inval(2),inval(3),n9,n8);

nout=[n5 n10 n11];

nerrors = nerrors+(nout~=outputvector(kinp,:));

%if nout~=outputvector(kinp);

% nerrors = nerrors+1;

%end

end

end

tmreliability(kp,:) = 1-nerrors/niterations/2^k;

end

%%

noecreliability=zeros(length(p),3);

xx = rand(k,niterations);

for kp = 1:length(p)

disp(p(kp));

nerrors = [0 0 0];

for kit = 1:niterations

sigerror = xx(:,kit)<=p(kp);

for kinp=1:2^k

inval=xor(inputmat(kinp,:),sigerror');

n1=bddnode(inval(4),0,1);

n22=n1;

n6=bddnode(inval(4),1,0);

n23=n6;

n2=bddnode(inval(2),0,n1);

n20=n2;

n7=bddnode(inval(2),1,n6);

n19=n2;

n11=bddnode(inval(2),n22,n23);

n3=bddnode(inval(3),n2,1);

n4=bddnode(inval(3),0,n2);

n8=bddnode(inval(3),n7,n20);

n9=bddnode(inval(3),n19,n7);

n5=bddnode(inval(1),n4,n3);

n10=bddnode(inval(1),n9,n8);

nout=[n5 n10 n11];

nerrors = nerrors+(nout~=outputvector(kinp,:));

end

end

noecreliability(kp,:) = 1-nerrors/niterations/2^k;

end

%%

%[s2 s1 s0]

figure(1);subplot(131)

plot(p,ecreliability(:,1),p,tmreliability(:,1),'b:',p,noecreliability(:,1),'-.');

title('s_2');

ylabel('Reliability');

set(gca, 'Units', 'Normalized')

P=get(gca,'Position');

set(gca, 'Position', [P(1) P(2)*0.75 P(3) P(4)])

subplot(132)

plot(p,ecreliability(:,2),p,tmreliability(:,2),'b:',p,noecreliability(:,2),'-.');

title({'Reliability Simulation of EC 2 bit adder';'s_1'});

xlabel('Probability of Error')

set(gca, 'Units', 'Normalized')

P=get(gca,'Position');

set(gca, 'Position', [P(1) P(2)*0.75 P(3) P(4)])

subplot(133)

plot(p,ecreliability(:,3),p,tmreliability(:,3),'b:',p,noecreliability(:,3),'-.');

title('s_0');

legend('Hamming','TMR','none-EC')

set(gca, 'Units', 'Normalized')

P=get(gca,'Position');

set(gca, 'Position', [P(1) P(2)*0.75 P(3) P(4)])

figure(2);

plot(p,mean(ecreliability'),p,mean(tmreliability'),'b:',p,mean(noecreliability'),'-.')

legend('Hamming','TMR','none-EC')

xlabel('Probability of Error')

title('Average Reliability of EC 2 bit adder');

Crossbar latch switch simulation

%vswitch simulation

Vcontrol=[[-15:0.2:15],[15:-0.2:-15]];

Vinput=0:0.2:5;

Vhyst=10;

switchstate=0;

vout=zeros(length(Vinput),length(Vcontrol));

for k=1:length(Vinput)

for kk=1:length(Vcontrol)

if ((switchstate==0) && (Vcontrol(kk)+Vinput(k)>=Vhyst))

switchstate=1;

end

if ((switchstate==1) && (Vcontrol(kk)+Vinput(k)<=-Vhyst))

switchstate=0;

end

if(switchstate==1)

vout(k,kk)=Vcontrol(kk);

else

vout(k,kk)=Vinput(k);

end

end

end

surf(Vcontrol,Vinput,vout);

colormap hsv

xlabel('Control Voltage');

ylabel('Input signal');

zlabel('Output signal')

colorbar
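The switch-state update inside the double loop is the whole hysteresis model; it can be isolated as a short function. This Python sketch (an illustrative port, with the same ±10 V hysteresis window as the MATLAB script) returns the state trace for one input level.

```python
def latch_states(vcontrol, vin, vhyst=10.0):
    # Hysteretic switch: closes when Vcontrol + Vin rises past +Vhyst,
    # opens again only when it falls past -Vhyst.
    state, trace = 0, []
    for vc in vcontrol:
        if state == 0 and vc + vin >= vhyst:
            state = 1
        elif state == 1 and vc + vin <= -vhyst:
            state = 0
        trace.append(state)
    return trace

# A full up/down sweep of the control voltage traces a hysteresis loop:
sweep = [v / 5.0 for v in range(-75, 76)] + [v / 5.0 for v in range(75, -76, -1)]
trace = latch_states(sweep, vin=0.0)
assert trace[0] == 0 and trace[len(sweep) // 2 - 1] == 1 and trace[-1] == 0
```

Because the turn-on and turn-off thresholds differ, the state at a given control voltage depends on the sweep direction, which is what produces the latching behaviour in the surface plot.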

Representation of a complex number using algebraic integers

close all, clc, clear

for R = 4:4:12;

nvars = floor(R/2);

wj = exp(2*pi*j/R*[0:nvars-1]);

airange = -2:2;

ais = ones(1,nvars)*min(airange);

cnumbers = zeros(1,length(airange)^nvars);

counter=1;

while ais(nvars)<=max(airange);

cnumbers(counter)=sum(ais.*wj);

counter=counter+1;

ais(1)= ais(1)+1;

for k=1:nvars-1

if ais(k)>max(airange)

ais(k)=min(airange);

ais(k+1)=ais(k+1)+1;

end

end

end

figure;plot(real(cnumbers),imag(cnumbers),'.')

title(['R=',num2str(R),' variables=',num2str(nvars)]);

xlabel('Real part'); ylabel('Imaginary part');

end
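The nested counter in the MATLAB loop simply enumerates every coefficient tuple; each plotted point is a sum of integer multiples of powers of the R-th root of unity. An equivalent enumeration in Python (a sketch of the same idea, using itertools instead of the manual odometer) is:

```python
import cmath
from itertools import product

def algebraic_integer_points(R, airange=range(-2, 3)):
    # All complex numbers sum(a_i * w**i), where w is the primitive R-th
    # root of unity and each coefficient a_i is drawn from airange.
    nvars = R // 2
    w = [cmath.exp(2j * cmath.pi * i / R) for i in range(nvars)]
    return [sum(a * wi for a, wi in zip(coeffs, w))
            for coeffs in product(airange, repeat=nvars)]

pts = algebraic_integer_points(8)          # R = 8 -> 4 coefficients
assert len(pts) == 5 ** 4                  # one point per coefficient tuple
assert any(abs(p) < 1e-12 for p in pts)    # all-zero coefficients give 0
```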

Calculating the Quantum Fourier Transform (QFT) of 3 Qubits

%quantum simulation of 3 bit QFT

clc

h=hadamard(2)/sqrt(2);

p0=[1 0]';

p1=[0 1]';

cnot=[eye(2),zeros(2);zeros(2),1-eye(2)]; %4by4

toffoli=[eye(6),zeros(6,2);zeros(2,6),1-eye(2)]; %8by8

cR2=[eye(3),zeros(3,1);zeros(1,3),i]; %4by4

cR3=[eye(3),zeros(3,1);zeros(1,3),sqrt(i)]; %4by4

%%%Set inputs

nqubits=3;

nops=6; %number of operations

qbits=zeros(2,1,nqubits);

qbits(:,:,1)=p0;

qbits(:,:,2)=p1;

qbits(:,:,3)=p0;

qops=zeros(2^nqubits,2^nqubits,nops);

qops(:,:,1)=kron(kron(h,eye(2)),eye(2));

qops(:,:,2)=kron(cR2,eye(2));

qops(:,:,3)=kron(cR3,eye(2));

qops(:,5:6,3)=qops(:,6:-1:5,3);

qops(5:6,:,3)=qops(6:-1:5,:,3);

qops(:,:,4)=kron(kron(eye(2),h),eye(2));

qops(:,:,5)=kron(eye(2),cR2);

qops(:,:,6)=kron(kron(eye(2),eye(2)),h);

systemstate=1;

for k=1:nqubits

systemstate=kron(systemstate,qbits(:,:,k)); %initial system state

end

for k=1:nops

systemstate=[systemstate,qops(:,:,k)*systemstate(:,end)];

end

systemstate

imagesc(abs(systemstate.^2))

colormap(1- gray)

set(gca,'XTick',[0:nops+1],'XTickLabel',[-1:nops],...

'YTick',[0:2^nqubits],'YTickLabel',dec2bin([-1:2^nqubits-1]));

xlabel('computation step');

ylabel('quantum state');

title('absolute probabilities')
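As a cross-check of the gate-by-gate construction above, the end-to-end 3-qubit QFT can be compared against its closed-form matrix, F[m, k] = w^(mk)/sqrt(8) with w = exp(j2*pi/8). This numpy sketch only verifies unitarity and the action on |000>; it does not reproduce the exact gate ordering of the MATLAB script.

```python
import numpy as np

n = 3
N = 2 ** n
w = np.exp(2j * np.pi / N)

# Closed-form QFT matrix: F[m, k] = w^(m*k) / sqrt(N).
F = np.array([[w ** (m * k) for k in range(N)] for m in range(N)]) / np.sqrt(N)

# The QFT is unitary ...
assert np.allclose(F @ F.conj().T, np.eye(N))

# ... and maps |000> to the uniform superposition over all 8 basis states.
state = np.zeros(N); state[0] = 1
out = F @ state
assert np.allclose(np.abs(out) ** 2, np.ones(N) / N)
```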

Bibliography

[1] Carbon nanotubes & buckyballs. http://education.mrsec.wisc.edu/nanoquest/carbon/index.html. [Online; accessed December 2010].

[2] National nanotechnology initiative. http://www.nano.gov/. [Online; accessed March 2012].

[3] A. Abdollahi. Probabilistic decision diagrams for exact probabilistic analysis. In Proc. IEEE Int. Conference on Computer-Aided Design. ICCAD, pages 266–272, 2007.

[4] E. Ahmed and J. Rose. The effect of LUT and cluster size on deep-submicron FPGA performance and density. IEEE Transactions on VLSI Systems, 12:288, 2004.

[5] M. A. Amiri, M. Mahdavi, and S. Mirzakuchaki. QCA implementation of a MUX-based FPGA CLB. In Proc. Int. Conference on Nanoscience and Nanotechnology. ICONN, 2008.

[6] M. Andrecut. Stochastic recovery of sparse signals from random measurements. Engineering Letters, 19(1):1–6, 2011.

[7] H. Astola, S. Stankovic, and J. T. Astola. Error-correcting decision diagrams. In Proc. 3rd Workshop on Information Theoretic Methods in Science and Engineering, August 2010.

[8] H. Astola, S. Stankovic, and J. T. Astola. Error-correcting decision diagrams for multiple-valued functions. In Proc. 41st IEEE Int. Symposium on Multiple-Valued Logic. ISMVL, 2011.

[9] M. D. Austin et al. Fabrication of 5nm linewidth and 14nm pitch features by nanoimprint lithography. Applied Physics Letters, 84:5299–5301, 2004.

[10] A. Avizienis et al. The STAR (self-testing and repairing) computer: An investigation of the theory and practice of fault-tolerant computer design. IEEE Transactions on Computers, C-20(11):1312–1321, 1971.

[11] R. Bahar, J. Chen, and J. Mundy. A probabilistic-based design for nanoscale computation. Nano, Quantum and Molecular Computing, pages 133–156, 2004.

[12] J-M. Baribeau, N. L. Rowell, and D. J. Lockwood. Self-assembled Si1−xGex dots and islands. In Motonari Adachi and David J. Lockwood, editors, Self-Organized Nanoscale Materials. Springer, 2006.

[13] G. Bersuker, B. H. Lee, A. Korkin, and H. R. Huff. Novel dielectric materials for future transistor generations. In A. Korkin, J. Labanowski, E. Gusev, and S. Luryi, editors, Nanotechnology for Electronic Materials and Devices. Springer, 2007.

[14] J. Borghetti, G. S. Snider, P. J. Kuekes, J. Yang, D. R. Stewart, and R. S. Williams. 'Memristive' switches enable 'stateful' logic operations via material implication. Nature, 464(7290):873–876, 2010.

[15] R. Brayton and A. Mishchenko. ABC: An academic industrial-strength verification tool. In Computer Aided Verification, pages 24–40. Springer, 2010.

[16] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(8):667–691, 1986.

[17] A. W. Burks. Essays on Cellular Automata. University of Illinois Press, 1970.

[18] A. Cao and C-K. Koh. Non-crossing OBDDs for mapping to regular circuit structures. In Proc. 21st Int. Conference on Computer Design. ICCD, pages 338–343, 2003.

[19] A. Cao and C-K. Koh. Decomposition of BDDs with application to physical mapping of regular PTL circuits. Int. Workshop on Logic Synthesis, 2004.

[20] G. F. Cerofolini and D. Mascolo. A hybrid route from CMOS to nano and molecular electronics. In A. Korkin, J. Labanowski, E. Gusev, and S. Luryi, editors, Nanotechnology for Electronic Materials and Devices. Springer, 2007.

[21] A. Chaudhary, D. Z. Chen, K. Whitton, M. Niemier, and R. Ravichandran. Eliminating wire crossings for molecular quantum-dot cellular automata implementation. In Proc. IEEE/ACM Int. Conference on Computer-Aided Design, pages 565–571. IEEE Computer Society, 2005.

[22] K. Chen, C-M. Li, Q. Zhang, Y-A. Chen, A. Goebel, S. Chen, A. Mair, and J-W. Pan. Experimental realization of one-way quantum computing with two-photon four-qubit cluster states. Phys. Rev. Lett., 99(12):120503, Sep 2007.

[23] Y. Chen et al. Nanoscale molecular-switch devices fabricated by imprint lithography. Applied Physics Letters, 82:1610–1612, March 2003.

[24] M. R. Choudhury and K. Mohanram. Reliability analysis of logic circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(3):392–405, March 2009.

[25] M. Chrzanowska-Jeske and A. Mishchenko. Synthesis for regularity using decision diagrams. In Proc. IEEE Int. Symposium on Circuits and Systems. ISCAS, 2005.

[26] C. P. Collier, E. W. Wong, M. Belohradsky, F. M. Raymo, J. F. Stoddart, P. J. Kuekes, R. S. Williams, and J. R. Heath. Electronically configurable molecular-based logic gates. Science, 285:391–394, 1999.

[27] C. Constantinescu. Intermittent faults in VLSI circuits. In Proc. IEEE Workshop on Silicon Errors in Logic-System Effects. Citeseer, 2007.

[28] W. B. Culbertson, R. Amerson, R. J. Carter, P. Kuekes, and G. Snider. Defect tolerance on the Teramac custom computer. In Proc. 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pages 116–123, April 1997.

[29] W. J. Dally and B. Towles. Route packets, not wires: On-chip interconnection networks. In Proc. Design Automation Conference. DAC, 2001.

[30] S. Das, G. Rose, M. M. Ziegler, C. A. Picconatto, and J. C. Ellenbogen. Architectures and simulations for nanoprocessor systems integrated on the molecular scale. In G. Cuniberti, G. Fagas, and K. Richter, editors, Introducing Molecular Electronics. Springer, 2005.

[31] A. DeHon. Array-based architecture for FET-based, nanoscale electronics. IEEE Transactions on Nanotechnology, 2:23–32, 2003.

[32] A. DeHon. Nanowire-based programmable architectures. ACM Journal on Emerging Technologies in Computing Systems, 1(2):109–162, 2005.

[33] A. DeHon and K. Likharev. Hybrid CMOS/nanoelectronic digital circuits: Devices, architectures, and design automation. In Proc. ICCAD, pages 375–382, 2005.

[34] A. DeHon and H. Naeimi. Seven strategies for tolerating highly defective fabrication. IEEE Design & Test of Computers, 22(4):306–315, 2005.

[35] A. DeHon and M. J. Wilson. Nanowire-based sublithographic programmable logic arrays. In Proc. ACM/SIGDA 12th Int. Symposium on Field Programmable Gate Arrays, pages 123–132, 2004.

[36] C. Dong, W. Wang, and S. Haruehanroengra. Efficient logic architectures for CMOL nanoelectronic circuits. Micro & Nano Letters, IET, 1(2):74–78, Dec 2006.

[37] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

[38] Y. Dotan, N. Levison, R. Avidan, and D. J. Lilja. History index of correct computation for fault-tolerant nano-computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 17(7):943–952, 2009.

[39] E. Dubrova. Lectures on design of fault-tolerant systems. http://web.it.kth.se/~dubrova/lecturesFTC.html. [Online; accessed March 2012].

[40] C. Dwyer and A. Lebeck. Introduction to DNA Self-Assembled Computer Design. Artech House, 2008.

[41] A. Einstein, B. Podolsky, and N. Rosen. Can quantum-mechanical description of physical reality be considered complete? Physical Review, pages 777–780, May 1935.

[42] L. Fang and M. S. Hsiao. Bilateral testing of nano-scale fault-tolerant circuits. Journal of Electronic Testing, 24(1):285–296, 2008.

[43] D. Y. Feinstein, M. A. Thornton, and D. M. Miller. On the data structure metrics of quantum multiple-valued decision diagrams. In Proc. 38th Int. Symposium on Multiple-Valued Logic. ISMVL, pages 138–143, May 2008.

[44] D. T. Franco, M. C. Vasconcelos, L. Naviner, and J-F. Naviner. Signal probability for reliability evaluation of logic circuits. Microelectronics Reliability, 48(8):1586–1591, 2008.

[45] M. Fujishima. FPGA-based high-speed emulator of quantum computing. In Proc. IEEE International Conference on Field-Programmable Technology (FPT), pages 21–26. IEEE, 2003.

[46] D. M. Greenberger, M. A. Horne, and A. Zeilinger. Going beyond Bell's theorem. ArXiv Quantum Physics e-prints, 2007. arXiv:0712.0921v1.

[47] L. K. Grover. Quantum computation. In Proc. 12th Int. Conference on VLSI Design, pages 548–553, January 1999.

[48] J. Han and P. Jonker. A defect- and fault-tolerant architecture for nanocomputers. Nanotechnology, 14(2):224, 2003.

[49] H. Hasegawa, S. Kasai, and T. Sato. Hexagonal binary decision diagram quantum circuit approach for ultra-low power III-V quantum LSIs. IEICE Transactions on Electronics, E87-C(11):1757–1768, 2004.

[50] J. Huang, M. B. Tahoori, and F. Lombardi. On the defect tolerance of nano-scale two-dimensional crossbars. In Proc. 19th IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems. DFT, pages 96–104. IEEE, 2004.

[51] C. P. Husband, S. M. Husband, J. S. Daniels, and J. M. Tour. Logic and memory with nanocell circuits. IEEE Transactions on Electron Devices, 50:1865–1875, 2003.

[52] T-M. Hwang, W-W. Lin, W-C. Wang, and W. Wang. Numerical simulation of three dimensional pyramid quantum dot. Journal of Computational Physics, 196:208–232, 2004.

[53] S. L. Jeng, J. C. Lu, and K. Wang. A review of reliability research on nanotechnology. IEEE Transactions on Reliability, 56(3):401–410, 2007.

[54] B. Joshi, D. K. Pradhan, and S. P. Mohanty. Fault tolerant nanocomputing. Robust Computing with Nano-scale Devices, pages 7–27, 2010.

[55] R. Jozsa. Entanglement and quantum computation. In S. Huggett, L. Mason, K. P. Tod, S. T. Tsou, and N. M. J. Woodhouse, editors, Geometric Issues in the Foundations of Science. Oxford University Press, 1997.

[56] R. Jozsa. On the simulation of quantum circuits. ArXiv Quantum Physics e-prints, 2006. arXiv:quant-ph/0603163.

[57] A. Kadav, M. J. Renzelmann, and M. M. Swift. Tolerating hardware device failures in software. In Proc. Symposium on Operating Systems Principles, 2009.

[58] T. I. Kamins. Self-assembled semiconductor nanowires. In H-J. Fecht and M. Werner, editors, The Nano-Micro Interface. Wiley-VCH, 2004.

[59] S. M. Kang and S. Shin. Energy-efficient memristive analog and digital electronics. In Advances in Neuromorphic Memristor Science and Applications, pages 181–209. Springer, 2012.

[60] S. Kasai and H. Hasegawa. A single-electron binary-decision-diagram quantum logic circuit based on Schottky wrap-gate control of a GaAs nanowire hexagon. IEEE Electron Device Letters, 23(8):446–448, 2002.

[61] P. Kaye, R. Laflamme, and M. Mosca. An Introduction to Quantum Computing. Oxford University Press, 1st edition, 2007.

[62] A. U. Khalid. FPGA emulation of quantum circuits. Master's thesis, McGill University, 2006.

[63] G. H. Kim et al. 32 × 32 crossbar array resistive memory composed of a stacked Schottky diode and unipolar resistive memory. Advanced Functional Materials, 23(11):1440–1449, 2013.

[64] E. Knill, R. Laflamme, and G. J. Milburn. A scheme for efficient quantum computation with linear optics. Nature, 409(6816):46–52, January 2001.

[65] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, and J. P. Hayes. Accurate reliability evaluation and enhancement via probabilistic transfer matrices. In Proc. Design, Automation and Test in Europe. DATE, pages 282–287, Vol. 1, March 2005.

[66] P. J. Kuekes, W. Robinett, R. M. Roth, G. Seroussi, G. S. Snider, and R. S. Williams. Resistor-logic demultiplexers for nanoelectronics based on constant-weight codes. Nanotechnology, 17(4):1052, 2006.

[67] P. J. Kuekes, W. Robinett, and R. S. Williams. Defect tolerance in resistor-logic demultiplexers for nanoelectronics. Nanotechnology, 17:2466–2474, 2006.

[68] P. J. Kuekes, D. R. Stewart, and R. S. Williams. The crossbar latch: Logic value storage, restoration, and inversion in crossbar circuits. Journal of Applied Physics, 97(3), 2005.

[69] S. Kullback. Information Theory and Statistics. Dover Publications, Mineola, NY, 1968.

[70] A. Landahl. Adiabatic quantum computing. Bulletin of the American Physical Society, 57, 2012.

[71] C. S. Lent, P. D. Tougaw, and W. Porod. Quantum cellular automata: The physics of computing with arrays of quantum dot molecules. In Proc. Workshop on Physics and Computation. PhysComp, 1994.

[72] A. Liebers. Planarizing graphs: A survey and annotated bibliography. Graph Algorithms and Applications 2, page 257, 2004.

[73] K. K. Likharev and D. B. Strukov. CMOL: Devices, circuits, and architectures. In G. Cuniberti, G. Fagas, and K. Richter, editors, Introducing Molecular Electronics. Springer, 2005.

[74] M. Macucci, G. Iannaccone, M. Governale, C. Ungarelli, S. Francaviglia, M. Girlanda, L. Bonci, and M. Gattobigio. Critical assessment of the QCA architecture as a viable alternative to large scale integration. In H. Nakashima, editor, Mesoscopic Tunneling Devices. Research Signpost, 2004.

[75] H. L. P. A. Madanayake, R. J. Cintra, D. Onen, V. S. Dimitrov, and L. T. Bruton. Algebraic integer based 8×8 2-D DCT architecture for digital video processing. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 1247–1250, 2011.

[76] S. Madishetty, A. Madanayake, R. J. Cintra, D. Mugler, and V. S. Dimitrov. Error-free VLSI architecture for the 2-D Daubechies 4-tap filter using algebraic integers. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 1484–1487, 2012.

[77] D. Marinescu and G. Marinescu. Approaching Quantum Computing. Pearson/Prentice Hall, 1st edition, 2005.

[78] S. D. Mediratta and J. Draper. On-chip fault-tolerance utilizing BIST resources. In Proc. 49th IEEE Int. Midwest Symposium on Circuits and Systems. MWSCAS, volume 2, pages 254–258. IEEE, 2006.

[79] R. M. Metzger. Unimolecular electronics: Results and prospects. In S. E. Lyshevski, editor, Nano and Molecular Electronics Handbook. CRC Press, 2007.

[80] D. M. Miller and M. A. Thornton. QMDD: A decision diagram structure for reversible and quantum circuits. In Proc. 36th Int. Symposium on Multiple-Valued Logic. ISMVL, page 30, May 2006.

[81] M. Mishra and S. C. Goldstein. Scalable defect tolerance for molecular electronics. In Proc. 1st Workshop on Non-silicon Computation (NSC-1), pages 78–85, February 2002.

[82] S. Mitra, W-J. Huang, N. R. Saxena, S-Y. Yu, and E. J. McCluskey. Reconfigurable architecture for autonomous self-repair. IEEE Design and Test of Computers, 21(3):228–240, May 2004.

[83] T. Mohamed, W. Badawy, and G. Jullien. On using FPGAs to accelerate the emulation of quantum computing. In Proc. Canadian Conference on Electrical and Computer Engineering. CCECE, pages 175–179, 2009.

[84] T. Mohamed, G. A. Jullien, and W. Badawy. Crossbar latch-based combinational and sequential logic for nano-FPGA. In IEEE Int. Symposium on Nanoscale Architectures. NANOARCH, pages 117–122, 2007.

[85] T. Mohamed, S. N. Yanushkevich, and S. Kasai. Fault-tolerant nanowire BDD cir-

cuits. In Proc. Int. Workshop on Physics and Computing in nano-scale Photonics and

Materials, 2012.

[86] N. Mohyuddin, E. Pakbaznia, and M. Pedram. Probabilistic error propagation in

logic circuits using the boolean difference calculus. In Proc. IEEE Int. Conference on

Computer Design. ICCD, pages 7 –13, Oct 2008.

[87] T. K. Moon. Error Correction Coding. John Wiley & sons, 2005.

[88] A. Mukhejee, R. Sudhakar, M. Marek-Sadowska, and S. I. Long. Wave steering in

YADDs: a novel non-iterative synthesis and layout technique. In Proc. 36th Design

Automation Conference. DAC, pages 466 –471, 1999.

[89] D. P. Nackashi, C. J. Amsinck, N. H. DiSpigna, and P. D. Franzon. Molecular electronic

latches and memories. In Proc. IEEE Conference on Nanotechnology, pages 819–822,

July 2005.

[90] P Nenzi. Ng-spice: The free circuit simulator. http://ngspice.sourceforge.net/.

[Online; accessed March 2012].

[91] K. Nepal, R. I. Bahar, J. Mundy, W. R. Patterson, and A. Zaslavsky. Designing logic

circuits for probabilistic computation in the presence of noise. In Proc. 42nd Design

Automation Conference. DAC, pages 485–490. IEEE, 2005.

[92] K. Nepal, R. I. Bahar, J. Mundy, W. R. Patterson, and A. Zaslavsky. Designing

nanoscale logic circuits based on markov random fields. Journal of Electronic Testing,

23(2):255–266, 2007.

[93] K. Nikolic, A. Sadek, and M. Forshaw. Architectures for reliable computing with un-

reliable nanodevices. In Proc. 1st IEEE Conference on Nanotechnology. IEEE-NANO,

pages 254–259. IEEE, 2001.

163

[94] K. Nikolic, A. Sadek, and M. Forshaw. Fault-tolerant techniques for nanocomputers. Nanotechnology, 13(3):357, 2002.

[95] G. Norman, D. Parker, M. Kwiatkowska, and S. K. Shukla. Evaluating the reliability of defect-tolerant architectures for nanotechnology with probabilistic model checking. In Proc. 17th Int. Conference on VLSI Design, pages 907–912. IEEE, 2004.

[96] R. C. Ogus. The probability of a correct output from a combinational circuit. IEEE Transactions on Computers, C-24(5):534–544, May 1975.

[97] K. P. Parker and E. J. McCluskey. Analysis of logic circuits with faults using input signal probabilities. IEEE Transactions on Computers, C-24(5):573–578, May 1975.

[98] F. Peper, J. Lee, F. Abo, T. Isokawa, S. Adachi, N. Matsui, and S. Mashiko. Fault-tolerance in nanocomputers: a cellular array approach. IEEE Transactions on Nanotechnology, 3(1):187–201, 2004.

[99] M. Perkowski, B. Falkowski, M. Chrzanowska-Jeske, and R. Drechsler. Efficient algorithms for creation of linearly-independent decision diagrams and their mapping to regular layouts. VLSI Design, 14(1):35–52, 2002.

[100] M. Perkowski and A. Mishchenko. Logic synthesis for regular layout using satisfiability. Proc. BP, pages 225–232, 2002.

[101] M. A. Perkowski, M. Chrzanowska-Jeske, and Y. Xu. Lattice diagrams using Reed-Muller logic. In Proc. IFIP WG 10.5 Workshop on Applications of the Reed-Muller Expansion in Circuit Design, pages 85–102, 1997.

[102] M. A. Perkowski, E. Pierzchala, and R. Drechsler. Layout-driven synthesis for submicron technology: Mapping expansions to regular lattices. In Proc. ISIC, pages 9–12, 1997.

[103] C. Pistol and C. Dwyer. Scalable, low-cost, hierarchical assembly of programmable DNA nanostructures. Nanotechnology, 18, 2007.

[104] M. B. Plenio and S. Virmani. An introduction to entanglement measures. ArXiv Quantum Physics e-prints, April 2005. arXiv:quant-ph/0504163.

[105] D. K. Pradhan and S. M. Reddy. Error-control techniques for logic processors. IEEE Transactions on Computers, C-21(12):1331–1336, 1972.

[106] J. G. Proakis and M. Salehi. Fundamentals of Communication Systems. Pearson Prentice Hall, Upper Saddle River, New Jersey, 2005.

[107] M. D. Purkeypile. Cove: A practical quantum computer programming framework. PhD thesis, Colorado Technical University, 2009.

[108] S. Rai. Majority gate based design for combinational quantum cellular automata (QCA) circuits. In Proc. 40th Southeastern Symposium on System Theory. SSST, 2008.

[109] C. N. R. Rao. Nanotubes and Nanowires. RSC Publishing, 2005.

[110] R. Raussendorf, D. E. Browne, and H. J. Briegel. Measurement-based quantum computation on cluster states. Phys. Rev. A, 68(2):022312, Aug 2003.

[111] M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani. Experimental realization of any discrete unitary operator. Physical Review Letters, 73(1):58–61, July 1994.

[112] T. Rejimon and S. Bhanja. An accurate probabilistic model for error detection. In Proc. 18th Int. Conference on VLSI Design, pages 717–722. IEEE, 2005.

[113] D. A. Rennels. Fault-tolerant computing: concepts and examples. IEEE Transactions on Computers, 100(12):1116–1129, 1984.

[114] E. Rieffel and W. Polak. An introduction to quantum computing for non-physicists. ACM Comput. Surv., 32:300–335, September 2000.

[115] S. Das Sarma, M. Freedman, and C. Nayak. Topological quantum computation. Physics Today, 59:32, 2006.

[116] J. Sartori, J. Sloan, and R. Kumar. Stochastic computing: Embracing errors in architecture and design of processors and applications. In Proc. 14th Int. Conference on Compilers, Architectures and Synthesis for Embedded Systems. CASES, pages 135–144, 2011.

[117] T. Sasao and J. T. Butler. Planar multiple-valued decision diagrams. In Proc. 25th Int. Symposium on Multiple-Valued Logic. ISMVL, pages 28–35, May 1995.

[118] N. R. Saxena and E. J. McCluskey. Dependable adaptive computing systems: the ROAR project. In Proc. IEEE Int. Conference on Systems, Man, and Cybernetics, volume 3, pages 2172–2177, Oct 1998.

[119] A. Schmid and Y. Leblebici. Robust circuit and system design methodologies for nanometer-scale devices and single-electron transistors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 12(11):1156–1166, 2004.

[120] F. Schurmann. Interactive quantum computation. Master's thesis, University of New York at Buffalo, 2000.

[121] N. R. Shanbhag, R. A. Abdallah, R. Kumar, and D. L. Jones. Stochastic computation. In Proc. 47th ACM/IEEE Design Automation Conference. DAC, pages 859–864. IEEE, 2010.

[122] P. W. Shor. Algorithms for quantum computation: discrete logarithms and factoring. In Proc. 35th Annual Symposium on Foundations of Computer Science, pages 124–134, November 1994.

[123] P. W. Shor. Why haven't more quantum algorithms been found? J. ACM, 50:87–90, Jan 2003.

[124] G. Snider. Computing with hysteretic resistor crossbars. Applied Physics A: Materials Science & Processing, 80:1165–1172, March 2005.

[125] G. S. Snider and P. J. Kuekes. Nano state machines using hysteretic resistors and diode crossbars. IEEE Transactions on Nanotechnology, 5(2):129–137, March 2006.

[126] F. Somenzi. CUDD: CU decision diagram package release 2.4.2. http://vlsi.colorado.edu/~fabio/CUDD.

[127] L. Spector. Automatic Quantum Computer Programming: A Genetic Programming Approach. Springer, 2004.

[128] M. R. Stan, P. D. Franzon, S. C. Goldstein, J. C. Lach, and M. M. Ziegler. Molecular electronics: From devices and interconnect to circuits and architecture. Proceedings of the IEEE, 91(11):1940–1957, 2003.

[129] M. Stanisavljevic, M. Schmid, and Y. Leblebici. Reliability of Nanoscale Circuits and Systems: Methodologies and Circuit Architectures. Springer, 2010.

[130] D. B. Strukov and K. K. Likharev. CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices. Nanotechnology, 16:888–900, 2005.

[131] D. B. Strukov and K. K. Likharev. Defect-tolerant architectures for nanoelectronic crossbar memories. Journal of Nanoscience and Nanotechnology, 7(1):151–167, 2007.

[132] M. B. Tahoori and S. Mitra. Defect and fault tolerance of reconfigurable molecular computing. In Proc. 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM, pages 176–185, April 2004.

[133] G. Tangim, T. Mohamed, S. N. Yanushkevich, and S. E. Lyshevski. Comparison of noise-tolerant architectures of logic gates for nanoscaled CMOS. In Proc. Int. Conference on High Performance Computing. HPC-UA, 2012.

[134] Predictive Technology Model (PTM). Arizona State University. http://ptm.asu.edu, 2008. [Online; accessed March 2012].

[135] M. Tehranipoor. Defect tolerance for molecular electronics-based nanofabrics using built-in self-test procedure. In Proc. 20th IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems. DFT, pages 305–313. IEEE, 2005.

[136] W. Torres-Pomales. Software fault tolerance: A tutorial. Technical report, NASA, 2000.

[137] J. M. Tour, L. Cheng, D. P. Nackashi, Y. Yao, A. K. Flatt, S. K. St. Angelo, T. E. Mallouk, and P. D. Franzon. Nanocell electronic memories. J. American Chemical Society, 125:13279–13283, 2003.

[138] A. H. Tran, S. N. Yanushkevich, S. E. Lyshevski, and V. P. Shmerko. Design of neuromorphic logic networks and fault-tolerant computing. In Proc. 11th IEEE Conference on Nanotechnology (IEEE-NANO), pages 457–462, 2011.

[139] A. H. Tran, S. N. Yanushkevich, S. E. Lyshevski, and V. P. Shmerko. Fault-tolerant computing paradigm for random molecular phenomena: Hopfield gates and logic networks. In Proc. 41st IEEE International Symposium on Multiple-Valued Logic (ISMVL), pages 93–98, 2011.

[140] M. Udrescu, L. Prodan, and M. Vladutiu. The bubble bit technique as improvement of HDL-based quantum circuits simulation. In Proc. 38th Annual Simulation Symposium, pages 217–224, April 2005.

[141] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan. MACACO: Modeling and analysis of circuits for approximate computing. In Proc. Int. Conference on Computer-Aided Design. ICCAD, pages 667–673. IEEE Press, 2011.

[142] J. Vial, A. Bosio, P. Girard, C. Landrault, S. Pravossoudovitch, and A. Virazel. Using TMR architectures for yield improvement. In Proc. IEEE Int. Symposium on Defect and Fault Tolerance of VLSI Systems. DFTVS, pages 7–15. IEEE, 2008.

[143] G. F. Viamontes. Efficient Quantum Circuit Simulation. PhD thesis, The University of Michigan, 2007.

[144] G. F. Viamontes, I. L. Markov, and J. P. Hayes. Improving gate-level simulation of quantum circuits. Quantum Information Processing, 2:347–380, 2003.

[145] J. von Neumann. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata Studies, 34:43–98, 1956.

[146] P. Walther, K. J. Resch, T. Rudolph, E. Schenck, H. Weinfurter, V. Vedral, M. Aspelmeyer, and A. Zeilinger. Experimental one-way quantum computing. Nature, 434(2):169–176, March 2005.

[147] K. Walus, G. A. Jullien, and V. S. Dimitrov. Computer arithmetic structures for quantum cellular automata. In Proc. 37th Asilomar Conference on Signals, Systems and Computers, volume 2, 2003.

[148] I.-C. Wey et al. Design and implementation of cost-effective probabilistic-based noise-tolerant VLSI circuits. IEEE Transactions on Circuits and Systems I, 56(11):2411–2424, 2009.

[149] S. Winograd and J. D. Cowan. Reliable Computation in the Presence of Noise. MIT Press, 1963.

[150] N. S. Yanofsky and M. A. Mannucci. Quantum Computing for Computer Scientists. Cambridge University Press, 1st edition, 2008.

[151] S. N. Yanushkevich, S. Kasai, G. Tangim, A. H. Tran, T. Mohamed, and V. P. Shmerko. Introduction to Noise-Resilient Computing. Morgan and Claypool, 2013.

[152] S. N. Yanushkevich, D. M. Miller, V. P. Shmerko, and R. S. Stankovic. Probabilistic decision diagram techniques. In Decision Diagram Techniques for Micro- and Nanoelectronic Design Handbook. Springer, 2006.

[153] S. N. Yanushkevich, G. Tangim, S. Kasai, S. E. Lyshevski, and V. P. Shmerko. Design of nanoelectronic ICs: Noise-tolerant logic based on cyclic BDD. In Proc. 12th IEEE Conference on Nanotechnology (IEEE-NANO), pages 1–5. IEEE, 2012.

[154] A. Zeilinger. Experiment and the foundations of quantum physics. Rev. Mod. Phys., 71(2):S288–S297, March 1999.

[155] H.-Q. Zhao, S. Kasai, Y. Shiratori, and T. Hashizume. A binary-decision-diagram-based two-bit arithmetic logic unit on a GaAs-based regular nanowire network with hexagonal topology. Nanotechnology, 20, June 2009.

[156] M. M. Ziegler, C. A. Picconatto, J. C. Ellenbogen, A. Dehon, D. Wang, Z. Zhong, and C. M. Lieber. Scalability simulations for nanomemory systems integrated on the molecular scale. Ann. NY Acad. Sci., 1006(1):312–330, 2003.

[157] M. M. Ziegler, G. S. Rose, and M. R. Stan. A universal device model for nanoelectronic circuit simulation. In Proc. 2nd IEEE Conference on Nanotechnology, pages 83–88, 2002.

[158] M. M. Ziegler and M. R. Stan. CMOS/nano co-design for crossbar-based molecular electronic systems. IEEE Transactions on Nanotechnology, 2(4):217–230, December 2003.
