IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 6,...



Adder and Multiplier Design in Quantum-Dot Cellular Automata

Heumpil Cho, Member, IEEE, and Earl E. Swartzlander, Jr., Fellow, IEEE

Abstract—Quantum-dot cellular automata (QCA) is an emerging nanotechnology, with the potential for faster speed, smaller size, and

lower power consumption than transistor-based technology. Quantum-dot cellular automata has a simple cell as the basic element.

The cell is used as a building block to construct gates and wires. Previously, adder designs based on conventional designs were

examined for implementation with QCA technology. That work demonstrated that the design trade-offs are very different in QCA. This

paper utilizes the unique QCA characteristics to design a carry flow adder that is fast and efficient. Simulations indicate very attractive

performance (i.e., complexity, area, and delay). This paper also explores the design of serial parallel multipliers. A serial parallel

multiplier is designed and simulated with several different operand sizes.

Index Terms—Adder, multiplier, carry flow adder, carry delay multiplier, quantum-dot cellular automata (QCA).


1 INTRODUCTION

Current transistor-based semiconductor devices are becoming resistant to scaling. Due to the decreasing supply voltage, the power consumption from leakage current is a big challenge for transistor circuits. Nanotechnology offers a possible way around these problems and the ITRS report [1] summarizes several possible technology solutions. Quantum-dot cellular automata (QCA) is an interesting possibility. Since QCAs were introduced in 1993 [2], several experimental devices have been developed [3], [4], [5], [6], [7]. Although they are certainly “not ready for prime time,” recent papers show that QCAs may eventually achieve high density [8], fast switching speed [9], and room temperature operation [5], [10]. The development of SPICE modeling and verification for QCA [11] indicates continuing interest. Recently, several molecular QCA models, implementations, and power analyses have been proposed [12], [13].

Adders are fundamental circuits for most digital systems. Several adder designs in QCA have been proposed [14], [15], [16], [17], [18], [19] and a performance comparison was presented in [16]. Better adder performance depends on minimizing the carry propagation delay. A wide variety of (often complex) techniques have been used for transistor adder circuits. Conventional adder circuits frequently require many wires, which are relatively difficult to realize (and may be slow) in QCA technology. Due to these wire delays, most previous adder designs are limited in speed. This paper presents a new adder design, the carry flow adder, that is optimized for implementation with QCAs. The carry flow adder design is compared with previous QCA adder designs.

On the other hand, multiplier design has not been widely considered by QCA designers. There is a QCA multiplier design in [20], [21], which suggests the importance of design simplicity. Complex designs generally incur long delays in QCA, so a simple structure is a good choice for the starting point. This paper investigates a relatively simple serial parallel multiplier. Based on FIR filter equations, serial parallel multiplication equations are derived. Using QCA characteristics, a new multiplier is presented. The final design shows a simple and regular bit slice structure.

Although this paper assumes metal-based QCA implementation, the underlying principles also apply to molecular QCA. There are different clocking schemes such as wave clocking which may be more suitable for molecular QCA. If the manufacturing issues of molecular QCA can be solved, it may be an attractive implementation alternative that mitigates the cryogenic working temperature constraints of metal-based QCA.

The paper is organized as follows: In Section 2, the background of QCA technology and the design approaches are presented. Section 3 shows the design and implementation of carry flow adders. Simulation results and comparisons follow in Section 4. Section 5 shows the algorithmic design of multiplication networks based on filter networks. Section 6 discusses multiplier implementation for QCA circuits. Simulation results and comparisons follow in Section 7 and conclusions are presented in Section 8.

2 QCA DESIGN SCHEMES

2.1 QCA Cell

A QCA cell is a square nanostructure of electron wells confining free electrons. Each cell has four quantum dots, each of which can hold a single electron. The four dots are located at the corners of the cell and two electrons are injected into the cell. Due to Coulombic repulsion, the two electrons reside in opposite corners so that two polarizations are possible as seen in Fig. 1. These basic cells can be used to make QCA-based storage elements, logic gates, and wires.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 6, JUNE 2009 721

• H. Cho is with Qualcomm, Incorporated, 5775 Morehouse Drive, San Diego, CA 92121-1714. E-mail: [email protected].

• E.E. Swartzlander, Jr. is with the Department of Electrical and Computer Engineering, the University of Texas at Austin, 1 University Station C0803, Austin, TX 78712-0240. E-mail: [email protected].

Manuscript received 20 June 2008; revised 12 Nov. 2008; accepted 18 Dec. 2008; published online 15 Jan. 2009. Recommended for acceptance by F. Lombardi. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2008-06-0309. Digital Object Identifier no. 10.1109/TC.2009.21.

0018-9340/09/$25.00 © 2009 IEEE Published by the IEEE Computer Society

2.2 Signal Flow and Control

A series of QCA cells acts like a wire. During each clock cycle, half of the wire is active for signal propagation, while the other half is unpolarized. During the next clock cycle, half of the previous active clock zone is deactivated and the remaining active zone cells trigger the newly activated cells to be polarized. Thus, signals propagate from one clock zone to the next.

The circuit area is divided into four sections that are driven by four-phase clock signals. In each zone, the clock signal has four states: high-to-low, low, low-to-high, and high. The cell begins computing during the high-to-low state and holds the value during the low state. The cell is released when the clock is in the low-to-high state and inactive during the high state.
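The four-phase scheme can be illustrated with a short Python sketch. It assumes, as is conventional for QCA clocking but not stated explicitly above, that each clock zone lags the previous one by a quarter period; the names `CLOCK_STATES` and `zone_state` are hypothetical and introduced only for illustration.

```python
# Clock states in the order listed in the text.
CLOCK_STATES = ("high-to-low", "low", "low-to-high", "high")


def zone_state(zone: int, quarter: int) -> str:
    """Return the clock state of zone `zone` during quarter-period `quarter`.

    Assumption: zone k lags zone k-1 by one quarter of the clock period.
    """
    return CLOCK_STATES[(quarter - zone) % 4]


def snapshot(quarter: int) -> list:
    """Clock states of zones 0..3 during one quarter-period."""
    return [zone_state(zone, quarter) for zone in range(4)]
```

Under this assumption, whenever a zone is holding its value (low), the next zone is computing (high-to-low), which is the mechanism by which a signal advances one clock zone per quarter clock.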

2.3 Logic Gates

Logic gates are required to build arithmetic circuits. In QCA, inverters and three-input majority gates serve as the fundamental gates. Inverters are constructed with a fork structure. The governing equation for a majority gate with inputs a, b, and c is $M(a, b, c) = ab + bc + ca$. Fig. 2 shows the gate symbols and their layouts. Two-input AND and OR gates can be implemented with three-input majority gates by setting one input to a constant. With ANDs, ORs, and inverters, any logic function can be realized:

\[
\begin{aligned}
a \cdot b &= M(a, b, 0),\\
a + b &= M(a, b, 1).
\end{aligned}
\tag{1}
\]
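As a quick check of (1), the majority function and the two derived gates can be written in a few lines of Python. This is only a logic-level sketch; the function names are hypothetical and nothing here models QCA cell physics or timing.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority gate: M(a, b, c) = ab + bc + ca."""
    return (a & b) | (b & c) | (c & a)


def and_gate(a: int, b: int) -> int:
    """Two-input AND obtained by fixing the third majority input to 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """Two-input OR obtained by fixing the third majority input to 1."""
    return majority(a, b, 1)
```

Enumerating the four input combinations confirms that `and_gate` and `or_gate` agree with the ordinary AND and OR truth tables.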

2.4 Design Rules

A nominal cell size of 20 nm by 20 nm is assumed. Each cell has a width and height of 18 nm and contains quantum dots 5 nm in diameter. The cells are placed on a grid with a cell center-to-center distance of 20 nm.

Because there are propagation delays between cell-to-cell reactions, there should be a limit on the maximum cell count in a clock zone. This ensures proper propagation and reliable signal transmission. In this paper, a maximum length of 16 cells is used. The minimum separation between two different signal wires is the width of two cells.
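The 16-cell limit implies a lower bound on the number of clock zones, and hence on the delay, of a straight wire. The sketch below makes that bound explicit. It assumes one quarter clock of delay per clock zone and ignores bends, crossovers, and placement constraints; the constant and function names are hypothetical.

```python
import math
from fractions import Fraction

MAX_CELLS_PER_ZONE = 16  # limit adopted in this paper
CELL_PITCH_NM = 20       # cell center-to-center distance


def min_clock_zones(wire_cells: int) -> int:
    """Fewest clock zones that can cover a straight wire of `wire_cells` cells."""
    return math.ceil(wire_cells / MAX_CELLS_PER_ZONE)


def min_wire_delay_clocks(wire_cells: int) -> Fraction:
    """Lower bound on wire delay, at one quarter clock per clock zone."""
    return Fraction(min_clock_zones(wire_cells), 4)


def wire_length_nm(wire_cells: int) -> int:
    """Approximate wire length measured along the cell grid."""
    return wire_cells * CELL_PITCH_NM
```

For instance, a 40-cell wire (about 800 nm long) needs at least three clock zones, so it contributes at least three quarters of a clock to the delay.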

Multilayer crossovers are used here for wire crossings. They use more than one layer of cells like a bridge. The multilayer crossover design is straightforward although there are questions about its realization, since it requires two overlapping active layers with vertical via connections. Alternatively, coplanar “crossovers” that may be easier to realize can be used with some modification to the basic designs.

For circuit layout and functionality checking, a simulation tool for QCA circuits, QCADesigner [22], is used. This tool allows users to do a custom layout and then verify QCA circuit functionality by simulations.

3 CARRY FLOW ADDERS

3.1 Basic Design Approach

Previous publications [17], [18], [19] show that interconnections incur significant complexity and wire delay when implemented with QCAs, so transistor circuit designs that assume wires have negligible complexity and delay are not applicable. In QCA, if the complexity increases, the delay may increase because of the increased cell counts and wire connections.

In this paper, the adder design follows that of a conventional ripple carry adder, but with a new layout optimized to QCA technology. The proposed adder design shows that a very low delay can be obtained with an optimized layout. This is in contrast to the conventional ripple carry adder. To avoid confusion, the new layout is referred to as the Carry Flow Adder (CFA) here.

Equations for a full adder realized with majority gates and inverters are shown in (2). Most adder delays come from carry propagation. For faster calculation, reducing carry propagation delay is most important. The usual approach for fast carry propagation is to add additional logic elements. In this paper, simplification is used instead:

\[
\begin{aligned}
s_i &= a_i b_i c_i + a_i \bar{b}_i \bar{c}_i + \bar{a}_i b_i \bar{c}_i + \bar{a}_i \bar{b}_i c_i\\
&= M\left(\bar{M}(a_i, b_i, c_i),\, M(a_i, b_i, \bar{c}_i),\, c_i\right)\\
&= M\left(\bar{c}_{i+1},\, M(a_i, b_i, \bar{c}_i),\, c_i\right),\\
c_{i+1} &= a_i b_i + b_i c_i + c_i a_i\\
&= M(a_i, b_i, c_i).
\end{aligned}
\tag{2}
\]
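The majority-gate form of (2) can be checked exhaustively with a small logic-level model. The sketch below is illustrative only: the function names are hypothetical, operands are given as LSB-first bit lists, and no QCA timing is modeled.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority gate: M(a, b, c) = ab + bc + ca."""
    return (a & b) | (b & c) | (c & a)


def full_adder(a: int, b: int, c_in: int) -> tuple:
    """Full adder of (2): three majority gates and two inverters."""
    c_out = majority(a, b, c_in)                    # c_{i+1} = M(a, b, c)
    inner = majority(a, b, 1 - c_in)                # M(a, b, not c)
    s = majority(1 - c_out, inner, c_in)            # s = M(not c_{i+1}, inner, c)
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length LSB-first bit lists; return (sum_bits, carry_out)."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry
```

Note that the carry path passes through a single majority gate per bit, which is the property the carry flow layout exploits.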

In QCA, the path from carry-in to carry-out only uses one majority gate. The majority gate always adds one more clock zone (one quarter clock delay). Thus, each bit in the words to be added requires at least one clock zone, which sets the minimum delay.

3.2 Carry Flow Full Adder Design

Based on previous approaches, a 1-bit full adder is designed. The input bit streams flow downward and the carry propagates from right to left. Figs. 3a and 3b show the schematic and the layout of the full adder for the carry flow adder. The schematic and layout are optimized to minimize the delay and area. The carry propagation delay for 1-bit is a quarter clock and the delay from data inputs to the sum output is three quarter clocks.

Fig. 1. Basic QCA cell and two possible polarizations.

Fig. 2. QCA inverter and majority gate.

The wiring channels for the input/output synchronization should be minimized since wire channels add significantly to the circuit area. The carry flow full adder shown in Fig. 3b requires a vertical offset between the carry-in and carry-out of only one cell.

Figs. 4 and 5 show 4- and 32-bit adders, respectively, realized with carry flow full adders. From the layouts, it is clear that for large adders, much of the area is devoted to skewing the input data and deskewing the outputs.

4 RESULTS

4.1 Simulation Results

For clarity, only 8-bit CFA simulation results are shown. The input and output waveforms are shown in Fig. 6. The first meaningful output appears in the third clock period after 2 2/4 (that is, 2.5) clock delays. First and last input/output pairs are highlighted.

4.2 Comparisons

For design comparisons, QCA carry lookahead adders (CLA) are used since they were smaller and faster than conditional sum and conventional ripple carry adders in a previous study [19]. Fig. 7 shows the layout of a 4-bit CLA. It is roughly twice as wide and twice as high as the carry flow adder shown in Fig. 4.

Table 1 shows comparisons of the 4, 8, 16, 32, and 64-bit designs for the CLA and CFA. Fig. 8 compares the two types of adders.

From the statistics, cell counts for the CFA with n-bit operands are roughly $O(n^{1.21})$. Areas are $O(n^{1.42})$. Delay for the CFA-based ripple carry adder is proportional to the word size after a half clock start-up delay. Compared with the carry lookahead adder, the CFA has much lower complexity, area, and delay, so the carry flow adder shows the best performance among the QCA adders compared here.
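The delay trend can be reproduced from the per-bit figures given in Section 3.2 (a quarter clock per carry stage and three quarter clocks from data inputs to the sum output). The following sketch is a model read off those figures, not the measured values of Table 1; the function name is hypothetical.

```python
from fractions import Fraction


def cfa_delay_clocks(n_bits: int) -> Fraction:
    """Estimated latency of an n-bit carry flow adder, in clock cycles.

    Assumption: the carry ripples through (n_bits - 1) stages at a quarter
    clock each, and the last full adder then needs three quarter clocks to
    produce its sum output.
    """
    carry_path = Fraction(n_bits - 1, 4)
    sum_path = Fraction(3, 4)
    return carry_path + sum_path
```

Under these assumptions the latency is n/4 + 1/2 clocks, that is, proportional to the word size after a half clock start-up delay. For n = 8 it gives 2.5 clocks, in line with the 8-bit simulation described in Section 4.1.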

5 MULTIPLIER DESIGN

5.1 Filter Networks

To consider the multiplication of two numbers, start with an FIR filter example [23]. The filter output is defined by

\[
y_i = \sum_{k=0}^{N-1} b_k x_{i-k}. \tag{3}
\]

Fig. 3. Full adder for the carry flow adder. (a) Schematic. (b) Layout.

Fig. 4. Layout of 4-bit carry flow adder.

Fig. 5. Layout of 32-bit carry flow adder.

Fig. 6. Simulation results for 8-bit CFA.

Fig. 7. Layout of a 4-bit carry lookahead adder.

TABLE 1. Adder Comparisons

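Using the one cycle delay operator, $Z^{-1}$, the equation can be restated as

\[
\begin{aligned}
y_i &= \sum_{k=0}^{N-1} b_k x_{i-k}\\
&= \sum_{k=0}^{N-1} b_k Z^{-k} x_i\\
&= \left(\sum_{k=0}^{N-1} b_k Z^{-k}\right) x_i.
\end{aligned}
\tag{4}
\]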
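A direct evaluation of (3) makes the role of the delay operator concrete: tap $b_k$ sees the input delayed by k cycles. The sketch below is a plain software model with zero initial state (samples before time 0 are taken as 0, an assumption made here for illustration); the function name is hypothetical.

```python
def fir_filter(b, x):
    """Evaluate y[i] = sum_k b[k] * x[i - k] for i = 0 .. len(x) - 1.

    Assumption: x[i] = 0 for i < 0 (zero initial state).
    """
    y = []
    for i in range(len(x)):
        acc = 0
        for k, b_k in enumerate(b):
            if i - k >= 0:  # x delayed by k cycles is available
                acc += b_k * x[i - k]
        y.append(acc)
    return y
```

With single-bit taps and inputs, the weighted sum of the outputs, $\sum_i y_i 2^i$, equals the integer product of the two bit patterns. For example, taps `[1, 0, 1]` (the value 5) and input `[1, 1, 0, 0, 0, 0]` (the value 3) give outputs whose weighted sum is 15. Individual outputs $y_i$ can exceed 1 in general, which is why a multiplier needs explicit carry handling.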

Equation (4) can be implemented by the network shown in Fig. 9. The circles in the figure with the $b_i$'s represent multiplication by constants and $\oplus$ indicates addition. Data $x_i$, $b_i$, and $y_i$ are words of arbitrary size.

To use a pipeline design, both upper and lower signal lines include the same additional delay units. Assume that $Z^{-\frac{1}{4}}$ is possible and apply the $Z^{-\frac{1}{2}}$ delay element to each section with upper and lower lines. Equation (5) shows that Fig. 10 gives the correct filter output result with $N/2$ cycle delays.

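\[
\begin{aligned}
\text{Pipelined FIR filter output}
&= Z^{-\frac{1}{2}}\left(b_{N-1} Z^{-\frac{3}{2}(N-1)} + Z^{-\frac{1}{2}}\left(b_{N-2} Z^{-\frac{3}{2}(N-2)} + \cdots + Z^{-\frac{1}{2}}\left(b_0 Z^{0}\right)\cdots\right)\right) x_i\\
&= Z^{-\frac{N}{2}} b_{N-1} Z^{-(N-1)} x_i + Z^{-\frac{N}{2}} b_{N-2} Z^{-(N-2)} x_i + \cdots + Z^{-\frac{N}{2}} b_0 x_i\\
&= Z^{-\frac{N}{2}} \left(\sum_{k=0}^{N-1} b_k Z^{-k}\right) x_i\\
&= Z^{-\frac{N}{2}} y_i.
\end{aligned}
\tag{5}
\]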
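The step from the first to the second line of (5) is a matter of adding delay exponents. In the nested form, tap $b_k$ sits inside $N - k$ factors of $Z^{-\frac{1}{2}}$ and is fed an input delayed by $\frac{3}{2}k$ cycles. The short sketch below (hypothetical function name, exact rational arithmetic) checks that the total is always $\frac{N}{2} + k$.

```python
from fractions import Fraction


def pipelined_tap_delay(k: int, n_taps: int) -> Fraction:
    """Total delay, in clock cycles, seen by tap b_k in the nested form of (5)."""
    nested_half_delays = Fraction(n_taps - k, 2)  # (N - k) factors of Z^(-1/2)
    input_delay = Fraction(3 * k, 2)              # Z^(-(3/2) k) on the input line
    return nested_half_delays + input_delay
```

Every tap therefore carries the common factor $Z^{-\frac{N}{2}}$ in addition to its own $Z^{-k}$, which is why the pipelined network reproduces $y_i$ with a uniform latency of $N/2$ cycles.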

5.2 Multiplication Networks

The above relations can be applied to serial parallel multiplication. Assume an unsigned number system. A 1-bit multiplication is performed by an AND gate and a 1-bit addition is performed by a full adder. The main difference between the FIR filter and the multiplication network is the handling of the carry-out from the adder. The filter networks internally use carry flow, but the multiplication network needs distinct signal flows, so the network for multiplication needs to be adjusted accordingly.

Let $(a_i, b_i)$ be the multiplicand and multiplier pair and $p_i$ be the product bit for position i. Bits $a_i$ and $p_i$ correspond to words $x_i$ and $y_i$ of the filter example. The position i is the input at time i. Define the sum and carry-out of a full adder at the ith time and the jth location as $(s_i^{j}, c_i^{j})$ when $0 \le i \le 2N - 1$ and $0 \le j \le N - 1$, where larger values of j are to the left. Assume that the sum generation takes at least $Z^{-\frac{1}{2}}$ and the carry generation takes at least $Z^{-\frac{1}{4}}$. Even though Figs. 9 and 10 ignored the zeroth full adder, the derivation includes that adder. The implementation can be done as

\[
\begin{aligned}
\left(s_i^{j}, c_i^{j}\right) &= \mathrm{Add}\left(b_j Z^{-\frac{3}{2}j} a_i,\; Z^{-\frac{1}{2}} s_i^{(j-1)},\; Z^{-1} c_i^{j}\right)\\
&= \mathrm{Add}\left(b_j a_{i-\frac{3}{2}j},\; s_{\left(i-\frac{1}{2}\right)}^{(j-1)},\; c_{(i-1)}^{j}\right).
\end{aligned}
\tag{6}
\]

Fig. 8. Comparison of CFA and CLA adders for various operand sizes.

Fig. 9. FIR filter network.

Fig. 10. Pipelined FIR filter network.

Equation (6) uses a feedback loop to the adder itself using a one clock delay unit. This is denoted as a carry delay multiplier (CDM). It is optimized to minimize the latency of the output.

Returning to Fig. 9 and redirecting the output to the right side, which is the same side as the input, gives the redirected graph shown in Fig. 11. For a pipeline design, it can be redrawn as shown in Fig. 12 by using

\[
\begin{aligned}
Z^{-\frac{1}{2}} y_i &= Z^{-\frac{1}{2}} \left(\sum_{k=0}^{N-1} b_k Z^{-k}\right) x_i\\
&= Z^{-\frac{1}{2}} \left(\sum_{k=0}^{N-1} b_k Z^{-\frac{k}{2}} Z^{-\frac{k}{2}}\right) x_i\\
&= Z^{-\frac{1}{2}} \left(\sum_{k=0}^{N-1} Z^{-\frac{k}{2}} b_k Z^{-\frac{k}{2}}\right) x_i.
\end{aligned}
\tag{7}
\]

Finally, Fig. 12 is a network design comparable to Fig. 10. The main difference is that there is a much smaller latency from the first input to the first output. Based on Fig. 12, the multiplication network is represented by (8). Fig. 13 shows the network implementation. The CDM design minimizes the latency to the output:

\[
\begin{aligned}
\left(s_i^{j}, c_i^{j}\right) &= \mathrm{Add}\left(b_j Z^{-\frac{1}{2}j} a_i,\; Z^{-\frac{1}{2}} s_i^{(j+1)},\; Z^{-1} c_i^{j}\right)\\
&= \mathrm{Add}\left(b_j a_{i-\frac{1}{2}j},\; s_{\left(i-\frac{1}{2}\right)}^{(j+1)},\; c_{(i-1)}^{j}\right).
\end{aligned}
\tag{8}
\]

6 MULTIPLIER IMPLEMENTATION

6.1 Multiplication Networks for QCA

Based on the QCA circuit characteristics, one clock zone provides a quarter clock delay, which matches the $D^{-1}$ operation. That is, $D^{-4} = Z^{-1}$. Assume a logical AND operation provides one $D^{-1}$ delay and a full adder sum and carry have $D^{-2}$ and $D^{-1}$ delays, respectively. Wires also have some clock cycles of delay based on the wire length. After incorporating these characteristics, the filter network of Fig. 12 is redrawn, as shown in Fig. 14. Delay amounts in upper and lower signal flows are chosen to make a one clock cycle difference between the adjacent paths.

From the filter network examples, a multiplier for QCA is developed. Based on (8), (9) reflects the QCA clocking. The previous figure is modified and the serial parallel multiplier can be implemented as shown in Fig. 15:

\[
\begin{aligned}
\left(s_i^{j}, c_i^{j}\right) &= \mathrm{Add}\left(b_j D^{-2j-2} a_i,\; D^{-2} s_i^{(j+1)},\; D^{-4} c_i^{j}\right)\\
&= \mathrm{Add}\left(b_j a_{i-2j-2},\; s_{(i-2)}^{(j+1)},\; c_{(i-4)}^{j}\right).
\end{aligned}
\tag{9}
\]

6.2 Multiplier Design

The structure shown in Fig. 15 is used for the QCA circuit implementation since it minimizes the latency from the first input to the first output. Fig. 16 shows the block diagram of the optimized design for QCA layout.

6.3 QCA Implementation

Bit-serial adders are used to realize the carry delay multiplier. The underlying full adders are based on the CFA that is described in Section 3 with the schematic and layout shown in Figs. 3a and 3b, respectively. The bit-serial adder is modified from the full adder so that the carry-in and carry-out are connected internally with a one clock delay. Figs. 17a and 17b show the schematic and layout of the bit-serial adder.

Fig. 11. Redirected FIR filter network.

Fig. 12. Pipelined redirected FIR filter network.

Fig. 13. Carry delay multiplication network.

Fig. 14. Redirected FIR filter network for QCA.

Fig. 15. CDM network for QCA.

Fig. 16. Multiplier block diagrams for CDM.

Using these adders, a 4-bit CDM multiplier according to Fig. 16 is implemented as shown in Fig. 18. Multipliers for larger word sizes can be implemented easily by adding additional bit slices.

7 RESULTS

7.1 Simulation Results

Simulation of a 4-bit multiplier is shown with the input and output waveforms in Fig. 19. First and last input/output pairs are highlighted.

For N-bit inputs, the multiplier receives N + 1 inputs (a serial input and N parallel inputs) and produces a serial output. The serial input and output are ordered from LSB to MSB and parallel inputs are repeated whenever a new serial input is provided (N cycles). For initialization of the multiplier, zero bits are input for N clock cycles. Zero bits are provided between successive inputs. The time to complete an N-bit multiplication is 2N cycles.
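The input/output protocol can be illustrated with a functional, whole-cycle model of a serial parallel multiplier. The sketch below is not the authors' QCADesigner model: it abstracts away the quarter-clock delays of (9) and the layout, keeping only the bit-slice structure (an AND gate plus a full adder whose carry is fed back to itself one cycle later, and whose sum is passed to the neighboring slice on the right). All function and variable names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (sum, carry_out) of three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)


def serial_parallel_multiply(a: int, b: int, n_bits: int) -> int:
    """Multiply two unsigned n_bits-wide integers bit-serially.

    `a` enters serially, LSB first, followed by n_bits zero bits;
    `b` is held on the parallel inputs. One product bit leaves per cycle.
    """
    assert 0 <= a < (1 << n_bits) and 0 <= b < (1 << n_bits)
    b_bits = [(b >> j) & 1 for j in range(n_bits)]
    sums = [0] * (n_bits + 1)   # sums[j]: latched sum of slice j; sums[n_bits] is 0
    carries = [0] * n_bits      # carries[j]: carry fed back into slice j
    product = 0
    for i in range(2 * n_bits):                    # 2N cycles in total
        a_i = (a >> i) & 1 if i < n_bits else 0    # zero bits after the operand
        new_sums = [0] * (n_bits + 1)
        for j in range(n_bits):
            new_sums[j], carries[j] = full_adder(
                a_i & b_bits[j], sums[j + 1], carries[j]
            )
        sums = new_sums
        product |= sums[0] << i                    # serial output, LSB first
    return product
```

In this model, multiplying 3 by 5 with `n_bits = 3` takes six cycles and produces the output bits 1, 1, 1, 1, 0, 0 (LSB first), that is, 15. The trailing zero bits on the serial input also leave every slice cleared, which corresponds to the zero bits provided between successive inputs.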

7.2 Comparisons

The complexity, size, and delay of various word size carry delay multipliers are shown in Table 2. Fig. 20 demonstrates the layout of a 32-bit CDM.

8 CONCLUSIONS

QCA circuits have significant wire delays. For a fast design in QCA, it is generally necessary to minimize the complexity. Based on the QCA characteristics, this paper presents a new adder design, the carry flow adder. Carry flow adders use a basic ripple carry propagation scheme that is optimized for layout in the QCA technology. The layouts and functionality checks were done using QCADesigner and the designs are compared to the carry lookahead adder that was the best previous QCA adder design. The CFA adders require less than one-fifth of the area of the CLA and have about half of the delay of the CLA.

This paper also presents serial parallel multiplication networks based on filter networks. The networks are derived from multiplication equations and implemented by network graphs. The design uses systolic array structures to produce an output on every clock cycle with low latency to the first output. It also has a regular design that allows easy word size extension, occupies a small area, and uses a small number of cells.

ACKNOWLEDGMENTS

Sections 5, 6, and 7 are greatly abbreviated versions of the

material in [24]. The authors also thank the reviewers and

the attendees of the Symposium for their constructive

comments.

TABLE 2. Multiplier Characteristics

Fig. 17. Carry flow bit-serial adder. (a) Schematic. (b) Layout.

Fig. 18. Layout of a 4-bit carry delay multiplier.

Fig. 19. Simulation of a 4-bit carry delay multiplier.

Fig. 20. Layout of a 32-bit carry delay multiplier.

REFERENCES

[1] International Technology Roadmap for Semiconductors (ITRS), http://www.itrs.net, 2007.

[2] C.S. Lent, P.D. Tougaw, W. Porod, and G.H. Bernstein, “Quantum Cellular Automata,” Nanotechnology, vol. 4, no. 1, pp. 49-57, Jan. 1993.

[3] A. Orlov et al., “Experimental Demonstration of a Binary Wire for Quantum-Dot Cellular Automata,” Applied Physics Letters, vol. 74, no. 19, pp. 2875-2877, May 1999.

[4] I. Amlani et al., “Experimental Demonstration of a Leadless Quantum-Dot Cellular Automata Cell,” Applied Physics Letters, vol. 77, no. 5, pp. 738-740, July 2000.

[5] R.P. Cowburn and M.E. Welland, “Room Temperature Magnetic Quantum Cellular Automata,” Science, vol. 287, no. 5457, pp. 1466-1468, Feb. 2000.

[6] H. Qi et al., “Molecular Quantum Cellular Automata Cells. Electric Field Driven Switching of a Silicon Surface Bound Array of Vertically Oriented Two-Dot Molecular Quantum Cellular Automata,” J. Am. Chemical Soc., vol. 125, pp. 15250-15259, 2003.

[7] R.K. Kummamuru et al., “Operation of a Quantum-Dot Cellular Automata (QCA) Shift Register and Analysis of Errors,” IEEE Trans. Electron Devices, vol. 50, no. 9, pp. 1906-1913, Sept. 2003.

[8] A. DeHon and M.J. Wilson, “Nanowire-Based Sublithographic Programmable Logic Arrays,” Proc. Int’l Symp. Field-Programmable Gate Arrays, pp. 123-132, 2004.

[9] J.M. Seminario et al., “A Molecular Device Operating at Terahertz Frequencies: Theoretical Simulations,” IEEE Trans. Nanotechnology, vol. 3, no. 1, pp. 215-218, Mar. 2004.

[10] Y. Wang and M. Lieberman, “Thermodynamic Behavior of Molecular-Scale Quantum-Dot Cellular Automata (QCA) Wires and Logic Devices,” IEEE Trans. Nanotechnology, vol. 3, no. 3, pp. 368-376, Sept. 2004.

[11] R. Tang, F. Zhang, and Y.B. Kim, “Quantum-Dot Cellular Automata SPICE Macro Model,” Proc. ACM Great Lakes Symp. VLSI, pp. 108-111, 2005.

[12] C.S. Lent, M. Liu, and Y. Lu, “Bennett Clocking of Quantum-Dot Cellular Automata and the Limits to Binary Logic Scaling,” Nanotechnology, vol. 17, no. 16, pp. 4240-4251, Aug. 2006.

[13] X. Ma, J. Huang, and F. Lombardi, “A Model for Computing and Energy Dissipation of Molecular QCA Devices and Circuits,” ACM J. Emerging Technologies in Computing Systems, vol. 3, no. 4, article 18, 2008.

[14] A. Vetteth et al., “Quantum-Dot Cellular Automata Carry-Look-Ahead Adder and Barrel Shifter,” Proc. IEEE Emerging Telecomm. Technologies Conf., Sept. 2002.

[15] W. Wang, K. Walus, and G.A. Jullien, “Quantum-Dot Cellular Automata Adders,” Proc. Third IEEE Conf. Nanotechnology, pp. 461-464, 2003.

[16] R. Zhang, K. Walus, W. Wang, and G.A. Jullien, “Performance Comparison of Quantum-Dot Cellular Automata Adders,” Proc. IEEE Int’l Symp. Circuits and Systems, vol. 3, pp. 2522-2526, 2005.

[17] H. Cho and E.E. Swartzlander, Jr., “Pipelined Carry Lookahead Adder Design in Quantum-Dot Cellular Automata,” Proc. Conf. Record of the 39th Asilomar Conf. Signals, Systems, and Computers, pp. 1191-1195, 2005.

[18] H. Cho and E.E. Swartzlander, Jr., “Modular Design of Conditional Sum Adders Using Quantum-Dot Cellular Automata,” Proc. Sixth IEEE Conf. Nanotechnology, July 2006.

[19] H. Cho and E.E. Swartzlander, Jr., “Adder Designs and Analyses for Quantum-Dot Cellular Automata,” IEEE Trans. Nanotechnology, vol. 6, no. 3, pp. 374-383, May 2007.

[20] K. Walus, G.A. Jullien, and V.S. Dimitrov, “Computer Arithmetic Structures for Quantum Cellular Automata,” Proc. Conf. Record of the 37th Asilomar Conf. Signals, Systems, and Computers, vol. 2, pp. 1435-1439, 2003.

[21] K. Walus and G.A. Jullien, “Design Tools for an Emerging SoC Technology: Quantum-Dot Cellular Automata,” Proc. IEEE, vol. 94, no. 6, pp. 1225-1244, 2006.

[22] K. Walus, T. Dysart, G. Jullien, and R. Budiman, “QCADesigner: A Rapid Design and Simulation Tool for Quantum-Dot Cellular Automata,” IEEE Trans. Nanotechnology, vol. 3, no. 1, pp. 26-29, Mar. 2004.

[23] D. Cohen, “A Mathematical Approach to Computational Network Design,” Systolic Signal Processing Systems, E.E. Swartzlander, Jr., ed., Marcel Dekker, Inc., pp. 1-29, 1987.

[24] H. Cho and E.E. Swartzlander, Jr., “Serial Parallel Multiplier Design in Quantum-Dot Cellular Automata,” Proc. 18th IEEE Symp. Computer Arithmetic, pp. 7-15, 2007.

Heumpil Cho received the BS degree in electrical engineering and the MS degree in electrical engineering and computer science from Seoul National University, and the PhD degree in electrical and computer engineering from the University of Texas at Austin, in 1998, 2000, and 2006, respectively. In 2006, he was with Luminary Micro, Inc., Austin, TX, where he was working on I/O circuit characterization and modeling. Since 2007, he has been a senior engineer at Qualcomm, Incorporated, San Diego, CA, where he has been working on various projects including CDMA/WCDMA/LTE/WiMAX wireless modem chip designs. His research interests include high-speed computer arithmetic algorithms, systolic signal processor and CORDIC processor architectures, VLSI circuit designs, architectures for application-specific signal processing, and applications of arithmetic algorithms on quantum-dot cellular automata. He is a member of the IEEE.

Earl E. Swartzlander, Jr. holds degrees in electrical engineering from Purdue University, the University of Colorado, and the University of Southern California. He is a professor of electrical and computer engineering at the University of Texas at Austin. In his current position, he and his students conduct research in computer engineering with emphasis on application specific processor design, including high-speed computer arithmetic, systolic processor architecture, VLSI technology, and rapid prototyping. From 1975 to 1990, he held a variety of positions at TRW including the director of Independent Research & Development in the TRW Defense Systems Group, the manager of the Digital Processing Laboratory in the Electronics and Technology Division, and the manager of the Advanced Development Office in the System Development Division. He was the editor-in-chief of the IEEE Transactions on Computers from 1990 to 1994 and was the founding editor-in-chief of the Journal of VLSI Signal Processing. In addition, he has served as an editor for the IEEE Transactions on Computers, the IEEE Transactions on Parallel and Distributed Systems, and the IEEE Journal of Solid-State Circuits. He has been a member of the Board of Governors of the IEEE Computer Society (1987-1991), the IEEE Signal Processing Society (1992-1994), and the IEEE Solid-State Circuits Council/Society (1986-1991). He has been a member of the IEEE History Committee (1996-2004), the IEEE Fellows Committee (2000-2003), and is currently the chair of the IEEE James H. Mulligan, Jr., Education Medal Committee. He has chaired a number of conferences including the IEEE International Conference on Application-Specific Architectures, and Processors, the 31st Asilomar Conference on Signals, Systems and Computers, the International Conference on Parallel and Distributed Systems, the 11th Symposium on Computer Arithmetic, the International Conference on Wafer Scale Integration, and the fifth IEEE International Conference on Distributed Computing Systems. He is the author of one book, an editor of seven books, and the author or coauthor of 59 refereed journal papers, 33 book chapters, and 254 conference papers. He is a fellow of the IEEE. He has been honored with the IEEE Third Millennium Medal, the Distinguished Engineering Alumnus Award from the University of Colorado, the Outstanding Electrical Engineer and Distinguished Engineering Alumnus awards from Purdue University, and the IEEE Computer Society Golden Core Award.
