Lowering Power Dissipation and Energy Consumption in Arithmetic Logic Units

From the Years 2002 to 2015

James Harrison
Department of Electrical and Computer Engineering

University of Central Florida

Orlando, FL 32816-2362

Abstract - When designing a processor or a logic circuit, one of the most important factors to take into account is power and energy consumption. This paper evaluates how different arithmetic logic units (ALUs) are designed with the purpose of lowering power consumption and being more energy efficient. Along with power, the datapath width, the number of bits in operands, the ITRS technology node, execution time, and area are compared. Examples of the arithmetic designs discussed include the Low Power Delay Fault Testable 32b ALU, the Ultra-area-efficient fault-tolerant QCA full adder, and the High Throughput Power Aware FIR Filter.

Keywords - ALU, power, data bus, execution time, adder, multiplier, floating point, area, energy

I. INTRODUCTION

In today's computers, a central processing unit (CPU) has two main components: the arithmetic logic unit (ALU) and the control unit (CU). The ALU performs the CPU's arithmetic and logical operations and is a fundamental building block of the CPU [11]. Examples of arithmetic operations performed by an ALU are addition, subtraction, multiplication, and division. As the name implies, the ALU also performs logical operations such as AND, OR, and XOR. A logical operation can also be a comparison: the unit can compare letters, characters, or numbers, and based on the result of the comparison, the computer can take a specific action [11].
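To make the split between arithmetic, logical, and comparison operations concrete, here is a minimal Python sketch of a 32-bit ALU as a pure function. The operation names (ADD, SUB, MUL, DIVU, AND, OR, XOR, SLT) are illustrative assumptions loosely modeled on common RISC mnemonics; they are not taken from any design surveyed in this paper.

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits, like a 32-bit datapath


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: str, a: int, b: int) -> int:
    """Toy 32-bit ALU covering arithmetic, logical, and comparison operations."""
    a, b = a & MASK32, b & MASK32
    operations = {
        "ADD": lambda: a + b,
        "SUB": lambda: a - b,            # two's-complement wraparound
        "MUL": lambda: a * b,            # keeps the low 32 bits of the product
        "DIVU": lambda: a // b,          # unsigned; ZeroDivisionError if b == 0
        "AND": lambda: a & b,
        "OR": lambda: a | b,
        "XOR": lambda: a ^ b,
        "SLT": lambda: int(to_signed(a) < to_signed(b)),  # signed "less than"
    }
    if op not in operations:
        raise ValueError(f"unsupported operation: {op}")
    return operations[op]() & MASK32


print(alu("SUB", 3, 5), alu("SLT", 0xFFFFFFFF, 1))
```

The comparison result (SLT) is just another ALU output; the control unit can then use it to decide which instruction runs next.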

According to [12], before a computer can execute an instruction, the data and program instructions must be placed into memory from a device, either an input device or secondary storage. Once they are in memory, the CPU performs four steps for each instruction. The first two steps make up the instruction time and the last two make up the execution time. First, the control unit fetches the instruction from memory; then it decodes the instruction and directs the data to the ALU. The ALU then executes the arithmetic or logical instruction on the data, and the result of the operation is stored in memory or in a register. The control unit directs this sequence, while the ALU performs the actual operations on the data [12].
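The four steps above can be sketched as a toy interpreter loop. The instruction format, the two opcodes, and the four-register file are hypothetical simplifications chosen for illustration; steps 1 and 2 correspond to the instruction time and steps 3 and 4 to the execution time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Instruction = Tuple[str, int, int, int]  # (opcode, destination, source 1, source 2)


@dataclass
class ToyCPU:
    """Hypothetical register machine illustrating the four-step instruction cycle."""
    memory: List[Instruction]
    registers: List[int] = field(default_factory=lambda: [0] * 4)
    pc: int = 0  # program counter

    def step(self) -> bool:
        if self.pc >= len(self.memory):
            return False
        instruction = self.memory[self.pc]      # 1. fetch (control unit)
        op, dest, src1, src2 = instruction      # 2. decode (control unit)
        x, y = self.registers[src1], self.registers[src2]
        if op == "ADD":                         # 3. execute (ALU)
            result = x + y
        elif op == "AND":
            result = x & y
        else:
            raise ValueError(f"unknown opcode: {op}")
        self.registers[dest] = result           # 4. store the result in a register
        self.pc += 1
        return True

    def run(self) -> None:
        while self.step():
            pass


cpu = ToyCPU(memory=[("ADD", 2, 0, 1), ("AND", 3, 2, 1)], registers=[6, 3, 0, 0])
cpu.run()
print(cpu.registers)  # [6, 3, 9, 1]
```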

Important metrics for evaluating processors include data bus width, ITRS technology node, execution time, power dissipation, and energy consumption [13].

The width of a data bus helps determine its data rate, that is, the number of bytes per second it can carry, which is one of the main factors determining the processing power of a computer. Many processor designs use a 32-bit bus, which means that 32 bits of data can be transferred at one time; the wider the bus, the more information can be transferred at once [13].
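The relation between bus width and data rate is simple arithmetic: bytes per transfer times transfers per second. The sketch below assumes one transfer per clock cycle on a hypothetical 100 MHz bus; real buses may transfer more than once per cycle or stall, so this is a peak figure only.

```python
def bus_bandwidth_bytes_per_s(width_bits: int, transfers_per_s: float) -> float:
    """Peak data rate of a bus that moves `width_bits` bits on every transfer."""
    return width_bits / 8 * transfers_per_s


# Hypothetical 100 MHz bus with one transfer per cycle
for width in (16, 32, 64):
    print(width, bus_bandwidth_bytes_per_s(width, 100e6))
```

At a fixed transfer rate, doubling the width doubles the peak data rate, which is why wider buses move more information per unit time.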

Execution time is the time it takes to execute a single instruction. It is the last portion of the instruction cycle and comprises the actions taken by the ALU, such as executing the arithmetic or logical instruction on the data and then storing the result in a register [13].
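Execution time is usually expressed in clock cycles divided by clock frequency. The sketch below shows that conversion and the standard textbook CPU performance equation (instruction count times cycles per instruction times clock period); the numbers in the usage line are hypothetical.

```python
def instruction_time_ns(cycles: int, clock_hz: float) -> float:
    """Time for one instruction that takes `cycles` clock cycles, in nanoseconds."""
    return cycles / clock_hz * 1e9


def cpu_time_s(instruction_count: int, cycles_per_instruction: float, clock_hz: float) -> float:
    """CPU performance equation: instructions x CPI x clock period (1 / f)."""
    return instruction_count * cycles_per_instruction / clock_hz


# Hypothetical: a 2-cycle instruction at 1 GHz, and a 1-million-instruction program
print(instruction_time_ns(2, 1e9), cpu_time_s(1_000_000, 1.5, 2e9))
```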

Power dissipation is the process by which a CPU consumes electrical energy, largely when its devices switch, and dissipates that energy as heat [13].
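A common first-order CMOS model, not specific to any paper surveyed here, splits power into a switching (dynamic) term and a leakage (static) term, with energy being power multiplied by time. The activity factor, capacitance, voltages, and frequency below are hypothetical values chosen only to show the quadratic dependence on supply voltage.

```python
def dynamic_power_w(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P_dyn = alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def static_power_w(vdd_v: float, leakage_a: float) -> float:
    """Leakage power drawn even when no node switches: P_static = Vdd * I_leak."""
    return vdd_v * leakage_a


def energy_j(power_w: float, seconds: float) -> float:
    """Energy for constant power over a time interval."""
    return power_w * seconds


# Hypothetical numbers: halving Vdd cuts switching power by about 4x at the same f
p_full = dynamic_power_w(0.1, 1e-9, 1.2, 1e9)
p_half = dynamic_power_w(0.1, 1e-9, 0.6, 1e9)
print(p_full / p_half)
```

The V^2 term is why lowering the supply voltage, as several of the designs below do, is such an effective lever on power.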

In the following sections of this paper, 10 ALU designs ranging from 2002 to 2015 are discussed, along with how each design lowers power dissipation and improves energy consumption.

II. LITERATURE REVIEW

One of the most important factors to consider when designing a logic circuit or a processor is power and energy consumption. How much power a digital logic circuit consumes greatly affects the performance of the system. Many different arithmetic logic units have been designed with lowering power dissipation as one of their main purposes. Below are 10 models and implementations of different methods, all of which lower power or energy consumption in some way.

In [1], the design reviewed is an IEEE 754 single-precision floating-point unit from 2015. The paper discusses several methods of improving energy efficiency. One is to partially truncate the computation of the mantissa and to allow the mantissa bit-width to be changed dynamically in the multiplicand, the multiplier, and the output product. Another method of minimizing the power and energy consumption of digital systems implemented in complementary metal-oxide-semiconductor (CMOS) technology is to reduce the supply voltage to near the threshold voltage; this incurs a performance penalty because it lowers the speed of the logic [1].
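Mantissa truncation can be illustrated in software on IEEE 754 single-precision values. The sketch below zeroes all but the top `keep_bits` of each operand's 23-bit fraction and of the product's fraction; the function names are hypothetical and the scheme is only an assumed approximation of the idea in [1], not its hardware design, which truncates inside the multiplier rather than after it.

```python
import struct


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Round x to float32, then keep only the top `keep_bits` of its 23-bit fraction."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be in [0, 23]")
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw float32 bit pattern
    drop = 23 - keep_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF               # clear the low fraction bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def approx_multiply(a: float, b: float, keep_bits: int) -> float:
    """Multiply with truncated operand and product mantissas (hypothetical sketch)."""
    product = truncate_mantissa(a, keep_bits) * truncate_mantissa(b, keep_bits)
    return truncate_mantissa(product, keep_bits)


print(approx_multiply(1.75, 1.75, 23), approx_multiply(1.75, 1.75, 1))
```

Fewer kept bits means a narrower multiplier array and less switching, at the cost of accuracy; making `keep_bits` a runtime parameter mirrors the idea of a dynamically adjustable mantissa width.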

In [2], quantum-dot cellular automata (QCA) circuits are implemented with the aim of lowering energy consumption and achieving faster operation. The 2015 design is an ultra-area-efficient, fault-tolerant QCA full adder. Four distinct clock phases are applied to restore signal energy, and the clock signals synchronize the QCA cells, which makes the QCA operations more accurate and the performance more efficient [2].
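QCA logic is typically built from three-input majority gates and inverters. One well-known majority-gate full adder uses three majority gates: Cout = M(A, B, Cin) and Sum = M(NOT Cout, Cin, M(A, B, NOT Cin)). The sketch below checks that formulation at the logic level only; it does not model the cell layout, clock zones, or fault tolerance of the specific design in [2].

```python
from typing import Tuple


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote, the basic QCA logic primitive."""
    return (a & b) | (a & c) | (b & c)


def qca_full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Full adder built only from majority gates and inverters (1 - x)."""
    cout = majority(a, b, cin)
    s = majority(1 - cout, cin, majority(a, b, 1 - cin))
    return s, cout


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, qca_full_adder(a, b, cin))
```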

Paper [3] proposes a computing scheme centered on probabilistic domain transformation, aiming for fault resilience as well as low-power operation. Instead of the usual multiplier-based convolution methods, [3] presents a multiplier-less probabilistic convolution; the design is an energy-efficient convolver from 2014. In this model, a lightweight adder replaces the expensive multipliers through the probabilistic domain transformation. This allows simpler operations to be performed, achieving higher energy efficiency at a lower hardware cost [3].
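One way a probabilistic transformation can remove per-tap multipliers is to treat the normalized kernel weights as sampling probabilities: draw tap indices according to |w_i|, add or subtract the selected inputs, and apply a single scale factor at the end. This estimator is an assumption chosen for illustration, not necessarily the transformation used in [3]; `sampled_dot` is a hypothetical name, and in hardware the sampling would be done with random-number comparators rather than `random.choices`.

```python
import random
from typing import Sequence


def sampled_dot(weights: Sequence[float], inputs: Sequence[float], samples: int, seed: int = 0) -> float:
    """Estimate sum(w_i * x_i) with additions inside the loop instead of multiplies.

    Index i is drawn with probability |w_i| / sum(|w|); the signed input is
    accumulated, and one final scale by sum(|w|) / samples gives an unbiased
    estimate of the dot product (one output of a convolution).
    """
    rng = random.Random(seed)
    magnitudes = [abs(w) for w in weights]
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    indices = rng.choices(range(len(weights)), weights=magnitudes, k=samples)
    acc = 0.0
    for i in indices:
        acc += inputs[i] if weights[i] >= 0 else -inputs[i]  # add or subtract only
    return acc * total / samples  # single scaling step per output


# Exact answer is 0.25*4 + 0.5*8 + 0.25*12 = 8
print(sampled_dot([0.25, 0.5, 0.25], [4, 8, 12], samples=20000))
```

The accuracy grows with the number of samples, which is the usual trade-off of probabilistic computing: cheaper arithmetic in exchange for a small, controllable error.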

In 2003, a high-throughput, power-aware finite impulse response (FIR) filter was designed in which the average power dissipation and latency are both significantly reduced by finely pipelining the multipliers and adders. To save power, the filter's power awareness was improved: by combining a selective method with a two-dimensional (2-D) gating technique, both power awareness and reduced FIR filter latency were achieved [4].
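An FIR filter computes y[n] = sum over k of h[k] * x[n-k]. The sketch below is a direct-form filter that also counts how many multiplications a simple gating rule would suppress (skip a multiply when either operand is zero). This rule is a much simpler, hypothetical stand-in for the selective 2-D gating of [4], and the sketch does not model pipelining.

```python
from typing import List, Sequence, Tuple


def fir_with_operand_gating(x: Sequence[int], h: Sequence[int]) -> Tuple[List[int], int, int]:
    """Direct-form FIR filter that skips multiplies by zero.

    Returns (outputs, multiplies performed, multiplies gated off).
    """
    y, performed, gated = [], 0, 0
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k < 0:
                break  # no earlier samples (zero initial state)
            sample = x[n - k]
            if coeff == 0 or sample == 0:
                gated += 1  # a gated multiplier would not switch for this term
                continue
            acc += coeff * sample
            performed += 1
        y.append(acc)
    return y, performed, gated


print(fir_with_operand_gating([1, 2, 3], [1, 0, 1]))  # ([1, 2, 4], 4, 2)
```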

In [5], we learn about an implementation built on the Xilinx DSP48 multiplier in 2014. The authors implemented the Priority Using Resource Escalation (PURE) approach, which provides adaptive, dynamic reconfiguration to achieve survivability. Dynamic reconfiguration of redundancy permits autonomous operation while maintaining a defined quality measure within area, power, and energy constraints, and PURE achieves this with lower power and area overheads than static redundancy schemes. It does so by starting from a uniplex (single) instance of the datapath and adapting it when aberrant behavior occurs [5].
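The general idea of resource escalation, running with minimal redundancy until something goes wrong, can be sketched in software. `EscalatingDatapath`, `vote`, and `check_period` are hypothetical names, and the policy (a periodic spot-check against a spare unit, then permanent escalation to triple modular redundancy) is an assumed simplification, not the priority-based PURE algorithm described in [5].

```python
from typing import Callable, List

Multiplier = Callable[[int, int], int]


def vote(a: int, b: int, c: int) -> int:
    """Return the value at least two of three units agree on."""
    return a if a in (b, c) else b


class EscalatingDatapath:
    """Hypothetical controller: uniplex by default, TMR after a detected fault."""

    def __init__(self, units: List[Multiplier], check_period: int = 4):
        if len(units) < 3:
            raise ValueError("need at least three units to escalate to TMR")
        self.units = units
        self.check_period = check_period
        self.mode = "uniplex"
        self.ops = 0

    def multiply(self, a: int, b: int) -> int:
        self.ops += 1
        if self.mode == "tmr":
            return vote(*(unit(a, b) for unit in self.units[:3]))
        primary = self.units[0](a, b)
        # Periodic spot-check against a spare unit to detect aberrant behavior
        if self.ops % self.check_period == 0 and self.units[1](a, b) != primary:
            self.mode = "tmr"  # escalate: spend more resources only when needed
            return vote(*(unit(a, b) for unit in self.units[:3]))
        return primary


faulty = lambda a, b: a * b + 1   # simulated faulty unit
healthy = lambda a, b: a * b
datapath = EscalatingDatapath([faulty, healthy, healthy], check_period=1)
print(datapath.multiply(3, 4), datapath.mode)  # 12 tmr
```

Staying uniplex while the hardware is healthy is what keeps power and area overheads below those of always-on redundancy; the cost is a detection latency of up to `check_period` operations.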

In 2002, a high-speed 4-bit ALU was designed for 1 V operation in order to demonstrate the usefulness of the back-gate forward substrate bias (BGFSB) method. This ALU uses a ripple carry adder and can perform eight operations, four logical and four arithmetic. The BGFSB method supports low-voltage as well as high-speed applications. In the steady state, the subthreshold current increases because BGFSB reduces the threshold voltage [6].
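A ripple carry adder chains full adders so that each bit position waits for the carry from the one below it. The sketch below is a behavioral 4-bit model matching the width of the ALU in [6]; it shows the carry chain only, not the circuit-level BGFSB technique.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, width: int = 4, cin: int = 0) -> Tuple[int, int]:
    """Add two `width`-bit numbers; each stage uses the previous stage's carry."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b0111, 0b0001))  # (8, 0)
print(ripple_carry_add(15, 1))           # (0, 1)
```

Because the carry must travel through every stage, delay grows linearly with width; that is acceptable for a small 4-bit ALU and keeps the hardware minimal.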

In 2009, a Vedic multiplier module and a 64-bit adder were used to reduce the complexity, area, execution time, and power of computations. The design produced a high-speed, power-efficient multiplier [7].
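Vedic multipliers are commonly based on the Urdhva Tiryakbhyam ("vertically and crosswise") method, in which every column of the product is formed from all crosswise bit products whose indices sum to that column, plus the carry from the previous column. Assuming [7] follows this common formulation, the sketch below models it for unsigned n-bit operands; `urdhva_multiply` is a hypothetical name.

```python
from typing import List


def urdhva_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers column by column (vertically and crosswise)."""
    a_bits: List[int] = [(a >> i) & 1 for i in range(n)]
    b_bits: List[int] = [(b >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        # All crosswise products a_i * b_j with i + j == k belong to column k;
        # in hardware these partial products are generated in parallel.
        column = carry + sum(
            a_bits[i] & b_bits[k - i]
            for i in range(max(0, k - n + 1), min(k, n - 1) + 1)
        )
        result |= (column & 1) << k
        carry = column >> 1  # pass the excess to the next column
    return result | (carry << (2 * n - 1))


print(urdhva_multiply(13, 11, 4))  # 143
```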

Paper [8] presents a 16-bit low-power pipelined RISC processor from 2015 that uses a carry select adder. The RISC processor was designed in Verilog HDL and evaluated on the XILINX KINTEX XC7K1607-3fbg676, a 28 nm technology device, which implements the two clock cycles of its two-stage pipeline. The two-stage pipeline is used to increase the processor's speed. The design also uses clock gating, a low-power technique that reduces power consumption by disabling the clock to logic that is idle. With clock gating, the dynamic power was greatly reduced from 0.71 W, the quiescent power was reduced to 0.149 W, and the total power was reduced to 0.22 W [8].
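A carry select adder precomputes each block's sum for both possible incoming carries (0 and 1) and then uses the actual carry only to drive a multiplexer, shortening the carry path compared with a ripple carry adder. The 16-bit width matches [8], but the 4-bit block size is an assumption for illustration; [8] as summarized here does not state its block partitioning.

```python
from typing import Tuple


def block_add(a: int, b: int, cin: int, width: int) -> Tuple[int, int]:
    """Behavioral adder for one block: returns (sum bits, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 16, block: int = 4) -> Tuple[int, int]:
    """Carry select adder: each block is precomputed for carry-in 0 and 1."""
    if width % block:
        raise ValueError("width must be a multiple of the block size")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lsb in range(0, width, block):
        a_blk, b_blk = (a >> lsb) & mask, (b >> lsb) & mask
        # In hardware both candidates are computed in parallel
        sum0, cout0 = block_add(a_blk, b_blk, 0, block)
        sum1, cout1 = block_add(a_blk, b_blk, 1, block)
        # The incoming carry only selects between the two precomputed results
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lsb
    return result, carry


print(carry_select_add(1234, 4321))  # (5555, 0)
print(carry_select_add(0xFFFF, 1))   # (0, 1)
```

The speed gain costs extra area, since each block is duplicated, which is the usual trade-off designers weigh against power.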

In [9], from 2005, an upper-order 32-bit adder is used, because fast 32- and 64-bit ALU operations with single-cycle latency and throughput are essential ingredients of high-performance microprocessor execution cores [9]. In 64-bit mode, power and performance are optimized by a gated, off-chip secondary supply voltage. Using high-speed single-rail dynamic circuit techniques and a sparse-tree semi-dynamic adder, the design obtains low dynamic power consumption and high noise robustness at the maximum supply voltage. For an efficient power-performance tradeoff, the slack in the upper-order 32-bit carry-merge tree allows its supply voltage to be lowered to 1 V, which yields an additional 22% power benefit [9].
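The benefit of a dual supply follows from dynamic power scaling with V^2: logic with timing slack can run from a lower supply without slowing the critical path. The function below estimates the fraction of dynamic power saved when a given share of the switched capacitance moves to the lower rail; the inputs are hypothetical, it ignores level-converter overhead and leakage, and it does not reproduce the 22% figure of [9].

```python
def dual_supply_savings(frac_low: float, v_low: float, v_high: float) -> float:
    """Fraction of dynamic power saved when `frac_low` of the switched
    capacitance runs at v_low instead of v_high (same activity and frequency)."""
    if not 0.0 <= frac_low <= 1.0:
        raise ValueError("frac_low must be between 0 and 1")
    if not 0.0 < v_low <= v_high:
        raise ValueError("expected 0 < v_low <= v_high")
    return frac_low * (1.0 - (v_low / v_high) ** 2)


# Hypothetical: half the switched capacitance moved from 1.0 V to 0.5 V
print(dual_supply_savings(0.5, 0.5, 1.0))  # 0.375
```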

In [10], the design is a 32-bit ALU built around a 32-bit adder, from 2005. This design allows low-power operation while also supporting a design-for-test (DFT) scheme. It achieved a 22% reduction in standby-mode leakage power and an 18% reduction in total ALU energy. The method integrates a delay-fault-testable scheme into the logic design flow and can detect a wide range of delay faults. These design techniques can help achieve low-power operation in high-end digital ICs [10].



III. DATA ANALYSIS


Fig. 1. Data bus width (bits) vs. year, from 2003 to 2015.

Fig. 2. ITRS technology node vs. year. Some designs reported units other than nm or did not list an ITRS technology node, so those data were not placed on the chart.

Fig. 4. Power vs. year. Some designs reported power in different units or did not report a power measurement, so they are indicated as "low" on the chart.

IV. CONCLUSION

Lowering power and energy consumption is a very important aspect of designing and using ALUs. This paper covered many different ALU designs and implementations that aim to lower power and energy consumption, along with other metrics such as execution time. In [6] I was able to see the effect of using a ripple carry adder, which I previously learned about in Module 09, and in [8] the effect of a carry select adder was shown.

REFERENCES

[1] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, April 9-12, 2015.

[2] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542, June 2015.

[3] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proceedings of the 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA-14), Monterey, California, USA, February 27-28, 2014, pp. 185-188.

[4] J. Di, J. S. Yuan, and R. DeMara, "High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders," in Proceedings of the 2003 IEEE Annual Symposium on VLSI (ISVLSI-03), Tampa, Florida, USA, February 20-21, 2003, pp. 260-261.

[5] N. Imran, R. F. DeMara, J. Lee, and J. Huang, "Self-adapting Resource Escalation for Resilient Signal Processing Architectures," Springer Journal of Signal Processing Systems (JSPS), vol. 77, no. 3, pp. 257-280, December 2014.

[6] A. Srivastava and D. Govindarajan, "A Fast ALU Design in CMOS for Low Voltage Operation," VLSI Design, vol. 14, no. 4, pp. 315-327, 2002.


Metrics covered by the various papers that are suitable for plotting:

Data bus width (bits) vs. year

ITRS technology node (nm) vs. year

Execution time per ALU or floating-point unit operation (ns) vs. year

Power or energy vs. year



[7] M. Ramalatha, K. D. Dayalan, P. Dharani, and S. D. Priya, "High speed energy efficient ALU design using Vedic multiplication techniques," in Proceedings of the International Conference on Advances in Computational Tools for Engineering Applications (ACTEA '09), July 15-17, 2009, pp. 600-603.

[8] P. Trivedi and R. P. Tripathi, "Design & analysis of 16 bit RISC processor using low power pipelining," in Proceedings of the 2015 International Conference on Computing, Communication & Automation (ICCCA), May 15-16, 2015, pp. 1294-1297.

[9] S. K. Mathew, M. A. Anders, B. Bloechel, T. Nguyen, R. K. Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 44-51, Jan. 2005.

[10] B. Chatterjee and M. Sachdev, "Design of a 1.7-GHz low-power delay-fault-testable 32-b ALU in 180-nm CMOS technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 11, pp. 1296-1304, Nov. 2005.

Additional References

[11] Angelina, "What Is A CPU and What Does It Do? [Technology Explained]," MakeUseOf, n.d. Accessed: Dec. 7, 2015.

[12] P. Zandbergen, "Arithmetic Logic Unit (ALU): Definition, Design & Function," Study, n.d. Accessed: Dec. 7, 2015.

[13] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface. n.p., n.d.



TABLE I. ALU ARCHITECTURE NAME AND SPECIFICATIONS.

| ALU or floating-point architecture name | Datapath width (bits) or # bits in operands | Adder (time for operation or design type) | Multiplier (time for operation or design type) | Floating point (time for operation or design type) | ITRS technology node (nm), area, or model of chip used | Energy/power consumption (W or J), else "low" or "high" |
|---|---|---|---|---|---|---|
| Energy and Area Analysis of a Floating-Point Unit [1] | 32 bits (operands) | N/A | N/A | IEEE-754 single precision | 45 nm and 15 nm (ITRS node) | 2.048 mW (45 nm); 0.6340 mW (15 nm) |
| Ultra-area-efficient fault-tolerant QCA full adder [2] | 1 bit (operands) | Ultra-area-efficient fault-tolerant QCA full adder | N/A | N/A | 18 nm^2 (cell area) | low |
| Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation [3] | 128 bits (operands) | N/A | Energy-efficient multiplier, 4.09 μs | N/A | Virtex 6 FPGA device (XC6VLX550t) (model of chip used) | 166.63 nJ |
| High Throughput Power Aware FIR Filter [4] | 64 bits (operands) | High-throughput power-aware FIR adder | High-throughput power-aware FIR multiplier | N/A | 0.24 µm static CMOS logic | low |
| Advanced Encryption Standard Design [5] | 128 bits (operands) | N/A | Xilinx DSP48 multiplier | N/A | N/A | low |
| Back-gate forward substrate bias (BGFSB) method [6] | 4 bits (operands) | Ripple carry adder | N/A | N/A | 1.2 µm N-well CMOS technology | low |
| Vedic ALU [7] | 64 bits (operands) | 64-bit adder | Vedic multiplier module | N/A | N/A | low |
| 16-bit RISC Processor [8] | 16 bits (operands) | Carry select adder | N/A | N/A | XILINX KINTEX XC7K1607-3fbg676 kit, 28 nm technology | 0.22 W |
| Integer Execution ALU with Dual Supply Voltages [9] | 64 bits (operands) | Upper-order 32-bit adder | N/A | N/A | 90 nm CMOS | 300 mW |
| Low Power Delay Fault Testable 32b ALU [10] | 32 bits (operands) | 32-bit adder | N/A | N/A | 180 nm CMOS | 200 W |

Figure 1 ALU Design
