
8/19/2019 Lowering Power Dissipation and Energy Consumption

http://slidepdf.com/reader/full/lowering-power-dissipation-and-energy-consumption 1/5

Page 1 of 5

Lowering Power Dissipation and Energy Consumption in Arithmetic Logic Units

From the Years 2002 to 2015

James Harrison

Department of Electrical and Computer Engineering

University of Central Florida

Orlando, FL 32816-2362

Abstract — When designing a processor or a logic circuit, one of the most important factors to take into account is power and energy consumption. This paper evaluates how different Arithmetic Logic Units (ALUs) are designed to lower power consumption and be more energy efficient. Along with power, the datapath width (or number of bits in the operands), ITRS technology node, execution time, and area are compared. Examples of ALUs discussed include the Low Power Delay Fault Testable 32b ALU, the Ultra-Area-Efficient Fault-Tolerant QCA Full Adder, and the High Throughput Power-Aware FIR Filter.

Keywords — ALU, power, data bus, execution time, adder, multiplier, floating point, area, energy

I. INTRODUCTION

A Central Processing Unit (CPU) is typically described as having two main components: the Arithmetic Logic Unit (ALU) and the Control Unit (CU). The ALU performs the CPU's arithmetic and logical operations and is a fundamental building block of the CPU [11]. Examples of arithmetic operations performed by an ALU are addition, subtraction, multiplication, and division. As the name implies, the ALU also performs logical operations such as AND, OR, and XOR. A logical operation can also be a comparison: the unit can compare letters, characters, or numbers, and based on the result of the comparison, the computer can take a specific action [11].
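As a concrete illustration, the minimal Python sketch below models a 32-bit ALU behaviorally: arithmetic and logical operations on fixed-width operands, plus a comparison whose useful output is the status flags that control logic could act on. The opcode names, the 32-bit width, and the `zero`/`negative` flags are illustrative assumptions, not details taken from [11].

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits, as in a fixed-width datapath


def to_signed32(v: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return v - (1 << 32) if v & 0x80000000 else v


def alu(op: str, a: int, b: int) -> tuple[int, dict]:
    """Hypothetical 32-bit ALU: returns (result, status flags)."""
    a &= MASK32
    b &= MASK32
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "MUL":
        result = a * b             # low 32 bits of the product are kept
    elif op == "DIV":
        if b == 0:
            raise ZeroDivisionError("division by zero")
        result = a // b            # unsigned integer division
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "CMP":
        result = a - b             # compare by subtracting; callers read the flags
    else:
        raise ValueError(f"unknown opcode {op!r}")
    result &= MASK32
    flags = {"zero": result == 0, "negative": to_signed32(result) < 0}
    return result, flags
```

For example, `alu("CMP", 7, 7)` sets the `zero` flag, which a branch-if-equal instruction could test to decide which action to take.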

According to [12], before a computer can execute an instruction, the data and program instructions must be placed into memory from an input device or from secondary storage. Once they are in memory, the CPU performs four steps for each instruction. The first two steps make up the Instruction Time, and the last two make up the Execution Time. First, the Control Unit fetches the instruction from memory; it then decodes the instruction and directs the data to the ALU. The ALU executes the arithmetic or logical operation on the data, and the result is stored either in memory or in a register. The Control Unit directs this sequence, while the ALU performs the actual operations on the data [12].
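The four steps can be traced in a toy stored-program machine. This is a hypothetical sketch: the `(opcode, dest, src1, src2)` instruction format, the `run` function, and the register-only operands are assumptions chosen to keep it short, not a description of any real instruction set.

```python
def run(program, registers):
    """Run a list of (opcode, dest, src1, src2) tuples on a toy CPU.

    Instruction time: step 1 fetch, step 2 decode (Control Unit).
    Execution time:   step 3 execute (ALU), step 4 store the result.
    """
    regs = list(registers)
    pc = 0                                      # program counter
    alu_ops = {                                 # operations the toy ALU supports
        "ADD": lambda x, y: x + y,
        "SUB": lambda x, y: x - y,
        "AND": lambda x, y: x & y,
        "OR": lambda x, y: x | y,
    }
    while pc < len(program):
        instruction = program[pc]               # 1. fetch from (instruction) memory
        pc += 1
        opcode, dest, src1, src2 = instruction  # 2. decode into fields
        a, b = regs[src1], regs[src2]           #    and route the operands to the ALU
        result = alu_ops[opcode](a, b)          # 3. execute in the ALU
        regs[dest] = result                     # 4. store the result in a register
    return regs
```

For example, `run([("ADD", 2, 0, 1), ("SUB", 3, 2, 0)], [5, 7, 0, 0])` first computes r2 = 5 + 7, then r3 = r2 - 5.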

Important metrics for evaluating processors include data bus width, ITRS (International Technology Roadmap for Semiconductors) technology node, execution time, power dissipation, and energy consumption [13].

The width of a data bus helps determine its data rate, that is, the number of bytes per second it can carry; this is one of the main factors determining the processing power of a computer. Many current processor designs use a 32-bit bus, which means that 32 bits of data can be transferred at one time; the wider the bus, the more information can be transferred at once [13].
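The relationship between bus width and data rate is simple arithmetic. The sketch below assumes one transfer per clock cycle and ignores protocol overhead; the 100 MHz clock in the example is hypothetical.

```python
def bus_data_rate_bytes_per_s(width_bits: int, transfers_per_s: float) -> float:
    """Peak data rate of a parallel bus: (width / 8) bytes per transfer
    multiplied by the number of transfers per second."""
    return width_bits / 8 * transfers_per_s


rate_32 = bus_data_rate_bytes_per_s(32, 100e6)  # 32-bit bus, hypothetical 100 MHz
rate_64 = bus_data_rate_bytes_per_s(64, 100e6)  # doubling the width doubles the rate
```

At this hypothetical 100 MHz, a 32-bit bus peaks at 400 MB/s and a 64-bit bus at 800 MB/s.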

Execution time is defined here as the time it takes to execute a single instruction. It is the last portion of the instruction cycle and comprises the actions taken by the ALU, such as executing the arithmetic or logical instruction on the data and then storing the result in a register [13].
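The time for one operation follows from the number of clock cycles it needs and the clock frequency; for example, a single-cycle operation at the 4 GHz clock named in the title of [9] takes 0.25 ns. Patterson and Hennessy [13] extend the same idea to whole programs with the classic CPU performance equation (instruction count × cycles per instruction × clock period). The function names and example values below are illustrative.

```python
def op_time_ns(cycles: int, clock_hz: float) -> float:
    """Latency of an operation that takes `cycles` clock cycles, in nanoseconds."""
    return cycles / clock_hz * 1e9


def cpu_time_s(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


single_cycle_4ghz = op_time_ns(1, 4e9)      # one cycle at 4 GHz
single_cycle_17ghz = op_time_ns(1, 1.7e9)   # one cycle at 1.7 GHz
```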

Power dissipation refers to the process by which a CPU consumes electrical energy, largely in switching the devices within it, and dissipates that energy as heat [13].
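A standard first-order model makes the link between switching and power explicit: dynamic (switching) power is approximately P = α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency, and energy is power multiplied by time. This is general CMOS background rather than a formula quoted from [13], and all numbers below are hypothetical. It helps explain why the supply-voltage reductions in [1] and [9] and the clock gating in [8] (which lowers α) save power.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def energy_j(power_w: float, time_s: float) -> float:
    """Energy for constant power over an interval: E = P * t, in joules."""
    return power_w * time_s


# Hypothetical block: 10% activity, 1 nF switched capacitance, 1 GHz clock.
p_high = dynamic_power_w(0.1, 1e-9, 1.2, 1e9)  # at 1.2 V
p_low = dynamic_power_w(0.1, 1e-9, 0.6, 1e9)   # same block at half the voltage
p_gated = dynamic_power_w(0.0, 1e-9, 1.2, 1e9) # clock fully gated: no switching
```

Because V enters squared, halving the supply voltage cuts switching power to one quarter at the same frequency, although in practice a lower voltage also slows the logic, as noted for [1].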

The following sections discuss 10 ALU designs, dating from 2002 to 2015, and how each lowers power dissipation or improves energy consumption.

II. LITERATURE REVIEW

Power and energy consumption are among the most important factors to consider when designing a logic circuit or a processor. How much power a digital logic circuit consumes greatly affects the performance of the system. Many different ALUs have been designed with lowering power dissipation as one of their main goals. Below are 10 models and implementations of different methods, each of which lowers power or energy consumption in some way.

In [1], the design reviewed is a 2015 IEEE 754 single-precision floating-point unit. That paper discusses several methods of improving energy efficiency. One is to partially truncate the mantissa computation and allow the mantissa bit-width of the multiplicand, multiplier, and output product to be changed dynamically. Another way to minimize the power and energy consumption of digital systems implemented in Complementary Metal-Oxide-Semiconductor (CMOS) technology is to reduce the supply voltage to near the threshold voltage; this incurs a performance penalty and lowers logic speed [1].
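The idea of trading mantissa bits for energy can be explored numerically. The sketch below emulates single-precision values with Python's `struct` module and zeroes the low-order bits of the 23-bit stored fraction. The function names and the choice to truncate the operands and the product alike are assumptions for illustration; this is not the hardware design of [1].

```python
import struct


def float32_bits(x: float) -> int:
    """Round x to IEEE 754 single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Keep the top `keep_bits` of the 23-bit fraction field and zero the rest.
    Sign and exponent are untouched, so this truncates toward zero."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    drop = 23 - keep_bits
    mask = 0xFFFFFFFF ^ ((1 << drop) - 1)  # ones everywhere except the dropped bits
    return bits_to_float32(float32_bits(x) & mask)


def truncated_multiply(a: float, b: float, keep_bits: int) -> float:
    """Multiply with a reduced mantissa width for the operands and the product.
    The product is formed in Python's double precision, then truncated."""
    product = truncate_mantissa(a, keep_bits) * truncate_mantissa(b, keep_bits)
    return truncate_mantissa(product, keep_bits)
```

With all 23 fraction bits kept, 1.75 × 1.75 gives the exact 3.0625; keeping only one fraction bit gives 2.0, showing how accuracy falls as the datapath narrows.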

In [2] (2015), quantum-dot cellular automata (QCA) circuits are implemented with the aims of lowering energy consumption and enabling faster operation; the design is an ultra-area-efficient fault-tolerant QCA full adder. Four distinct clock phases are applied to restore energy, and the QCA cells are synchronized by these clock signals, which improves the accuracy of QCA operations and allows more efficient performance [2].
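QCA logic is usually expressed with three-input majority gates and inverters. A widely used decomposition of a full adder needs three majority gates; the sketch below checks it against ordinary binary addition. Whether the fault-tolerant layout in [2] uses exactly this gate arrangement is an assumption.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority gate M(a, b, c): 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def majority_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder from three majority gates and two inverters.

    carry = M(a, b, cin)
    sum   = M(NOT carry, cin, M(a, b, NOT cin))
    """
    cout = majority(a, b, cin)
    s = majority(1 - cout, cin, majority(a, b, 1 - cin))
    return s, cout
```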

Paper [3] (2014) proposes a computing scheme centered on probabilistic domain transformation, aimed at fault resilience and low-power operation. Instead of the conventional multiplier-based convolution, [3] presents an energy-efficient, multiplier-less probabilistic convolver in which lightweight adders replace the expensive multipliers through probabilistic domain transformation. Relying on these simpler operations achieves higher energy efficiency at a lower hardware cost [3].
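The general idea of replacing multiplications with additions through a probabilistic transformation can be sketched as Monte Carlo sampling: treat the (non-negative) kernel weights as a probability distribution, sample indices from it, and simply add the selected inputs. This is a generic illustration under that assumption, not the convolver architecture of [3]; `sampled_dot` and its parameters are hypothetical.

```python
import random


def exact_dot(w, x):
    """Reference multiply-accumulate: sum of w[i] * x[i]."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sampled_dot(w, x, n_samples, rng):
    """Estimate sum(w[i] * x[i]) for non-negative weights without per-term multiplies.

    Each sample picks index i with probability w[i] / sum(w) and only adds
    x[i] to an accumulator. One scaling by sum(w) / n_samples at the end turns
    the total into the estimate (a shift if both are powers of two).
    """
    if any(wi < 0 for wi in w):
        raise ValueError("this sketch assumes non-negative weights")
    total_w = sum(w)
    picks = rng.choices(range(len(w)), weights=w, k=n_samples)
    acc = 0
    for i in picks:
        acc += x[i]  # adder-only inner loop
    return acc * total_w / n_samples
```

The estimate's error shrinks roughly as 1/√n_samples, so accuracy is traded against cheaper arithmetic.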

In [4] (2003), a high-throughput power-aware finite impulse response (FIR) filter is built from fine-grain pipelined multipliers and adders; pipelining significantly reduces both the average power dissipation and the latency. Power awareness was also improved to save power: combining a selective method with a 2-D gating technique achieved both power awareness and reduced latency in the FIR filter [4].
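An FIR filter computes y[n] = Σ h[k]·x[n−k], one multiply-accumulate (MAC) per tap. The sketch below implements the direct form and, optionally, a simple operand-gating rule that skips MACs whose product is known to be zero, counting the MACs performed as a rough proxy for switching activity. The gating rule is an illustrative assumption; it is not the selective/2-D gating scheme of [4].

```python
def fir(h, x, gate_zeros=False):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0.

    Returns (outputs, number of multiply-accumulates actually performed).
    """
    y = []
    macs = 0
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k < 0:
                break                      # older samples do not exist yet
            xk = x[n - k]
            if gate_zeros and (hk == 0 or xk == 0):
                continue                   # gated: product is zero, skip the MAC
            acc += hk * xk
            macs += 1
        y.append(acc)
    return y, macs
```

Feeding an impulse `[1, 0, 0, 0]` through taps `[1, 2, 3]` returns the taps themselves; with gating on, the output is identical but only the three non-zero products are computed.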

Paper [5] (2014) describes an implementation using the Xilinx DSP48 multiplier. The authors implemented the Priority Using Resource Escalation (PURE) approach, which provides adaptive, dynamic reconfiguration to achieve survivability. Dynamically reconfiguring redundancy permits autonomous operation while maintaining a defined quality measure within area, power, and energy constraints, with lower power and area overheads than static redundancy schemes. PURE does so by starting from a uniplex (non-redundant) instance of the datapath and adapting it when aberrant behavior occurs [5].

In [6] (2002), a high-speed 4-bit ALU is designed for 1 V operation to demonstrate the usefulness of the back-gate forward substrate bias (BGFSB) method. This ALU uses a ripple-carry adder and performs eight operations: four logical and four arithmetic. The BGFSB method suits low-voltage, high-speed applications; however, in the steady state the subthreshold current increases because BGFSB reduces the threshold voltage [6].
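A ripple-carry adder chains one full adder per bit, so each stage must wait for the carry from the stage below and the delay grows linearly with the width. A bit-level sketch of the 4-bit case used in [6] follows; the function names are illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, width: int = 4, cin: int = 0) -> tuple[int, int]:
    """Add two `width`-bit numbers one bit at a time.
    Stage i cannot finish until the carry from stage i - 1 arrives."""
    result = 0
    carry = cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```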

In [7] (2009), a Vedic multiplier module and a 64-bit adder are used to reduce the complexity, area, execution time, and power of computations, producing a high-speed, power-efficient multiplier [7].
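Vedic multipliers are commonly built on the Urdhva Tiryagbhyam ("vertically and crosswise") sutra, which forms every column of the product from the crosswise partial products at once. The sketch below assumes that is the technique behind [7] and shows it at bit level for unsigned operands; `urdhva_multiply` is a hypothetical name.

```python
def urdhva_multiply(a: int, b: int, width: int) -> int:
    """Unsigned multiply using the "vertically and crosswise" pattern:
    column k of the product sums every partial product a_i * b_j with
    i + j == k, plus the carry passed on from column k - 1."""
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for k in range(2 * width - 1):
        lo, hi = max(0, k - width + 1), min(k, width - 1)
        column = carry + sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1))
        result |= (column & 1) << k   # this column's product bit
        carry = column >> 1           # the rest carries into the next column
    return result | (carry << (2 * width - 1))
```

In hardware, the crosswise sums for all columns can be generated in parallel, which is the source of the speed advantage usually claimed for this style of multiplier.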

Paper [8] (2015) presents a 16-bit low-power pipelined RISC processor that uses a carry-select adder. The processor was designed in Verilog HDL and evaluated on a XILINX KINTEX XC7K1607-3fbg676 kit, a 28 nm technology device. It uses a 2-stage pipeline, so each instruction passes through two clock cycles, which the authors use to increase speed and reduce latency. The design also uses clock gating, a low-power technique that stops the clock to idle logic so that it does not switch, reducing power consumption. With clock gating, the dynamic power was greatly reduced from 0.71 W, the quiescent power was reduced to 0.149 W, and the total power was reduced to 0.22 W [8].
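A carry-select adder shortens the carry chain by computing each block's sum twice, once assuming a carry-in of 0 and once assuming 1, and then selecting the correct result when the real carry arrives. The sketch below models a 16-bit adder in 4-bit blocks; the block size is an assumption (the block size used in [8] is not stated here), and each block is modeled behaviorally rather than gate by gate.

```python
def block_add(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Behavioral model of one small adder block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 16, block: int = 4) -> tuple[int, int]:
    """Carry-select adder: each block precomputes sums for carry-in 0 and 1,
    and the real carry from the previous block only drives a multiplexer."""
    if width % block:
        raise ValueError("width must be a multiple of the block size")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for pos in range(0, width, block):
        a_blk, b_blk = (a >> pos) & mask, (b >> pos) & mask
        sum0, c0 = block_add(a_blk, b_blk, 0, block)    # speculative: carry-in = 0
        sum1, c1 = block_add(a_blk, b_blk, 1, block)    # speculative: carry-in = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)  # select with the real carry
        result |= s << pos
    return result, carry
```

The extra hardware (two sums per block plus multiplexers) is the cost paid for a shorter critical path than a ripple-carry adder of the same width.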

In [9] (2005), an upper-order 32-bit adder is used, because fast 32- and 64-bit ALU operations with single-cycle latency and throughput are essential ingredients of high-performance microprocessor execution cores [9]. In 64-bit mode, power-performance is optimized by a gated secondary off-chip supply voltage. High-speed single-rail dynamic circuit techniques and a sparse-tree semi-dynamic adder provide low dynamic power consumption and high noise robustness at the maximum supply voltage. For an efficient power-performance tradeoff, the slack in the upper-order 32-bit carry-merge tree allows its supply voltage to be lowered to 1 V, which yields an additional 22% power benefit [9].
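Under the first-order model in which switching power scales with V² at fixed frequency and activity, moving part of a circuit to a lower supply saves a predictable fraction of total dynamic power. Only the idea of a 1 V lower rail comes from [9]; the nominal voltage and the share of power on the lowered rail are hypothetical inputs, so this function does not reproduce the 22% figure reported in [9].

```python
def dual_supply_saving(share_on_low_rail: float, v_high: float, v_low: float) -> float:
    """Fraction of total dynamic power saved when a portion of it
    (`share_on_low_rail`, measured at v_high) is moved to v_low.

    Assumes P is proportional to V^2 at fixed frequency and activity.
    """
    if not 0.0 <= share_on_low_rail <= 1.0:
        raise ValueError("share must be between 0 and 1")
    if v_high <= 0 or v_low < 0:
        raise ValueError("voltages must be positive")
    return share_on_low_rail * (1.0 - (v_low / v_high) ** 2)
```

The design choice in [9] relies on slack: only logic that is not on the critical path can tolerate the slower switching that comes with the lower voltage.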

In [10] (2005), the design is a 32-bit ALU built around a 32-bit adder. This design allows low-power operation while also supporting a design-for-test (DFT) scheme. It achieved a 22% reduction in standby-mode leakage power and an 18% reduction in total ALU energy. The method integrates a delay-fault-testable scheme into the logic design flow and can detect a wide range of delay faults. These design techniques can help achieve low-power operation of high-end digital ICs [10].


III. DATA ANALYSIS


Fig. 1. Data bus width (bits) vs. year, 2002 to 2015.

Fig. 2. ITRS technology node (nm) vs. year. Data given in units other than nm, or with no ITRS technology node listed, were not placed in the chart.

Fig. 4. Power vs. year. Data reported in other units, or without a power measurement, are indicated as "low" on the chart.

IV. CONCLUSION

Lowering power and energy consumption is a very important aspect of designing and using ALUs. This paper covered many different ALU designs and implementations that aim to lower power and energy consumption while also considering other metrics, such as execution time. In [6], I was able to see the effect of using a ripple-carry adder, which I previously learned about in Module 09, and [8] showed the effect of a carry-select adder.

REFERENCES

[1] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, April 9-12, 2015.

[2] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542, June 2015.

[3] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proceedings of the 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA-14), pp. 185-188, Monterey, California, USA, February 27-28, 2014.

[4] J. Di, J. S. Yuan, and R. DeMara, "High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders," in Proceedings of the 2003 IEEE Annual Symposium on VLSI (ISVLSI-03), pp. 260-261, Tampa, Florida, U.S.A., February 20-21, 2003.

[5] N. Imran, R. F. DeMara, J. Lee, and J. Huang, "Self-adapting Resource Escalation for Resilient Signal Processing Architectures," The Springer Journal of Signal Processing Systems (JSPS), vol. 77, no. 3, pp. 257-280, December 2014.

[6] A. Srivastava and D. Govindarajan, "A Fast ALU Design in CMOS for Low Voltage Operation," VLSI Design, vol. 14, no. 4, pp. 315-327, 2002.

Metrics covered by various papers which are suitable for plotting:

- Data bus width (bits) vs. Year
- ITRS technology node (nm) vs. Year
- Execution time per ALU or Floating Point Unit operation (nsec) vs. Year
- Power or Energy vs. Year


[7] M. Ramalatha, K. D. Dayalan, P. Dharani, and S. D. Priya, "High speed energy efficient ALU design using Vedic multiplication techniques," in International Conference on Advances in Computational Tools for Engineering Applications (ACTEA '09), pp. 600-603, 15-17 July 2009.

[8] P. Trivedi and R. P. Tripathi, "Design & analysis of 16 bit RISC processor using low power pipelining," in International Conference on Computing, Communication & Automation (ICCCA), 2015, pp. 1294-1297, 15-16 May 2015.

[9] S. K. Mathew, M. A. Anders, B. Bloechel, T. Nguyen, R. K. Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 44-51, Jan. 2005.

[10] B. Chatterjee and M. Sachdev, "Design of a 1.7-GHz low-power delay-fault-testable 32-b ALU in 180-nm CMOS technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 11, pp. 1296-1304, Nov. 2005.

Additional References

[11] Angelina, "What Is A CPU and What Does It Do? [Technology Explained]," MakeUseOf, n.d. Web. Accessed 07 Dec. 2015.

[12] P. Zandbergen, "Arithmetic Logic Unit (ALU): Definition, Design & Function," Study, n.d. Web. Accessed 07 Dec. 2015.

[13] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface. n.d. Print.


 

TABLE I. ALU ARCHITECTURE NAME AND SPECIFICATIONS.

| ALU or Floating-Point Architecture Name | Datapath Width or # Bits in Operands | Time for Operation or Design Type: Adder | Time for Operation or Design Type: Multiplier | Time for Operation or Design Type: Floating Point | ITRS Technology Node (nm), Area, or Model of Chip Used | Energy/Power Consumption (W or J), else "low" or "high" |
|---|---|---|---|---|---|---|
| Energy and Area Analysis of a Floating-Point Unit [1] | 32 bits (operands) | N/A | N/A | IEEE-754 single precision | 45 nm and 15 nm (ITRS node) | 2.048 mW (45 nm); 0.6340 mW (15 nm) |
| Ultra-area-efficient fault-tolerant QCA full adder [2] | 1 bit (operands) | Ultra-area-efficient fault-tolerant QCA full adder | N/A | N/A | 18 nm^2 (cell area) | low |
| Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation [3] | 128 bits (operands) | N/A | Energy-efficient multiplier (4.09 μs) | N/A | Virtex 6 FPGA devices (XC6VLX550t) (model of chip used) | 166.63 nJ |
| High Throughput Power Aware FIR Filter [4] | 64 bits (operands) | High throughput power aware FIR adder | High throughput power aware FIR multiplier | N/A | 0.24 µm static CMOS logic | low |
| Advanced Encryption Standard Design [5] | 128 bits (operands) | N/A | Xilinx DSP48 multiplier | N/A | N/A | low |
| Back-gate forward substrate bias (BGFSB) method [6] | 4 bits (operand) | Ripple-carry adder | N/A | N/A | 1.2 mm N-well CMOS technology | low |
| Vedic ALU [7] | 64 bits (operand) | 64-bit adder | Vedic multiplier module | N/A | N/A | low |
| 16-bit RISC Processor [8] | 16 bits (operands) | Carry-select adder | N/A | N/A | XILINX KINTEX XC7K1607-3fbg676 kit, 28 nm technology | 0.22 W |
| Integer Execution ALU with Dual Supply Voltages [9] | 64 bits (operands) | Upper-order 32-bit adder | N/A | N/A | 90 nm CMOS | 300 mW |
| Low Power Delay Fault Testable 32b ALU [10] | 32 bits (operands) | 32-bit adder | N/A | N/A | 180 nm CMOS | 200 W |

Figure 1 ALU Design