Implementation of Floating point adder using sequential and concurrent processing on Virtex 4


International Journal of Advanced Engineering Research and Technology (IJAERT), Volume 1, Issue 1, pp. 8-11, December 2013, www.ijaert.org

IJAERT @ 2013


Karan Gumber*, M.E. (ECE), Department of Electronics and Communication, Panjab University, Chandigarh, India

ABSTRACT
Floating point adders are used in many applications, such as air traffic control, robotics, DSP processors, math processors and digital computers. The main objective is therefore to reduce chip area and combinational delay (latency), that is, the time a signal needs to travel from the input to the output. The smaller the combinational delay, the faster the response and the better the implementation of the adder. Implementing the floating point adder on Virtex 4 with sequential and with concurrent processing changes both the chip area consumed and the combinational delay; the combinational delay is lower in the Virtex family with the higher speed grade than in the Spartan family. With concurrent processing, the floating point adder consumes 4% of the chip area on the Virtex 4 device xc4vfx12-12sf363 and 0% on device xc4vfx100-12f1152. With sequential processing, it consumes 7% of the chip area on device xc4vfx12-12sf363 and 0% on device xc4vfx100-12f1152.

Keywords: Floating point adder; Xilinx; FPGAs; Virtex 4; chip area

I. INTRODUCTION
As demand rises for electronic devices to be smaller, faster and more efficient, increasing importance is placed on well-designed architecture. An architecture that uses concurrent processing tends to have a shorter clock period and less combinational delay, and hence higher speed, but it also consumes more chip area than the standard architecture (without pipelining) that uses sequential processing. The standard algorithm is the best implementation with respect to area, since it uses the fewest slices, but it has a large overall latency [1]. A pipelined architecture reduces latency at the cost of more area than the standard architecture. The 2-path implementation shows a 10% reduction in latency at an added expense of 88% in area compared to the standard algorithm. The 5-stage pipelined implementation shows a 6.4% improvement in clock speed. This shows that chip area grows with the depth of pipelining: as the number of pipeline stages increases, throughput increases, but so does chip area.

Floating point adders are used in a number of applications such as computer graphics, robotics, digital computers and DSP processors. A floating point adder can perform both addition and subtraction of two floating point numbers. If the operation is to be performed on a decimal number, the number must first be converted to floating point format [1][2]. In this paper, various implementations of the floating point adder are studied to make better use of chip area and to reduce combinational delay for better throughput. Addition is the most complex operation in a floating point unit and can cause major delay while requiring significant area. Over the years, the VLSI community has developed many floating point adder algorithms aimed primarily at reducing the overall latency. The VHDL code for the floating point adder is implemented on an FPGA using Xilinx tools. The synthesis report in Xilinx provides all the parameters, such as combinational delay, chip area, components used, number of paths and modelling format.

II. ADDITION ALGORITHM FOR FLOATING POINT NUMBERS
Addition Algorithm: In this section, we explain the algorithm we use for floating point addition. Given two numbers N1 and N2, we can use the flowchart in figure 3 to compute their sum, where e1, e2 and s1, s2 are the exponents and significands of the numbers, respectively. A detailed description of the algorithm follows:

1. Make the 24th bit (hidden bit) explicit: if the exponent is '0', the hidden bit is '0'; otherwise it is '1'. At this point 33 bits are needed to store each number: 8 for the exponent, 24 for the significand and 1 for the sign.

2. Compare e1 and e2. If e2 > e1, swap N1 and N2. Note that if a swap takes place, further references in the flowchart to s1 (e1) refer to the old s2 (e2) and vice versa. Also, the absolute difference in the exponent values, |e2 - e1|, needs to be saved.

3. Shift s2 to the right by an amount equal to d = |e2 - e1|. Fill the leftmost bits with zeros. Note that both numbers are in simple sign/magnitude format.

4. If N1 and N2 have different signs, replace s2 by its two's complement.


5. Compute the significand, S, of the result by adding s1 and s2.

6. If S is negative, replace it by its two's complement. S is negative when all of the following conditions are true:
a. N1 and N2 have different signs.
b. The most significant bit of S is '1'.
c. There was no carry out in step 5.

7. Normalization step.
a. If N1 and N2 have the same sign and there was a carry out in step 5, shift S right by one, dropping the least significant bit and filling the most significant bit with a '1'.
b. Otherwise, shift S left until there is a '1' in the MSB.
c. If S was shifted left more than 24 times, the result is zero.

8. The sign of the result is determined by simply making the output sign the same as the sign of the larger of N1 and N2. The MSB of S is replaced with this sign bit.

9. The resultant exponent (e1) is adjusted by the amount determined in step 7. If it was determined in step 7(c) that S = 0, set the exponent to zero.

10. Assemble the result into the 32-bit format [2][3][4].
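The ten steps above can be sketched in software as follows. This is a minimal illustration in Python, not the VHDL design synthesized in this paper. The function names (`fp_add`, `unpack`, `add`) are hypothetical, results are truncated because the steps describe no rounding stage, exponent underflow is flushed to zero as an assumption, and overflow, infinities and NaN are not handled.

```python
import struct

MASK24 = (1 << 24) - 1  # 24-bit significand with the hidden bit made explicit


def float_to_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def unpack(n: int):
    """Step 1: split into sign, exponent and 24-bit significand."""
    sign = n >> 31
    exp = (n >> 23) & 0xFF
    sig = (n & 0x7FFFFF) | (1 << 23) if exp != 0 else 0
    return sign, exp, sig


def fp_add(n1: int, n2: int) -> int:
    # Step 2: make N1 the operand with the larger exponent.
    if ((n2 >> 23) & 0xFF) > ((n1 >> 23) & 0xFF):
        n1, n2 = n2, n1
    sg1, e1, s1 = unpack(n1)
    sg2, e2, s2 = unpack(n2)
    if e2 == 0:          # smaller operand is zero: result is N1
        return n1
    d = e1 - e2
    if d >= 24:          # smaller operand is shifted out completely
        return n1
    s2 >>= d             # step 3: align, zero-filling from the left
    differ = sg1 != sg2
    if differ:
        s2 = (-s2) & MASK24          # step 4: two's complement of s2
    total = s1 + s2                  # step 5
    carry = total >> 24
    S = total & MASK24
    sign = sg1
    if differ and not carry and (S >> 23):   # step 6: S is negative
        S = (-S) & MASK24
        sign = sg2                   # N2 had the larger magnitude
    if not differ and carry:         # step 7a
        S = (S >> 1) | (1 << 23)
        e1 += 1
    else:
        if S == 0:                   # step 7c: complete cancellation
            return 0
        while not (S >> 23):         # step 7b
            S <<= 1
            e1 -= 1
        if e1 <= 0:                  # assumption: flush underflow to zero
            return 0
    # Steps 8-10: attach sign and exponent, drop the hidden bit.
    return (sign << 31) | (e1 << 23) | (S & 0x7FFFFF)


def add(x: float, y: float) -> float:
    """Convenience wrapper working on Python floats."""
    return bits_to_float(fp_add(float_to_bits(x), float_to_bits(y)))


print(add(7.5, 9.25))
print(add(-3.5, -120.75))
```

The two calls at the end use the same operand pairs as the simulation section of this paper. Because the sketch truncates instead of rounding, it can differ from a hardware or library IEEE-754 adder in the last bit for operands whose aligned significands lose bits in step 3.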

A. Special Conditions
Some special conditions need to be handled when implementing the floating point adder. They are explained below:
1: If N1 = N2 = '0', the overall result is '0'.
2: If E1 = E2 and the sign bits of E1 and E2 differ, the overall result is again '0'.
3: If E1 = '0' and E2 ≠ '0', the overall result is equal to E2.
4: If E2 = '0' and E1 ≠ '0', the overall result is equal to E1.
5: If d = E1 - E2 ≥ 24, the overall result is the larger of E1 and E2 [3].
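The special conditions can be checked before the main datapath is used. The sketch below is one possible reading, assuming that E1 and E2 in the list denote the two operands, that a zero operand is recognised by an all-zero exponent field (denormals are not used), and that condition 5 compares exponent fields. The name `special_case` is hypothetical.

```python
def special_case(n1: int, n2: int):
    """Return the result bit pattern if a special condition applies, else None.

    n1 and n2 are 32-bit IEEE-754 single-precision patterns.
    """
    e1 = (n1 >> 23) & 0xFF
    e2 = (n2 >> 23) & 0xFF
    mag1 = n1 & 0x7FFFFFFF          # operand without its sign bit
    mag2 = n2 & 0x7FFFFFFF
    if e1 == 0 and e2 == 0:         # condition 1: both operands are zero
        return 0
    if mag1 == mag2 and (n1 >> 31) != (n2 >> 31):
        return 0                    # condition 2: equal magnitude, opposite sign
    if e1 == 0:                     # condition 3
        return n2
    if e2 == 0:                     # condition 4
        return n1
    if abs(e1 - e2) >= 24:          # condition 5: smaller operand has no effect
        return n1 if e1 > e2 else n2
    return None                     # no shortcut: use the full algorithm


# 7.5 is 0x40F00000 and -7.5 is 0xC0F00000 in single precision.
print(special_case(0x40F00000, 0xC0F00000))
```

Returning `None` for the general case keeps the shortcut logic separate from the adder itself, which mirrors how such checks usually sit in front of the alignment and addition stages.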

B. Problems associated with addition
Two problems occur when adding two floating point numbers:
1: The exponents of the two numbers are different. This is solved by shifting the significand of the smaller number to the right by an amount equal to the exponent difference; the same amount is added to the exponent of the smaller number, so that both numbers have the same exponent.
2: There is a carry out of the significand addition. If both numbers have the same sign, add '1' to the exponent and shift the significand result to the right by one, discarding the LSB. If the numbers have different signs, discard the carry and shift the result to the left until there is a '1' in the MSB; the amount of shifting is subtracted from the exponent to form the real exponent [7][8].
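Both problems can be seen in a short worked example. The first case uses the operands 7.5 and 9.25 from the simulation section; the second case, 9.25 - 7.5, is an illustrative choice that is not taken from the paper.

```latex
\begin{align*}
\text{Same sign:}\quad
7.5 &= 1.111_2 \times 2^{2}, \qquad 9.25 = 1.00101_2 \times 2^{3}, \qquad d = 3 - 2 = 1 \\
7.5 &= 0.1111_2 \times 2^{3} \quad \text{(significand shifted right by } d\text{)} \\
S &= 1.00101_2 + 0.11110_2 = 10.00011_2 \quad \text{(carry out)} \\
\text{sum} &= 1.000011_2 \times 2^{4} = 16.75 \quad \text{(shift right by one, exponent} + 1\text{)} \\[6pt]
\text{Different signs:}\quad
S &= 1.00101_2 - 0.11110_2 = 0.00111_2 \\
9.25 - 7.5 &= 1.11_2 \times 2^{3-3} = 1.75 \quad \text{(shift left by three, exponent} - 3\text{)}
\end{align*}
```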

III. IEEE-754 FLOATING POINT STANDARD

In the early days of digital computers, it was quite common for machines from different vendors to have different word lengths and unique floating-point formats [7]. This caused many problems, especially in porting programs between different machines (designs). A main objective in developing a floating-point representation standard is to make numerical programs predictable and completely portable, in the sense of producing identical results when run on different machines. Table 1 summarizes the most important features of the IEEE floating-point standard. Zero cannot be represented with a normalized significand, so a special code was assigned to it: zero has an all-zero representation with either a + or a - sign. Special codes were also needed to represent other exceptions such as NaN (not a number) and ±∞. NaN is useful for representing undefined results such as 0/0. The standard adopted the hidden (implicit) bit approach to save one bit in the representation, which increases the precision without taking up space. This is done by always representing floating-point numbers in normalized form, so that the leading 1 does not need to be stored. Denormals, or denormalized values, are defined as numbers without a hidden 1 and with the smallest possible exponent. These numbers were provided to reduce the effect of underflow. This provision can lead to a high speed and cost overhead, so it was offered optionally. In this paper only the single precision format of the standard is used. The format uses an 8-bit exponent with a bias of 127. 23 bits are used as the mantissa, with 1 hidden bit that is always concatenated as 1 during operation. The denormal format (i.e. the format without an implicit 1) has not been used, and a certain range has been defined to protect the functions from overflows and undesired inputs. The same standard is used for all the functional modules of the floating point adder [9][10].
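As an illustration of the single precision layout described above (1 sign bit, 8 exponent bits with a bias of 127, 23 stored mantissa bits plus a hidden 1), the following Python sketch splits a value into its fields and rebuilds it. The function names are hypothetical, and `value_from_fields` covers normalized numbers only, in line with the paper's choice not to use denormals.

```python
import struct


def decode_single(x: float):
    """Return (sign, biased exponent, 23-bit fraction) of x as a single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized value: (-1)^sign * 1.fraction * 2^(exponent - 127)."""
    significand = 1 + fraction / 2**23      # hidden 1 concatenated in front
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


sign, exponent, fraction = decode_single(7.5)
print(sign, exponent, hex(fraction))
print(value_from_fields(sign, exponent, fraction))
```

For 7.5 the biased exponent is 129, which corresponds to a true exponent of 129 - 127 = 2, and the stored fraction holds the bits after the hidden 1 of 1.111 in binary.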

IV. SYNTHESIS REPORT
The synthesis report of the floating point adder is presented in four tables, using different devices and different speed grades. Table I and Table II show the results of concurrent processing, and Table III and Table IV show the results of sequential processing.

TABLE I
Device: xc4vfx12-12sf363
Speed Grade: -12
Concurrent Processing on Virtex 4


| Parameters | Used | Available | Utilization |
|---|---|---|---|
| Number of slices | 401 | 5472 | 4% |
| Number of slice flip flops | 72 | 16944 | 0% |
| Number of 4-input LUTs | 710 | 16944 | 6% |
| Number of bonded IOBs | 99 | 240 | 41% |
| Number of GCLKs | 2 | 32 | 6% |
| Clock period | 1.269 nsec | - | - |
| Combinational delay | 24.201 nsec | - | - |
| Frequency | 789.266 MHz | - | - |
| Combinational delay (offset) | 27.891 nsec | - | - |
| Memory used | 24468 Kbytes | - | - |

TABLE II
Device: xcv3200e-8fg1156
Speed Grade: -8
Concurrent Processing on Virtex 4

| Parameters | Used | Available | Utilization |
|---|---|---|---|
| Number of slices | 401 | 63168 | 0% |
| Number of slice flip flops | 72 | 126336 | 0% |
| Number of 4-input LUTs | 710 | 126336 | 0% |
| Number of bonded IOBs | 99 | 768 | 12% |
| Number of GCLKs | 2 | 32 | 6% |
| Clock period | 1.446 nsec | - | - |
| Combinational delay | 27.966 nsec | - | - |
| Frequency | 691.563 MHz | - | - |
| Combinational delay (offset) | 32.226 nsec | - | - |
| Memory used | 425668 Kbytes | - | - |

TABLE III
Device: xc4vfx12-12sf363
Speed Grade: -12
Sequential Processing on Virtex 4

| Parameters | Used | Available | Utilization |
|---|---|---|---|
| Number of slices | 407 | 5472 | 7% |
| Number of slice flip flops | 103 | 16944 | 0% |
| Number of 4-input LUTs | 730 | 16944 | 6% |
| Number of bonded IOBs | 100 | 240 | 41% |
| Number of GCLKs | 1 | 32 | 3% |
| Clock period | 3.575 nsec | - | - |
| Frequency | 279.697 MHz | - | - |
| Combinational delay (offset) | 29.997 nsec | - | - |
| Memory used | 240180 Kbytes | - | - |

TABLE IV
Device: xcv3200e-8fg1156
Speed Grade: -8
Sequential Processing on Virtex 4

| Parameters | Used | Available | Utilization |
|---|---|---|---|
| Number of slices | 401 | 63168 | 0% |
| Number of slice flip flops | 72 | 126336 | 0% |
| Number of 4-input LUTs | 710 | 126336 | 0% |
| Number of bonded IOBs | 100 | 768 | 13% |
| Number of GCLKs | 1 | 32 | 3% |
| Clock period | 4.095 nsec | - | - |
| Frequency | 244.200 MHz | - | - |
| Combinational delay (offset) | 34.662 nsec | - | - |
| Memory used | 414324 Kbytes | - | - |
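Two of the derived rows in the tables above can be reproduced from the raw figures. The sketch below assumes that the clock period is given in nanoseconds and that utilization is simply used divided by available; the helper names are hypothetical and the synthesis tool may round or truncate differently.

```python
def frequency_mhz(clock_period_ns: float) -> float:
    """Maximum clock frequency in MHz for a clock period in nanoseconds."""
    return 1000.0 / clock_period_ns


def utilization_percent(used: int, available: int) -> float:
    """Share of a device resource that is used, in percent."""
    return 100.0 * used / available


# Clock periods reported for the xcv3200e-8fg1156 runs (Tables II and IV).
print(round(frequency_mhz(1.446), 3))
print(round(frequency_mhz(4.095), 3))

# Bonded IOBs reported for the xc4vfx12-12sf363 concurrent run (Table I).
print(utilization_percent(99, 240))
```

A quick check of this kind is a convenient way to confirm that a clock period and a frequency copied from a synthesis report belong to the same run.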


V. SIMULATION WAVEFORM (USING MODELSIM)
Input 1 = 7.5 = 010000001111100000000000000000000
Input 2 = 9.25 = 010000010100101000000000000000000
Desired Result = 16.75 = 010000011100001100000000000000000
Simulation Result = 010000011100001100000000000000000

Input 1 = -3.5 = 110000000111000000000000000000000
Input 2 = -120.75 = 110000101111100011000000000000000
Desired Result = -124.25 = 110000101111110001000000000000000
Simulation Result = 110000101111110001000000000000000
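The bit strings above appear to use the 33-bit internal format from step 1 of the algorithm (1 sign bit, 8 exponent bits, and a 24-bit significand with the hidden bit written out) and not the 32-bit storage format. Under that assumption, the following Python sketch generates such strings so they can be compared with the simulation values; `to_33bit` is a hypothetical helper name.

```python
import struct


def to_33bit(x: float) -> str:
    """Format x as sign (1) + exponent (8) + significand with hidden bit (24)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    hidden = 1 if exponent != 0 else 0     # hidden bit is 0 only for zero
    return f"{sign:01b}{exponent:08b}{hidden:01b}{fraction:023b}"


for value in (7.5, 9.25, 16.75, -3.5, -120.75, -124.25):
    print(value, to_33bit(value))
```

Dropping the ninth character (the hidden bit) from each string gives the usual 32-bit IEEE-754 single precision pattern.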

ACKNOWLEDGEMENTS
I would like to thank the anonymous users for their insightful comments.

REFERENCES
[1] Allan, Wayne Luk, "Parameterised floating point arithmetic on FPGA," © 2001 IEEE.

[2] Dr. John A. Eldon, Craig Robertson, "A floating point format for signal processing," © 2002 IEEE.

[3] Asger, David, C. N. Lyu, "An IEEE compliant floating point adder that conforms with the pipelined packet forwarding paradigm," © 2000 IEEE.

[4] Ali Malik, Dongdong Chen and Seok-Bum Ko, "Design tradeoff analysis of floating point adders in FPGAs," Can. J. Elect. Comput. Eng., © 2008 IEEE.

[5] Loucas Louca, Todd A. Cook and William H. Johnson, "Implementation of IEEE single precision floating point addition and multiplication on FPGAs," © 1996 IEEE.

[6] Alexandru, Mircea, Lucian and Oana, "Exploiting parallelism in double path adder structure for increase throughput of floating point addition," © 2007 IEEE.

[7] V. Y. Gorshtein, A. I. Grushin, S. R. Shevtsov, "Floating point addition method and apparatus," Sun Microsystems, U.S. patent 5808926, 1998.

[8] IEEE Std. 1076-2002, "IEEE standard VHDL reference manual," sponsored by the Design Automation Standards Committee, published by IEEE.

[9] Metin Mete, Mustafa Gok, "A multiprecision floating point adder," © 2011 IEEE.

[10] Florent de Dinechin, "Pipelined FPGA adders," © 2010 IEEE.

[11] Ali Malik, Seok-Bum Ko, "Effective implementation of floating point adder using pipelined LOP in FPGAs," © 2010 IEEE.

AUTHORS PROFILE

Er. Karan Gumber is pursuing his Master of Engineering in Electronics and Communication at the University Institute of Engineering and Technology, Panjab University. His M.E. thesis work is being carried out under the guidance of Sharmelee Thangjam.