
International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February 2014, ISSN 2229-5518

Field Programmable Gate Array (FPGA): A Tool for Improving Parallel Computations

By

Mrs. Abisoye Opeyemi A. (M.Sc. Computer Science); Department of Computer Science, Federal University of Technology, Minna, Niger State, Nigeria. E-mail: [email protected], [email protected]

E. O. Justice Emuoyefarche (PhD); Computer Engineering Department, Ladoke Akintola University of Technology, Ogbomoso, Oyo State, Nigeria. E-mail: [email protected], [email protected]

Abstract: Platforms such as the multicore processor, complex programmable logic device (CPLD), application-specific integrated circuit (ASIC), graphics processing unit (GPU), vector processor, massively parallel processor (MPP), and symmetric multiprocessor (SMP) have become key components for dealing with parallel applications. Within parallel computing there is a specialized parallel device, the field programmable gate array (FPGA), that remains a niche area of interest. While not domain-specific, it tends to be applicable to only a few classes of parallel problems. This research work explores the underlying parallel architecture of the FPGA. The design methodology and the design tools for implementing FPGAs are discussed, e.g. System Generator from Xilinx and the Impulse C programming model, and a comparison of FPGA design with other technologies is envisaged. In this research work the FPGA naturally exploits parallelism because the FPGA is itself a parallel device. Using the simulation tool Impulse CoDeveloper (Impulse C), graphical tools for the FPGA platform are generated that provide initial estimates of algorithm throughput, such as loop latencies and pipeline effective rates. Using such tools, you can interactively change optimization options or iteratively modify and recompile C code to obtain higher performance.

Index Terms: Application-specific integrated circuit (ASIC), Complex programmable logic device (CPLD), Field programmable gate array (FPGA), Graphics processing unit (GPU), Impulse CoDeveloper (Impulse C), Massively parallel processor (MPP), Multicore processor, Symmetric multiprocessor (SMP), Vector processor.

—————————— ——————————

1 INTRODUCTION

Parallel computing allows the dissection of large data sets into smaller sets, each handled by a separate processor. This significantly increases the performance and handling of processing-intensive applications. With the emergence of multicore computers, we face the challenge of parallelizing performance-critical applications of all sorts. Compared to sequential applications, our repertoire of tools and methods for cost-effectively developing reliable parallel applications is spotty [5].

Within parallel computing there is a specialized parallel device, the field programmable gate array (FPGA), that remains a niche area of interest. While not domain-specific, it tends to be applicable to only a few classes of parallel problems. The historical roots of FPGAs are in the complex programmable logic devices (CPLDs) of the early to mid 1980s. A Xilinx co-founder, Ross Freeman, invented the field programmable gate array in 1985. CPLDs and FPGAs both include a relatively large number of programmable logic elements. CPLD logic gate densities range from the equivalent of several thousand to tens of thousands of logic gates, while FPGAs typically range from tens of thousands to several million [11].

The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. The result is less flexibility, with the advantage of more predictable timing delays and a higher logic-to-interconnect ratio.

FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible (in terms of the range of designs that are practical to implement within them) but also far more complex to design for. Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks that implement decoders or mathematical functions [15].

Applications of FPGAs include digital signal processing, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation and a growing range of other areas. FPGAs originally began as competitors to CPLDs and competed in a similar space, that of glue logic for PCBs. As their size, capabilities, and speed increased, they began to take over larger and larger functions, to the point where some are now marketed as full systems on chips (SoC).

FPGAs especially find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture. One such area is code breaking, in particular brute-force attack on cryptographic algorithms.

The inherent parallelism of the logic resources on the FPGA allows for considerable compute throughput even at a sub-500 MHz clock rate. For example, the current (2007) generation of FPGAs can implement around 100 single-precision floating-point units, all of which can compute a result every single clock cycle.


The flexibility of the FPGA allows for even higher performance by trading off precision and range in the number format for an increased number of parallel arithmetic units. This has driven a new type of processing called reconfigurable computing, where time-intensive tasks are offloaded from software to FPGAs [15].

2.0 AIMS AND OBJECTIVES

It is known that with the multicore processor, application-specific integrated circuit (ASIC), graphics processing unit (GPU), vector processor, massively parallel processor (MPP), and symmetric multiprocessor (SMP), difficulties still exist in solving parallel applications. Such difficulties include limited massive parallelism, the inability to interactively change optimization options to obtain higher performance, and the inability to update functionality after shipping. To eradicate such difficulties, the full architecture of the field programmable gate array is envisaged.

Objectives

1. To explore the challenges of parallel computations.
2. To describe the need for an effective parallel device to eradicate such challenges.
3. To describe how a typical parallel application is implemented on an FPGA using Impulse CoDeveloper (Impulse C), together with Xilinx System Generator, as the simulator.
4. To write an algorithm for the application (an illustrative kernel is sketched after this list).
5. To translate the algorithm into a C++ program without requiring a great deal of FPGA-specific hardware knowledge.
6. To compile the resulting optimized C code with the FPGA development tools (in particular the C-to-hardware compiler) to create a parallel hardware/software implementation.
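Objective 4 calls for an algorithm for the application. As an illustration only, a minimal candidate kernel is sketched below in plain C: a 4-tap moving-average filter. This is an assumed example, not the paper's actual application code, but it is the kind of fixed-trip-count loop that a C-to-hardware flow of the sort described later can unroll and pipeline.

#include <stdio.h>

#define TAPS 4

/* Illustrative kernel (assumed example): a 4-tap moving-average filter.
   Each output is the sum of the last TAPS inputs divided by TAPS.
   The inner loop has a fixed trip count, which makes it a good
   candidate for unrolling and pipelining by a C-to-hardware compiler. */
static void moving_average(const int *in, int *out, int n)
{
    for (int i = TAPS - 1; i < n; i++) {
        int acc = 0;
        for (int k = 0; k < TAPS; k++) {
            acc += in[i - k];
        }
        out[i] = acc / TAPS;
    }
}

int main(void)
{
    int in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int out[8] = {0};

    moving_average(in, out, 8);
    for (int i = TAPS - 1; i < 8; i++) {
        printf("out[%d] = %d\n", i, out[i]);
    }
    return 0;
}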

3.0 RESEARCH METHOD

The execution of this research work is divided into phases, and the goals are achieved through the following phases (a partitioning sketch follows this list):

1. Description of the complete application in the C++ language, using a standard C++ debugger to verify the algorithm.
2. Profiling the application to find the computational "hot spots".
3. Use of data streaming, message passing and/or shared memory to partition the algorithm into multiple communicating software and hardware processes.
4. Use of interactive optimization tools to analyze and improve the performance of hardware-accelerated functions.
5. Use of a C++-to-hardware compiler to generate synthesizable hardware, in the form of hardware description language files.
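The sketch below illustrates phase 3 in ordinary C: the application is split into a software producer and a consumer standing in for the hardware-accelerated process, communicating through a small FIFO buffer. It is shown sequentially for simplicity, and the FIFO type and function names are illustrative assumptions; in the Impulse C flow described in section 5.31, this role is played by the tool's stream primitives.

#include <stdio.h>

#define FIFO_DEPTH 16

/* Illustrative FIFO standing in for a hardware/software stream
   (names are assumptions, not part of any specific tool's API). */
typedef struct {
    int data[FIFO_DEPTH];
    int head, tail, count;
} fifo_t;

static void fifo_push(fifo_t *f, int v)
{
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
}

static int fifo_pop(fifo_t *f)
{
    int v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* Software-side producer: streams raw samples toward the accelerator. */
static void producer(fifo_t *f, const int *samples, int n)
{
    for (int i = 0; i < n; i++)
        fifo_push(f, samples[i]);
}

/* Consumer standing in for the hardware process: scales each sample. */
static void consumer(fifo_t *f, int *results, int n)
{
    for (int i = 0; i < n; i++)
        results[i] = 2 * fifo_pop(f);
}

int main(void)
{
    fifo_t f = {{0}, 0, 0, 0};
    int samples[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int results[8];

    producer(&f, samples, 8);   /* n <= FIFO_DEPTH keeps this sketch simple */
    consumer(&f, results, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", results[i]);
    printf("\n");
    return 0;
}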

4.0 CHALLENGES OF PARALLEL COMPUTATIONS

1. Limited massive parallelism.
2. Inability to interactively change optimization options to obtain higher performance, and inability to update functionality after shipping.
3. Finding concurrency in a program: how to help programmers "think parallel"?
4. Scheduling tasks at the right granularity onto the processors of a parallel machine.
5. The data locality problem: associating data with tasks, and doing it in a way that the target audience will be able to use correctly.
6. Scalability support in hardware: bandwidth and latencies to memory, plus interconnects between processing elements.
7. Scalability support in software: libraries, scalable algorithms, and adaptive runtimes to map high-level software onto platform details.

5.0 FPGA TECHNOLOGY

A field-programmable gate array (FPGA) is a semiconductor device that can be configured by the customer or designer after manufacturing (hence the name "field-programmable"). To program an FPGA you specify how you want the chip to work with a logic circuit diagram or with source code in a hardware description language (HDL). FPGAs can be used to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

At the highest level, FPGAs are reprogrammable silicon chips. Using prebuilt logic blocks and programmable routing resources, you can configure these chips to implement custom hardware functionality without ever having to pick up a breadboard or soldering iron.

5.1 FPGA Architecture

The typical basic architecture consists of an array of configurable logic blocks (CLBs) and routing channels. Multiple I/O pads may fit into the height of one row or the width of one column in the array. Generally, all the routing channels have the same width (number of wires). An application circuit must be mapped into an FPGA with adequate resources.

A classic FPGA logic block consists of a 4-input lookup table (LUT) and a flip-flop, as shown below. In recent years, manufacturers have started moving to 6-input LUTs in their high-performance parts, claiming increased performance.


Typical logic block.

There is only one output, which can be either the registered or the unregistered LUT output. The logic block has four inputs for the LUT and a clock input. Since clock signals (and often other high-fan-out signals) are normally routed via special-purpose dedicated routing networks in commercial FPGAs, they and other signals are separately managed.

For this example architecture, the locations of the FPGA logic block pins are shown below.
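To make the logic block concrete, the short C model below treats the 4-input LUT as a 16-entry truth table indexed by the four inputs, with both a registered and an unregistered output. This is a behavioural sketch for illustration only; the truth-table contents, type and function names are assumptions.

#include <stdio.h>
#include <stdint.h>

/* Behavioural model of a classic FPGA logic block:
   a 4-input LUT (16-bit truth table) feeding an optional flip-flop. */
typedef struct {
    uint16_t truth_table;  /* bit i holds the LUT output for input pattern i */
    uint8_t  ff_q;         /* state of the flip-flop (registered output) */
} logic_block_t;

/* Combinational LUT lookup: pack the four inputs into an index. */
static int lut_out(const logic_block_t *lb,
                   int a, int b, int c, int d)
{
    unsigned idx = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3);
    return (lb->truth_table >> idx) & 1;
}

/* Rising clock edge: capture the LUT output into the flip-flop. */
static void clock_edge(logic_block_t *lb, int a, int b, int c, int d)
{
    lb->ff_q = (uint8_t)lut_out(lb, a, b, c, d);
}

int main(void)
{
    /* Example configuration (assumed): a 4-input AND has truth table 0x8000,
       because only input pattern 1111 (index 15) produces a 1. */
    logic_block_t lb = { 0x8000, 0 };

    printf("unregistered AND(1,1,1,1) = %d\n", lut_out(&lb, 1, 1, 1, 1));
    clock_edge(&lb, 1, 1, 0, 1);
    printf("registered output after clock = %d\n", (int)lb.ff_q);
    return 0;
}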

Logic block pin locations; switch box topology.

Each input is accessible from one side of the logic block, while the output pin can connect to routing wires in both the channel to the right and the channel below the logic block. Each logic block output pin can connect to any of the wiring segments in the channels adjacent to it.

Similarly, an I/O pad can connect to any one of the wiring segments in the channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires (where W is the channel width) in the horizontal channel immediately below it.

Generally, the FPGA routing is unsegmented. That is, each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed. For higher-speed interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.

Whenever a vertical and a horizontal channel intersect, there is a switch box.

In this architecture, when a wire enters a switch box, there are three programmable switches that allow it to connect to three other wires in adjacent channel segments. The pattern, or topology, of switches used in this architecture is the planar or domain-based switch box topology. In this switch box topology, a wire in track number one connects only to wires in track number one in adjacent channel segments, wires in track number two connect only to other wires in track number two, and so on. The figure below illustrates the connections in a switch box.
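As a complement to the figure referenced above, the short C sketch below models the planar (disjoint) switch box topology just described: a wire arriving on track t of one side can be switched only onto track t of each of the other three sides. The side names, enum and function are illustrative assumptions, not part of any vendor tool.

#include <stdio.h>

/* Sides of a switch box where a wiring segment can enter or leave. */
typedef enum { NORTH, EAST, SOUTH, WEST } side_t;

static const char *side_name[] = { "north", "east", "south", "west" };

/* Planar (disjoint) switch box: a wire on track t of one side can be
   switched onto track t of each of the other three sides, and onto
   no other track. */
static void print_connections(side_t from, int track, int channel_width)
{
    if (track < 0 || track >= channel_width) {
        printf("track %d is outside the channel width W=%d\n",
               track, channel_width);
        return;
    }
    for (int s = NORTH; s <= WEST; s++) {
        if ((side_t)s == from)
            continue;
        printf("%s track %d -> %s track %d\n",
               side_name[from], track, side_name[s], track);
    }
}

int main(void)
{
    int W = 4;  /* channel width (number of wires per channel), assumed */
    print_connections(WEST, 1, W);
    return 0;
}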

Switch box topology.

Modern FPGA families expand upon the above capabilities to include higher-level functionality fixed into the silicon. Having these common functions embedded into the silicon reduces the area required and gives those functions increased speed compared to building them from primitives. Examples include multipliers, generic DSP blocks, embedded processors, high-speed I/O logic and embedded memories.

FPGAs are also widely used for systems validation, including pre-silicon validation, post-silicon validation, and firmware development. This allows chip companies to validate their design before the chip is produced in the factory, reducing the time to market.

5.2 Benefits of FPGA Technology

1. Performance
2. Time to market
3. Cost
4. Reliability
5. Long-term maintenance
6. Efficiency

1. Performance – By taking advantage of hardware parallelism, FPGAs break the paradigm of sequential execution and accomplish more per clock cycle. BDTI, a noted analyst and benchmarking firm, released benchmarks showing how FPGAs can deliver many times the processing power per dollar of a DSP solution in some applications. Controlling inputs and outputs (I/O) at the hardware level provides faster response times and specialized functionality to closely match application requirements.

2. Time to market – FPGA technology offers flexibility and rapid prototyping capabilities in the face of increased time-to-market concerns. You can test an idea or concept and verify it in hardware without going through the long fabrication process of custom ASIC design. You can then implement incremental changes and iterate on an FPGA design within hours instead of weeks. Commercial off-the-shelf (COTS) hardware is also available with different types of I/O already connected to a user-programmable FPGA chip. The growing availability of high-level software tools decreases the learning curve with layers of abstraction, and such tools often include valuable IP cores (prebuilt functions) for advanced control and signal processing.

3. Cost – The nonrecurring engineering (NRE) expense of custom ASIC design far exceeds that of FPGA-based hardware solutions.


The large initial investment in ASICs is easy to justify for OEMs shipping thousands of chips per year, but many end users need custom hardware functionality for the tens to hundreds of systems in development. The very nature of programmable silicon means that there is no cost for fabrication or long lead times for assembly. As system requirements often change over time, the cost of making incremental changes to FPGA designs is negligible compared with the large expense of respinning an ASIC.

4. Reliability – While software tools provide the programming environment, FPGA circuitry is truly a "hard" implementation of program execution. Processor-based systems often involve several layers of abstraction to help schedule tasks and share resources among multiple processes. The driver layer controls hardware resources and the operating system manages memory and processor bandwidth. For any given processor core, only one instruction can execute at a time, and processor-based systems are continually at risk of time-critical tasks pre-empting one another. FPGAs, which do not use operating systems, minimize reliability concerns with true parallel execution and deterministic hardware dedicated to every task.

5. Long-term maintenance – As mentioned earlier, FPGA chips are field-upgradable and do not require the time and expense involved with ASIC redesign. Digital communication protocols, for example, have specifications that can change over time, and ASIC-based interfaces may cause maintenance and forward-compatibility challenges. Being reconfigurable, FPGA chips are able to keep up with future modifications that might be necessary. As a product or system matures, you can make functional enhancements without spending time redesigning hardware or modifying the board layout.

6. Efficiency – More computation with less power consumption; the FPGA offers the best ratio of GFLOPS per watt.

5.3 FPGA Design and Programming

FPGAs can be programmed with hardware description languages such as VHDL or Verilog. However, programming in these languages can be tedious. Several vendors have created C-to-HDL languages that attempt to emulate the syntax and/or semantics of the C programming language, with which most programmers are familiar. The best known C-to-HDL languages are Mitrion-C, Impulse C, DIME-C, and Handel-C. To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores, and are available from FPGA vendors.

5.31 Impulse CoDeveloper (Impulse C)

Impulse C is a software-to-hardware compiler (simulator) that generates hardware-to-software interfaces. It is a set of library functions that support parallel programming for FPGAs using the C language. Impulse C optimizes C code for parallelism and generates HDL ready for FPGA synthesis. It moves compute-intensive functions to the FPGA. The Impulse C approach focuses on the mapping of algorithms to mixed FPGA/processor systems and creates highly parallel, dataflow-oriented applications. Impulse C simplifies the creation of highly parallel algorithms, including mixed software/hardware algorithms, through the use of well-defined data communication, message passing, and synchronization mechanisms.
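The fragment below sketches what a compute process in this dataflow style can look like. It is written as plain, compilable C; the comments indicate where Impulse C's stream read/write primitives would replace the array accesses. The function, parameter names and the arithmetic are illustrative assumptions rather than the paper's actual application code.

#include <stdio.h>

/* Dataflow-style compute process, sketched in plain C.
   In the Impulse C model, the input and output would be streams that the
   process reads from and writes to one value at a time, and the tool's
   C-to-hardware compiler would turn the loop into pipelined logic. */
static void scale_and_offset_process(const int *in_stream,
                                     int *out_stream, int n)
{
    for (int i = 0; i < n; i++) {
        /* With Impulse C this would be a blocking stream read. */
        int sample = in_stream[i];

        /* Loop body: the part that becomes parallel hardware. */
        int result = 3 * sample + 7;

        /* With Impulse C this would be a stream write. */
        out_stream[i] = result;
    }
}

int main(void)
{
    int in[5] = {0, 1, 2, 3, 4};
    int out[5];

    scale_and_offset_process(in, out, 5);
    for (int i = 0; i < 5; i++)
        printf("%d ", out[i]);   /* prints: 7 10 13 16 19 */
    printf("\n");
    return 0;
}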

6.0 EXPERIMENTATION WITH THE SIMULATOR

The source code is divided into various blocks, and the simulation of each block on the hardware platform is shown. The pipeline stages, the latency, the effective rate, and the number of samples generated per cycle for each block are shown against the source code. This helps in evaluating the acceleration and the performance of each algorithm.
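To indicate how a block is marked for pipelining before such estimates are produced, the sketch below annotates a simple loop with the co-pipeline and co-unroll pragmas that the comparison in section 8.0 refers to. The exact pragma spelling accepted by the tool is not reproduced here and should be treated as an assumption; an ordinary C compiler simply ignores unknown pragmas, so the file still compiles and runs.

#include <stdio.h>

#define N 16

/* Accumulate the element-wise products of two small vectors.
   The pragmas below mark the loop for the C-to-hardware tool; their
   spelling follows the paper's mention of "pragma co-pipeline" and
   "pragma co-unroll" and is treated here as an assumption. */
static int dot_product(const int *a, const int *b)
{
    int acc = 0;
#pragma co pipeline
#pragma co unroll
    for (int i = 0; i < N; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

int main(void)
{
    int a[N], b[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 1;
    }
    printf("dot product = %d\n", dot_product(a, b));  /* 0+1+...+15 = 120 */
    return 0;
}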

Graphical representation of the synthesis of code using the simulator (Impulse CoDeveloper): pipeline stages.

Fig 1a. Source code of pipeline2, generating a latency of 3, an effective rate of 18, 2 samples/cycle and a maximum unit delay of 9.


Fig 1b. Datapath of pipeline2, generating a latency of 3, an effective rate of 18, 2 samples/cycle and a maximum unit delay of 9.

Fig 2a. Source code of pipeline4, generating a latency of 14, an effective rate of 32, 1 sample/clock cycle and a maximum unit delay of 32.

Fig 2b. Datapath of pipeline4, generating a latency of 14, an effective rate of 32, 1 sample/clock cycle and a maximum unit delay of 32.

Fig 3. With the simulator (block stages), maximum unit delay of 0.


Fig 4. Source code of the block stage.

Without the simulator:

Fig 5. The modules or classes are shown, while the pipeline rate and effective rate cannot be determined.

7.0 DISCUSSION OF RESULTS

The results show the use of the FPGA as a tool to improve parallel computations, with the Impulse C tools used as the simulator. The graphical tools above show the source code (Fig 1a, 2a) and the datapath (Fig 1b, 2b), and they help to provide initial estimates of algorithm throughput, such as loop latencies and pipeline effective rates. Using such tools, you can interactively change optimization options or iteratively modify and recompile C code to obtain higher performance. Such design iterations may take only a matter of minutes when using C, whereas the same iterations may require hours or even days when using VHDL or Verilog.

Moreover, the Impulse C tools use optimization techniques to increase the performance of the code being used for an application without requiring a great deal of FPGA-specific hardware knowledge. We have also shown that pipelining introduces a potentially high degree of parallelism in the generated logic, allowing us to achieve the best possible throughput.

8.0 FINDINGS: PERFORMANCE EVALUATION / COMPARATIVE ANALYSIS

The comparison below contrasts development without the FPGA simulator (left) and with the FPGA simulator (right).

Without FPGA simulator | With FPGA simulator
1. It does not generate a hardware program. | It generates the hardware program simultaneously.
2. The number of adders and comparators cannot be calculated. | The number of adders and comparators can be calculated.
3. The number of samples generated per cycle cannot be determined. | The number of samples generated per cycle can be determined.
4. The code cannot be easily pipelined to speed up the processing of filters. | Pragma co-unroll and pragma co-pipeline are used to pipeline the code and the processing of filters.
5. It can easily be used only for fixed formats of filters. | It can be used for fixed and complex filters.
6. There is no graphical representation of the synthesis of code. | A graphical representation is generated to show the synthesis of code in blocks.
7. The synthesis-of-code processing flow cannot be seen. | The synthesis flow of blocks and statements can be shown.
8. It does not generate a hardware description language. | It generates a hardware description language.
9. Presence of low-level embedded functions. | Presence of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks to implement decoders or mathematical functions.

9.0 CONCLUSION AND FURTHER RESEARCH

In conclusion, this research has described the FPGA as a tool to improve parallel applications by acting as a co-processor alongside conventional processors. We have shown that the underlying parallel architecture of the FPGA helps in moving expensive computations from the CPU into specifically designed logic inside the FPGA, thus obtaining high performance at an economical price.

FPGAs are becoming easier to use as the development tools get better, and as FPGA prices fall and smaller, denser chip manufacturing technology becomes available, they are becoming affordable for use in more computing applications. FPGAs have therefore now proved a better alternative to traditional solutions such as ASICs for a growing number of higher-volume applications. Further research can focus on hardware/software interfacing and FPGA tool development.


10.0 RECOMMENDATION

Based on the results and findings above, we recommend the use of the field programmable gate array (FPGA) as the preferred technology for solving parallel applications, rather than conventional processors, because it increases computational speed. FPGA technology offers reliability, low cost, high performance, long-term maintainability, short time to market and efficiency compared to conventional processors.

REFERENCES

[1] Anthony S. & Lan P. (2006). "The Design of a New FPGA Architecture", Friday, Jan 20; BDTI Focus Report: FPGAs for DSP, Second Edition, BDTI Benchmarking.

[2] Acken, Kevin P.; Irwin, Mary Jane; Owens, Robert M. (1 January 1998). The Journal of VLSI Signal Processing 19 (2): 97-113. doi:10.1023/A:1008005616596.

[3] D. Ryle, D. Popig, E. Stahlberag (2006). "Applying FPGA to Biological Problems".

[4] D'Amour, Michael R., CEO, DRC Computer Corporation. "Standard Reconfigurable Computing". Invited speaker at the University of Delaware, February 28, 2007.

[5] Gottlieb, Allan; Almasi, George S. (1989). Highly Parallel Computing. Redwood City, Calif.: Benjamin/Cummings. ISBN 0-8053-0177-1.

[6] Jason Cong & Kenneth Yan (2000). Department of Computer Science, University of California, Los Angeles. Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays, Monterey, California, United States, pp. 75-82. ISBN 1-58113-193-3.

[7] Maslennikov, Oleg (2002). "Systematic Generation of Executing Programs for Processor Elements in Parallel ASIC or FPGA-Based Systems and Their Transformation into VHDL-Descriptions of Processor Element Control Units". Lecture Notes in Computer Science, 2328/2002: p. 272.

[8] M. Thompson (2006). "The Field-Programmable Gate Array (FPGA): Expanding Its Boundaries", In-Stat Market Research, April 2006.

[9] Moreno, W.A.; Poladia, K. (1998). "Field Programmable Gate Array Design for an Application-Specific Signal Processing Algorithm", Devices, Circuits and Systems: Proceedings of the 1998 Second IEEE International Caracas Conference, Volume 1, 2-4 March 1998, pp. 222-225. doi:10.1109/ICCDCS.1998.705837.

[10] M. Thompson (2004). "FPGAs Accelerate Time to Market for Industrial Designs", EE Times. http://www.us.design-reuse.com/articles/8190/fpgas-accelerate-time-to-market-for-industrial-designs.html

[11] Peter Clarke (2001). "Xilinx, ASIC Vendors Talk Licensing". Retrieved February 10, 2009.

[12] Patterson and Hennessy (2007). Computer Organization and Design: The Hardware/Software Interface (2nd ed., 3rd printing). San Francisco: Kaufmann. ISBN 1-55860-428-6. pp. 537 & 714.

[13] Kirkpatrick, Scott (2003). "Computer Science: Rough Times Ahead". Science, Vol. 299, No. 5607, pp. 668-669. doi:10.1126/science.1081623.

[14] Shimokawa, Y.; Fuwa, Y.; Aramaki, N. (1991). "A Parallel ASIC VLSI Neurocomputer for a Large Number of Neurons and Billion Connections per Second Speed". International Joint Conference on Neural Networks 3: 2162-2167. doi:10.1109/IJCNN.1991.170708. ISBN 0-7803-0227-3.

[15] http://en.wikipedia.org/wiki/Field-programmable_gate_array
