Basics of FPGAs Design


A Supplement to Electronic Design/December 4, 2003 Sponsored by Mentor Graphics Corp.

Field-programmable gate arrays (FPGAs) arrived in 1984 as an alternative to programmable logic devices (PLDs) and ASICs. As their name implies, FPGAs offer the significant benefit of being readily programmable. Unlike their forebears in the PLD category, FPGAs can (in most cases) be programmed again and again, giving designers multiple opportunities to tweak their circuits.

There’s no large non-recurring engineering (NRE) cost associated with FPGAs. In addition, lengthy, nerve-wracking waits for mask-making operations are squashed. Often, with FPGA development, logic design begins to resemble software design due to the many iterations of a given design. Innovative design often happens with FPGAs as an implementation platform.

But there are some downsides to FPGAs as well. The economics of FPGAs force designers to balance their relatively high piece-part pricing compared to ASICs with the absence of high NREs and long development cycles. They’re also available only in fixed sizes, which matters when you’re determined to avoid unused silicon area.
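The balance between piece-part price and NRE described above can be put into rough numbers. The following is a minimal sketch, assuming a simple linear cost model; the function name and all dollar figures are hypothetical and are not taken from the article.

```python
import math


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Return the unit volume at which ASIC and FPGA total costs are equal.

    Assumed cost model (illustrative only):
        FPGA total = volume * fpga_unit_cost              (no NRE)
        ASIC total = asic_nre + volume * asic_unit_cost
    """
    if fpga_unit_cost <= asic_unit_cost:
        # Under this model the FPGA never costs more, so there is no crossover.
        return math.inf
    return asic_nre / (fpga_unit_cost - asic_unit_cost)


# Hypothetical figures chosen only to exercise the formula.
volume = break_even_volume(asic_nre=500_000, asic_unit_cost=5.0, fpga_unit_cost=30.0)
print(f"Break-even at about {volume:,.0f} units")
```

Below the break-even volume the FPGA is the cheaper choice under these assumptions; above it, the ASIC's lower unit price starts to repay its NRE. A real comparison would also weigh development time and the risk of a mask respin.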

What are FPGAs?

FPGAs fill a gap between discrete logic and the smaller PLDs on the low end of the complexity scale and costly custom ASICs on the high end. They consist of an array of logic blocks that are configured using software. Programmable I/O blocks surround these logic blocks. Both are connected by programmable interconnects (Fig. 1). The programming technology in an FPGA determines the type of basic logic cell and the interconnect scheme. In turn, the logic cells and interconnection scheme determine the design of the input and output circuits as well as the programming scheme.

Just a few years ago, the largest FPGA was measured in tens of thousands of system gates and operated at 40 MHz. Older FPGAs often cost more than $150 for the most advanced parts at the time. Today, however, FPGAs offer millions of gates of logic capacity, operate at 300 MHz, can cost less than $10, and offer integrated functions like processors and memory (Table 1).

FPGAs offer all of the features needed to implement most complex designs. Clock management is facilitated by on-chip PLL (phase-locked loop) or DLL (delay-locked loop) circuitry. Dedicated memory blocks can be configured as basic single-port RAMs, ROMs, FIFOs, or CAMs. Data processing, as embodied in the devices’ logic fabric, varies widely. The ability to link the FPGA with backplanes, high-speed buses, and memories is afforded by support for various single-ended and differential I/O standards. Also found on today’s FPGAs are system-building resources such as high-speed serial I/Os, arithmetic modules, embedded processors, and large amounts of memory.

Initially seen as a vehicle for rapid prototyping and emulation systems, FPGAs have spread into a host of applications. They were once too simple, and too costly, for anything but small-volume production. Now, with the advent of much larger devices and declining per-part costs, FPGAs are finding their way off the prototyping bench and into production (Table 2).

Tradeoffs Abound in FPGA Design

Understanding device types and design flows is key to getting the most out of FPGAs

David Maliniak, Electronic Design Automation Editor

Comparing FPGA Architectures

FPGAs must be programmed by users to connect the chip’s resources in the appropriate manner to implement the desired functionality. Over the years, various technologies have emerged to suit different requirements. Some FPGAs can only be programmed once. These devices employ antifuse technology. Flash-based devices can be programmed and reprogrammed again after debugging. Still others can be dynamically programmed thanks to SRAM-based technology. Each has its advantages and disadvantages (Table 3).

Most modern FPGAs are based on SRAM configuration cells, which offer the benefit of unlimited reprogrammability. When powered up, they can be configured to perform a given task, such as a board or system test, and then reprogrammed to perform their main task. On the flip side, though, SRAM-based FPGAs must be reconfigured each time their host system is powered up, and additional external circuitry is required to do so. Further, because the configuration file used to program the FPGA is stored in external memory, security issues concerning intellectual property emerge.

Antifuse-based FPGAs aren’t in-system programmable, but rather are programmed offline using a device programmer. Once the chip is configured, it can’t be altered. However, in antifuse technology, device configuration is nonvolatile with no need for external memory. On top of that, it’s virtually impossible to reverse-engineer their programming. They often work as replacements for ASICs in small volumes.

In a sense, flash-based FPGAs fulfill the promise of FPGAs in that they can be reprogrammed many times. They’re nonvolatile, retaining their configuration even when powered down. Programming is done either in-system or with a programmer. In some cases, IP security can be achieved using a multi-bit key that locks the configuration data after programming.

But flash-based FPGAs require extra process steps above and beyond standard CMOS technology, leaving them at least a generation behind. Moreover, the many pull-up resistors result in high static power consumption.

Table 1: KEY RESOURCES AVAILABLE IN THE LARGEST DEVICES FROM MAJOR FPGA VENDORS

| Features | Xilinx Virtex II Pro | Altera Stratix | Actel Axcelerator | Lattice ispXPGA |
|---|---|---|---|---|
| Clock management | DCM; up to 12 | PLL; up to 12 | PLL; up to 8 | SysCLOCK PLL; up to 8 |
| Embedded memory blocks | BlockRAM; up to 10 Mbits | TriMatrix memory; up to 10 Mbits | Embedded RAM; up to 338 kbits | SysMEM blocks; up to 414 kbits |
| Data processing | Configurable logic blocks and 18-bit by 18-bit multipliers; up to 125,000 logic cells and 556 multiplier blocks | Logic elements and embedded multipliers; up to 79,000 LEs and 176 embedded multipliers | Logic modules (C-Cell and R-Cell); up to 10,000 R-Cells and 21,000 C-Cells | Based on programmable functional unit (PFU); up to 3844 PFUs |
| Programmable I/Os | SelectI/O | Advanced I/O support | Advanced I/O support | SysI/O |
| Special features | Embedded PowerPC 405 cores; RocketI/O multi-gigabit transceiver | DSP blocks; high-speed differential I/O and interface standards support | PerPin FIFOs for bus applications | SysHSI for high-speed serial interface |

Do’s And Don’ts For The FPGA Designer

1. Do concentrate on I/O timing, not just the register-to-register internal frequency that the FPGA place-and-route tools report. Frequently, the hardest challenge in a complete FPGA design is the I/O timing. Focus on how your signals enter and leave your FPGA, because that’s where the bottlenecks frequently occur.
2. Do create hierarchy around vendor-specific structures and instantiations. Give yourself the freedom to migrate from one technology to another by ensuring that each instantiation of a vendor-specific element is in a separate hierarchical block. This applies especially to RAMs and clock-management blocks.
3. Do use IP timing models during synthesis to give the true picture of your design. By importing EDIF netlists of pre-synthesized blocks, your synthesis tool can fully understand your timing requirements. Be cautious when using vendor cores that you can bring into your synthesis tool if they have no timing model.
4. Do design your hierarchical blocks with registered outputs where possible to avoid having critical paths pass through many levels of hierarchy. FPGAs exhibit step-functions in logic-limited performance. When hierarchy is preserved and the critical path passes across a hierarchical boundary, you may introduce an extra level of logic. When considered along with the associated routing, this can add significant delay to your critical path.
5. Do enable retiming in your synthesis tool. FPGAs tend to be register-rich architectures. When you correctly constrain your design in synthesis, you allow the tool to optimize your design to take advantage of positive slack timing within the design. Sometimes this can be done after initial place and route to improve retiming over wireload estimation.

1. Don’t synthesize unless you’ve fully and correctly constrained your design. This includes correct clock domains, I/O timing requirements, multicycle paths, and false paths. If your synthesis tool doesn’t see exactly what you want, it can’t make decisions to optimize your design accordingly.
2. Don’t try to fix every timing problem in place and route. Place and route offers little room for fixing timing where a properly constrained synthesis tool would.
3. Don’t vainly floor plan at the RTL or block level hoping to improve place-and-route results. Manual area placement can cause more problems than it might initially appear to solve. Unless you are an expert in manual placement and floorplanning, this is best left alone.
4. Don’t string clock buffers together, create multiple clock trees from the same clock, or use multiple clocks when a simple enable will do. Clocking schemes in FPGAs can become very complicated now that there are PLLs, DLLs, and large numbers of clock-distribution networks. Poor clocking schemes can lead to extended place-and-route times, failure to meet timing, and even failure to place in some technologies. Simpler schemes are vastly more desirable. Avoid those gated clocks, too!
5. Don’t forget to simulate your design blocks as well as your entire design. Discovering and back-tracking an error from the chip’s pins during on-board testing can be extremely difficult. On-board FPGA testing can miss important design flaws that are much easier to identify during simulation; they can be rectified by modifying the FPGA’s programming.
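The first "Do" in the sidebar (concentrate on I/O timing, not just the internal register-to-register frequency) can be illustrated with a simple input-path budget. This is a rough sketch that assumes a common board clock with zero skew; the function, its parameters, and the nanosecond values are hypothetical.

```python
def input_setup_slack_ns(clock_period_ns, upstream_clock_to_out_ns,
                         board_delay_ns, fpga_input_setup_ns):
    """Return setup slack on a synchronous input path into the FPGA.

    Assumes the upstream device and the FPGA share one clock with no skew.
    A negative result means the data arrives too late to be captured.
    """
    arrival = upstream_clock_to_out_ns + board_delay_ns
    required = clock_period_ns - fpga_input_setup_ns
    return required - arrival


# Hypothetical 100-MHz interface (10-ns period).
slack = input_setup_slack_ns(
    clock_period_ns=10.0,
    upstream_clock_to_out_ns=6.5,
    board_delay_ns=1.2,
    fpga_input_setup_ns=3.0,
)
print(f"Input setup slack: {slack:.2f} ns")
```

With these made-up numbers the input path misses timing even though the internal logic might comfortably run at the same clock rate, which is the situation the sidebar warns about.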

FPGAs can also be characterized as having either fine-, medium-, or coarse-grained architectures. Fine-grained architectures boast a large number of relatively simple logic blocks. Each logic block usually contains either a two-input logic function or a 4-to-1 multiplexer and a flip-flop. Blocks can only be used to implement simple functions. But fine-grained architectures lend themselves to execution of functions that benefit from parallelism.

Coarse-grained architectures consist of relatively large logic blocks often containing two or more lookup tables and two or more flip-flops. In most of these architectures, a four-input lookup table (think of it as a 16 x 1 ROM) implements the actual logic.
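The "16 x 1 ROM" view of a four-input lookup table can be made concrete with a small software model. The sketch below is illustrative only; the helper names and the bit ordering of the address are assumptions, not any vendor's convention.

```python
def make_lut4(func):
    """Build the 16-entry truth table (the "ROM contents") for a 4-input function."""
    table = []
    for address in range(16):
        a = (address >> 0) & 1
        b = (address >> 1) & 1
        c = (address >> 2) & 1
        d = (address >> 3) & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs simply form the ROM address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


# Any Boolean function of four inputs fits; this one is (a AND b) XOR (c OR d).
table = make_lut4(lambda a, b, c, d: (a & b) ^ (c | d))
print(table)
```

Changing the logic function changes only the 16 stored bits, not the structure. That is why configuring a LUT-based device amounts to loading table contents.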

The FPGA design flow

After weighing all implementation options, you must consider the design flow. The process of implementing a design on an FPGA can be broken down into several stages, loosely definable as design entry or capture, synthesis, and place and route (Fig. 2). Along the way, the design is simulated at various levels of abstraction as in ASIC design. The availability of sophisticated and coherent tool suites for FPGA design makes them all the more attractive.

At one time, design entry was performed in the form of schematic capture. Most designers have moved over to hardware description languages (HDLs) for design entry. Some will prefer a mixture of the two techniques. Schematic-based design-capture tools gave designers a great deal of control over the physical placement and partitioning of logic on the device. But it’s becoming less likely that designers will take that route. Meanwhile, language-based design entry is faster, but often at the expense of performance or density.

For many designers, the choice of whether to use schematic- or HDL-based design entry comes down to their conception of their design. For those who think in software or algorithmic-like terms, HDLs are the better choice. HDLs are well suited for highly complex designs, especially when the designer has a good handle on how the logic must be structured. They can also be very useful for designing smaller functions when you haven’t the time or inclination to work through the actual hardware implementation.

Fig. 2. The Big Picture. A “big picture” look at an FPGA design flow shows the major steps in the process: design entry, synthesis from RTL to gate level, and physical design. Place and route is done using the FPGA vendors’ proprietary tools that account for the devices’ architectures and logic-block structures. (Diagram labels: vendor place and route; achieved timing?; yes; no; modify design; done!; no-guess flow.)

Fig. 1. Functional Blocks. Just about all FPGAs include a regular, programmable, and flexible architecture of logic blocks surrounded by input/output blocks on the perimeter. These functional blocks are linked together by a hierarchy of highly versatile programmable interconnects. (Diagram labels: input/output blocks; logic blocks; programmable interconnects.)


On the other hand, HDLs represent a level of abstraction that can isolate designers from the details of the hardware implementation. Schematic-based entry gives designers much more visibility into the hardware. It’s a better method for those who are hardware-oriented. The downside of schematic-based entry is that it makes the design more difficult to modify or port to another FPGA.

A third option for design entry, state-machine entry, works well for designers who can see their logic design as a series of states that the system steps through. It shines when designing somewhat simple functions, often in the area of system control, that can be clearly represented in visual formats. Tool support for finite state-machine entry is limited, though.
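To show what "a series of states that the system steps through" looks like when written down, here is a minimal table-driven sketch of a small controller. The state names, events, and helper functions are hypothetical and only illustrate the style of description; they do not represent any particular state-machine entry tool.

```python
# Transition table: (current_state, input_event) -> next_state
TRANSITIONS = {
    ("IDLE", "start"): "LOAD",
    ("LOAD", "loaded"): "RUN",
    ("RUN", "done"): "IDLE",
}


def step(state, event):
    """Advance the controller by one step; unlisted events hold the current state."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Feed a sequence of events through the machine and record each state visited."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


trace = run(["start", "loaded", "noise", "done"])
print(trace)
```

The whole behavior lives in one transition table, which maps directly onto the bubble-and-arrow diagrams that graphical state-machine entry relies on.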

Some designers approach the start of their design from a level of abstraction higher than HDLs, which is algorithmic design using the C/C++ programming languages. A number of EDA vendors have tool flows supporting this design style. Generally, algorithmic design has been thought of as a tool for architectural exploration. But increasingly, as tool flows emerge for C-level synthesis, it’s being accepted as a first step on the road to hardware implementation.

After design entry, the design is simulated at the register-transfer level (RTL). This is the first of several simulation stages, because the design must be simulated at successive levels of abstraction as it moves down the chain toward physical implementation on the FPGA itself. RTL simulation offers the highest performance in terms of speed. As a result, designers can perform many simulation runs in an effort to refine the logic. At this stage, FPGA development isn’t unlike software development. Signals and variables are observed, procedures and functions traced, and breakpoints set. The good news is that it’s a very fast simulation. But because the design hasn’t yet been synthesized to gate level, properties such as timing and resource usage are still unknowns.
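The software-like character of RTL simulation can be shown with a toy cycle-based model. This sketch stands in for an HDL simulator and testbench; the 4-bit counter, its ports, and the stimulus are hypothetical. As in real RTL simulation, only register contents per clock edge are modeled, with no gate delays.

```python
class Counter4:
    """Cycle-based model of a 4-bit counter with synchronous reset and enable."""

    def __init__(self):
        self.count = 0  # register contents

    def clock(self, reset, enable):
        """Apply one rising clock edge and return the new register value."""
        if reset:
            self.count = 0
        elif enable:
            self.count = (self.count + 1) & 0xF  # wrap at 4 bits
        return self.count


def testbench():
    """Drive stimulus and collect the observed count after every clock edge."""
    dut = Counter4()
    observed = [dut.clock(reset=1, enable=0)]          # one reset cycle
    for _ in range(17):
        observed.append(dut.clock(reset=0, enable=1))  # count past the wrap
    observed.append(dut.clock(reset=0, enable=0))      # enable low: value holds
    return observed


waveform = testbench()
print(waveform)
```

The run confirms functional behavior such as the wrap and the hold, but says nothing about achievable clock rate or logic usage. Those questions stay open until after synthesis.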

The next step following RTL simulation is to convert the RTL representation of the design into a bit-stream file that can be loaded onto the FPGA. The interim step is FPGA synthesis, which translates the VHDL or Verilog code into a device netlist format that can be understood by a bit-stream converter.

The synthesis process can be broken down into three steps. First, the HDL code is converted into device netlist format. Then the resulting file is converted into a hexadecimal bit-stream file, or .bit file. This step is necessary to change the list of required devices and interconnects into hexadecimal bits to download to the FPGA. Lastly, the .bit file is downloaded to the physical FPGA. This final step completes the FPGA synthesis procedure by programming the design onto the physical FPGA.

Table 2: FPGA USAGE

| Factor | Emulation: 3% | Prototyping: 30% | Preproduction: 30% | Production: 37% |
|---|---|---|---|---|
| Time-to-market | Fairly high; fast compile times | Fairly high; fast compile times | Fairly high; fast compile times | Fairly high; fast compile times |
| Performance | Not stringent | Not stringent | Very critical | Very critical |
| Volume | Very low per application | Low per application | Moderately high per application | High per application |

Table 3: ADVANTAGES/DISADVANTAGES OF VARIOUS FPGA TECHNOLOGIES

| Feature | SRAM | Antifuse | Flash |
|---|---|---|---|
| Reprogrammable? | Yes (in-system) | No | Yes (in-system or offline) |
| Reprogramming speed (including erasure) | Fast | Not applicable | 3X SRAM |
| Volatile? | Yes | No | No (but can be if required) |
| External configuration file? | Yes | No | No |
| Good for prototyping? | Yes | No | Yes |
| Instant-on? | No | Yes | Yes |
| IP security | Poor | Very good | Very good |
| Size of configuration cell | Large (six transistors) | Very small | Small (two transistors) |
| Power consumption | High | Low | Medium |
| Radiation hardness? | No | Yes | No |

Fig. 3. Go With The Flow. The implementation flow for FPGAs begins with synthesis of the HDL design description into a gate-level netlist. Accounting for user-defined design constraints on area, power, and speed, the tool performs various optimizations before creating the netlist that’s passed on to place-and-route tools. (Diagram labels: design; implement; HDL files; constraints; language input (VHDL/Verilog); RTL VHDL/IP; RTL Verilog/IP; initial optimization; timing analysis; timing optimization; placement; routing; FPGA/PLD.)

It’s important to fully constrain designs before synthesis (Fig. 3). A constraint file is an input to the synthesis process just as the RTL code itself. Constraints can be applied globally or to specific portions of the design. The synthesis engine uses these constraints to optimize the netlist. However, it’s equally important to not over-constrain the design, which will generally result in less-than-optimal results from the next step in the implementation process: physical device placement and interconnect routing. Synthesis constraints soon become place-and-route constraints.

This traditional flow will work, but it can lead to numerous iterations before achieving timing closure. Some EDA vendors have incorporated more modern physical synthesis techniques, which automate device re-timing by moving lookup tables (LUTs) across registers to balance out timing slack. Physical synthesis also anticipates place and route to leverage delay information.
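The idea of moving LUTs across registers to balance timing slack can be sketched with a simplified pipeline model. This is not how any particular tool implements re-timing. The per-LUT delays, the register overhead, and both function names are assumptions, and the model ignores the functional-equivalence checks a real tool must perform.

```python
def min_clock_period(stages, register_overhead_ns=0.5):
    """Minimum clock period: the slowest stage's logic delay plus register overhead."""
    return max(sum(stage) for stage in stages) + register_overhead_ns


def retime_once(stages):
    """Try moving one LUT out of the slowest stage across a neighboring register.

    Each stage is a list of LUT delays (ns) between two ranks of registers.
    Returns the best arrangement found, or a copy of the input if no move helps.
    """
    best = [list(stage) for stage in stages]
    worst = max(range(len(stages)), key=lambda i: sum(stages[i]))
    if not stages[worst]:
        return best  # nothing to move

    candidates = []
    if worst + 1 < len(stages):
        forward = [list(stage) for stage in stages]
        forward[worst + 1].insert(0, forward[worst].pop())   # push last LUT forward
        candidates.append(forward)
    if worst > 0:
        backward = [list(stage) for stage in stages]
        backward[worst - 1].append(backward[worst].pop(0))   # push first LUT backward
        candidates.append(backward)

    for candidate in candidates:
        if min_clock_period(candidate) < min_clock_period(best):
            best = candidate
    return best


before = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0]]   # four LUT levels, then two
after = retime_once(before)
print(min_clock_period(before), "->", min_clock_period(after))
```

In this made-up case one LUT level migrates from the slow stage into the fast one. The positive slack in the second stage pays for a shorter clock period overall, with no change in latency measured in clock cycles.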

Following synthesis, device implementation begins. After netlist synthesis, the design is automatically converted into the format supported internally by the FPGA vendor’s place-and-route tools. Design-rule checking and optimization are performed on the incoming netlist, and the software partitions the design onto the available logic resources. Good partitioning is required to achieve high routing completion and high performance.

Increasingly, FPGA designers are turning to floorplanning after synthesis and design partitioning. FPGA floorplanners work from the netlist hierarchy as defined by the RTL coding. Floorplanning can help if area is tight. When possible, it’s a good idea to place critical logic in separate blocks.

After partitioning and floorplanning, the placement tool tries to place the logic blocks to achieve efficient routing. The tool monitors routing length and track congestion while placing the blocks. It may also track the absolute path delays to meet the user’s timing constraints. Overall, the process mimics PCB place and route.
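One common way to estimate the "routing length" a placer monitors is half-perimeter wirelength (HPWL), used here as a stand-in metric. The article does not say which cost function vendor tools use, so treat this as an illustrative sketch with hypothetical block names and grid coordinates.

```python
def half_perimeter_wirelength(placement, nets):
    """Estimate total routing length as the sum of each net's bounding-box half-perimeter.

    placement: dict mapping block name -> (x, y) grid location.
    nets: iterable of tuples naming the blocks each net connects.
    """
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


nets = [("A", "B"), ("B", "C", "D"), ("A", "D")]

# Two candidate placements of the same four logic blocks.
spread_out = {"A": (0, 0), "B": (5, 5), "C": (0, 5), "D": (5, 0)}
clustered = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}

print(half_perimeter_wirelength(spread_out, nets))
print(half_perimeter_wirelength(clustered, nets))
```

A placer compares candidate placements with a cost like this, usually combined with congestion and timing terms, and keeps the moves that lower the total.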

Functional simulation is performed after synthesis and before physical implementation. This step ensures correct logic functionality. After implementation, there’s a final verification step with full timing information. After placement and routing, the logic and routing delays are back-annotated to the gate-level netlist for this final simulation. At this point, simulation is a much longer process, because timing is also a factor (Fig. 4). Often, designers substitute static timing analysis for timing simulation. Static timing analysis calculates the timing of combinational paths between registers and compares it against the designer’s timing constraints.
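The static timing analysis step described above amounts to a longest-path calculation over the combinational logic between registers, followed by a comparison with the clock constraint. The sketch below assumes an acyclic netlist and ignores setup time and clock skew; the node names and delays are hypothetical.

```python
def worst_slack(edges, sources, sinks, clock_period_ns):
    """Return (slack_ns, arrival_ns) for the slowest register-to-register path.

    edges: dict mapping node -> list of (next_node, delay_ns); must be acyclic.
    sources: set of nodes driven by register outputs (arrival time 0).
    sinks: set of nodes that feed register inputs.
    """
    # Reverse the graph so each node knows its predecessors.
    preds = {}
    for node, fanout in edges.items():
        for next_node, delay_ns in fanout:
            preds.setdefault(next_node, []).append((node, delay_ns))

    arrival_cache = {}

    def arrival(node):
        """Latest time at which the signal on this node can settle."""
        if node in arrival_cache:
            return arrival_cache[node]
        if node in sources or node not in preds:
            result = 0.0
        else:
            result = max(arrival(p) + d for p, d in preds[node])
        arrival_cache[node] = result
        return result

    worst_arrival = max(arrival(sink) for sink in sinks)
    return clock_period_ns - worst_arrival, worst_arrival


# Hypothetical netlist: two source registers, two LUTs, one destination register.
edges = {
    "regA": [("lut1", 1.2)],
    "regB": [("lut1", 0.8), ("lut2", 1.0)],
    "lut1": [("lut2", 1.5)],
    "lut2": [("regC", 0.9)],
}
slack, arrival_time = worst_slack(edges, {"regA", "regB"}, {"regC"}, clock_period_ns=4.0)
print(f"worst arrival {arrival_time:.2f} ns, slack {slack:.2f} ns")
```

Unlike timing simulation, no stimulus is needed. Every path is covered in a single pass, which is a large part of why designers often prefer this check.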

Once the design is successfully verified and found to meet timing, the final step is to actually program the FPGA itself. At the completion of placement and routing, a binary programming file is created. It’s used to configure the device. No matter what the device’s underlying technology, the FPGA interconnect fabric has cells that configure it to connect to the inputs and outputs of the logic blocks. In turn, the cells connect those logic blocks to each other. Most programmable-logic technologies, including the PROMs for SRAM-based FPGAs, require some sort of a device programmer. Devices can also be programmed through their configuration ports using a set of dedicated pins.

Modern FPGAs also incorporate a JTAG port that, happily, can be used for more than boundary-scan testing. The JTAG port can be connected to the device’s internal SRAM configuration-cell shift register, which in turn can be instructed to connect to the chip’s JTAG scan chain.
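The notion of configuration cells arranged as a shift register can be illustrated with a toy model. It deliberately leaves out the JTAG TAP controller and any real device's bit ordering. The class, its methods, and the four-cell length are hypothetical.

```python
class ConfigShiftRegister:
    """Toy serial configuration chain: bits enter at one end and move one cell per clock."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift_in(self, bit):
        """Clock one bit into the chain; return the bit that falls off the far end."""
        out = self.cells[-1]
        self.cells = [bit & 1] + self.cells[:-1]
        return out


def load_bitstream(chain, bits):
    """Shift a whole sequence of configuration bits into the chain."""
    for bit in bits:
        chain.shift_in(bit)
    return chain.cells


chain = ConfigShiftRegister(4)
loaded = load_bitstream(chain, [1, 0, 1, 1])
print(loaded)

# Shifting further pushes the stored bits out the far end, first-in first-out,
# which is how a serial chain can also be read back.
readback = [chain.shift_in(0) for _ in range(4)]
print(readback)
```

Because loading and readback both use the same serial path, a single pair of data pins is enough to configure and inspect every cell in the chain.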

If you’ve gotten this far with your design, chances are you have a finished FPGA. There’s one more step to the process, however, which is to attach the device to a printed-circuit board in a system. The appearance of 10-Gbit/s serial transmitters, or I/Os, on the chip, coupled with packages containing as many as 1500 pins, makes the interface between the FPGA and its intended system board a very sticky issue. All too often, an FPGA is soldered to a pc board and it doesn’t function as expected or, worse, it doesn’t function at all. That can be the result of errors caused by manual placement of all those pins, not to mention the board-level timing issues created by a complex FPGA.

More than ever, designers must strongly consider an integrated flow that takes them from conception of the FPGA through board design. Such flows maintain complete connectivity between the system-level design and the FPGA; they also do so between design iterations. Not only do today’s integrated FPGA-to-board flows create the schematic connectivity needed for verification and layout of the board, but they also document which signal connections are made to which device pins and how these map to the original board-level bus structures.

Integrated flows for FPGAs make sense in general, considering that FPGA vendors will continue to introduce more complex, powerful, and economical devices over time. An integrated third-party flow makes it easier to re-target a design to different technologies from different vendors as conditions warrant.

Fig. 4. Simulation Stages. FPGA simulation occurs at various stages of the design process: after RTL design, after synthesis, and once again after implementation. The latter is a final gate-level check, accounting for actual logic and interconnect delays, of logic functionality. (Diagram labels: FPGA RTL design; testbench; HDL simulator; synthesis; FPGA gate library; place and route.)
