Basics of Fpga Design

BASICSofFPGAs Design

A Supplement to Electronic Design/December 4, 2003 Sponsored by Mentor Graphics Corp.

Field-programmable gate arrays(FPGAs) arrived in 1984 as analternative to programmablelogic devices (PLDs) and ASICs.As their name implies, FPGAs

offer the significant benefit of being readilyprogrammable. Unlike their forebearers inthe PLD category, FPGAs can (in most cas-es) be programmed again and again, givingdesigners multiple opportunities to tweak their circuits.

There’s no large non-recurring engineering (NRE) costassociated with FPGAs. In addition, lengthy, nerve-wracking waits for mask-making operations aresquashed. Often, with FPGA development, logic designbegins to resemble software design due to the many itera-tions of a given design. Innovative design often happenswith FPGAs as an implementation platform.

But there are some downsides to FPGAs as well. Theeconomics of FPGAs force designers to balance their rel-atively high piece-part pricing compared to ASICs withthe absence of high NREs and long development cycles.They’re also available only in fixed sizes, which matterswhen you’re determined to avoid unused silicon area.

What are FPGAs?FPGAs fill a gap between discrete logic and the smallerPLDs on the low end of the complexity scale and costlycustom ASICs on the high end. They consist of an arrayof logic blocks that are configured using software. Pro-grammable I/O blocks surround these logic blocks. Bothare connected by programmable interconnects (Fig. 1).The programming technology in an FPGA determines thetype of basic logic cell and the interconnect scheme. Inturn, the logic cells and interconnection scheme deter-mine the design of the input and output circuits as well as

the programming scheme. Just a few years ago, the largest FPGA was measured

in tens of thousands of system gates and operated at 40MHz. Older FPGAs often cost more than $150 for themost advanced parts at the time. Today, however, FPGAsoffer millions of gates of logic capacity, operate at 300MHz, can cost less than $10, and offer integrated func-tions like processors and memory (Table 1).

FPGAs offer all of the features needed to imple-ment most complex designs. Clock managementis facilitated by on-chip PLL (phase-locked loop)or DLL (delay-locked loop) circuitry. Dedicated

memory blocks can be configured as basic single-portRAMs, ROMs, FIFOs, or CAMs. Data processing, asembodied in the devices’ logic fabric, varies widely. Theability to link the FPGA with backplanes, high-speed bus-es, and memories is afforded by support for various single-ended and differential I/O standards. Also found ontoday’s FPGAs are system-building resources such as high-speed serial I/Os, arithmetic modules, embedded proces-sors, and large amounts of memory.

Initially seen as a vehicle for rapid prototyping andemulation systems, FPGAs have spread into a host ofapplications. They were once too simple, and too costly,for anything but small-volume production. Now, with theadvent of much larger devices and declining per-part costs,

Tradeoffs Abound in FPGA Design

Understanding devicetypes and design flows

is key to getting the most out of FPGAs

David Maliniak, Electronic Design Automation Editor

Sponsored by Mentor Graphics Inc.

FPGAs are finding their way off the prototyping bench andinto production (Table 2).

Comparing FPGA ArchitecturesFPGAs must be programmed by users to connect the chip’sresources in the appropriate manner to implement thedesired functionality. Over the years, various technologieshave emerged to suit different requirements. Some FPGAscan only be pro-grammed once. Thesedevices employ anti-fuse technology.Flash-based devicescan be programmedand reprogrammedagain after debug-ging. Still others canbe dynamically pro-grammed thanks toSRAM-based technol-ogy. Each has itsadvantages and disad-vantages (Table 3).

Most mod-ernFPGAsare

based on SRAM con-figuration cells, whichoffer the benefit ofunlimited reprogram-mability. When pow-ered up, they can beconfigured to performa given task, such as a

board or system test, and then reprogrammed to performtheir main task. On the flip side, though, SRAM-basedFPGAs must be reconfigured each time their host system ispowered up, and additional external circuitry is required todo so. Further, because the configuration file used to pro-gram the FPGA is stored in external memory, security issuesconcerning intellectual property emerge.

Antifuse-based FPGAs aren’t in-system programmable,

1.Do concentrate on I/O timing, notjust the register-to-register internal frequency thatthe FPGA place-and-route tools report. Frequently,the hardest challenge in a complete FPGA design is

the I/O timing. Focus on how your signals enter andleave your FPGA, because that’s where the bottlenecks frequently occur.2. Do create hierarchy around vendor-specific structures andinstantiations. Give yourself the freedom to migrate from one technology toanother by ensuring that each instantiation of a vendor-specific element isin a separate hierarchical block. This applies especially to RAMs and clock-management blocks.3. Do use IP timing models during synthesis to give the true pic-ture of your design. By importing EDIF netlists of pre-synthesized blocks,your synthesis tool can fully understand your timing requirements. Be cau-tious when using vendor cores that you can bring into your synthesis tool ifthey have no timing model.4. Do design your hierarchical blocks with registered outputswhere possible to avoid having critical paths pass through many levels of hier-archy. FPGAs exhibit step-functions in logic-limited performance. When hier-archy is preserved and the critical path passes across a hierarchical boundary,you may introduce an extra level of logic. When considered along with theassociated routing, this can add significant delay to your critical path.5. Do enable retiming in your synthesis tool. FPGAs tend to be reg-ister-rich architectures. When you correctly constrain your design in synthe-sis, you allow the tool to optimize your design to take advantage of positiveslack timing within the design. Sometimes this can be done after initial placeand route to improve retiming over wireload estimation.

1. Don’t synthesize unlessyou’ve fully and correctly constrainedyour design. This includes correctclock domains, I/O timing require-

ments, multicycle paths, and falsepaths. If your synthesis tool doesn’t see exactly what you want, it can’t makedecisions to optimize your design accordingly.2. Don’t try to fix every timing problem in place and route. Place androute offers little room for fixing timing where a properly constrained synthesis tool would.3. Don’t vainly floor plan at the RTL or block level hoping toimprove place-and-route results. Manual area placement can cause moreproblems than it might initially appear to solve. Unless you are an expert inmanual placement and floorplanning, this is best left alone.4. Don’t string clock buffers together, create multiple clocktrees from the same clock, or use multiple clocks when a simple enable willdo. Clocking schemes in FPGAs can become very complicated now that thereare PLLs, DLLs, and large numbers of clock-distribution networks. Poorclocking schemes can lead to extended place-and-route times, failure tomeet timing, and even failure to place in some technologies. Simplerschemes are vastly more desirable. Avoid those gated clocks, too!5. Don’t forget to simulate your design blocks as well as yourentire design. Discovering and back-tracking an error from the chip’s pinsduring on-board testing can be extremely difficult. On-board FPGA testingcan miss important design flaws that are much easier to identify during sim-ulation; they can be rectified by modifying the FPGA’s programming.

Do’s And Don’ts For The FPGA Designer

Table 1: KEY RESOURCES AVAILABLE IN THE LARGEST DEVICES FROM MAJOR FPGA VENDORS

Features

Clock management

Embedded memory blocks

Data processing

Programmable I/Os

Special features

Xilinx Virtex II Pro

DCMUp to 12

BlockRAMUp to 10 Mbits

Configurable logicblocks and 18-bit by18-bit multipliers

Up to 125,000 logiccells and 556 multipli-er blocks

SelectI/O

Embedded PowerPC405 cores

RocketI/O multi-giga-bit transceiver

Altera Stratix

PLLUp to 12

TriMatrix memoryUp to 10 Mbits

Logic elements andembedded multipli-ers

Up to 79,000 LEs and176 embedded mul-tipliers

Advanced I/O support

DSP blocks

High-speed differ-ential I/O and inter-face standards sup-port

Actel Axcelerator

PLLUp to 8

Embedded RAMUp to 338 kbits

Logic modules (C-Cell and R-Cell)Up to 10,000

R-Cells and 21,000C-Cells

Advanced I/O sup-port

PerPin FIFOs forbus applications

Lattice ispXPGA

SysCLOCK PLLUp to 8

SysMEM blocksUp to 414 kbits

Based on programmablefunctional unit

Up to 3844 PFUs

SysI/O

SysHSI for high-speed serialinterface

but rather are pro-grammed offlineusing a device pro-grammer. Once the

chip is configured, it can’t be altered. However, in antifuse technology, device configuration is non-

volatile with no need for external memory. On top of that, it’svirtually impossible to reverse-engineer their programming.They often work as replacements for ASICs in small volumes.

In a sense, flash-based FPGAs fulfill the promise of FPGAs inthat they can be reprogrammed many times. They’re non-volatile, retaining their configuration even when powereddown. Programming is done either in-system or with a pro-

grammer. In some cases, IP security can be achieved using a multi-bit key that locks the configuration data after programming.

But flash-based FPGAs require extra process steps above andbeyond standard CMOS technology, leaving them at least a gen-eration behind. Moreover, the many pull-up resistors result inhigh static power consumption.

FPGAs can also be characterized as having either fine-, medi-um-, or coarse-grained architectures. Fine-grained architecturesboast a large number of relatively simple logic blocks. Each logicblock usually contains either a two-input logic function or a 4-to-

1 multiplexer and a flip-flop. Blocks can only be used to imple-ment simple functions. But fine-grained architectures lend them-selves to execution of functions that benefit from parallelism.

Coarse-grained architectures consist of relatively large log-ic blocks often containing two or more lookup tables andtwo or more flip-flops. In most of these architectures, afour-input lookup table (think of it as a 16 x 1 ROM)

implements the actual logic.

The FPGA design flowAfter weighing all implementation options, you must considerthe design flow. The process of implementing a design on anFPGA can be broken down into several stages, loosely definableas design entry or capture, synthesis, and place and route (Fig.2). Along the way, the design is simulated at various levels ofabstraction as in ASIC design. The availability of sophisticatedand coherent tool suites for FPGA design makes them all themore attractive.

At one time, design entry was performed in the form ofschematic capture. Most designers have moved over to hardwaredescription languages (HDLs) for design entry. Some will prefera mixture of the two techniques. Schematic-based design-cap-ture tools gave designers a great deal of control over the physicalplacement and partitioning of logic on the device. But it’sbecoming less likely that designers will take that route. Mean-while, language-based design entry is faster, but often at theexpense of performance or density.

For many designers, the choice of whether to use schemat-ic- or HDL-based design entry comes down to their con-ception of their design. For those who think in softwareor algorithmic-like terms, HDLs are the better choice.

HDLs are well suited for highly complex designs, especially whenthe designer has a good handle on how the logic must bestructured. They can also be very useful for designing small-er functions when you haven’t the time or inclination towork through the actual hardware implementation.

On the other hand, HDLs represent a level of abstractionthat can isolate designers from the details of the hardwareimplementation.Schematic-based entrygives designers muchmore visibility into thehardware. It’s a bettermethod for those whoare hardware-oriented.The downside ofschematic-based entry isthat it makes the designmore difficult to modifyor port to anotherFPGA.

Athird optionfor designentry, state-machine

entry, works well fordesigners who can seetheir logic design as aseries of states that thesystem steps through. Itshines when designingsomewhat simple func-tions, often in the area of

system control, that can beclearly represented in visualformats. Tool support for finitestate-machine entry is limited,though.

Some designers approachthe start of their design from alevel of abstraction higher

thangram

Modify d

Done!

No

Yes

Achievedtiming?

AchievedAchievedAchievedtiming?timing?timing?

Vendor place and routeVendor place and routeVendor place and routeVendor place and route

A “big picture” look at an FPGA design flow shows thdesign entry, synthesis from RTL to gate level, and proute is done using the FPGA vendors’ proprietary todevices’ architectures and logic-block structures.

Figure 2Input/output blocks

Logic blocks

Just about all FPGAs include a regular, programmable, and flexible architecture of logic blockssurrounded by input/output blocks on the perimeter. These functional blocks are linked togetherby a hierarchy of highly versatile programmable interconnects.

Figure 1

Sponsored by Mentor Graphics Inc.

BASICSofDesignFPGAs

considern on any definableoute (Fig.evels ofhisticatedall the

m ofo hardwarewill preferign-cap-he physicalit’s

e. Mean-at the

se schemat-their con-

n softwarechoice.

cially whenmust bening small-tion ton.abstractionhardware

than HDLs, which is algorithmic design using the C/C++ pro-gramming languages. A number of EDA vendors have tool flows

supporting this design style. Gen-erally, algorithmic design has beenthought of as a tool for architec-tural exploration. But increasingly,as tool flows emerge for C-levelsynthesis, it’s being accepted as afirst step on the road to hardwareimplementation.

After design entry, thedesign is simulated atthe register-transfer lev-el (RTL). This is the

first of several simulation stages,because the design must be simulat-ed at successive levels of abstrac-tion as it moves down the chaintoward physical implementation onthe FPGA itself. RTL simulationoffers the highest performance interms of speed. As a result, design-ers can perform many simulationruns in an effort to refine the logic.At this stage, FPGA developmentisn’t unlike software development.Signals and variables are observed,procedures and functions traced,and breakpoints set. The goodnews is that it’s a very fast simula-tion. But because the design hasn’tyet been synthesized to gate level,properties such as timing andresource usage are still unknowns.

The next step following RTL

simulation is to convert theRTL representation of thedesign into a bit-stream filethat can be loaded onto theFPGA. The interim step isFPGA synthesis, which trans-lates the VHDL or Verilog codeinto a device netlist format thatcan be understood by a bit-stream converter.

The synthesis process can bebroken down into three steps.First, the HDL code is convert-ed into device netlist format.Then the resulting file is con-verted into a hexadecimal bit-stream file, or .bit file. Thisstep is necessary to change thelist of required devices andinterconnects into hexadecimalbits to download to the FPGA.Lastly, the .bit file is down-loaded to the physical FPGA.This final step completes theFPGA synthesis procedure byprogramming the design ontothe physical FPGA.

It’s important to fully constrain designs before synthesis (Fig. 3).A constraint file is an input to the synthesis process just as theRTL code itself. Constraints can be applied globally or to spe-cific portions of the design. The synthesis engine uses these

constraints to optimize the netlist. However, it’s equally importantto not over-constrain the design, which will generally result in less-than-optimal results from the next step in the implementationprocess—physical device placement—and interconnect routing.

Synthesis coThis trad

iterations bhave incorpwhich automacross regisalso anticip

Follnetintdo

optimizatioware partitiGood partitand high pe

Increasinafter synthework from Floorplanngood idea t

After parto place themonitors roblocks. It mdelays to mOverall, theroute.

Functionsynthesis anThis step enAfter impletion step wplacement adelays are bnetlist for t

Table 2: FPGA USAGE

Time-to-market

Performance

Volume

Emulation: 3%Fairly high; fastcompile timesNot stringentVery low perapplication

Prototyping: 30%Fairly high; fastcompile timesNot stringent

Low per application

Preproduction: 30%Fairly high; fast com-pile timesVery critical

Moderately high perapplication

Production: 37%Fairly high; fastcompile timesVery critical

High per applica-tion

Table 3: ADVANTAGES/DISADVANTAGES OF VARIOUS FPGA TECHNOLOGIES

FeatureReprogrammable?Reprogramming speed(including erasure)Volatile?External configuration file?Good for prototyping?Instant-on?IP securitySize of configuration cellPower consumptionRadiation hardness?

SRAMYes (in-system)Fast

YesYesYesNoPoorLarge (six transistors)HighNo

AntifuseNoNotapplicableNoNoNoYesVery goodVery smallLowYes

FlashYes (in-system or offline)3X SRAM

No (but can be if required)NoYesYesVery goodSmall (two transistors)MediumNo

Modify design

e!

No

Yes

No-guessflow

evedng?

devedeved?ng?ng?

e and routed te and routee and route

PGA design flow shows the major steps in the process:m RTL to gate level, and physical design. Place and

GA vendors’ proprietary tools that account for thelogic-block structures.

HDL files

Constraints

Placement

Routing

FPGA/PLD

Language input (VHDL/Verilog)

Initial optimization

Timing analysis

Timing optimization

VHDL/IP RTL

RTLVerilog/IP

The implementation flow for FPGAs begins with synthesis of the HDL design description into a gate-level nAccounting for user-defined design constraints on area, power, and speed, the tool performs various optimtions before creating the netlist that’s passed on to place-and-route tools.

Figure 3

s-codethat

n beps.ert-t.n-it-

he

malGA.

A.ebyto

. 3).hepe-

ntss-

Synthesis constraints soon become place-and-route constraints. This traditional flow will work, but it can lead to numerous

iterations before achieving timing closure. Some EDA vendorshave incorporated more modern physical synthesis techniques,which automate device re-timing by moving lookup tables (LUTs)across registers to balance out timing slack. Physical synthesisalso anticipates place and route to leverage delay information.

Following synthesis, device implementation begins. Afternetlist synthesis, the design is automatically convertedinto the format supported internally by the FPGA ven-dor’s place-and-route tools. Design-rule checking and

optimization is performed on the incoming netlist and the soft-ware partitions the design onto the available logic resources.Good partitioning is required to achieve high routing completionand high performance.

Increasingly, FPGA designers are turning to floorplanningafter synthesis and design partitioning. FPGA floorplannerswork from the netlist hierarchy as defined by the RTL coding.Floorplanning can help if area is tight. When possible, it’s agood idea to place critical logic in separate blocks.

After partitioning and floorplanning, the placement tool triesto place the logic blocks to achieve efficient routing. The toolmonitors routing length and track congestion while placing theblocks. It may also track the absolute pathdelays to meet the user’s timing constraints.Overall, the process mimics PCB place androute.

Functional simulation is performed aftersynthesis and before physical implementation.This step ensures correct logic functionality.After implementation, there’s a final verifica-tion step with full timing information. Afterplacement and routing, the logic and routingdelays are back-annotated to the gate-levelnetlist for this final simulation. At this point,

simulation is a muchlonger process, becausetiming is also a factor (Fig.4). Often, designers substi-tute static timing analysisfor timing simulation. Stat-ic timing analysis calcu-lates the timing of combi-national paths betweenregisters and compares itagainst the designer’s tim-ing constraints.

Once the design issuccessfully veri-fied and foundto meet timing, the final step is to

actually program the FPGA itself. At thecompletion of placement and routing, a bina-ry programming file is created. It’s used toconfigure the device. No matter what thedevice’s underlying technology, the FPGAinterconnect fabric has cells that configure itto connect to the inputs and outputs of thelogic blocks. In turn, the cells configure thoselogic blocks to each other. Most programma-ble-logic technologies, including the PROMsfor SRAM-based FPGAs, require some sort

of a device programmer. Devices can also be programmedthrough their configuration ports using a set of dedicated pins.

Modern FPGAs also incorporate a JTAG port that,happily, can be used for more than boundary-scantesting. The JTAG port can be connected to thedevice’s internal SRAM configuration-cell shift reg-

ister, which in turn can be instructed to connect to the chip’sJTAG scan chain.

If you’ve gotten this far with your design, chances are youhave a finished FPGA. There’s one more step to the process,however, which is to attach the device to a printed-circuit boardin a system. The appearance of 10-Gbit/s serial transmitters, orI/Os, on the chip, coupled with packages containing as many as1500 pins, makes the interface between the FPGA and its intend-ed system board a very sticky issue. All too often, an FPGA issoldered to a pc board and it doesn’t function as expected or,worse, it doesn’t function at all. That can be the result of errorscaused by manual placement of all those pins, not to mentionthe board-level timing issues created by a complex FPGA.

More than ever, designers must strongly consider an integrat-ed flow that takes them from conception of the FPGA throughboard design. Such flows maintain complete connectivitybetween the system-level design and the FPGA; they also do so

between design iterations. Not only do today’s integrated FPGA-to-board flows create the schematic connectivity needed for veri-fication and layout of the board, but they also document whichsignal connections are made to which device pins and how thesemap to the original board-level bus structures.

Integrated flows for FPGAs make sense in general, consider-ing that FPGA vendors will continue to introduce more com-plex, powerful, and economical devices over time. An inte-grated third-party flow makes it easier to re-target a design

to different technologies from different vendors as conditionswarrant.

Desig

n

RTL

RTL

iption into a gate-level netlist.ol performs various optimiza-

FPGARTL design

Testbench

HDL simulator

Synthesis

FPGA gatelibrary

Place and route

FPGA simulation occurs at various stages of the design process: after RTL design, after synthesis, and once againafter implementation. The latter is a final gate-level check, accounting for actual logic and interconnect delays,of logic functionality.

Figure 4

BASICSofDesignFPGAs

Basics of Fpga Design

Documents

Transcript of Basics of Fpga Design