Post on 12-May-2018

Best Practices For Efficient And Effective FPGA Design

RC Cofer June 9th 3:15-5:00

Class Overview
• FPGAs are a very popular embedded architecture, but they can also be intimidating. Fear not, as we’ve pulled together the best practices that will take you from initial concept and design tradeoffs through design implementation, debug and long-term support. This class will address the key topics of common design challenges, risks, pain points, mistakes, and oversights, their potential design impact and how to manage them. We will help designers effectively address design longevity, maintenance, obsolescence, and potential future design migration. Tool options and tool version migration will be addressed, including the things you need to consider when migrating families or tools. The class will use Xilinx’s Vivado tool suite for examples.

FPGA Design Decisions

• Partitioning: SW / HW
• Integration: ASSP / Programmable
• Technology: Non-Volatile / Volatile
• Tools: 3rd Party / Manufacturer
• Capture: Schematic / HDL / Model / Mixed
• Language: Verilog / VHDL
• Ownership: Purchased / Self-developed / Mixed
• Licensing IP: Source / Non-Source
• Processor uP: External / Embedded
• Processor uP: Hard / Soft / Firm

FPGA / CPLD Application Spaces
[Figure: application-space chart. Axes: Sequential Resources, Combinatorial Resources. Labels: Data Path-Oriented, Control-Oriented, Gray Area, FPGA, CPLD.]

Family & Sub-Family Optimization

[Figure labels: DSP, Processor]

Selecting the Correct Device


Manufacturer Selection
• Understand manufacturer’s target market
• Family configuration technology
• Family & Sub-Family resources
  – Logic array
  – Hardened block resources
• Architecture & Tools familiarity
• Available IP & cost
• Manufacturer support & staff

Device Selection
• Put extra effort into resource and power estimation
• Cross-reference design documentation
  – Data Sheet
  – User Guide
  – Application Notes
• Understand device family life cycle status
• Select a device with a migration path
  – Size & Package
• Look beyond your current project

Part Choices
• Voltage Options
• Low Power Parts: 1.0 V, 0.95 V, 0.9 V
• Temperature Options
• Speed Options

Package Selection
• Know your organization’s manufacturing capability
  – Primary package option typically BGA
    • QFP options limited
  – Smaller packages have tighter pin pitches
  – Do height restrictions apply?
• Note that Flip-chip packages are vented
  – No-wash manufacturing process preferred
• Power consumption may require thermal solution
  – Analysis, experimentation required

Detailed Design Decisions
• Configuration mode and speed
• Multi-configuration option
• In-field configuration update
• Assign pins before design complete
• Lay out board before design complete
• Purchase IP / Develop IP

Diverse Design Specialties


FPGA Technical Areas

[Figure: technical areas shown: Technology Selection; Manufacturer Selection; Resources (LUTs, FFs, DSP); Family / Sub-Family; Clocking; Configuration; Simulation; Tools; Memory; Power; Intellectual Property; Transceivers; Data Flow; Constraints; I/O Assignment; Synthesis; IP Selection; SoC HW / SW; Estimation (Resources, Power, Schedule, Budget); Debug; Signal Integrity; Security; 3D; FinFET; Analog; Timing Closure.]

FPGA design involves many informed tradeoffs and design decisions


Exponential Complexity
• Complexity is increased by quantity of available resource options
  – The number of elements leads to complex interactions between the elements implementing our custom logic circuits
• Complexity is increased by the number of degrees of freedom
  – More dimensions of control; more ways to get “sideways”
  – More control, more options, more decisions, more interaction, more consequences

Trade Study Benefits
• Trade studies clarify and document critical design decisions
• Factors should reflect key design drivers and resources
  – Manufacturer, Family, Life Cycle, Package, IO, FFs, Logic Cells, DSP Blocks, Memory
• Columns may be weighted to support design priorities
• Other fields: Tool Cost, Familiarity, Prior Experience, Support Access
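A weighted trade study can be sketched in a few lines of Python. This is only an illustration of the "weighted columns" idea: the device names, factor names, weights and 1-to-5 scores below are all placeholders, not data from the class.

```python
# Hypothetical trade study: every name, weight and score here is a placeholder.
WEIGHTS = {"life_cycle": 3, "io_count": 2, "dsp_blocks": 2, "tool_familiarity": 1}

CANDIDATES = {
    "Device A": {"life_cycle": 4, "io_count": 3, "dsp_blocks": 5, "tool_familiarity": 2},
    "Device B": {"life_cycle": 3, "io_count": 5, "dsp_blocks": 3, "tool_familiarity": 4},
}


def weighted_score(scores, weights):
    """Sum of score * weight over every factor (column) in the study."""
    return sum(scores[factor] * weight for factor, weight in weights.items())


def rank(candidates, weights):
    """Return (name, total) pairs sorted from highest to lowest total."""
    totals = {name: weighted_score(s, weights) for name, s in candidates.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank(CANDIDATES, WEIGHTS):
        print(f"{name}: {total}")
```

Keeping the weights in one table makes the design priorities explicit and easy to review when they change.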

FPGA Resources, Ranges and Trends


Digital Logic Range of Options
[Figure: Design Capacity versus Development Time for Standard Logic, SPLD, CPLD, FPGA, Gate Array, Structured ASIC, Standard Cell and Full Custom, with a Programmable Logic grouping.]

FPGA Resources
• I/O Blocks
• CLB
  – LUTs (4- or 6-input)
  – Registers
• Memory
  – Block RAM
  – Distributed Memory
  – (External)
• DSP blocks
• Gigabit Transceivers (GT)

Designing in Migration Options

Common FPGA Resources
• I/O Block
• Logic Block
• Routing
• Clock Routing
• Memory
• DSP Block
• Processor (Soft)
• [Processor (Hard)]
• [High Speed Transceiver]
• [Ethernet Block]
• Clock Block

Logic Block
• Multiple names:
  – CLB / Slice, Macrocell
  – Macrocell, LAB
• Slice features
  – Memory elements called LUTs
  – Data steering elements
    • Muxes
  – Carry Logic
  – Sequential Elements
    • Flops
• Implementation varies between manufacturer & family

[Figure: logic block diagram. Labels: Data, Clock, LUT, FF (D, Q, Clk), carry (+).]

Look Up Table (LUT)
[Figure: combinatorial logic with inputs A, B, C, D and output Z]

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1

• Also called Function Generators (FGs)
• Implement Boolean / combinatorial logic or memory
• Capacity is limited by the number of inputs, not by complexity
• Delay through the LUT is constant
• Implementation varies with device
  – 4- and 6-input architectures common
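The point that LUT capacity depends on input count rather than logic complexity can be shown with a small software model. This is a sketch only: the two Boolean functions are made-up examples (not the function in the truth table above), and a Python list stands in for the LUT's configuration bits.

```python
from itertools import product


def build_lut(func, n_inputs=4):
    """Return the truth table of func as a list of 2**n_inputs output bits."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(table, *bits):
    """Use the input bits as an address (first bit is the MSB) and read the stored output."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return table[address]


def simple(a, b, c, d):
    # Hypothetical "simple" function: a single AND of two inputs.
    return a & b


def tangled(a, b, c, d):
    # Hypothetical "complex" function: 4-input parity ORed with a 4-input AND.
    return (a ^ b ^ c ^ d) | (a & b & c & d)


SIMPLE_LUT = build_lut(simple)
TANGLED_LUT = build_lut(tangled)
```

Both tables hold exactly 16 bits and are read with the same single lookup, which is why the delay through a LUT does not depend on the function it stores.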

I/O Block
• IOB contained within banks
  – I/O bank architectural variations among devices
  – Banking restrictions may apply based on IO standard used
• External signals interface features
  – ESD protection
  – Matched impedance circuitry
  – Device re-clocking option before entering logic array
  – Output slew rate and drive strength control
  – DDR I/O registers
• I/O options
  – Selection made through tools, constraint file or attributes in HDL
  – Single ended and Differential signaling standards supported

[Figure: I/O block diagram. Labels: Input, Output, 3-State, Reg, DDR mux, PAD, I/O Banks.]

I/O Bank Example

From: Altera Cyclone III Data Sheet


Routing Resources
• Global & regional routing
  – Ex: Clock, Control signal (Global reset)
• Local routing supporting short, medium & long hops
• Specialized or dedicated routing
  – Carry logic for supporting wider functions
  – Column to column or row to row routes to support special function blocks (BRAM or DSP)

Clock Resources
• Dedicated clock pins with global and regional clock lines for distributing clocks
• Clock Buffers
• Clock manipulation units offering a range of operation
  – PLL and/or DLL
    • Jitter analysis necessary
    • Operational frequency limits may apply
  – De-skew, multiply, divide, and shift

Clock Distribution
• Clock distribution architectures can have significant variations
  – Limited global routing resources
  – Different locations
    • Center, Column, Peripheral
  – Different input and distribution paths
  – May have regional limitations
    • Top, Bottom, Left, Right
  – May have different capabilities within a device
• Invest HEAVILY in understanding Clocking

Memory Resources
• LUT memory can support
  – Synchronous write
  – Asynchronous read
  – Initialization during configuration
  – Can be used to emulate dual-port RAM
• Block RAM memory can support
  – Synchronous read and write
  – May support dual-port RAM
  – Initial values
  – May support parity bits
[Figure: dual-port block RAM symbol. Port A: CLKA, ADDRA, DIA, DIPA, DOA, DOPA. Port B: CLKB, ADDRB, DIB, DIPB, DOB, DOPB.]

Transceiver Speeds
• Spartan 6: GTP = 3.125 Gb/s
• 7-Series (A, K, V, Z): GTP = 6.6 Gb/s, GTX = 12.5 Gb/s, GTH = 13.1 Gb/s, GTZ = 28.05 Gb/s
• UltraScale (KU, VU): GTH = 16.3 Gb/s, GTY = 30.5 Gb/s

Xilinx Transceiver Technology


From: xilinx.com/products/technology/high-speed-serial.html

HW vs. SW
• Processors (SW) are primarily serial
  – Possibly with multiple parallel processor cores
• Hardware is primarily parallel
  – Possibly with some serial operations
  – Possible to instantiate multiple blocks of specific functions

Design Entry
• Schematic
  – Does not scale well
• HDL
  – VHDL, Verilog, SystemVerilog, others
• Block Design
  – System Generator, IPI
• ESL
  – HLS (High Level Synthesis)
  – C, C++, SystemC

Hard Processor / SOC

From: xilinx.com/products/silicon-devices/SoC/zynq-7000.html

FPGA Providers
• About 80% of the market:
  – Xilinx
  – Altera
• Others:
  – Lattice
  – Microsemi (Actel)
  – Achronix
  – Tabula

Family and SubFamily Options


Family Selection

FROM: http://www.xilinx.com/products/silicon-devices/fpga/index.htm

Family Selection


From: xilinx.com/products/silicon-devices/fpga.html

Altera Device Families
• High-End FPGAs
  – Stratix 10
  – Stratix V (E, GX, GS, GT)
  – Stratix IV (E, GX, GT)
  – Stratix III (L and E)
  – Stratix II (and GX)
  – Stratix (and GX)
• Midrange FPGAs
  – Arria 10
  – Arria V (GX, GT, GZ, SX, ST)
  – Arria II (GX and GZ)
  – Arria GX
• Low-Cost FPGAs
  – Cyclone V (E, GX, GT, SE, SX, ST)
  – Cyclone IV (E and GX)
  – Cyclone III (and LS)
  – Cyclone II
  – Cyclone
• Altera SoCs
  – Stratix 10 SoC
  – Arria 10 SoC
  – Arria V SoC
  – Cyclone V SoC (SE, SX, ST)

Process Trends • 90 nm • 65/60 nm • 45/40 nm • 28 nm • 20 nm • 16/14 nm (FinFET) • 10 nm • 7 nm

Node (nm) | Xilinx family
130 | V2 Pro
90 | V4, S3, S3A
65 | V5
45 | S6
40 | V6
28 | 7-Series
20 | UltraScale
16 | UltraScale+

Process Node Migration


From: http://www.xilinx.com/products/silicon-devices/fpga.html

Advanced Technology Awareness


Virtex-7 2000T FPGA Enabled by SSI Technology

From: Xilinx White Paper - WP380 (v1.2) December 11, 2012; Xilinx Stacked Silicon Interconnect Technology

Traditional Planar Gate

From: http://issuu.com/xcelljournal/docs/xcell_journal_issue_84/4?e=2232228/4052461
From: WP246 (v1.2) February 1, 2007, Page 3; Power Consumption in 65 nm FPGAs

FinFET Gate


From: http://issuu.com/xcelljournal/docs/xcell_journal_issue_84/4?e=2232228/4052461

ESL


From: http://issuu.com/xcelljournal/docs/xcell_journal_issue_84/4?e=2232228/4052461

Tool Flow and Design Tools


Primary Design Phases
• Design Specification
• System Architecting
• Technology, Part, Tool Selection
• Design Capture
• Design Verification (Simulation)
• Implementation
  – Map, Place and Route
• Design Constraint and Optimize
• Validation & Certification
• Final Documentation & Design Archive

FPGA Design Tool Flow
• Design Capture
  – Model, text editor, schematic capture
  – Design constraints
• Design Implementation
  – Synthesizer
  – Map, Place and Route
• Debug, Test & Simulate
  – Simulator
  – Static Timing Analyzer
  – Implementation Editor
  – Internal Logic Analyzer
• Design Configuration SW

Traditional Design Flow

FPGA Tools Design Mistakes
• Insufficient budget for appropriate tools and training
• Improper tool selection
  – Not using the right design tools
  – Not using the appropriate tools
  – Not using advanced tools
• Designers not up to speed on selected tools
• Issues associated with not using selected tools effectively
  – Constraints: too many, too few, wrong options
  – Wrong tool switches and options selected
• Making do without productivity enhancing tools
• Changing tool versions during a design
• Tool interoperability including tool version

Design Process Guidelines
• Adequately define and partition functionality
  – Address performance, margin and interface concerns
• Establish critical estimates
  – Budget, size, schedule, resource, performance, power
• Take time to perform analysis of key trade-offs
• Understand the scope of the project
• Match required deliverables to critical milestones
• Monitor design completeness and device utilization
  – Analyze utilization metrics and flag potential issues
• Monitor work completeness to projected milestones
• Verify technical conformity of the work to specification
• Implement adequate configuration management to provide baseline control
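The "analyze utilization metrics and flag potential issues" guideline lends itself to a simple automated check. The sketch below assumes the used and available counts have already been pulled from a utilization report by some other means; the resource counts and the 80% limit are placeholder values, not vendor or class recommendations.

```python
def utilization_report(used, available, limit=0.80):
    """Return {resource: (fraction_used, flagged)} for each resource type.

    `limit` is a hypothetical project threshold chosen for this sketch.
    """
    report = {}
    for name, total in available.items():
        fraction = used.get(name, 0) / total
        report[name] = (fraction, fraction > limit)
    return report


# Placeholder numbers for an imaginary device and build.
AVAILABLE = {"LUT": 50000, "FF": 100000, "BRAM": 100, "DSP": 200}
USED = {"LUT": 42000, "FF": 31000, "BRAM": 55, "DSP": 20}

if __name__ == "__main__":
    for name, (fraction, flagged) in utilization_report(USED, AVAILABLE).items():
        marker = "  <-- review margin" if flagged else ""
        print(f"{name}: {fraction:.0%}{marker}")
```

Running a check like this after every build gives an early warning when a resource type starts to eat into the planned margin.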

Design Tool Selection
• Evaluate required tools interoperability
  – Consider FPGA design & board layout tools
• Manufacturer vs. 3rd Party Tools
• Consider specialized function design tools
• Understand debug and test support tool choices
  – Is mixed language simulation required?
  – Is there a tool combination for debugging the internals of the FPGA and the board circuits?
• Don’t ignore supplier recommendations
• Understand design tool features, costs, training, support

Tool Survey
Supplier | Design Implementation | Embedded Processor | Design Debug | Configuration SW
MicroSemi (Actel) | Libero IDE | SoftConsole | Silicon Explorer II | FlashPro
Altera | Quartus II | Nios II EDS, Altera SoC EDS | Signal Tap 2 | Quartus II Programmer
Xilinx (7 Series+) | Vivado | Xilinx SDK | Vivado Logic Analyzer | Vivado HW Manager
Xilinx (Pre-7 series devices) | ISE | Platform Studio & EDK | ChipScope Pro | iMPACT
Lattice | ispLEVER Pro | MSB & SPE | ispTRACY ILA | ispVM System

Best Practices


Tool and Architecture Knowledge
• Staff could be more effective with tool suite, could have more target architecture knowledge
• Leverage available resources:
• Training
  – For fee
  – Free
• Tutorials
• Videos
• Documentation
• User Guides
  – Tools, Part Families
• Methodology Guides
• Data Sheets
• App Notes, White Papers
• Dev board designs
• Forums

Where is ______?
• There is a lot written down about FPGA design
  – May be written down in only one location
  – Multiple locations may conflict
  – Information can change in different versions
    • Tools documentation
    • Families and sub-families
    • Updated documentation versions
• Very often what you are looking for is written down
  – The trick is finding it efficiently (what to call it)
  – And keeping it accessible and organized

The Web vs. Xilinx DocNav
• Addresses document sprawl and overload
• Find information quickly
• Manage and organize documents
• Helpful resource suggestions
• Ensure you are using the right document version
• Leverage iterative and wildcard search

Keeping Up to Date
• Issue: New information is released that designers are not aware of
• Approach: In addition to reviewing tools and family data sheets and user guides
• Review Customer Notices, Errata, and Answer Records
  – Sign up for Document, Design and IP Advisories (push model)
  – Set milestones in your project plan to re-review these key documents

Complex Tool Flow Knowledge
• Issue: Tools are very complex and continue to change and evolve
• Approach: Read available recommended tool flows and tool guidance
  – (It’s out there…)
• Learn about the existence of intermediate and advanced tool features
  – Scan, highlight, annotate key documents
• Ex: Analyze results after each design step rather than blindly trusting implementation
  – This can help reduce debug cycles and achieve faster design closure

Version Control Suggestions
• Validate and re-validate that it works
• Automate as much as possible
  – Worth extra effort to make easy and reliable to use
• Execute frequently
• Keep track of the changes made
  – What functionality works, what has known issues
  – May need to know which dev. system (different white-wire sets)
• Develop a process that includes new files as they are added
• Leverage the low-tech “Archive” function

Suggested IP Management
• Understand the manufacturer’s suggested (encouraged) IP management flow
• Key Choice:
  – Manage IP within a design project
  – Manage IP remote to the design project
• Understand IP output products
  – Their function and their use model
  – Out of Context (OOC) / (Bottom-Up)
• How to manage constraints associated with IP?

3rd Party IP
• 3rd Party IP interaction complexity
• Understand which exact SW versions are compatible
  – Avoid using unlisted SW versions
  – “Mismatched” tool versions are difficult to debug
• Understand specialized flow and limitation for manufacturer’s IP on 3rd party tools

Validate SW Integrity
• SW installation corruption
• Take the extra step of verifying the checksum of the downloaded SW version
• Corrupted SW installations are almost impossible to debug
  – Smoking gun – installer complaining about missing files
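Checksum verification of a downloaded installer can be scripted so it is never skipped. The sketch below uses Python's standard hashlib module; the choice of SHA-256 is an assumption, so use whichever algorithm the vendor publishes for the download.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large installers do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Compare the computed digest with the value published alongside the download."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A call such as `matches_published("installer.bin", published_value)` (both arguments are placeholders) returns False for a corrupted or truncated download, before any time is spent installing it.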

SW Installation
• Tool issues can be very difficult to debug
• It is strongly suggested to allow SW to install to its default location
• Don’t allow a customized installation unless absolutely required
  – You should have plenty of HD space; if you don’t, you have bigger issues…

Tool Suggested OS and Resources
• Sub-optimal tool performance
• Host tools only on supported OS
  – Pay attention to versions
  – If you decide to use an unsupported IP you are on your own
• Review documentation for recommendations
  – Recommended memory for target part
  – # of Cores
• May require tool switch adjustments

FPGA Debug Planning
• FPGA Debug can require advance planning
• Will you be adding debug cores to code or to the netlist?
• Will you ship the design with the cores in place?
  – Debug cores can influence circuit implementation
  – Need more resources for debug? Place a part with more resources on the common board footprint
• Can hardened analog blocks be used to monitor voltage, temperature and system parameters? (correct power provided?)

Migrating to New Tool Releases
• Review new tool release notes carefully
  – Tool flow differences can require project and script changes
  – It is often difficult to migrate back to previous tool versions
• Conservative designers start a project with a stable tool version and do not migrate during the project
  – Migrate only if a new tool feature is very helpful
  – Migrate only if no workaround for serious issue
• Schedule new version migration time

Migrating to New IP Releases
• Some tools can only modify the IP “current” to that tool version
  – An older tool version may be required to make modifications
  – Existing IP can still be used but may be locked
  – Major IP versions can have different functionality and interfaces
  – Review change logs carefully
• Develop an IP upgrade strategy

Clocking
• Keep global clock logic at the top-level of design for easier constraint assignment and analysis
  – Avoid modifying IP Clocking and IO
• Use wizards to implement hard clock blocks
• Some families require connection of single-ended clocks to the p-side of differential pairs

Tight on Clocking Resources?
• Understand available global resource features and quantities
  – Analyze current and projected usage
• Evaluate options for consolidating clock and IP clock resources as necessary
• Understand options and impact of routing clocks on local (non global) resources
  – Document local resource routed clocks
• Understand and leverage clock muxes

Clock Network Planning
• Is the clocking hierarchy understood and documented?
  – Are all clocks and derivatives (generated, phase-shifted) known and documented?
• Have appropriate clock constraints been added and reviewed?
  – Run & review available clock analysis reports

Clocking
• What percentage of global resources are being used? (margin?)
• Understand MMCM / PLL block resource tradeoffs (Jitter, power, accuracy, performance)

Board Planning
• Document physical interfaces, debug pins, configuration circuitry, power distribution system and monitoring
• Document wide and critical FPGA interfaces
  – Target appropriate I/O banks for critical interfaces
• Visualize data flow and bus orientation between the FPGA and board-level components
  – Device orientation
• Have compatible larger or smaller parts in the same package been identified?
  – Identify and set the alternate parts in the tool suite

Device Planning
• Evaluate wide data buses
  – Is bank-crossing required?
  – Board-level or fabric congestion?
• Pin-out relationship to BRAM, DSP Blocks, Clock Blocks, Transceivers
• Use wizards to assign memory interface I/O
• Implement connectivity IP to understand I/O pin assignment limitations
• Document I/O standards, drive strength, slew rate, pull up/down/keepers and termination
• Document FPGA connected signals that must be driven before FPGA configuration or powerup

Power
• Leverage development boards for real-world power consumption
• Accurate power estimation requires accurate circuit knowledge
• IO standards, voltage, drive strength, slew rate and termination mode can reduce power use
• On-chip termination can increase chip power consumption and junction temperature
• Can point-to-point LVDS links be single-ended?

JTAG Access
• Are JTAG signals easily accessible to a cable?
  – Signals: TCK, TMS, TDI, TDO, PWR, GND
  – Standard signal layout keyed header preferred
  – Can use smaller pitch or break-out pads
• Multi-device chain signals correctly connected and routed?
  – Can individual components in the JTAG chain be isolated?
• Was signal integrity reviewed during board layout (key: CCLK, TCK signals)?
• Validate use of multi-function pins
  – Voltage capability, bitstream generation persist option not set

Configuration
• Choose supported configuration memories
• Configuration circuitry should be formally and carefully reviewed
  – Configuration design circuit mistakes can be very hard to recover from
• Leverage dev. board circuits – known to work
• Implement easy access to board-level configuration control and status
  – INIT, DEBUG, POR Reset / Reset
• Do not drive pins on an unpowered FPGA

Configuration
• Validate configuration mode and required configuration memory bank voltage
• Verify FPGA to configuration memory control signals, data and address mapping and pull-up/down
  – Note NOR part address mapping may be NOR.A1 to FPGA.A0
• Are mode pins held constant during and after configuration?
• Bitstream encryption is a security option

Multi-Boot Implemented Correctly
• Verify Multi-boot on a development board
  – Do not count on image compression ratio
  – Partial reconfiguration also requires multiple images
• Choose a memory footprint with multiple vendors and a larger-size future migration option
  – Memories often go EOL before the FPGA
• Selection of supported memories simplifies implementation of in-system flash programming
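The advice not to count on a compression ratio translates into a worst-case sizing calculation: budget one full uncompressed bitstream per image. The sketch below is illustrative only; the bitstream size, sector size and part sizes are placeholders, and rounding each image up to a whole sector is an assumption about how images would be laid out in flash.

```python
import math


def min_flash_bits(bitstream_bits, image_count, sector_bits=0):
    """Worst-case storage: every image is a full, uncompressed bitstream.

    If sector_bits is non-zero, each image is rounded up to whole sectors.
    """
    if sector_bits:
        per_image = math.ceil(bitstream_bits / sector_bits) * sector_bits
    else:
        per_image = bitstream_bits
    return per_image * image_count


def smallest_part(required_bits, part_sizes_bits):
    """Return the smallest listed part that fits, or None if none does."""
    fits = [size for size in sorted(part_sizes_bits) if size >= required_bits]
    return fits[0] if fits else None


# Placeholder example: two images of 30,000,000 bits each, 64 KiB sectors.
REQUIRED = min_flash_bits(30_000_000, 2, sector_bits=524_288)
CHOSEN = smallest_part(REQUIRED, [2**25, 2**26, 2**27])  # 32, 64, 128 Mib parts
```

Selecting a footprint that also accepts the next size up leaves room for a larger device or additional images later.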

Board-Level Debug
• The design must meet timing before adding a debug core
• A debug core’s clock should be synchronous to the nets being monitored
• Signal probes can be routed out to external pins or pads
  – Co-location of pins and matched routing helps reduce debug signal-to-signal skew
• Switches and LEDs can be helpful for debug
  – Can be on board or on an external test assembly

Design Capture for Performance
• Implement pipelining in hard blocks with registers
  – DSP and Memory blocks
  – Large block input and output paths often have timing challenges
• Target shift register functionality to LUTs to reduce register utilization and power use
• Reset functionality can prevent mapping to DSP & Memory blocks and SRL/LUTs
• Understand target architecture
• Use appropriate code templates

Design Capture and Power
• Implement controlled clock disable for inactive circuitry
• Be careful not to implement control sets that prevent targeting hard blocks (reset)
• Disable BRAMs when not in use
  – Dynamic power is proportional to time enabled

Design Capture and Reliability
• Safe startup after Global reset release and enable is enhanced by
  – Initializing registers state to desired values in RTL
  – Releasing clocks in a controlled manner
• Eliminate untimed resets
  – Unpredictable behavior when asserted or, more importantly, deasserted
• Eliminate combinational feedback
  – Combinational processes looping
  – Can create timing hazards that are difficult to predict or avoid and can prevent optimization

Managing Clock Domain Crossings
• Synchronous CDCs should have both clocks properly related by phase
  – Minimize phase differences
• Asynchronous CDCs should have proper timing exceptions and synchronization created
  – Implement clock synchronization circuitry for all asynchronous CDCs

Constraints
• All input and output ports should have specified delay constraints defined relative to the appropriate clock
• Use available complex constraint templates
• Avoid constraints that cause path segmentation
• Use logic options rather than net names in constraints when possible
• Validate constraint scope and execution order
• Constrain all clock group boundaries

RTL Coding Guidance
• Use RTL synthesizer-friendly templates
• Verilog: do not misuse blocking and non-blocking statements
• VHDL: use complete sensitivity lists
• Do not use delays in RTL code (unsynthesizable)
• Do not code incomplete if/else clauses
• Do not use for/while loops in RTL code
• VHDL: do not use buffer port mode
• Verilog: do not use file inclusion

RTL Coding Guidance – Cont.
• Avoid mixed clock edges in a hierarchy, use with discretion
• Code to allow easy removal of debug logic
• Avoid coding arrays in port declarations
• Code to avoid large numbers of control sets
  – Unique clock enable, clock and resets
• Duplicate flip-flops to reduce high fanout nets

Coding - Resets
• Global resets are often not necessary in FPGA design and can consume significant routing resources
• Resets can cause timing issues inside and outside the reset timing paths
• Minimize the number of resets
  – Reset necessary control logic only
  – Avoid resetting the datapath
  – Avoid active low resets

Coding – Resets Cont.
• Use synchronous resets when a functional reset is necessary
• Avoid mixing reset polarities
  – Can result in sub-optimal reset path timing
  – Use a reset polarity consistently
• Consider a BUFG for a high fanout reset net
• Implement independent clock domain resets
  – Manage where a reset crosses a clock domain
• If an asynchronous reset must be used, ensure that the reset is synchronously deasserted

Design Hierarchy
• Plan design hierarchy to allow isolation of clock regions and specific functionality
  – Group related logic
• Register data paths at logical / hierarchical boundaries
• Bring clocking elements to the top-level of the design

Coding and Memories
• Use Memory HDL templates to infer memories
• Use BlockRAM output registers for higher performance
• Implement the BRAM write mode that minimizes the time the BRAM is enabled
• If available, use dedicated hard FIFOs for power, performance and area advantages

Coding and DSP
• Determine minimum acceptable DSP data bit width (precision)
  – Key when targeting DSP blocks
• Specific coding is required to implement DSP block cascading
• The DSP block multiplier natively performs signed operations
  – Unsigned arithmetic reduces precision
• Pipeline to improve design power and performance
• Seek to minimize reset usage
• Implement active high synchronous reset if necessary
• Structure code to implement multiplication-addition flows for DSP block compatibility
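The remark that unsigned arithmetic reduces precision follows from two's-complement ranges: an unsigned operand fed to a signed multiplier port must keep the sign bit at zero, so one bit of the port is lost. The sketch below shows the arithmetic; the 18-bit port width is only an example value, so check the target family's DSP block documentation for actual widths.

```python
def signed_range(width):
    """Inclusive (min, max) of a two's-complement value with `width` bits."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1


def max_unsigned_bits(port_width):
    """Unsigned data must leave the sign bit at 0, so one bit of the port is unavailable."""
    return port_width - 1


def fits_signed_port(value, width):
    """True if `value` can be represented on a signed port of `width` bits."""
    low, high = signed_range(width)
    return low <= value <= high


PORT_WIDTH = 18  # example width only, not tied to a specific device
```

With this example width, the largest unsigned operand that fits is 2**17 - 1, which is why the minimum acceptable data width should be settled before coding for DSP blocks.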

Coding and Clocks
• Monitor clock lock signals to validate a clean clock before releasing circuitry
• Leverage BUFGMUX for “glitch free” clock transition
• Clock gating should be performed with synchronous clock elements or clock buffer enables rather than with LUT logic
  – Correct coding styles and design inference should be used to avoid LUT logic feeding clock lines

Common Design Mistakes and Best Practices

Most Common FPGA Design Mistakes
• Insufficient margin: resources, schedule, power
• Not using the appropriate design tools; not trained on tools
• Improper design constraints: too many/few, inaccurate, conflicting
• Poor configuration management; unrecoverable design corruption
• Poor system architecting; poor modularization/hierarchy
• Insufficient simulation (rush to lab)
• Overly compressed schedules
• Design oversights
  – Device configuration
  – HDL style & coding
  – Clocking, I/O, pin assignments
• Insufficient design documentation leading to design churn/chaos
• Lost design implementation knowledge, documentation, build recipe; tools, files

Example Design Mistakes
• Lack of awareness of speed-grade considerations of implementing a design requiring industrial temperature grade parts
• Design implementation creep; not enough resources
• Rush to Lab – Too much demanded of ILA implementation – insufficient resource margin for required ILA functionality in the design
• Ineffective system partitioning; un-registered module boundaries
• Depending on maximum performance (650 MHz performance... etc.)
• Sloppy HDL coding without consideration of resources required to implement (too many layers)

Revision Control / Configuration Management
• Supports design recovery and retrieval
• A correctly implemented revision control system can avoid the following issues:
  – Which FPGA or software version was shipped?
  – Where is a copy of that file?
  – Which file was used?
  – Did changes get overwritten?
  – Which tool version was used?
• A revision control system can also provide an automatic way to capture key design artifacts
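One lightweight way to capture key design artifacts automatically is a build manifest committed alongside each release, so "which file?" and "which tool version?" can be answered later. The sketch below is an assumption about how such a manifest might look, not a feature of any particular revision control system; the file patterns and the tool version string are placeholders supplied by the caller.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root, tool_version, patterns=("*.vhd", "*.v", "*.xdc", "*.tcl")):
    """Record the tool version and a SHA-256 digest for every matching source file."""
    root = Path(root)
    files = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            relative = path.relative_to(root).as_posix()
            files[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"tool_version": tool_version, "files": files}


def write_manifest(manifest, out_path):
    """Write the manifest as sorted, indented JSON so diffs between builds stay readable."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Generating the manifest from the same script that launches the build keeps it from drifting out of date.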

HDL Coding Guidelines
• A short collection of techniques that seek to minimize system errors
• Intended to be voluntarily enforced by the design team with engineering judgment
  – Not to be utilized as a cudgel by any group
• Strict or universal adherence is not mandatory
• An example set of VHDL Capture Guidelines is available at the INTUITIVE Research and Technology site ~ http://www.irtc-hq.com/

Best Practices
• Use a batch file or scripting language (TCL) to automate design flows
• Design file headers and comments can make HDL code self-documenting
• Configuration control is essential
• Developing and following design plans can improve odds of first-pass design success
  – Ad hoc can lead to wildly inconsistent results
• The design team should agree on common coding practices and design language choice
• Appropriate ratio of design simulation vs. lab-based verification and debug
• Research tool host platform requirements
  – Memory, OS

Best Practices
• Common coding practices and examples should be formalized and written down so that the team can reference and update the common standards
• Signal Integrity simulation and modeling
• Sufficient design margin
• Accurate design estimation
• Well architected design
  – Appropriate module size
  – Hierarchy
  – Group functionality based on clock and performance
  – Well defined interfaces
• Effective IP analysis and selection
• Take advantage of early prototyping options
• Follow manufacturer recommendations
  – Tools, partners, decoupling

Leveraging Checklists and Manufacturer Guidance
• Make heavy use of manufacturer documentation, errata, tutorials, and answer records
• Search for any manufacturer board level and design level checklists
• Implement any suggested manufacturer design process
  – Example: Xilinx’s UltraFast Design Methodology

Clock Domain Crossing (CDC)
• Ideally each design unit, module or component should have one clock
• If a design unit has more than one clock, strong consideration should be given to re-architecting
• Moving data and/or signals between design units requires special attention to CDC techniques
  – Utilize a mix of FIFOs, Dual Flip-Flop chains and “Flancter” circuit implementations
• The significant majority of functionality within an FPGA should be synchronous rather than asynchronous

Module Definition
• Incomplete or ineffective design unit interface can significantly increase design rework and churn
  – Divide the design into reasonably sized modules that share common clocking and functionality
• Take the time to think the design through and document it prior to starting HDL capture
  – Module documentation should explicitly define module boundaries and interfaces
  – Implement well-defined (registered) interfaces
• Poorly architected and implemented designs result in expensive inferior products regardless of the tools, languages or processes used

VHDL Capture • Validate all Inferred Latch Implementations

– Inferred Latches result from tool “assumptions” – Can be avoided through rigorous coding practices

• Process Sensitivity List – Complex sensitivity lists are a potential error source

• Use Variables with Caution – While Variables are supported they are not

generally recommended • Utilize a Style Guide • Implement HDL Code based on target

synthesizer-preferred structures & syntax
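As a small illustration of the latch and sensitivity-list points above, the sketch below shows one common way to keep a combinational process latch-free: assign a default value before the case statement so the output is driven on every path. The entity and port names are assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Combinational selector written so that no latch is inferred.
-- Entity and port names are illustrative assumptions.
entity mux_no_latch is
  port (
    sel : in  std_logic_vector(1 downto 0);
    a   : in  std_logic;
    b   : in  std_logic;
    y   : out std_logic
  );
end entity mux_no_latch;

architecture rtl of mux_no_latch is
begin
  -- Every signal read inside the process appears in the sensitivity list.
  -- (VHDL-2008 also allows "process (all)" where the tool supports it.)
  process (sel, a, b)
  begin
    y <= '0';  -- default assignment: y is driven on every path
    case sel is
      when "00"   => y <= a;
      when "01"   => y <= b;
      when others => null;  -- falls back to the default above
    end case;
  end process;
end architecture rtl;
```

If the default assignment were removed, the `others` branch would leave `y` unassigned and the synthesizer would have to hold its previous value, which is exactly the situation that produces an inferred latch.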

VHDL Capture • Module Size management

– Manage the HDL coding size of your modules – Not a tools issue – improves readability

• Correct & Sufficient Documentation – Documentation should be correct – incorrect

documentation casts all documentation into doubt – Documentation should be sufficient for someone

skilled in the art of FPGA design to understand your intended design implementation

Design: Architecture and Tools • Ultimately there is a contract between the

designer and the tool – The tool will implement what you tell it to the best of

its understanding – If you are ambiguous the tool will make decisions for

you – If you don’t tell the tool exactly what you want you will

get something, but not exactly what you want • Your guidance to the tools is in the form of: HDL

code, constraints, scripts, modes, options, etc. – If these are incomplete then the tool will fill in the gaps

for you… – seldom a good thing since the tools are not Magic

Design: Reset • “Easy Stuff” breeds a lack of attention.

– Lack of attention leads to injury.

• MAPLD 2004 (NASA Office of Logic Design) – “Unintended operation or lockup of finite state

machines or systems may result if the flip-flops come out of reset during different clock periods. There is a potential for one or more uncontrolled metastable states. Therefore, only reset circuits that [attempt to] remove power on reset synchronously should be considered in hi-rel applications.”

• References: – “Asynchronous & Synchronous Reset Design Techniques

- Part Deux”, Cummings et al., Sunburst Design – Xilinx White Paper 272 (WP272)
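One widely used way to release reset synchronously is a small reset synchronizer that asserts reset immediately but removes it only on a clock edge, so all downstream flip-flops leave reset in the same clock period. The sketch below is a minimal version under stated assumptions: the names are invented for the example, the reset is active high, and one such block would be needed per clock domain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Reset synchronizer: asynchronous assertion, synchronous release.
-- Entity and port names are illustrative assumptions.
entity reset_sync is
  port (
    clk     : in  std_logic;
    arst_in : in  std_logic;  -- asynchronous reset request, active high
    rst_out : out std_logic   -- reset for the clk domain, released on a clk edge
  );
end entity reset_sync;

architecture rtl of reset_sync is
  -- Both stages start asserted so the design powers up in reset.
  signal rst_ff : std_logic_vector(1 downto 0) := (others => '1');
begin
  process (clk, arst_in)
  begin
    if arst_in = '1' then
      rst_ff <= (others => '1');   -- assert immediately, no clock required
    elsif rising_edge(clk) then
      rst_ff <= rst_ff(0) & '0';   -- shift in '0'; release takes two clock edges
    end if;
  end process;

  rst_out <= rst_ff(1);
end architecture rtl;
```

The downstream logic can then treat `rst_out` as an ordinary synchronous reset in its own clock domain.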

Design: Clock Domains • Simplified Digital Design Clocking Rules

1. Only use one clock! 2. When you need more than one clock, only use

one clock!

• Seriously. . . – Almost all FPGAs implement multiple clock domains

and thus potential clock interaction – Each clock domain can be synchronous to only one

clock – Clock domain crossing is a critical design element

• Know where they ALL are and deal with them appropriately

• Your design should be synchronous, period.

Synchronous, Synchronous, Synchronous!

• Can asynchronous designs be implemented in an FPGA?

• Should they be? Generally no. – Only when the other options are too “expensive”

>>>>> Seek to avoid asynchronous design

Synchronous Design

HDL Coding Guidelines • Do not set or reset Registers asynchronously

– Control set remapping becomes impossible – Sequential functionality in block RAM components and DSP blocks

can only be set or reset synchronously – Device resources will not be able to be used or will be sub-

optimal • Do not describe Flip-Flops with both a set and a reset

– No Flip-Flop primitives feature both a set and a reset, whether synchronous or asynchronous

– Flip-Flop primitives featuring both a set and a reset may adversely affect area and performance

• Avoid operational set/reset logic whenever possible • Always describe the clock enable, set, and reset control

inputs of Flip-Flop primitives as active-High. – If described as active-Low, the inverter logic will penalize circuit

performance FROM: UG901-vivado-synthesis
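A register written along these guidelines might look like the sketch below: a single synchronous reset, no separate set, and active-high reset and clock enable. The entity name, port names, and the `WIDTH` generic are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Register with synchronous active-high reset and active-high clock enable.
-- Names and the WIDTH generic are illustrative assumptions.
entity reg_sync_rst is
  generic (WIDTH : positive := 8);
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- synchronous reset, active high
    ce  : in  std_logic;  -- clock enable, active high
    d   : in  std_logic_vector(WIDTH - 1 downto 0);
    q   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity reg_sync_rst;

architecture rtl of reg_sync_rst is
begin
  -- Only clk is in the sensitivity list: nothing here acts asynchronously.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= (others => '0');  -- reset has priority over the enable
      elsif ce = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```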

Design: Synchronous Design • FPGAs Require Synchronous Design Practices

– Asynchronous circuits will cause intermittent failures

• Synchronous Design Rules

– All data are passed through combinatorial logic and flip-flops that are synchronized to a single clock

– Delay is always controlled by flip-flops, not combinatorial logic

– No signal that is generated by combinatorial logic can be fed back to the same group of combinatorial logic without first going through a synchronizing flip-flop

– Clocks cannot be gated; clocks must go directly to the clock inputs of the flip-flops without going through combinatorial logic

Synchronous Design • Bob Zeidman’s Introduction to CPLD and FPGA Design

presents a concise set of synchronous design rules – summary below – 1. Synchronous design means that all data are passed

through combinatorial logic and flip-flops that are synchronized to a single clock

– 2. Delay should be controlled by flip-flops, not combinatorial logic

– 3. No signal generated by combinatorial logic should be fed back to the same combinatorial logic group without first going through a synchronizing flip-flop

– 4. Clocks cannot be gated - clocks must go directly to the clock inputs of the flip-flops without passing through combinatorial logic

– 5. One clock for the system. Do not clock entities or processes with outputs of other entities or processes
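Rules 4 and 5 are often satisfied by replacing a gated or divided clock with a clock enable. The sketch below illustrates the idea under assumed names: a counter produces a one-cycle `tick` every `DIVIDE` clocks, and the slower logic uses that tick as an enable while every flip-flop stays on the single system clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Slow-rate register driven by a clock enable instead of a divided clock.
-- Entity, port, and generic names are illustrative assumptions.
entity ce_divider is
  generic (DIVIDE : positive := 4);  -- capture d once every DIVIDE clocks
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- synchronous reset, active high
    d   : in  std_logic;
    q   : out std_logic
  );
end entity ce_divider;

architecture rtl of ce_divider is
  signal count : natural range 0 to DIVIDE - 1 := 0;
  signal tick  : std_logic := '0';  -- high for one clk cycle per DIVIDE cycles
begin
  -- Enable generator: a plain counter on the system clock.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count <= 0;
        tick  <= '0';
      elsif count = DIVIDE - 1 then
        count <= 0;
        tick  <= '1';
      else
        count <= count + 1;
        tick  <= '0';
      end if;
    end if;
  end process;

  -- Slow-rate logic: still clocked by clk, gated only through the enable.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= '0';
      elsif tick = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because `tick` is an ordinary registered signal, the timing tools analyze the whole design against one clock and no extra clock domain is created.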

Design: State Machines • Disclaimer

– A thorough discussion of Finite State Machines (FSM) is on the order of a semester long graduate level course

• Conventional FSM: keep complexity reasonable – A good rule of thumb is no more than 20 “complex”

states – Break larger state machines into multiple smaller ones – Think about the complexity of the next state equation

the tool will have to develop – Think about your understanding of the design and the

complexity of the implementation
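For scale, a deliberately small state machine is sketched below. It uses an enumerated state type and a single clocked process, which keeps the next-state logic easy to read and leaves the state encoding to the tool. The state names, ports, and behavior are assumptions invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Three-state FSM in a single clocked process.
-- States, ports, and behavior are illustrative assumptions.
entity simple_fsm is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;  -- synchronous reset, active high
    start : in  std_logic;
    done  : in  std_logic;
    busy  : out std_logic
  );
end entity simple_fsm;

architecture rtl of simple_fsm is
  type state_t is (ST_IDLE, ST_RUN, ST_FINISH);
  signal state : state_t := ST_IDLE;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= ST_IDLE;
      else
        case state is
          when ST_IDLE =>
            if start = '1' then
              state <= ST_RUN;
            end if;
          when ST_RUN =>
            if done = '1' then
              state <= ST_FINISH;
            end if;
          when ST_FINISH =>
            state <= ST_IDLE;  -- one-cycle wrap-up, then return to idle
        end case;
      end if;
    end if;
  end process;

  -- Output decoded from the current state only.
  busy <= '0' when state = ST_IDLE else '1';
end architecture rtl;
```

When a machine grows well beyond this size, splitting it into cooperating machines of this shape is usually easier to verify than one large case statement.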

Tool Usage: Timing Report • The Timing Report is Critical Data • The timing report is the vendor telling you how your

design will perform in their part – If the tool says you have little or no margin…

• Vary from default vendor tool settings cautiously

– Unless… you have a PhD in semiconductor manufacturing, or an extended schedule and budget

– Vendor defaults for voltage, temperature have built-in margin

– Semiconductor physics can be non-linear • Changing the temperature from 85°C to 50°C due to your

operating environment can mislead you. . .

Tool Usage: Warnings • Intellectual Property (IP)

– Vendor provided IP can generate warnings – There may be a few critical warnings in thousands of mundane

warnings – Recommend informed IP Selection (except built-in silicon IP)

• License restrictions, obsolescence, support, etc. • Coding and implementation standards for IP are all over the place

• Avoid “Inferred Latch” warnings – This is the tool telling you that you have not completely defined

your design intent and so… • The tool is going to be making some assumptions for you about your

design implementation – which is seldom where you want to be – In order to provide the described behavior, the tool must insert a

memory element (latch) where the designer did not request one – This should be considered the same level of concern as an IRS

audit

Reduce Uncertainty While Coding • Design before HDL capture begins • Be able to describe the functionality (and interaction

with other blocks) of a design block before starting design capture / coding / implementation.

• Document in writing and illustrations the projected functionality and interfaces before HDL coding begins

• You should know what the interfaces into and out of the module are before you start to code (and they likely should be registered)

• Know that the structure you are coding implements the desired functionality (i.e. build a test case)

Design Implementation - Tool Switch Options

Run Strategies • A strategy is a set of pre-configured synthesis or

implementation tool options – They are used to address / resolve synthesis and

implementation design challenges – Strategies are tool and version specific

• A range of commonly used strategies are likely

to be available as a reference for creation of customized strategies – These are validated against a range of

representative internal benchmark designs

Vivado Tools Settings Documentation • Synthesis Settings

– See the Vivado Design Suite User Guide: Synthesis (UG901) • Implementation Settings

– See the Vivado Design Suite User Guide: Implementation (UG904)

• Bitstream Settings – See the Vivado Design Suite User Guide: Programming and

Debugging (UG908) • IP Settings

– See the Vivado Design Suite User Guide: Designing with IP (UG896)

• Running DRCs – See the Vivado Design Suite Tcl Command Reference Guide

(UG835) – Creating custom DRCs, see the Vivado Design Suite User

Guide: Using Tcl Scripting (UG894)

Debugging the Design - I • RTL-level Design Simulation

– You can functionally debug the design during the simulation verification process

– RTL-level simulation debugging benefits include: • Full visibility of the entire design • The ability to more quickly iterate through the design and debug

cycle • Post-implemented Design Simulation

– The benefit of Post-implemented Design Simulation is having access to a timing-accurate model for the design

• Both classes of simulation can be limited by: – The difficulty of simulating larger designs in a reasonable

amount of time (extended run-times) – The difficulty of accurately simulating the actual system

environment (system model inaccuracy)

Debugging the Design - II • In-system Debugging

– Most development systems also include logic analysis functionality

• ILA – Integrated Logic Analyzer • Supports timing-accurate, post-implemented design debug in

the actual system environment at system speeds – In-system debugging includes the following challenges:

• Somewhat lower debug signal visibility compared to simulation visibility

• Potentially longer design, implementation, and debug iterations, depending on the size and complexity of the design

• Other Debugging Options include: – In-system serial I/O validation and debug

Vivado Constraints • Constraints

– See the Vivado Design Suite User Guide: Using Constraints (UG903)

• Changing the Constraint Evaluation Order • Converting UCF Constraints

– See the Vivado Design Suite Migration Methodology Guide (UG911) Also known as: ISE to Vivado Design Suite Migration Guide (UG911)

– See the Vivado Design Suite User Guide: Using Constraints (UG903)

• Constraint Conflicts – See the Vivado Design Suite User Guide: Implementation (UG904)

Vivado Example Synthesis Options/Strategies

• -flatten_hierarchy Hierarchy control

• -gated_clock_conversion Turns tool conversion of clocked logic with enables on or off

• -bufg Controls how many BUFGs the tool infers in the design

• -fanout_limit Number of loads a signal must drive before logic replication starts

• -directive Runs synthesis with different optimizations

• -fsm_extraction Controls finite state machine mapping and extraction

• -keep_equivalent_registers Prevents merging of registers with the same input logic

• -resource_sharing Controls sharing of arithmetic operators between different signals

• -no_lc Turns off LUT combining

• -shreg_min_size Sets the threshold for inference of SRLs

Implementation Strategy & Optimization Categories

• Implementation Strategy Categories: – Performance - Improve design performance – Area - Reduce LUT count – Power - Adjust power optimization – Flow - Modify flow steps – Congestion - Reduce congestion

• Implementation Optimization Categories: – Logic – Power – Placement – Routing – Physical Optimization

Other Topics • Design Configuration Management • Tools Configuration Management • Tools Licensing Models • Archiving Projects • Working With Source Control Systems • IP Version Management and Update • IP Settings

– Repository Manager: Specifies directories to add to the IP repositories list.

• Sources window / Source Management – Local source, Remote source, Missing source, Read-only

source • Constraint File Interaction and Precedence • Incremental Compile

Estimation and Budgeting

Estimation can be Easy… • Or at least easier when you have access to the

right information

• Note that collecting and organizing the required information requires commitment, buy-in, and effort from both Management and Engineering

• Which can be extremely difficult

Making Estimation More Difficult Than It Needs To Be…

• How do you accurately answer the following questions:

• How long does it take to accomplish an undefined task? … an incompletely defined task? … a task that is continuously changing? • How long will it take you with complex tools you have

never used? • How long will it take with a new team whose skills you

don’t know?

• Ultimately - you are seeking to reduce ambiguity • You are seeking to compare the unknown future against

the known past

…..

What Information Is Needed? • There can be very large design implementation swings

based on complex interaction between Documentation, System Architecting, Design Capture, Simulation, Integration, Debug and Test and Validation phases

• Accurate information and more granular information on the length of the key design phases is key

• The information has much in common with the collection of information for programming and Carnegie Mellon has been working to refine that space for 25+ years.

Estimation and Budgeting – Getting Better • How good is your organization at Estimation and

Budgeting? – Is there one or perhaps a few individuals within your

organization who can (with relative accuracy) make FPGA-related estimates?

– What happens if those individuals are no longer available?

• How do you get better as an organization? • The question is really what information do you

need?

• The correct and accurate data must be centrally collected and organized

Metrics Needed • We need to collect more detail in these two

main broad categories:

• Hours / Days worked – Hours worked in the main key Design Phases

(Subjective) – Labor Hours required and Calendar Days required

to complete the functionality • Design Complexity

– Module Description and implementation details – At the Module level, at the System level

Other Areas of Estimation • Power Consumption and Thermal

– Design complexity analysis • What can we compare it against that is known?

– Leverage development boards and reference designs – Number of registers, frequency, state transition ratio – IO loading and electrical characteristics – RTL-level tools assisted estimation – Placed and Routed-level tools assisted estimation

• What level of conservatism is built into the tools?

• HW Resources to be required – Compare to similar complexity implementation – Comparing an IIR Filter vs. an FIR Filter implementation

IP

IP Use and Selection • Make vs. Buy Decision • Select IP Block • Select IP Vendor • How to modify IP Block • Netlist vs. Source • Implementing an IP Block

– IP Design Flow – Developing and documenting your In-house IP

Process – IP Implementation documentation – IP Design Tools

IP Selection and Use Flow
1. Define Requirements
2. Identify Required Functionality
3. Partition Design
4. Make vs. Buy Decision
5. Select IP Block
6. Select IP Vendor
7. > IP Demonstration
8. > Try before Buy
9. License IP (Contract)
10. Review & Understand Documentation
11. Clarify Documentation Questions and Discrepancies
12. > Re-Implement IP Block
13. > Verify IP Block Functionality in Isolation
14. Run vendor-supplied test-bench
15. Modify IP Block (if Required)
16. Re-Implement, Re-Verify
17. Design and test IP Block Interface Circuitry
18. Integrate IP Block into System
19. Debug Design
20. Verify Functionality
21. Archive & Document Design
22. Deliver Product

IP Challenges • IP often not “plug & play” or “turnkey”

– Some level of customization required • IP interfaces are often the biggest challenges • Access to source code likely to increase cost • IP test-drive effort may be significant • Swapping IP cores can be time-consuming • Licensing agreements differ and may require legal review

Conclusion & Resources

Documentation Sources • Manufacturers have extensive documentation spread

through a range of document types: – Application Notes – Data Sheets – User Guides – White Papers – Errata – Methodology Guides – Tool Guides – Tutorials

• Search topics of interest on the manufacturers’:

– Online Answer Databases – Forums – Wikis

HDL Language Resources • Ashenden’s book, The Designer’s Guide to VHDL is an

excellent reference

• Verilog HDL: Digital Design and Modeling

Online Training • Manufacturer Training

– All manufacturers have free versions of their tools available for download

– Training videos are available – Tutorials with sample designs are available

• Online registration may be required for free tools licensing and access to sample design files

• Journals and Trade Papers