PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

download PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

of 40

Transcript of PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    1/40

    This material exempt per Department of Commerce license exception TSU

    Basic FPGAArchitecture

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    2/40

    Objectives

    After completing this module, you will be able to:

    Identify the basic architectural resources of the Virtex-II FPGA

    List the differences between the Virtex-II, Virtex-II Pro, Spartan-3, and Spartan-3E devices

    List the new and enhanced features of the new Virtex-4 devicefamily

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    3/40

    Basic Architecture 3

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E, andVirtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    4/40

    Basic Architecture 4

    Overview

    All Xilinx FPGAs contain the same basic resources

    Slices (grouped into CLBs)

    Contain combinatorial logic and register resources

    IOBs

    Interface between the FPGA and the outside world

    Programmable interconnect

    Other resources

    Memory

    Multipliers

    Global clock buffers

    Boundary scan logic

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    5/40

    Basic Architecture 5

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E, andVirtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    6/40

    Basic Architecture 6

    Slices and CLBs

    Each Virtex-II CLBcontains four slices

    Local routing providesfeedback between slices in

    the same CLB, and it

    provides routing toneighboring CLBs

    A switch matrix provides

    accessto general routing resources

    CIN

    SwitchMatrix

    BUFTBUF T

    COUTCOUT

    Slice S0

    Slice S1

    Local Routing

    Slice S2

    Slice S3

    CIN

    SHIFT

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    7/40

    Basic Architecture 7

    Slice 0

    LUTLUT CarryCarry

    LUTLUT CarryCarryD Q

    CE

    PRE

    CLR

    DQCE

    PRE

    CLR

    Simplified Slice Structure

    Each slice has fouroutputs Two registered outputs,

    two non-registered outputs

    Two BUFTs associatedwith each CLB, accessible

    by all 16 CLB outputs

    Carry logic runs vertically,

    up only Two independentcarry chains per CLB

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    8/40

    Basic Architecture 8

    Detailed Slice Structure

    The next few slidesdiscuss the slice features

    LUTs MUXF5, MUXF6,

    MUXF7, MUXF8

    (only the F5 and

    F6 MUX are shownin this diagram)

    Carry Logic

    MULT_ANDs Sequential Elements

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    9/40

    Basic Architecture 9

    Combinatorial Logic

    AB

    CD

    Z

    Look-Up Tables

    Combinatorial logic is stored in Look-Up Tables(LUTs)

    Also called Function Generators (FGs) Capacity is limited by the number of inputs, not by the

    complexity

    Delay through the LUT is constant

    A B C D Z

    0 0 0 0 0

    0 0 0 1 0

    0 0 1 0 0

    0 0 1 1 1

    0 1 0 0 10 1 0 1 1

    . . .

    1 1 0 0 0

    1 1 0 1 0

    1 1 1 0 0

    1 1 1 1 1

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    10/40

    Basic Architecture 10

    Connecting Look-Up Tables

    F5

    F8

    F5

    F6

    CLB

    Slice S3

    Slice S2

    Slice S0

    Slice S1F5

    F7

    F5

    F6

    MUXF8 combines the two

    MUXF7 outputs (from the

    CLB above or below)

    MUXF6 combines slices S2

    and S3

    MUXF7 combines the two

    MUXF6 outputs

    MUXF6 combines slices S0 and S1

    MUXF5 combines LUTs in each slice

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    11/40

    Basic Architecture 11

    Fast Carry Logic

    Simple, fast, and completearithmetic Logic

    Dedicated XOR gate forsingle-level sum

    completion

    Uses dedicated routing

    resources

    All synthesis tools caninfer carry logic

    COUT COUT

    SLICE

    S0

    SLICE

    S1

    SecondCarry

    Chain

    To S0 of the

    next CLBTo CIN of S2 of the next

    CLB

    First Carry

    Chain

    SLICE

    S3

    SLICES2

    COUT

    COUT

    CIN

    CIN

    CIN CIN CLB

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    12/40

    Basic Architecture 12

    CO

    DI CIS

    LUT

    CY_MUX

    CY_XOR

    MULT_AND

    A

    B

    A x B

    LUT

    LUT

    MULT_AND Gate

    Highly efficient multiply and add implementation

    Earlier FPGA architectures require two LUTs per bit to perform the

    multiplication and addition The MULT_AND gate enables an area reduction by performing the

    multiply and the add in one LUT per bit

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    13/40

    Basic Architecture 13

    DCE

    PRE

    CLR

    Q

    FDCPE

    D

    CE

    S

    R

    Q

    FDRSE

    D

    CE

    PRE

    CLR

    Q

    LDCPE

    G

    _1

    Flexible Sequential Elements

    Either flip-flops or latches

    Two in each slice; eight in each

    CLB Inputs come from LUTs or from an

    independent CLB input

    Separate set and reset controls Can be synchronous or asynchronous

    All controls are shared within a

    slice Control signals can be inverted locally

    within a slice

    Shift R i t LUT

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    14/40

    Basic Architecture 14

    Shift Register LUT

    (SRL16CE) Dynamically addressable serialshift registers

    Maximum delay of 16 clock cyclesper LUT (128 per CLB)

    Cascadable to other LUTs or CLBsfor longer shift registers

    Dedicated connection from Q15 toD input of the next SRL16CE

    Shift register length canbe changed

    asynchronously

    by toggling address ALUT

    D QCE

    D QCE

    D QCE

    D QCE

    LUT

    DCE

    CLK

    A[3:0]

    Q

    Q15 (cascade out)

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    15/40

    Basic Architecture 15

    Shift Register LUT Example

    The SRL can be used to create a No Operation (NOP)

    This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs)

    and associated routing and delays

    12 Cycles

    64

    Operation A

    4 Cycles4 Cycles 8 Cycles8 Cycles

    Operation B

    3 Cycles3 Cycles

    Operation C

    64

    12 Cycles

    Paths are Statically

    Balanced

    9 Cycles9 Cycles

    Operation D -NOP

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    16/40

    Basic Architecture 16

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E, andVirtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    17/40

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    18/40

    Basic Architecture 18

    SelectIO Standard

    Allows direct connections to external signals of varied voltagesand thresholds Optimizes the speed/noise tradeoff

    Saves having to place interface components onto your board

    Differential signaling standards LVDS, BLVDS, ULVDS

    LDT LVPECL

    Single-ended I/O standards

    LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V) PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)

    GTL, GTLP

    and more!

    Digital Controlled

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    19/40

    Basic Architecture 19

    Digital Controlled

    Impedance (DCI) DCI provides Output drivers that match the impedance of the traces

    On-chip termination for receivers and transmitters DCI advantages

    Improves signal integrity by eliminating stub reflections

    Reduces board routing complexity and component count by eliminatingexternal resistors

    Eliminates the effects of temperature, voltage, and process variations byusing an internal feedback circuit

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    20/40

    Basic Architecture 20

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E, andVirtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    21/40

    Basic Architecture 21

    Other Virtex-II Features

    Distributed RAM and block RAM

    Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)

    Block RAM is a dedicated resources on the device (18-kb blocks) Dedicated 18 x 18 multipliers next to block RAMs

    Clock management resources

    Sixteen dedicated global clock multiplexers Digital Clock Managers (DCMs)

    Distributed SelectRAM

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    22/40

    Basic Architecture 22

    Distributed SelectRAM

    Resources Uses a LUT in a slice as memory

    Synchronous write

    Asynchronous read Accompanying flip-flops

    can be used to createsynchronous read

    RAM and ROM are initializedduringconfiguration

    Data can be written to RAMafter configuration

    Emulated dual-port RAM One read/write port

    One read-only port

    RAM16X1S

    O

    D

    WE

    WCLK

    A0

    A1

    A2

    A3

    LUTLUT

    RAM32X1S

    O

    D

    WE

    WCLK

    A0

    A1

    A2

    A3

    A4

    RAM16X1D

    SPO

    D

    WE

    WCLK

    A0

    A1

    A2

    A3

    DPRA0 DPO

    DPRA1

    DPRA2

    DPRA3

    Slice

    LUT

    LUT

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    23/40

    Basic Architecture 23

    Block SelectRAM Resources

    Up to 3.5 Mb of RAM in 18-kbblocks

    Synchronous read and write True dual-port memory

    Each port has synchronous readand write capability

    Different clocks for each port

    Supports initial values

    Synchronous reset on output

    latches

    Supports parity bits

    One parity bit per eight data bits

    DIA

    DIPAADDRA

    WEA

    ENA

    SSRA

    CLKA

    DIB

    DIPB

    WEB

    ADDRB

    ENB

    SSRB

    DOA

    CLKB

    DOPA

    DOPB

    DOB

    18-kb block SelectRAM memory

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    24/40

    Basic Architecture 24

    Dedicated Multiplier Blocks

    18-bit twos complement signed operation

    Optimized to implement Multiply and Accumulate functions

    Multipliers are physically located next to block SelectRAMmemory

    18 x 18

    Multiplier

    18 x 18Multiplier

    Output(36 bits)

    Data_A(18 bits)

    Data_B(18 bits)

    4 x 4 signed

    8 x 8 signed

    12 x 12 signed

    18 x 18 signed

    Global Clock Routing

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    25/40

    Basic Architecture 25

    Global Clock Routing

    Resources Sixteen dedicated global clock multiplexers Eight on the top-center of the die, eight on the bottom-center

    Driven by a clock input pad, a DCM, or local routing Global clock multiplexers provide the following:

    Traditional clock buffer (BUFG) function

    Global clock enable capability (BUFGCE) Glitch-free switching between clock signals (BUFGMUX)

    Up to eight clock nets can be used in each clock region of thedevice

    Each device contains four or more clock regions

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    26/40

    Basic Architecture 26

    Digital Clock Manager (DCM)

    Up to twelve DCMs per device

    Located on the top and bottom edges of the die

    Driven by clock input pads DCMs provide the following:

    Delay-Locked Loop (DLL)

    Digital Frequency Synthesizer (DFS) Digital Phase Shifter (DPS)

    Up to four outputs of each DCM can drive onto global clock buffers

    All DCM outputs can drive general routing

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    27/40

    Basic Architecture 27

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E,and Virtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    28/40

    Basic Architecture 28

    Spartan-3 versus Virtex-II

    Lower cost

    Smaller process = lower corevoltage

    .09 micron versus .15 micron

    Vccint = 1.2V versus 1.5V

    Different I/O standard support

    New standards: 1.2V LVCMOS,1.8V HSTL, and SSTL

    Default is LVCMOS, versusLVTTL

    More I/O pins per package

    Only one-half of the slices

    support RAM or SRL16s(SLICEM)

    Fewer block RAMs and

    multiplier blocks Same size and functionality

    Eight global clock multiplexers

    Two or four DCM blocks No internal 3-state buffers

    3-state buffers are in the I/O

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    29/40

    Basic Architecture 29

    SLICEM and SLICEL

    Each Spartan-3 CLBcontains four slices

    Similar to the Virtex-II Slices are grouped in pairs

    Left-hand SLICEM (Memory)

    LUTs can be configured asmemory or SRL16

    Right-hand SLICEL (Logic)

    LUT can be used as logic

    only

    CIN

    Switch

    Matrix

    COUTCOUT

    Slice X0Y0

    Slice X0Y1

    Fast Connects

    Slice X1Y0

    Slice X1Y1

    CIN

    SHIFTIN

    Left-Hand SLICEM Right-Hand SLICEL

    SHIFTOUT

    S 3

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    30/40

    Basic Architecture 30

    Spartan-3E Features

    More gates per I/O thanSpartan-3

    Removed some I/O standards

    Higher-drive LVCMOS

    GTL, GTLP

    SSTL2_II

    HSTL_II_18, HSTL_I, HSTL_III

    LVDS_EXT, ULVDS

    DDR Cascade Internal data is presented on a

    single clock edge

    16 BUFGMUXes on left andright sides

    Drive half the chip only In addition to eight global clocks

    Pipelined multipliers

    Additional configurationmodes

    SPI, BPI

    Multi-Boot mode

    Vi t II P F t

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    31/40

    Basic Architecture 31

    Virtex-II Pro Features

    0.13 micron process

    Up to 24 RocketIOMulti-Gigabit Transceiver (MGT) blocks

    Serializer and deserializer (SERDES) Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers,

    and others

    8-, 16-, and 32-bit selectable FPGA interface

    8B/10B encoder and decoder

    PowerPCRISC processor blocks

    Thirty-two 32-bit General Purpose Registers (GPRs)

    Low power consumption: 0.9mW/MHz

    IBM CoreConnect bus architecture support

    O tli

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    32/40

    Basic Architecture 32

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E, andVirtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    33/40

    Choose the Platform that Best

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    34/40

    Basic Architecture 34

    Fits the Application

    ResourceResource

    14K14K200K LCs200K LCsLogic

    Memory

    DCMs

    DSP Slices

    SelectIO

    RocketIO

    PowerPC

    Ethernet MAC

    LXLX FXFX SXSX

    0.90.96 Mb6 Mb

    441212

    32329696

    240240960960

    23K23K55K LCs55K LCs

    2.32.35.7 Mb5.7 Mb

    4488

    128128512512

    320320640640

    12K12K140K LCs140K LCs

    0.60.610 Mb10 Mb

    442020

    3232192192

    240240896896

    0024 Channels24 Channels

    1 or 2 Cores1 or 2 Cores

    2 or 4 Cores2 or 4 Cores

    N/A

    N/A

    N/A

    N/A

    N/A

    N/A

    O tli

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    35/40

    Basic Architecture 35

    Outline

    Overview

    Slice Resources I/O Resources

    Memory and Clocking

    Spartan-3, Spartan-3E, andVirtex-II Pro Features

    Virtex-4 Features

    Summary Appendix

    R i Q ti

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    36/40

    Basic Architecture 36

    Review Questions

    List the primary slice features

    List the three ways a LUT can be configured

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    37/40

    Summary

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    38/40

    Basic Architecture 38

    Summary

    Slices contain LUTs, registers, and carry logic

    LUTs are connected with dedicated multiplexers and carry logic

    LUTs can be configured as shift registers or memory IOBs contain DDR registers

    SelectIO standards and DCI enable direct connection to multiple

    I/O standards while reducing component count Virtex-II memory resources include the following:

    Distributed SelectRAM resources and distributed SelectROM (usesCLBLUTs)

    18-kb block SelectRAM resources

    Summary

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    39/40

    Basic Architecture 39

    Summary

    The Virtex-II devices contain dedicated 18x18multipliers next to each block SelectRAM

    resource

    Digital clock managers provide the following:

    Delay-Locked Loop (DLL)

    Digital Frequency Synthesizer (DFS)

    Digital Phase Shifter (DPS)

    Where Can I Learn More?

  • 8/13/2019 PDF-XilinxPDF-FPGA Design Flow-11 Basic Fpga Arch 8

    40/40

    Basic Architecture 40

    Where Can I Learn More?

    User Guides

    www.xilinx.com DocumentationUser Guides

    Application Notes

    www.xilinx.com DocumentationApplication Notes

    Education resources

    Designing with the Virtex-4 Family course

    Spartan-3E Architecture free Recorded e-Learning