
Programmable Logic Devices

R.B.Ghongade

UNIT 3

Key Terms

Field-Programmable Device (FPD) a general term that refers to any type of integrated circuit used for implementing digital hardware, where the chip can be configured by the end user to realize different designs. Programming of such a device often involves placing the chip into a special programming unit, but some chips can also be configured in-system. Another name for FPDs is programmable logic devices (PLDs); although PLDs encompass the same types of chips as FPDs, we prefer the term FPD because historically the word PLD has referred to relatively simple types of devices.

PLA a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable.

PAL a Programmable Array Logic (PAL) is a relatively small FPD that has a programmable AND-plane followed by a fixed OR-plane.

SPLD refers to any type of Simple PLD, usually either a PLA or PAL

CPLD a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip. Alternative names sometimes adopted for this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and others.

FPGA a Field-Programmable Gate Array is an FPD featuring a general structure that allows very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs (AND planes), FPGAs offer more narrow logic resources. FPGAs also offer a higher ratio of flip-flops to logic resources than do CPLDs.


HCPLDs high-capacity PLDs: a single acronym that refers to both CPLDs and FPGAs. This term has been coined in trade literature for providing an easy way to refer to both types of devices.

Interconnect the wiring resources in an FPD.

Programmable Switch a user-programmable switch that can connect a logic element to an interconnect wire, or one interconnect wire to another.

Logic Block a relatively small circuit block that is replicated in an array in an FPD. When a circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits that can each be mapped into a logic block. The term logic block is mostly used in the context of FPGAs, but it could also refer to a block of circuitry in a CPLD.

Logic Capacity the amount of digital logic that can be mapped into a single FPD. This is usually measured in units of equivalent number of gates in a traditional gate array. In other words, the capacity of an FPD is measured by the size of gate array that it is comparable to. In simpler terms, logic capacity can be thought of as a number of 2-input NAND gates.

Logic Density the amount of logic per unit area in an FPD.

Speed-Performance measures the maximum operable speed of a circuit when implemented in an FPD. For combinational circuits, it is set by the longest delay through any path, and for sequential circuits it is the maximum clock frequency for which the circuit functions properly.

Digital VLSI Chips: classification

[Figure: classification tree of digital VLSI chips. ASIC branch: gate array, standard cell, full custom. FPD branch: SPLD, CPLD, FPGA. Device types listed: PLA, PAL, GAL, PROM, EPLD, E2PLD. Labels: generic usage (Field Programmable Device), typical usage, increasing complexity.]

General Programmable Logic Device

[Figure: a block of logic gates and programmable switches, with inputs (logic variables) on one side and outputs (logic functions) on the other.]

Consists of a set of inputs (the logic variables) and set of outputs (logic functions).

The designer's job is simply to program the switches, and hence configure the logic gates, to perform the desired function.

AND-OR realization of logic functions

Y = AB + AC + BC

A B C | Y
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1

[Figure: AND-OR circuit with inputs A, B, C and output Y.]

Thus, given a logic function in SOP form, it can be implemented using AND and OR arrays. This is the basic working principle of programmable logic devices.
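The two-level idea can be checked with a few lines of Python. This is only a sketch: the function names `and_or` and `truth_table` are made up for illustration, and the three product terms are those of the function Y = AB + AC + BC used on this slide.

```python
from itertools import product


def and_or(a: int, b: int, c: int) -> int:
    """Two-level AND-OR realization of Y = AB + AC + BC."""
    # AND array: one AND gate per product term
    terms = [a & b, a & c, b & c]
    # OR array: a single OR gate sums the product terms
    y = 0
    for term in terms:
        y |= term
    return y


def truth_table():
    """Return rows (A, B, C, Y) for all eight input combinations."""
    return [(a, b, c, and_or(a, b, c)) for a, b, c in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Each entry of `terms` plays the role of one AND gate, and the loop plays the role of the OR gate that collects them.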

General form of programmable function device

[Figure: inputs a and b feed a gate with output y through links that can be programmed; pull-up resistors tie the lines to logic 1.]

Inputs are available in their true as well as inverted (complementary) forms. This is an important development, since every form of each input is readily available. The user can now place links and construct the desired function. Placing a link (or removing it) is called programming the device.
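A minimal model of one such gate, assuming (as the pull-up resistors in the figure suggest) that a line whose link has been removed reads as logic 1. The function name `programmable_and` and the convention of naming a complemented input with a trailing apostrophe are illustrative choices, not part of any device specification.

```python
def programmable_and(inputs: dict, links: set) -> int:
    """Model one AND gate whose input lines are joined through programmable links.

    inputs: signal name -> 0 or 1, for example {"a": 1, "b": 0}
    links:  names of intact links; "a" selects the true input, "a'" its complement
    A line with no intact link is held at logic 1 by its pull-up resistor,
    so it does not affect the AND.
    """
    y = 1
    for name, value in inputs.items():
        if name in links:
            y &= value          # true form of the input is connected
        if name + "'" in links:
            y &= 1 - value      # complemented form of the input is connected
    return y


if __name__ == "__main__":
    # Keeping only the links "a" and "b'" realizes y = a.b'
    print(programmable_and({"a": 1, "b": 0}, {"a", "b'"}))
```

With all four links intact the gate computes a.a'.b.b', which is always 0; programming consists of leaving only the links the function needs.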

Programming technologies

The type of link gives rise to two different technologies: fusible link and antifuse.

Fusible link technologies

[Figure: fusible link technology. Un-programmed device: inputs a and b, with all fuses intact. Programmed device: blown fuses leave y = a.b'.]

Devices based on fusible-link technologies are said to be one-time programmable, or OTP, because once a fuse has been blown, it cannot be replaced. This places a severe limitation on the usage of the device.

Antifuse technologies

[Figure: antifuse technology. Un-programmed device: inputs a and b, with antifuse links not yet formed. Programmed device: programmed antifuse links give y = a.b'.]

Simplified Antifuse

An antifuse is a microscopic column of amorphous (non-crystalline) silicon linking two metal tracks. In its un-programmed state, the amorphous silicon acts as an insulator with a very high resistance, in excess of one billion ohms.

[Figure: antifuse, un-programmed and programmed.]

Types of antifuse technologies

There are two classes of antifuse technologies:

Poly-diffusion antifuse (used by Actel)

Metal-metal antifuse (used by QuickLogic)

Poly-diffusion Anti-fuse

An Oxide-Nitride-Oxide (ONO) dielectric normally prevents current from flowing between the diffusion and polysilicon layers.

When a programming pulse is applied, the dielectric melts and a circuit is formed between the diffusion and the polysilicon.

Metal-Metal Anti-fuse

The link is an alloy of tungsten, titanium and silicon.

The conductive link usually forms at the corner of the via, where the electric field is highest during programming.

Programming

The act of programming this particular element effectively grows a link, known as a via, by converting the insulating amorphous silicon into conducting polysilicon.

Devices based on antifuse technologies are OTP, because once an antifuse has been grown, it cannot be removed. Again, this is a severe limitation of the technology, but antifuse devices have found their way into space applications because of their high reliability.

PLD Notation

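[Figure: PLD notation. Inputs a, b, c, d cross the input lines of the AND gates; the drawing shows the symbols for a link, no link, and a non-programmable connection. Example product terms: a.b'.d and a'.c'.]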

Programmable Logic Array (PLA)

The AND array along with an OR array can be put together to form a Programmable Logic Array (PLA)

We have already explored the technique of realizing any logic expression by using AND and OR gates. This is the underlying principle of PLA.

In a PLA, both the AND array and the OR array are programmable.

PLAs are specified in terms of:

Number of inputs (n)

Number of outputs (m)

Number of product terms (p)

[Figure: PLA with inputs a, b, c, d feeding a programmable AND array, followed by a programmable OR array with outputs f1, f2, f3.]

PLA programmed for various logic expressions

Inputs: a, b, c, d

Product terms (AND array): a.b.c'.d', a'.b'.c', b'.c

Outputs (OR array): a.b.c'.d' + a'.b'.c', a'.b'.c' + b'.c, a.b.c'.d' + b'.c
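The programmed PLA on this slide can be modelled as two small tables, one per plane. This is a sketch under stated assumptions: the names `eval_term`, `eval_pla`, `AND_PLANE` and `OR_PLANE` are invented for the example, and the assignment of the three sums to f1, f2 and f3 (in the order listed) is assumed, since the figure itself is not available.

```python
def eval_term(term: dict, inputs: dict) -> int:
    """AND plane: term maps an input name to the value it must have
    (1 = true form linked, 0 = complemented form linked)."""
    return int(all(inputs[name] == want for name, want in term.items()))


def eval_pla(and_plane: list, or_plane: list, inputs: dict) -> list:
    """or_plane[k] lists the indices of the product terms linked to output k."""
    products = [eval_term(term, inputs) for term in and_plane]
    return [int(any(products[i] for i in rows)) for rows in or_plane]


AND_PLANE = [
    {"a": 1, "b": 1, "c": 0, "d": 0},  # a.b.c'.d'
    {"a": 0, "b": 0, "c": 0},          # a'.b'.c'
    {"b": 0, "c": 1},                  # b'.c
]
OR_PLANE = [
    [0, 1],  # f1 = a.b.c'.d' + a'.b'.c'  (output order is an assumption)
    [1, 2],  # f2 = a'.b'.c' + b'.c
    [0, 2],  # f3 = a.b.c'.d' + b'.c
]

if __name__ == "__main__":
    print(eval_pla(AND_PLANE, OR_PLANE, {"a": 1, "b": 1, "c": 0, "d": 0}))
```

Because both planes are plain data, reprogramming the model means editing the two tables, which mirrors the fact that both arrays of a PLA are programmable.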

PLA QP82S100

Consists of 16 dedicated inputs and 8 dedicated outputs.

Each output is capable of being actively controlled by any or all of the 48 product terms. The True, Complement, or Don't Care condition of each of the 16 inputs can be ANDed together to comprise one product term.

All 48 product terms can be selectively ORed to each output.

n = 16, m = 8, p = 48
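The three parameters give a rough feel for the size of the programmable arrays. The sketch below counts links for an idealized PLA, assuming one link per crossing in each array; the function name `pla_link_count` is hypothetical, and a real device may contain further programmable elements that this count ignores.

```python
def pla_link_count(n: int, m: int, p: int) -> dict:
    """Count programmable links in an idealized PLA with n inputs,
    m outputs and p product terms."""
    and_links = 2 * n * p  # every product term can link to each input, true or complemented
    or_links = p * m       # every output can link to each product term
    return {"and": and_links, "or": or_links, "total": and_links + or_links}


if __name__ == "__main__":
    # Parameters quoted on this slide: 16 inputs, 8 outputs, 48 product terms
    print(pla_link_count(n=16, m=8, p=48))
```

For n = 16, m = 8 and p = 48 this gives 1536 links in the AND array and 384 in the OR array.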


Programmable Array Logic (PAL)

Many applications do not require that both the AND as well as OR arrays be programmable.

Programmable links are slower than permanent links owing to the considerable resistance shown by the fusible material.

Hence another option for design engineers: the AND array is kept programmable, as in a PLA, but the OR array is not programmable.

The OR array contains only permanent connections, which pre-define the sum terms.

This reduces flexibility, but greatly improves speed and reduces manufacturing cost.

[Figure: PAL with inputs a, b, c, d feeding a programmable AND array, followed by a fixed OR array with outputs f1, f2, f3.]

Additional Features

Tri-state outputs: give programmable bi-directional pins and save pin count.

Registered outputs: enable the use of the PAL in finite state machines and increase the versatility of the device.

Macrocell

PAL16L8A

Specifications

Part number = PAL16L8A
Description = Programmable array logic device
Fuse type = titanium-tungsten
Manufacturer = Texas Instruments
Number of inputs = up to 16
Product terms (max.) = 64
Number of outputs = up to 8
Nominal supply (V) = 5.0
Package = DIP, leadless ceramic chip carrier (FK)
Pins = 20
Technology = Advanced Low-Power Schottky
Bi-directional pins = 6

Programming the PLD

Programming a traditional PLD is easy because there are computer programs and associated tools specially created for the task.

The user first creates a computer file, known as a PLD source file, containing a textual description of the required functionality.

In addition to Boolean equations, the PLD source file may also support truth tables, state tables, and other constructs, all in textual format.

The program may also select a suitable device automatically, based on a variety of criteria such as the speed, cost, and power consumption of the devices.

The program may also be used to partition a large design across several devices, in which case it will output a separate JEDEC file for each device.

Finally, the designer takes a new device of the appropriate type and places it in a socket on a special tool, which may be referred to as a programmer, blower, or burner.

The main computer passes the JEDEC file to the programmer, which uses the contents of the file to determine which fuses to blow

JEDEC: Joint Electron Device Engineering Council

Setup for programming

Reprogrammable PLDs

The basic (and most severe) limitation of fusible link and antifuse technologies is that the device cannot be re-programmed.

This may be a severe shortcoming, especially during the development phases of a system.

Technologies for re-programmable PLD

EPROM (Erasable Programmable Read-Only Memory)

E2PROM (Electrically Erasable Programmable Read-Only Memory)

FLASH

SRAM (Static Random Access Memory)

EPROM

An EPROM transistor has the same basic structure as a standard MOS transistor, but with the addition of a second polysilicon floating gate isolated by layers of oxide.

[Figure: cross-sections of a MOS transistor and an EPROM transistor on a silicon (Si) substrate. Both show source, drain and control gate terminals and the SiO2 insulation; the EPROM transistor adds a floating gate beneath the control gate.]

In its un-programmed state, the floating gate is uncharged and does not affect the normal operation of the control gate.

To program the transistor, a relatively high voltage, on the order of 12 V, is applied between the control gate and drain terminals.

This causes the transistor to be turned hard on, and excited electrons push through the oxide into the floating gate in a process known as hot (high energy) electron injection.

When the programming signal is removed, a negative charge remains on the floating gate.

This charge is very stable and will not dissipate for more than a decade under normal operating conditions.

The stored charge on the floating gate inhibits the normal operation of the control gate, and thus distinguishes those cells that have been programmed from those which have not.

E2PROM

An E2PROM cell is approximately 2.5 times larger than an EPROM cell because it contains two transistors.

One of the transistors is similar to that of an EPROM transistor in that it contains a floating gate, but the insulating oxide layers surrounding the floating gate are very much thinner.

The second transistor can be used to erase the cell electrically, and E2PROM devices can typically be erased and reprogrammed on a word-by-word basis.


FLASH

The name FLASH was originally coined to reflect the technology's rapid erasure times compared to EPROM.

These devices can be electrically erased, but only by erasing the whole device or a large portion of it.

Some FLASH architectures have a two-transistor cell, very similar to that of an E2PROM cell, which allows them to be erased and reprogrammed on a word-by-word basis.

SRAM

It consists of two cross-coupled inverters and two access transistors.

The SRAM cell drives the gates of other transistors on the chip, either ON to make a connection or OFF to break the connection.

The access transistors are connected to the select (READ/WRITE) line at their gate terminals, and to the DATA lines at their source/drain terminals. The select line is used to select the cell, while the DATA lines are used to perform read or write operations on the cell.

Internally, the cell holds the stored value on one side and its complement on the other side.

To store data, the select line is set to 1 (5 V), and the NMOS transistor passes the data from its left-hand side to its right-hand side. After the data stabilizes around the two NOT gates, the select line is set to 0, and the data remains stored for as long as power is applied.

Note that the lower NOT gate is labeled WEAK, meaning it has weaker transistors. This lets the STRONG NOT gate override the WEAK one when new data is written and the logic level has to change.

[Figure: SRAM cell, with READ/WRITE select lines.]

SRAM

SRAM cells are used for the following:

1. They can store a logic value of 0 or 1.

2. They can store a value of an LUT.

3. They configure the interconnection switches of the FPGA

Comparison between programming technologies

Type         | Re-programmable      | Volatile | Technology | Associated with
Fusible link | No                   | No       | Bipolar    | SPLD
Anti-fuse    | No                   | No       | CMOS       | FPGA
EPROM        | Yes (out of circuit) | No       | UVCMOS     | SPLD & CPLD
E2PROM       | Yes (in-circuit)     | No       | EECMOS     | SPLD & CPLD
SRAM         | Yes (in-circuit)     | Yes      | CMOS       | FPGA
FLASH        | Yes (in-circuit)     | No       | CMOS       | FPGA

Technology: SRAM

Advantages: re-programmable; 100% testable; no programmer; no socket; base logic process, so it uses leading-edge processing.

Limitations: largest-area element, using 5 to 6 transistors plus switch (30u per node @ 0.25u); switch is medium impedance, 3 k ohms per square (500 uA/micron); high capacitance, 1.6 fF per micron per node @ 0.25u; volatile; requires external memory to load designs; easily copied; dead until loaded; software is difficult.

Technology: EPROM

Advantages: mainstream technology; re-programmable; 100% testable; non-volatile; software is simple.

Limitations: requires high voltage (1 generation below SRAM); requires programmer; requires socket; high impedance, 80 uA per minimum gate (12 k ohm); impact ionization limits voltage across the device.

Technology: FLASH

Advantages: re-programmable in the board; no socket; non-volatile; one transistor instead of 6 for routing control, i.e. denser parts; passes full Vcc without pump; live at power-up; difficult to reverse engineer.

Limitations: requires high voltages; about the same speed as SRAM; radiation hardness is expected to be similar to EPROM, but has not been tested yet.

Technology: Antifuse

Advantages: highest density, a mere cross point, 10X the density of SRAM; lowest switch resistance, 25 ohms; very low capacitance, 1 fF per node, approaching the metal line capacitance; non-volatile; nearly impossible to reverse engineer; radiation hard; live within 1 millisecond of the power supply reaching spec voltage; software finds it easy to place and route.

Limitations: requires programmer; requires a socket, a problem for devices with more than 200 pins (solved with BGA); those who design by test will throw out a lot of parts; requires one to two transistors per wire for programming, about 10 mA for metal antifuses; ONO antifuses require less, only 5 mA, so they can be programmed from the edge; some antifuse defects are not testable until programming, hence only 98% to 99% programming yield, but 100% functional.

CPLD


Key Terms

CPLD: Complex programmable logic device. A programmable logic device consisting of several interconnected programmable blocks.

Logic Array Block (LAB): A group of macrocells that share common resources in a CPLD.

Programmable Interconnect Array (PIA): An internal bus with programmable connections that link together the Logic Array Blocks of a CPLD.

Buried logic: Logic circuitry in a PLD that has no connection to the input or output pins of the PLD, but is used solely as internal logic.

I/O Control Block: A circuit in a CPLD that controls the type of tri-state switching used in a macrocell output.

Parallel logic expanders: Product terms that are borrowed from neighbouring macrocells in the same LAB.

Shared logic expanders: Product terms that are inverted and fed back into the programmable AND matrix of an LAB for use by any other macrocell in the LAB.

Specifications

There are several performance specifications for complex programmable logic devices.

Internal frequency is the speed at which a CPLD can perform operations or transfer data internally.

The propagation delay is the time interval between the application of an input signal and the occurrence of the corresponding output in a logic circuit.

Speed grade indicates the delay in nanoseconds (ns) through a macrocell in the CPLD. For example, a CPLD with a speed grade of 10 has a delay of 10 ns through a macrocell. CPLDs with low speed-grade numbers run faster than devices with high speed-grade numbers.

CPLD

The term complex PLD (CPLD) is generally taken to refer to a class of devices that contain a number of simple PLA or PAL functions (generically referred to as simple PLDs, or SPLDs) that share a common programmable interconnection matrix.

Thus CPLDs consist of multiple SPLD-like blocks on a single chip.

However, CPLD products are much more sophisticated than SPLDs, even at the level of their basic SPLD-like blocks.

While each manufacturer has a different variation, in general they are all similar in that they consist of function blocks, input/output blocks, and an interconnect matrix.

The devices are programmed using programmable elements that, depending on the technology of the manufacturer, can be:

EPROM cells

EEPROM cells

Flash EPROM cells

Generic building blocks

PLD blocks (also called Function Blocks)

Interconnection matrix

I/O blocks

Altera MAX7000S Complex PLD

Some tricks!

Using XOR gate as programmable NOT gate

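[Figure: the output of a logic circuit passes through an XOR gate whose second input is a programmable value. With the programmable input at 1 the signal is inverted (1 becomes 0); with it at 0 the signal passes unchanged (1 stays 1).]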
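The XOR trick is a one-line identity, sketched here in Python. The function name `programmable_not` and the name `cell` for the programmable bit are illustrative only.

```python
def programmable_not(signal: int, cell: int) -> int:
    """XOR the signal with a programmable cell:
    cell = 0 passes the signal unchanged, cell = 1 inverts it."""
    return signal ^ cell


if __name__ == "__main__":
    for cell in (0, 1):
        print(cell, [programmable_not(s, cell) for s in (0, 1)])
```

This is why a single programmable bit per output is enough to offer both active-HIGH and active-LOW polarity.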


Using MUX as programmable switch

[Figure: a 4:1 MUX whose select inputs are driven by programmable cells.]
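A 4:1 multiplexer used this way can be sketched as an indexed selection, assuming two programmable cells act as the select lines. The name `mux4` and the ordering of the select bits (s1 as the more significant bit) are assumptions made for the example.

```python
def mux4(inputs, cells):
    """4:1 multiplexer: two programmable cells (s1, s0) choose which of the
    four inputs is connected to the output."""
    s1, s0 = cells
    return inputs[2 * s1 + s0]


if __name__ == "__main__":
    wires = [0, 1, 0, 1]
    # Cells programmed to (0, 1) connect input 1 to the output
    print(mux4(wires, (0, 1)))
```

Once the cells are programmed, the chosen input stays connected to the output, so the multiplexer behaves like a fixed switch.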

Packages

PQFP: Plastic Quad Flat Package

PLCC: Plastic Leaded Chip Carrier

TQFP: Thin Quad Flat Pack

PGA: Pin Grid Array

Device number: EPM7128SLC84

EPM7: MAX 7000 family

128: number of macrocells

S: in-system programmable

LC84: 84-pin PLCC package
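The fields of the device number can be pulled apart mechanically. The sketch below handles only the fields named on this slide; the function name `decode` and the regular expression are illustrative, and other package codes or suffixes used on real parts are deliberately not covered.

```python
import re

# Only the pattern shown on the slide: EPM7 + macrocells + optional S + LC + pin count
PATTERN = re.compile(r"^EPM7(\d+)(S?)LC(\d+)$")


def decode(part: str) -> dict:
    """Split a device number such as EPM7128SLC84 into its fields."""
    match = PATTERN.match(part)
    if match is None:
        raise ValueError(f"unrecognized device number: {part}")
    macrocells, isp, pins = match.groups()
    return {
        "family": "MAX 7000",
        "macrocells": int(macrocells),
        "in_system_programmable": isp == "S",
        "package": "PLCC",
        "pins": int(pins),
    }


if __name__ == "__main__":
    print(decode("EPM7128SLC84"))
```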

MAX 7000 family

Features

Advanced CMOS technology

EEPROM-based

provides 600 to 5,000 usable gates

In System Programmable

pin-to-pin delays as low as 5 ns

counter speeds of up to 175.4 MHz

Architecture

The MAX 7000 architecture includes the following elements: Logic array blocks (LAB)

Macrocells

Expander product terms (shareable and parallel)

Programmable interconnect array

I/O control blocks

CLOCK & RESET pins

The MAX7000S family has four pins that can be configured as control signals or inputs.

GCLK1 is a global clock that is common to all macrocells in the device and can be used to synchronously clock all registers.

OE1 is an output enable that can globally activate or disable the tristate outputs of the device macrocells.

GCLRn is an active-LOW global clear function.

The fourth control pin can be configured as an input, as can the other three pins, or as a second global clock (GCLK2) or output enable (OE2).

If the control functions are not used, these pins add four inputs to the available total.

[Figure: MAX 7000S architecture, showing the global clock and the active-LOW global clear.]

Logic Array Block

LABs consist of 16-macrocell arrays.

Multiple LABs are linked together via the programmable interconnect array (PIA), a global bus that is fed by all dedicated inputs, I/O pins, and macrocells.

Each LAB in a MAX7000S device has from 6 to 16 I/O pins.

For the EPM7128SLC84, only 60 I/Os are available.

Macrocell


The macrocell is similar to that of a GAL or Universal PAL in that it provides a sum-of-products function with active-HIGH or active-LOW options and the choice of registered or combinational output.

Registered outputs can be clocked with one of two global clocks or by a product term from the AND matrix.

The register can be cleared globally or by a product term and preset with a product term.

The macrocell has five dedicated product terms, which is fewer than found in the PAL and GAL.

This is generally sufficient to implement most logic functions. If more terms are required, they can be supplied by a set of shared logic expanders or parallel logic expanders.

Shareable Expanders


Shared logic expanders do not add more product terms to a given macrocell.

They do make the programming of the entire LAB more efficient by allowing a product term to be programmed once and used in several macrocells of the same LAB.

One product term per macrocell is inverted and fed back into the shared expander pool of product terms. Since there are 16 macrocells per LAB, the shared logic expander pool has up to 16 product terms

Parallel Expanders


Parallel logic expanders allow a macrocell to borrow up to 15 product terms from its three lower-numbered neighbours (5 product terms per neighboring macrocell). For example, macrocell 4 can borrow up to 5 terms each from macrocells 3, 2, and 1.

By using its 5 dedicated product terms and the maximum number of parallel expanders, a macrocell can have up to 20 product terms at its disposal. These borrowed terms are not usable by the macrocell from which they were borrowed.

The parallel expanders are set up so that a lower-number cell lends product terms to a higher-number cell, so the number of available terms depends on how close to the end of a chain a macrocell is.
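The arithmetic behind these limits can be sketched as follows. The model assumes a single lending chain in which a macrocell's 1-based position determines how many lower-numbered neighbours exist; the function name `max_product_terms` and the constants are illustrative, and the exact grouping of macrocells into chains on a real device is not modelled.

```python
DEDICATED_TERMS = 5      # product terms owned by each macrocell
TERMS_PER_NEIGHBOUR = 5  # terms that one neighbour can lend
MAX_NEIGHBOURS = 3       # at most three lower-numbered neighbours can lend


def max_product_terms(position: int) -> int:
    """Upper bound on product terms for the macrocell at the given 1-based
    position in its lending chain (the chain numbering is an assumption)."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    lenders = min(MAX_NEIGHBOURS, position - 1)
    return DEDICATED_TERMS + TERMS_PER_NEIGHBOUR * lenders


if __name__ == "__main__":
    for position in range(1, 6):
        print(position, max_product_terms(position))
```

Under this model the first macrocell in a chain has only its 5 dedicated terms, while macrocell 4 and later ones can reach the full 20.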

Programmable Interconnect Array

PIA

Logic is routed between LABs via the programmable interconnect array (PIA).

This global bus is a programmable path that connects any signal source to any destination on the device.

All MAX 7000 dedicated inputs, I/O pins, and macrocell outputs feed the PIA, which makes the signals available throughout the entire device.

Only the signals required by each LAB are actually routed from the PIA into the LAB.

An EEPROM cell controls one input to a 2-input AND gate, which selects a PIA signal to drive into the LAB.

While the routing delays of channel-based routing schemes in masked or field-programmable gate arrays (FPGAs) are cumulative, variable, and path-dependent, the MAX 7000 PIA has a fixed delay.

The PIA thus eliminates skew between signals and makes timing performance easy to predict.

I/O Block


The I/O control block allows each I/O pin to be individually configured for input, output, or bidirectional operation.

All I/O pins have a tri-state buffer that is individually controlled by one of the global output enable signals or directly connected to ground or VCC.

The I/O control block of EPM7032, EPM7064, and EPM7096 devices has two global output enable signals that are driven by two dedicated active-low output enable pins (OE1 and OE2).

The I/O control block of MAX 7000E and MAX 7000S devices has six global output enable signals that are driven by the true or complement of two output enable signals, a subset of the I/O pins, or a subset of the I/O macrocells

I/O Control

I/O Block

When the tri-state buffer control is connected to ground, the output is tri-stated (high impedance) and the I/O pin can be used as a dedicated input.

When the tri-state buffer control is connected to VCC, the output is enabled.

The MAX 7000 architecture provides dual I/O feedback, in which macrocell and pin feedbacks are independent.

When an I/O pin is configured as an input, the associated macrocell can be used for buried logic

Output Configuration

MAX 7000 device outputs can be programmed to meet a variety of system-level requirements.

MultiVolt I/O Interface

MAX 7000 devices, except 44-pin devices, support the MultiVolt I/O interface feature, which allows MAX 7000 devices to interface with systems that have differing supply voltages.

The 5.0-V devices in all packages can be set for 3.3-V or 5.0-V I/O pin operation.

These devices have one set of VCC pins for internal operation and input buffers (VCCINT), and another set for I/O output drivers (VCCIO).


Open-Drain Output Option (MAX 7000S Devices Only)

This open-drain output enables the device to provide system-level control signals (e.g., interrupt and write enable signals) that can be asserted by any of several devices.

It can also provide an additional wired-OR plane.


Slew-Rate Control

The output buffer for each MAX 7000E and MAX 7000S I/O pin has an adjustable output slew rate that can be configured for low-noise or high-speed performance.

A faster slew rate provides high-speed transitions for high-performance systems.

However, these fast transitions may introduce noise transients into the system.

A slow slew rate reduces system noise, but adds a nominal delay of 4 to 5 ns.

Xilinx XC95XX/XC95XXX Complex PLD

The PLD-like blocks are called FUNCTION BLOCKS.

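Available packages, Xilinx CPLD (entries are the number of user I/O pins)

Package      | XC9536 | XC9572 | XC95108 | XC95144 | XC95216 | XC95288
44-Pin VQFP  | 34     | -      | -       | -       | -       | -
44-Pin PLCC  | 34     | 34     | -       | -       | -       | -
48-Pin CSP   | 34     | -      | -       | -       | -       | -
84-Pin PLCC  | -      | 69     | 69      | -       | -       | -
100-Pin TQFP | -      | 72     | 81      | 81      | -       | -
100-Pin PQFP | -      | 72     | 81      | 81      | -       | -
160-Pin PQFP | -      | -      | 108     | 133     | 133     | -
208-Pin HQFP | -      | -      | -       | -       | 166     | 168
352-Pin BGA  | -      | -      | -       | -       | 166     | 192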

More packages

VQFP: Very Fine Pitch Quad Flat Pack/ Very Thin Quad Flat Package

CSP: Chip Scale Package

HQFP: Heat-sinked Quad Flat Pack

BGA: Ball Grid Array

Device marking

Features

High-performance: 5 ns pin-to-pin logic delays on all pins, fCNT to 125 MHz

Large density range: 36 to 288 macrocells with 800 to 6,400 usable gates

5V in-system programmable: Endurance of 10,000 program/erase cycles

Enhanced pin-locking architecture

Flexible 36V18 Function Block: 90 product terms drive any or all of 18 macrocells within the Function Block; global and product term clocks, output enables, set and reset signals

Extensive IEEE Std 1149.1 boundary-scan (JTAG) support

Slew rate control on individual outputs

User programmable ground pin capability

Extended pattern security features for design protection

High-drive 24 mA outputs

3.3V or 5V I/O capability

Advanced CMOS 5V FLASH technology

Supports parallel programming of multiple XC9500 devices

XC9500 Architecture

CLOCK, RESET, TRI-STATE pins

The pins labeled GCK (three), GSR (one), GTS (two or four) can be used for special purposes

GCK: global clock

GSR: global set/reset

GTS: global three-state controls

Function Blocks


The AND plane still exists as shown by the crossing wires.

The AND plane can accept inputs from the I/O blocks, other function blocks, or feedback from the same function block.

The terms are then ORed together using a fixed number of OR gates, and terms are selected via a large multiplexer.

The outputs of the mux can then be sent straight out of the block, or through a clocked flip-flop.

This particular block includes additional logic such as a selectable exclusive OR and a master reset signal, in addition to being able to program the polarity at different stages

Function Blocks

Each Function Block comprises 18 independent macrocells, each capable of implementing a combinatorial or registered function.

The FB also receives global clock, output enable, and set/reset signals.

The FB generates 18 outputs that drive the Fast CONNECT switch matrix.

These 18 outputs and their corresponding output enable signals also drive the IOB.

Logic within the FB is implemented using a sum-of-products representation.

Thirty-six inputs provide 72 true and complement signals into the programmable AND-array to form 90 product terms.

Any number of these product terms, up to the 90 available, can be allocated to each macrocell by the product term allocator.

XC9500 macrocell

[Figure: XC9500 macrocell. Labels: up to 5 product terms; programmable inversion or XOR product term; global clock or product-term clock; set control; reset control; OE control.]

Macrocell Clock and Set/Reset Capability

Product term allocator

Switch matrix

Xilinx CoolRunner-II CPLD Family

Features

Optimized for 1.8V systems: low-power CPLD, densities from 32 to 512 macrocells

0.18 micron CMOS CPLD: optimized architecture for effective logic synthesis, multi-voltage I/O operation (1.5V to 3.3V)

Advanced system features: fast in-system programming, On-The-Fly Reconfiguration (OTF), boundary scan test, multiple I/O banks on all devices, low-power management, external signal control

Flexible clocking modes: clock divider (divide by 2, 4, 6, 8, 10, 12, 14, 16)

Global signal options with macrocell control: multiple global clocks with phase selection per macrocell, multiple global output enables, global set/reset

Abundant product term clocks, output enables and set/resets

Efficient control term clocks, output enables and set/resets for each macrocell and shared across function blocks

Advanced design security

Open-drain output option for wired-OR and LED drive

Optional bus-hold, 3-state or weak pull-up on select I/O pins

Optional configurable grounds on unused I/Os

Mixed I/O voltages compatible with 1.5V, 1.8V, 2.5V, and 3.3V logic levels on all parts

Wide package availability including fine pitch: Chip Scale Package (CSP) BGA, Fine Line BGA, TQFP, PQFP, VQFP, PLCC, and QFN packages

Guaranteed 1,000 program/erase cycles

Guaranteed 20 year data retention

CoolRunner-II CPLD Architecture

Coolrunner-II family Function Block

Macrocell

New control signals

Control Terms (CT) are available to be shared for key functions within the FB, and are generally used whenever the exact same logic function would be repeatedly created at multiple macrocells.

The CT product terms are available for FB clocking (CTC), FB asynchronous set (CTS), FB asynchronous reset (CTR), and FB output enable (CTE).

Advanced Interconnect Matrix

The Advanced Interconnect Matrix is a highly connected low power rapid switch.

The AIM is directed by the software to deliver up to a set of 40 signals to each FB for the creation of logic.

Results from all FB macrocells, as well as all pin inputs, circulate back through the AIM for additional connection, available to all other FBs as dictated by the design software.

The AIM minimizes both propagation delay and power as it makes attachments to the various FBs

I/O blocks

Output Banking: The output pins are grouped in large banks, which allows easy interfacing to 3.3V, 2.5V, 1.8V, and 1.5V in a single part. Thus these CPLDs can be widely used as voltage interface translators.

DataGATE

Is used for power reduction.

Each I/O pin has a series switch that can block the arrival of free-running signals that are not of interest.

Signals that serve no use may increase power consumption, and can be disabled.

Users are free to do their design, then choose sections to participate in the DataGATE function.

DataGATE is a logic function that drives an assertion rail threaded through the medium and high-density CoolRunner-II CPLD parts.

Designers can select inputs to be blocked under the control of the DataGATE function, effectively blocking controlled switching signals so they do not drive internal chip capacitances.

Output signals that do not switch are held by the bus hold feature.

Any set of input pins can be chosen to participate in the DataGATE function.

Choice of CPLD

When considering a CPLD for use in a design, the following issues should be taken into account:

1. The programming technology

EPROM, EEPROM, or Flash EPROM? This will determine the equipment needed to program the devices and whether they can be programmed only once or many times.

2. The function block capability

How many function blocks are there in the device?

How many product and sum terms can be used?

What are the minimum and maximum delays through the logic?

What additional logic resources are there, such as XNORs, ALUs, etc.?

What kind of register controls are available (e.g., clock enable, reset, preset, polarity control)?

How many inputs are local to the function block and how many are global, chip-wide inputs?

What kind of clock drivers are in the device, and what is the worst-case skew of the clock signal on the chip? This will help determine the maximum frequency at which the device can run.

3. The I/O capability

How many I/Os are independent and usable for any function, and how many are dedicated to clock input, master reset, etc.?

What is the output drive capability in terms of voltage levels and current?

What kind of logic is included in an I/O block that can be used to increase the functionality of the design?

FPGA


Key terms

Look-up table (LUT): A circuit that implements a combinational logic function by storing a list of output values that correspond to all possible input combinations.

CLB: Configurable Logic Block is the name for the programmable logic block in an FPGA.

Logic element (LE): A circuit internal to an FPGA used to implement a logic function as a look-up table.

Cascade chain: A circuit in an FPGA that allows the input width of a Boolean function to expand beyond the width of one logic element.

Carry chain: A circuit in an FPGA that is optimized for efficient operation of carry functions between logic elements.

DCM: Digital clock manager is a circuit that offers various clock management functions in an FPGA.

Clock trees: Distribution of clock signal lines along the FPGA architecture.

Field Programmable Gate Arrays

Structure much like a gate array ASIC

Visualized as islands of programmable logic in a sea of programmable interconnect

Closer to programmable ASICs

Can be scaled to large sizes

Large emphasis is laid on interconnection routing

Timing performance is difficult to predict

Generic FPGA architecture

Contains the following blocks:

Programmable logic blocks

I/O blocks

Programmable interconnect

In addition the FPGA has:

Clock distribution circuit

Embedded memory blocks

Special-purpose blocks:

DSP blocks: hardware multipliers, adders and registers

Embedded microprocessors/microcontrollers

High-speed serial transceivers

FPGA architecture

[Figure: FPGA architecture, showing programmable logic blocks and programmable interconnect]

The FPGA is often described in terms of its fabric, which means the underlying structure of the device.

Programming

FPGAs can use any one of the following programming technologies:

SRAM

Antifuse

FLASH

Hybrid FLASH-SRAM

FPGA fabric

[Figure: FPGA fabric, showing programmable logic blocks and programmable interconnects]

Types of architectures

Fine grained

Each programmable logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as a primitive logic gate (AND, OR, NAND, etc.) or a storage element (D-type flip-flop, D-type latch, etc.).

Fine-grained architectures are said to be particularly efficient when executing systolic algorithms (functions that benefit from massively parallel implementations).

Fine-grained implementations require a relatively large number of connections into and out of each block compared to the amount of functionality that can be supported by those blocks.


Coarse grained

In a coarse-grained architecture, each logic block contains a relatively large amount of logic compared to its fine-grained counterparts. For example, a logic block might contain four 4-input LUTs, four multiplexers, four D-type flip-flops, and some fast carry logic.

As the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support.

Logic realization techniques

There are two fundamental methods employed by vendors for the programmable logic blocks used to form the medium-grained architectures referenced in the previous section:

MUX (multiplexer) based

LUT (lookup table) based

MUX-based

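This is based on Shannon's decomposition theorem, which states that:

Let $f(a_1, a_2, \ldots, a_n)$ be a switching function of n variables. Then f can be factored about any variable $a_i$ as

$$f(a) = \bar{a}_i \cdot f_1 + a_i \cdot f_2$$

where $f_1$ and $f_2$ are f evaluated with $a_i = 0$ and $a_i = 1$ respectively,

OR, written out for $a_1$,

$$f(a_1, a_2, \ldots, a_n) = \bar{a}_1 \cdot f(0, a_2, \ldots, a_n) + a_1 \cdot f(1, a_2, \ldots, a_n)$$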
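The decomposition can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the helper names `cofactors`, `shannon`, and `equivalent`, and the 3-input majority test function, are all hypothetical and chosen only for illustration. It rebuilds a function from its two cofactors about one variable and compares the result with the original over every input combination.

```python
from itertools import product

def cofactors(f, i):
    """Return (f0, f1): f with input i forced to 0 and to 1 (hypothetical helper)."""
    def f0(*a):
        return f(*a[:i], 0, *a[i + 1:])
    def f1(*a):
        return f(*a[:i], 1, *a[i + 1:])
    return f0, f1

def shannon(f, i):
    """Rebuild f about input i as (NOT a_i AND f0) OR (a_i AND f1)."""
    f0, f1 = cofactors(f, i)
    def g(*a):
        return ((1 - a[i]) & f0(*a)) | (a[i] & f1(*a))
    return g

def equivalent(f, g, n):
    """Compare two n-input functions over all 2**n input combinations."""
    return all(f(*a) == g(*a) for a in product((0, 1), repeat=n))

# Illustrative test function (an assumption, not from the slides): 3-input majority.
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

print(all(equivalent(majority, shannon(majority, i), 3) for i in range(3)))
```

The expression inside `g` has the shape of a 2:1 multiplexer with `a[i]` as the select line, which is why the theorem maps directly onto MUX-based logic blocks.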

Example (MUX implementation)

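Consider a 3-input function

$$y = (a \cdot b) + c$$

a b c | y
0 0 0 | 0
0 0 1 | 1
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1

Splitting the truth table on a gives two 2-input sub-tables.

For a = 0:

b c | y
0 0 | 0
0 1 | 1
1 0 | 0
1 1 | 1

$$y_1 = c$$

For a = 1:

b c | y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

$$y_2 = b + c$$

Using Shannon's decomposition theorem we can write y as

$$y = (\bar{a} \cdot c) + (a \cdot (b + c))$$

Example

The sub-function $y_2$ can itself be decomposed, this time about b:

b c | y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

$$y_2 = (\bar{b} \cdot y_3) + (b \cdot y_4)$$

$$y_2 = (\bar{b} \cdot c) + (b \cdot 1)$$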

MUX implementation

Another possible implementation
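As a rough illustration of the MUX-based approach, the Python sketch below builds the example from two 2:1 multiplexers. It assumes the example function is y = (a AND b) OR c, as the truth table in this example implies; the names `mux2`, `y_reference`, and `y_mux` are hypothetical and not taken from the slides.

```python
from itertools import product

def mux2(sel, d0, d1):
    """2:1 multiplexer: passes d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def y_reference(a, b, c):
    """The assumed example function, y = (a AND b) OR c."""
    return (a & b) | c

def y_mux(a, b, c):
    """The same function built only from 2:1 multiplexers."""
    y1 = c               # cofactor for a = 0
    y2 = mux2(b, c, 1)   # cofactor for a = 1 (b OR c), expanded about b
    return mux2(a, y1, y2)

# Exhaustive comparison over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert y_mux(a, b, c) == y_reference(a, b, c)
print("MUX network matches the reference function")
```

Each call to `mux2` stands in for one hardware multiplexer, so this particular decomposition needs two of them; choosing a different variable order would give a different but equivalent network.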

LUT-based

The key property of an n-input LUT is that it can implement any possible n-input combinational function.

The underlying concept behind a LUT is relatively simple.

A group of input signals is used as an index (pointer) to a lookup table.

The contents of this table are arranged such that the cell pointed to by each input combination contains the desired value
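The indexing idea can be sketched in a few lines of Python. This is only an illustrative software model, not a description of any vendor's circuit: the class name `LUT`, its methods, and the bit ordering (first input is the least significant index bit) are assumptions made for the example.

```python
class LUT:
    """Software model of an n-input lookup table (illustrative only)."""

    def __init__(self, n_inputs, contents):
        if len(contents) != 2 ** n_inputs:
            raise ValueError("an n-input LUT needs exactly 2**n cells")
        self.n_inputs = n_inputs
        self.cells = list(contents)

    @classmethod
    def from_function(cls, n_inputs, func):
        """Fill the table by evaluating func for every input combination."""
        cells = []
        for index in range(2 ** n_inputs):
            # Assumed convention: first input = least significant index bit.
            bits = [(index >> k) & 1 for k in range(n_inputs)]
            cells.append(func(*bits) & 1)
        return cls(n_inputs, cells)

    def read(self, *inputs):
        """Use the input signals as an index (pointer) into the table."""
        index = sum(bit << k for k, bit in enumerate(inputs))
        return self.cells[index]

# Program a 3-input LUT with y = (a AND b) OR c and read it back.
lut = LUT.from_function(3, lambda a, b, c: (a & b) | c)
print(lut.cells)
print(lut.read(1, 1, 0))
```

Changing the implemented function only means loading different cell contents; the read path stays the same, which is what makes the block programmable.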

LUT implementation

[Figures: LUT implemented using pass transistors; LUT implemented using transmission gates]

Number of LUT inputs?

It has been statistically concluded that a 4-input LUT is best for FPGA devices.

One additional advantage of a LUT-based programmable block is that the SRAM cells forming the LUT can be used as a small block of RAM (the 16 cells forming a 4-input LUT, for example, could be used as a 16 x 1 RAM). This is referred to as distributed RAM.

Also, all the SRAM cells are effectively connected in a chain to facilitate programming. This offers the further possibility of using the chain as a shift register.

Because of all these advantages, the majority of today's FPGA architectures are LUT based.
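The three uses of the same 16 cells can be modelled as follows. This is a minimal sketch under stated assumptions: the class `LutCells`, its method names, the address bit ordering, and the shift direction are all hypothetical choices for illustration, not the behaviour of a specific device.

```python
class LutCells:
    """16 storage cells viewed as a 4-input LUT, a 16 x 1 RAM, or a 16-bit shift register."""

    def __init__(self):
        self.cells = [0] * 16

    def lut_read(self, a, b, c, d):
        """LUT view: the four inputs form the cell address (a is the low bit)."""
        return self.cells[a | (b << 1) | (c << 2) | (d << 3)]

    def ram_write(self, address, bit):
        """Distributed RAM view: write one bit at a 4-bit address."""
        self.cells[address] = bit & 1

    def ram_read(self, address):
        """Distributed RAM view: read one bit at a 4-bit address."""
        return self.cells[address]

    def shift_in(self, bit):
        """Shift register view: push a bit in at cell 0, return the bit leaving cell 15."""
        out = self.cells[-1]
        self.cells = [bit & 1] + self.cells[:-1]
        return out

block = LutCells()
block.ram_write(5, 1)               # store a 1 at address 5
print(block.lut_read(1, 0, 1, 0))   # inputs a=1, c=1 also address cell 5
```

The point of the sketch is that `lut_read` and `ram_read` touch the same storage; only the surrounding control logic decides which role the cells play.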

Major FPGA Vendors

SRAM-based FPGAs: Xilinx, Inc.; Altera Corp.; Atmel; Lattice Semiconductor

Flash & antifuse FPGAs: Actel Corp.; Quick Logic Corp

Xilinx FPGA Devices

Old families

XC3000, XC4000, XC5200

Old 0.5 µm, 0.35 µm and 0.25 µm technology (not recommended for modern designs)

Low-cost family

Spartan/XL: derived from XC4000

Spartan-II: derived from Virtex

Spartan-IIE: derived from Virtex-E

Spartan-3

High-performance families

Virtex (0.22 µm)

Virtex-E, Virtex-EM (0.18 µm)

Virtex-II, Virtex-II PRO (0.13 µm)

Virtex-4 (0.09 µm)

Virtex-5 (0.065 µm)

Virtex-5 flavours

LX: Logic

LXT: Logic/Serial

SXT: DSP/Serial

FXT: Embedded/Serial

Xilinx devices

[Figure: Xilinx device complexity over time]

Device          Speed     Gates
XC2000          50 MHz    1K
XC3000          85 MHz    7.5K
XC4000          100 MHz   250K
XC5200          50 MHz    23K
Spartan         80 MHz    40K
Spartan-II      200 MHz   200K
Spartan-3       326 MHz   5M
Virtex          200 MHz   1M
Virtex-E        240 MHz   4M
Virtex-II       450 MHz   8M
Virtex-II Pro   450 MHz   8M*
Virtex-4        500 MHz   16M*
Virtex-5        550 MHz   24M*

Time axis: 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006

Process technology axis: 0.35 µm, 0.3 µm, 0.25 µm, 0.22 µm, 0.18 µm, 0.13 µm, 0.13 µm, 90 nm, 65 nm

Xilinx FPGA devices

All Xilinx FPGAs contain the same basic resources:

Logic cells (LCs), grouped into slices, which are grouped into Configurable Logic Blocks (CLBs): contain combinatorial logic and register resources

I/O blocks: interface between the FPGA and the outside world

Programmable interconnect

Other resources: memory, multipliers, global clock buffers, boundary scan logic

Xilinx logic cell (LC)

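[Figure: Xilinx logic cell. Labelled elements: 4-input LUT (also usable as a 16 x 1 RAM or a 16-bit shift register), MUX, flip-flop. Inputs: a, b, c, d, e, clock, clock enable, set/reset. Outputs: y, q.]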

The core building block in a modern FPGA from Xilinx is called a logic cell

Logic Cell

The register can be configured to act as a flip-flop, or as a latch.

The polarity of the clock (rising-edge triggered or falling-edge triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low).

In addition to the LUT, MUX, and register, the LC also contains other elements, including some special fast carry logic for use in arithmetic operations.

The Slice

A slice contains two LCs.

Each logic cell's LUT, MUX, and register have their own data inputs and outputs; the slice has one set of clock, clock enable, and set/reset signals common to both logic cells.

Configurable Logic Block (CLB)

Xilinx FPGAs can have two or four slices in each CLB

There is also some fast programmable interconnect within the CLB. This interconnect is used to connect neighboring slices.

Why the hierarchy?

The reason for having this type of logic-block hierarchy, LC, then slice (with two LCs), then CLB (with four slices), is that it is complemented by an equivalent hierarchy in the interconnect.

Thus, there is fast interconnect between the LCs in a slice, then slightly slower interconnect between slices in a CLB, followed by the interconnect between CLBs.

This is to achieve the optimum trade-off between making it easy to connect things together and avoiding excessive interconnect-related delays.

Fast carry chains

A key feature of modern FPGAs is that they include the special logic and interconnect required to implement fast carry chains.

Each LC contains special carry logic. This is complemented by dedicated interconnect between the two LCs in each slice, between the slices in each CLB, and between the CLBs themselves.

This special carry logic and dedicated routing boosts the performance of logical functions such as counters and arithmetic functions such as adders.

The availability of these fast carry chains, in conjunction with features like the shift register use of LUTs and embedded multipliers, is useful when FPGAs are used for applications like DSP.
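To show why the carry path matters, here is a small Python sketch of a ripple-carry adder in which each bit position uses a propagate signal and a carry multiplexer. It is a simplified model under stated assumptions: the split between "LUT" and "carry MUX", and the names `full_adder` and `ripple_add`, are illustrative and do not describe a particular device's carry circuit.

```python
def full_adder(a, b, carry_in):
    """One bit position: propagate signal from the LUT, carry chosen by a MUX."""
    propagate = a ^ b                          # assumed to be computed in the LUT
    total = propagate ^ carry_in               # sum bit
    carry_out = carry_in if propagate else a   # carry MUX on the dedicated chain
    return total, carry_out

def ripple_add(x, y, width):
    """Add two unsigned integers bit by bit, passing the carry along the chain."""
    carry = 0
    result = 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry

print(ripple_add(11, 6, 4))  # 11 + 6 = 17 -> 4-bit result 1 with carry out 1
```

The carry has to travel through every bit position in turn, so the delay of the adder is dominated by the carry path; that is the path the dedicated logic and routing are meant to shorten.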

Embedded RAM


A lot of applications require the use of memory, so FPGAs may include relatively large chunks of embedded RAM called block RAM.

Depending on the architecture of the component, these blocks might be positioned around the periphery of the device, scattered across the face of the chip in relative isolation, or organized in columns.

Each block of RAM can be used independently, or multiple blocks can be combined together to implement larger blocks.

These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, first-in first-out (FIFO) functions and state machines

Embedded multipliers, adders, MACs

MAC


Some functions, like multipliers, are inherently slow if they are implemented by connecting a large number of programmable logic blocks together.

Since these functions are required by a lot of applications, many FPGAs incorporate special hardwired multiplier blocks.

These are typically located in close proximity to the embedded RAM blocks because these functions are often used in conjunction with each other

Similarly, some FPGAs offer dedicated adder blocks.

One operation that is very common in DSP-type applications is called a multiply-and-accumulate (MAC). As its name would suggest, this function multiplies two numbers together and adds the result to a running total stored in an accumulator.
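The MAC operation itself is simple to express in software. The sketch below is illustrative only; the function names `mac` and `dot_product` and the sample data are assumptions, and a hardware MAC block would operate on fixed-width values rather than Python integers.

```python
def mac(accumulator, a, b):
    """One multiply-and-accumulate step: add a * b to the running total."""
    return accumulator + a * b

def dot_product(xs, ys):
    """A dot product is a chain of MAC steps, as used in FIR filters."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc

print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

Filters and transforms in DSP repeat this step once per sample and coefficient pair, which is why a hardwired MAC block pays off.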

Embedded processor cores

Some functions such as reading switch positions and flashing light-emitting diodes (LEDs) require low speed counters.

Slowing the hardware down to implement this sort of function (using huge counters to generate delays, for example) is often impracticable. Thus, it's often better to implement these tasks with microprocessors.

High-end FPGAs contain one or more embedded microprocessors, which are typically referred to as microprocessor cores.

In this case, it often makes sense to move all of the tasks that used to be performed by the external microprocessor into the internal core.

This provides a number of advantages: it saves the cost of having two devices, eliminates large numbers of tracks, pads, and pins on the circuit board, and makes the board smaller and lighter.
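The remark about huge counters can be made concrete with a little arithmetic. The Python sketch below estimates how many counter bits are needed to produce a given delay from a fast clock; the function name `counter_bits` and the numbers used (a 100 MHz clock and a 0.5 s delay for a blinking LED) are hypothetical values chosen for illustration.

```python
def counter_bits(clock_hz, delay_s):
    """Bits needed for a counter that counts clock cycles for delay_s seconds."""
    cycles = round(clock_hz * delay_s)
    # A counter running from 0 to cycles - 1 needs enough bits to hold cycles - 1.
    return max(1, (cycles - 1).bit_length())

# Assumed example: toggle an LED every 0.5 s from a 100 MHz clock.
print(counter_bits(100_000_000, 0.5))  # 50,000,000 cycles -> 26 bits
```

A 26-bit counter consumes registers and carry logic just to wait, which is the kind of slow housekeeping task the text suggests handing to a processor instead.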

Types of microprocessor cores

There are two types of microprocessor cores:

Hard microprocessor core: Implemented as a dedicated, predefined block.

Soft microprocessor core: It is possible to configure a group of programmable logic blocks to act as a microprocessor. These are typically called soft cores, but they may be more precisely categorized as either soft or firm depending on the way in which the microprocessor's functionality is mapped onto the logic blocks.

Clock trees

All of the synchronous elements inside an FPGA (for example, the registers configured to act as flip-flops inside the programmable logic blocks) need to be driven by a clock signal.

Such a clock signal typically originates in the outside world, comes into the FPGA via a special clock input pin, and is then routed through the device and connected to the appropriate registers.


Clock managers

Some FPGA clock managers are based on phase-locked loops (PLLs), while others are based on digital delay-locked loops (DLLs).

Clock manager functions

Jitter removal


Skew correction

Digital frequency synthesis and phase shifting

General-purpose I/O

I/O

Each bank can be configured individually to support a particular I/O standard.

This allows the FPGA to work with devices using multiple I/O standards.

The FPGA can also be used to interface between different I/O standards (and to translate between different protocols that may be based on particular electrical standards).

Configurable I/O impedances

Modern FPGA output signals with fast edge rates require termination to prevent reflections and maintain signal integrity.

High pin count packages (especially ball grid arrays) cannot accommodate external termination resistors.

Thus the Digitally Controlled Impedance (DCI) circuit is employed.

DCI eliminates the need for external resistors and improves signal integrity.

The DCI feature can be used on any IOB by selecting one of the DCI I/O standards.

When applied to inputs, DCI provides input parallel termination.

When applied to outputs, DCI provides controlled impedance drivers (series termination) or output parallel termination.

DCI operates independently on each I/O bank.

Core versus I/O supply voltages

Over time, the geometries of the structures on silicon chips became smaller because smaller transistors have lower costs, higher speed, and lower power consumption. However, these processes demanded lower supply voltages, which have continued to fall over the years.

This supply (which is actually provided using large numbers of power and ground pins) is used to power the FPGA's internal logic. For this reason, it is known as the core voltage.

However, different I/O standards may use signals with voltage levels significantly different from the core voltage, so each bank of general-purpose I/Os can have its own additional supply pins.

Core voltages

Gigabit transceivers

The traditional way to move large amounts of data between devices is to use a bus, a collection of signals that carry similar data and perform a common function

Buses grew to 16 bits in width, then 32 bits, then 64 bits, and so forth.

The problem is that this requires a lot of pins on the device and a lot of tracks connecting the devices together. Routing these tracks so that they all have the same length and impedance becomes increasingly difficult as boards grow in complexity.

Furthermore, it becomes increasingly difficult to manage signal integrity issues (such as susceptibility to noise) when we are dealing with large numbers of bus-based tracks.

Today's high-end FPGAs include special hard-wired gigabit transceiver blocks.

These blocks use one pair of differential signals (a pair of signals that always carry opposite logical values) to transmit (TX) data and another pair to receive (RX) data.

Interconnect and routing

A programmable switch matrix (PSM) forms the heart of the interconnect in an FPGA.

[Figure: array of CLBs interleaved with programmable switch matrices (PSMs)]


The Switch

The actual switching matrix employs a structure of six pass transistors per cross point. Thus connectivity can be established by controlling the transistors
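One way to see where the number six comes from is to assume that four wire segments (one from each side) meet at a cross point, and that one pass transistor joins each possible pair of them. That reading is an assumption made for this sketch, and the names `SIDES`, `PASS_TRANSISTORS`, and `connected` are hypothetical. The Python below enumerates the pairs and checks connectivity for a given set of enabled transistors.

```python
from itertools import combinations

# Assumed model: four wire ends meet at each cross point.
SIDES = ("north", "east", "south", "west")

# One pass transistor per unordered pair of sides: 4 choose 2 = 6.
PASS_TRANSISTORS = list(combinations(SIDES, 2))

def connected(enabled, src, dst):
    """True if src reaches dst through the enabled pass transistors."""
    reached = {src}
    changed = True
    while changed:
        changed = False
        for p, q in enabled:
            # A transistor with exactly one end reached extends the net.
            if (p in reached) != (q in reached):
                reached |= {p, q}
                changed = True
    return dst in reached

print(len(PASS_TRANSISTORS))
print(connected({("north", "south")}, "north", "south"))
print(connected({("north", "south")}, "north", "east"))
```

Turning on a single transistor makes a straight-through or corner connection; turning on several at once fans one signal out in more than one direction.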

Various types of connections

Single lines: used to connect a CLB to another CLB that is one hop away. These wires have to go through a programmable switch, which adds delay.

Double lines: these wires travel past two CLBs before hitting the switch, hence they provide shorter delays for longer connections.

Long lines: wires in long groups do not go through any programmable switch at all; instead they travel all the way across or down a row or column and are driven by three-state drivers near the CLB.

Direct connect lines: these are CLB outputs that are directly connected to the CLBs immediately below and to the right.

Global clock lines: these lines are optimized for use as clock inputs to the CLB, providing short delay and minimal skew.
