Altera vs. Xilinx
Ognjen Šćekić, prof. dr Veljko Milutinović

Transcript (103 slides)

Page 1: Ognjen Šćekić 1/103 Altera vs. Xilinx Ognjen Šćekić ogi@cg.yu prof. dr Veljko Milutinović vm@etf.bg.ac.yu.


Page 2: Introduction

Page 3: FPGA vs. ASIC

ASIC = Application Specific Integrated Circuit: tailor-made on demand for specific applications

FPGA = Field Programmable Gate Array: flexibility of software + speed of hardware

Page 4: Market Overview

• Key players: Xilinx, Altera, Lattice, Actel
• PLD market estimated at $57 billion and rapidly growing
• The goal is to expand the market:
  – by lowering per-unit cost to attack the low-end market
  – by increasing speed capabilities to attack the high-end market

Figure 1 - PLD market share

Page 5: About Xilinx

• Pronounced "zylinks"
• Founded in 1984
• Employs around 2,600 people
• Claims to supply more than half the world demand for FPGAs
• Partners with leading semiconductor manufacturers such as IBM Microelectronics, UMC and Seiko
• Xilinx is currently the market leader

Page 6: About Altera

• Founded in 1983
• Introduced look-up table based architecture in 1992
• Second-largest FPGA manufacturer
• Strategic partner is TSMC

Page 7: Recent FPGA Design Timeline

• The Virtex and Stratix families compete directly, as do Spartan and Cyclone.

Page 8: Key Factors for Comparing FPGAs

• Fabrication process
• Logic density
• Clock management
• On-chip memory
• DSP capabilities
• I/O compatibility
• Software support & other design services

Page 9: Fabrication Process

• A more advanced fabrication process brings higher integration, and thus higher density and/or reduced chip size.
• Currently the most advanced is the 90nm process (previously 0.13μm):
  – first used in Spartan-3, and later in the Virtex-4 FPGA family
  – gave Xilinx a one-year lead over Altera
  – Altera introduced it in 2004 with Cyclone II and Stratix II

Figure 2 - Cyclone II 90nm structure

Page 10: Logic Density

• We need a unit to express the logic capability of an FPGA.
• Is it possible to define such a unit precisely?
• Traditionally:
  Xilinx: LC – Logic Cell
  Altera: LE – Logic Element

1 LC = 4-input LUT + D-FF + arithmetic/logic/register circuitry
1 LC = 1 LE

Page 11: Logic Density (2)

• Improved functionality of "new" architectures introduced new terms:
  • ALM – Adaptive Logic Module: describes the adaptable structure of Altera's Stratix II family
  • CLB – Configurable Logic Block: describes Xilinx's FPGA families
  • ELC – Equivalent Logic Cell: Xilinx's new unit to better express logic density

1 ELC = 1.125 LC
1 CLB has 8 LCs
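
The unit relations quoted on the slides can be turned into a small conversion helper. This is only a sketch: it reads "1 ELC = 1.125 LC" literally (one ELC counts as 1.125 LCs), and the function names are made up for illustration.

```python
# Conversion factors as quoted on the slides (read literally).
LCS_PER_CLB = 8      # 1 CLB has 8 LCs
LCS_PER_ELC = 1.125  # 1 ELC = 1.125 LC
LES_PER_LC = 1       # 1 LC = 1 LE


def clbs_to_lcs(clbs):
    """Number of logic cells contained in the given number of CLBs."""
    return clbs * LCS_PER_CLB


def elcs_to_lcs(elcs):
    """Logic cells represented by a count of equivalent logic cells."""
    return elcs * LCS_PER_ELC


def lcs_to_elcs(lcs):
    """Equivalent logic cells represented by a count of logic cells."""
    return lcs / LCS_PER_ELC


def lcs_to_les(lcs):
    """Altera logic elements comparable to a count of Xilinx logic cells."""
    return lcs * LES_PER_LC
```

For example, 100 CLBs hold 800 LCs, which by the 1 LC = 1 LE rule compare to 800 LEs.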

Page 12: Clock Management

• All parts of a digital circuit need to be synchronized to a desired clock signal.
• If a circuit is large, complex, and operating at high frequencies, the clock propagation delay and clock skew have a great impact on performance.
• Therefore, providing a zero-delay clock signal to all parts of an FPGA becomes crucial.
• The solution is to divide the FPGA into regions that can work at different frequencies, called clock domains.

Clock management comprises two basic functions:
• remove clock skew and propagation delay
• generate new clock signals with different frequencies and/or phases

Page 13: Removing Clock Skew

It can be done using:
• DLLs – Delay-Locked Loops (Xilinx)
• PLLs – Phase-Locked Loops (Altera)

Figure 3a - DLL block diagram
Figure 3b - PLL block diagram

Both compensate for the delay generated on the routing network inside the FPGA, providing a zero-delay clock signal to different parts of the FPGA.

Page 14: Delay-Locked Loop

• The delay line produces a delayed version of the input clock CLKIN.
• The clock distribution network routes the clock to the FPGA interior and to the feedback CLKFB pin.
• The control logic samples the input clock and the feedback clock in order to adjust the delay line.
• The delay line consists of an array of delay elements, typically CMOS voltage-controlled inverters connected in series.

The DLL works by inserting delay between the input clock and the feedback clock until the two rising edges align, putting the two clocks in phase.

When the two clocks are in phase, the DLL "locks".

Thus, the DLL output clock compensates for the delay in the clock distribution network.
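
The locking behaviour described above can be illustrated with a toy model. It is a sketch under simplifying assumptions, not a description of any Xilinx circuit: the delay line is a chain of identical taps, the control logic adds one tap at a time, and "locked" means the CLKFB edge is within half a tap of a CLKIN edge. The function name and the default tap size are hypothetical.

```python
def dll_lock(period_ps, network_delay_ps, tap_ps=50):
    """Return (taps, inserted_delay_ps) once CLKFB is aligned with CLKIN.

    All times are integers in picoseconds.
    """
    max_taps = period_ps // tap_ps + 1
    for taps in range(max_taps + 1):
        # Total delay seen at the feedback pin: delay line + distribution network.
        feedback_ps = taps * tap_ps + network_delay_ps
        # Distance from the feedback edge to the nearest CLKIN rising edge.
        phase_ps = feedback_ps % period_ps
        error_ps = min(phase_ps, period_ps - phase_ps)
        if 2 * error_ps <= tap_ps:
            return taps, taps * tap_ps
    raise RuntimeError("delay line too short to lock")
```

With a 10 ns clock and 1.2 ns of distribution delay, the model inserts 8.8 ns so that the total delay equals exactly one clock period, which is what makes the distributed clock appear to have zero delay.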

Page 15: Phase-Locked Loop

• Instead of a delay line, the PLL uses a voltage-controlled oscillator which generates a clock signal that approximates the input clock CLKIN.
• Control logic, consisting of a phase detector and filter, adjusts the oscillator frequency and phase to compensate for the clock distribution delay.
• When the clocks are aligned the PLL "locks".
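
A comparable toy model for the PLL is sketched below. It is heavily simplified and only an assumption about how one might illustrate the idea: the oscillator already runs at the right frequency, the phase detector measures the offset between the fed-back clock and CLKIN, and a purely proportional loop filter (the hypothetical `gain` parameter) nudges the oscillator phase each step.

```python
def pll_lock(network_delay_ns, period_ns, gain=0.5, tol_ns=1e-3, max_steps=1000):
    """Return (steps, lead_ns): how far the oscillator must lead CLKIN to lock."""
    lead_ns = 0.0  # oscillator phase lead relative to CLKIN
    for step in range(max_steps):
        # Phase detector: offset of the feedback edge from the CLKIN edge,
        # wrapped into the range (-period/2, period/2].
        error = (network_delay_ns - lead_ns) % period_ns
        if error > period_ns / 2:
            error -= period_ns
        if abs(error) < tol_ns:
            return step, lead_ns
        # Loop filter: proportional correction of the oscillator phase.
        lead_ns += gain * error
    raise RuntimeError("PLL did not lock")
```

Unlike the tap-by-tap DLL model, the correction here is continuous, so the residual error shrinks geometrically until it falls below the tolerance.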

Page 16: PLL vs. DLL

PLL
Advantage: frequency synthesis is easier because of the oscillator
Drawback: the oscillator accumulates phase error

DLL
Advantage: does not accumulate phase error
Drawback: frequency synthesis is more difficult

Altera uses PLLs and Xilinx uses DLLs.

Page 17: Clock Generation & Phase Shifting

• Besides clock skew elimination, DLLs (PLLs) are also used for:
  • frequency multiplication and division
  • duty-cycle regulation
  • phase shifting
• Clock managers need to be resistant to temperature/voltage variations.

Clock manipulation dramatically simplifies the design and improves performance.

At the same time it provides many design alternatives.
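
The arithmetic behind frequency multiplication/division and phase shifting is simple enough to sketch. The helper names and the multiply/divide parameters are illustrative only and are not taken from any vendor's clock-manager interface.

```python
from fractions import Fraction


def synthesize(f_in_mhz, multiply, divide):
    """Output frequency in MHz for f_out = f_in * multiply / divide."""
    return Fraction(f_in_mhz) * multiply / divide


def phase_shift_ps(f_mhz, degrees):
    """Time offset in picoseconds for a phase shift given in degrees."""
    period_ps = Fraction(1_000_000) / Fraction(f_mhz)
    return period_ps * Fraction(degrees) / 360
```

For instance, a 50 MHz input multiplied by 3 and divided by 2 gives 75 MHz, and a 90 degree shift of a 100 MHz clock corresponds to a quarter of its 10 ns period.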

Page 18: Embedded Memory

• Using LUTs as registers does not provide enough space or versatility.
• Time-dependent applications that perform many computations need a dedicated built-in memory.
• The main advantages of embedded (built-in) memory are:
  • short access time
  • high bandwidth
  • great versatility

It can behave like:
• RAM
• ROM
• Buffer (FIFO, LIFO, etc.)
• Cache
• Shift registers
• etc.
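
As one example of this versatility, a FIFO buffer is just a RAM array plus a read pointer and a write pointer. The class below is a behavioural sketch of that idea; the class and attribute names are invented for illustration and do not correspond to any vendor memory primitive.

```python
class RamFifo:
    """FIFO behaviour layered on a fixed-size RAM array."""

    def __init__(self, depth):
        self.ram = [0] * depth  # the underlying memory block
        self.depth = depth
        self.wr = 0             # write address
        self.rd = 0             # read address
        self.count = 0          # number of stored words

    def push(self, value):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.ram[self.wr] = value
        self.wr = (self.wr + 1) % self.depth  # wrap around the RAM
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        value = self.ram[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return value
```

Changing only the pointer logic (for example, popping from the write side) would turn the same RAM into a LIFO, which is the sense in which one memory block can play several roles.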

Page 19: DSP Capabilities

DSP – Digital Signal Processing

• The majority of FPGA applications require some sort of DSP.
• In order to increase efficiency, DSP computations are executed in parallel (pipelining).
• Special DSP units have been developed to fully exploit the FPGA's adaptable structure.
• These units are designed to optimize execution of commonly used DSP algorithms: filtering, encoding/decoding, equalization, modulation, FFT, etc.
• They usually contain: multipliers (in parallel), accumulators, adders and shift registers
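
To show how these building blocks fit together, here is a minimal FIR filter sketch written in terms of a shift register, multipliers and an accumulator. It runs sequentially in software, whereas a DSP block would evaluate the multiplications in parallel; the function name is illustrative.

```python
def fir(samples, coeffs):
    """Filter a sample stream with the given coefficients, one output per input."""
    taps = [0] * len(coeffs)  # shift register holding the most recent samples
    out = []
    for x in samples:
        taps = [x] + taps[:-1]  # shift the new sample in
        acc = 0                 # accumulator
        for c, t in zip(coeffs, taps):
            acc += c * t        # one multiplier per tap in hardware
        out.append(acc)
    return out
```

Feeding a single 1 followed by zeros returns the coefficients themselves, which is a quick way to check the structure.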

Page 20: I/O Compatibility

• As FPGAs continue to grow in size and capacity, more complex systems are designed for them, demanding an increased variety of I/O standards.
• Furthermore, as system-clock speeds continue to increase, the need for high-performance I/O becomes more important.
• Modern bus applications, pioneered by the most influential companies, are commonly introduced with a new I/O standard, tailored specifically to the needs of that application.

The bus I/O standards provide specifications to other vendors who create products designed to interface with these applications.

Each standard often has its own specifications for: current, voltage, I/O buffering and termination techniques.

Page 21: I/O Compatibility (2)

• Interfaces are implemented in I/O blocks.
• I/O blocks are parts of the FPGA architecture positioned peripherally, connected to I/O pins and to internal interconnects.
• I/O blocks are grouped into banks: groups of neighboring pins which use the same or compatible I/O standards at the same time.

Page 22: I/O Compatibility (3)

• An I/O block usually contains:
  • programmable I/O buffers: programmable so they can adjust to different I/O standards
  • D-FFs: used as optional delay elements or registers
  • pull-up/down resistors: used to assert or de-assert pins that would otherwise float
  • delay array: provides a programmable delay of I/O signals
  • keeper circuit: keeps the last state on a bus if all other drivers are in the High-Z state
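
The roles of the keeper circuit and the pull-up/down resistors can be illustrated with a small behavioural model of a shared pin. This is a sketch only; the function name, the `HIGH_Z` marker, and the rule that the keeper takes precedence over the pull resistor are assumptions made for the example.

```python
HIGH_Z = None  # marker for a driver that is not driving the bus


def resolve_bus(drivers, last_value, keeper=True, pull=None):
    """Return the logic level seen on the bus.

    drivers    -- list of 0, 1 or HIGH_Z, one entry per driver
    last_value -- level the bus held before this evaluation
    keeper     -- whether a keeper circuit is enabled
    pull       -- 1 for pull-up, 0 for pull-down, None for no resistor
    """
    active = [d for d in drivers if d is not HIGH_Z]
    if active:
        if len(set(active)) > 1:
            raise ValueError("bus contention: drivers disagree")
        return active[0]
    if keeper:
        return last_value  # keeper holds the last driven state
    return pull            # pull resistor level, or None if the pin floats
```

With every driver in High-Z, the keeper returns the previous level, a pull-up returns 1, and with neither the result is undefined (represented here by `None`).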

Page 23: Software Support

• Development of an FPGA-based hardware system can be divided into the following stages:
  • system design & synthesis
  • design implementation
  • on-chip verification

Figure 4a - Altera design flow diagram
Figure 4b - Xilinx design flow diagram

Page 24: System Design Stage

• Begins with the design entry phase using:
  • HDL – Hardware Description Language (like VHDL or Verilog)
  • schematic editor
• Software solutions offer complete integrated environments for this stage.
• A wide variety of FPGA-ready component libraries are available, ranging from simple processors, peripheral components and controllers down to general logic (gates, counters, decoders, etc.).
• The software supports hierarchical design entry.

Page 25: System Design Stage (2)

• Once the hardware design is complete it is synthesized: a process that transforms it from HDL form into a low-level gate form, called the RTL (Register Transfer Level) description.
• The system design stage is platform independent.

The resulting RTL description of our system can be fitted into any FPGA.

Figure 5 - HDL and schematic representation of a BCD counter

Page 26: Design Implementation Stage

• Commonly called the Place-And-Route stage.
• Place-And-Route tools take the input RTL netlist for the design and map the logic into the architectural resources of the FPGA.
• Then, the best location for these blocks is found, based on their interconnections and desired performance.
• Finally, the interconnects are routed, and pins assigned.

Page 27: Design Implementation Stage (2)

• This stage is platform-dependent, since our design is implemented in an actual FPGA architecture.
• Therefore, place-and-route tools are developed by the FPGA vendors.
• They are developed to take full advantage of the FPGA architecture, and to provide optimum performance for a given design.
• Many analysis and simulation tools are provided for this stage.

The result of this stage is a configuration file which is loaded into the FPGA at startup.

Page 28: On-Chip Verification Stage

• This stage is executed once the design has been loaded into the FPGA.
• It gives the developer the possibility of real-world debugging.
• Special cables are supplied with FPGA development kits, for connecting FPGAs to a PC or a workstation.
• This provides a means of reading the contents of internal registers and memory.

Page 29: Software Support (2)

• Both Xilinx and Altera offer complete software development kits that guide users through all 3 stages of system design.
  • Altera offers Quartus II
  • Xilinx offers ISE
• Third-party software tools can be used in the system design stage as well.

Page 30: "Intellectual Property" Blocks

• Complete designs of some complex systems, written in HDL by FPGA manufacturers and optimized to run on their FPGAs, e.g. microcontrollers, microprocessors, etc.
• CPUs:
  Altera: 32-bit Nios II
  Xilinx: 32-bit MicroBlaze

Figure 6 - Block diagram of Altera's 16-bit Nios processor

Page 31: Volume Production Solutions

• When FPGA-based designs move into volume production, the main issue is cost reduction!
• Xilinx and Altera have different approaches:

Xilinx offers specialized EasyPath FPGAs:

Once the clients have developed their system on an FPGA, they send it to Xilinx.

After 8 weeks they get back the optimized FPGAs with exactly the same functionality.

These optimized FPGAs are 30%-80% less expensive when mass produced. They are positioned as replacements for structured ASICs, and take less time to complete.

Altera offers a service called HardCopy:

It is a migration path from the FPGA to a structured ASIC. Altera developed structured ASICs with a fine-grained cell structure (HCells) that matches the logic elements (LEs) of Altera's FPGAs.

That way Stratix LEs are mapped to equivalent logic elements in the corresponding HardCopy device.

If a Stratix LE is not used in the FPGA design, then it is not mapped to the HardCopy device, yielding a more efficient mapping of the prototyped design.

Page 32: Overviews & Comparisons

Page 33: Low-end FPGA family

Page 34: Overview

• Altera's most recent low-end FPGA family
• Introduced in 2004, first shipped in February 2005
• 1.2V core, 90nm process

Page 35: Packaging

• Commercial grade and industrial grade devices are offered.

Page 36: Functional Description

• Two-dimensional row/column-based architecture to implement custom logic.
• Column and row interconnects of varying speeds provide signal interconnects between Logic Array Blocks (LABs), embedded memory, and multipliers.
• The logic array consists of LABs, with 16 logic elements (LEs) in each LAB.

Page 37: Functional Description (2)

• Density from 4,608 to 68,416 LEs.
• Up to four phase-locked loops (PLLs).
• The global clock network consists of up to 16 global clock lines that reach the entire device.

Page 38: Functional Description (3)

• M4K memory blocks are true dual-port memory blocks with 4K bits of memory.
• They work at up to 260 MHz.
• These blocks are arranged in columns across the device, in between certain LABs.
• Cyclone II devices offer from 119 to 1,152 Kbits of embedded memory.

Page 39: Functional Description (4)

• Each embedded multiplier block can implement either two 9 × 9-bit multipliers, or one 18 × 18-bit multiplier.
• Embedded multipliers are arranged in columns across the device.
• Up to 250-MHz performance.

Page 40: Functional Description (5)

• Each I/O pin is fed by an IOE (Input Output Element) located at the periphery of the device.
• I/O pins support various single-ended and differential I/O standards.
• Each IOE contains a bidirectional I/O buffer and three registers for registering input, output, and output-enable signals.

LE Unit

4-input LUT: acts as a function generator for logic functions with 4 variables, or as a 16-bit register.

Programmable register: can be configured as a D, T, JK or SR flip-flop. Used optionally.

Carry logic

Cyclone II LE can operate in 2 modes:

• normal mode
• arithmetic mode
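The role of a 4-input LUT as a function generator can be illustrated with a small software model. This is only a sketch: the names `make_lut4` and `lut4_read` are hypothetical, and the model assumes nothing about the real device beyond the idea that a 4-input LUT stores one output bit for each of the 16 input combinations.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input boolean function.

    Hypothetical helper: 'func' takes four bits (a, b, c, d) and returns a bit.
    """
    table = []
    for index in range(16):
        a = (index >> 0) & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the function by using the four inputs as a table address."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return table[index]


# Example: "program" the LUT as a 4-input XOR (parity) function.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(lut4_read(xor4, 1, 0, 1, 1))
```

Any function of 4 variables fits in the same 16 entries; only the table contents change, which is what makes the LUT programmable.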

LE – Normal Mode

• Suitable for general logic applications and combinatorial functions.

LE – Arithmetic Mode

• Implements a 2-bit full adder and basic carry chain.
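As an illustration of what a full adder with a carry chain computes, the following sketch chains textbook one-bit full adders through their carry signals. It is a generic ripple-carry model, not a description of the Cyclone II circuitry; `full_adder` and `ripple_add` are hypothetical names chosen for this example.

```python
def full_adder(a, b, carry_in):
    """Textbook one-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(x, y, width):
    """Add two unsigned 'width'-bit numbers by chaining full adders.

    The carry out of each bit position feeds the next one, which is the
    role a carry chain plays between adjacent logic elements.
    Returns (result, final_carry).
    """
    carry = 0
    result = 0
    for i in range(width):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry


print(ripple_add(5, 3, 4))
```

The final carry is returned separately so that overflow of the chosen width stays visible to the caller.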

LABs and Interconnects

• LAB - Logic Array Block

Logic Array Block: consists of 16 LEs connected with carry and register chains.

Local Interconnect: transfers signals between LEs in the same LAB.

Row Interconnect: connects multiple LABs.

Column Interconnect: connects multiple LABs.

Clock Management

• Clock network features:

– Up to 16 Global Clock Networks
– Up to 4 PLLs
– Dynamic clock source selection, enable and disable

• Global clock networks spread throughout the entire device.

• They provide clocks for all resources within the device, such as IOEs, LEs, memory blocks, and embedded multipliers.

• They are driven by external clock sources (via clock pins), PLL outputs or the logic array signals.

• Global clock lines can also be used for general purpose control signals.

Clock Management (2)

• There is one clock control block for each global clock network.

• They are arranged on the device periphery.

• Clock control blocks are used to select/enable/disable a global clock network.

• Multiplexers are used with these clocks to form 6-bit buses to feed LABs and IOEs.

Clock Management (3)

• PLLs are located at the corners:

Clock Management (4)

• Cyclone II PLLs provide:

– Clock skew elimination
Provides zero-delay clock signal in every part of FPGA.

– Clock multiplication and division
Ranges from x(1/128) up to x32.

– Phase shifting
Programmable phase shifts in increments of at least 45°.

– Programmable duty-cycle
Generate clock outputs with a variable duty cycle.

– Manual clock switchover
Enables you to switch between two reference input clocks for applications that may require support for clocks with two different frequencies.

Embedded Memory

• Consists of columns of M4K memory blocks:

Embedded Memory (2)

The M4K blocks support the following features:

– 4,608 RAM bits (4 Kbits + parity bits, one for each byte)

– 250-MHz performance

– True dual-port memory
Supports any combination of two-port operations: 2 reads, 2 writes, or 1 read and 1 write at different clock frequencies.

– Simple dual-port memory
Simultaneous reads and writes are supported.

– Single-port memory
Simultaneous reads and writes are not allowed.

– Shift register

Embedded Memory (3)

The M4K blocks support the following features:

– FIFO buffer

– ROM
When configured as RAM or ROM, you can use an initialization file to preload the memory contents.

– Byte enable
Allows the input data to be masked so the device can write to specific bytes.
The unwritten bytes retain the previously written value.

– Address clock enable
Used to hold the previous address value for as long as the signal is enabled.
This feature is useful in handling cache misses.

– Content Addressable Memory (CAM)
Associative memory
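The byte-enable behaviour listed above (masked bytes keep their previous value) can be sketched in software. This is a behavioural model only; the function name `write_with_byte_enable`, the 2-byte default word width, and the convention that bit i of the enable mask selects byte i are assumptions made for the example, not details taken from the slides.

```python
def write_with_byte_enable(old_word, new_word, byte_enable, num_bytes=2):
    """Return the stored word after a byte-masked write.

    Assumption for this sketch: bit i of 'byte_enable' enables byte i,
    with byte 0 being the least significant byte.
    """
    result = old_word
    for i in range(num_bytes):
        if (byte_enable >> i) & 1:
            mask = 0xFF << (8 * i)
            # Replace only the enabled byte; other bytes keep their old value.
            result = (result & ~mask) | (new_word & mask)
    return result


stored = 0x1234
stored = write_with_byte_enable(stored, 0xABCD, 0b01)  # write low byte only
print(hex(stored))
```

With an enable mask of zero the stored word is returned unchanged, which matches the statement that unwritten bytes retain the previously written value.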

Embedded Multipliers

• Located in columns, each block one LAB row high:

Embedded Multipliers (2)

• Multiplier blocks are optimized for intensive Digital Signal Processing functions, such as finite impulse response (FIR) filters, Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT) functions, etc.

• Operate at up to 250 MHz.

Embedded multipliers can work in 2 basic operational modes:

• One 18b x 18b multiplier
• Two independent 9b x 9b multipliers

Embedded Multipliers (3)

• The embedded multiplier consists of the following elements:

– Multiplier block
– Input and output registers
– Input and output interfaces

These signals control operand representation: signed or unsigned

Input Register (used optionally)

Output Register (used optionally)

Input/Output Elements

• IOEs (Input Output Elements) are located in I/O blocks at the periphery:

Input/Output Elements (2)

IOEs support many features, including:

– Differential and single-ended I/O standards

– 3-state buffers

– Programmable input and output delays

– Programmable pull-up resistors during device configuration and in User Mode

– Bus-hold circuitry

– Joint Test Action Group (JTAG) boundary-scan test (BST) support

– etc.

Input/Output Elements (3)

Output Register (used optionally)

Input Register (used optionally)

Programmable delay chain (for input)

Output Enable Register (used optionally)

I/O pin

Programmable Pull-Up resistor

Bus-hold (keeper) circuit

Prevents damage from high voltage

Input/Output Elements (4)

IOEs support most conventional and high-speed I/O protocols:

– LVTTL (3.3V, 2.5V, 1.8V)
– LVCMOS (3.3V, 2.5V, 1.8V, 1.5V)
– SSTL (classes I, II) and differential
– HSTL (classes I, II) and differential
– PCI and PCI-X
– etc.

Input/Output Elements (5)

• I/O pins on Cyclone II devices are grouped together into I/O banks.

• Each bank has a separate power bus.

• To accommodate voltage-referenced I/O standards, each I/O bank has a VREF bus.

• Multiple voltage-referenced standards can be supported in an I/O bank as long as they use the same VREF and a compatible VCCIO value.

• For example:

When VCCIO is 3.3V, a bank can support LVTTL, LVCMOS, and 3.3V PCI for inputs and outputs.

Input/Output Banks

Start-Up Configuration

• Logic, circuitry, and routing switches are configured with CMOS SRAM elements that require configuration data to be loaded on each power-up.

• The process of physically loading the SRAM data into the device is called configuration.

• During initialization, which occurs immediately after configuration, the device resets registers, enables I/O pins, and begins to operate as a logic device.

• Together, configuration and initialization are called command mode.

• Normal device operation is called user mode.

Start-Up Configuration (2)

• Configuration data is loaded with one of three configuration schemes:

• Cyclone II can be configured automatically at system power-up with data stored in a low-cost configuration device or provided by a system controller (Active Serial scheme).

• Cyclone II can also act as controller for other devices in AS configuration scheme.

Start-Up Configuration (3)

• Configuration data is loaded with one of three configuration schemes:

• Cyclone II devices can also be configured while in user mode, via a serial data stream, using the Passive Serial (PS) configuration mode.

• The PS mode also enables microprocessors to treat Cyclone II devices as memory and configure them by writing to a virtual memory location, simplifying reconfiguration.

low-end FPGA family

Overview

• Spartan-3 was first announced in April 2003.

• Its latest version (2005) is called the Spartan-3E family.

• 90nm process

Packaging

• Commercial grade and industrial grade devices are available.

Functional Description

• The Spartan-3 family architecture consists of five fundamental, programmable functional elements:

• Configurable Logic Blocks (CLBs)
Contain RAM-based Look-Up Tables (LUTs) to implement logic, and storage elements that can be used as flip-flops or latches.

• Digital Clock Manager (DCM) blocks
Provide fully digital solutions for distributing, delaying, multiplying, dividing, and phase shifting clock signals.

• Block RAM
Provides data storage in the form of 18-Kbit dual-port blocks.

• Multiplier blocks
Accept two 18-bit binary numbers as inputs and calculate the product.

• Input/Output Blocks (IOBs)
Control the flow of data between the I/O pins and the internal logic of the device. 24 I/O standards supported.

Spartan-3 Floorplan

CLB Overview

• CLBs constitute the main logic resource for implementing synchronous as well as combinatorial circuits.

• Each CLB comprises 4 interconnected slices, as shown below.

• These slices are grouped in pairs. Each pair is organized as a column with an independent carry chain.

CLB Overview (2)

• All four slices have the following elements in common:

– 2 logic function generators (4-input LUTs)

– 2 storage elements

– wide-function multiplexers

– carry logic

– arithmetic gates

• Both the left-hand and right-hand slice pairs use these elements to provide logic, arithmetic, and ROM functions.

CLB

4-input LUT "G"

4-input LUT "F"

Top portion

Bottom portion

Carry chain between two logic cells in a CLB

Blue-dotted elements are used for implementing 16-bit shift registers. Found only in left-hand CLBs.

CLB upper portion - ENLARGED

OR gate, used for logic and arithmetic functions

AND gate, used for logic and arithmetic functions

Optionally used register. Programmable as latch or D-FF

Flow control multiplexers

Interconnects

• Interconnects pass signals among various functional elements of Spartan-3 devices.

• There are four kinds of interconnects:

• Long lines
Connect every sixth CLB in a row/column.
Because of their low capacitance, these lines are well-suited for carrying high-frequency signals with minimal skew.
They can also serve as replacements for global clock lines.

• Hex lines
Connect every third CLB in a row/column.

• Double lines
Connect every other CLB in a row/column.

• Direct lines
Afford any CLB direct access to neighboring CLBs.

Interconnects (2)

Clock Management

• Spartan-3 devices have up to 4 DCM (Digital Clock Manager) blocks.

• DCMs support 3 major functions:

– clock-skew elimination
– frequency synthesis
– phase shifting

• A DCM consists of:

– Delay-Locked Loop (DLL)
– Digital Frequency Synthesizer
– Phase Shifter
– Status Logic

Clock Management - DLL

• 2 clock inputs (input + feedback), 7 clock outputs
• 2 operating modes: Low Frequency and High Frequency (3 outputs enabled)

Programmable delay blocks called taps

Outputs

Clock Management (3)

• The DFS component generates output clock signals, the frequency of which is a product of the clock frequency at the CLKIN input and a ratio of two user-defined integers:

$$ f_{OUT} = f_{IN} \cdot \frac{C_{MUL}}{C_{DIV}}; \quad C_{MUL} \in [2, 32], \quad C_{DIV} \in [1, 32] $$

• This gives the following output range: from x(1/16) up to x32

• Besides 90°, 180° and 270° phase-shifted signals from the DLL, the PS component provides a still finer degree of control, with resolution up to 1/256 of the input clock cycle. (Low Frequency mode only)

• Spartan-3 devices have 8 global clock inputs. These inputs provide access to a low-capacitance, low-skew network that is well-suited to carrying high-frequency signals.
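The DFS relation on this slide (output frequency equals the CLKIN frequency times a ratio of two integers, with the multiplier in [2, 32] and the divider in [1, 32]) can be checked with a few lines of arithmetic. The function name `dfs_output_frequency` is hypothetical, and the sketch deliberately ignores any device-specific limits on the absolute input or output frequency, which the slides do not state.

```python
from fractions import Fraction


def dfs_output_frequency(f_in_mhz, c_mul, c_div):
    """Return f_in * c_mul / c_div as an exact fraction (in MHz).

    Only the integer ranges given on the slide are enforced; real devices
    also restrict the allowed frequency range, which is not modelled here.
    """
    if not 2 <= c_mul <= 32:
        raise ValueError("c_mul must be in [2, 32]")
    if not 1 <= c_div <= 32:
        raise ValueError("c_div must be in [1, 32]")
    return Fraction(f_in_mhz) * Fraction(c_mul, c_div)


# A 50 MHz input multiplied by 7/2.
print(dfs_output_frequency(50, 7, 2))
```

The extreme settings, 2/32 and 32/1, reproduce the x(1/16) to x32 range quoted on the slide.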

Clock Management (4)

Figure 7 - Spartan-3 Global Clock Networks (left). Duty cycle correction (right)

Clock multiplexers route global clock lines to local clock networks and to Digital Clock Managers

Global clock inputs

Embedded Memory (Block RAM)

• Organized as configurable, synchronous blocks, in up to 4 columns.

• 200 MHz performance

• Each block contains 18K bits of fast static RAM, 16K bits for data storage + 2K bits for parity bits.

Embedded Memory (2)

• Physically, the block RAM memory has two independent access ports, labeled Port A and Port B (dual port memory).

• The structure is fully symmetrical. Both ports are interchangeable and both ports support data read and write operations. Each port has its own clock.

Embedded Multipliers

• 4 to 104 dedicated 18x18-bit multipliers.

• Operands are in two's complement form: 18-bit signed or 17-bit unsigned.

• One multiplier is matched to each Block RAM to ensure efficiency.

• Cascading multipliers permits more than 3 operands, as well as operands wider than 18 bits.

• Multiplication with inputs more than 18 bits wide is possible by decomposing the multiplication process into smaller subprocesses.

Figure 8 - 22x16-bit multiplier implementation
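The idea of decomposing a wide multiplication into smaller ones can be sketched as follows. This is a generic illustration for unsigned operands, not a reconstruction of the arrangement shown in Figure 8: the 22-bit operand is split into a 17-bit low part and a 5-bit high part so that each partial product fits the 17-bit unsigned operand width quoted above. The names `mul_primitive` and `wide_multiply` are hypothetical.

```python
CHUNK_BITS = 17  # unsigned operand width of one multiplier, as quoted on the slide


def mul_primitive(a, b):
    """Stand-in for one dedicated multiplier with 17-bit unsigned operands."""
    limit = 1 << CHUNK_BITS
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operand does not fit in 17 unsigned bits")
    return a * b


def wide_multiply(a, b):
    """Multiply a 22-bit unsigned 'a' by a 16-bit unsigned 'b'.

    'a' is split as a = a_high * 2**17 + a_low, so
    a * b = a_low * b + (a_high * b) * 2**17.
    """
    if not (0 <= a < (1 << 22) and 0 <= b < (1 << 16)):
        raise ValueError("operands out of range for this sketch")
    a_low = a & ((1 << CHUNK_BITS) - 1)
    a_high = a >> CHUNK_BITS
    low_product = mul_primitive(a_low, b)
    high_product = mul_primitive(a_high, b)
    return low_product + (high_product << CHUNK_BITS)


print(wide_multiply(0x3FFFFF, 0xFFFF) == 0x3FFFFF * 0xFFFF)
```

Signed operands would need extra care with sign extension of the partial products, which this sketch leaves out.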

Input/Output Blocks

• An Input/Output Block (IOB) provides a programmable, bidirectional interface between an I/O pin and the FPGA's internal logic.

• There are three main signal paths within an IOB (each has an optional pair of storage elements, used as latches or D-FFs):

– Input path
Carries data from the I/O pin to the internal logic.

– Output path
Carries data from the FPGA's internal logic through a multiplexer and then a 3-state buffer (driver) to the I/O pin.

– 3-state path
Determines when the output buffer (driver) is high impedance.

IOB

3-state Path

Output Path

Input Path

Optional storage element

Programmable output buffer

I/O pin

Part of IOB - ENLARGED

Digitally controlled impedance. Used to match the impedance of the transmission line

Circuitry for implementing various I/O standards

VREF pin

I/O pin from adjacent IOB, used for differential I/O standards

Programmable Pull-Up and Pull-Down resistors

Input/Output Blocks (4)

• Support for 18 single-ended and 6 differential I/O standards.

Differential standards are implemented by using a pair of IOBs.

• IOBs and pins are grouped into banks.

The need to supply VREF and VCCO imposes constraints on which standards can be used in the same bank.

• Supported I/O standards include:

– LVTTL (3.3V)
– LVCMOS (3.3V, 2.5V, 1.8V, 1.5V)
– SSTL (classes I, II) and differential
– HSTL (classes I, II, III) and differential
– PCI 3.0V
– etc.

Start-Up Configuration

• Spartan-3 devices are configured by loading configuration data into internal configuration memory.

• Several configuration modes are supported, selectable via mode pins M0, M1, M2.

Page 87: Ognjen Šćekić 1/103 Altera vs. Xilinx Ognjen Šćekić ogi@cg.yu prof. dr Veljko Milutinović vm@etf.bg.ac.yu.

Ognjen ŠćekićOgnjen Šćekić 8787/103/103

Start-Up Configuration (2)

• In Slave Serial mode, the FPGA receives configuration data in bit-serial form from a serial PROM or other serial source of configuration data.

• The CCLK pin on the FPGA is an input in this mode.

• Multiple FPGAs can be daisy-chained for configuration from a single source. After a particular FPGA has been configured, the data for the next device is routed internally to the DOUT pin.

[Figure: Slave-Serial configuration mode]
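
The daisy-chain behaviour described above can be sketched as a toy simulation: each device fills its own configuration memory from DIN and, once full, forwards every further bit on DOUT. This is a simplified model under stated assumptions, not the real protocol. The class `SerialFpga`, the function `configure_chain` and the bitstream lengths are all hypothetical, and real bitstreams also carry headers and synchronisation words that are ignored here.

```python
from typing import Iterable, Iterator, List


class SerialFpga:
    """Toy model of one FPGA in a serial configuration daisy chain."""

    def __init__(self, name: str, config_bits: int) -> None:
        self.name = name
        self.config_bits = config_bits  # hypothetical bitstream length
        self.memory: List[int] = []     # stands in for configuration memory

    @property
    def configured(self) -> bool:
        return len(self.memory) == self.config_bits

    def shift(self, din: Iterable[int]) -> Iterator[int]:
        """Consume bits from DIN; once configured, pass the rest to DOUT."""
        for bit in din:
            if not self.configured:
                self.memory.append(bit)
            else:
                yield bit


def configure_chain(chain: List[SerialFpga], bitstream: Iterable[int]) -> List[int]:
    """Feed one bitstream through the chain; return any leftover bits."""
    stream: Iterable[int] = iter(bitstream)
    for fpga in chain:
        stream = fpga.shift(stream)  # DOUT of one device feeds DIN of the next
    return list(stream)


first, second = SerialFpga("first", 4), SerialFpga("second", 3)
leftover = configure_chain([first, second], [1, 0, 1, 1, 0, 0, 1])
print(first.memory, second.memory, leftover)
```

The single source only needs the concatenated bitstreams in chain order; no device needs to know how many others follow it.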

Start-Up Configuration (3)

• In Master Serial mode, the master FPGA drives the configuration clock on the CCLK pin to the Xilinx Serial PROM, which, in response, provides bit-serial data to the FPGA’s DIN input.

• After the master FPGA has finished configuring, it passes data on its DOUT pin to the next FPGA device in a daisy-chain.

[Figure: Master-Serial configuration mode]

Start-Up Configuration (4)

• In Master Parallel mode, the FPGA configures from byte-wide data, and the FPGA itself supplies CCLK (configuration clock).

CCLK behaves as a bidirectional I/O pin.

• In Slave Parallel mode, byte-wide data is written into the FPGA, with a BUSY flag controlling the flow.

An external source provides data, CCLK, a Chip Select (CS_B) signal and a Write signal (RDWR_B).
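
The BUSY-controlled flow in Slave Parallel mode can be sketched with a small host-side write loop. The model below is an illustration only: the internal FIFO depth, the drain rate and the names `SlaveParallelPort`, `write_bitstream` and `flush` are assumptions made up for the example. It treats CS_B and RDWR_B as active-low, with RDWR_B low meaning a write.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class SlaveParallelPort:
    """Toy byte-wide slave port with a BUSY flag (timing is hypothetical)."""

    def __init__(self, depth: int = 2, drain_every: int = 2) -> None:
        self.fifo: Deque[int] = deque()
        self.depth = depth              # assumed internal buffer depth
        self.drain_every = drain_every  # assumed: one byte consumed per N clocks
        self.cycles = 0
        self.memory: List[int] = []     # stands in for configuration memory

    @property
    def busy(self) -> bool:
        return len(self.fifo) >= self.depth

    def clock(self, cs_b: int, rdwr_b: int, data: Optional[int]) -> bool:
        """Model one CCLK edge; return True if the byte was accepted."""
        self.cycles += 1
        accepted = cs_b == 0 and rdwr_b == 0 and data is not None and not self.busy
        if accepted:
            self.fifo.append(data)
        if self.cycles % self.drain_every == 0 and self.fifo:
            self.memory.append(self.fifo.popleft())
        return accepted


def write_bitstream(port: SlaveParallelPort, data: Iterable[int]) -> int:
    """Host loop: hold each byte on the bus until the port accepts it."""
    clocks = 0
    for byte in data:
        while not port.clock(cs_b=0, rdwr_b=0, data=byte):
            clocks += 1  # BUSY was high, so the same byte is retried
        clocks += 1
    return clocks


def flush(port: SlaveParallelPort) -> None:
    """Keep clocking with the port deselected until the buffer drains."""
    while port.fifo:
        port.clock(cs_b=1, rdwr_b=1, data=None)


port = SlaveParallelPort()
used = write_bitstream(port, [0x10, 0x20, 0x30, 0x40])
flush(port)
print(used, [hex(b) for b in port.memory])
```

The point of the sketch is that the host never drops a byte: when BUSY is asserted it simply repeats the write on the next clock.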

Altera Stratix II: high-end FPGA family

Quick Overview

• Launched in February 2004.

• 1.2V core, 90nm process

• Approaching 180,000 LEs

• Up to 9 Mbits of on-chip TriMatrix memory for memory-demanding applications.

• Up to 96 DSP blocks with up to 384 (18-bit × 18-bit) multipliers for efficient implementation of high performance filters and other DSP functions.

• Various high-speed external memory interfaces are supported.

• Complete clock management solution with clock frequency of up to 550 MHz and up to 12 phase-locked loops (PLLs).
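
To put the DSP figures above in perspective, the short calculation below derives the number of 18×18 multipliers per DSP block and estimates how many FIR filter taps could run fully in parallel. It is a back-of-the-envelope sketch: the one-multiplier-per-tap mapping and the halving for symmetric coefficients are standard filter-design assumptions, not figures from the slides, and `max_parallel_fir_taps` is a hypothetical helper.

```python
DSP_BLOCKS = 96          # from the slide
MULTIPLIERS_18X18 = 384  # from the slide

# Multipliers available in each DSP block when used in 18x18 mode.
per_block = MULTIPLIERS_18X18 // DSP_BLOCKS


def max_parallel_fir_taps(multipliers: int, symmetric: bool = False) -> int:
    """Taps of a fully parallel FIR filter, assuming one multiply per tap.

    With symmetric coefficients, mirrored samples can be added before the
    multiply, so each multiplier serves two taps (assumes the pre-adders
    are built from other logic).
    """
    return 2 * multipliers if symmetric else multipliers


print("18x18 multipliers per DSP block:", per_block)
print("parallel FIR taps:", max_parallel_fir_taps(MULTIPLIERS_18X18))
print("parallel symmetric FIR taps:", max_parallel_fir_taps(MULTIPLIERS_18X18, True))
```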

Quick Overview (2)

• Designers requiring a low-risk cost-reduction path for high-volume production can migrate their Stratix II FPGA designs to structured-ASIC production with HardCopy II devices.

• HardCopy II devices significantly reduce migration risk because they are generated directly from a Stratix II FPGA and preserve the Stratix II architecture.

Quick Overview (3)

• ALM = Adaptive Logic Module

• One of the greatest improvements is the ALM architecture, which allows each module to be configured in various modes.

Xilinx Virtex-4: high-end FPGA family

Quick Overview

• Introduced in 2004

• 1.2V core, 90nm process

• Three high-performance versions LX/SX/FX
- Virtex-4 LX: Logic applications solution
- Virtex-4 FX: Full-featured solution for embedded platform applications
- Virtex-4 SX: Solution for Digital Signal Processing (DSP) applications

• Up to 200,000 logic cells

• Xesium Clock Technology
- Up to 20 Digital Clock Manager (DCM) blocks
- Additional Phase-Matched Clock Dividers (PMCD)
- 32 Global Clock networks

• Up to 10Mb of integrated block memory operating at 500MHz

Quick Overview (2)

• XtremeDSP Slice
- 18x18 signed multipliers
- Up to 100% speed improvement over previous generation devices

• Up to 960 user I/Os

• IBM PowerPC RISC Processor Core (FX only)

Quick Overview (3)

• At the heart of the Virtex-4 family is the new ASMBL architecture.

ASMBL = Advanced Silicon Modular Block

• This new, highly modular ASMBL architecture makes use of advanced packaging technology and eliminates geometric layout constraints associated with traditional chip design.

• Thanks to it, Xilinx can vary the number and ratio of different functional parts to create a family (platform) of different sized devices, each best suited for a certain domain of applications, depending on the desired type of functional attributes.

• This approach enables the right feature mix at the lowest cost, and resulted in 3 platforms of Virtex-4 FPGAs: LX, FX, SX.

Altera vs. Xilinx

Altera vs. Xilinx

• Deciding which of the two is currently better, on the basis of the described features, is an impossible task:

Both of them offer a vast range of FPGAs, at different prices, guaranteed to satisfy any user’s needs.

If we make a feature-to-feature comparison of same-rank FPGAs, we will find that they offer very similar features at very similar prices:

- 90nm process, 1.2V core
- up to 200,000 LCs (LEs)
- maximum internal frequency around 500 MHz
- embedded 18x18 multipliers and enhanced DSP features
- up to 10Mbits of multi-purpose embedded RAM
- support for leading I/O standards and external memory interfaces
- numerous IP blocks (Nios II, MicroBlaze, etc.)
- complete software systems (ISE and Quartus II)

Altera vs. Xilinx (2)

Benchmarking also yields controversial results.

All the benchmarks are performed either by Xilinx or Altera themselves, or by their partners.

Both companies issue whitepapers claiming their FPGAs considerably outperform the opponent’s:

Quote:

“… Our benchmark results show that for high-density 90-nm FPGAs, the Altera Stratix II family commands an average of 39% performance lead over Xilinx Virtex-4 family. For low-cost FPGAs, the Altera 90-nm Cyclone II family provides an average 60% higher performance than the Xilinx 90-nm Spartan-3 family…”

Altera whitepaper, “FPGA Performance Benchmarking Methodology”

Quote:

“… Cyclone II performance, as demonstrated by a suite of customer designs using the most cost effective speed grade, has degraded almost a full speed grade from Quartus II v4.1 to v4.2, and further degradation is indicated for the new v5.0. Spartan-3 design performance is now slightly faster than Cyclone II when comparing the most cost effective speed grade in each device…”

Xilinx whitepaper, “Spartan-3 vs. Cyclone II Performance Analysis”

Altera vs. Xilinx (3)

Is there a way to find out who is better? Let us ask the customers:

Quote:

“… in a survey of more than 350 design teams worldwide, in which respondents were asked to rate their experience with FPGA and EDA companies' products and services, FPGA designers ranked Xilinx highest in reader/customer satisfaction for devices, design tools, service and support, including:

Virtex and Spartan FPGAs - "Xilinx continues to lead the pack in performance and features, and goes the extra mile in explaining how to use their devices for particular class of application."

ISE design tools - "Xilinx has made significant improvements to their tool suite over the past year, particularly in the DSP and embedded design areas."

Support staff, and documentation - "Xilinx consistently sets the standard for support staff and resources, particularly with their robust website and responsible and knowledgeable application engineers."”

FPGA Journal

Conclusion

• It seems that Xilinx is the winner.

• But the competition is closing the gap.

• A careful reader will notice that the stated reasons for Xilinx winning the readers’ award have more to do with client relations than with a great difference in performance.

• One thing, however, is certain:

Altera vs. Xilinx = A satisfied user

Thank you!

The End