Commercial FPGAs: Altera Stratix Family

Post on 23-Feb-2016

Dr. Philip Brisk

Department of Computer Science and Engineering

University of California, Riverside

CS 223

Notes on These Slides

• Altera has disclosed the details of their devices both in online documentation and academic papers

• The academic papers evaluate different design decisions and tradeoffs; the experiments are a bit too specialized for this course.
– Please do not overly emphasize the experimentation in your studies

The Stratix™ Routing and Logic Architecture

D.M. Lewis, et al., International Symposium on FPGAs, 2003

Online documentation

Altera Stratix FPGA

Stratix Logic Element (LE)

Register Feedback Mode

Register Cascade (Shift Regs.)

Logic Array Block (LAB)

Directionally Biased Routing

• Long vertical wires require power drivers
– Fewer vertical wires
• More rows than columns
– More demand for horizontal wires

The Stratix II Logic and Routing Architecture

D.M. Lewis, et al., International Symposium on FPGAs, 2005

Online documentation

Logic Array Block (LAB)

Adaptive Logic Module (ALM)

Adaptive Logic Module (ALM)

Four ALM Operating Modes

• Normal Mode
• Extended LUT Mode
• Arithmetic Mode
• Shared Arithmetic Mode

Normal Mode

LUT Input Utilization

Extended LUT Mode

• Some 7-input logic functions

Arithmetic Mode

Arithmetic Mode Example

R = (X < Y) ? Y : X

(X < Y)
• Compute X-Y using the carry chain
• Only look at the carry output
• Use the carry output to select either X or Y accordingly

Configure the LUTs to pass X through unmodified, and ignore the carry chain outputs
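The compare-and-select trick above can be sketched in software. This is only a behavioral model, not the ALM configuration itself: the function name `max_via_carry` and the `width` parameter are hypothetical, the operands are assumed to be unsigned, and the subtraction is assumed to be done as X + ~Y + 1 so that the carry out of the top bit indicates "no borrow".

```python
def max_via_carry(x: int, y: int, width: int = 8) -> int:
    """Return (x < y) ? y : x for unsigned width-bit operands (hypothetical helper)."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    # X - Y on a width-bit adder, computed as X + ~Y + 1 (assumed encoding).
    total = x + ((~y) & mask) + 1
    # Only the carry out of the top bit is inspected; the difference is ignored.
    carry_out = (total >> width) & 1
    # carry_out == 1 means no borrow (X >= Y); carry_out == 0 means X < Y.
    return x if carry_out else y
```

Note that the difference bits are never used, which mirrors the slide's point that only the carry output matters.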

Shared Arithmetic Mode (3-input Add)
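One common way to add three operands with a single carry chain is carry-save (3:2) compression: the three inputs are first reduced bitwise to a sum word and a carry word, which are then added normally. The sketch below illustrates that general technique; treating it as a model of shared arithmetic mode is an assumption, and the name `three_input_add` and the `width` parameter are hypothetical.

```python
def three_input_add(a: int, b: int, c: int, width: int = 8) -> int:
    """Add three unsigned width-bit values via 3:2 compression (illustrative sketch)."""
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    # Per-bit sum of the three inputs (what a 3-input XOR LUT would produce).
    sum_bits = a ^ b ^ c
    # Per-bit majority gives the carry, which is weighted one position higher.
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    # A single ordinary addition (one carry chain) finishes the job.
    return sum_bits + carry_bits
```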

Register Chain (Shift Registers)

Separates logic and shift register functions

• Cycle 1
– Combinational logic
• Cycles 2..k+1
– Shift by k

ALM Benefits

• Reduced LAB area by 2.6% compared to Stratix
• 15% performance improvement
• When shrinking from a 0.13 µm (Stratix) to 90 nm (Stratix II) technology node
– 51% performance improvement
– 50% area decrease

TriMatrix Embedded Memories

M512 RAM Block

Functions
• 1-port RAM
• 2-port RAM
• FIFO
• ROM
• Shift Register

576 RAM bits (32 x 18), includes parity bits

M4K RAM Block

4,608 RAM bits (128 x 36), includes parity bits

Functions
• 1-port RAM
• 2-port RAM
• True 2-port RAM
• FIFO
• ROM
• Shift Register

M-RAM Block

589,824 RAM bits (4K x 144), includes parity bits

Functions
• 1-port RAM
• 2-port RAM
• True 2-port RAM
• FIFO
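The capacities quoted for the three TriMatrix blocks follow directly from depth times width. The short check below reproduces them; the dictionary name and helper functions are hypothetical, and the split of each 9-bit group into 8 data bits plus 1 parity bit is an assumption used only to show where the block names (512, 4K) plausibly come from.

```python
# (depth, width) of the widest configuration quoted on the slides.
TRIMATRIX_BLOCKS = {
    "M512": (32, 18),
    "M4K": (128, 36),
    "M-RAM": (4096, 144),
}

def total_bits(depth: int, width: int) -> int:
    """All RAM bits in the block, parity included."""
    return depth * width

def data_bits(depth: int, width: int) -> int:
    """Bits left after removing parity, assuming 1 parity bit per 9-bit group."""
    return depth * (width // 9) * 8

for name, (depth, width) in TRIMATRIX_BLOCKS.items():
    print(name, total_bits(depth, width), data_bits(depth, width))
```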

MRAM LAB Interface

DSP Blocks

• Eight 9x9 multipliers
• Four 18x18 multipliers
• One 36x36 multiplier

Add/Sub/Accum Functions
• Multiplier
• Multiply-Accum
• AB + CD
• AB + CD + EF + GH
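The "four 18x18 or one 36x36" trade-off is easier to see with the usual schoolbook decomposition: a 36-bit operand splits into two 18-bit halves, and the full product is the shifted sum of four 18x18 partial products. The sketch below shows that arithmetic for unsigned operands; whether the DSP block wires its multipliers exactly this way is an assumption, and `mul18` and `mul36_from_18` are hypothetical names.

```python
def mul18(a: int, b: int) -> int:
    """Stand-in for one unsigned 18x18 multiplier."""
    assert 0 <= a < (1 << 18) and 0 <= b < (1 << 18)
    return a * b

def mul36_from_18(a: int, b: int) -> int:
    """Unsigned 36x36 product built from four 18x18 partial products."""
    mask18 = (1 << 18) - 1
    a_hi, a_lo = a >> 18, a & mask18
    b_hi, b_lo = b >> 18, b & mask18
    # Each partial product is shifted according to the weight of its halves.
    high = mul18(a_hi, b_hi) << 36
    middle = (mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18
    low = mul18(a_lo, b_lo)
    return high + middle + low
```

The additions that combine the partial products correspond to the kind of work the Add/Sub/Accum stage performs for the AB + CD and AB + CD + EF + GH functions.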

DSP Block Internals

DSP Block Interconnect Interface

Architectural Enhancements in Stratix-III™ and Stratix-IV™

D.M. Lewis, et al., International Symposium on FPGAs, 2009

Online documentation (Stratix III)

Online documentation (Stratix IV)

New Features

• Programmable power management
• LUT-RAM
• LUT-Register Mode
• Enhanced DSP Block

Programmable Body Bias Control

Large regions
• Less body bias control circuitry

Small regions
• Fine-grained power mgmt

Power Efficiency

LUT-RAM

[Figure: LUT built from four SRAM cells, with inputs x and y]

Idea
• Use the SRAM bits as memory
• Granularity is LAB-wide

What is needed?
• Write capability
• Signals for address and data for the write path

LUT-RAM Architecture

Supports one read + one write in a single cycle
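A behavioral model makes the "one read + one write in a single cycle" property concrete. The class below is a minimal sketch, not the MLAB circuit: the class name `LutRam`, its `depth` and `width` parameters, and the choice that a read of the address being written returns the old contents are all assumptions made for illustration.

```python
class LutRam:
    """Simple dual-port memory model: one read and one write per cycle (sketch)."""

    def __init__(self, depth: int, width: int):
        self.width = width
        self.mem = [0] * depth

    def cycle(self, read_addr: int, write_addr=None, write_data: int = 0) -> int:
        # Read port: sample the stored word first (old-data behavior is assumed).
        read_data = self.mem[read_addr]
        # Write port: needs its own address and data signals, as the slide notes.
        if write_addr is not None:
            self.mem[write_addr] = write_data & ((1 << self.width) - 1)
        return read_data
```

For example, `ram = LutRam(16, 8)` followed by `ram.cycle(0, write_addr=0, write_data=0xAB)` returns the old value 0 while storing 0xAB, and a later `ram.cycle(0)` returns 0xAB.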

MLAB vs. LAB

ALM LUT-Register Mode

https://upload.wikimedia.org/wikipedia/commons/c/c6/R-S_mk2.gif

ALM LUT-Register Mode

DSP Block Capabilities

• High-performance, power-optimized, fully registered and pipelined multiplication operations
• Natively supported 9-bit, 12-bit, 18-bit, and 36-bit wordlengths
• Natively supported 18-bit complex multiplications
• Efficiently supported floating-point arithmetic formats (24-bit for single precision and 53-bit for double precision)
• Signed and unsigned input support
• Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
• Cascading 18-bit input bus to form tap-delay line for filtering applications
• Cascading 44-bit output bus to propagate output results from one block to the next block without external logic support
• Rich and flexible arithmetic rounding and saturation units
• Efficient barrel shifter support
• Loopback capability to support adaptive filtering
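The tap-delay line and cascaded output bus mentioned above map naturally onto a direct-form FIR filter: each new sample shifts down a chain of registers, every tap multiplies its sample by a coefficient, and the products are summed stage by stage. The function below is a software sketch of that dataflow only; the name `fir_tap_delay` is hypothetical, and bit widths, rounding and saturation are deliberately ignored.

```python
def fir_tap_delay(samples, coeffs):
    """Direct-form FIR: shift samples through a tap-delay line and sum the products."""
    taps = [0] * len(coeffs)  # delay-line registers, initially cleared
    outputs = []
    for x in samples:
        # The input cascade: the new sample enters, older samples move one tap along.
        taps = [x] + taps[:-1]
        # The output cascade: each stage adds its product to the running partial sum.
        acc = 0
        for c, t in zip(coeffs, taps):
            acc += c * t
        outputs.append(acc)
    return outputs
```

Feeding a unit impulse, such as `fir_tap_delay([1, 0, 0, 0], [1, 2, 3])`, returns the coefficients followed by zero, which is a quick sanity check on the tap ordering.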

DSP Block Overview

Multiply-Add

4-Multiply Add w/Accumulation

Cascading Output for FIR Filters

Full DSP Block

Half-DSP Block Architecture

Four 9-bit Independent Half-DSP Multiplier Mode

Three 12-bit Independent Half-DSP Multiplier Mode

Two 18-bit Independent Half-DSP Multiplier Mode

36-bit Half-DSP Multiplier Mode

54x54-bit Multiplier Mode

Used for double-precision floating-point

Architectural Enhancements in Stratix-V™

D.M. Lewis, et al., International Symposium on FPGAs, 2013

Online documentation

Larger MLAB/LUT-RAM

4 Flip-Flops per ALM

Embedded Memories with Error Correction Codes (ECC)