© 2005, 2006 Altera Corporation
FPGAs and Structured ASICs: Overview & Research Challenges
Vaughn Betz, Director, Software Engineering

Agenda
- What is an FPGA?
- FPGA & ASIC market dynamics
- FPGA technology
- Structured ASIC technology
- Research challenges
  - Power
  - Scalable CAD
  - CAD to raise abstraction level
  - Structured ASIC total cost

What is an FPGA?

What is an FPGA?
Field Programmable Gate Array
- Gate array
  - Two-dimensional array of logic gates
  - Traditionally connected with customized metal
  - Every logic circuit (customer) needs a custom-manufactured chip
- Field programmable
  - Customized by programming after manufacture
  - One FPGA can serve every customer
- FPGA: re-programmable hardware

Basic Internals of an FPGA
[Figure: an array of logic elements joined by programmable connections. Each logic element is programmed to implement the desired function.]

Embedding a circuit in an FPGA
- All done by CAD system (e.g. Quartus)
- Chop up circuit into little pieces of logic
- Each piece goes in a separate logic element (LE)
- Hook them together with the programmable routing
[Figure: a desired circuit with inputs x, y, z and output f, mapped onto the LEs and I/O pads of an FPGA.]

FPGA Logic Element
- Look-Up Table (LUT) + register + extra …
- FPGAs typically use 4-input or larger LUTs
  - Cyclone family (low cost): 4 inputs
  - Stratix II: Adaptive Logic Module implements 4 to 6 input LUTs efficiently
  - Virtex 5: 6 inputs
[Figure: a 2-input LUT. Inputs A and B select one of the SRAM cells, and the stored bit drives Out.]
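
A minimal Python sketch of the LUT idea on this slide, assuming a k-input LUT is nothing more than 2^k SRAM bits addressed by its inputs. The names `LUT`, `from_function` and `evaluate` are illustrative only and do not come from any Altera tool; a real logic element also contains the register and extra circuitry the slide mentions.

```python
from itertools import product


class LUT:
    """A k-input look-up table: 2**k SRAM bits addressed by the inputs."""

    def __init__(self, k, sram_bits):
        if len(sram_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k SRAM bits")
        self.k = k
        self.sram = list(sram_bits)

    @classmethod
    def from_function(cls, k, func):
        # "Programming" the LUT: store the truth table of func in the SRAM.
        bits = [int(bool(func(*inputs)))
                for inputs in product((0, 1), repeat=k)]
        return cls(k, bits)

    def evaluate(self, *inputs):
        # The inputs form the address of the SRAM cell that drives Out.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.sram[address]


# The same hardware implements different functions by changing the SRAM bits.
and2 = LUT.from_function(2, lambda a, b: a and b)
xor4 = LUT.from_function(4, lambda a, b, c, d: a ^ b ^ c ^ d)
```

Changing the stored bits changes the function without changing the hardware, which is what makes the logic element programmable after manufacture.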

Connecting the Logic
- Logic elements implement the pieces of the circuit
- Now hook them up with the programmable routing
[Figure: the LEs holding the pieces of the circuit, wired to each other and to the I/O pads for x, y, z and f.]

Programmable Routing
- Programmable switches connect fixed metal wires
- Choose pattern so any logic element can connect to any other
[Figure: a logic block with inputs In1 and In2 and output Out; each routing switch is controlled by an SRAM cell.]
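
One way to picture programmable routing is as a graph: fixed wires are nodes and each SRAM-controlled switch is an edge that exists only when its configuration bit is 1. The sketch below checks whether a given configuration connects two pins. All wire and switch names are hypothetical, and switches are assumed to be bidirectional, which is a simplification.

```python
from collections import deque


def connected(switches, sram, source, sink):
    """Return True if sink is reachable from source through switches that are on.

    switches: dict mapping a switch name to the pair of wires it can join.
    sram: dict mapping a switch name to its configuration bit (1 = on).
    """
    adjacency = {}
    for name, (a, b) in switches.items():
        if sram.get(name, 0):
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    # Breadth-first search over the wires joined by enabled switches.
    seen, queue = {source}, deque([source])
    while queue:
        wire = queue.popleft()
        if wire == sink:
            return True
        for nxt in adjacency.get(wire, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical fragment: one LE output, one horizontal and one vertical wire.
switches = {
    "s0": ("le0.out", "h0"),
    "s1": ("h0", "v0"),
    "s2": ("v0", "le1.in1"),
    "s3": ("h0", "le2.in1"),
}
config = {"s0": 1, "s1": 1, "s2": 1, "s3": 0}
```

With this configuration `le0.out` reaches `le1.in1` but not `le2.in1`; turning on `s3` would add the second connection.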

Modern, mid-size FPGA: 90 nm Stratix II 2S60
- 60,440 equivalent logic elements
- 2,544,192 memory bits
[Figure: die floorplan showing Adaptive Logic Modules, M512 blocks, M4K blocks, M-RAM blocks, Digital Signal Processing (DSP) blocks, Phase-Locked Loops (PLLs), high-speed I/O channels with Dynamic Phase Alignment (DPA), and I/O channels with external memory interface circuitry.]
FPGA and ASIC Market Dynamics

FPGAs vs. Standard Cell ASICs

Parameter                          | FPGA             | Standard Cell
CAD tool cost                      | $2000            | $Millions
Mask cost                          | 0                | $1.4M US @ 90 nm
Bug fix                            | 1 hour           | ~10 weeks
Electrical & optical check & debug | Vendor’s problem | Your problem!
Time to market                     | Fast             | Slow
Die size                           | 2X to 20X        | 1X
Volume cost                        | 1X to 20X        | 1X
Speed                              | 0.3X to 0.6X     | 1X
Power                              | 2X to 5X         | 1X
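
The trade-off in this table can be turned into a rough break-even estimate. The sketch below is illustrative only: it counts just the $1.4M mask cost as ASIC non-recurring cost (ignoring CAD tools, design effort and re-spins), and the $10 ASIC unit cost and the 5X FPGA cost ratio are hypothetical values chosen from within the 1X to 20X range, not figures from the slides.

```python
def break_even_volume(asic_nre, asic_unit_cost, fpga_cost_ratio, fpga_nre=0.0):
    """Units above which the ASIC becomes cheaper in total than the FPGA.

    Total cost is modelled as NRE + volume * unit cost for each option.
    """
    fpga_unit_cost = asic_unit_cost * fpga_cost_ratio
    if fpga_unit_cost <= asic_unit_cost:
        return float("inf")  # the FPGA is never more expensive per unit
    return (asic_nre - fpga_nre) / (fpga_unit_cost - asic_unit_cost)


MASK_COST_90NM = 1.4e6       # from the table
ASIC_UNIT_COST = 10.0        # hypothetical, in dollars
FPGA_COST_RATIO = 5.0        # hypothetical, within the table's 1X to 20X range

volume = break_even_volume(MASK_COST_90NM, ASIC_UNIT_COST, FPGA_COST_RATIO)
print(f"Break-even at about {volume:,.0f} units")
```

Under these assumptions the ASIC only pays off above 35,000 units; a smaller FPGA cost ratio pushes the break-even volume higher.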

CMOS Semiconductor Market
2003 total: $26.0B
- Standard cell: 39%
- ASSP: 37%
- Programmable logic: 10%
- Standard logic: 6%
- Gate array: 5%
- Custom IC: 3%

Traditional FPGA Users

Std Cell ASIC Development Cost Trend
[Figure: stacked bar chart of total development costs ($M, axis 0 to 45) by process node: 0.18 µm, 0.15 µm, 0.13 µm, 90 nm, 65 nm, 45 nm. Cost categories: masks & wafers; test & product engineering; software; design/verification & layout.]
Note: Conservative estimate; does not include re-spins.

Result: Declining ASIC Starts
[Figure: standard cell/gate array design starts per year, 1996 to 2007 (axis 0 to 12,000).]
Source: Dataquest/Gartner

Today’s “typical design”
[Figure: excerpt from EE Times, Aug 23, 2004.]

New FPGA Users & Products
- Designers “priced out” of ASIC
- Start-ups / risk-averse
- Replacement for DSP
- Consumer & industrial

Broadcast/Audio/Video

Wireless

Industrial, Test & Measurement

Consumer: Displays

Consumer Gadgets

FPGA Technology

FPGAs Need Vertical Integration
- Silicon process models & expertise
- FPGA architecture
- Complete CAD system
- Intellectual Property cores
  - Including soft processors
- Embedded software development tools

Silicon Process Knowledge
- FPGAs move to latest process very early
  - Helps close speed, area gap with ASICs in older processes
  - High volume covers development costs
- Foundries use FPGAs as process drivers
  - Large dies
  - Both logic and RAM
  - Regular structures help shake out systematic fab issues
- Need good silicon expertise to stay on the bleeding edge of process

90nm 9-layer Interconnect
[Figure: cross-section of the 9-layer interconnect stack above a transistor.]

90nm Transistor Cross-section
[Figure: transistor cross-section labelled with poly, spacers, contact, diffusion, isolation, dielectric and salicide.]

FPGA Architecture
- Want to improve speed, area & power to
  - Close gap with ASICs
  - Stay ahead of competition
- Need to ensure device
  - Is routable
  - Has right mix of features
- Huge problem space
  - Routing wires, switch pattern, LUT size, RAM types, logic block size, …

Architecting via Virtual Prototyping
[Figure: experimental flow. Elements shown: FPGA architecture spec (150 pages); parameters; timing and area models; FPGA database (300M); customer designs, IP and reference designs; FMT synthesis; FMT place & route; analysis of speed, area, routability and power.]
[Figure: example result plotting critical path delay (s, roughly 1.4E-08 to 1.6E-08) against the fraction of length-4 wires (0 to 1), with curves for length 4/16 and length 4/8.]

Parallel design
[Figure: logic module schematic with LUT 4 and LUT 3 blocks, adders on a carry_in/carry_out chain, share_in/share_out, registers with clear, load and enable controls (aclr[1:0], aload, sclr, sload, ena[2:0], clk[1:0]), reg_cascade_in/reg_cascade_out, and outputs leout0a, leout0b, leout1a, leout1b, lelocal0, lelocal1.]
Concurrent design of:
- Process technology
- Circuit design
- Software
- FPGA architecture
Carefully manage risk vs. reward. Can’t do this sequentially.

Complete Design Flow: Quartus II
[Figure: Quartus II flow. Inputs: Verilog or VHDL source and IP cores. Stages shown: synthesis (third-party or Altera), placement & routing, physical synthesis, timing & power analysis, assembler, report.]
Sample Verilog input shown in the figure (fragment, cut off in the slide):
    // Begin: Write Control
    always @ (posedge wrbusy_int)
    begin
        write0 <= 1'b1;
        write1 <= 1'b0;
        writex <= 1'b0;
    end
    always @ (negedge wrbusy_int)
    begin
        write0 <= 1'b0;
    end
    always @ (posedge write0_done)
    begin
        write1 <= 1'b1;
Over 10 Million Lines of Code!

IP Core: Nios II Soft Processor
- Three CPU choices:
  - Nios II/f Fast: optimized for performance
  - Nios II/s Standard: faster and smaller than Nios
  - Nios II/e Economy: smallest FPGA footprint
- Choose peripherals you want
- SoPC Builder software builds bus interfaces, arbitration etc.
[Figure: Nios II/e, Nios II/s and Nios II/f placed on a scale from smaller to faster.]

Soft Processors are Affordable
- Nios II/f “fast”: 1800 LEs, 1% of the largest Stratix II FPGA (180,000 LEs)
- Nios II/e “economy”: 600 LEs, 13% of a small Cyclone II FPGA (4600 LEs); 35¢ in lowest cost FPGA
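
The percentages on this slide follow directly from the LE counts. A small check, using only the numbers quoted above (the function name is illustrative):

```python
def fraction_of_device(core_les, device_les):
    """Share of a device's logic elements consumed by one soft processor."""
    return core_les / device_les


# Nios II/f in the largest Stratix II, and Nios II/e in a small Cyclone II.
fast_share = fraction_of_device(1800, 180_000)
economy_share = fraction_of_device(600, 4600)

print(f"Nios II/f: {fast_share:.0%} of device")
print(f"Nios II/e: {economy_share:.0%} of device")
```

Even the fast core takes about 1% of the large device, which is why many processors can share one FPGA.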

Massively Parallel Nios II
Barco Media & Entertainment Olite 510 LED Display System
- Modular LED display system
- 100 Nios II processors per square meter!
FPGA Used:
Structured ASIC Technology

What is a Structured ASIC?
- Use fixed masks for most layers
- Use customer-specific masks for a few via & metal layers
  - To customize the logic cells and to route signals between logic cells
- Has characteristics between an FPGA and a standard cell ASIC
  - Faster and smaller than an FPGA
  - But lower development cost & time than a standard cell ASIC

FPGA to Structured ASIC
[Figure: a common base die (LEs, memory, PLLs, DSP blocks, internal routing) with flip-chip bumps. Two metal layers for customization take the place of the FPGA's signal routing and configuration routing.]

Stratix HardCopy Base Array
- Start with the FPGA die: logic elements, memory, interconnect, configuration memory & logic
- Remove interconnect system
- Remove configuration, logic & memory programmability
- Resulting base die up to 70% smaller

Development Cost and Risk
- Mask cost reduced vs. std. cell
  - ~5 masks instead of ~30
- Verification of crosstalk, electromigration etc. much easier than std. cell
  - Since most layers are standard
- Same PLLs, I/Os, RAMs and packages as FPGA
  - Debug your system with an FPGA, then do a drop-in replacement with HardCopy
  - Can ship systems with FPGA until volume merits going to HardCopy
  - Can get customer feedback on systems with FPGAs and tweak before going to HardCopy

Identical Operation
[Figure: side-by-side comparison of an EP1S80F1020 and an HC1S80F1020, both at 105C, VCC-5%, with a data rate of 840 Mbps, LVDS.]
Key selling point for Altera HardCopy

FPGA to HardCopy CAD Flow
[Figure: flow diagram. Elements shown: HDL code; FPGA constraints; Quartus Stratix II flow; Stratix II POF; HardCopy constraints; Quartus HardCopy II flow; equality checker; handoff design files; HardCopy Design Center.]
Same CAD flow & guaranteed equivalence

2nd Generation: HardCopy II
- First generation HardCopy
  - Removed programmability from FPGA
- Second generation HardCopy II
  - Removes programmability
  - Re-maps logic and DSP blocks to a fabric that is more efficient in a structured ASIC
  - Larger die size reduction
  - But more complex CAD flow
- Typical results vs. Stratix II
  - 70% die size reduction
  - 60% power reduction
  - 50% speed increase

HardCopy II Logic Remapping
[Figure: a section of a Stratix II floorplan (logic ALMs, M4K block) beside a section of a HardCopy II floorplan (HCell macro implementations of ALMs, M4K block). Not drawn to scale; illustration only, not an actual Quartus II floorplan view.]

DSP Block Remapping
- Built as needed using HCell macros
- Can be placed anywhere in the floorplan where HCells exist
[Figure: Stratix II floorplan (only DSP blocks shown) beside a HardCopy II floorplan.]

Research Challenges
Power

Power Scaling
- 130 nm and above
  - FPGAs scaled without regard to power
  - Got full performance boost of process
- 90 nm and below
  - Power-constrained scaling
  - Low-cost FPGA power budget: ¼ W to 3 W
  - High-speed FPGA: 2 W to 20 W
  - Maximum performance within power budget

Process Scaling & Power
- Dynamic power drops per LE
  - But reduction is less than 50% / LE
  - Doubling LE count increases power budget
- Static power tends to increase
  - Use higher Vt, thicker Tox, longer L on non-timing-critical circuitry
  - If still too high, sacrifice speed by increasing Vt, Tox, L on timing-critical circuitry
  - Can compensate by making architecture faster
    - E.g. larger LUT
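
The dynamic power argument on this slide is simple arithmetic: if the LE count doubles while power per LE falls by less than half, total dynamic power rises. A sketch, where the 30% per-LE reduction is a hypothetical value picked only to satisfy the "less than 50%" condition:

```python
def total_dynamic_power_ratio(le_count_ratio, per_le_reduction):
    """New total dynamic power relative to the previous process generation.

    le_count_ratio: new LE count / old LE count (2.0 when capacity doubles).
    per_le_reduction: fractional drop in dynamic power per LE (0.3 = 30%).
    """
    return le_count_ratio * (1.0 - per_le_reduction)


# Doubling the LE count with a hypothetical 30% per-LE reduction.
ratio = total_dynamic_power_ratio(2.0, 0.30)
print(f"Total dynamic power changes by a factor of {ratio:.2f}")
```

Only a per-LE reduction of exactly 50% would hold total dynamic power constant when capacity doubles.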

Controlling Power
- 90 nm
  - Process parameters
  - FPGA CAD tools optimize for power
    - 20% dynamic power reduction
  - Innovate on performance, then trade for Pstatic
    - E.g. Stratix II ALM: larger LUT
- 65 & 45 nm
  - Innovation needed!
- 32 nm
  - Process will likely have better static power
    - Double-gate FETs, high-K gate dielectric

CAD for Power Optimization
[Figure: two decision flows.]
- Timing-driven compiler: timing critical? Yes: min delay. No: min area.
- Power-driven compiler: timing critical? Yes: min delay. No: power critical? Yes: min power. No: min area.
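
The two decision flows can be written as small functions. This is a reading of the flattened figure, not Quartus code: the function names and the string labels are illustrative, and how a compiler decides that a circuit element is timing or power critical is outside the sketch.

```python
def timing_driven_goal(timing_critical):
    """Timing-driven compiler: optimize delay where it matters, area elsewhere."""
    return "min_delay" if timing_critical else "min_area"


def power_driven_goal(timing_critical, power_critical):
    """Power-driven compiler: timing still comes first, then power, then area."""
    if timing_critical:
        return "min_delay"
    if power_critical:
        return "min_power"
    return "min_area"


# A non-timing-critical, high-power piece of the design gets a different
# goal under the power-driven flow than under the timing-driven flow.
print(timing_driven_goal(False), power_driven_goal(False, True))
```

The point of the ordering is that power optimization is applied only where it cannot hurt timing closure.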

E.g. Power-Optimized RAM Mapping
Mapping a 1K x 16 RAM:
- Default option: 4 1Kx4 M4K RAMs
- Power efficient option: 4 256x16 M4K RAMs, selected by a 2:4 decoder
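
A behavioural sketch of the two mappings, assuming the default option slices each 16-bit word across four 1K x 4 blocks (so every access touches all four) and the power-efficient option uses the top two address bits as the 2:4 decoder input (so every access touches one block). Function names and the test data are illustrative; block counts per access stand in for power and are not a power model.

```python
DEPTH, WIDTH = 1024, 16


def read_default(blocks, address):
    """Four 1K x 4 blocks: each holds one 4-bit slice of every word."""
    word = 0
    for i, block in enumerate(blocks):       # all four blocks are accessed
        word |= block[address] << (4 * i)
    return word, len(blocks)


def read_power_efficient(blocks, address):
    """Four 256 x 16 blocks: the top two address bits pick one block."""
    bank = address >> 8                      # 2:4 decoder on address bits 9:8
    return blocks[bank][address & 0xFF], 1   # only one block is accessed


# Arbitrary test contents for the logical 1K x 16 RAM.
data = [(37 * a + 11) & 0xFFFF for a in range(DEPTH)]
sliced = [[(w >> (4 * i)) & 0xF for w in data] for i in range(4)]
banked = [data[256 * b:256 * (b + 1)] for b in range(4)]

word_a, active_a = read_default(sliced, 700)
word_b, active_b = read_power_efficient(banked, 700)
```

Both mappings return the same word for every address; the difference is that the banked mapping activates one block per access instead of four.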

E.g. Power-Driven Place & Route
- Minimize capacitance of high-toggling signals
- Without violating timing constraints
[Figure: routing before and after power optimization for two signals, one at 100 million toggles/s and one at 20 million toggles/s.]
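
Why this helps follows from the standard switching-power relation P = ½·C·V²·(toggle rate). The sketch below compares two assignments of the slide's two signals to a short and a long route. The route capacitances (0.2 pF and 1.0 pF) and the 1.2 V supply are hypothetical values for illustration.

```python
def dynamic_power(capacitance_f, vdd, toggles_per_s):
    """Average switching power: half of C*V^2 per transition."""
    return 0.5 * capacitance_f * vdd ** 2 * toggles_per_s


VDD = 1.2                      # hypothetical supply voltage, volts
SHORT, LONG = 0.2e-12, 1.0e-12  # hypothetical route capacitances, farads
FAST, SLOW = 100e6, 20e6        # toggle rates from the slide

# Before: the high-toggling signal sits on the long, high-capacitance route.
before = dynamic_power(LONG, VDD, FAST) + dynamic_power(SHORT, VDD, SLOW)
# After: the router gives the short route to the high-toggling signal.
after = dynamic_power(SHORT, VDD, FAST) + dynamic_power(LONG, VDD, SLOW)

print(f"before: {before * 1e6:.2f} uW, after: {after * 1e6:.2f} uW")
```

In a real flow the swap is only legal if the slower signal still meets its timing constraint on the longer route.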
CAD Scalability

FPGA Logic & Memory Growth
[Figure: logic elements (K, axis 0 to 700) and memory bits (Mbits, axis 0 to 40) plotted by year from 1998 to 2009 and by process node from 250 nm to 45 nm. Labelled devices: EPF10K200E, EP20K600E, EP20K1500E, EP2A70, EP1S80, EP2S180.]

FPGA Capacity vs. CPU Speed
- 30X logic growth from 1998 to 2006
  - Over 30X memory bits growth
- ~8X CPU speed increase from 1998 to 2006
- FPGA CAD problem growing more rapidly than CPU speed
- But productivity of FPGA designers depends on many compiles
  - To iteratively debug, add features, close timing
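
The gap can be expressed as compound annual growth, using only the 30X and ~8X figures above (the function name is illustrative):

```python
def annual_growth(total_factor, years):
    """Constant yearly growth factor that compounds to total_factor."""
    return total_factor ** (1.0 / years)


years = 2006 - 1998
logic_per_year = annual_growth(30, years)   # roughly 1.53x per year
cpu_per_year = annual_growth(8, years)      # roughly 1.30x per year
gap = 30 / 8                                # problem size grew 3.75x faster

print(f"logic: {logic_per_year:.2f}x/yr, CPU: {cpu_per_year:.2f}x/yr, gap: {gap}x")
```

So even with linear-time algorithms, a full-chip compile on a current CPU would take about 3.75 times longer in 2006 than in 1998; anything worse than linear widens the gap further.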

Compile Time
- Need to find highly scalable algorithms
  - For placement, routing, synthesis
  - Do not sacrifice result quality
- Future: single processor speed-up will fall further behind FPGA capacity growth
- But more cores per chip
  - Today: 2
  - 2007: 4
  - Parallel CAD tools, with same result quality?
  - Need sequentially consistent algorithms, or debugging is a nightmare
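
One common way to get results that do not depend on how many cores run a job is to fix the work partition up front, seed each task from its identity rather than from the thread that runs it, and combine results in task order. The sketch below shows that pattern with a toy "optimization" (a seeded shuffle); it is a generic illustration with hypothetical names, not how Quartus parallelizes its algorithms.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def improve_region(task):
    """Toy stand-in for a randomized optimization of one region."""
    region_id, cells, seed = task
    # The seed depends only on the run seed and the region, never on timing.
    rng = random.Random(seed * 1_000_003 + region_id)
    order = list(cells)
    rng.shuffle(order)
    return region_id, order


def parallel_pass(regions, seed, workers):
    tasks = [(i, cells, seed) for i, cells in enumerate(regions)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in task order, whatever the scheduling.
        results = list(pool.map(improve_region, tasks))
    return [order for _, order in results]


regions = [list(range(i * 5, i * 5 + 5)) for i in range(8)]
one_core = parallel_pass(regions, seed=7, workers=1)
four_cores = parallel_pass(regions, seed=7, workers=4)
```

Because `one_core` and `four_cores` are identical, a failure seen on a four-core machine can be reproduced on a single core, which is the debugging property the slide asks for.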
Increasing Design Abstraction

FPGA Usage
- FPGA design is usually done in Hardware Description Language (HDL)
  - Limits FPGA use to hardware designers
- FPGAs can:
  - Outperform DSPs
  - Create custom hardware / software systems that outperform fixed microcontrollers
- Usage in these fields limited by unfamiliarity with HDL design

Efficiency vs. Development Cost
[Figure: implementation options in the order Processor, DSP, FPGA, Struct. ASIC, Std. Cell, Full Custom, plotted on a low-to-high scale for two measures: power & system cost*, and development difficulty & cost.]
*For applications with significant parallelism

Raising Design Abstraction
- Ideal: software engineers can design hardware
  - C to gates
  - Not achievable in general
- Practical: domain-specific higher-level tools
  - SoPC Builder:
    - Build a custom microcontroller
    - Integrate IP cores
  - C-HAC, Impulse, Celoxica:
    - Hardware accelerator for targeted C code, soft processor for rest
  - DSP Builder:
    - Convert DSP block diagrams to hardware
- Other tools?

Modern FPGA RTL Design Flow
[Figure: flow from specification to product. Steps shown: specification; custom RTL development (with IP cores and third-party software); functional verification; RTL logic synthesis; place-and-route & physical synthesis; timing verification & debug; hardware/software debug. Side labels: design, verification, compilation & optimization, power analysis.]

Extending the Design Flow
[Figure: the same flow with the RTL design flow drawn as one block, together with the back-end flow and hardware/software debug, leading to the product.]

Extending the Design Flow To System Level
[Figure: system-level steps added around the RTL design flow: system integration and interface synthesis, IP core reuse, embedded soft processors, higher level languages, HW/SW interface generation. The back-end flow and hardware/software debug remain, leading to the product.]

Structured ASIC Architecture

Structured ASIC Architecture
- Many questions similar to FPGA
  - Logic cell, RAM types, structure of custom metal routing layers for best speed, area, power
  - Metal programmed: answers different than FPGA
- How to keep non-recurring engineering cost low
  - Few masks? Cheap masks?
  - Make custom layers easy to electrically and optically verify?
  - Clever tricks?
- Still have to beat FPGA speed, area, power
- And device must be routable