Reconfigurable HPC
Part 4: Miscellaneous
Reiner Hartenstein
TU Kaiserslautern
May 14, 2004 , TU Tallinn, Estonia
© 2004, [email protected] http://hartenstein.de
Time to Market
• A Fundamental Paradigm Shift in Silicon Application
[Figure: revenue per month vs. time in months (1 to 30). Curves: ASIC product; reconfigurable product with downloaded Update 1 and Update 2]
[Tom Kean]
Makimoto's 3rd wave
The next revolution: reconfigurability
EDA industry paradigm switching every 7 years [Keutzer / Newton]:
1978 Transistor entry: Applicon, Calma, CV ...
1985 Schematics entry: Daisy, Mentor, Valid ...
1992 Synthesis: Cadence, Synopsys ...
1999 (Co-)Compilation & data-stream-based (r)DPAs [Hartenstein]
2006
[Figure labels: Paradigm Shift, Mainstream, Tornado, McKinsey Curve] [Richard Newton]
82% of designers hate their tools [Keutzer / Newton]
Software to Configware Migration
This talk illustrates the performance benefit that may be obtained from Reconfigurable Computing (RC), stressing the coarse-grain point of view. It hardly mentions FPGAs (but coarse grain may always be mapped onto FPGAs).
Software to Configware migration is the most important source of speed-up.
Hardware is just frozen Configware.
Directly delivered to the customer: completely configured
[Figure: number of rGA-based design starts per year, 2001 to 2004 (axis 0 to 50,000)] [N. Tredennick, Gilder Technology Report, 2003]
Omit emulation
Avoiding specific silicon ...
Mega-rGAs
[Figure: system gates per rGA chip (100 to 10 000 000, logarithmic axis) vs. year, 1984 to 2004. Labeled devices: XC 4085XL, Virtex, XC 40250XV, Virtex II, planned] [Xilinx Data]
© 2002, [email protected] http://kressarray.de
University of Kaiserslautern
Xputer Lab
Embedded hardwired CPU & memory cores on chip
[Figure: HLL compiler targeting CPU core, FPGA core, and memory core]
[à la S. Guccione]
Xilinx Virtex-II Pro FPGA architecture: all you need on board
• FPGA fabric based on Virtex-II architecture
• PowerPC 405 RISC CPU (PPC405) cores
• On-chip memory controller
• Embedded RAM
• RocketIO
• Entire system on a single chip
Source: Ivo Bolsens, Xilinx
What About PLD Cores on ASICs? (Embedded FPGA Fabric)
What's Wrong with This Picture?
1. Still have to make the chip
2. Need two sets of software to build it
– The ASIC flow
– The PLD flow
3. Have no idea what to connect the PLD pins to – chances are, you are going to get it wrong!
[Jonathan Rose]
What's Right with This Picture!
1. Pre-fabricated
2. One CAD tool flow!
3. Can connect anything to anything
PLDs are built for general connectivity
Embedded CPU, serial link, analog, "etc."
[Jonathan Rose]
>> rGAs <<
• rGAs
• Placement & Routing
• Soft Processors
• History of Frameworks
• RTR
• Support by rGA vendors
• EDA
• Future directions
• Conclusions
http://www.uni-kl.de
Different Morphware-Platforms:
Reconfigurable Logic Blocks
Reconfigurable Interconnect Blocks
Reconfigurable Datapath Arrays
fine grain reconfigurable
coarse grain reconfigurable
Reconfigurable interconnect fabrics
rGA with island architecture (detail)
[Figure: interconnect fabrics with switch boxes, connect boxes, and reconfigurable logic blocks]
Switch box
[Figure: switch box with switch points]
Connect box
[Figure: connect box with connect points]
Connect point activated
[Figure: connect point (enlarged)]
Switch boxes activated
[Figure: 3 switch points activated, then the 4th and the 5th switch point]
Result
Routing completed for 1 net
[Figure: routed net connecting A and B]
1979: Silva Lisco (Silicon Valley Research Corp.) offers CALM-P
20 Transistors + 20 Flipflops
>> Placement & Routing <<
Routing: long-distance net
[Figure: net from A to B passing through intermediate switch boxes]
A path may be used for only one signal at a time ...
... Bridges of Königsberg
Routing congestion
[Figure: placed blocks A, B, C, D]
C and D are not reachable: C cannot be connected with D.
C and D need another placement.
rLBs are not 100% usable.
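The congestion case can be sketched in code. This is a minimal sketch, assuming a plain grid in which every cell is a routing resource that carries only one signal at a time; the grid size, the net names, and the functions `route` and `route_all` are illustrative assumptions, not taken from the slides. It uses a breadth-first (Lee-style) search, not any vendor's router.

```python
from collections import deque

def route(width, height, src, dst, used):
    """Breadth-first (Lee-style) search for a path from src to dst that avoids used cells."""
    if src in used or dst in used:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in used and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None                          # no free path: routing congestion

def route_all(width, height, nets):
    """Route nets one after another; each routed path blocks its cells for later nets."""
    used = set()
    result = {}
    for name, (src, dst) in nets.items():
        path = route(width, height, src, dst, used)
        result[name] = path
        if path is not None:
            used.update(path)
    return result

# Hypothetical 3 x 3 fabric: net AB crosses the middle row and cuts C off from D.
nets = {"AB": ((0, 1), (2, 1)), "CD": ((1, 0), (1, 2))}
routed = route_all(3, 3, nets)
print(routed["AB"])  # [(0, 1), (1, 1), (2, 1)]
print(routed["CD"])  # None
```

In this toy model the only remedies are the ones the slide names: another placement for C and D, or a detour through otherwise unused resources.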
Leonhard Euler
Euler's problem of the bridges of Königsberg is such a network (1736):
Find a route which crosses each bridge exactly once ... This is also an optimization: none of the bridges is unused.
L. Euler: Solutio problematis ad geometriam situs pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140
[Figure: graph with edges (bridges) and nodes: Left Bank, Right Bank, Kneiphof Island, Other Island]
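Euler's degree argument can be checked mechanically. The sketch below assumes the classical seven-bridge layout (the slide names only the four nodes); the bridge list and the function names are illustrative. A walk that uses every edge exactly once exists only if the graph is connected and has zero or two nodes of odd degree.

```python
from collections import defaultdict

# Assumed classical layout: each pair is one bridge between two land masses.
BRIDGES = [
    ("Left Bank", "Kneiphof Island"),
    ("Left Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
    ("Kneiphof Island", "Other Island"),
]

def node_degrees(edges):
    """Count how many edge ends meet at each node."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

def is_connected(edges):
    """Depth-first search over all nodes that appear in at least one edge."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)

def has_euler_walk(edges):
    """True if some walk uses every edge exactly once."""
    odd = sum(1 for d in node_degrees(edges).values() if d % 2 == 1)
    return is_connected(edges) and odd in (0, 2)

print(node_degrees(BRIDGES))
print(has_euler_walk(BRIDGES))  # False: all four nodes have odd degree
```

The routing analogy is that a net list, like the bridge walk, can be infeasible on a given interconnect graph no matter how the search is ordered.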
Crossbar switch
1913 J. N. Reynolds' crossbar switch
1915 patent granted
1919 Betulander's crossbar switch
1926 first public telephone switching application in Sweden
1964 NASA telemetrics crossbar array
Crossbar complete?
One bar connects 2 pins
Size of a full (complete) switch: n x n / 2
n | n x n / 2
4 | 8
100 | 5000
[Table: number of crossbar chips needed in a row, full vs. partial, for n = 4 and n = 100]
Crossbar chips available from Aptix, Texas Instruments, and others
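The size rule on this slide is easy to tabulate. The sketch below uses the slide's own formula, n x n / 2; the function name is an illustrative assumption.

```python
def full_crossbar_size(n_pins: int) -> int:
    """Switch count of a full crossbar per the slide's rule: n x n / 2."""
    return n_pins * n_pins // 2

for n in (4, 100):
    print(n, full_crossbar_size(n))
# 4 8
# 100 5000
```

The quadratic growth is why full crossbars stay small and larger fabrics fall back on partial connectivity.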
Routing congestion example with detour
Direct connection impossible
Detour connection: routing through an rLB configured as identity function
Routing resources: logic gates and/or pass transistors
[Figure: row of rGAs with a detour connection]
Crossbar-based Architectures
1993: PADDI-II (Jan Rabaey)
[Figure: eight EXU/CTL units connected by a crossbar switch, with I/O]
1990: UC Berkeley (Jan Rabaey)
16 bit
1997: Pleiades (mesh & crossbar)
32 bit
PADDI-II Architecture
[Figure: 48 processing elements (P1 to P48) grouped in 4-PE clusters. Labels: Level-1 network, Level-2 network, 6 x 16b, 16 x 16b, 16 x 6 switch matrix, break-switch, I/O]
>> Soft Processors <<
FPGA CPUs in teaching and academic research
• UCSC: 1990!
• Mälardalen University, Eskilstuna, Sweden
• Chalmers University, Göteborg, Sweden
• Cornell University
• Gray Research
• Georgia Tech
• Hiroshima City University, Japan
• Michigan State
• Universidad de Valladolid, Spain
• Virginia Tech
• Washington University, St. Louis
• New Mexico Tech
• UC Riverside
• Tokai University, Japan
Some soft CPU core examples
core | architecture | platform
MicroBlaze (125 MHz, 70 D-MIPS) | 32 bit standard RISC, 32 reg. by 32 LUT RAM-based reg. | Xilinx, up to 100 on one FPGA
Nios | 16-bit instr. set | Altera Mercury
Nios (50 MHz) | 32-bit instr. set | Altera, 22 D-MIPS
Nios | 8 bit | Altera – Mercury
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
DSPuva16 | 16 bit DSP | Spartan-II
Leon (25 MHz) | SPARC |
ARM7 clone | ARM |
uP1232 | 8-bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits instr. + ext. ROM | 2 XILINX 3020 LCA
Reliance-1 | 12 bit DSP | Lattice 4 isp30256, 4 isp1016
Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
YARD-1A | 16-bit RISC, 2 opd. instr. | old Xilinx FPGA board
xr16 | RISC integer C | SpartanXL
Some soft CPU core examples (continued)
Configware! (no hardware)
Retro-emulation
It's a Paradigm Shift!
• Using FPGAs (fine-grain reconfigurable) has mainly been classical logic synthesis on a "strange hardware" platform
• Coarse-grain reconfigurable arrays (rDPAs) (Reconfigurable Computing), however, mean a really fundamental paradigm shift
• This is still ignored by CS and EE curricula and almost all R&D scenes
Why the speed-up ...
... although the FPGA clock is slower by a factor of 3 or even more (most know-how comes from the "high-level synthesis" discipline)
moving the operator to the data stream (before run time)
support operations: neither clock cycles nor memory cycles
decisions without memory cycles or clock cycles
most "data fetch" without memory cycle
>> History of Frameworks <<
Goal: away from complex design flow
[Figure: traditional flow Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream, compared with HLL -> Compiler]
[à la S. Guccione]
Overcome traditional separate design flow
[Figure: software flow User Code -> Compiler -> Executable; hardware flow Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream; HLL compiler]
[à la S. Guccione]
Overcome traditional separate co-processing design flow -> JBits Design Flow
[Figure: JBits flow User Java Code -> Java Compiler (with JBits API) -> Executable; traditional flows User Code -> Compiler -> Executable and Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream]
[à la S. Guccione]
New directions in application development
• automatically partitioning compilers for designer productivity,
• like CoDe-X (Jürgen Becker, Univ. of Karlsruhe)
• support for Run-Time Reconfiguration (RTR), a key enabler of error handling and fault correction by partially re-routing the FPGA at run time, as well as of remote patching for upgrading, remote debugging, and remote repair by reconfiguration, even over the internet
>> RTR <<
CPU use for configuration management
• on-board microprocessor CPU is available anyhow - even along with a little RTOS
• use this CPU for configuration management
[Figure: HLL compiler for RTR system design]
Hard CPU & memory core on same chip
[Figure: CPU core, FPGA core, and memory core on one chip; HLL compilers; RTR system design]
Converging factors for RTR
[Figure: User Java Code -> Java Compiler (with JBits API) -> Executable]
• Converging factors make RTR-based system design viable:
• 1) Million-gate FPGA devices and co-processing with standard microprocessors are commonplace, enabling
• direct implementation of complex algorithms in FPGAs.
• This alone has already revolutionized FPGA design.
• 2) New tools like the Xilinx JBits software tool suite directly support co-processing and RTR.
RTR
• divides the application into a series of sequentially executed stages, each mapped as a separate execution module
• Excellent example: Xtrem platform by PACT AG, Munich
• Without RTR, all configurable platforms are just ASIC emulators
• needs development environments that directly support development and debugging of RTR applications
• will also heavily influence future system organization
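The staging idea can be sketched as a toy model. This is a minimal sketch, assuming a fabric with a fixed capacity measured in hypothetical cells and stages given as plain Python functions; `Fabric`, `run_rtr`, and all numbers are illustrative assumptions, not the API of any real platform.

```python
from typing import Callable, List, Tuple

# (name, area in hypothetical cells, function standing in for the configured datapath)
Stage = Tuple[str, int, Callable[[List[int]], List[int]]]

class Fabric:
    """Toy reconfigurable fabric that holds one execution module at a time."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.loaded = None
        self.reconfigurations = 0

    def load(self, name: str, area: int) -> None:
        if area > self.capacity:
            raise ValueError(f"stage {name} does not fit on the fabric")
        if self.loaded != name:          # swap in the next module
            self.loaded = name
            self.reconfigurations += 1

def run_rtr(fabric: Fabric, stages: List[Stage], data: List[int]) -> List[int]:
    """Execute the stages one after another, reconfiguring between them."""
    for name, area, fn in stages:
        fabric.load(name, area)
        data = fn(data)
    return data

stages = [
    ("scale", 60, lambda xs: [2 * x for x in xs]),
    ("offset", 60, lambda xs: [x + 1 for x in xs]),
]
fabric = Fabric(capacity=100)            # both stages together (120) would not fit
result = run_rtr(fabric, stages, [1, 2, 3])
print(result, fabric.reconfigurations)   # [3, 5, 7] 2
```

The point of the model is only that the application needs more area than the fabric offers, yet runs because the stages occupy it in sequence.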
>> Support by rGA vendors <<
Support by FPGA vendors
• Xilinx
– Software by Xilinx
– Configware (soft IP cores)
– Hardware
• Altera
– Software
– Configware
– Hardware
Xilinx
• fabless FPGA semiconductor vendor, San Jose, CA, founded 1984
• key patents on FPGAs (expiring in a few years)
• Fortune 2001: No. 14 best company to work for (Intel: No. 42, HP: No. 64, TI: No. 65)
• DARPA grant (Nov '99) to develop JBits API tools for internet-reconfigurable / upgradable logic (with VT)
• Less brilliant in the early/mid 90s (president Curt Wozniak): 1995 market share down from 84% to 62% [Dataquest]
• As designs got larger, Xilinx lost its advantage (bug fixes did not require burning new chips)
• meanwhile, weeks of expensive debug time are needed
Software by Xilinx
• Full design flow from Cadence, Mentor, and Synopsys
• Xilinx Software Alliance EDA Program:
– Alliance Series Development System
– Foundation Series Development Systems
– Xilinx Foundation Series ISE (Integrated Synthesis Environment)
– free WebPOWERED SW with WebFitter & WebPACK-ISE
– StateCAD XE and HDL Bencher
– Foundation Base Express
– Foundation ISE Base Express
• More: ModelSim Xilinx Edition (ModelSim XE) | Forge Compiler | Modular Design | Chipscope ILA | The Xilinx System Generator | XPower | JBits SDK | The Xilinx XtremeDSP Initiative | MathWorks / Xilinx Alliance | The Wind River / Xilinx Alliance
Configware (soft IP Products)
• For libraries, creation and reuse of configware
• To search for IPs see: List of all available IP
• The AllianceCORE program is a cooperation between Xilinx and third-party core developers
• The Xilinx Reference Design Alliance Program
• The Xilinx University Program
• LogiCORE soft IP with LogiCORE PCI Interface
• Consultants
Xilinx hardware
• Virtex, Virtex-II: first with 1 mio system gates
– Virtex-E series: > 3 mio system gates
• Virtex-EM on a copper process & additional on-chip memory for network switch applications
• The Virtex XCV3200E: > 3 million gates, 0.15-micron technology
• Spartan, Spartan-XL, Spartan-II
– for low-cost, high-volume applications as ASIC replacements
– multiple I/O standards, on-chip block RAM, digital delay lock loops
– eliminate phase lock loops, FIFOs, I/O translators, system bus drivers
• XC4000XV, XC4000XL/XLA, CPLD: low-cost families
– rapid development, longer system life, robust field upgradability
– support In-System Programming (ISP), in-board debugging
– test during manufacturing, field upgrades, full JTAG-compliant interface
• CoolRunner: low power, high speed/density, standby mode
• Military & Aerospace: QPRO high-reliability, QML certified
• Configuration storage devices
Altera
• Altera was founded in June 1983
• EDA: synthesis, place & route, and verification
• Quartus II: APEX, Excalibur, Mercury, FLEX 6000 families
• MAX+PLUS II: FLEX, ACEX & MAX families
• Flow with Quartus II: Mentor Graphics, Synopsys, and Synplicity deliver design software to support Altera SOPC solutions
• Mentor: only EDA vendor with a complete design environment for APEX II, incl. IP, design capture, simulation, synthesis, and h/s co-verification
• Configware: Altera offers over a hundred IP cores
• Third-party IP core design services and consultants
Altera hardware
• Newer families: APEX 20KE, APEX 20KC, APEX II, MAX 7000B, ACEX 1K, Excalibur, Mercury
– APEX EP20K1500E (0.18-µ), up to 2.4 mio system gates
– APEX II (all-copper 0.13-µ) for data path applications, supports many I/O standards, 1-Gbps True-LVDS performance
– wQ2001, an ARM-based Excalibur device
• Altera mainstream: MAX 7000A, 3000A; FLEX 6000, 10KA, 10KE; APEX 20K families
• Mature and other: Classic, MAX 7000, 7000S, 9000; FLEX 8000, 10K families
>> EDA <<
• EDA as the Key Enabler (major EDA vendors)
• Altera
• Cadence
• Mentor Graphics
• Synopsys
• Xilinx
• Changing EDA Tools Market
EDA as the Key Enabler (major EDA vendors)
• Select by EDA quality / productivity, not by FPGA architecture
• EDA often has massive software quality problems
• Customer's highest priority: an EDA center of excellence
– collecting EDA expertise and EDA user experience
– to assemble the best possible tool environments
– for optimum support of design teams
– to cope with interoperability problems
– to keep track of the EDA scene as a rapidly moving target
• being fabless, FPGA vendors spend their most qualified manpower on the development of EDA, IP cores, applications, and support
• Xilinx and Altera are morphing into EDA companies
Cadence
• FPGA Designer: top-down FPGA design system
• high-level mapping, architecture-specific optimization
• Verilog, VHDL, schematic-level design entry
• Verilog, VHDL to Synergy (logic synthesis) and FPGA Designer
• FPGAs are simulated by themselves using Cadence's Verilog-XL or Leapfrog VHDL simulators, and
• simulated with the rest of the system design using the Logic Workbench board/system verification environment
• Libraries for the leading FPGA manufacturers
Mentor Graphics
• System design and verification
• PCB design and analysis
• IC design and verification
• shifts ASIC design flow to FPGAs (Altera, Xilinx)
– by FPGA Advantage with IP support
– by ModuleWare
– Xilinx CORE Generator
– Altera MegaWizard integration
Synopsys
• FPGA Compiler II
• Version of ASIC Design Compiler Ultra
• Block Level Incremental Synthesis (BLIS)
• ASIC <-> FPGA migration
• Actel, Altera, Atmel, Cypress, Lattice, Lucent, Quicklogic, Triscend, Xilinx
RTR (continued)
• divides the application into a series of sequentially executed stages, each implemented as a separate execution module
• Partial RTR partitions these stages into finer-grain sub-modules to be swapped in as needed
• Without RTR, all configurable platforms are just ASIC emulators
• needs a new kind of application development environment
• that directly supports development and debugging of RTR applications
• essential for the advancement of configurable computing
• will also heavily influence future system organization
• Xilinx, VT, and BYU work on run-time kernels, run-time support, RTR debugging tools, and other associated tools
• smaller, faster circuits; simplified hardware interfacing; fewer IOBs; smaller, cheaper packages; simplified software interfaces
Run-time Mapping
• Run-time reconfigurable devices include the Xilinx Virtex FPGA family
• and the RAs that are part of Chameleon CS2000 series systems
• Using such devices changes many of the basic assumptions in the HW/SW co-design process:
• host/RL interaction is dynamic and needs a tiny OS like eBIOS, also to organize RL reconfiguration under host control
• a typical goal is to minimize reconfiguration latency (especially important in communication processors) and to hide configuration loading latency
• scheduling to find the 'best' schedule for eBIOS calls (C~side)
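Hiding configuration loading latency can be illustrated with a small timing model. This is a sketch under stated assumptions: the device is assumed to have a second configuration context so that the next stage can be loaded while the current one executes, and all times are hypothetical units, not measurements from any device.

```python
from typing import List, Tuple

# (configuration load time, execution time) per stage, in hypothetical time units
Stages = List[Tuple[int, int]]

def sequential_time(stages: Stages) -> int:
    """Load, then execute, one stage after another without any overlap."""
    return sum(load + run for load, run in stages)

def overlapped_time(stages: Stages) -> int:
    """Load stage i+1 in the background while stage i executes."""
    if not stages:
        return 0
    total = stages[0][0]                 # the first load cannot be hidden
    for i, (_, run) in enumerate(stages):
        next_load = stages[i + 1][0] if i + 1 < len(stages) else 0
        total += max(run, next_load)     # wait for whichever finishes last
    return total

stages = [(5, 10), (5, 10), (5, 3)]
print(sequential_time(stages))  # 38
print(overlapped_time(stages))  # 28
```

In this model a scheduler would try to order the calls so that each load is covered by a sufficiently long execution phase.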
>> Future directions <<
Soft CPU: new job for compilers
[Figure: HLL compiler targeting a soft CPU on FPGA, a memory core, and FPGA]
Soft rDPA feasible?
[Figure: rDPU array]
[à la S. Guccione]
Array I/O examples
[Figure: rDPU array with I/O via data streams, or from / to embedded memory banks]
[Figure: performance (1 to 1000, logarithmic axis) vs. year, 1980 to 2000. CPU (µProc): 60%/yr; DRAM: 7%/yr; processor-memory performance gap grows 50% / year]
[à la S. Guccione]
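The 50% figure follows from the two growth rates quoted in the chart. A minimal check, using only the slide's numbers (the function name is an illustrative assumption):

```python
CPU_GROWTH = 0.60   # processor performance growth per year (from the slide)
DRAM_GROWTH = 0.07  # DRAM performance growth per year (from the slide)

def gap_factor(years: int) -> float:
    """Factor by which the processor-memory gap has widened after the given years."""
    return ((1 + CPU_GROWTH) / (1 + DRAM_GROWTH)) ** years

print(round(gap_factor(1), 3))  # 1.495, i.e. about 50% per year
```

Compounded over a decade this yields a gap factor above 50, which motivates feeding arrays from data streams or embedded memory banks.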
HLL 2 Soft Array
[Figure: HLL compiler targeting memory, soft CPU, miscellaneous, and soft DPU arrays]
[à la S. Guccione]
HLL 2 "flex" rDPA
[Figure: HLL compiler targeting memory, CPU, miscellaneous, and rDPU arrays]
[à la S. Guccione]
>> HLLs <<
HLLs for Hardware Design vs. System Design vs. RTR System Design
[Figure: HLL compiler for system design; HLL compiler for RTR system design]
[à la S. Guccione]
HLLs for Hardware Design vs. System Design vs. RTR System Design (continued)
[Figure: HLL compilers for system design and for RTR system design]
[à la S. Guccione]
CPU and memory on chip
[Figure: CPU core, FPGA core, and memory core; HLL compilers; RTR system design]
[à la S. Guccione]
JBits Environment
[Figure: components: User Code, RTP Core Library, JRoute API, JBits API, BoardScope Debugger, XHWIF, TCP/IP, Device Simulator]
[à la S. Guccione]
HLLs for Hardware Design vs. System Design vs. RTR System Design (continued)
[Figure: HLL compilers; system design]
[à la S. Guccione]
Embedded System Design
[Figure: one HLL compiler targeting CPU core, FPGA core, and memory core; another HLL compiler targeting a soft CPU on FPGA, memory core, and FPGA]
[à la S. Guccione]
>> Conclusions <<
Missing the next revolution
Ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars.
"EDA industry shifts into CS mentality" [Wojciech Maly]
• Microprogramming to replace FSM design
• Hardware languages replace EE-type schematics
• EDA software and its interfacing languages
• Newer system-level languages like SystemC etc.
• Small and large module re-use
• Hierarchical organization of designs, EDA, et al.
• ...
• Which language to select?
Roadmap
Old CS lab course philosophy: given an application, implement it by a program.
New CS freshman lab course environment: given an application,
a) implement it by writing a program
b) implement it as a morphware prototype
c) partition it into P and Q
c.1) implement P by software
c.2) implement Q by morphware
c.3) implement the P / Q communication interface
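Step c) can be illustrated with a deliberately simple heuristic. This is a minimal sketch, assuming hypothetical profiling data per task; the `Task` fields, the threshold, and the rule itself are illustrative assumptions and not the method of CoDe-X or any other partitioning compiler.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    name: str
    runtime_share: float  # hypothetical profiling result: fraction of total run time
    streams_data: bool    # True if the task is a regular loop over data streams

def partition(tasks: List[Task], threshold: float = 0.2) -> Tuple[List[str], List[str]]:
    """Split tasks into P (software) and Q (morphware) by a simple rule of thumb."""
    software_p: List[str] = []
    morphware_q: List[str] = []
    for task in tasks:
        # Only hot, stream-like tasks are assumed to pay off on morphware.
        if task.streams_data and task.runtime_share >= threshold:
            morphware_q.append(task.name)
        else:
            software_p.append(task.name)
    return software_p, morphware_q

tasks = [
    Task("parse_input", 0.05, False),
    Task("fir_filter", 0.60, True),
    Task("histogram", 0.10, True),
    Task("report", 0.25, False),
]
p, q = partition(tasks)
print(p)  # ['parse_input', 'histogram', 'report']
print(q)  # ['fir_filter']
```

Every task boundary between P and Q then becomes part of the communication interface of step c.3.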
All enabling technologies are available
•anti machine and all its architectural resources
•parallel memory IP cores and generators
•anything else needed
•languages & (co-)compilation techniques
•morphware vendors like PACT ....
•literature from last 30 years
The dichotomy of models
• Note for von Neumann: state register is with the CPU
• Note for the anti machine: state register is with memory bank / state registers are within memory banks
Machine Paradigms
machine category | Computer (the Machine: "v. Neumann") | The Anti Machine
driven by | instruction streams | data streams (no "dataflow")
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up ("instruction fetch") | at run time | at load time
data path resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
data path operation | sequential | parallel pipe network etc.
*) also hardwired implementations, e.g. BEE project, Prof. Brodersen
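The contrast between the two state registers can be sketched with a dot product. This is an illustrative toy, not a model of any real machine: the mini instruction set, `von_neumann_dot`, `data_counter`, and `anti_machine_dot` are assumptions made for the example.

```python
def von_neumann_dot(a, b):
    """Instruction-stream driven: a program counter steps through instructions at run time."""
    if not a:
        return 0, 0
    program = ["LOAD_A", "LOAD_B", "MUL", "ACC", "INC", "JUMP_IF_MORE"]
    pc = i = acc = x = y = prod = fetches = 0
    while True:
        op = program[pc]
        fetches += 1                     # every step costs an instruction fetch
        if op == "LOAD_A":
            x = a[i]
        elif op == "LOAD_B":
            y = b[i]
        elif op == "MUL":
            prod = x * y
        elif op == "ACC":
            acc += prod
        elif op == "INC":
            i += 1
        elif op == "JUMP_IF_MORE":
            if i < len(a):
                pc = 0
                continue
            return acc, fetches
        pc += 1

def data_counter(start, step, count):
    """Address generator: the state register of the anti machine is a data counter."""
    addr = start
    for _ in range(count):
        yield addr
        addr += step

def anti_machine_dot(a, b):
    """Data-stream driven: the multiply-accumulate datapath is fixed at load time."""
    acc = 0
    for ia, ib in zip(data_counter(0, 1, len(a)), data_counter(0, 1, len(b))):
        acc += a[ia] * b[ib]             # no instruction fetch at run time
    return acc

print(von_neumann_dot([1, 2, 3], [4, 5, 6]))   # (32, 18)
print(anti_machine_dot([1, 2, 3], [4, 5, 6]))  # 32
```

Both compute the same result; the first spends 18 instruction fetches on three data items, while the second only sequences two address streams.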
Benefit from RAM-based & 2nd paradigm
1) A RAM-based platform is needed for:
• flexibility, programmability
• avoiding the need for specific silicon (mask cost: currently 2 mio $, rapidly growing)
2) A simple 2nd machine paradigm is needed as a common model:
• to avoid the need for circuit expertise
• to educate zillions of programmers
Design Space Exploration Systems
Explorer System | year | source | interactive | status evaluation | status generation
DPE | 1991 | [66] | no | abstract models | rule-based
Clio | 1992 | [67] | yes | prediction models | device generator
DIA | 1998 | [68] | yes | prediction from library | rule-based
DSE for RAW | 1998 | [49] | no | analytical models | analytical
ICOS | 1998 | [76] | no | fuzzy logic | greedy search
DSE for Multimedia | 1999 | [77] | no | simulation | branch and bound
Xplorer | 1999 | [11] [50] | yes | fuzzy rule-based | simulated annealing