Reconfigurable HPC Reconfigurable HPC part 4 miscellaneous Reiner Hartenstein TU Kaiserslautern May 14, 2004 , TU Tallinn, Estonia

Reconfigurable HPC

part 4: miscellaneous

Reiner Hartenstein

TU Kaiserslautern

May 14, 2004, TU Tallinn, Estonia

© 2004, [email protected] http://hartenstein.de2

Time to Market

• A Fundamental Paradigm Shift in Silicon Application

[Figure: revenue per month versus time in months (ticks at 1, 10, 20, 30), comparing an ASIC product with a reconfigurable product that receives downloaded updates (Update 1, Update 2). Source: Tom Kean]

The next Revolution: Reconfigurability (Makimoto’s 3rd wave)

EDA industry paradigm switching every 7 years [Keutzer / Newton]:
• 1978: Transistor entry: Applicon, Calma, CV ...
• 1985: Schematics entry: Daisy, Mentor, Valid ...
• 1992: Synthesis: Cadence, Synopsys ...
• 1999: (Co-)Compilation & data-stream-based (r)DPAs [Hartenstein]
• 2006

[Figure: McKinsey curve with Paradigm Shift, Tornado, and Mainstream phases. Source: Richard Newton]

82% of designers hate their tools [Keutzer / Newton]

Software to Configware Migration

• This talk illustrates the performance benefit that may be obtained from Reconfigurable Computing (RC)
• It stresses the coarse-grain RC point of view and hardly mentions FPGAs (but coarse-grain designs can always be mapped onto FPGAs)
• Software to Configware migration is the most important source of speed-up
• Hardware is just frozen Configware

Avoiding specific silicon ...

[Figure: number of design starts per year (0 to 50,000) for 2001 to 2004, rGA-based. Source: N. Tredennick, Gilder Technology Report, 2003]

• directly delivered to the customer: completely configured
• omit emulation

Mega-rGAs

[Figure: system gates per rGA chip (logarithmic axis, 100 to 10 000 000) versus year, 1984 to 2004, with data points for XC 4085XL, XC 40250XV, Virtex, Virtex II, and planned devices. Source: Xilinx Data]

© 2002, [email protected] http://kressarray.de

University of Kaiserslautern

Xputer Lab

Embedded hardware: CPU & memory cores on chip

[Diagram: CPU core, FPGA core, and memory core on one chip, with HLL compilers.]

[à la S. Guccione]

Xilinx Virtex-II Pro FPGA Architecture

• FPGA fabric based on the Virtex-II architecture
• PowerPC 405 RISC CPU (PPC405) cores
• On-chip memory controller
• Embedded RAM
• RocketIO
• Entire system on a single chip: all you need on board

Source: Ivo Bolsens, Xilinx

What About PLD Cores on ASICs? (Embedded FPGA Fabric)

What’s Wrong with This Picture?

1. Still Have to Make the Chip
2. Need Two Sets of Software to Build It
   – The ASIC Flow
   – The PLD Flow
3. Have No Idea What to Connect the PLD Pins to. Chances Are, You Are Going to Get It Wrong!

[Jonathan Rose]

What’s Right with This Picture!

1. Pre-Fabricated
2. One CAD Tool Flow!
3. Can Connect Anything to Anything: PLDs are built for general connectivity

[Diagram: PLD fabric with embedded CPU, serial link, analog, “etc.”]

[Jonathan Rose]

>> rGAs <<

• rGAs
• Placement & Routing
• Soft Processors
• History of Frameworks
• RTR
• Support by rGA vendors
• EDA
• Future directions
• Conclusions

http://www.uni-kl.de


Different Morphware-Platforms:

Reconfigurable Logic Blocks

Reconfigurable Interconnect Blocks

Reconfigurable Datapath Arrays

fine grain reconfigurable

coarse grain reconfigurable

Reconfigurable interconnect fabrics

rGA with island architecture (detail)

[Figure: island-style rGA showing reconfigurable logic blocks, connect boxes, and the switch boxes of the interconnect fabric. Reiner Hartenstein]

Switch box

[Figure: a switch box and its switch points.]

Connect box

[Figure: a connect box and its connect points.]

Connect point activated

[Figure: enlarged view of an activated connect point.]

Switch boxes activated

[Figure: switch boxes with 3 switch points activated, followed by the 4th and the 5th switch point.]

Result

[Figure: routing completed for 1 net, from A to B. Reiner Hartenstein]

1979: Silvar-Lisco (Silicon Valley Research Corp.) offers CALM-P

20 Transistors + 20 Flipflops

>> Placement & Routing <<

Routing: long-distance net

[Figure: a net from A to B passing through intermediate switch boxes. Reiner Hartenstein]

A path may be used for only one signal at a time ... (compare the Bridges of Königsberg)

Routing congestion

[Figure: the net from A to B is routed; C and D are cut off from each other.]

• C cannot be connected with D: C and D are not reachable
• C and D need another placement
• rLBs are not 100% usable
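The congestion effect described above can be sketched with a small breadth-first (Lee-style) maze router. This is only an illustration under stated assumptions: the fabric is modelled as a plain grid of routing cells, and the names `route` and `route_nets` as well as the 3 x 3 example are hypothetical, not taken from the slides or from any vendor tool.

```python
from collections import deque

def route(grid, src, dst):
    """Breadth-first (Lee-style) search on a grid of routing cells.

    grid[r][c] is True while the cell is still free.
    Returns the list of cells from src to dst, or None if dst is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None

def route_nets(rows, cols, nets):
    """Route nets one after another; cells used by a net are blocked afterwards,
    because a path can carry only one signal at a time."""
    grid = [[True] * cols for _ in range(rows)]
    results = {}
    for name, (src, dst) in nets.items():
        endpoints_free = grid[src[0]][src[1]] and grid[dst[0]][dst[1]]
        path = route(grid, src, dst) if endpoints_free else None
        results[name] = path
        if path:
            for r, c in path:
                grid[r][c] = False
    return results

# Hypothetical placement: net AB occupies the middle row, which cuts C off from D.
congested = route_nets(3, 3, {"AB": ((1, 0), (1, 2)), "CD": ((0, 1), (2, 1))})
# Another placement of C and D (both in the top row) makes the second net routable.
replaced = route_nets(3, 3, {"AB": ((1, 0), (1, 2)), "CD": ((0, 0), (0, 2))})
```

In the first call the net CD comes back as `None`, which corresponds to "C and D need another placement"; in the second call both nets are routed.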

Leonhard Euler

Euler’s problem of the bridges of Königsberg (1736) concerns such a network:

Find a route that crosses each bridge exactly once ... This is also an optimization: none of the bridges remains unused.

L. Euler: Solutio Problematis Ad Geometriam Situs Pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140

[Figure: the bridges drawn as a graph. Nodes: Left Bank, Right Bank, Kneiphof Island, Other Island. Edges: the bridges.]
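Euler's argument can be checked in a few lines. The sketch below assumes the usual historical layout of the seven bridges, which the slide does not spell out, and it assumes a connected graph; the function names are illustrative.

```python
from collections import Counter

# Assumed historical layout of the seven bridges (not listed on the slide).
BRIDGES = [
    ("Kneiphof Island", "Left Bank"), ("Kneiphof Island", "Left Bank"),
    ("Kneiphof Island", "Right Bank"), ("Kneiphof Island", "Right Bank"),
    ("Kneiphof Island", "Other Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
]

def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def has_euler_walk(edges):
    """A connected multigraph has a walk using every edge exactly once
    if and only if it has 0 or 2 nodes of odd degree."""
    return len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(BRIDGES))
print(has_euler_walk(BRIDGES))
```

All four land masses have odd degree, so no walk crosses each bridge exactly once.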

Crossbar switch

• 1913: J. N. Reynolds’s crossbar switch
• 1915: patent granted
• 1919: Betulander’s crossbar switch
• 1926: first public telephone switching application in Sweden
• 1964: NASA telemetrics crossbar array

Crossbar complete?

• One bar connects 2 pins
• Size of a full (complete) switch: n x n / 2

n | n x n / 2
4 | 8
100 | 5000

[Figure: number of crossbar chips needed versus crossbar chips in a row, full versus partial crossbar.]

Crossbar chips are available from Aptix, Texas Instruments, and others.
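The sizing rule on this slide (n x n / 2 switch points for a complete crossbar with n pins) is easy to tabulate. A minimal sketch; the function name is illustrative.

```python
def full_crossbar_switches(n):
    """Switch points of a complete crossbar with n pins, using the slide's n x n / 2 rule."""
    return n * n // 2

for pins in (4, 100):
    print(pins, full_crossbar_switches(pins))
```

The quadratic growth is the reason partial crossbars, or several crossbar chips in a row, are used for larger pin counts.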

Routing congestion example with detour

[Figure: a row of rGAs where a direct connection is impossible; the net takes a detour connection, routing through an rLB that is configured as the identity function.]

Routing resources: logic gates and/or pass transistors

Crossbar-based Architectures

• 1990: UC Berkeley (Jan Rabaey)
• 1993: PADDI-II (Jan Rabaey)
• 1997: Pleiades (mesh & crossbar)

[Diagram: eight EXU/CTL pairs connected through a crossbar switch with I/O on both sides; datapath widths labelled 16 bit and 32 bit.]

PADDI-II Architecture

[Diagram: 48 processing elements (P1 to P48) arranged in 4-PE clusters around a two-level network, with I/O ports on both sides. Labels: Level-1 Network, Level-2 Network, 16 x 16b, 6 x 16b, 16 x 6 switch matrix, break-switch, 4-PE Cluster.]

>> Soft Processors <<

FPGA CPUs in teaching and academic research

• UCSC: 1990!
• Mälardalen University, Eskilstuna, Sweden
• Chalmers University, Göteborg, Sweden
• Cornell University
• Gray Research
• Georgia Tech
• Hiroshima City University, Japan
• Michigan State
• Universidad de Valladolid, Spain
• Virginia Tech
• Washington University, St. Louis
• New Mexico Tech
• UC Riverside
• Tokai University, Japan

Some soft CPU core examples

core | architecture | platform
MicroBlaze, 125 MHz, 70 D-MIPS | 32 bit standard RISC; 32 reg. by 32 LUT RAM-based reg. | Xilinx, up to 100 on one FPGA
Nios | 16-bit instr. set | Altera Mercury
Nios, 50 MHz | 32-bit instr. set | Altera, 22 D-MIPS
Nios | 8 bit | Altera – Mercury
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
DSPuva16 | 16 bit DSP | Spartan-II
Leon, 25 MHz | SPARC |
ARM7 clone | ARM |
uP1232 8-bit | CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits instr. + ext. ROM | 2 XILINX 3020 LCA
Reliance-1 | 12 bit DSP | Lattice 4 isp30256, 4 isp1016
1Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
YARD-1A | 16-bit RISC, 2 opd. instr. | old Xilinx FPGA board
xr16 | RISC integer C | SpartanXL

Some “soft CPU core” examples (continued)

All of these cores are Configware (not hardware): retro-emulation.

It’s a Paradigm Shift!

• Using FPGAs (fine-grain reconfigurable) has mainly been classical logic synthesis on a “strange hardware” platform
• Coarse-grain reconfigurable arrays (rDPAs), i.e. Reconfigurable Computing, however, mean a really fundamental paradigm shift
• This is still ignored by CS and EE curricula and by almost all R&D scenes

Why the speed-up ...

... although the FPGA clock is slower by a factor of 3 or even more (most know-how comes from the “high level synthesis” discipline):

• moving the operator to the data stream (before run time)
• support operations: neither clock cycles nor memory cycles
• decisions without memory cycles or clock cycles
• most “data fetch” without memory cycles

>> History of Frameworks <<

Goal: away from complex design flow

[Diagram: the traditional flow Schematics/HDL → Netlister → Netlist → Place and Route → Bitstream, contrasted with HLL → Compiler.]

[à la S. Guccione]

Overcome traditional separate design flow

[Diagram: two separate flows. Software: User Code → Compiler → Executable. Hardware: Schematics/HDL → Netlister → Netlist → Place and Route → Bitstream. Both are to be covered by an HLL compiler.]

[à la S. Guccione]

Overcome traditional separate co-processing design flow -> JBits Design Flow

[Diagram: the separate flows (User Code → Compiler → Executable; Schematics/HDL → Netlister → Netlist → Place and Route → Bitstream) contrasted with the JBits flow: User Java Code → Java Compiler → Executable, using the JBits API.]

[à la S. Guccione]

New directions in application development

• Automatic partitioning compilers improve designer productivity
• Example: CoDe-X (Jürgen Becker, Univ. of Karlsruhe)
• This supports Run-Time Reconfiguration (RTR), a key enabler of error handling and fault correction by partially re-routing the FPGA at run time, as well as remote patching for upgrading, remote debugging, and remote repair by reconfiguration, even over the internet

>> RTR <<

CPU use for configuration management

• an on-board microprocessor CPU is available anyhow, even along with a little RTOS
• use this CPU for configuration management

[Diagram: HLL compiler for RTR system design.]

Hard CPU & memory core on same chip

[Diagram: CPU core, FPGA core, and memory core on one chip, with HLL compilers for RTR system design.]

Converging factors for RTR

[Diagram: User Java Code → Java Compiler → Executable, using the JBits API.]

• Converging factors make RTR-based system design viable:
• 1) Million-gate FPGA devices and co-processing with standard microprocessors are commonplace
  – direct implementation of complex algorithms in FPGAs
  – this alone has already revolutionized FPGA design
• 2) New tools like the Xilinx JBits software tool suite directly support co-processing and RTR

RTR

• divides an application into a series of sequentially executed stages, each mapped as a separate execution module
• Excellent example: Xtrem platform by PACT AG, Munich
• Without RTR, all configurable platforms are just ASIC emulators
• needs tools that directly support development and debugging of RTR applications
• will also heavily influence the future system organization

>> Support by rGA vendors <<

>> Support …

• Support by FPGA vendors
  – Xilinx
    • Software by Xilinx
    • Configware (soft IP cores)
    • Hardware
  – Altera
    • Software
    • Configware
    • Hardware

Xilinx

• Fabless FPGA semiconductor vendor, San Jose, CA, founded 1984
• Key patents on FPGAs (expiring in a few years)
• Fortune 2001: No. 14 Best Company to Work For (Intel: No. 42, HP: No. 64, TI: No. 65)
• DARPA grant (Nov ’99) to develop JBits API tools for internet-reconfigurable / upgradable logic (with VT)
• Less brilliant in the early/mid 1990s (president Curt Wozniak): by 1995, market share had fallen from 84% to 62% [Dataquest]
• As designs got larger, Xilinx lost its advantage (bug fixes did not require burning new chips)
• Meanwhile, weeks of expensive debug time are needed

Software by Xilinx

• Full design flow from Cadence, Mentor, and Synopsys
• Xilinx Software Alliance EDA Program:
  – Alliance Series Development System
  – Foundation Series Development Systems
  – Xilinx Foundation Series ISE (Integrated Synthesis Environment)
  – free WebPOWERED SW with WebFitter & WebPACK-ISE
  – StateCAD XE and HDL Bencher
  – Foundation Base Express
  – Foundation ISE Base Express
• More: ModelSim Xilinx Edition (ModelSim XE) | Forge Compiler | Modular Design | ChipScope ILA | The Xilinx System Generator | XPower | JBits SDK | The Xilinx XtremeDSP Initiative | MathWorks / Xilinx Alliance | The Wind River / Xilinx alliance

Configware (soft IP Products)

• For libraries, creation, and reuse of configware
• To search for IPs see: List of all available IP
• The AllianceCORE program is a cooperation between Xilinx and third-party core developers
• The Xilinx Reference Design Alliance Program
• The Xilinx University Program
• LogiCORE soft IP with LogiCORE PCI Interface
• Consultants

Xilinx hardware

• Virtex, Virtex-II: first with 1 million system gates
  – Virtex-E series: > 3 million system gates
• Virtex-EM: copper process and additional on-chip memory for network switch applications
• Virtex XCV3200E: > 3 million gates, 0.15-micron technology
• Spartan, Spartan-XL, Spartan-II
  – for low-cost, high-volume applications as ASIC replacements
  – multiple I/O standards, on-chip block RAM, digital delay lock loops
  – eliminate phase lock loops, FIFOs, I/O translators, system bus drivers
• XC4000XV, XC4000XL/XLA, CPLD: low-cost families
  – rapid development, longer system life, robust field upgradability
  – support In-System Programming (ISP), in-board debugging
  – test during manufacturing, field upgrades, fully JTAG-compliant interface
• CoolRunner: low power, high speed/density, standby mode
• Military & Aerospace: QPRO high-reliability, QML certified
• Configuration Storage Devices

Altera

• Altera was founded in June 1983
• EDA: synthesis, place & route, and verification
• Quartus II: APEX, Excalibur, Mercury, FLEX 6000 families
• MAX+PLUS II: FLEX, ACEX & MAX families
• Flow with Quartus II: Mentor Graphics, Synopsys, and Synplicity deliver design software to support Altera SOPC solutions
• Mentor: only EDA vendor with a complete design environment for APEX II, incl. IP, design capture, simulation, synthesis, and h/s co-verification
• Configware: Altera offers over a hundred IP cores
• Third-party IP core design services and consultants

Altera hardware

• Newer families: APEX 20KE, APEX 20KC, APEX II, MAX 7000B, ACEX 1K, Excalibur, Mercury
  – APEX EP20K1500E (0.18-µ), up to 2.4 million system gates
  – APEX II (all-copper 0.13-µ) for data path applications, supports many I/O standards, 1-Gbps True-LVDS performance
  – wQ2001, an ARM-based Excalibur device
• Altera mainstream: MAX 7000A, 3000A; FLEX 6000, 10KA, 10KE; APEX 20K families
• Mature and other: Classic, MAX 7000, 7000S, 9000; FLEX 8000, 10K families

>> EDA <<

• EDA as the Key Enabler (major EDA vendors)
• Altera
• Cadence
• Mentor Graphics
• Synopsys
• Xilinx
• Changing EDA Tools Market

EDA as the Key Enabler (major EDA vendors)

• Select by EDA quality / productivity, not by FPGA architecture
• EDA often has massive software quality problems
• Customer: the highest priority is an EDA center of excellence
  – to collect EDA expertise and EDA user experience
  – to assemble the best possible tool environments
  – for optimum support of design teams
  – to cope with interoperability problems
  – to keep track of the EDA scene as a rapidly moving target
• Being fabless, FPGA vendors spend their most qualified manpower on the development of EDA, IP cores, applications, and support
• Xilinx and Altera are morphing into EDA companies

Cadence

• FPGA Designer: top-down FPGA design system
• high-level mapping, architecture-specific optimization
• Verilog, VHDL, schematic-level design entry
• Verilog, VHDL to Synergy (logic synthesis) and FPGA Designer
• FPGAs simulated by themselves using Cadence's Verilog-XL or Leapfrog VHDL simulators
• and simulated with the rest of the system design using the Logic Workbench board/system verification environment
• Libraries for the leading FPGA manufacturers

Mentor Graphics

• System Design and Verification
• PCB design and analysis
• IC Design and Verification
• shifts the ASIC design flow to FPGAs (Altera, Xilinx)
  – by FPGA Advantage with IP support
  – by ModuleWare
  – Xilinx CORE Generator
  – Altera MegaWizard integration

Synopsys

• FPGA Compiler II
• Version of ASIC Design Compiler Ultra
• Block Level Incremental Synthesis (BLIS)
• ASIC <-> FPGA migration
• Actel, Altera, Atmel, Cypress, Lattice, Lucent, Quicklogic, Triscend, Xilinx


RTR (continued)

• divides an application into a series of sequentially executed stages, each implemented as a separate execution module
• Partial RTR partitions these stages into finer-grain sub-modules to be swapped in as needed
• Without RTR, all configurable platforms are just ASIC emulators
• needs a new kind of application development environment that directly supports development and debugging of RTR applications
• essential for the advancement of configurable computing
• will also heavily influence the future system organization
• Xilinx, VT, and BYU work on run-time kernels, run-time support, RTR debugging tools, and other associated tools
• Benefits: smaller, faster circuits; simplified hardware interfacing; fewer IOBs; smaller, cheaper packages; simplified software interfaces

Run-time Mapping

• Run-time reconfigurable devices include the Xilinx Virtex FPGA family and the RAs that are part of the Chameleon CS2000 series systems
• Using such devices changes many of the basic assumptions in the HW/SW co-design process:
  – host/RL interaction is dynamic and needs a tiny OS like eBIOS, also to organize RL reconfiguration under host control
  – a typical goal is to minimize reconfiguration latency (especially important in communication processors) and to hide configuration loading latency
  – scheduling is used to find the ’best’ schedule for eBIOS calls (C~side)
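The effect of hiding configuration loading latency can be illustrated with a toy timing model. This is a sketch under loudly stated assumptions: the application is a fixed sequence of stages, the fabric can load the next configuration while the current stage executes (for example through partial reconfiguration), and the stage times are invented for illustration. Neither the function name nor the numbers come from the slides or from eBIOS.

```python
def total_time(stages, overlap):
    """Total run time of sequentially executed stages.

    stages:  list of (config_time, exec_time) pairs, one per stage.
    overlap: if True, the configuration of stage i+1 is loaded while
             stage i executes (assumption: the platform allows this).
    """
    if not stages:
        return 0
    if not overlap:
        # Every stage pays its full configuration time before it can run.
        return sum(cfg + run for cfg, run in stages)
    elapsed = stages[0][0]  # the first configuration cannot be hidden
    for i, (_, run) in enumerate(stages):
        next_cfg = stages[i + 1][0] if i + 1 < len(stages) else 0
        # The next stage can start only when both the current execution
        # and the next configuration load have finished.
        elapsed += max(run, next_cfg)
    return elapsed

# Hypothetical stage times in arbitrary units.
stages = [(4, 10), (4, 3), (4, 10)]
print(total_time(stages, overlap=False))  # 35
print(total_time(stages, overlap=True))   # 28
```

A scheduler in the sense of the last bullet would choose the order and timing of the configuration calls so that as many loads as possible fall inside execution phases.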

>> Future directions <<

Soft CPU: new job for compilers

[Diagram: an HLL compiler targeting a soft CPU on an FPGA, next to a memory core and FPGA fabric.]

Soft rDPA feasible?

[Diagram: rDPU arrays.]

[à la S. Guccione]

Array I/O examples

[Diagram: rDPU arrays fed by data streams, or from / to embedded memory banks.]

[Figure: Processor-Memory Performance Gap, 1980 to 2000, performance on a logarithmic axis (1 to 1000): µProc (CPU) improves by 60%/yr, DRAM by 7%/yr; the gap grows 50% / year.]

[à la S. Guccione]
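The "grows 50% / year" figure follows from the two growth rates quoted for the processor-memory gap (60%/yr for the microprocessor, 7%/yr for DRAM). A short worked check; the function names are illustrative.

```python
def growth_ratio(cpu_rate=0.60, dram_rate=0.07):
    """Yearly factor by which the processor-memory gap widens."""
    return (1 + cpu_rate) / (1 + dram_rate)

def gap_after(years, cpu_rate=0.60, dram_rate=0.07):
    """Relative gap after the given number of years, starting from 1."""
    return growth_ratio(cpu_rate, dram_rate) ** years

print(round(growth_ratio(), 3))  # about 1.495, i.e. roughly 50% per year
print(gap_after(10))
```

Compounded over a decade or two, the gap reaches several orders of magnitude, which is why streaming data from and to embedded memory banks matters.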

HLL 2 Soft Array

[Diagram: an HLL compiler targeting a soft CPU, memory, miscellaneous logic, and soft DPU arrays.]

[à la S. Guccione]

HLL 2 “flex” rDPA

[Diagram: an HLL compiler targeting a CPU, memory, miscellaneous logic, and rDPU arrays.]

[à la S. Guccione]

>> HLLs <<

HLLs for Hardware Design vs. System Design vs. RTR System Design

[Diagram: HLL compilers targeting hardware design, system design, and RTR system design.]

[à la S. Guccione]

CPU and memory on chip

[Diagram: CPU core, FPGA core, and memory core on one chip, with HLL compilers for RTR system design.]

[à la S. Guccione]

JBits Environment

[Diagram: User Code on top of the RTP Core Library, the JRoute API, and the JBits API; BoardScope Debugger; XHWIF connecting to the Device Simulator or, via TCP/IP, to hardware.]

[à la S. Guccione]


Embedded System Design

[Diagram: two variants, each with an HLL compiler. Hard cores: CPU core, FPGA core, memory core. Soft cores: soft CPU on an FPGA, memory core, FPGA.]

[à la S. Guccione]

>> Conclusions <<

Missing the next revolution

Ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars.

“EDA industry shifts into CS mentality” [Wojciech Maly]

• Microprogramming to replace FSM design
• Hardware languages replace EE-type schematics
• EDA software and its interfacing languages
• Newer system-level languages like SystemC etc.
• Small and large module re-use
• Hierarchical organization of designs, EDA, et al.
• ...

“EDA industry shifts into CS mentality” [Wojciech Maly]

• Which language to select?

Roadmap

Old CS lab course philosophy: given an application, implement it by a program.

New CS freshman lab course environment: given an application,
a) implement it by writing a program
b) implement it as a morphware prototype
c) partition it into P and Q
   c.1) implement P by software
   c.2) implement Q by morphware
   c.3) implement the P / Q communication interface

All enabling technologies are available

• the anti machine and all its architectural resources
• parallel memory IP cores and generators
• anything else needed
• languages & (co-)compilation techniques
• morphware vendors like PACT ...
• literature from the last 30 years

END

The dichotomy of models

• Note for von Neumann: the state register is with the CPU
• Note for the anti machine: the state register is with the memory bank / the state registers are within the memory banks

Machine Paradigms

machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
driven by | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up | at run time (“instruction fetch”) | at load time
data path resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
data path operation | sequential | parallel pipe network etc.

Also hardwired implementations *)

*) e.g. Bee project, Prof. Brodersen
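The contrast in the table above can be mimicked in software. The sketch below is only an analogy and not the Xputer or KressArray implementation: a generator plays the role of a data counter located at the memory, and a chain of operators that is fixed once before the run stands in for a data path configured at load time. All names are hypothetical.

```python
def data_counter(start, stride, count):
    """Address generator: the state register of the anti machine."""
    for i in range(count):
        yield start + i * stride

def configure_pipeline(*operators):
    """'Load time': fix the chain of operators once, before any data flows."""
    def pipeline(value):
        for op in operators:
            value = op(value)
        return value
    return pipeline

def run_anti_machine(memory, counter, pipeline):
    """'Run time': only data moves through the configured pipeline;
    no instruction is fetched per operation."""
    return [pipeline(memory[address]) for address in counter]

memory = [1, 2, 3, 4, 5, 6]
pipe = configure_pipeline(lambda x: x * 2, lambda x: x + 1)
result = run_anti_machine(memory, data_counter(0, 2, 3), pipe)
print(result)  # [3, 7, 11]
```

In a von Neumann machine the same computation would fetch a multiply and an add instruction for every data item under the control of a single program counter.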

Benefit from RAM-based & 2nd paradigm

1) A RAM-based platform is needed for:
• flexibility, programmability
• avoiding the need for specific silicon (mask cost: currently 2 million $, rapidly growing)

2) A simple 2nd machine paradigm is needed as a common model:
• to avoid the need for circuit expertise
• to educate zillions of programmers

Design Space Exploration Systems

Explorer System | year | source | interactive | status evaluation | status generation
DPE | 1991 | [66] | no | abstract models | rule-based
Clio | 1992 | [67] | yes | prediction models | device generator
DIA | 1998 | [68] | yes | prediction from library | rule-based
DSE for RAW | 1998 | [49] | no | analytical models | analytical
ICOS | 1998 | [76] | no | fuzzy logic | greedy search
DSE for Multimedia | 1999 | [77] | no | simulation | branch and bound
Xplorer | 1999 | [11] [50] | yes | fuzzy rule-based | simulated annealing