Page 1: Present and Future of Reconfigurable Systems

CAPES / DFG Project: Universidade de Brasilia, Universitaet Kaiserslautern, Universitaet Karlsruhe

Reiner Hartenstein*, University of Kaiserslautern

November 14, 2003, Brasilia, Brazil

*) IEEE fellow

Page 2: Literature (also downloads)

© 2003, [email protected] http://hartenstein.de

University of Kaiserslautern, Xputer Lab

http://hartenstein.de

Also click "recent talks" on this page. It also links to available Ph.D. theses:

Becker, Herz, Kress, Nageldinger,

Page 3: Reconfigurable Computing: a second programming domain

Migration of programming to the structural domain

The structural domain has become RAM-based

The opportunity to introduce the structural domain to programmers ...

... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm

Page 4: IT ages

[Figure: timeline with ticks at 1957, 1967, 1977, 1987, 1997 and 2007, showing the mainframe age, the computer age (PC age) and the morphware age; annotations: "data streams ...", "flowware", "here?"]

von Neumann does not support morphware

Page 5: >> outline <<

• fine grain reconfigurable
• Placement and routing
• coarse grain reconfigurable
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks

http://www.uni-kl.de

Page 6: fine grain

• Fine Grain morphware platforms

already mainstream: reconfigurable logic

just logic design on a strange platform?

speed-up of up to 3 orders of magnitude

Page 7: rGAs

[Figure: NRE and mask cost: mask set cost in million $ (axis 1 to 4) against feature size; sources: eASIC, Dataquest]

no. of masks: 12, 12, 16, 20, 26, 28, 30, >30
feature size: 0.8, 0.6, 0.35, 0.25, 0.18, 0.15, 0.13, 0.1, 0.07

[Pie chart: PC 25%, communication 22%, others 31%, automotive 6%, consumer 16%]

Top 4 PLD Manufacturers 2000 (total: $3.7 billion): Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%

• [Dataquest] > $7 billion by 2003.
• FPGAs going into every type of application, also SoC
• fastest growing segment of semiconductor market

you don't need specific silicon!

Page 8: rGA with island architecture (detail)

[Figure: island-style rGA; labels: switch, connect, switch]

Page 9: switch box

[Figure: switch box and switch point; vertical label: "reconfigurable"]

Page 10: connect box

[Figure: connect box and connect point; the connect point is part of the configuration memory; vertical label: "reconfigurable"]

Page 11: connect point (enlarged)

[Figure: connect point; reconfigurable logic box illustration; vertical label: "reconfigurable"]

Page 12: connection activated

[Figure: reconfigurable logic box illustration]

The feed line for selecting the function of the rLB is not shown.

Page 13: connect point activated

• Routing

Page 14: switch points activated

• Routing

[Figure: switch box with 3 switch points activated, followed by the 4th switch point and the 5th switch point]

Page 15: Routing continued

• Routing

Page 16: Routing completed for 1 net

• Routing

[Figure: routed net between A and B]

Placement and routing software has been known for 25 years.

Such network problems can be handled manually or with the help of graph theory.

1979: Silva Lisco (Silicon Valley Research Corp.) offers CALM-P

20 Transistors + 20 Flipflops

Page 17: Routing: long distance nets

[Figure: net between A and B]

Passing through: long distance wiring from rLBs outside this region

A path can be used only once at a time ...

Page 18: routing congestion

[Figure: nets A, B, C, D]

C and D are not reachable.

A bridge can be passed only once (bridges of Königsberg)

C cannot be connected with D.
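
The congestion effect on the routing slides can be illustrated with a small sketch. It is not the routing software mentioned in the talk; it assumes a plain grid of routing cells, a breadth-first (Lee-style) shortest-path search, and the rule that a cell carries only one net at a time. The grid size, the net names and the function names `route` and `route_nets` are made up for illustration.

```python
from collections import deque

def route(width, height, used, start, goal):
    """Breadth-first search for a shortest free path on a grid.

    `used` holds cells already occupied by other nets; a cell can carry
    only one net at a time. Returns the path as a list of cells, or
    None if the net cannot be routed (congestion).
    """
    if start in used or goal in used:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in used and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

def route_nets(width, height, nets):
    """Route nets one after another; each routed path blocks its cells."""
    used = set()
    result = {}
    for name, (start, goal) in nets.items():
        path = route(width, height, used, start, goal)
        result[name] = path
        if path is not None:
            used.update(path)
    return result

# Hypothetical 3 x 3 region: net A-B takes the whole middle row,
# so C (bottom) can no longer reach D (top).
nets = {"A-B": ((0, 1), (2, 1)), "C-D": ((1, 0), (1, 2))}
routed = route_nets(3, 3, nets)
print(routed["A-B"])  # path of net A-B
print(routed["C-D"])  # None: blocked by net A-B
```

Routing the nets in a different order, or allowing a detour outside the region, would change the outcome; that is the degree of freedom a real placement and routing tool exploits.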


Page 20: Leonhard Euler

Euler's problem of the bridges of Königsberg (1736) is such a network problem:

Find a way that passes each bridge exactly once ...

... also an optimization: none of the bridges remains unused.

Page 21: L. Euler: Solutio Problematis ad Geometriam Situs Pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140

[Figure: the bridges of Königsberg drawn as a graph (nodes and edges); nodes: Left Bank, Right Bank, Kneiphof Island, Other Island]
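
As a worked example of the bridge problem, the sketch below applies the standard degree criterion: a connected graph has a walk that uses every edge exactly once if and only if it has zero or two nodes of odd degree. The edge list encodes the historical seven bridges using the node names from the slide; which bank is "left" and which is "right" is an assumption and does not affect the result.

```python
from collections import Counter

# The seven bridges as a multigraph (parallel edges are listed twice).
BRIDGES = [
    ("Kneiphof Island", "Left Bank"),
    ("Kneiphof Island", "Left Bank"),
    ("Kneiphof Island", "Right Bank"),
    ("Kneiphof Island", "Right Bank"),
    ("Kneiphof Island", "Other Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
]

def degrees(edges):
    """Number of edge ends meeting at each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def is_connected(edges):
    """Depth-first search over the nodes that appear in the edge list."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)

def has_euler_trail(edges):
    """True if some walk uses every edge exactly once."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return is_connected(edges) and len(odd) in (0, 2)

print(dict(degrees(BRIDGES)))    # all four nodes have odd degree
print(has_euler_trail(BRIDGES))  # False
```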

Page 22: Data structures for Graphs

[Figure: a directed graph and an undirected graph on nodes 1 to 4, each shown with its adjacency matrix (rows "from", columns "to") and its adjacency list]

J. E. Hopcroft, R. E. Tarjan: Efficient algorithms for graph manipulation; Comm. ACM, 1973
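
The two data structures named on this slide can be sketched as follows. The matrices on the original slide are not recoverable from the transcript, so the edge set `EDGES` is a hypothetical four-node example, not the graph from the figure.

```python
def adjacency_matrix(n, edges, directed=True):
    """n x n matrix; entry [from-1][to-1] is 1 if the edge exists."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src - 1][dst - 1] = 1
        if not directed:
            matrix[dst - 1][src - 1] = 1
    return matrix

def adjacency_list(n, edges, directed=True):
    """Dictionary mapping each node 1..n to the list of its successors."""
    adj = {node: [] for node in range(1, n + 1)}
    for src, dst in edges:
        adj[src].append(dst)
        if not directed and src != dst:
            adj[dst].append(src)
    return adj

EDGES = [(1, 2), (2, 4), (3, 2), (4, 3)]  # hypothetical example graph

for row in adjacency_matrix(4, EDGES, directed=True):
    print(row)
print(adjacency_list(4, EDGES, directed=False))
```

The matrix needs n x n entries regardless of the number of edges, while the list grows with the number of edges, which is why list forms are usually preferred for sparse netlists.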

Page 23: Large Scale Routing

ENIAC, completed 1945

Partitioning over racks in the hall
Partitioning over card cages in the rack
Partitioning over boards (cards) in card cages
Partitioning over chips etc. on the card (e.g. SBC)
Partitioning over blocks on the chip (e.g. microprocessor)

Page 24: PCBs (printed circuit boards)

for 40 years

MULTEC at Böblingen has produced printed circuit boards since 1963

planar "wiring"

no. of pins is limited

Page 25: Integrated Circuit (Chip)

limited number of pins

"wiring" on a planar surface

Page 26: hierarchy

[Figure: hierarchy levels: rack, card cage, card, chip, macro cell, basic cell, more levels; block labels in the figure: Kaiserslautern 1, KL2, KL3, KL4, FTI1, FTI2, JWGU, IMS1, IMS2, IMS3, IMS]

Page 27: wiring hierarchy

cables in the rack connect the card cages

card cage wiring connects the cards

card wiring connects the chips

on-chip wiring connects the cells (macro cell, cell)

*) 1930s: telephone switching (without chips; crossbar / two-motion selectors instead of cards); 1940s: first computers (without chips)

Page 28: An obsolete Application Area

• fine grain reconfigurable
• Placement and routing
• coarse grain reconfigurable
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks

http://www.uni-kl.de

before fabrication?

after fabrication?

Page 29: Emulators

[Figure: photographs of emulation hardware; labels: Quickturn, Celaro Pro (Mentor), Dini Group, PCI bus extender (Dini Group)]

Page 30: Crossbar

[Figure: full crossbar and partial crossbar; labels: 324 x 4, n=8, 64, 64, 14, 32]

full crossbar:
  crossbar chips in a row    no. of crossbar chips
  n                          n x n/2
  8                          32
  100                        5000

partial crossbar:
  crossbar chips in a row    no. of crossbar chips
  n                          n
  8                          8
  100                        100
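
The chip counts on this slide follow directly from the two formulas, as read from the flattened table: n x n/2 crossbar chips for the full crossbar and n for the partial crossbar. A short check, with hypothetical function names:

```python
def full_crossbar_chips(n):
    """Full crossbar with n crossbar chips in a row: n * n/2 chips."""
    return n * n // 2

def partial_crossbar_chips(n):
    """Partial crossbar with n crossbar chips in a row: n chips."""
    return n

for n in (8, 100):
    print(n, full_crossbar_chips(n), partial_crossbar_chips(n))
```

The quadratic growth of the full crossbar against the linear growth of the partial crossbar is the point of the comparison.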

Page 31: Routing

14 Logic Chips (Lchip) with 128 pins (occasionally for rout-through)

32 Crossbar Chips (Xchip) with 72 I/O pins (for rout-through only)

each Xchip: 4 pins connected to each Lchip

8 Logic cards per card cage

8 card cages per rack

8 Ychip cards per card cage

Backplane: 8 Zboard cards per rack

[Figure labels: logic card, card cage, rack]

Page 32: Crossbar?

1913: J. N. Reynolds' crossbar switch

1915: patent granted

1919: Betulander's crossbar switch

1926: first public telephone switching application in Sweden

1964: NASA telemetrics crossbar array

Page 33: RWC Real World Computing, Japan, 40 TFLOPS

Crossbar weight: 220 tons, 3000 km cable, 5120 processors with 5000 pins each

Page 34: Routing Congestion

Example: direct connection impossible

[Figure: two rows of four rGAs each]

rout-through: detour connection

Page 35: Routing-only configuration (2 examples)

• Routing

[Figure: rLB with the identity function configured]

Page 36: Graphs, Partitioning, Algorithms

T. Uehara, W. M. van Cleemput: Optimal Layout of CMOS Functional Arrays; IEEE Trans. C-30, pp. 305-312, May 1981

B. Kernighan, S. Lin: An Efficient Heuristic Procedure for Partitioning Graphs; BSTJ 49, 1970

C. Alpert, A. Kahng: Recent Directions in Netlist Partitioning: A Survey; Integration, vol. 19 (1-2), pp. 1-81, 1995

T. Cormen, et al.: Introduction to Algorithms; MIT Press / McGraw-Hill, 1991

Page 37: why emulators are obsolete

[Figure: system gates per rGA chip (logarithmic axis, 100 to 10 000 000) against year (1984 to 2004); marked devices: XC 4085XL, XC 40250XV, Virtex, Virtex II, and a planned device; source: Xilinx data]

Page 38: why declining ASIC business?

More and more, the prototyping platform of rGA-based systems will be delivered directly to the customer as the product: fully configured

ASICs lost the battle. rGAs are the winners

[Figure: number of design starts (0 to 50,000) per year, 2001 to 2004; curve label: rGA-based; source: N. Tredennick, Gilder Technology Report, 2003]

ASIC emulators have been a transient solution, now with declining commercial significance.

you don't need specific silicon!

Page 39: Xilinx: full hierarchy on chip

from rack to chip

• Xilinx Virtex-II Pro FPGA Architecture

• FPGA fabric based on Virtex-II architecture

• PowerPC 405 RISC CPU (PPC405) cores

[Figure labels: On-Chip Memory Controller, PowerPC Core, Embedded RAM, RocketIO]

Source: Ivo Bolsens, Xilinx


Page 41: focusing on coarse grain

• Fine Grain morphware platforms

already mainstream: reconfigurable logic; just logic design on a strange platform

• Coarse Grain platforms:

Reconfigurable Computing: not that new, but shaking the fundamentals of CS curricula

an order of magnitude more MIPS/mW than fine grain

Page 42: why coarse grain

[Figure: MOPS / mW (logarithmic axis, 0.001 to 1000) against feature size in µ (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07); curves: hardwired, rDPAs (reconfigurable computing)*, FPGAs (reconfigurable logic), DSP, standard microprocessor (instruction set processors)]

[Figure: flexibility against throughput for hard-wired, von Neumann, FPGAs and coarse grain]

coarse grain goes far beyond bridging the gap

T. Claasen et al.: ISSCC 1999
*) R. Hartenstein: ISIS 1997

Page 43: rDPA (Reconfigurable Datapath Array)

[Figure: two arrays of rDPUs. In one, the Reconfigurable Interconnect Fabric (RIF) occupies a separate routing area; in the other, the RIF is laid out over the rDPUs, so that the rDPA is wired by abutment]

Page 44: CMOS interconnect resources

Foundries offer up to 9 metal layers and up to 3 poly layers

reconfigurable interconnect fabric laid out over the rDPU cell

Page 45: Commercial rDPAs

XPU family (IP cores): PACT Corp., Munich

XPU128

Page 46: SNN filter on KressArray

mapping algorithms efficiently onto rDPA

array size: 10 x 16 = 160 rDPUs

Legend: rDPU not used; rDPU used for routing only (rout-through only); operator and routing; port location marker; backbus connect

by the way: example of scalability / relocatability by EDA support

"Structured Configware Design" [R. H.]

Page 47: badly scalable

• Routing

Hundreds of rGAs or very large rGAs

Routing congestion growing exponentially

Page 48: Communication Resource Requirements

... often Functional Resources are not the Throughput Bottleneck

In some application areas, such as wireless communication, Reconfigurable Computing Arrays need extraordinarily rich and powerful Communication Resources

The Solution: Generators for Domain-specific RA Platforms

Page 49: KressArray Family generic Fabrics: a few examples

Select Function Repertory

Select Nearest Neighbour (NN) Interconnect: an example

Select mode, number, width of NNports (figure labels: 16, 32, 8, 24, 4, 2)

[Figure: rDPU variants: rout-through and function; rout-through only; more NNports: rich Rout Resources]

Examples of 2nd Level Interconnect: laid out over the rDPU cell, no separate routing areas!

http://kressarray.de

Page 50: Super Pipe Networks

systolic array:
  applications: regular data dependencies only
  pipeline properties: shape linear only; resources uniform only
  mapping and scheduling (data stream formation): linear projection or algebraic synthesis

super-systolic RA **):
  applications and pipeline properties: no restrictions
  mapping: simulated annealing or P&R algorithm
  scheduling (data stream formation): (e.g. force-directed) scheduling algorithm

The key is mapping, rather than architecture

**) KressArray [ASP-DAC-1995]


Page 52: Morphware machines vs. hardwired machines

platform: program source running on it
  hardware: (not programmable)
  morphware (fine grain: rGA (FPGA); coarse grain: rDPU, rDPA): configware

machine: program source running on it
  reconfigurable data stream processor: flowware & configware
  hardwired data stream processor: flowware
  instruction stream processor (v. N.): software

A clear terminology helps a lot

Page 53: Flowware defines: ... which data item at which time at which port

[Figure: a DPA with input data streams and output data streams, each drawn as data items (x) and idle slots (-) over the axes "time" and "port #"]
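
The definition on this slide (which data item at which time at which port) can be made concrete with a small sketch. It models flowware as a mapping from (time, port) to a data item and builds the skewed input streams typical of systolic arrays. The function names and the three-row data set are assumptions for illustration, not a notation used in the talk.

```python
def skewed_schedule(rows):
    """Return {(time, port): item}; row i enters port i, delayed by i steps."""
    schedule = {}
    for port, row in enumerate(rows):
        for j, item in enumerate(row):
            schedule[(port + j, port)] = item
    return schedule

def stream_at_port(schedule, port, length):
    """Data stream seen at one port; None marks an idle time step."""
    return [schedule.get((t, port)) for t in range(length)]

rows = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
flowware = skewed_schedule(rows)

for port in range(3):
    print(port, stream_at_port(flowware, port, 5))
```

Each printed row is one input data stream; the idle slots play the role of the "-" marks in the figure.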

Page 54: Paradigm Shifts: Nick Tredennick's view

instruction-stream-based computing:
  resources fixed
  algorithms variable: programmable by Software (instruction-stream)

data-stream-based reconfigurable computing:
  resources variable: programmable by Configware
  algorithms variable: programmable by Flowware (data-stream)

why 2 program sources?

Page 55: Flowware heading toward mainstream

• Data-stream-based Computing is heading for mainstream

– 1997 SCCC (LANL) Streams-C Configurable Computing
– SCORE (UCB) Stream Computations Organized for Reconfigurable Execution
– ASPRC (UCB) Adapting Software Pipelining for Reconfigurable Computing
– 2000 Bee (UCB), ...
– Most stream-based multimedia systems, etc.
– Many other areas ...

Flowware is mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation

Flowware: managing data streams
Software: managing instruction streams

Page 56: control-procedural vs. data-procedural

The structural domain is primarily data-stream-based:

Flowware provides a (data-)procedural abstraction of the (data-stream-based) structural domain

Flowware converts "procedural vs. structural" into "control-procedural vs. data-procedural" ...

... a Trojan horse to introduce the structural domain to the procedural mindset of programmers


Page 58: Configware / Flowware Compilation

[Figure: compilation flow; labels: high level source, wrapper, intermediate, mapper (configware), scheduler (flowware), rDPA (r. DataPath Array), data streams, data sequencer / address generator, memory banks (M, asM), distributed memory architecture]

"instruction" fetch before runtime

Page 59: >>> extremely high efficiency: flowware-based computing

1. avoiding address computation memory cycle overhead

2. avoiding instruction fetch and interpretation overhead

3. high parallelism, massively multiple deep pipelines

4. much less configuration memory

5. interconnect laid out over the cell: no extra routing areas

6. methodologies readily available

Page 60: Programming Language Paradigms

language category: Software Languages | Languages f. Anti Machine (flowware languages)

both: deterministic procedural sequencing: traceable, checkpointable

operation sequence driven by:
  Software Languages: read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting; no parallel loops, escapes, instruction stream branching
  Languages f. Anti Machine: read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching

state register: program counter | data counter(s) (multiple GAGs)

address computation: massive memory cycle overhead | overhead avoided

instruction fetch: memory cycle overhead | overhead avoided

parallel memory bank access: interleaving only | no restrictions

language features: control flow + data manipulation | data streams only (no data manipulation)

flowware languages: very easy to learn, much more simple, much more powerful
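
The "data loop" and "data counter" entries above can be illustrated with a small software model. It is only a sketch of the idea, not the Xputer hardware or an actual flowware language: a data sequencer with two nested data loops produces an address stream, and a fixed operator receives the addressed data items in that order. The names `address_stream` and `run_data_stream` and the memory contents are hypothetical.

```python
def address_stream(base, rows, cols, row_stride, col_stride=1):
    """Data sequencer sketch: two nested data loops yield an address stream.

    The sequencer, not the operator, decides which data item comes next.
    """
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c * col_stride

def run_data_stream(memory, addresses, operator):
    """Feed the addressed data items, in order, through a fixed operator."""
    return [operator(memory[a]) for a in addresses]

memory = list(range(100, 116))  # 4 x 4 array stored row by row (made-up data)
window = list(address_stream(base=5, rows=2, cols=2, row_stride=4))
result = run_data_stream(memory, window, lambda x: 2 * x)

print(window)  # addresses of a 2 x 2 window inside the 4 x 4 array
print(result)  # the operator applied to each streamed data item
```

In this model the operator contains no control flow and no address arithmetic at all; everything that varies from application to application sits in the parameters of the sequencer.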

Page 61: Machine Paradigms

machine category: Computer (the Machine: "v. Neumann") | The Anti Machine

driven by: instruction streams | data streams (no "dataflow")

engine principles: instruction sequencing | sequencing data streams

state register: single program counter | (multiple) data counter(s)

communication path set-up ("instruction fetch"): at run time | at load time

resource (data path): DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.

operation: sequential | parallel pipe network etc.

**) also hardwired implementations, e.g. Bee project, Prof. Brodersen
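
The contrast between path set-up at run time and at load time can be mimicked in software. The sketch below is only an analogy and assumes a linear pipe rather than a general pipe network: the chain of operators is fixed once by `configure_pipeline` (a hypothetical name), and afterwards only data items stream through it.

```python
def configure_pipeline(*stages):
    """'Load time': fix the chain of operators (communication paths) once."""
    def run(stream):
        # 'Run time': only data moves; each item passes through all stages.
        for stage in stages:
            stream = map(stage, stream)
        return stream
    return run

# Three hypothetical operators wired into one pipe.
pipe = configure_pipeline(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)
out = list(pipe(iter([1, 2, 3])))
print(out)
```

A sequential Python interpreter still fetches instructions to execute this, so the sketch shows the programming model only, not the efficiency argument made in the talk.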


>> outline <<

• fine grain reconfigurable
• Placement and routing
• coarse grain reconfigurable
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks

http://www.uni-kl.de


Computing paradigms and methodologies

1946: machine paradigm (von Neumann)

1980: data streams (Kung, Leiserson)

1989: anti machine paradigm

1990: 1st rDPU* (Rabaey)

1994: anti machine high level programming language

1995: super systolic rDPA (Kress)

1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

1997+: discipline of distributed memory architecture

1997: 1st configware / software partitioning compiler

flowware

*) rDPU = reconfigurable Data Path Unit


The Secret of Success: Co-Compilation

[Figure: co-compilation flow. Blocks: High level PL source; Partitioner; Analyzer / Profiler; Resource Parameters (supporting different platforms, supporting platform-based design); SW compiler producing SW code ("vN" machine paradigm); CW compiler producing CW code (anti machine paradigm).]
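The role of the partitioner and the resource parameters can be sketched as follows. This is not the partitioning algorithm of the co-compiler presented in the talk; a simple greedy heuristic is swapped in for illustration, and the names `Task`, `runtime_share`, `dpu_demand` and `partition` are assumptions. Profiled hot spots are sent to the configware (CW) side as long as the platform's resource parameter (here: the number of available DPUs) allows it; everything else stays software (SW).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    runtime_share: float  # fraction of run time reported by a profiler (assumed input)
    dpu_demand: int       # DPUs needed when mapped onto the reconfigurable array


def partition(tasks: List[Task], dpus_available: int) -> Dict[str, List[str]]:
    """Greedy split: the hottest tasks go to configware while DPUs remain."""
    result: Dict[str, List[str]] = {"configware": [], "software": []}
    free = dpus_available
    for task in sorted(tasks, key=lambda t: t.runtime_share, reverse=True):
        if task.dpu_demand <= free:
            result["configware"].append(task.name)
            free -= task.dpu_demand
        else:
            result["software"].append(task.name)
    return result


tasks = [
    Task("fir_loop", 0.60, 12),
    Task("dct_loop", 0.30, 10),
    Task("user_interface", 0.10, 40),
]
small_platform = partition(tasks, dpus_available=16)
large_platform = partition(tasks, dpus_available=32)
```

Changing only `dpus_available` changes the split, which is the point of driving the partitioner by resource parameters: the same high level source can be retargeted to a different platform.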


Machine paradigms

[Figure: the two machine paradigms side by side.
von Neumann instruction stream machine: memory M and I/O; CPU (instruction sequencer and DPU); driven by the instruction stream; programmed by Software.
data-stream machine (anti machine): memory banks M (asM) with data address generator (data sequencer) and I/O; DPU or rDPU, or (r)DPA with distributed memory architecture*; driven by data streams; programmed by Flowware (and Configware, if reconfigurable).]

*) the new discipline came just in time: see Herz et al.: Proc. IEEE ICECS, 2002

also see books by Francky Catthoor et al.


Synthesizable distributed memory architecture ... for a Stream-based Soft Machine

[Figure: blocks are Memory (data memory) made up of several memory banks, Sequencers (data stream generator), rDPA, Scheduler, Compiler, "instructions".]
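Why the storage scheme of a multi-bank memory matters can be shown with a toy model. It is a simplification made for this note, not the synthesis method of the talk: the functions `bank_of` and `access_cycles` are hypothetical, banks use low-order interleaving, and each bank serves one request per memory cycle.

```python
from collections import Counter
from typing import Iterable


def bank_of(address: int, num_banks: int) -> int:
    """Low-order interleaving: consecutive addresses fall into consecutive banks."""
    return address % num_banks


def access_cycles(addresses: Iterable[int], num_banks: int) -> int:
    """Cycles needed to fetch all addresses if each bank serves one request per cycle."""
    load = Counter(bank_of(a, num_banks) for a in addresses)
    return max(load.values(), default=0)


row = [0, 1, 2, 3]      # four neighbouring words
column = [0, 4, 8, 12]  # stride-4 access, e.g. a column of a 4-wide array

row_on_4_banks = access_cycles(row, num_banks=4)
column_on_4_banks = access_cycles(column, num_banks=4)
column_on_5_banks = access_cycles(column, num_banks=5)
```

With four banks the stride-4 stream hits a single bank and serializes, while five banks spread the same stream over different banks. Choosing the bank count and the address-to-bank mapping per application is one way to read "synthesizable" here.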


PC replaced by PS

[Figure: timeline 1957, 1967, 1977, 1987, 1997, 2007 with the mainframe age, the computer age (PC age) and the morphware age; data streams ...; flowware. Labels: von Neumann, µProc, anti machine, rDPA, co-compiler.]

PC replaced by PS (personal supercomputer)


All methodologies available

[Figure: timeline 1957, 1967, 1977, 1987, 1997, 2007 with the morphware age; data streams ...; flowware. Labels: rDPA, µProc, co-compiler.]

free know-how for personal super computer

.... and all other methodologies available from literature


We have an education problem

... we need a second machine paradigm

The typical programmer has trouble understanding function evaluation without machine mechanisms ....

Traditional CS: programming is (control-)procedural, instruction-stream-based – sources: software

[Figure labels: accelerators, µprocessor]

It's the gap between the procedural and the structural mindset

Crossing the Hardware / Software Chasm [Mike Butts]


Ubiquitous Embedded Systems

... and the main focus in system design

embedded software and configware became the main vehicle for product differentiation ...

(Performance and) Flexibility are key issues

current CS curricula do not qualify our students


Misqualified: jobless CS graduates?

[Figure: growth factor (1 to 2) over months (0, 10, 12, 18), comparing Embedded software [DTI* law] with [Moore's law]; annotation: (1.4/year).]

*) Department of Trade and Industry, London

90% of all code written for embedded systems

The real labor market: 10 times more programmers will write embedded applications than computer software by 2010



EDA Industry Revolution every 7 years

1978: Transistor entry: Applicon, Calma, CV ...

1985: Schematics entry: Daisy, Mentor, Valid ...

1992: Synthesis (HDLs): Cadence, Synopsys ...

1999

EDA industry paradigm switching every 7 years

[Keutzer / Newton] McKinsey Curves


EDA the main bottleneck

[Figure, courtesy of Richard Newton; annotations: math formula ? TRS ?]


Biggest Mistake of EDA

guess it!


The next EDA Industry Revolution

1978: Transistor entry: Applicon, Calma, CV ...

1985: Schematics entry: Daisy, Mentor, Valid ...

1992: Synthesis (HDLs): Cadence, Synopsys ...

1999

(Co-) Compilation: data-stream-based DPAs

EDA industry paradigm switching every 7 years

[Keutzer / Newton] McKinsey Curves

Von Neumann does not support Morphware: System-C

higher abstraction level: math formula: TRS*

*) Term Rewriting Systems


Algorithmic cleverness needed

Example - migration from signal processor to rGA: very high throughput on low-power, slow FPGAs is obtained only by algorithmic cleverness.

We need an all-embracing taxonomy of algorithms and a survey of algorithm transformations ....

loop transformations ....

optimization, partitioning, signal processing, (de-) coding algorithms (wireless communication), image processing, sorting, .... and many more areas .....


Algorithmic cleverness needed for CS graduates in embedded systems

the hardware / configware / software partitioning problem: current CS curricula do not qualify our students

software / configware migration: current CS curricula do not qualify our students

extending software engineering into software / flowware engineering: the anti machine paradigm and reconfigurable computing are the curricular enablers


>>> thank you


- END -


Appendix for discussion


Processor Memory Performance Gap

[Figure: Performance (log scale, 1 to 1000) over the years 1980 to 2000. CPU (µProc): 60%/yr. DRAM: 7%/yr. Processor-Memory Performance Gap: grows 50% / year.]
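The quoted gap growth follows from the two growth rates on the slide. The short calculation below is only a check of that arithmetic; the function names `annual_gap_growth` and `gap_after` are made up for this note, and the 20-year span is simply the width of the 1980 to 2000 axis.

```python
def annual_gap_growth(cpu_growth: float, dram_growth: float) -> float:
    """Yearly growth of the CPU/DRAM performance ratio, given yearly growth rates."""
    return (1.0 + cpu_growth) / (1.0 + dram_growth) - 1.0


def gap_after(years: int, cpu_growth: float, dram_growth: float) -> float:
    """Accumulated CPU/DRAM performance ratio after the given number of years."""
    return ((1.0 + cpu_growth) / (1.0 + dram_growth)) ** years


growth = annual_gap_growth(0.60, 0.07)  # 1.60 / 1.07 - 1, a little under 0.50
gap_20 = gap_after(20, 0.60, 0.07)      # ratio accumulated between 1980 and 2000
print(f"gap grows {growth:.1%} per year, factor {gap_20:.0f} after 20 years")
```

So 60%/yr against 7%/yr gives just under 50% per year, which compounds to a factor in the low thousands over two decades.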


Why a dichotomy of machine paradigms?

[Figure, stolen from Bob Colwell: vN: unbalanced; CPU; caches, ...; vN bottleneck]

data stream machine:

• bad news: caches do not help

• good news: no vN bottleneck

• caches not needed

The anti machine has no von Neumann bottleneck


"Pollack's Law" (simplified)

[Figure, source: intel. Growth factor of performance and of area efficiency, plotted over feature size in µm (axis tick: 0.1).]


Loop Transformation Examples

sequential processes:
  host: loop 1-16 body endloop

loop unrolling:
  loop 1-8 body body endloop

strip mining:
  fork
    loop 1-8 body endloop
    loop 9-16 body endloop
  join

resource parameter driven Co-Compilation (host / reconf. array):
  loop 1-4 trigger endloop
  loop 1-2 trigger endloop
  loop 1-8 trigger endloop
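The two transformations named on this slide can be sketched in ordinary code. This is a generic illustration, not output of the co-compiler; the function names `original`, `unrolled` and `strip_mined` are assumptions, the trip count is assumed to be divisible by the unroll factor and the number of strips, and Python threads only model the fork / join structure rather than true hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def original(body: Callable[[int], int], n: int = 16) -> List[int]:
    """loop 1-16 body endloop"""
    return [body(i) for i in range(1, n + 1)]


def unrolled(body: Callable[[int], int], n: int = 16) -> List[int]:
    """Loop unrolling by 2: loop 1-8 body body endloop"""
    out: List[int] = []
    for i in range(1, n + 1, 2):  # n / 2 trips, two copies of the body per trip
        out.append(body(i))
        out.append(body(i + 1))
    return out


def strip_mined(body: Callable[[int], int], n: int = 16, strips: int = 2) -> List[int]:
    """Strip mining: fork; loop 1-8 and loop 9-16 side by side; join."""
    size = n // strips
    ranges = [range(s * size + 1, (s + 1) * size + 1) for s in range(strips)]

    def run_strip(indices: range) -> List[int]:
        return [body(i) for i in indices]

    with ThreadPoolExecutor(max_workers=strips) as pool:  # fork
        parts = list(pool.map(run_strip, ranges))         # map keeps strip order
    return [x for part in parts for x in part]            # join


def square(i: int) -> int:
    return i * i


reference = original(square)
```

All three variants compute the same results; only the schedule differs. Which unroll factor or strip count is chosen would depend on the resources available, which is the role of the resource parameters on the slide.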


The Design Crisis

[Figure: design cost over the years, compared with the product life cycle]

The long turnaround times of ASIC fabrication are becoming increasingly unaffordable.

Rising demand: fast patches and upgrades, preferably at the customer's site, to promote product longevity.


Summary of the Anti Machine Paradigm

• anti language primitives are almost the same (slightly extended)

• anti machine execution potential is dramatically more powerful

• provides drastically more flexibility

• not always replacing von Neumann


Reconfigurable Computing: a second programming domain

Migration of programming to the structural domain

Currently running: the next fundamental revolution after introduction of the microprocessor

The structural domain has become RAM-based

However, CS curricula ignore this impact of Reconfigurable Computing – key issue in embedded systems ...

... causing the coming disaster by unqualified CS graduates pushing up the unemployment rate ?


All enabling technologies are available

•anti machine and all its architectural resources

•parallel memory IP cores and generators

•anything else needed

•languages & (co-)compilation techniques

•morphware vendors like PACT ....

•literature from last 30 years


New horizons

• A new RAM-based platform going mainstream
• Configware industry
• New machine paradigm
• New theory needed
• New architectures – without v. N. bottleneck
• New compilation techniques
• More effective parallelism provided
• Rich material is already available in many areas
• Lots of similarities with the classical v.N. world
• But a few asymmetries: a challenge


Evangelist's material + lobby space

Evangelist's material:
• http://hartenstein.de – click "recent talks"

Lobby space:
• http://morphware.net
• http://configware.org
• http://data-streams.org
• http://flowware.net

Trailblazer group:
• you are welcome to improve, rewrite, post links ...
• You are welcome to join the trailblazer group


The genius of von Neumann

• enormous impact of the von Neumann paradigm
• even stronger impact by a dichotomy of paradigms:
  • von Neumann of matter
  • von Neumann of anti matter
  • Von Neumann machine vs. anti machine
• does not mean toppling v. N.'s monument
• it multiplies the glory of von Neumann


MPU performance stalled

Moore's law will stall soon for MPUs

Bill Gates' law: relative computation time needed doubles every 2 years

had been compensated by Moore's law


Basics of Binding Time

[Figure: binding time axis with compile time, loading time and run time, marking the time of "Instruction Fetch" for microprocessor / parallel computer and for Reconfigurable Computing.]


Time to Market

• Morphware brings a new dimension to digital system development and has a strong impact on SoC design.

• Flexibility supports turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades

• A New Business Model (in-field debugging and upgrading ... )

• A Fundamental Paradigm Shift in Silicon Application

[Figure, from Tom Kean: Revenue / month over Time / months (1, 10, 20, 30). Labels: ASIC Product; reconfigurable Product with download; Update 1; Update 2.]


KressArray principles

• take systolic array principles

• replace classical synthesis by simulated annealing

• yields the super systolic array

• a generalization of the systolic array

• no longer restricted to regular data dependencies

• now reconfigurability makes sense
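How simulated annealing can place operators with irregular data dependencies onto an array is sketched below. This is a textbook annealing loop written for this note, not the KressArray mapper; the names `wire_length` and `anneal`, the Manhattan-distance cost, the swap move and the geometric cooling schedule are all assumptions.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]
Placement = Dict[str, Tuple[int, int]]


def wire_length(placement: Placement, edges: List[Edge]) -> int:
    """Total Manhattan distance over all data dependencies."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal(nodes: List[str], edges: List[Edge], rows: int, cols: int,
           steps: int = 5000, t_start: float = 5.0, t_end: float = 0.01,
           seed: int = 0) -> Tuple[Placement, int]:
    """Place nodes on a rows x cols array (at least 2 cells) by swapping cell contents."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(cells):
        raise ValueError("array too small for the given operators")
    rng.shuffle(cells)
    # occupant[i] is the operator sitting in cells[i]; None marks an empty cell
    occupant: List[Optional[str]] = list(nodes) + [None] * (len(cells) - len(nodes))

    def placement_of(occ: List[Optional[str]]) -> Placement:
        return {n: cells[i] for i, n in enumerate(occ) if n is not None}

    cost = wire_length(placement_of(occupant), edges)
    best, best_cost = list(occupant), cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i, j = rng.sample(range(len(cells)), 2)
        occupant[i], occupant[j] = occupant[j], occupant[i]   # trial move: swap two cells
        new_cost = wire_length(placement_of(occupant), edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                                   # accept the move
            if cost < best_cost:
                best, best_cost = list(occupant), cost
        else:
            occupant[i], occupant[j] = occupant[j], occupant[i]  # reject: undo the swap
    return placement_of(best), best_cost


ops = ["a", "b", "c", "d"]
deps = [("a", "b"), ("b", "c"), ("c", "d")]
placement, cost = anneal(ops, deps, rows=2, cols=2)
```

Nothing in the cost function requires the dependency graph to be regular, which is the generalization over classical systolic synthesis that the bullets describe.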


Significance of Address Generators

• Address generators have the potential to reduce computation time significantly.

• In a grid-based design rule check a speed-up of more than 2000 has been achieved, compared to a VAX-11/750

• Dedicated address generators contributed a factor of 10 - avoiding memory cycles for address computation overhead


Acceleration Mechanisms

• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improve parallelism by storage scheme transformations
• improve parallelism by memory architecture transformations
• alleviate interconnect overhead (delay, power and area)
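One of the mechanisms listed above, avoiding multiple accesses to the same data, can be illustrated with a sliding window over a data stream. The sketch is an assumption-laden toy: `naive_reads` and `sliding_sums` are hypothetical names, and a small buffer in software stands in for registers next to the data path.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def naive_reads(n: int, width: int) -> int:
    """Memory reads if every window position re-fetches all of its words."""
    return max(n - width + 1, 0) * width


def sliding_sums(data: Sequence[int], width: int) -> Tuple[List[int], int]:
    """Window sums where each word is read from memory exactly once."""
    window: Deque[int] = deque(maxlen=width)  # oldest word drops out automatically
    reads = 0
    sums: List[int] = []
    for word in data:
        reads += 1            # the only memory access for this word
        window.append(word)
        if len(window) == width:
            sums.append(sum(window))
    return sums, reads


data = [1, 2, 3, 4, 5]
sums, reads = sliding_sums(data, width=3)
refetch_reads = naive_reads(len(data), 3)
```

For five words and a window of three, the buffered version reads five words where the re-fetching version reads nine; the saving grows with the window size.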


Why coarse grain ?

[Figure, sources: Proc ISSCC, ICSPAT, DAC, DSPWorld. Log-scale trends from 1960 to 2010: Transistors/chip (1 to 100 000 000), normalized processor speed (microprocessor / DSP), memory, battery performance, and Algorithmic Complexity (Shannon's Law) for the wireless generations 1G, 2G, 3G, 4G. Second axis: computational efficiency in mA/MIP (100 down to 0.001), with data points StrongARM and SH7752.]