The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV,...


Page 1: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

The von Neumann Syndrome calls for a Revolution

Reiner Hartenstein

TU Kaiserslautern

Reno, NV, November 11, 2007

http://hartenstein.de

HPRCTA'07 - First International Workshop on High-Performance Reconfigurable Computing Technology and Applications - in conjunction with SC07 -

Page 2:

© 2007, [email protected] [R.H.] http://hartenstein.de

About Scientific Revolutions

Ludwik Fleck: Genesis and Development of a Scientific Fact

Thomas S. Kuhn: The Structure of Scientific Revolutions

Page 3:

What is the von Neumann Syndrome?

Computing the von Neumann style is tremendously inefficient. Multiple layers of massive overhead phenomena at run time often lead to code sizes of astronomic dimensions, resident in drastically slower off-chip memory.

The manycore programming crisis requires complete re-mapping and re-implementation of applications. A sufficiently large population of programmers qualified to program applications for 4 or more cores is far from being available.

Page 4:

Education for multi-core

Mateo Valero

[Figure labels: Multicore-based pacifier; "I programming multicores"]

Page 5:

Will Computing be affordable in the Future?

Another problem is a high-priority political issue: the very high energy consumption of von-Neumann-based systems. The electricity consumption of all visible and hidden computers reaches more than 20% of our total electricity consumption. A study predicts 35 to 50% for the US by the year 2020.

Page 6:

Reconfigurable Computing highly promising

Fundamental concepts from Reconfigurable Computing promise a speed-up by almost one order of magnitude, for some application areas by up to 2 or 3 orders of magnitude, at the same time slashing the electricity bill down to 10% or less.

It is really time to fully exploit the most disruptive revolution since the mainframe: Reconfigurable Computing - also to reverse the downward trend in CS enrolment.

Reconfigurable Computing shows us the road map to the personal desktop supercomputer, making HPC affordable also for small firms and for individuals, and to a drastic reduction of energy consumption.

Contracts between microprocessor firms and Reconfigurable Computing system vendors are on the way but not yet published. The technology is ready, but most users are not.

Why?

Page 7:

A Revolution is overdue

The talk sketches a road map requiring a redefinition of the entire discipline, inspired by the mind set of Reconfigurable Computing.

Page 8:

much more saved by coarse-grain

platform example                        energy (W / Gflops)   energy factor
MDgrape-3* (domain-specific, 2004)      0.2                   1
Pentium 4                               14                    70
Earth Simulator (supercomputer, 2003)   128                   640

*) feasible also with rDPA
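The "energy factor" column appears to be each platform's W / Gflops figure divided by the MDgrape-3 figure. A minimal sketch of that arithmetic, assuming this reading of the column (the dictionary and variable names are illustrative, not from the slides):

```python
# W per Gflops as listed on the slide.
watts_per_gflops = {
    "MDgrape-3": 0.2,
    "Pentium 4": 14.0,
    "Earth Simulator": 128.0,
}

# Assumption: the energy factor is normalised to the MDgrape-3 value.
baseline = watts_per_gflops["MDgrape-3"]
energy_factor = {
    name: round(watts / baseline) for name, watts in watts_per_gflops.items()
}

for name, factor in energy_factor.items():
    print(f"{name}: {factor}x the energy per Gflops of MDgrape-3")
```

Under this reading the computed factors match the slide's 1, 70 and 640.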

Page 9:

(3) Power-aware Applications

Cyber infrastructure energy consumption: several predictions. Most pessimistic: almost 50% by 2025 in the USA.

[Figure labels: Mobile Computation, Communication, Entertainment, etc. (high volume market); HPC and Supercomputing; PCs and servers (high volume); 2003 and later; 2020]

Page 10:

An Example: FPGAs in Oil and Gas .... (1)

For this example speed-up is not my key issue (Jürgen Becker's tutorial showed much higher speed-ups, going up to a factor of 6000)

„Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance"

[Herb Riley, R. Associates]

For this oil and gas example a side effect is much more interesting than the speed-up

Page 11:

An Example: FPGAs in Oil and Gas .... (2)

Saves more than $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack
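To get a feeling for the size of this saving, the yearly bill can be converted into energy and average power. This is a back-of-the-envelope sketch only: it takes "$10,000" as a lower bound, uses the slide's 7¢ / kWh, and assumes the rack runs all year round (the variable names are illustrative).

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

savings_usd_per_year = 10_000  # "more than $10,000" on the slide, used as a lower bound
price_usd_per_kwh = 0.07       # 7 cents per kWh, from the slide

# Energy that no longer has to be bought each year.
energy_kwh_per_year = savings_usd_per_year / price_usd_per_kwh

# Assumption: continuous operation, so the saving spreads evenly over the year.
average_power_kw = energy_kwh_per_year / HOURS_PER_YEAR

print(f"{energy_kwh_per_year:,.0f} kWh per year")
print(f"{average_power_kw:.1f} kW average power saved per rack")
```

Under these assumptions the saving corresponds to roughly 143,000 kWh per year, or about 16 kW of continuous power per rack.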

„Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance"

[Herb Riley, R. Associates]

Did you know …

… 25% of Amsterdam's electric energy consumption goes into server farms?

… a quarter square-kilometer of office floor space within New York City is occupied by server farms?

This is a strategic issue

Page 12:

Oil and Gas as a strategic issue

It should be investigated how far the migration achievements obtained for computationally intensive applications can also be utilized for servers

Do you know the amount of Google's electricity bill?

Recently the US Senate ordered a study on the energy consumption of servers

Low power design: not only to keep the chips cool

Page 13:

Flagship conference series: IEEE ISCA

Parallelism faded away

98.5 % von Neumann [David Padua, John Hennessy, et al.]

(2001: 84%)

Other: cache coherence? speculative scheduling?

Jean-Loup Baer

migration of the lemmings

Page 14:

Using FPGAs for scientific computation?

Unqualified for RC ?

hiring a student from the EE dept. ?

application disciplines use their own trick boxes: transdisciplinary fragmentation of methodology

CS is responsible to provide a RC common model
•for transdisciplinary education
•and, to fix its intradisciplinary fragmentation

Page 15:

Computing Curricula 2004 fully ignores Reconfigurable Computing

Joint Task Force for

FPGA & synonyms: 0 hits

not even here

(Google: 10 million hits)

Curricula ?

Page 16:

Curriculum Recommendations, v. 2005

Upon my complaints, the only change was an addition to the last paragraph of the survey volume: "programmable hardware (including FPGAs, PGAs, PALs, GALs, etc.)." However, no structural changes at all.

v. 2005 is intended to be the final version (?), torpedoing the transdisciplinary responsibility of CS curricula.

This is criminal!

Page 17:

fine-grained vs. coarse-grained reconfigurability

data-stream-based:
- Domain-specific rDPU design
- rDPU with extensible "instruction" set

instruction-stream-based:
- CPU w. extensible instruction set (partially reconfigurable)
- Domain-specific CPU design (not reconfigurable)
- Soft core CPU (reconfigurable)

“fine-grained” means: data path width ~1 bit

“coarse-grained”: path width = many bits (e.g. 32 bits)

Page 18:

coarse-grained: terminology

term    program counter   execution triggered by   paradigm
CPU     yes               instruction fetch        instruction-stream-based
DPU**   no                data arrival*            data-stream-based

*) "transport-triggered"
**) does not have a program counter
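The difference between the two triggers can be imitated in software. The following is only an illustrative sketch, not a model of any real rDPU: `run_cpu` and `dpu_add` are hypothetical names, and the tiny instruction set is invented for the example. The CPU version advances a program counter and fetches an instruction for every step; the DPU version has no program counter and fires whenever a pair of operands arrives.

```python
def run_cpu(program, memory):
    """Instruction-stream-based: every step is triggered by an instruction fetch."""
    pc = 0        # program counter
    acc = 0       # accumulator
    fetches = 0
    while pc < len(program):
        op, addr = program[pc]  # instruction fetch
        fetches += 1
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc += memory[addr]
        elif op == "store":
            memory[addr] = acc
        pc += 1
    return fetches


def dpu_add(stream_a, stream_b):
    """Data-stream-based: the adder fires on data arrival, no program counter."""
    for a, b in zip(stream_a, stream_b):
        yield a + b


memory = [1, 2, 0]
fetches = run_cpu([("load", 0), ("add", 1), ("store", 2)], memory)
sums = list(dpu_add([1, 2, 3], [10, 20, 30]))
print(fetches, memory[2], sums)
```

In the CPU version one addition costs three instruction fetches; in the DPU version the operation is fixed beforehand and only data moves.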

Page 19:

coarse-grained: terminology

PACT Corp, Munich, offers rDPU arrays (rDPAs)

Page 20:

The Paradigm Shift to Data-Stream-Based

The Method of Communication and Data Transport

by Software: the von Neumann syndrome

by Configware: complex pipe network on rDPA

Page 21:

The Anti Machine

Twin paradigm? Split up into 2 paradigms?

Like matter & antimatter: one elementary particle physics

A kind of trans(sub)disciplinary effort: the fusion of paradigms

Interpretation [Thomas S. Kuhn]: clean up the terminology!

non-von-Neumann machine paradigm

(generalization of the systolic array model)

Page 22:

Languages turned into Religions

• Teaching students the tunnel view of language designers

• falling in love with the subtleties of formalisms

• instead of meeting the needs of the user

Java is a religion – not a language [Yale Patt]

Page 23:

The language and tool disaster

Software people do not speak VHDL

Hardware people do not speak MPI

Bad quality of the application development tools

End of April a DARPA brainstorming conference

A poll at FCCM'98 revealed that 86% of hardware designers hate their tools

Page 24:

The first Reconfigurable Computer

•prototyped 1884 by Herman Hollerith (*29 Feb 1860, Buffalo)

•a century before FPGA introduction

•data-stream-based

•60 years later the von Neumann (vN) model took over

•instruction-stream-based

Page 25:

Reconfigurable Computing came back

•As a separate community – the clash of paradigms
•1960 "fixed plus variable structure computer" proposed by G. Estrin
•1970 PLD (programmable logic device*)
•1985 FPGA (Field Programmable Gate Array)
•1989 Anti Machine Model – counterpart of von Neumann
•1990 Coarse-grained Reconfigurable Datapath Array
•When? Foundation of PACT
•When? Reconfigurable address generator – 1994 MoPL

*) Boolean equations in sum of products form implemented by AND matrix and OR matrix

structured VLSI design like memory chips: integration density very close to Moore curve

         AND matrix       OR matrix
PLA      reconfigurable   reconfigurable
ePROM    fixed            reconfigurable
PAL      reconfigurable   fixed
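The footnote on sum-of-products form can be made concrete with a small software imitation of a PLA. This is a sketch under assumptions: the function name `pla` and the matrix encoding (1 = input must be true, 0 = input must be false, None = don't care) are chosen for illustration and do not come from the slides.

```python
def pla(inputs, and_matrix, or_matrix):
    """Evaluate Boolean outputs in sum-of-products form.

    and_matrix: one row per product term, one entry per input
                (1 = true literal, 0 = complemented literal, None = don't care).
    or_matrix:  one row per output, one entry per product term
                (1 = this product term feeds the output).
    """
    products = [
        all(lit is None or bool(x) == bool(lit) for x, lit in zip(inputs, row))
        for row in and_matrix
    ]
    return [
        any(p and sel for p, sel in zip(products, row))
        for row in or_matrix
    ]


# "Configuration": product terms a·¬b, ¬a·b and a·b
and_matrix = [(1, 0), (0, 1), (1, 1)]
# Outputs: XOR = first + second term, AND = third term
or_matrix = [(1, 1, 0), (0, 0, 1)]

truth_table = {
    (a, b): pla((a, b), and_matrix, or_matrix) for a in (0, 1) for b in (0, 1)
}
print(truth_table)
```

Changing the two matrices changes the implemented functions without touching `pla` itself, which is the sense in which both matrices of a PLA are "reconfigurable" in the table above.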

Reiner Hartenstein
....... does not support massive parallelism in large systems......
Page 26:

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 27:

The spirit of the Mainframe Age

•For decades, we’ve trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps …

•Even in “hardware” courses (unloved child of CS scenes) we often teach von Neumann machine design – deepening this tunnel view

•… finally tending to code sizes of astronomic dimensions

•1951: Hardware Design going von Neumann (Microprogramming)

Page 28:

von Neumann: array of massive overhead phenomena

overhead                         von Neumann machine
instruction fetch                instruction stream
state address computation        instruction stream
data address computation         instruction stream
data meet PU                     instruction stream
i/o - to / from off-chip RAM     instruction stream
multi-threading overhead         instruction stream
… other overhead                 instruction stream

… piling up to code sizes of astronomic dimensions

Page 29:

von Neumann: array of massive overhead phenomena


[R.H. 1975] universal bus considered harmful

[Dijkstra 1968] the "go to" considered harmful

temptations by von Neumann style software engineering

massive communication congestion

Backus, 1978: Can programming be liberated from the von Neumann style?

Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style

Page 30:

von Neumann: array of massive overhead phenomena


Dijkstra 1968; R.H., Koch 1975; Backus 1978; Arvind 1983

Page 31:

von Neumann overhead: just one example

[1989]: 94% computation load only for moving this window (image processing example)

Page 32:

the Memory Wall

[Figure: performance (log scale, 1 to 1000) over the years 1980 to 2005. µProc (CPU): 60%/yr; DRAM: 7%/yr. Dave Patterson's Law, the "Performance" Gap: growth 50% / year; ends in 2005; 2005: ~1000]

… needs off-chip RAM which fully hits instruction stream code size of astronomic dimensions …

CPU clock speed ≠ performance: processor's silicon is mostly cache

better compare off-chip vs. fast on-chip memory

[Note: processors are not that good]

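The "growth 50% / year" figure for the gap follows from the two growth rates quoted in the chart. A minimal check, assuming both rates compound annually (the variable names are illustrative):

```python
cpu_growth = 1.60   # µProc performance: +60% per year (from the chart)
dram_growth = 1.07  # DRAM performance: +7% per year (from the chart)

# The gap is the ratio of the two curves, so its yearly growth is the
# ratio of the yearly growth factors.
gap_growth = cpu_growth / dram_growth

print(f"gap grows by about {100 * (gap_growth - 1):.0f}% per year")
```

The ratio is roughly 1.495, which rounds to the 50% per year stated on the slide.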
Page 33:

Benchmarked Computational Density

[BWRC, UC Berkeley, 2004]

[Figure: SPECfp2000/MHz/Billion Transistors (0 to 200) over the years 1990 to 2005, with curves for DEC alpha, SUN, HP and IBM. alpha: down by 100 in 6 yrs; IBM: down by 20 in 6 yrs. Stolen from Bob Colwell]

CPU caches ...

CPU clock speed ≠ performance: processor's silicon is mostly cache

[Note: intel curve removed; meanwhile all curves removed from RAMP website]

Page 34:

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 35:

The Manycore future

• we are embarking on a new computing age -- the age of massive parallelism [Burton Smith]

• multiple von Neumann CPUs on the same µprocessor chip lead to exploding (vN) instruction stream overhead [R.H.]

• Even mobile devices will exploit multicore processors, also to extend battery life [B.S.]

• everyone will have multiple parallel computers [B.S.]

Page 36:

von Neumann parallelism

the watering pot model [Hartenstein]

the sprinkler head has only a single hole: the von Neumann bottleneck

Page 37:

Several overhead phenomena

The instruction-stream-based parallel von Neumann approach: the watering pot model [Hartenstein]

has several von Neumann overhead phenomena per CPU!

[Figure: array of 32 CPUs]

Page 38:

Explosion of overhead by von Neumann parallelism

overhead                               von Neumann machine

monoprocessor: local overhead (proportionate to the number of processors)
instruction fetch                      instruction stream
state address computation              instruction stream
data address computation               instruction stream
data meet PU                           instruction stream
i / o to / from off-chip RAM           instruction stream
… other overhead                       instruction stream

parallel: global overhead (disproportionate to the number of processors)
inter PU communication                 instruction stream
message passing                        instruction stream

[R.H. 2006] MPI considered harmful

[Figure: array of 32 CPUs]
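The contrast between "proportionate" and "disproportionate" overhead can be illustrated with a toy model. The slide does not say how the global overhead grows; the sketch below simply assumes, for illustration only, that every pair of CPUs may have to communicate, and the function names and unit costs are hypothetical.

```python
def local_overhead(n_cpus, per_cpu=1.0):
    """Local overhead: each CPU carries its own share, so the total grows linearly."""
    return n_cpus * per_cpu


def global_overhead(n_cpus, per_link=1.0):
    """Global overhead under an ASSUMED all-to-all pattern: one link per CPU pair."""
    return per_link * n_cpus * (n_cpus - 1) / 2


for n in (4, 8, 16, 32):
    print(n, local_overhead(n), global_overhead(n))
```

With this assumption, going from 4 to 32 CPUs multiplies the local overhead by 8 but the global overhead by more than 80; other communication patterns would give different, but typically still super-linear, curves.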

Page 39:

Rewriting Applications

•more processors means rewriting applications

•we need to map an application onto different size manycore configurations

•most applications are not readily mappable onto a regular array.

•Mapping is much less problematic with Reconfigurable Computing

[Figure: array of 32 CPUs; array of 16 rDPUs]

Page 40:

Disruptive Development

•Computer industry is probably going to be disrupted by some very fundamental changes. [Ian Barron]

•I don‘t agree: we have a model.

•A parallel [vN] programming model for manycore machines will not emerge for five to 10 years [experts from Microsoft Corp].

•We must reinvent computing. [Burton J. Smith]

•Reconfigurable Computing: Technology is Ready, Users are Not

•It's mainly an education problem

The Education Wall

Page 41:

Outline

• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions

Page 42:

The Reconfigurable Computing Paradox

•The spirit from the Mainframe Age is collapsing under the von Neumann syndrome

•There is something fundamentally wrong in using the von Neumann paradigm

•Up to 4 orders of magnitude speedup + tremendously slashing the electricity bill by migration to FPGA

•Bad FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed

•The reason for this paradox?

Page 43:

beyond von Neumann Parallelism

The instruction-stream-based von Neumann approach: the watering pot model [Hartenstein]

has several von Neumann overhead phenomena per CPU!

We need an approach like this: it's data-stream-based RC*

*) "RC" = Reconfigurable Computing

Page 44:

beyond von Neumann Parallelism

we need an approach like this: it's data-stream-based Reconfigurable Computing

instead of this instruction-stream-based parallelism: the watering pot model [Hartenstein]

several von Neumann overhead phenomena per CPU!

Page 45:

von Neumann overhead vs. Reconfigurable Computing

overhead                         von Neumann machine   anti machine (hardwired or reconfigurable)
instruction fetch                instruction stream    none*
state address computation        instruction stream    none*
data address computation         instruction stream    none*
data meet PU + other overh.      instruction stream    none*
i / o to / from off-chip RAM     instruction stream    none*
Inter PU communication           instruction stream    none*
message passing overhead         instruction stream    none*

*) configured before run time

von Neumann machine: using program counter
hardwired anti machine: using data counters
reconfigurable anti machine: using reconfigurable data counters

rDPA: reconfigurable datapath array (coarse-grained rec.)

no instruction fetch at run time

[Figure: array of 16 rDPUs]

Page 46:

von Neumann overhead vs. Reconfigurable Computing

[1989]: x 17 speedup by GAG** (image processing example)

[1989]: x 15,000 total speedup from this migration project

**) just by reconfigurable address generator

Page 47:

Reconfigurable Computing means …

• Reconfigurable Computing means moving overhead from run time to compile time**

• For HPC, run time is more precious than compile time

• Reconfigurable Computing replaces "looping" at run time* by configuration before run time

http://www.tnt-factory.de/videos_hamster_im_laufrad.htm

*) e. g. complex address computation

**) or, loading time
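A software analogy may help to see what "moving overhead from run time to compile time" means for address computation. The sketch below is not how configware works internally; it only contrasts recomputing addresses inside the loop with fixing the address sequence once beforehand. All function names are hypothetical, and a column-wise scan of a row-major array serves as the example access pattern.

```python
def run_time_addressing(memory, width, height):
    """Address arithmetic is executed inside the loop, for every single access."""
    result = []
    for x in range(width):
        for y in range(height):
            result.append(memory[y * width + x])  # address computed at run time
    return result


def configure(width, height):
    """Done once, before run time: fix the whole address sequence in advance."""
    return tuple(y * width + x for x in range(width) for y in range(height))


def run_configured(memory, address_sequence):
    """At run time only data moves; no address arithmetic is left in the loop."""
    return [memory[a] for a in address_sequence]


memory = [10, 11, 12, 13, 14, 15]    # a 3 x 2 array stored row by row
sequence = configure(width=3, height=2)
assert run_configured(memory, sequence) == run_time_addressing(memory, 3, 2)
print(sequence)
```

Both variants deliver the same data stream; the second one pays for the address computation once at configuration time, which mirrors the trade the slide describes.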


Page 49:

Data meeting the Processing Unit (PU)

We have 2 choices:

by Software: routing the data by memory-cycle-hungry instruction streams thru shared memory

by Configware: data-stream-based: placement* of the execution locality ... pipe network generated by configware compilation

... explaining the RC advantage

*) before run time

[Figure labels: (data), (PU)]

Page 50:

What pipe network?

pipe network, organized at compile time

Generalization* of the systolic array [R. Kress, 1995]

*) supporting non-linear pipes on free form hetero arrays

rDPA = rDPU array, i. e. coarse-grained

rDPU = reconf. datapath unit (no program counter)

array port receiving or sending a data stream

depending on connect fabrics

[Figure: two rDPAs of 9 rDPUs each]

Page 51:

Migration benefit by on-chip RAM

• Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM

• multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions

• so that the drastic code size reduction by software to configware migration can beat the memory wall

[Figure: an rDPA of rDPUs surrounded by ASMs; each ASM contains a RAM and a GAG with its data counter]

ASM: Auto-Sequencing Memory

rDPA = rDPU array, i. e. coarse-grained

rDPU = reconf. datapath unit (no program counter)

GAGs inside ASMs generate the data streams

GAG = generic address generator
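How an ASM might feed a data stream to an rDPU can be modelled as follows. The class names, the parameters (`base`, `x_count`, `x_step`, `y_count`, `y_step`) and the restriction to a 2-D scan pattern are hypothetical choices for this sketch, not a description of an actual GAG design. The Python loops stand in for hardware that steps a data counter by fixed increments.

```python
class GenericAddressGenerator:
    """Hypothetical GAG model: steps through a 2-D scan with a data counter
    that is only incremented, never recomputed from loop indices."""

    def __init__(self, base, x_count, x_step, y_count, y_step):
        self.base, self.x_count, self.x_step = base, x_count, x_step
        self.y_count, self.y_step = y_count, y_step

    def addresses(self):
        row_start = self.base
        for _ in range(self.y_count):
            data_counter = row_start
            for _ in range(self.x_count):
                yield data_counter
                data_counter += self.x_step
            row_start += self.y_step


class AutoSequencingMemory:
    """Hypothetical ASM model: a RAM block plus its own GAG."""

    def __init__(self, ram, gag):
        self.ram = ram
        self.gag = gag

    def read_stream(self):
        for address in self.gag.addresses():
            yield self.ram[address]

    def write_stream(self, values):
        for address, value in zip(self.gag.addresses(), values):
            self.ram[address] = value


# Source ASM scans a 4 x 4 row-major block column by column;
# the sink ASM stores the incoming stream linearly.
source = AutoSequencingMemory(
    list(range(16)),
    GenericAddressGenerator(base=0, x_count=4, x_step=4, y_count=4, y_step=1))
sink = AutoSequencingMemory(
    [0] * 16,
    GenericAddressGenerator(base=0, x_count=16, x_step=1, y_count=1, y_step=0))

# Stand-in for an rDPU: doubles every item of the stream.
sink.write_stream(2 * value for value in source.read_stream())
```

The "rDPU" in the last line never sees an address; the scan pattern is fixed entirely by how the two GAGs were configured.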

Page 52: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

Coarse-grained Reconfigurable Array example

image processing: SNN filter (mainly a pipe network)

• compiled by Nageldinger's KressArray Xplorer (Juergen Becker's CoDe-X inside)

• array size: 10 x 16 = 160 such rDPUs

• rDPUs 32 bits wide

• mesh-connected; exceptions: see

• 3 x 3 fast on-chip RAM

• note: kind of software perspective, but without instruction streams: datastreams + pipelining

• coming close to programmer's mind set (much closer than FPGA)

[Figure: rDPU array with nine ASMs. Legend: rDPU not used; used for routing only; operator and routing; port location marker; backbus connect]
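For readers unfamiliar with the filter named on this slide, here is a plain software reference of one common formulation of the symmetric nearest neighbour (SNN) filter on a 3 x 3 window. It is not the KressArray mapping shown in the figure, and details such as copying border pixels unchanged and excluding the centre pixel from the average are assumptions of this sketch.

```python
def snn_filter(image):
    """Symmetric nearest neighbour filter on a 3 x 3 window.
    Border pixels are copied unchanged (an assumption of this sketch)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    # Four pairs of offsets that are mirror images about the window centre.
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            centre = image[y][x]
            selected = []
            for (dy1, dx1), (dy2, dx2) in pairs:
                p = image[y + dy1][x + dx1]
                q = image[y + dy2][x + dx2]
                # Keep the pixel of each pair whose value is closer to the centre.
                selected.append(p if abs(p - centre) <= abs(q - centre) else q)
            result[y][x] = sum(selected) / len(selected)
    return result


# A vertical edge: dark columns on the left, a bright column on the right.
image = [[10, 10, 100],
         [10, 12, 100],
         [10, 10, 100]]
filtered = snn_filter(image)
```

Each output pixel depends only on a fixed neighbourhood and a fixed sequence of compare, select and add operations, which is why such a filter lends itself to a pipe network.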

Page 53: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

Outline

• von Neumann overhead hits the memory wall

• The manycore programming crisis

• Reconfigurable Computing is the solution

• We need a twin paradigm approach

• Conclusions

Page 54: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

Software / Configware Co-Compilation

apropos compilation: The CoDe-X co-compiler [Juergen Becker, 1996]

[Figure: C language source → Partitioner (with Analyzer / Profiler) → SW compiler → SW code ("vN" machine paradigm), and → CW compiler → CW Code, FW Code (anti machine paradigm)]

But we need a dual paradigm approach: to run legacy software together w. configware

Reconfigurable Computing: Technology is Ready. -- Users are Not ?
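The role of the partitioner in such a co-compilation flow can be illustrated with a toy model. This is not the CoDe-X algorithm: the `Task` fields, the `partition` function, the threshold rule and the task names are all hypothetical, chosen only to show profiler output steering parts of one source program to either the SW compiler or the CW compiler.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime_share: float   # fraction of profiled run time (from the analyzer / profiler)
    streamable: bool       # True if the task maps onto a data-stream pipe


def partition(tasks, threshold=0.10):
    """Toy partitioner: hot, streamable tasks go to configware,
    everything else stays software."""
    software, configware = [], []
    for task in tasks:
        if task.streamable and task.runtime_share >= threshold:
            configware.append(task.name)
        else:
            software.append(task.name)
    return software, configware


tasks = [
    Task("parse_input", 0.05, False),
    Task("filter_kernel", 0.70, True),
    Task("small_scan", 0.02, True),
    Task("report", 0.23, False),
]
software, configware = partition(tasks)
```

Here only `filter_kernel` is migrated; the remaining tasks keep running as ordinary software, which is the dual paradigm situation the slide describes.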

Page 55: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

Curricula from the mainframe age

• non-von-Neumann accelerators

• not really taught

• no common model

• the education wall

• (procedural) structurally disabled

• the main problem: the common model is ready, but users are not

(this is not a lecture on brain regions)

Page 56: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

We need a twin paradigm education

Brain Usage: both Hemispheres

procedural / structural: each side needs its own common model

(this is not a lecture on brain regions)

Page 57: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

RCeducation 2008

http://fpl.org/RCeducation/

The 3rd International Workshop on Reconfigurable Computing Education

April 10, 2008, Montpellier, France

teaching RC ?

Page 58: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

We need new courses

“We urgently need a Mead-&-Conway-like text book” [R. H., Dagstuhl Seminar 03301, Germany, 2003]

We need undergraduate lab courses with HW / CW / SW partitioning

We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design

2007: Here it is !

Page 59: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

Outline

• von Neumann overhead hits the memory wall

• The manycore programming crisis

• Reconfigurable Computing is the solution

• We need a twin paradigm approach

• Conclusions

Page 60: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

Conclusions

• Von-Neumann-type instruction streams considered harmful [R. H.]

• But we need it for some small code sizes, old legacy software, etc. …

• Data streaming is the key model of parallel computation – not vN

• The twin paradigm approach is inevitable, also in education [R. H.]

• We need to increase the population of HPC-competent people [B.S.]

• We need to increase the population of RC-competent people [R. H.]

Page 61: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

An Open Question

please, reply to:

• Coarse-grained arrays: technology ready*, users not ready

• Much closer to programmer’s mind set: really much closer than FPGAs**

• Which effect is delaying the breakthrough?

*) offered by startups (PACT Corp. and others)

**) “FPGAs? Do we need to learn hardware design?”

Page 62: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

thank you

Page 63: The von Neumann Syndrome calls for a Revolution Reiner Hartenstein TU Kaiserslautern Reno, NV, November 11, 2007  HPRCTA'07 - First.

END