The von Neumann Syndrome calls for a
Revolution
Reiner Hartenstein
TU Kaiserslautern
Reno, NV, November 11, 2007
http://hartenstein.de
HPRCTA'07 - First International Workshop on High-Performance Reconfigurable Computing Technology and Applications - in conjunction with SC07
© 2007, [email protected] [R.H.] http://hartenstein.de
TU Kaiserslautern
About Scientific Revolutions
Ludwik Fleck: Genesis and Development of a Scientific Fact
Thomas S. Kuhn: The Structure of Scientific Revolutions
What is the von Neumann Syndrome?
Computing in the von Neumann style is tremendously inefficient. Multiple layers of massive overhead phenomena at run time often lead to code sizes of astronomic dimensions, resident in drastically slower off-chip memory.
The manycore programming crisis requires complete re-mapping and re-implementation of applications. A sufficiently large population of programmers qualified to program applications for 4 and more cores is far from available.
[Cartoon slide: a multicore-based pacifier, "I ... programming multicores"; on education for multi-core. Credit: Mateo Valero]
Will Computing be affordable in the Future?
Another problem is a high-priority political issue: the very high energy consumption of von-Neumann-based systems. The electricity consumption of all visible and hidden computers reaches more than 20% of our total electricity consumption. A study predicts 35 - 50% for the US by the year 2020.
Reconfigurable Computing highly promising
Fundamental concepts from Reconfigurable Computing promise a speed-up by almost one order of magnitude, for some application areas by up to 2 or 3 orders of magnitude, at the same time slashing the electricity bill down to 10% or less.
It is really time to fully exploit the most disruptive revolution since the mainframe: Reconfigurable Computing, also to reverse the downtrend in CS enrolment.
Reconfigurable Computing shows us the road map to the personal desktop supercomputer, making HPC affordable also for small firms and individuals, and to a drastic reduction of energy consumption.
Contracts between microprocessor firms and Reconfigurable Computing system vendors are under way but not yet published. The technology is ready, but most users are not.
Why?
A Revolution is overdue
The talk sketches a road map requiring a redefinition of the entire discipline, inspired by the mind set of Reconfigurable Computing.
much more saved by coarse-grain

platform example                       energy (W/Gflops)   energy factor
MDgrape-3* (domain-specific, 2004)     0.2                 1
Pentium 4                              14                  70
Earth Simulator (supercomputer, 2003)  128                 640

*) feasible also with rDPA
(3) Power-aware Applications
Cyber infrastructure energy consumption: several predictions; the most pessimistic: almost 50% by 2025 in the USA.
[Chart, 2003 to 2020: Mobile Computation, Communication, Entertainment, etc. (high-volume market); HPC and Supercomputing; PCs and servers (high volume).]
An Example: FPGAs in Oil and Gas .... (1)
For this example speed-up is not my key issue (Jürgen Becker's tutorial showed much higher speed-ups, going up to a factor of 6000).
"Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance"
[Herb Riley, R. Associates]
For this oil and gas example a side effect is much more interesting than the speed-up.
An Example: FPGAs in Oil and Gas .... (2)
Saves more than $10,000 in electricity bills per year (7¢/kWh) .... per 64-processor 19" rack.
Did you know ...
... that 25% of Amsterdam's electric energy consumption goes into server farms?
... that a quarter square-kilometer of office floor space within New York City is occupied by server farms?
This is a strategic issue.
Oil and Gas as a strategic issue
It should be investigated how far the migration achievements obtained for computationally intensive applications can also be utilized for servers.
Do you know the size of Google's electricity bill?
Recently the US Senate ordered a study on the energy consumption of servers.
Low-power design: not only to keep the chips cool.
Flagship conference series: IEEE ISCA
Parallelism faded away: 98.5% von Neumann (2001: 84%) [David Padua, John Hennessy, et al.]
Other: cache coherence? speculative scheduling? [Jean-Loup Baer]
The migration of the lemmings.
Using FPGAs for scientific computation?
Unqualified for RC? Hiring a student from the EE dept.?
Application disciplines use their own boxes of tricks: a transdisciplinary fragmentation of methodology. CS is responsible for providing an RC common model
• for transdisciplinary education
• and to fix its intradisciplinary fragmentation
Computing Curricula 2004 (Joint Task Force) fully ignores Reconfigurable Computing
"FPGA" & synonyms: 0 hits in the curricula, not even here (Google: 10 million hits).
Upon my complaints the only change: adding to the last paragraph of the survey volume, Curriculum Recommendations, v. 2005: "programmable hardware (including FPGAs, PGAs, PALs, GALs, etc.)." However, no structural changes at all.
v. 2005 is intended to be the final version (?), torpedoing the transdisciplinary responsibility of CS curricula. This is criminal!
fine-grained vs. coarse-grained reconfigurability
"fine-grained" means: data path width ~ 1 bit
"coarse-grained": path width = many bits (e.g. 32 bits)
instruction-stream-based: domain-specific CPU design (not reconfigurable); CPU with extensible instruction set (partially reconfigurable); soft-core CPU (reconfigurable)
data-stream-based: domain-specific rDPU design; rDPU with extensible "instruction" set
coarse-grained: terminology

term    program counter   execution triggered by   paradigm
CPU     yes               instruction fetch        instruction-stream-based
DPU**   no                data arrival*            data-stream-based

*) "transport-triggered"
**) does not have a program counter
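The CPU/DPU distinction in the table above can be sketched in code. This is a minimal assumed model (not any vendor's actual design): the CPU steps a program counter through an instruction list, while the DPU is configured with one operation before run time and fires only when its operands arrive.

```python
# Minimal sketch (assumed model): a DPU has no program counter; it is
# configured with one operation and fires when data arrives.

class DPU:
    def __init__(self, op):
        self.op = op          # configured before run time
        self.inputs = []      # operands wait here

    def receive(self, value):
        """Transport-triggered: arrival of the 2nd operand fires the op."""
        self.inputs.append(value)
        if len(self.inputs) == 2:
            a, b = self.inputs
            self.inputs = []
            return self.op(a, b)
        return None           # still waiting for data

# A CPU, by contrast, steps a program counter through an instruction list:
def cpu_run(program, state):
    pc = 0                    # the program counter drives execution
    while pc < len(program):
        state = program[pc](state)   # instruction fetch every step
        pc += 1
    return state

mul = DPU(lambda a, b: a * b)
assert mul.receive(6) is None        # no instruction fetch; just waiting
assert mul.receive(7) == 42          # data arrival triggers execution
assert cpu_run([lambda s: s + 1, lambda s: s * 2], 20) == 42
```

The point of the sketch: the DPU's "program" is its configuration, fixed before any data flows; only the CPU pays for instruction fetch at run time.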
coarse-grained: terminology (continued)
PACT Corp., Munich, offers rDPU arrays (rDPAs).
The Paradigm Shift to Data-Stream-Based
The method of communication and data transport:
by software: the von Neumann syndrome
by configware: a complex pipe network on the rDPA
The Anti Machine
A non-von-Neumann machine paradigm (a generalization of the systolic array model).
Twin paradigm? Split up into 2 paradigms? Like matter & antimatter: one elementary particle physics.
A kind of trans(sub)disciplinary effort: the fusion of paradigms.
Interpretation [Thomas S. Kuhn]: clean up the terminology!
Languages turned into Religions
• teaching students the tunnel view of language designers
• falling in love with the subtleties of formalisms
• instead of meeting the needs of the user
Java is a religion, not a language [Yale Patt]
The language and tool disaster
Software people do not speak VHDL.
Hardware people do not speak MPI.
Bad quality of the application development tools: a poll at FCCM'98 revealed that 86% of hardware designers hate their tools.
At the end of April, a DARPA brainstorming conference.
The first Reconfigurable Computer
• prototyped 1884 by Herman Hollerith
• a century before FPGA introduction
• data-stream-based
• 60 years later the von Neumann (vN) model took over
• instruction-stream-based
Reconfigurable Computing came back
• As a separate community: the clash of paradigms
• 1960: "fixed plus variable structure computer" proposed by G. Estrin
• 1970: PLD (programmable logic device*)
• 1985: FPGA (Field Programmable Gate Array)
• 1989: Anti Machine Model, counterpart of von Neumann
• 1990: coarse-grained Reconfigurable Datapath Array
• When? Foundation of PACT
• When? Reconfigurable address generator; 1994: MoPL

*) Boolean equations in sum-of-products form, implemented by an AND matrix and an OR matrix. Structured VLSI design like memory chips: integration density very close to the Moore curve.

device   AND matrix       OR matrix
PLA      reconfigurable   reconfigurable
ePROM    fixed            reconfigurable
PAL      reconfigurable   fixed
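The footnote above describes a PLD as Boolean equations in sum-of-products form: an AND matrix forming product terms, feeding an OR matrix. A tiny sketch (the matrix encoding below is my own illustration, not a real device format) of a PLA, where both matrices are reconfigurable:

```python
# Sketch of a PLA: each product term is an AND over selected input literals,
# each output is an OR over selected product terms. The "configuration" is
# the two matrices, set before run time. (In a PAL the OR matrix is fixed.)

def pla(and_matrix, or_matrix, inputs):
    # and_matrix: per product term, a list of (input_index, expected_bit) literals
    products = [all(inputs[i] == bit for i, bit in term) for term in and_matrix]
    # or_matrix: per output, which product terms feed its OR
    return [any(products[t] for t in terms) for terms in or_matrix]

# Example configuration: XOR = a·b' + a'·b (sum of two product terms)
and_matrix = [[(0, 1), (1, 0)],   # product term 0: a AND NOT b
              [(0, 0), (1, 1)]]   # product term 1: NOT a AND b
or_matrix = [[0, 1]]              # output 0 = term 0 OR term 1

assert pla(and_matrix, or_matrix, [0, 0]) == [False]
assert pla(and_matrix, or_matrix, [1, 0]) == [True]
assert pla(and_matrix, or_matrix, [0, 1]) == [True]
assert pla(and_matrix, or_matrix, [1, 1]) == [False]
```

Like the memory-chip layouts mentioned above, the regularity of the two matrices is what makes the structure so dense: the "program" is just which crosspoints are connected.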
Outline
• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions
The spirit of the Mainframe Age
• For decades, we've trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps …
• Even in "hardware" courses (the unloved child of CS scenes) we often teach von Neumann machine design, deepening this tunnel view
• … finally tending toward code sizes of astronomic dimensions
• 1951: hardware design going von Neumann (microprogramming)
von Neumann: array of massive overhead phenomena

overhead                    von Neumann machine
instruction fetch           instruction stream
state address computation   instruction stream
data address computation    instruction stream
data meet PU                instruction stream
i/o to/from off-chip RAM    instruction stream
multi-threading overhead    instruction stream
… other overhead            instruction stream

… piling up to code sizes of astronomic dimensions
von Neumann: array of massive overhead phenomena (continued)
Temptations by von Neumann style software engineering; massive communication congestion.
[Dijkstra 1968]: the "go to" considered harmful
[R.H. 1975]: the universal bus considered harmful
Backus, 1978: Can Programming be Liberated from the von Neumann Style?
Arvind et al., 1983: A Critique of Multiprocessing the von Neumann Style
Timeline: Dijkstra 1968; R.H. & Koch 1975; Backus 1978; Arvind 1983.
von Neumann overhead: just one example
[1989] (image processing example): 94% of the computation load went only into moving the window, i.e. into data address computation by the instruction stream.
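The 94% figure is from the 1989 study. As a rough illustration of where such numbers come from, here is a toy operation count for a naive von Neumann sliding-window loop; the cost model (4 overhead operations per address, 1 useful operation per pixel tap) is my own assumption, not the study's.

```python
# Toy operation count (assumed cost model) for a K x K window scanned
# over a W x H image on a von Neumann machine: every pixel access needs
# an address computation, and the loops need bookkeeping.

def count_ops(W, H, K=3):
    useful = 0      # multiply-accumulate on actual pixel data
    overhead = 0    # address computations and loop bookkeeping
    for y in range(H - K + 1):
        overhead += 1                      # row loop increment/test
        for x in range(W - K + 1):
            overhead += 1                  # column loop increment/test
            for dy in range(K):
                for dx in range(K):
                    overhead += 4          # addr = base + (y+dy)*W + (x+dx)
                    useful += 1            # one MAC on the fetched pixel
    return useful, overhead

useful, overhead = count_ops(64, 64)
# even in this crude count, addressing dominates the useful MACs
assert overhead > 3 * useful
```

With a data-stream machine, the whole `overhead` column disappears from run time: the address sequence is generated by configured hardware, not by instructions.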
the Memory Wall
[Chart: Dave Patterson's "Performance Gap", 1980 to 2000: µProc performance grows 60%/yr, DRAM only 7%/yr; the gap grows ~50%/year, reaching ~1000 by 2005; the 60%/yr growth ends in 2005.]
Instruction-stream code of astronomic dimensions needs off-chip RAM, which fully hits the memory wall.
CPU clock speed ≠ performance: the processor's silicon is mostly cache. Better compare off-chip vs. fast on-chip memory.
Benchmarked Computational Density
[BWRC, UC Berkeley, 2004]
[Chart: SPECfp2000 / MHz / billion transistors, 1990 to 2005, for DEC Alpha, SUN, HP, IBM. Alpha: down by 100 in 6 years; IBM: down by 20 in 6 years. Stolen from Bob Colwell.]
CPU caches ... CPU clock speed ≠ performance: the processor's silicon is mostly cache.
Outline
• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions
The Manycore future
• we are embarking on a new computing age -- the age of massive parallelism [Burton Smith]
• multiple von Neumann CPUs on the same µprocessor chip lead to exploding (vN) instruction stream overhead [R.H.]
• Even mobile devices will exploit multicore processors, also to extend battery life [B.S.]
• everyone will have multiple parallel computers [B.S.]
von Neumann parallelism
the watering pot model [Hartenstein]
the sprinkler head has only a single hole: the von Neumann bottleneck
Several overhead phenomena
The instruction-stream-based parallel von Neumann approach has several von Neumann overhead phenomena per CPU!
[Figure: a manycore array of CPUs; the watering pot model [Hartenstein]]
Explosion of overhead by von Neumann parallelism

overhead                           von Neumann machine
monoprocessor (local overhead):
  instruction fetch                instruction stream
  state address computation        instruction stream
  data address computation         instruction stream
  data meet PU                     instruction stream
  i/o to/from off-chip RAM         instruction stream
  … other overhead                 instruction stream
parallel (global overhead):
  inter-PU communication           instruction stream
  message passing                  instruction stream

Local overhead grows proportionately to the number of processors; global overhead grows disproportionately. [R.H. 2006]: MPI considered harmful.
Rewriting Applications
• more processors means rewriting applications
• we need to map an application onto different-size manycore configurations
• most applications are not readily mappable onto a regular array
• mapping is much less problematic with Reconfigurable Computing
[Figure: a manycore CPU array vs. an rDPU array]
Disruptive Development
• Computer industry is probably going to be disrupted by some very fundamental changes. [Ian Barron]
• I don't agree: we have a model.
• A parallel [vN] programming model for manycore machines will not emerge for five to 10 years. [experts from Microsoft Corp.]
• We must reinvent computing. [Burton J. Smith]
• Reconfigurable Computing: technology is ready, users are not
• It's mainly an education problem: the Education Wall
Outline
• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions
The Reconfigurable Computing Paradox
• The spirit from the Mainframe Age is collapsing under the von Neumann syndrome
• There is something fundamentally wrong in using the von Neumann paradigm
• Up to 4 orders of magnitude speedup, plus a tremendously slashed electricity bill, by migration to FPGA
• despite "bad" FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed
• The reason for this paradox?
beyond von Neumann Parallelism
We need an approach like this: data-stream-based RC*, instead of the instruction-stream-based von Neumann approach, which has several von Neumann overhead phenomena per CPU (the watering pot model [Hartenstein]).
*) "RC" = Reconfigurable Computing
von Neumann overhead vs. Reconfigurable Computing

overhead                        von Neumann machine          anti machine (hardwired / reconfigurable)
                                (using a program counter)    (using data counters / reconfigurable data counters)
instruction fetch               instruction stream           none*
state address computation       instruction stream           none*
data address computation        instruction stream           none*
data meet PU + other overhead   instruction stream           none*
i/o to/from off-chip RAM        instruction stream           none*
inter-PU communication          instruction stream           none*
message passing overhead        instruction stream           none*

*) configured before run time
rDPA: reconfigurable datapath array (coarse-grained reconfigurable); no instruction fetch at run time.
von Neumann overhead vs. Reconfigurable Computing (continued)
Message passing overhead is avoided just by the reconfigurable address generator (GAG).
[1989] (image processing example): x 17 speedup by the GAG; x 15,000 total speedup from this migration project.
Reconfigurable Computing means …
• Reconfigurable Computing means moving overhead from run time to compile time**
• For HPC, run time is more precious than compile time
• Reconfigurable Computing replaces "looping" at run time* …
http://www.tnt-factory.de/videos_hamster_im_laufrad.htm
… by configuration before run time
*) e.g. complex address computation
**) or, loading time
Data meeting the Processing Unit (PU)
We have 2 choices:
by software: routing the data to the PU by memory-cycle-hungry instruction streams through shared memory
by configware (data-stream-based): placement* of the execution locality; a pipe network generated by configware compilation
... explaining the RC advantage
*) before run time
What pipe network?
A pipe network, organized at compile time: a generalization* of the systolic array [R. Kress, 1995].
rDPA = rDPU array, i.e. coarse-grained; rDPU = reconfigurable datapath unit (no program counter).
Array ports receive or send data streams, depending on the connect fabrics.
*) supporting non-linear pipes on free-form hetero arrays
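The pipe-network idea can be sketched as a chain of configured rDPU stages through which a data stream flows. This is a minimal assumed model of a linear pipe (real rDPAs support non-linear pipes and run stages concurrently; this sequential simulation only shows the programming model):

```python
# Sketch (assumed minimal model) of a linear pipe network on an rDPA:
# each rDPU is configured with one operator before run time; data streams
# flow through, with no instruction fetch at run time.

def make_pipe(stages):
    """stages: list of configured rDPU operators, wired as a linear pipe."""
    def run(stream):
        for x in stream:           # a data stream enters the array port
            for op in stages:      # each value flows through every rDPU stage
                x = op(x)          # (sequential simulation of the pipeline)
            yield x                # result leaves through the output port
    return run

# Configure (before run time) a 3-stage pipe: scale, offset, clamp.
pipe = make_pipe([lambda x: 2 * x,
                  lambda x: x + 1,
                  lambda x: min(x, 10)])

assert list(pipe([0, 1, 2, 3, 4, 5, 6])) == [1, 3, 5, 7, 9, 10, 10]
```

Note what is absent: no program counter, no per-element instruction fetch, no shared-memory round trips; the "program" is the wiring and the per-stage configuration.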
ASM: Auto-Sequencing Memory
An ASM couples a data counter and a GAG (generic address generator) with a RAM block; the GAGs inside the ASMs generate the data streams feeding the rDPA.
Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM. Multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions, so that the drastic code size reduction by software-to-configware migration can beat the memory wall.
rDPA = rDPU array, i.e. coarse-grained; rDPU = reconfigurable datapath unit (no program counter).
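The GAG's job can be sketched as follows: once configured with loop bounds and strides, it emits the whole address stream by itself, with no instruction stream involved. The parameter names and nested-loop model below are my own illustration, not the actual GAG design:

```python
# Sketch (hypothetical parameters) of a generic address generator (GAG):
# bounds and strides are configured before run time; the address stream
# then flows without any run-time address arithmetic in the datapath.

def gag(base, bounds, strides):
    """Nested-loop address stream, one (bound, stride) pair per loop level."""
    def scan(level, addr):
        if level == len(bounds):
            yield addr
            return
        for _ in range(bounds[level]):
            yield from scan(level + 1, addr)
            addr += strides[level]
    return scan(0, base)

# A 2 x 3 block scan of row-major memory with row length 8:
addrs = list(gag(base=0, bounds=(2, 3), strides=(8, 1)))
assert addrs == [0, 1, 2, 8, 9, 10]

# An ASM couples such a GAG with an on-chip RAM block to source a data stream:
ram = list(range(100, 164))
stream = [ram[a] for a in gag(base=0, bounds=(2, 3), strides=(8, 1))]
assert stream == [100, 101, 102, 108, 109, 110]
```

This is exactly the overhead column that the anti machine table marks "none*": the window-moving address arithmetic of the earlier image-processing example becomes a pre-configured generator next to the RAM.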
Coarse-grained Reconfigurable Array example
Image processing: an SNN filter (mainly a pipe network), compiled by Nageldinger's KressArray Xplorer (Juergen Becker's CoDe-X inside).
Array size: 10 x 16 = 160 rDPUs, 32 bits wide, mesh-connected (with exceptions); 3 x 3 fast on-chip RAM.
[Legend: rDPU not used / used for routing only / operator and routing / port location marker / backbus connect / rout-thru only]
Note: a kind of software perspective, but without instruction streams: data streams + pipelining. Coming close to the programmer's mind set (much closer than FPGA).
Outline
• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions
Software / Configware Co-Compilation
Apropos compilation: the CoDe-X co-compiler [Juergen Becker, 1996].
[Diagram: C language source → analyzer/profiler → partitioner; SW code → SW compiler ("vN" machine paradigm); CW code → CW compiler (anti machine paradigm) → FW code]
But we need a dual paradigm approach: to run legacy software together with configware.
Reconfigurable Computing: technology is ready. Users are not?
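The co-compilation flow above (analyzer/profiler feeding a partitioner, which routes code to the SW or CW compiler) can be caricatured in a few lines. This is a toy of my own, not the real CoDe-X: the profile format and the hot-kernel threshold are invented for illustration.

```python
# Toy caricature (not the real CoDe-X) of dual-paradigm partitioning:
# send hot loop kernels to the configware compiler for the rDPA,
# keep the rest as legacy software for the von Neumann CPU.

def partition(profile, hot_threshold=0.5):
    """profile: (name, runtime_share, is_loop_kernel) from the analyzer/profiler."""
    configware, software = [], []
    for name, share, is_kernel in profile:
        # the partitioner's rule of thumb: hot loop kernels go to the rDPA
        if is_kernel and share >= hot_threshold:
            configware.append(name)
        else:
            software.append(name)
    return configware, software

profile = [("init_io",     0.05, False),
           ("filter_loop", 0.80, True),    # the hot kernel
           ("report",      0.15, False)]

cw, sw = partition(profile)
assert cw == ["filter_loop"]
assert sw == ["init_io", "report"]
```

The point of the dual paradigm: the `software` list still runs as legacy vN code, so nothing has to be thrown away; only the overhead-dominated kernels migrate to configware.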
Curricula from the mainframe age
Procedural only; structurally disabled (this is not a lecture on brain regions).
Non-von-Neumann accelerators: not really taught; no common model. This is the main problem: the education wall.
The common model is ready, but users are not.
We need a twin paradigm education
Brain usage: both hemispheres, procedural and structural; each side needs its own common model.
(this is not a lecture on brain regions)
teaching RC?
RCeducation 2008: The 3rd International Workshop on Reconfigurable Computing Education, April 10, 2008, Montpellier, France.
http://fpl.org/RCeducation/
We need new courses
"We urgently need a Mead-&-Conway-like text book" [R.H., Dagstuhl Seminar 03301, Germany, 2003]. 2007: here it is!
We need undergraduate lab courses with HW / CW / SW partitioning.
We need new courses with an extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design.
Outline
• von Neumann overhead hits the memory wall
• The manycore programming crisis
• Reconfigurable Computing is the solution
• We need a twin paradigm approach
• Conclusions
Conclusions
• Von-Neumann-type instruction streams considered harmful [R.H.]
• But we need them for some small code sizes, old legacy software, etc. …
• Data streaming is the key model of parallel computation, not vN
• The twin paradigm approach is inevitable, also in education [R.H.]
• We need to increase the population of HPC-competent people [B.S.]
• We need to increase the population of RC-competent people [R.H.]
An Open Question
• Coarse-grained arrays: technology ready*, users not ready
• Much closer to the programmer's mind set: really much closer than FPGAs**
• Which effect is delaying the breakthrough?
please, reply to:
*) offered by startups (PACT Corp. and others)
**) "FPGAs? Do we need to learn hardware design?"