Reiner Hartenstein University of Kaiserslautern

Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design, Part 3: Resources for RC - Reiner Hartenstein, University of Kaiserslautern. July 8, 2002, ENST, Paris, France


Page 1: Reiner Hartenstein University of Kaiserslautern

Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design

Part 3: Resources for RC

Reiner Hartenstein

University of Kaiserslautern

July 8, 2002, ENST, Paris, France

Page 2: Reiner Hartenstein University of Kaiserslautern

© 2002, [email protected] http://kressaray.de2

University of Kaiserslautern

Xputer Lab

Schedule

time slot
xx.30 – xx.00  Reconfigurable Computing (RC)
xx.00 – xx.30  coffee break
xx.30 – xx.00  Design / Compilation Techniques
xx.00 – xx.00  lunch break
xx.00 – xx.30  Resources for Data-Stream-based RC
xx.30 – xx.00  coffee break
xx.00 – xx.30  FPGAs: recent developments

Page 3: Reiner Hartenstein University of Kaiserslautern

Opportunities by new patent laws?

• To clever people keen on patents:

• do not file patents on the following details!

• everything shown in this presentation was published years ago

Page 4: Reiner Hartenstein University of Kaiserslautern

>> Configware Industry

• Configware Industry

• Terminology

• MoPL data-procedural language

• Anti architecture and circuitry

• Stream-based Memory Architecture

http://www.uni-kl.de

Page 5: Reiner Hartenstein University of Kaiserslautern

Configware heading for mainstream

• Configware market taking off for mainstream
• FPGA-based designs more complex, even SoC
• No design productivity and quality without good configware libraries (soft IP cores) from various application areas
• Growing number of independent configware houses (soft IP core vendors) and design services
• AllianceCORE & Reference Design Alliance
• Currently the top FPGA vendors are the key innovators and meet most configware demand

Page 6: Reiner Hartenstein University of Kaiserslautern

OS for PLDs

• separate EDA software market, comparable to the compiler / OS market in computers

• Cadence, Mentor, Synopsys just jumped in

• less than 5% of Xilinx / Altera income comes from EDA software

Page 7: Reiner Hartenstein University of Kaiserslautern

Xilinx Alliances

• The Software Alliance EDA Program

• ... Xilinx Inc.'s Foundation...

• free WebPACK downloadable tool palette

• The Xilinx XtremeDSP Initiative (with Mentor Graphics)

• MathWorks / Xilinx Alliance.

• The Wind River / Xilinx alliance


Page 8: Reiner Hartenstein University of Kaiserslautern


The Software Alliance EDA Program

provides a wide selection of EDA tools

Acugen Software, Agilent EEsof EDA, Aldec, Aptix, Auspy Development, Cadence, Celoxica, Dolphin Integration, Elanix, Exemplar, Flynn Systems, Hyperlynx, IKOS Systems, Innoveda, Mentor Graphics, MiroTech, Model Technology, Protel International, Simucad, SynaptiCAD, Synopsys, Synplicity, Translogic, Virtual Computer Corporation.

helps leading EDA vendors to integrate Xilinx Alliance software tightly into their tools

Page 9: Reiner Hartenstein University of Kaiserslautern

The Xilinx AllianceCORE program

A cooperation between Xilinx and third-party core developers, to produce a broad selection of industry-standard solutions for use in Xilinx platforms.

Partners are: Amphion Semiconductor, Ltd.; ARC Cores; CAST, Inc.; DELTATEC; Derivation Systems, Inc.; Dolphin Integration (Grenoble); Eureka Technology Inc.; Frontier Design Inc.; GV & Associates, Inc.; inSilicon Corporation; iCODING Technology Inc.; Loarant Corporation; Mindspeed Technologies - A Conexant Business (formerly Applied Telecom); MemecCore; Mentor Graphics Inventra; NewLogic Technologies, Inc. (Europe); NMI Electronics; Paxonet Communications, Inc.; Perigee, LLC; Rapid Prototypes Inc.; sci-worx GmbH (Hannover, Germany); SysOnChip; TILAB (Telecom Italia Lab); VAutomation; Virtual IP Group, Inc.; XYLON.

Page 10: Reiner Hartenstein University of Kaiserslautern

The Xilinx Reference Design Alliance Program

The Xilinx Reference Design Alliance Program helps the development of multi-component reference designs that incorporate Xilinx devices and other semiconductors. The designs are fully functional, but come with no warranties and no liability.

Partners are: ADI Engineering; Innovative Integration; JK microsystems, Inc.; LYR Technologies; NetLogic Microsystems.

Page 11: Reiner Hartenstein University of Kaiserslautern

The Xilinx University Program

The Xilinx University Program provides

• Xilinx Student Edition Software,
• Professor Workshops,
• a Xilinx University User Group,
• Presentation Materials and Lab Files,
• Course Examples,
• Research,
• Books, etc.

Page 12: Reiner Hartenstein University of Kaiserslautern

Altera offers over a hundred IP cores (1)

Altera offers over a hundred IP cores like, for example:

• controller,
• UART,
• microprocessor,
• decoder,
• bus control,
• USB controller,
• PCI bus interface,
• Viterbi controller,
• fast Ethernet MAC receiver or transmitter,
• modulator,
• synchronizer,
• DDR SDRAM controller,
• Hadamard transform,
• interrupt controller,
• Real86 16 bit microprocessor,
• floating point,
• FIR filter,
• discrete cosine,
• ATM cell processor,
• and many others.

Page 13: Reiner Hartenstein University of Kaiserslautern

Altera offers over a hundred IP cores (2)

from Altera, and from: AMIRIX Systems, Inc.; Amphion Semiconductor, Ltd.; Arasan Chip Systems, Inc.; CAST, Inc.; Digital Core Design; Eureka Technology Inc.; HammerCores; Innocor; Ktech Telecommunications, Inc.; Lexra Computing Engines; Mentor Graphics - Inventra; Modelware; Ncomm, Inc.; NewLogic Technologies; Northwest Logic; Nova Engineering, Inc.; Palmchip Corporation; Paxonet Communications; PLD Applications; Sciworx; Simple Silicon; Tensilica; TurboConcept.

Page 14: Reiner Hartenstein University of Kaiserslautern

Altera IP core design services

Altera IP core design services are available from:

• Northwest Logic

Page 15: Reiner Hartenstein University of Kaiserslautern

Altera Certified Design Center (CDC) Program

Certified Design Center (CDC) Program:

• Barco Silex
• El Camino GmbH
• Excel Consultants
• Plextek
• Reflex Consulting
• Sci-worx
• Tality
• Zaiq Technologies

Page 16: Reiner Hartenstein University of Kaiserslautern

The Altera Consultants Alliance Program (ACAP)

The Altera Consultants Alliance Program (ACAP): lists

•41 offices in North America and

•29 in the rest of the world.

Page 17: Reiner Hartenstein University of Kaiserslautern

Development boards

Development boards are offered by:

• Altera
• El Camino GmbH
• Gid'el Limited
• Nova Engineering, Inc.
• PLD Applications
• Princeton Technology Group
• RPA Electronics Design, LLC
• Tensilica

Page 18: Reiner Hartenstein University of Kaiserslautern

Consultants and services not listed by Xilinx nor Altera (index)

Algotronix, Edinburgh
Andraka Consulting Group
Arkham Technology, Pasadena, CA
Barco Silex, Louvain-la-Neuve, Belgium
Bottom Line Technologies, Milford, NJ
Codelogic, Helderberg, South Africa
Coelacanth Engineering, Norwell, MA
Comit Systems, Inc., Santa Clara, CA
EDTN Programmable Logic Design Center
Flexibilis, Tampere, Finland
Geoff Bostock Designs, Wiltshire, England
Great River Technology, Albuquerque, NM
New Horizons GB Ltd, United Kingdom
North West Logic
Silicon System Solutions, Canterbury, Australia
Smartech, Tampere, Finland
Tekmosv, Austin, Texas
The Rockland Group, Garden Valley, CA
Nick Tredennick, Los Gatos, California
Vitesse

Page 19: Reiner Hartenstein University of Kaiserslautern

Consultants and services not listed by Xilinx nor Altera (1)

Algotronix, Edinburgh, Reconfigurable Computing and FPL in software radio, communications and computer security

Andraka Consulting Group high performance FPGA designs for DSP applications

Arkham Technology, Pasadena, low cost IP cores for Xilinx and Atmel, embedded processor, DSP, wireless communication, COM / CORBA / DirectX, client-server database programming, software internationalization, PCB design

Barco Silex, Louvain-la-Neuve, Belgium, IP integration boards for ASIC and FPGA, consultancy, design, sub-contracting

Page 20: Reiner Hartenstein University of Kaiserslautern

Consultants and services not listed by Xilinx nor Altera (2)

Bottom Line Technologies, Milford, New Jersey, FPGA design, training, designing Xilinx parts since 1985

Codelogic, Helderberg, South Africa, consulting, FPGA design services

Coelacanth Engineering, Norwell, Massachusetts, design services, test development services, in wireless communication, DSP-based instrumentation, mixed-signal ATE

Comit Systems, Inc., Santa Clara, California, DSP, ASIC, networking, embedded control in avionics -- FPGA / ASIC design and system software

EDTN Programmable Logic Design Center

Page 21: Reiner Hartenstein University of Kaiserslautern

Consultants and services not listed by Xilinx nor Altera (3)

FirstPass, Castle Rock, Colorado

Vitesse, ASIC design

Flexibilis, Tampere, Finland, VHDL IP cores for Xilinx products

Geoff Bostock Designs, Wiltshire, England, FPGA design services

Great River Technology, Albuquerque, New Mexico, FPGA design services in digital video and point-to-point data transmission for aerospace, military, and commercial broadcasters

New Horizons GB Ltd, United Kingdom, FPGA design and training, Xilinx specialist

North West Logic, FPGA and embedded processor design in digital communications, digital video

Page 22: Reiner Hartenstein University of Kaiserslautern

Consultants and services not listed by Xilinx nor Altera (4)

Silicon System Solutions, Canterbury, Australia, VHDL IP cores for the ASIC and FPGA/CPLD/EPLD markets

Smartech, Tampere, Finland, ASIC and FPGA design

Tekmosv, Austin, Texas, Multiple Designs on a Single Gate Array, HDL synthesis, design conversions, chip debug, test generation

The Rockland Group, Garden Valley, California, a TeleConsulting organization about logic design for FPGAs

Nick Tredennick, Los Gatos, California, investor and consultant

Page 23: Reiner Hartenstein University of Kaiserslautern

>> Terminology

• Configware Industry

• Terminology

• MoPL data-procedural language

• Anti architecture and circuitry

• Stream-based Memory Architecture

http://www.uni-kl.de

Page 24: Reiner Hartenstein University of Kaiserslautern

Terminology

Paradigm                          Platform               Programming source
“von Neumann”                     Hardware               Software
Soft Machine (w. soft datapaths)  coarse grain Flexware  high level Configware
RL (FPGA etc.)                    fine grain Flexware    netlist level Configware

Page 25: Reiner Hartenstein University of Kaiserslautern

Terminology & Acronyms

• Software (SW): procedural sources*
• Configware (CW): structural sources
• Hardware (HW): hardwired platforms
• ASIC: customizable hardwired platforms
• Flexware (FW): reconfigurable platforms
• FPGA: field-programmable gate array
• FPL: field-programmable logic
• RC: reconfigurable computing
• RL: reconfigurable logic

*) note: firmware is SW !

Page 26: Reiner Hartenstein University of Kaiserslautern

Stream-based Computing (2)

terms:

• DPU: datapath unit
• DPA: datapath array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
• stream-based computing: using complex pipe network (super-systolic: Kress et al.)

Page 27: Reiner Hartenstein University of Kaiserslautern

Confusing Terminology

Computer Science and EE, as well as their R&D and application areas, suffer from a Babylonian confusion.

Communication is made difficult not only between Computer Science and EE, but also between their special areas, and even between their different abstraction levels, mainly because of immature terminology relating to reconfigurable circuits and their applications.

Terms are rarely standardized and are often used with drastically different meanings, even within the same special area.

Often terms have been coined so badly that they are not self-explanatory but misleading. A demonstrative example is the comparison of the terms used in VHDL and Verilog.

Ideal are "intuitive" terms. But often intuition yields the wrong idea. Whenever a new term appears in teaching, I often have to tell the students that the term does not mean what they believe.

Page 28: Reiner Hartenstein University of Kaiserslautern

Terms (1)

Hardware
  Meaning: hardwired
  Example: ASIC, CPU, DPU, DPA

Morphware
  Meaning: reconfigurable (structurally programmable)
  Example: FPLA, FPGA, rDPU, rDPA

Firmware
  Meaning: microprogram (rarely used after introduction of RISC processors)
  Example: IBM 360 computer family

Software
  Meaning: procedural programs (instruction stream executed by a CPU)
  Example: Word, C, OS, compiler, etc.

Streamware
  Meaning: data-procedural programs (data streams executed by a DPU or DPA)
  Example: data schedules, data streams, e.g. MoPL programs

Configware
  Meaning: structural programs, soft IP cores, personalizing CPLD, FPGA, or other Flexware
  Example: for configuration of rDPA, FPGA, e.g. as a logic circuit, state machine, datapath, function

[à la Ingo Kreuz]

Page 29: Reiner Hartenstein University of Kaiserslautern

Terms (2)

data
  Meaning: objects of computing; whether something has the “data” property depends on the moment of watching
  Example: bits, numbers, operands, results, any text (also compiler input), lists, graphs, tables, images, ...

data stream
  Meaning: ordered, also parallel data word lists, obtained by scheduling
  Example: I/O data streams for systolic or other arrays, also DSP

programming
  Meaning: personalisation by loading program code
  Example: procedural code or structural code: for (re)configuration

program
  Meaning: source text or object code for programming
  Example: procedural or structural

[à la Ingo Kreuz]

Page 30: Reiner Hartenstein University of Kaiserslautern

Terms (3)

boot program
  Meaning: simple program to enable programming, usually saved in non-volatile memory
  Example: comparable to the starter of the engine of a car

booting
  Meaning: load and execute a boot program

[à la Ingo Kreuz]

Page 31: Reiner Hartenstein University of Kaiserslautern

Hardware Terms (1)

machine
  Meaning: execution unit, driven by deterministic sequencer, + memory
  Example: von Neumann, or anti machine

„dataflow machine“
  Meaning: not a machine, since without a deterministic sequencer (exotic concept)
  Example: (dead research area)

CPU
  Meaning: instruction stream processor ("von Neumann”): program counter (instruction sequencer) and DPU; mode of operation: deterministically instruction-driven
  Example: ARM, Pentium core

DPU, rDPU
  Meaning: (reconfigurable) data path unit*

DPA, rDPA
  Meaning: (reconfigurable) DPU array*
  Example: KressArray

[à la Ingo Kreuz]

*) processing data streams (transport-triggered), not yet a machine: autosequencing memory missing

Page 32: Reiner Hartenstein University of Kaiserslautern

Hardware Terms (2)

DPU
  Meaning: data path unit, processes operands; no CPU since without sequencer; no machine
  Example: ALU with registers, multiplexers etc.

Computer
  Meaning: CPU with RAM and interfaces

Parallel Computer
  Meaning: ensemble of several computers

Xputer
  Meaning: deterministically data-driven machine (transport-triggered); data counter(s) used instead of a program counter
  Example: MoM architectures (Kaiserslautern)

dataflow machine
  Meaning: indeterministically data-driven (execution sequence unpredictable)
  Example: (sleeping research area)

[à la Ingo Kreuz]

Page 33: Reiner Hartenstein University of Kaiserslautern

Terms on Parallelism (1)

parallelism
  Meaning: several levels of parallelism distinguished
  Example: parallel processes, parallelism at instruction set level, pipelines

concurrent
  Meaning: parallel processes run on different CPUs of a parallel computer; may occasionally exchange signals or data
  Example: weather forecasting, complex simulations, etc.

ISP (instruction set parallelism)
  Meaning: several CPUs run in parallel by clocked synchronization
  Example: VLIW (very long instruction word) computer

[à la Ingo Kreuz]

Page 34: Reiner Hartenstein University of Kaiserslautern

Terms on Parallelism (2)

pipelining
  Meaning: several uniform or different DPUs running simultaneously, connected to a pipeline by buffer registers
  Example: pipelined CPUs, pipe networks, systolic arrays, etc.

chaining
  Meaning: several uniform or different DPUs running simultaneously, connected to a pipeline without buffer registers
  Example: combinational networks, complex arithmetic operators

pipe network
  Meaning: ensemble of DPUs, also multiple pipelines, also with irregular or wild structures
  Example: systolic arrays, stream-based computing arrays

[à la Ingo Kreuz]

Page 35: Reiner Hartenstein University of Kaiserslautern

Terms on Parallelism (3)

systolic array
  Meaning: pipe network with only linear (straight-on, no branching), uniform pipelines (all DPUs hardwired and with same functionality)
  Example: matrix computation, DSP, DNA sequencing, etc.

stream-based computing arrays (super-systolic arrays)
  Meaning: pipe network, configured before fabrication
  Example: image processing, DSP, complex functions and algorithms

(coarse grain) reconfigurable stream-based arrays
  Meaning: stream-based arrays, configurable after fabrication
  Example: KressArray

[à la Ingo Kreuz]

Page 36: Reiner Hartenstein University of Kaiserslautern

Counterparts

programming mode
  Property: procedural (classical)
  Counterpart: structural (synthesis, design): „field-programmable“, PLA „programming“, etc.

machine: principle of operation
  Property: control-flow-driven (instruction-driven): von Neumann
  Counterpart: data-driven: Xputer machine

system: principle of operation
  Property: instruction-flow-driven (parallel computer etc.)
  Counterpart: data-stream-based (systolic array, DPU array, KressArray)

set-up time (datapaths switched through)
  Property: during run time (instruction-driven)
  Counterpart: before run time: FPGA (at compile time), gate array (at fabrication)

[à la Ingo Kreuz]

Page 37: Reiner Hartenstein University of Kaiserslautern

>> MoPL data-procedural language

• Configware Industry

• Terminology

• MoPL data-procedural language

• Anti architecture and circuitry

• Stream-based Memory Architecture

http://www.uni-kl.de

Page 38: Reiner Hartenstein University of Kaiserslautern

Fundamental Ideas available (1)

• Data Sequencer Methodology

• Data-procedural Languages (Duality with v N)

• ... supporting memory bandwidth optimization

• Soft Data Path Synthesis Algorithms

• Parallelizing Loop Transformation Methods

• Compilers supporting Soft Machines

• SW / CW Partitioning Co-Compilers

Page 39: Reiner Hartenstein University of Kaiserslautern

Fundamental Ideas available (2)

• Programming Xputers

• Similarities to programming computers

• How not to get confused by similarities

• What benefits vs. Computers ?

Page 40: Reiner Hartenstein University of Kaiserslautern

Programming Language Paradigms

language category: Computer Languages vs. Xputer Languages

both: deterministic procedural sequencing: traceable, checkpointable

operation sequence driven by
  Computer Languages: read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting, no parallel loops, escapes, instruction stream branching
  Xputer Languages: read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching

state register
  Computer Languages: program counter
  Xputer Languages: data counter(s)

address computation
  Computer Languages: massive memory cycle overhead
  Xputer Languages: overhead avoided

instruction fetch
  Computer Languages: memory cycle overhead
  Xputer Languages: overhead avoided

parallel memory bank access
  Computer Languages: interleaving only
  Xputer Languages: no restrictions

easy to learn

Page 41: Reiner Hartenstein University of Kaiserslautern

Similar Programming Language Paradigms

language category: Computer Languages vs. Xputer Languages

both: deterministic procedural sequencing: traceable, checkpointable

sequencing driven by
  Computer Languages: read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching
  Xputer Languages: read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching

very easy to learn
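
The contrast between the two sequencing styles can be illustrated with a toy model in Python. This is only a sketch: the four-instruction loop body, the function names, and the cycle counts are assumptions made for illustration, not measurements and not part of the original slides. It shows the same summation driven once by a program counter and once by a data counter.

```python
# Hypothetical toy model: count memory cycles needed to sum a block of memory.

def instruction_driven_sum(memory, start, length):
    """von Neumann style: a program counter steps through instructions."""
    program = ["load", "add", "inc", "branch"]  # assumed loop body
    cycles = 0
    acc = 0
    addr = start
    for _ in range(length):
        for instr in program:
            cycles += 1                 # instruction fetch
            if instr == "load":
                operand = memory[addr]
                cycles += 1             # data access
            elif instr == "add":
                acc += operand
            elif instr == "inc":
                addr += 1               # address computed by the CPU itself
            # "branch" only redirects the program counter
    return acc, cycles


def data_driven_sum(memory, start, length):
    """Xputer style: a data counter steps through data addresses."""
    cycles = 0
    acc = 0                             # accumulator in the configured datapath
    data_counter = start
    for _ in range(length):
        acc += memory[data_counter]
        cycles += 1                     # data access only, no instruction fetch
        data_counter += 1               # address produced by the sequencer
    return acc, cycles


memory = list(range(10))
print(instruction_driven_sum(memory, 0, 10))
print(data_driven_sum(memory, 0, 10))
```

In this model both functions return the same sum, and the difference in the cycle count comes entirely from the instruction fetches, which is the overhead the table lists as avoided.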

Page 42: Reiner Hartenstein University of Kaiserslautern

JPEG zigzag scan pattern

    *> Declarations

    EastScan is
      step by [1,0]
    end EastScan;

    SouthScan is
      step by [0,1]
    end SouthScan;

    NorthEastScan is
      loop 8 times until [*,1]
        step by [1,-1]
      endloop
    end NorthEastScan;

    SouthWestScan is
      loop 8 times until [1,*]
        step by [-1,1]
      endloop
    end SouthWestScan;

    HalfZigZag is
      EastScan
      loop 3 times
        SouthWestScan
        SouthScan
        NorthEastScan
        EastScan
      endloop
    end HalfZigZag;

    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan
    uturn (HalfZigZag)

[Figure: zigzag scan path over the x/y pixel map, showing the two HalfZigZag segments, the data counter positions, and the steps numbered 1 to 4]

published in 1993
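
To make the data-procedural reading of the MoPL scan declarations concrete, here is a minimal Python sketch that replays them as step vectors on an 8 by 8 block. The function names are hypothetical, and the treatment of `uturn` (replaying the steps of HalfZigZag in reverse order) is an assumption about its semantics, chosen because it completes the usual zigzag order.

```python
# Step vectors are (dx, dy); addresses are 1-based (x, y) as on the slide.

def run(pos, step, max_steps, stop):
    """Apply `step` up to `max_steps` times; stop early once stop(pos) holds."""
    taken = []
    for _ in range(max_steps):
        pos = (pos[0] + step[0], pos[1] + step[1])
        taken.append(step)
        if stop(pos):
            break
    return taken, pos

def east_scan(pos):                   # step by [1,0]
    return run(pos, (1, 0), 1, lambda p: True)

def south_scan(pos):                  # step by [0,1]
    return run(pos, (0, 1), 1, lambda p: True)

def north_east_scan(pos):             # loop 8 times until [*,1]
    return run(pos, (1, -1), 8, lambda p: p[1] == 1)

def south_west_scan(pos):             # loop 8 times until [1,*]
    return run(pos, (-1, 1), 8, lambda p: p[0] == 1)

def half_zigzag(pos):
    steps, pos = east_scan(pos)
    for _ in range(3):
        for scan in (south_west_scan, south_scan, north_east_scan, east_scan):
            taken, pos = scan(pos)
            steps = steps + taken
    return steps, pos

def zigzag_addresses():
    start = (1, 1)                    # goto PixMap[1,1]
    first_half, pos = half_zigzag(start)
    diagonal, _ = south_west_scan(pos)
    # Assumption: uturn(HalfZigZag) replays the same steps in reverse order.
    steps = first_half + diagonal + first_half[::-1]
    addresses = [start]
    pos = start
    for dx, dy in steps:
        pos = (pos[0] + dx, pos[1] + dy)
        addresses.append(pos)
    return addresses

print(zigzag_addresses()[:6])
```

Under these assumptions the data counter visits each of the 64 addresses exactly once, starting at [1,1] and ending at [8,8]; no instruction stream is involved, only a sequence of address steps.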

Page 43: Reiner Hartenstein University of Kaiserslautern

>> Anti architecture and circuitry

• Configware Industry

• Terminology

• MoPL data-procedural language

• Anti architecture and circuitry

• Stream-based Memory Architecture

http://www.uni-kl.de

Page 44: Reiner Hartenstein University of Kaiserslautern

GAU (generic address unit) scheme

GAG = Generic Address Generator

[Figure: GAU consisting of a Base Slider (B0, B), a Limit Slider (L0, L), and an Address Stepper (A), with the address A stepping between base B and limit L]

all 3 are copies of the same BSU stepper circuit

published in 1990

Page 45: Reiner Hartenstein University of Kaiserslautern

GAG: Address Stepper

GAG = Generic Address Generator

BSU = Basic Stepper Unit

[Figure: address stepper circuit. Labels: address A, + / –, step vector, Base B, Limit L, limit, step counter, max step count, = 0, escape clause, end detect, endExec, init, tag, stepper sequencing]

published in 1990

Page 46: Reiner Hartenstein University of Kaiserslautern

Generic Sequence Examples

[Figure: GAU (Limit Slider, Base Slider, Address Stepper) with example scan sequences, panels a) to g)]

• atomic scan
• linear scan
• video scan
• -90º rotated video scan
• -45º rotated (mirx (v scan))
• sheared video scan
• non-rectangular video scan
• zigzag video scan
• spiral scan
• perfect shuffle
• feed-back-driven scans

published in 1990
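
How a base slider, a limit slider, and an address stepper can cooperate to produce such scans may be sketched in Python as follows. This is a simplified software model under stated assumptions (positive step widths only, one dimension per unit, hypothetical function names), not the published circuit.

```python
def gau_sweeps(base, limit, step, base_shift=0, limit_shift=0, sweeps=1):
    """Model of one generic address unit.

    The address stepper runs from `base` to `limit` in increments of `step`.
    After each sweep the base slider and the limit slider move by
    `base_shift` and `limit_shift`. Only positive steps are handled here.
    """
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    result = []
    for _ in range(sweeps):
        result.append(list(range(base, limit + 1, step)))
        base += base_shift
        limit += limit_shift
    return result

def video_scan(width, height):
    """Plain video scan: one x sweep per y value, sliders at rest."""
    xs = gau_sweeps(0, width - 1, 1)[0]
    ys = gau_sweeps(0, height - 1, 1)[0]
    return [(x, y) for y in ys for x in xs]

def sheared_scan(width, height, shear=1):
    """Sheared video scan: base and limit slide by `shear` after each line."""
    rows = gau_sweeps(0, width - 1, 1,
                      base_shift=shear, limit_shift=shear, sweeps=height)
    return [(x, y) for y, row in enumerate(rows) for x in row]

print(gau_sweeps(0, 3, 1, base_shift=1, limit_shift=1, sweeps=3))
print(video_scan(3, 2))
print(sheared_scan(2, 2))
```

The point of the sketch is that the different scan shapes differ only in a handful of parameters (start values, step width, slider shifts), which is what makes a generic, parameterized address generator plausible.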

Page 47: Reiner Hartenstein University of Kaiserslautern

Slider Animation Demo

[Figure: animation in the x/y plane with floor F, ceiling C, base B (start B0), limit L (start L0), and address A]

published in 1990

Page 48: Reiner Hartenstein University of Kaiserslautern

GAG Complex Sequencer Implementation

GAG = Generic Address Generator

[Figure: GAG built from several GAUs, each with Limit Slider, Base Slider, and Address Stepper; further labels: SDS, VLIW stack]

all published in 1990

Page 49: Reiner Hartenstein University of Kaiserslautern

>> Stream-based Memory Architecture

• Configware Industry

• Terminology

• MoPL data-procedural language

• Anti architecture and circuitry

• Stream-based Memory Architecture

http://www.uni-kl.de

Page 50: Reiner Hartenstein University of Kaiserslautern

MoM Xputer Architecture

rDPA

Multiple RAM banks

Smart memory interface

Scan Window „Cache“

published

in 1990

Page 51: Reiner Hartenstein University of Kaiserslautern

Antimachine: MoM architecture

[Figure: scan window moving over the x/y memory space. Labels: handle positions, scan window, scan pattern (high level sequencing), intra scan window accesses (low level sequencing), example]

[Figure: Handle Position Generator (y-GAG, x-GAG) delivers the handle position to the Scan Window Generator, which issues memory accesses to banks 0, 1, ..., n]
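
The two sequencing levels named on this slide can be sketched in Python: a high-level scan pattern produces handle positions, and a low-level pattern produces the accesses inside the scan window. The function names, the video scan as scan pattern, and the use of a separate output array are assumptions for illustration; the linear filter is the application named on the slides that follow.

```python
def video_scan_handles(width, height, win):
    """High level sequencing: handle positions of a win-by-win scan window."""
    return [(x, y)
            for y in range(height - win + 1)
            for x in range(width - win + 1)]

def window_offsets(win):
    """Low level sequencing: access offsets inside the scan window."""
    return [(dx, dy) for dy in range(win) for dx in range(win)]

def linear_filter(image, weights):
    """Apply a square weight mask at every handle position of the scan."""
    win = len(weights)
    height, width = len(image), len(image[0])
    out = [[0] * (width - win + 1) for _ in range(height - win + 1)]
    for hx, hy in video_scan_handles(width, height, win):
        acc = 0
        for dx, dy in window_offsets(win):
            acc += weights[dy][dx] * image[hy + dy][hx + dx]   # read access
        out[hy][hx] = acc                                      # write access
    return out

ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
image = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
print(linear_filter(image, ones))
```

Note that the loop body contains no address arithmetic of its own: all addresses come from the two generators, which is the role the slide assigns to the Handle Position Generator and the Scan Window Generator.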

Page 52: Reiner Hartenstein University of Kaiserslautern

Linear Filter Application

[Figure: scan window of the linear filter at successive scan steps; window cells are marked r (read), w (write), or r/w, and are assigned to memory banks a and b]

Page 53: Reiner Hartenstein University of Kaiserslautern

Scanline unrolling

[Figure: scan windows of adjacent scan lines merged by unrolling; cells marked r (read) or r/w (read/write)]

Page 54: Reiner Hartenstein University of Kaiserslautern

90° Rotation of Scan Pattern

[Figure: scan windows before and after rotating the scan pattern by 90°; cells marked r, w, or r/w and assigned to memory banks a and b; the scan window overlap area is highlighted]

Page 55: Reiner Hartenstein University of Kaiserslautern

Linear Filter Application

Parallelized Merged Buffer Linear Filter Application with example image of x=22 by y=11 pixel

[Figure: design stages: initial design; after scan line unrolling; after inner scan line loop unrolling; hardware level access optimization; final design]

Page 56: Reiner Hartenstein University of Kaiserslautern

XMDS Scan Pattern Editor GUI

Page 57: Reiner Hartenstein University of Kaiserslautern

MoM Architecture Features

• Scan Cache Size adjustable at run time

• Any other shape than square supported

• 2-dimensional memory space

• Supports generic „scan patterns“

– Subject of parallel access transformations

– compare Francky Catthoor et al.

• Supports visualization

Page 58: Reiner Hartenstein University of Kaiserslautern

Hot Research Topic: Memory Architectures

• High Performance Embedded Memory Architectures [Catthoor et al.]

• High Performance Memory Communication Architectures [Herz]

• Custom Memory Management Methodology [Catthoor et al.]

• Data Reuse Transformations [Kougia et al.]

• Data Reuse Exploration [Soudris, Wuytack]

• Rapidly growing market: IP cores, module generators etc.

Page 59: Reiner Hartenstein University of Kaiserslautern

Processor Memory Performance Gap

[Figure: performance (log scale, 1 to 1000) over time (1980 to 2000). µProc (CPU): 60% / yr. DRAM: 7% / yr. Processor-memory performance gap: grows 50% / year]

von Neumann bottleneck
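
The 50% figure follows from the two growth rates on the chart. A short Python check, with variable names chosen here for illustration:

```python
# Annual growth factors read from the chart.
cpu_growth = 1.60    # processor performance: +60% per year
dram_growth = 1.07   # DRAM performance: +7% per year

# The gap is the ratio of the two curves, so its growth factor is the quotient.
gap_growth = cpu_growth / dram_growth

def gap_after(years):
    """Relative processor/DRAM gap after `years`, starting from a ratio of 1."""
    return gap_growth ** years

print(round(gap_growth, 3))
print(round(gap_after(20)))
```

The quotient is just under 1.5, i.e. the gap widens by roughly 50% per year; compounded over the two decades shown on the time axis, this amounts to a factor of roughly three thousand.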

Page 60: Reiner Hartenstein University of Kaiserslautern

rDPAs: classical cache does not help

• the memory bandwidth problem is often more dramatic than for microprocessors

• classical interleaving is not practicable, since it is based on sequential instruction streams

• classical caches do not help, since instruction sequencing is not used

• the problem: throughput of parallel data streams, not instruction streams

• super pipe networks, no parallel computers!

• Stream-based arrays are a memory bandwidth problem

however, the anti machine has no vN bottleneck!

Page 61: Reiner Hartenstein University of Kaiserslautern

Data-Stream-based Soft Anti Machine

[Figure: block diagram. Labels: Compiler; Scheduler; Memory (data memory) made of several memory banks; Sequencers (data stream generator); rDPA; “instructions”]

Page 62: Reiner Hartenstein University of Kaiserslautern

The Disk Farm? or a System On a Card?

The 500GB disc card (14")
• LOTS of bandwidth
• A few disks replaced by >10s Gbytes RAM and a processor

MicroDrive: 1.7” x 1.4” x 0.2”
• 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
• 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)

Integrated IRAM processor
• 2x height
• Connected via crossbar switch
• growing like Moore’s law
• 16 Mbytes; 1.6 Gflops; 6.4 Gops
• 10,000+ nodes in one rack!
• 100/board = 1 TB; 0.16 Tflops

[Gordon Bell, Jim Gray, ISCA2000]
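
The extrapolated figures on this slide can be checked with a few lines of Python. The sketch assumes seven years of compounding (1999 to 2006) and reads "100/board" as 100 nodes per board; both readings are assumptions, as are the variable names.

```python
def extrapolate(value, factor_per_year, years):
    """Compound growth: value * factor_per_year ** years."""
    return value * factor_per_year ** years

years = 2006 - 1999

capacity_2006_mb = extrapolate(340, 1.6, years)   # MicroDrive capacity in MB
bandwidth_2006 = extrapolate(5, 1.4, years)       # MicroDrive bandwidth in MB/s

nodes_per_board = 100                              # assumed reading of "100/board"
board_tflops = nodes_per_board * 1.6 / 1000        # 1.6 Gflops per node
board_capacity_tb = nodes_per_board * capacity_2006_mb / 1_000_000

print(round(capacity_2006_mb), round(bandwidth_2006, 1))
print(board_tflops, round(board_capacity_tb, 2))
```

With these assumptions the projected capacity comes out slightly above 9 GB and the bandwidth slightly above 50 MB/s, matching the slide, and 100 such nodes give about 0.9 TB and 0.16 Tflops per board, consistent with the rounded "1 TB; 0.16 Tflops".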

Page 63: Reiner Hartenstein University of Kaiserslautern

MoM Application Examples

• Image Processing
• Grid-based design rule check [1983*]
  – 4 by 4 word scan cache
  – Pattern-matching based
  – Our own nMOS „DPLA“ design
  – design rule violation pixel map automatically generated from textual design rules
  – 256 M&C nMOS, 800 single metal CMOS
  – Speed-up > 10000 vs. Motorola 68000

*) „machine“ not yet discovered

Page 64: Reiner Hartenstein University of Kaiserslautern

Schedule

time slot
08.30 – 10.00  Reconfigurable Computing (RC)
10.00 – 10.30  coffee break
10.30 – 12.00  Stream-based Computing for RC
12.00 – 14.00  lunch break
14.00 – 15.30  Resources for RC
15.30 – 16.00  coffee break
16.00 – 17.30  FPGAs: recent developments

Page 65: Reiner Hartenstein University of Kaiserslautern

>>> Coarse Grain

- END -


Page 67: Reiner Hartenstein University of Kaiserslautern

Synthesizable Memory Communication

http://kressarray.de

Efficient memory communication should be directly supported by the mapper tools.

Optimized Parallel Memory Controller

[Figure: an example by Nageldinger’s KressArray Xplorer. Legend: sequencers, memory ports, application, not used]

Page 68: Reiner Hartenstein University of Kaiserslautern

Memory Communication Architecture

• hot research topic in embedded systems

• storage context transformations [Herz, others]

• for low power

• for high performance

• startups provide memory IP or generators