GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.


Page 1: Introducing GenTera’s IMAGINE 3 (Hans de Vries)

Page 2: IMAGINE 3 Building Blocks

[Block diagram. Blocks shown:]
- PCI/AGP Bus interface
- 128 bit DDR-SDRAM Bus
- Imagine 3 Core Processor: Multi-Stream (32) Scalar / Vector Processor, 80 Billion operations / second
- Advanced High Quality 3D Graphics / Volume processing Pipelines: 220 Billion operations / second
- Graphics Mask Generator
- Motion Estimator: 100 Billion op/s
- Data (Video) Input
- Data (Video) Output
- Data flow Ring Input
- Data flow Ring Output

Bandwidth labels on the diagram (the extracted text does not show which link each one belongs to): 2.0 Gigabyte/s, 2.0 Gigabyte/s, 160 Megabyte/s, 1.0 Gigabyte/s, 4.2 Gigabyte/s, 0.5 Gigabyte/s

Page 3: IMAGINE 3 Core Processor

HISC™ processor architecture:
- 120 General Purpose registers (2x32 bit)
- 256 Vector registers (2x32 bit)
- 256x4 MAC Vector registers (2x32 bit)
- 128 Special Purpose control registers (2x32 bit)
- 1200 control table registers (2x32 bit)

80 Billion operations per second (320 operations per cycle), including:
- 64 Multiply Accumulates per cycle with saturate
- 40 Conditional operations per cycle
- 24 internal addresses per cycle
- 32 simultaneous concatenated vector streams (32 bit) (128 in byte mode)
- Single cycle 2D and 3D addressing modes (1D, 2D and 3D memory management)

10 Giga Byte per second streaming I/O (memory & processor I/O)

Software tools: C and C++ compiler; Assembler, Linker, Debugger; Visual Simulator; Soft In-circuit Emulator
Libraries: Image Processing Library; 3D graphics Library; Multi Media Library; Machine Vision Library
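The headline rates can be cross-checked from the per-cycle counts. The sketch below is a minimal check, assuming the core shares the 4 ns cycle that the 3D texture/volume slide quotes for the pipelines (the slides do not state the core clock separately). The helper name `ops_per_second` is illustrative and not part of any GenTera tool.

```python
CYCLE_NS = 4  # assumption: the 4 ns cycle quoted on the 3D texture/volume slide


def ops_per_second(ops_per_cycle, cycle_ns=CYCLE_NS):
    """Operations per second for a unit that completes ops_per_cycle every cycle."""
    return ops_per_cycle * 1_000_000_000 // cycle_ns


core = ops_per_second(320)           # core processor: 320 operations per cycle
pipelines = ops_per_second(2 * 440)  # 3D pipelines: 2 x 440 operations per cycle
macs = ops_per_second(64)            # 64 multiply-accumulates per cycle

print(core, pipelines, macs)
```

With these inputs the core comes to 80 billion and the pipelines to 220 billion operations per second, which matches the slide figures; the 64 multiply-accumulates per cycle correspond to 16 billion per second.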

Page 4: IMAGINE 3 HISC Processor Architecture

HISC: Hierarchical Instruction Set Computer

RISC LEVEL: Provides C and C++ compatibility.

VLIW LEVEL: A moderate length VLIW instruction word plus a fully programmable bus interconnect directly controlled by the instruction code.

EXTENDED VECTOR PROCESSING: Numerous function-specific Control Registers add extended functionality that is activated by the group of extended operations (as opposed to the basic operations). This increases the effective instruction word for vector operations to 1000+ bits.

VARIABLE LENGTH VECTOR PROCESSING: Enables up to 32 simultaneous and concatenated Vector Processing Streams. Word based Vector Processing (32, 2x16, 4x8) is symmetrically applied throughout the entire architecture.

Page 5: IMAGINE 3 Core Processor

Examples of Basic Processor Stream performance (from external memory to external memory)

Standard GUI functions:
- Screen to Screen Copy: 2000 Mega pixels/s (8 bit pixels); 500 Mega pixels/s (32 bit pixels)
- 3 operand ROPS: 1000 Mega pixels/s (8 bit pixels)
- Bitmap to Color expansion: 2000 Mega pixels/s (8 bit pixels)

Windows Direct Draw GUI functions:
- Pseudo to True Color: 500 Mega pixels/s (8 bit pseudo to 16 bit or 32 bit colors)
- True Color to Pseudo: 500 Mega pixels/s (32, 16 bit color to 8 bit pseudo color)
- Z buffer aware copy: 666 Mega pixels/s (8 bit pixels, 16 bit Z buffer); 500 Mega pixels/s (16 bit pixels, 16 bit Z buffer)
- Alpha Blended Copy: 250 Mega pixels/s (32 bit ARGB pixels)

Page 6: IMAGINE 3 Core Processor

Examples of Core Processor stream performance (2) (from external memory to external memory)

Multi Media Functions (numbers in result pixels/s):
- YUV to RGB conversion: 500 Mega pixels/s (32 bit color, 16 bit hi-color, 8 bit pseudo)
- DCT and IDCT (8x8 blocks): 167 Mega pixels/s (16 bit values, 32 bit calculations)
- DCT and IDCT (8x8 blocks): 667 Mega pixels/s (8 bit values, 16 bit calculations)

Photoshop type Image Processing Functions (numbers in result pixels/s):
- 3x3 kernel convolution: 2000 Mega pixels/s (8 bit pixels, 16 bit calculations)
- 7x7 kernel convolution: 500 Mega pixels/s (8 bit pixels, 16 bit calculations)
- Bi-cubic Rotation: 1000 Mega pixels/s (8 bit pixels, 16 bit calculations)
- Bi-cubic Scaling: 1000 Mega pixels/s (8 bit pixels, 16 bit calculations)

3D graphics Geometry: (4x4) homogeneous transformations plus perspective divides for X, Y and Z for meshed triangles in 32 bit floating point (IEEE): 50 Million triangles/s
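The geometry step named above is the standard 4x4 homogeneous transform followed by a divide by w. The following is a minimal sketch of that arithmetic for one vertex; it uses ordinary Python floats (64 bit) rather than the 32 bit IEEE format the slide mentions, and the function name and row-major matrix layout are assumptions made for illustration.

```python
def transform_vertex(matrix, vertex):
    """Apply a 4x4 row-major matrix to (x, y, z, 1), then divide X, Y, Z by w."""
    x, y, z = vertex
    hx, hy, hz, hw = (
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix
    )
    if hw == 0:
        raise ZeroDivisionError("vertex lies on the w = 0 plane")
    return (hx / hw, hy / hw, hz / hw)


# Hypothetical projection matrix that copies z into w.
PROJECTION = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]

print(transform_vertex(PROJECTION, (2.0, 4.0, 2.0)))
```

In a triangle mesh each vertex is shared by several triangles, so roughly one such transform per triangle is needed once the mesh is long enough.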

Page 7: IMAGINE 3 Core Processor

[Block diagram: data path of the core processor]
- Data Read Ports: A0 (REG A0, VIO 0), A1 (REG A1, VIO 1), B0 (REG B0, DIO 0), B1 (REG B1, DIO 1)
- Data Processing Units: X0 (MAC X0, ALU X0), X1 (MAC X1, ALU X1), Y0 (MAC Y0, ALU Y0), Y1 (MAC Y1, ALU Y1)
- Data Write Ports: REG WR0, REG WR1, DIO WR, VIO WR
- Interconnect (100 % connectivity)

Page 8: IMAGINE 3 Core Processor

[Block diagram: bus interconnect of the core processor. Labels recovered from the slide:]
- SEQ
- Control Register Busses: Control reg bus 1 bits [63:32], Control reg bus 0 bits [31:0]
- REG (A0B0), REG (A1B1)
- ALU X0, ALU Y0, ALU X1, ALU Y1
- MAC X0, MAC Y0, MAC X1, MAC Y1
- DIO, VIO 0, VIO 1
- I3D0, I3D1, MES0, MES1, RING0, RING1
- MSK0, MSK1, VAU 0, VAU 1
- MTAB, EMI
- Port labels: A0/1, A1/0, B0, B1, B0/1, B1/0

Page 9: IMAGINE 3 Instruction Word

Highly orthogonal VLIW instruction word

Upper half: Dd, Wr0, B0, A0, Y0, X0 (bit positions marked: 127, 123, 112, 100, 88, 76, 64)
Lower half: Da, Wr1, B1, A1, Y1, X1 (bit positions marked: 63, 59, 48, 36, 24, 12, 0)

ND0 = 0: Data Processing Functions
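To make the field layout concrete, here is a minimal sketch that cuts a 128 bit instruction word into its twelve fields. It assumes the bit positions marked on the slide bound six fields per 64 bit half: five 12 bit fields (X, Y, A, B, Wr) and a 4 bit D field in the top bits. That reading is inferred from the numbers on the slide, and the function names are illustrative.

```python
# Assumed layout of one 64 bit half: (field, lowest bit, width in bits).
HALF_LAYOUT = (
    ("X", 0, 12),
    ("Y", 12, 12),
    ("A", 24, 12),
    ("B", 36, 12),
    ("Wr", 48, 12),
    ("D", 60, 4),
)


def split_half(half, suffix, d_name):
    """Cut one 64 bit half into named fields, e.g. X0, Y0, A0, B0, Wr0, Dd."""
    fields = {}
    for name, low, width in HALF_LAYOUT:
        key = d_name if name == "D" else name + suffix
        fields[key] = (half >> low) & ((1 << width) - 1)
    return fields


def split_instruction(word):
    """Split a 128 bit instruction word (a Python int) into its twelve fields."""
    if not 0 <= word < 1 << 128:
        raise ValueError("instruction word must fit in 128 bits")
    fields = split_half(word >> 64, "0", "Dd")                    # bits 127..64
    fields.update(split_half(word & ((1 << 64) - 1), "1", "Da"))  # bits 63..0
    return fields
```

For example, `split_instruction((0x123 << 12) | 0xABC)` yields `Y1 == 0x123` and `X1 == 0xABC` under this assumed layout, with every other field zero.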

Page 10: IMAGINE 3 Interconnect

Instruction Word provides 8-way Interconnectivity in Scalar-Processing Mode

- Data Processing Unit, select path 1: A0, A1, B0, B1, X0, X1, Y0, Y1
- Data Processing Unit, select path 2: A0, A1, B0, B1, X0, X1, Y0, Y1
- Data Write Port, select path: A0, A1, B0, B1, X0, X1, Y0, Y1

Page 11: IMAGINE 3 Interconnect

Instruction Word provides 100% Interconnectivity in Vector Processing Mode

The select paths of the Data Processing Unit (select path 1, select path 2) and of the Data Write Port each choose among the same 16 sources:
A0 REG, A0 MEM, B0 REG, B0 MEM, X0 ALU, X0 MAC, Y0 ALU, Y0 MAC, A1 REG, A1 MEM, B1 REG, B1 MEM, X1 ALU, X1 MAC, Y1 ALU, Y1 MAC

Page 12: IMAGINE 3 Instruction Word

Data processing instruction fields (Y0 and X0; bit positions marked: 24, 20, 16, 12, 8, 4, 0)

Each field has one of three encodings:
- 1: MAC, path 1, path 2
- 0 0: ALU, path 1, path 2
- 0 1: Shift, Ufu, path 1, path 2

Page 13: IMAGINE 3 Instruction Word

Data read ports instruction fields (B0 and A0; bit positions marked: 48, 44, 40, 36, 32, 28, 24)

[Field-encoding diagram for the register port and the memory port of each read field. Labels recovered from the slide, in extraction order: "0 0 0"; "0 1 register size"; "1 0 control register size"; "0 0 Be31 16 bit imm. [15:8]"; "0 0 Be20 16 bit imm. [7:0]"; "0 1 register"; "1 size 11 bit signed immediate"; "0 VIO function size"; "0 0 0 0 DIO read size".]

Page 14: IMAGINE 3 Instruction Word

DIO address / data and (control-) register write ports fields (Wr0 and ND; bit positions marked: 127, 123, 63, 62, 59, 58, 56, 52, 48)

[Field-encoding diagram. Labels recovered from the slide: register port; "0 register path"; "1 control register path"; size; wr addr; rd addr; DIO address; DIO address select; DIO data select; DIO rd/wr; x wr data; x rd addr; Non data-processing function.]

Page 15: IMAGINE 3 Parallel Conditional Processing

64 bit Uniform Status Register

Bit fields [63:56], [55:48], [47:40] and [39:32] come from X1 / Y1; bit fields [31:24], [23:16], [15:8] and [7:0] come from X0 / Y0. Each 8 bit field is the status for one data byte (Status for Byte 0 to Status for Byte 7).

Each status byte holds eight flags: S0 C0 M0 Z0 W0 L0 H0 I0

ALU Status: Overflow, Carry, Minus, Zero (ALU, Shifts, Unary functions)

MAC Status: Wrong, Lower, Higher, Inside
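A per-byte status register like this can be unpacked with shifts and masks. The sketch below is illustrative only and rests on two assumptions that the slide does not confirm: that the status for byte n sits in bits [8n+7:8n], and that the flags are ordered S, C, M, Z, W, L, H, I from bit 7 down to bit 0 of each status byte.

```python
# Assumed flag order within one status byte, from bit 7 down to bit 0.
FLAGS = ("S", "C", "M", "Z", "W", "L", "H", "I")


def byte_status(status64, byte_index):
    """Return the eight flags for one data byte as a dict of booleans."""
    if not 0 <= byte_index < 8:
        raise IndexError("byte_index must be in 0..7")
    field = (status64 >> (8 * byte_index)) & 0xFF  # assumed position of byte n
    return {name: bool((field >> (7 - i)) & 1) for i, name in enumerate(FLAGS)}


def bytes_with_flag(status64, flag):
    """List the byte lanes (0..7) whose status has the given flag set."""
    return [n for n in range(8) if byte_status(status64, n)[flag]]
```

A conditional operation could then be applied only to the lanes returned by, say, `bytes_with_flag(status, "Z")`, which is the idea behind per-byte conditional processing.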

Page 16: IMAGINE 3 Parallel Conditional Processing

Status: Generation, Collection and Application

[Diagram. Labels recovered from the slide:]
- Status sources per byte: Y0 and X0 for bytes 0 to 3; Y1 and X1 for bytes 4 to 7
- Collected status bytes: 0 to 7
- Units with byte lanes 0 to 3: Y0 (ALU, MAC), Y1 (ALU, MAC), X0 (ALU, MAC), X1 (ALU, MAC), V0 (MSK, VAU), V1 (MSK, VAU), A0B0 (VEC.REG.), A1B1 (VEC.REG.)

Page 17: IMAGINE 3 Register File

256 vector registers
- 2 x 32 bit wide, 4 x 16 bit wide, 8 x 8 bit wide
- up to 24 independent and conditional byte addresses
- up to 8 independent and conditional byte write enables

120 general registers
- 2 x 32 bit / 4 x 16 bit / 8 x 8 bit

[Block diagram: ADDRESSES, DATA PORTS, and GENERAL PURPOSE REGISTERS / VECTOR REGISTERS. Labels recovered from the slide:]
- Vector Index generators for Write Port C, Read Port A and Read Port B: 8 x Write Indices, 8 x Read A Indices, 8 x Read B Indices
- General Register Addresses from the Instruction Code: 2 x Write Address, 2 x Read A Address, 2 x Read B Address
- Write Port C Input BUS select; Read Port A output BUS register; Read Port B output BUS register
- Write Data 2,4,8 x; Read A Data 2,4,8 x; Read B Data 2,4,8 x
- INTERNAL BUS MATRIX with ports A0, A1, B0, B1

Page 18: IMAGINE 3 Function Units

ALU
- Arithmetic, Boolean, Shift / Rotate, Unary Functions
- 4 x 8, 2 x 16, 1 x 32, 32 bit float

MULTIPLIER
- (un)signed x (un)signed
- binary point at: end, middle or top
- graphics formats (0.0..1.0 == 00..ff)
- 4 x 8, 2 x 16, 1 x 32, 32 bit float

MAC Vector Registers: 256 words x 64 bit

ACCUMULATOR

Variable Range Clamp
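The graphics format, where 0x00 stands for 0.0 and 0xff for 1.0, means a product has to be rescaled by 255 rather than by 256. A minimal sketch of that multiply and of a variable range clamp follows; the rounding mode and the function names are assumptions, since the slides do not say how the hardware rounds.

```python
def mul_graphics8(a, b):
    """Multiply two 8 bit values where 0x00 means 0.0 and 0xff means 1.0."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must be 8 bit unsigned values")
    return (a * b + 127) // 255  # divide by 255, rounding to nearest


def clamp(value, low, high):
    """Limit value to the programmable range low..high."""
    return max(low, min(high, value))
```

With this definition 1.0 times 1.0 stays 1.0 (`mul_graphics8(0xff, 0xff)` is `0xff`), which a plain shift by 8 bits would not give.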

Page 19: IMAGINE 3 Multiplier / Accumulator

8 bit Matrix functions:
- Quad Inproduct (16 multiplies & 12 adds per MAC)
- Matrixvec (16 multiplies & 12 adds per MAC)

[Two diagrams of four multiplier columns each. Every column has four 8 bit inputs feeding 16 bit products and 16 bit sums.]
- First diagram: 32 bit input data into a 4 tap shift register (4 times for each byte)
- Second diagram: 32 bit input data distributed to all four columns (4 times for 4 bytes)
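The operation count quoted for Matrixvec can be reproduced with a plain 4x4 matrix times vector product. The sketch below counts the multiplies and adds; it models only the arithmetic, not the 8 bit data widths or the hardware data flow, and the function name is illustrative.

```python
def matxvec(matrix, vector):
    """4x4 matrix times 4-vector; also returns the multiply and add counts."""
    multiplies = adds = 0
    result = []
    for row in matrix:
        acc = row[0] * vector[0]
        multiplies += 1
        for coef, x in zip(row[1:], vector[1:]):
            acc += coef * x
            multiplies += 1
            adds += 1
        result.append(acc)
    return result, multiplies, adds


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(matxvec(IDENTITY, [1, 2, 3, 4]))
```

Each product takes 16 multiplies and 12 adds, and four Multiplier/Accumulators doing this every cycle give the 64 multiplies per cycle quoted elsewhere in the deck.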

Page 20: IMAGINE 3 Multiplier / Accumulator

8 bit Matrix functions: Open GL Blend Function (8 multiplies & 4 adds per MAC)

[Two diagrams: 32 bit input data into a 4 tap shift register (4 times for each byte), with four 8 bit inputs feeding 16 bit products and 16 bit sums.]

Coefficients fixed or derived from the input operands:
0 BLEND_CONSTANT
1 BLEND_ZERO
2 BLEND_ONE
3 SRC_COLOR
4 INV_SRC_COLOR
5 SRC_ALPHA
6 INV_SRC_ALPHA
7 DST_ALPHA
8 INV_DST_ALPHA
9 DST_COLOR
10 INV_DST_COLOR
11 SRC_ALPHA_SATURATE
12 BOTH_SRC_ALPHA (source), BOTH_SRC_ALPHA (dest)
13 BOTH_INV_SRC_ALPHA (source), BOTH_INV_SRC_ALPHA (dest)
14 MAX_INTENSITY (source), MAX_INTENSITY (dest)
15 MIN_INTENSITY (source), MIN_INTENSITY (dest)
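The count of 8 multiplies and 4 adds corresponds to the usual blend equation, result = source x source factor + destination x destination factor, evaluated for four channels. The sketch below implements that equation for a subset of the factor names listed on the slide. The (A, R, G, B) tuple order, the rounding, and the function names are assumptions made for illustration; the remaining factors on the slide are left out.

```python
def blend_factor(name, src, dst, channel):
    """Blend factor for one channel, as 0..255 where 255 means 1.0."""
    table = {
        "BLEND_ZERO": 0,
        "BLEND_ONE": 255,
        "SRC_COLOR": src[channel],
        "INV_SRC_COLOR": 255 - src[channel],
        "SRC_ALPHA": src[0],
        "INV_SRC_ALPHA": 255 - src[0],
        "DST_ALPHA": dst[0],
        "INV_DST_ALPHA": 255 - dst[0],
        "DST_COLOR": dst[channel],
        "INV_DST_COLOR": 255 - dst[channel],
    }
    return table[name]


def blend(src, dst, src_factor, dst_factor):
    """Blend two (A, R, G, B) pixels: 2 multiplies and 1 add per channel."""
    out = []
    for ch in range(4):
        total = (src[ch] * blend_factor(src_factor, src, dst, ch)
                 + dst[ch] * blend_factor(dst_factor, src, dst, ch))
        out.append(min(255, (total + 127) // 255))  # rescale and saturate
    return tuple(out)


print(blend((128, 255, 0, 0), (255, 0, 0, 255), "SRC_ALPHA", "INV_SRC_ALPHA"))
```

The usual transparency setting is `SRC_ALPHA` for the source and `INV_SRC_ALPHA` for the destination, as in the call above.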

Page 21: IMAGINE 3 Multiplier / Accumulator

16 bit Matrix functions:
- Convolute (4 multiplies & 2 adds per Multiplier)
- Transform (4 multiplies & 2 adds per Multiplier)

[Two diagrams of two multiplier columns each, with 16 bit inputs feeding 32 bit products and 32 bit sums.]
- First diagram: 32 bit input data into a 2 tap shift register (2 times for each 16 bit word)
- Second diagram: 32 bit input data distributed to both columns (2 times for each 16 bit word)

Mix:
MH [63:32] = Coef 10 [31:0] · Mb [31:16] + Coef 11 [31:0] · Ma [31:16]
ML [31:0] = Coef 00 [31:0] · Mb [15:0] + Coef 01 [31:0] · Ma [15:0]

Merge:
MH [63:32] = Coef 10 [31:0] · Ma [31:16] + Coef 11 [31:0] · Ma [15:0]
ML [31:0] = Coef 00 [31:0] · Mb [31:16] + Coef 01 [31:0] · Mb [15:0]
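The Mix and Merge formulas can be written out directly. The sketch below follows the operand selection given on the slide; it assumes unsigned operands and simple wrap-around to 32 bits, neither of which the slide specifies, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def halves(word32):
    """Split a 32 bit word into (bits [31:16], bits [15:0])."""
    return (word32 >> 16) & 0xFFFF, word32 & 0xFFFF


def mix(ma, mb, c00, c01, c10, c11):
    """Mix: each result combines the same half of Mb and Ma."""
    a_hi, a_lo = halves(ma)
    b_hi, b_lo = halves(mb)
    mh = (c10 * b_hi + c11 * a_hi) & MASK32  # MH [63:32]
    ml = (c00 * b_lo + c01 * a_lo) & MASK32  # ML [31:0]
    return mh, ml


def merge(ma, mb, c00, c01, c10, c11):
    """Merge: each result combines both halves of one operand."""
    a_hi, a_lo = halves(ma)
    b_hi, b_lo = halves(mb)
    mh = (c10 * a_hi + c11 * a_lo) & MASK32  # MH [63:32]
    ml = (c00 * b_hi + c01 * b_lo) & MASK32  # ML [31:0]
    return mh, ml
```

Both functions use four multiplies and two adds, the per-multiplier count quoted for Convolute and Transform.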

Page 22: IMAGINE 3 Multiplier/Accumulator

Each of the 4 Multiplier/Accumulators handles all operations by utilizing the same hardware!

[Diagram: a single Multiplier/Accumulator drawn as nested arrays of multipliers.]
- 8 x 8 extern, 8 x 16 intern
- 16 x 16 bit extern, 16 x 32 bit intern, 32 bit accumulate
- 32 x 32 bit extern, 32 x 32 bit intern, 32 x 32 bit floating point, 64 bit accumulate

Imagine 3 operations per cycle:
64: 8x16 bit: quad in-product (4 comp.)
64: 8x16 bit: 4x4 matrix x vector
32: 8x16 bit: Open GL blending functions
16: 16x16 bit: in-product, cross-product
16: 16x16 bit: complex product
16: 16x32 bit: FIR filter
16: 16x32 bit: in-product, cross-product
16: 16x32 bit: complex product

Page 23: IMAGINE 3 Vector processing

Variable length vector processing made simple.

[Pipeline diagram over cycles 1 to 35. Operations shown: genad(A0), genad(A1), A0=rd4x8(ri), A1=rd4x8(ri), B0=input, B1=rd(RING_Data), X0=mult(A0,B0,nuu), Y0=subsat(X0,A1), X1=mult(Y0,B1,nus), DA=again, D0=word4x8(uI), X0=addsat(X1,D0), Y0=matxvec(X0), Y1=inproduct(X0), X1=addsat(Y0,Y1), outputV1]

ACTUAL ASSEMBLY CODE FOR THE EXAMPLE ABOVE:

repeat, graph (label_1);;;
label_1: genad(A0) => B0=input, A0=rd4x8(ri) => X0=mult(A,V,nuu ) ===> genad(A1) => A1=rd4x8(ri) => Y0=subsat(X0,A1), B1=rd4x8(RING_Data) => X1=mult(Y0,B1,nus) ===> DA=Again ==> D0=word4x8(uI), X0=addsat(X1,D0) => Y0=matxvec(X0), Y1=inproduct(X0) =====> X1=addsat(Y0,Y1) => outputV1;

Page 24: IMAGINE 3 10 Gigabyte Streaming I/O

[Diagram: the IMAGINE 3 Internal Data Processing Core and its streaming connections]
- VECTOR UNITS: Simultaneous input and output to and from memory
- DATA CACHE or 3D GRAPHICS / VOLUME pipelines: INPUT AND OUTPUT
- Dataflow Ring input
- Dataflow Ring output

The Imagine 3 core can stream data from memory or other processors at 10 GByte/sec. (compared to 0.48 GByte/sec. for the Imagine 1).

Page 25: IMAGINE 3 Non-aligned SIMD

SIMD processing made simple with non-aligned memory accesses (no complex, time-consuming shift-mask-merge operations needed).

[Diagram: a 32 bit word of four 8 bit elements read across the boundaries of consecutive 32 bit memory words.]
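For comparison, this is the shift-mask-merge sequence that software needs on a machine that can only load aligned words. It is a minimal sketch of the conventional workaround, not of the Imagine 3 hardware; it assumes little-endian byte order within each 32 bit word, and the function name is illustrative.

```python
def load_unaligned_32(words, byte_offset):
    """Read 32 bits at any byte offset from a list of aligned 32 bit words.

    Byte k of memory is taken to be bits [8*(k % 4) + 7 : 8*(k % 4)] of
    words[k // 4] (little-endian, an assumption of this sketch).
    """
    index, byte_in_word = divmod(byte_offset, 4)
    shift = 8 * byte_in_word
    low = words[index] >> shift                 # shift: drop the bytes before the offset
    if shift == 0:
        return low                              # aligned case, one load is enough
    high = (words[index + 1] << (32 - shift)) & 0xFFFFFFFF  # shift and mask
    return low | high                           # merge the two parts


memory = [0x33221100, 0x77665544]  # bytes 0x00, 0x11, ..., 0x77 in address order
print(hex(load_unaligned_32(memory, 1)))
```

Every unaligned access costs two loads, two shifts, a mask and an OR in this form, which is the overhead the slide says the non-aligned accesses avoid.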

Page 26: IMAGINE 3 Non Aligned Vector Accesses

[Diagram: non-aligned vector accesses for 32 bit words, 2 x 16 bit words, 16 bit words, 4 x 8 bit words, 2 x 8 bit words and 8 bit words.]

2 input and 2 output vectors simultaneously.

Page 27: IMAGINE 3 Memory Vector Accesses

[Block diagram: Vector I/O between the External Memory Interface and the Imagine 3 Processor Core. Blocks shown:]
- Vector Access Units: up to 32 vectors in flight
- 2 kB Vector pre-fetch buffers
- 2.25 kB Vector write buffers
- 2D restructuring Vector pipelines
- Mask Units: 256 pixels / voxels
- data/color input conversion
- data/color output conversion

Page 28: IMAGINE 3 1, 2 and 3D memory management

2D tiles (X by Y), each shown on a 1 M Byte PAGE:
- 1024 x 1024, 8 bit pixel TILE
- 512 x 1024, 16 bit pixel TILE
- 256 x 1024, 32 bit pixel TILE

3D bricks (X by Y by Z):
- 256 x 128 x 128, 8 bit voxel BRICK
- 128 x 128 x 128, 16 bit voxel BRICK
- 64 x 128 x 128, 32 bit voxel BRICK
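The tile and brick dimensions are chosen so that the footprint stays constant as the element size changes. A short check of that arithmetic, using only the dimensions from the slide (the helper name is illustrative):

```python
def size_bytes(dimensions, bits_per_element):
    """Storage needed for a block with the given extents and element size."""
    count = 1
    for extent in dimensions:
        count *= extent
    return count * bits_per_element // 8


TILES = {8: (1024, 1024), 16: (512, 1024), 32: (256, 1024)}
BRICKS = {8: (256, 128, 128), 16: (128, 128, 128), 32: (64, 128, 128)}

tile_sizes = {bits: size_bytes(dims, bits) for bits, dims in TILES.items()}
brick_sizes = {bits: size_bytes(dims, bits) for bits, dims in BRICKS.items()}

print(tile_sizes)
print(brick_sizes)
```

Every tile comes to 1,048,576 bytes, which is one 1 M Byte page, and every brick comes to 4,194,304 bytes: the X extent is halved each time the element size doubles.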

Page 29: IMAGINE 3 3D texture/volume Hardware

Very High Quality

220 Billion operations/sec: 2 x 440 operations per cycle (4 ns)

Texture Quality: Bilinear, Trilinear and Quad interpolation.
Texture Types: 32 bit ARGB, 16 bit (4 types), 8, 4, 2 and 1 bit pseudo color, 16 bit and 32 bit greyscale (signed and unsigned), 2x16 bit complex.
Texture Size: 16,384 x 16,384 max (2D), 2048 x 2048 x 2048 max (3D).
Texture Dimension: 1, 2 and 3 dimensional textures.
Texture Clamping: Clamp and Wrap for all 3 co-ordinates.
Texture Border: 0 or 1 pixel texture borders, Border Color supported.
Texture MIP maps: up to 16 levels, selection made for each individual pixel.

Perspective division for all 9 parameters: S, T, R, Alpha, Red, Green, Blue, Fog, Z
- Perspective Correct Texture Mapping
- Perspective Correct Texture Lighting
- Perspective Correct Linear and Exponential (2 types) Fog
- Perspective Correct Depth Buffering
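Perspective division per parameter is what makes an interpolated value perspective correct: the parameter divided by w and 1/w are interpolated linearly in screen space, and a division per pixel recovers the parameter. The sketch below shows that textbook formulation for one parameter along one edge; it is not taken from the slides, and the function name is illustrative.

```python
def perspective_interpolate(t, a0, w0, a1, w1):
    """Perspective-correct value at screen-space fraction t between two vertices.

    a0, a1 are the parameter values (e.g. S, T, Red) at the two vertices,
    and w0, w1 are their homogeneous w coordinates.
    """
    numerator = (1 - t) * a0 / w0 + t * a1 / w1    # linear in screen space
    denominator = (1 - t) / w0 + t / w1            # interpolated 1/w
    return numerator / denominator                 # the per-pixel division


print(perspective_interpolate(0.5, 0.0, 1.0, 1.0, 3.0))
```

Halfway across the screen between a near vertex (w = 1) and a far vertex (w = 3), the parameter is only a quarter of the way along, not half; with equal w the result reduces to plain linear interpolation.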

Page 30: IMAGINE 3 3D graphics Pipelines

[Block diagram of the 3D graphics pipelines, attached to the D BUS. Blocks recovered from the slide:]
- 3D graphics pipeline control unit
- Perspective MIP map processing pipeline
- Bresenham Edge Start Interpolators (Q, R, S, T, Z-1) (F, A, R, G, B)
- Vector Start Interpolators (Q, R, S, T, Z-1) (F, A, R, G, B)
- Pixel Value Interpolators (Q, R, S, T, Z-1) (F, A, R, G, B)
- Perspective 3D co-ordinate Generator: 5 stages
- Perspective 3D correct Lighting: 5 stages
- Perspective MIP Map Addresses Calculations: 2 stages
- Perspective Interpolation Coefficients
- Perspective Lighting & Fog Coefficients
- Memory Access Input Fifo / Port Select
- External Memory with MIP Map Textures: 4 - 6 stages
- Memory Access Re-order buffers
- Memory Access Internal Delay Line for Interpolation, Lighting & Fog Coefficients: 3 - 17 stages
- Memory Access Data Load unit
- Texel Interp. / Lighting control unit
- Texel Selection / Expansion
- Texel Color Look Up
- Texel Interpolation / Lighting coefficients generator
- Texel Interpolation / Lighting Multiply stage
- Texel Interpolation / Lighting Summation stage

Page 31: IMAGINE 3 3D texture/volume Hardware

3D graphics Pipeline + Core stream performance (from external memory to external memory)

Direct Draw functions (numbers in result pixels/s):
- Bilinear Image Scale: 333 Mega pixels/s (32 bit gray scale or 32 bit color pixels)
- Bilinear Image Rotate: 333 Mega pixels/s (32 bit gray scale or 32 bit color pixels)
- Bilinear Affine Transform: 333 Mega pixels/s (32 bit gray scale or 32 bit color pixels)

MPEG functions (numbers in result pixels/s):
- Bilinear Scaling plus kYUV to αRGB: 333 Mega pixels/s (32 bit αRGB pixels)

3D functions (numbers in result pixels/sec):
- Z-buffered, Perspective Correct, Bilinear Interpolated Texture mapping with perspective correct lighting and exponential fog (Texture size up to 16k x 16k), MIP-Mapping: 300 Mega pixels/sec. (32 bit αRGB pixels, 16 bit hi-color, 8 bit pseudo, 16 bit Z values)

Page 32: IMAGINE 3 Fan Beam Back projection

The 3D Texture/Volume pipelines and the Multiplier / Accumulators in the Imagine 3 can handle eight 16 bit linear interpolated samples per cycle with 32 bit accuracy.

[Diagram labels: Vector Direction, Back Projection Direction]

Page 33: IMAGINE 3 Cone beam reconstruction

The Back projection in cone beam systems requires the inverse perspective mapping from filtered images back to a 3D volume. The Imagine 3 performs this directly with its 3D volume pipelines.

Page 34: IMAGINE 3 De-blur filtering

FIR filter performance (16 bit input, 32 bit calculations):
- 128 Tap: 32 Mega-pixels / second
- 256 Tap: 16 Mega-pixels / second
- 512 Tap: 8 Mega-pixels / second

Filtered Backprojection for Medical Imaging, 324 x 512 to 256 x 256 (324 projections of 512 values, 256 x 256 result image):
- De-blur filtering: 10 ms (256 taps)
- Backprojection: 11 ms
- Reconstruction: 21 ms

Filtered Backprojection for Medical Imaging, 840 x 928 to 512 x 512 (840 projections of 928 values, 512 x 512 result image):
- De-blur filtering: 100 ms (512 taps)
- Backprojection: 108 ms
- Reconstruction: 208 ms
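The de-blur times follow from the FIR rates and the projection sizes. The sketch below is a rough check, assuming "Mega" means 10^6 and that every projection value is filtered exactly once; the names are illustrative.

```python
# Output samples per second for each filter length, from the slide.
FIR_RATE = {128: 32_000_000, 256: 16_000_000, 512: 8_000_000}


def fir_filter_ms(projections, values, taps):
    """Time in milliseconds to filter every value of every projection."""
    samples = projections * values
    return samples * 1000 / FIR_RATE[taps]


small = fir_filter_ms(324, 512, 256)
large = fir_filter_ms(840, 928, 512)
tap_rates = {taps: taps * rate for taps, rate in FIR_RATE.items()}

print(small, large, tap_rates)
```

The two cases come to about 10.4 ms and 97.4 ms, in line with the 10 ms and 100 ms on the slide. All three filter lengths correspond to the same rate of about 4.1 billion tap multiplies per second.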

Page 35: IMAGINE 3 De-blur filtering (FFT)

Complex input Fast Fourier Transform performance (vectorized)

Points         32 bit Floating Point   32 bit Integer   16 bit Integer
256 Point      8 μs                    4 μs             2.0 μs
512 Point      18 μs                   9 μs             4.4 μs
1024 Point     40 μs                   20 μs            10 μs
2048 Point     88 μs                   44 μs            22 μs
4096 Point     192 μs                  96 μs            48 μs
8192 Point     436 μs                  218 μs           109 μs
16384 Point    896 μs                  448 μs           224 μs

Filtered Back-projection for Medical Imaging, 1200 x 960 to 512 x 512 (1200 projections of 960 values, 512 x 512 result image):
- FFT filtering: 106 ms (2048 point FP)
- Back-projection: 157 ms
- Reconstruction: 263 ms
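The FFT filtering time can be related to the table with one multiplication. The sketch below assumes a budget of one 2048 point floating point transform time per projection, which is what the quoted 106 ms works out to; how the forward and inverse transforms are scheduled within that budget is not stated on the slide.

```python
# 32 bit floating point transform times in microseconds, from the table.
FFT_US_FLOAT32 = {256: 8, 512: 18, 1024: 40, 2048: 88,
                  4096: 192, 8192: 436, 16384: 896}


def fft_filter_ms(projections, points):
    """Filtering time in ms at one transform time per projection (assumed)."""
    return projections * FFT_US_FLOAT32[points] / 1000


filtering = fft_filter_ms(1200, 2048)
reconstruction = round(filtering) + 157  # add the back-projection time in ms

print(filtering, reconstruction)
```

This gives 105.6 ms for filtering, which rounds to the 106 ms on the slide, and 263 ms for the whole reconstruction.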

Page 36: IMAGINE 3 Radar Display Processing

Cartesian to Polar conversion with bi-linear interpolation, 32 bit colors: 250 Mega-pixels / second
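A conversion of this kind resamples one grid at positions computed from the other and interpolates between the four nearest source pixels. The sketch below is a plain software version for a single-channel image; the output layout (one row per radius, one column per angle), the angle convention and the function names are assumptions made for illustration.

```python
import math


def bilinear(image, x, y):
    """Sample image[row][col] at real (x, y), clamping to the image edge."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def cartesian_to_polar(image, cx, cy, n_radii, n_angles):
    """Row r, column a holds the image sampled at radius r and angle
    2*pi*a/n_angles around the centre (cx, cy)."""
    out = []
    for r in range(n_radii):
        row = []
        for a in range(n_angles):
            theta = 2 * math.pi * a / n_angles
            row.append(bilinear(image, cx + r * math.cos(theta),
                                cy + r * math.sin(theta)))
        out.append(row)
    return out
```

Each output pixel costs four source reads and three linear interpolations, repeated per color channel for 32 bit color pixels.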

Page 37: IMAGINE 3 Motion Estimators

Motion Estimation Unit for MPEG1…MPEG4 video encoding

100 Billion operations / second
- software controllable
- arbitrary MxN kernel sizes up to 256 by 256
- arbitrary search space sizes up to 4096 by 4096 for HDTV and higher
- allows optimizing algorithms (reduced search space)
- forward and backward prediction
- vector processing co-operation with core for bi-cubic pixel interpolation / rotation

Performance: Compare a 16x16 pixel block with any other 16x16 pixel block (half, quarter, 1/8th, 1/16th pixels with bi-cubic interpolation): 120 Million Block Compares / second
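A block compare scores how well a block of the current frame matches a candidate block of a reference frame. The slides do not name the metric, so the sketch below uses the sum of absolute differences, a common choice, with an exhaustive full-pixel search; the function names and parameters are illustrative.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size=16):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        row_a = frame_a[ay + dy]
        row_b = frame_b[by + dy]
        for dx in range(size):
            total += abs(row_a[ax + dx] - row_b[bx + dx])
    return total


def best_match(cur, cx, cy, ref, search, size=16):
    """Exhaustive search in ref around (cx, cy).

    Returns (cost, motion_x, motion_y) of the best matching block.
    """
    best = None
    for by in range(max(0, cy - search), min(len(ref) - size, cy + search) + 1):
        for bx in range(max(0, cx - search),
                        min(len(ref[0]) - size, cx + search) + 1):
            cost = sad(cur, cx, cy, ref, bx, by, size)
            if best is None or cost < best[0]:
                best = (cost, bx - cx, by - cy)
    return best
```

One 16x16 compare touches 256 pixel pairs, so 120 million compares per second is about 30.7 billion pixel differences per second.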

Page 38: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 Graphics Mask Generators

Generates Transparent and Opaque Masks for 512 pixels; multiple units work in parallel:

Window Mask Generator: automatically clips pixels outside the View Port (scissoring)

Span line Mask Generator for Concave Polygons and arbitrary Objects

Range Mask Generator for Depth Buffer Tests, Stencil Buffer Tests, Alpha Test, Chroma Keying Tests et cetera

Complex Mask Generator for Concave and Complex Polygons according to the odd/even or winding rules

Alpha Mask Generator: for objects with partially covered pixels
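The odd/even and winding rules mentioned for the Complex Mask Generator can be illustrated in software. The sketch below builds one scanline of a coverage mask from a polygon outline; it is a generic textbook formulation, not the hardware algorithm, and the function name, the pixel-centre sampling and the list-of-bits mask format are assumptions.

```python
def scanline_mask(polygon, y, width, rule="evenodd"):
    """Return 0/1 coverage bits for pixel centres (x + 0.5, y + 0.5).

    `polygon` is a list of (x, y) vertices; `rule` is "evenodd" or "winding".
    """
    sy = y + 0.5
    crossings = []  # (x position, direction) of each edge crossing the scanline
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        if (y0 <= sy < y1) or (y1 <= sy < y0):   # horizontal edges never match
            t = (sy - y0) / (y1 - y0)
            crossings.append((x0 + t * (x1 - x0), 1 if y1 > y0 else -1))
    mask = []
    for x in range(width):
        sx = x + 0.5
        left = [d for cx, d in crossings if cx <= sx]  # crossings left of the pixel
        if rule == "evenodd":
            inside = len(left) % 2 == 1      # odd number of crossings
        else:
            inside = sum(left) != 0          # non-zero winding number
        mask.append(1 if inside else 0)
    return mask
```

For a simple polygon both rules give the same mask. They differ only where the outline overlaps itself: an area enclosed twice in the same direction is outside under odd/even but inside under the winding rule.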

Page 39: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 Graphics Mask Generators

[Figure: Graphics Mask Generator registers, shown with an overlapping triangle]

Registers shown: Spanline Address; Window X min / max; Window Y min / max; Spanline 0 to 3 Start / End; Spanline Delta Start; Spanline Delta End; Spanline Y min / max; Spanline Length (-1); Range mask 0 to 3; Complex mask 0 to 3

The Range Mask contains the result of the Depth buffer test (overlapping triangle)

The Complex Mask is used in this example to hold the Polygon Stipple pattern

The Spanline registers define the outlines of the triangle

The Window is defined by the Window registers
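One way to read the example above is that a pixel is written only when every mask agrees. The sketch below models that reading with integer bit masks; combining the four masks with a bitwise AND is an assumption (the slide does not state the combining logic), and the names and the 16 pixel width are illustrative.

```python
WIDTH = 16  # the deck quotes 512 pixels per mask; 16 keeps the example readable


def span_bits(start, end, width=WIDTH):
    """Bit mask with pixels start..end (inclusive) set; bit i stands for pixel i."""
    start, end = max(start, 0), min(end, width - 1)
    if end < start:
        return 0
    return ((1 << (end - start + 1)) - 1) << start


def write_mask(window, spanline, range_mask, complex_mask):
    """ASSUMED combination: window AND spanline AND range AND complex."""
    return span_bits(*window) & span_bits(*spanline) & range_mask & complex_mask


# Window covers pixels 4..11, the triangle's span covers 2..9,
# the depth test fails on pixels 6 and 7, and the stipple keeps even pixels.
depth_pass = 0xFFFF & ~(0b11 << 6)
stipple = 0x5555
result = write_mask((4, 11), (2, 9), depth_pass, stipple)
print([i for i in range(WIDTH) if result >> i & 1])
```

In this example only pixels 4 and 8 survive all four tests.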

Page 40: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 Multi media I/O units

Video Output
(Α), R, G, B outputs with 330 MHz dot clock for 1800 x 1400 screen format at 90 Hz.
12 (16) bit video out for Studio Quality video processing.
Interface to DVI-TFT transmitters for high resolution, high quality LCD displays.

Video Input
CCIR 656: 8 bit digital video input for NTSC, PAL, SECAM, HDTV and custom formats

Audio Codec 97 Interface
Standard from Intel, Creative Labs, Yamaha, Analog Devices and Nat. Semiconductor
Supports Analog speakers, Microphone, Headphone + Headphone micro, Telephony and Modem signals, CD analog audio in, Analog Video Sound In, PC beep in, et cetera

Digital Audio
4 stereo serial I/O ports (I2S type and S type emulation capabilities)
Supports CD, DVD and Dolby AC3 input or output

External Device Control
8 bit classic μP interface bus and I2C type emulation capability

MIDI interface (Input and output for synthesizers and keyboards)
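The 330 MHz dot clock quoted for the video output can be compared with the active pixel rate of the 1800 x 1400 format at 90 Hz. The short calculation below uses only those slide figures; treating the difference as room for blanking intervals is an assumption, and the function name is illustrative.

```python
DOT_CLOCK_HZ = 330_000_000  # from the slide


def active_pixel_rate(width: int, height: int, refresh_hz: int) -> int:
    """Visible pixels per second, ignoring horizontal and vertical blanking."""
    return width * height * refresh_hz


rate = active_pixel_rate(1800, 1400, 90)
headroom = DOT_CLOCK_HZ / rate
print(f"active rate {rate / 1e6:.1f} Mpixel/s, dot clock / active rate = {headroom:.2f}")
```

The visible pixels need 226.8 Mpixel/s, so the dot clock is about 1.46 times the active rate.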

Page 41: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 Real Time Support

MULTI MEDIA REAL TIME SUPPORT

Level 1 Events (1 microsecond response time requirement)
Horizontal Sync interrupts, Video I/O interrupts, Register Virtualization interrupts.

Level 2 Events (2 - 100 microsecond response time requirement)
Communication Fifo interrupts, Mailbox Interrupts, I2S Fifo Interrupts, AC97 Fifo Interrupts, MIDI Interrupt, I2C interrupt, Vertical Sync Interrupts, Scheduler Clock Tick, et cetera

Threads (100 microsecond - 10 millisecond response time requirement)
Host Command Queues Manager
Audio Stream managers
Modem Stream managers
User definable threads
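The three response-time classes above suggest a strict priority ordering. The sketch below is a toy model of that idea only: the slide does not describe how the hardware or scheduler arbitrates, so the `Dispatcher` class, the level names and the first-come ordering within a level are all assumptions.

```python
import heapq

# Lower number = tighter response time requirement (assumed strict priority).
LEVELS = {"level1": 0, "level2": 1, "thread": 2}


class Dispatcher:
    """Toy dispatcher: always serves the most urgent pending item first."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # keeps arrival order within one level

    def post(self, level: str, name: str) -> None:
        heapq.heappush(self._queue, (LEVELS[level], self._seq, name))
        self._seq += 1

    def run(self) -> list:
        order = []
        while self._queue:
            _, _, name = heapq.heappop(self._queue)
            order.append(name)
        return order


d = Dispatcher()
d.post("thread", "audio stream manager")
d.post("level2", "vertical sync")
d.post("level1", "horizontal sync")
d.post("level2", "mailbox")
print(d.run())
```

The horizontal sync event is served first even though it was posted third, and the thread-level work runs last.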

Page 42: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 High-end Board

8 Processors: 3.2 Tera operations/s, 4 GigaByte memory

[Figure: board with eight IMAGINE 3 processors]

Page 43: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 High-end Board

8 Imagine 3 processors, 3200 Billion operations per second

32 GigaByte per second Memory Bandwidth

16 GigaByte per second Inter-Processor Bandwidth

- Perspective Volume Rendering: 1000 x 1000 x 1000 at 15 frames/second (based on 25% volume traversal)

- Cone Beam Reconstruction: 512 x 512 x 512 from 1000^2 x 128 in 4 seconds

- Real Time 3D ultrasound reconstruction and visualization

- Real Time HDTV MPEG 4 video encoding

- Advanced Radar Processing
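The board-level totals can be cross-checked against the per-processor figures on the Building Blocks slide earlier in the deck. The sketch below assumes the 3200 billion operations per second is the sum of the three per-chip unit figures times eight; that reading, and all names in the code, are assumptions.

```python
# Per-processor figures from the Building Blocks slide (billion operations / second).
CORE_GOPS = 80        # Core Processor
GRAPHICS_GOPS = 220   # 3D Graphics / Volume processing pipelines
MOTION_GOPS = 100     # Motion Estimator
PROCESSORS = 8

per_chip_gops = CORE_GOPS + GRAPHICS_GOPS + MOTION_GOPS
board_gops = PROCESSORS * per_chip_gops   # compare with the quoted 3200
ring_gb_s = PROCESSORS * 2.0              # 2.0 GB/s Dataflow Ring per processor
memory_gb_s = PROCESSORS * 4.2            # 4.2 GB/s memory bus per processor


def volume_samples_per_second(side: int, traversal_fraction: float, fps: int) -> float:
    """Samples visited per second for a side^3 volume at the given traversal."""
    return side ** 3 * traversal_fraction * fps


samples = volume_samples_per_second(1000, 0.25, 15)
print(board_gops, ring_gb_s, round(memory_gb_s, 1), samples)
```

The sums reproduce the quoted 3200 billion operations per second and 16 GigaByte per second inter-processor bandwidth. Eight 4.2 GB/s memory buses give 33.6 GB/s, slightly above the quoted 32, and the deck does not explain the difference. The volume rendering case works out to 3.75 billion samples per second.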

Page 44: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 High Speed Dataflow Ring

[Figure: sixteen IMAGINE 3 processors in two rows of eight]

Up to 2 Gigabyte per second Dataflow Ring (SSTL-2)
Point-to-point with Broadcast options and auto configuration

Page 45: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 High Speed System I/O

The Dataflow Ring also provides very high speed System I/O. Entry level systems can use the programmable Video Data I/O for general purpose I/O (160 MB/s in and 1 GB/s out per processor).

[Figure: two rows of eight IMAGINE 3 processors with optional System I/O FPGAs (e.g. Xilinx Virtex II)]

Video In: 160 MB/s
Video Out: 1 GB/s
Data-flow Input: up to 2.0 GB/s
Data-flow Output: up to 2.0 GB/s

Page 46: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 Pipeline Processing

The Dataflow Ring allows long vector processing pipelines over multiple processors. Here is an example with just 2 processors.

[Figure: two processors chained through the Dataflow Ring. Blocks shown: Vector Read from memory, Bi-linear Interpolated Data from the Graphics pipeline, ALUs, MAC as FIR filter, MAC as 3D blend unit, 256 entry vector register, Vector Write to memory]
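The idea of chaining stages across processors can be sketched with Python generators, where each generator stands in for one stage and the hand-off between them stands in for the Dataflow Ring. This is only an analogy: the stage order, the function names and the choice of which stage runs on which processor are assumptions, not the hardware data path.

```python
def vector_read(memory):
    """Stream values out of memory one element at a time."""
    for value in memory:
        yield value


def fir_filter(stream, taps):
    """MAC used as a FIR filter: y[n] = sum(taps[k] * x[n - k])."""
    history = [0.0] * len(taps)
    for sample in stream:
        history = [sample] + history[:-1]
        yield sum(t * h for t, h in zip(taps, history))


def blend(stream, other, alpha):
    """MAC used as a blend unit: alpha * a + (1 - alpha) * b."""
    for a, b in zip(stream, other):
        yield alpha * a + (1.0 - alpha) * b


def run_pipeline(memory, interpolated, taps, alpha):
    # Processor 1 (assumed): read a vector and filter it.
    on_ring = fir_filter(vector_read(memory), taps)
    # Processor 2 (assumed): blend the ring data with interpolated graphics data.
    return list(blend(on_ring, interpolated, alpha))


print(run_pipeline([1, 2, 3, 4], [0, 0, 0, 0], taps=[0.5, 0.5], alpha=0.5))
```

Because the stages are lazy, each element flows through the whole chain before the next one is read, which mirrors a streaming pipeline with no intermediate write to memory.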

Page 47: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 128 bit memory bus (reads)

4.2 Gigabyte / second Memory Bus: 128 bit PC2100

Blocks shown on the 128 bit read path:
- 16 kbyte 1st level data cache
- 16 kbyte 1st level instruction cache
- Dual 128 word x 128 bit vector input fifos
- Dual 3D-graphics pipelines
- PCI/AGP memory read access
- Video Output 128 word x 128 bit fifo

Page 48: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 128 bit memory bus (writes)

4.2 Gigabyte / second Memory Bus (128 bit PC2100)

Blocks shown on the 128 bit write path:
- 16 kbyte 1st level data cache
- Dual 128 word x 128 bit vector output fifos
- 16 word x 128 bit write buffer
- PCI/AGP memory write access

8-fold address interleaved memory reads and writes. Out of order accesses with coherency checking
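The 4.2 Gigabyte per second figure follows from the bus width and the PC2100 transfer rate, and 8-fold interleaving can be pictured as a mapping from address to bank. In the sketch below the nominal 266 million transfers per second for PC2100 (DDR-266) is a known figure, but the interleave granularity of one 128 bit word and the function names are assumptions, since the slide gives no detail.

```python
BUS_BYTES = 128 // 8                  # 128 bit bus = 16 bytes per transfer
TRANSFERS_PER_SECOND = 266_000_000    # PC2100 is DDR-266 (nominal)


def peak_bandwidth_gb_s() -> float:
    """Peak bus bandwidth in GB/s."""
    return BUS_BYTES * TRANSFERS_PER_SECOND / 1e9


def bank_of(address: int, banks: int = 8, line_bytes: int = BUS_BYTES) -> int:
    """ASSUMED mapping: consecutive 16 byte words rotate through the banks."""
    return (address // line_bytes) % banks


print(f"{peak_bandwidth_gb_s():.2f} GB/s")
print([bank_of(a) for a in range(0, 160, 16)])
```

With this mapping a sequential stream touches all eight banks before returning to the first, so back-to-back accesses rarely wait on the same bank.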

Page 49: GenTera’s I M A G I N E 3 Introducing: GenTera’s I M A G I N E 3 HANS DE VRIES.

GenTera’s

IMAGINE 3 END

GenTera’s

IMAGINE 3 HANS DE VRIES