Philips Research

ECLIPSE: Extended CPU Local Irregular Processing Structure

IST: E. van Utteren
IPA: W.J. Lippmann
PROMMPT: J.T.J. v. Eijndhoven
DD&T: C. Niessen
ESAS: A. van der Werf
ViPs: G. Depovere
IT: E. Dijkstra
DS & PC: A. van Gorkum
IC Design: G. Beenker
AV & MS: Th. Brouste
LEP, HVE: T. Doyle

CRB 1992-412 [email protected]

DVP: design problem

Nexperia media processors

DVP: application domain

• High-volume consumer electronics products: future TV, home theatre, set-top box, etc.

• Media processing: audio, video, graphics, communication

DVP: SoC platform

• Nexperia line of media processors for mid- to high-end consumer media processing systems is based on DVP

• DVP provides template for System-on-a-Chip

• DVP supports families of evolving products

• DVP is part of corporate HVE strategy

DVP: system requirements

• High degree of flexibility, extendability and scalability

– unknown applications

– new standards

– new hardware blocks

• High level of media processing power

– hardware coprocessor support

DVP: architecture philosophy

• High degree of flexibility is achieved by supporting media processing in software

• High performance is achieved by providing specialized hardware coprocessors

• Problem: How to mix & match hardware based and software based media processing?

DVP: model of computation

[Figure: processes A, B and C connected by FIFOs. Labels: Process, FIFO, Read, Execute, Write.]

Model of computation is Kahn Process Networks:

• The Kahn model allows ‘plug and play’:

– Parallel execution of many tasks

– Configures different applications by instantiating and connecting tasks

– Maintains functional correctness independent of task scheduling issues

• TSSA: API to transform C programs into Kahn models

DVP: model of computation

[Figure: an application graph mapped onto CPU, coproc1 and coproc2]

• Application: parallel tasks, streams

• Mapping: static

• Architecture: programmable graph

DVP: architecture philosophy

• Kahn processes (nodes) are mapped onto (co)processors

• Communication channels (graph edges) are mapped onto buffers in centralized memory

• Scheduling and synchronization (notification & handling of empty or full buffers) are performed by control software

• Communication pattern between modules (data flow graph) is freely programmable

DVP: generic architecture

• Shared, single address space, memory model

– Flexible access

– Transparent programming model

• Physically centralized random access memory

– Flexible buffer allocation

– Fits well with stream processing

• Single memory-bus for communication

– Simple and cost effective

DVP: example architecture instantiation

[Figure: example DVP instantiation. Blocks: VLIW CPU (I$, D$), MIPS CPU (I$, D$), image scaler, video-in, video-out, audio-in, audio-out, SDRAM, PCI bridge, serial I/O, timers, I2C I/O.]

DVP: TSSA abstraction layer

[Figure: TSSA software stack. Labels: TSSA-Appl1, TSSA-Appl2, TSSA-OS, TM-CPU software, traditional coarse-grain TM co-processors.]

TSSA stream data, buffered in off-chip SDRAM, synchronization with CPU interrupts

DVP: TSSA abstraction layer

• Hides implementation details:

– graph setup

– buffer synchronization

• Runs on pSOS (and other RTKs)

• Provides standard API

• Defines standard data formats

Outline

• DVP

• Eclipse DVP subsystem

• Eclipse architecture

• Eclipse application programming

• Simulator

• Status

Eclipse DVP subsystem

Objective

Increase flexibility of DVP systems, while maintaining cost-performance.

Customer

• Semiconductors: Consumer Systems (Transfer to TTI)

• Consumer Electronics: Domain 2 (BG-TV Brugge)

• Research

Products

Mid- to high-end DVP / TSSA systems: DTVs and STBs

Eclipse DVP subsystem: design problem

• Increase application flexibility through re-use of medium-grain function blocks, in HW and SW

• Keep streaming data on-chip

But?

• More bandwidth visible

• Limited memory size

• High synchronization rate

• CPU unfriendly

[Figure: DVP system with blocks SDRAM, HDVO, condor, MPEG and CPU]

DVP/TSSA system:

• Coarse-grain ‘solid’ function blocks (reuse, HW/SW?)

• Stream data buffered in off-chip memory (bandwidth, power?)

Design problem: new DVP subsystem

[Figure: DVP system with a new Eclipse subsystem. Blocks: VO, MPEG2 decode, CPU, 1394, DVD decode, MPEG2 encode, CPU, Eclipse, external memory.]

Eclipse DVP subsystem: application domain

Now, target for 1st instance:

• Dual MPEG2 full HD decode (1920 x 1080 @ 60i)

• MPEG2 SD transcoding and HD decoding

Anticipate:

• Range of formats (DV, MJPEG, MPEG4)

• 3D-graphics acceleration

• Motion-compensated video processing

Application domain: MPEG2 decoding (HD)

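[Figure: MPEG2 HD decoding data flow, from MPEG2 HD bitstream to HD video (141 MB/s). Blocks: variable length decoding, run length decoding, zig-zag scan, inverse quantization, inverse DCT, saturate, adder (+), motion compensation, reference pictures. Bandwidth labels on the edges: < 10 MB/s, 8 MB/s, 94 MB/s (twice), 106 MB/s, 141 MB/s, and > 221 MB/s to < 407 MB/s.]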

Application domain: MPEG2 encoding (SD)

[Figure: MPEG2 SD encoding data flow, from SD video to SD MPEG2 bitstream. Blocks: picture re-order, motion estimation (producing motion vectors), subtractor (-), DCT, quantization, zig-zag scan, run length encoding, variable length encoding, inverse quantization, inverse DCT, adder (+), motion compensation, reference pictures. Bandwidth labels on the edges: 19 MB/s (three times), 21 MB/s (twice), 28 MB/s (five times), < 1.9 MB/s, 1.6 MB/s, N×28 to N×53 MB/s, 12-25 MB/s, 44-81 MB/s.]

Application domain: MPEG-4 video decoding

[Figure: MPEG-4 video decoding data flow, starting from the MPEG-4 ES. Blocks: variable length decoding, inverse scan, DC & AC prediction, inverse quantization, IDCT, MV decoder, motion compensation, picture reconstruction, reference pictures, context arithmetic decoding, shape MV prediction, shape motion compensation. Edge labels (units not given on the slide): < 384, 800, 128, 90 (several edges), < 220, 0.1, < 7.]

MPEG-4: system level application partitioning

[Figure: MPEG-4 processing chain partitioned over CPU, Eclipse and Sandra. Stages: network layer, de-multiplex, decompression (scene description, video object, 3D Gfx object, audio object), composition and rendering.]

MPEG-4: partitioning Eclipse - SANDRA

[Figure: system block diagram. Blocks: SDRAM, MMI, media CPU (I$, D$), VI, MBS, VO (SANDRA), and an Eclipse subsystem containing SRAM, VLD, DCT and MC.]

Eclipse DVP subsystem: current TSSA style

[Figure: TSSA software stack. Labels: TSSA-Appl1, TSSA-Appl2, TSSA, TM-CPU software, traditional coarse-grain TM co-processors.]

TSSA stream data, buffered in off-chip SDRAM, synchronization with CPU interrupts

Eclipse DVP subsystem: Eclipse tasks embedded in TSSA

[Figure: TSSA-Appl1 and TSSA-Appl2 on top of TSSA, with an Eclipse driver. Legend:]

• TSSA task in SW

• TSSA task on DVP HW

• TSSA task on Eclipse

• Eclipse task in SW

• Eclipse task on HW

• TSSA data stream via off-chip memory

• Eclipse data stream via on-chip memory

Eclipse DVP subsystem: scale down

Hierarchy in the DVP system:

• Computational model which fits neatly inside DVP & TSSA

Scale down from SoC to subsystem:

• Limited internal distances

• High data bandwidth and local storage

• Fast inter-task synchronization

Outline

• DVP

• Eclipse DVP subsystem

• Eclipse architecture

– Model of computation

– Generic architecture

• Eclipse application programming

• Simulator

• Status

Eclipse architecture: model of computation

[Figure: an application graph mapped onto CPU, coproc1 and coproc2]

• Application: parallel tasks, streams

• Mapping: static

• Architecture: programmable, medium grain, multitasking

Model of computation: architecture philosophy

The Kahn model allows ‘plug and play’:

• Parallel execution of many tasks

• Application configuration by instantiating and connecting tasks.

• Functional correctness independent of task scheduling issues.

Eclipse is designed to accomplish this with:

• A mixture of HW and SW tasks.

• High data rates (GB/s) and medium buffer sizes (KB).

• Re-use of co-processors over applications through multi-tasking

• Runtime application reconfiguration.
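The claim that functional correctness does not depend on task scheduling can be illustrated with a minimal single-threaded sketch. Everything in it is an assumption made for illustration (the `Fifo` layout, the producer and consumer tasks, and `run_network`); it is not Eclipse code. The `producer_burst` argument changes the order in which task steps are attempted, and the computed result stays the same.

```c
#include <stddef.h>

#define FIFO_CAP 4

/* Bounded FIFO standing in for a Kahn channel (hypothetical layout). */
typedef struct { int data[FIFO_CAP]; size_t head, count; } Fifo;

static int fifo_put(Fifo *f, int v) {
    if (f->count == FIFO_CAP) return 0;          /* full: writer must wait */
    f->data[(f->head + f->count) % FIFO_CAP] = v;
    f->count++;
    return 1;
}

static int fifo_get(Fifo *f, int *v) {
    if (f->count == 0) return 0;                 /* empty: reader must wait */
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return 1;
}

/* Producer task: emits 1..n, one value per step. Returns 1 on progress. */
typedef struct { int next, n; } Producer;
static int producer_step(Producer *p, Fifo *out) {
    if (p->next > p->n) return 0;
    if (!fifo_put(out, p->next)) return 0;
    p->next++;
    return 1;
}

/* Consumer task: accumulates 2*x for every x it reads. */
typedef struct { long sum; } Consumer;
static int consumer_step(Consumer *c, Fifo *in) {
    int v;
    if (!fifo_get(in, &v)) return 0;
    c->sum += 2L * v;
    return 1;
}

/* Runs the two-task network until neither task can make progress.
 * producer_burst is the number of producer steps attempted per round:
 * it changes the schedule, not the function that is computed. */
long run_network(int n, int producer_burst) {
    Fifo f = { {0}, 0, 0 };
    Producer p = { 1, n };
    Consumer c = { 0 };
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < producer_burst; i++)
            progress |= producer_step(&p, &f);
        progress |= consumer_step(&c, &f);
    }
    return c.sum;
}
```

Calling `run_network(10, 1)` and `run_network(10, 7)` interleaves the tasks very differently, yet both return the same sum, because each task only ever blocks on an empty or full FIFO.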

Allow proper balance in HW/SW combination

[Figure: energy efficiency (low to high) plotted against application flexibility of given silicon (low to high), positioning function-specific engines, DSP-CPU and Eclipse.]

Previous Kahn style architectures in PRLE

[Figure: CPA and C-Heap as predecessors of Eclipse. Labels as extracted from the slide:]

• Explicit synchronization

• Shared memory model

• Mixed HW/SW

• Data driven

• HW synchronization

• Multitasking coprocs

But?

• Dynamic applications

• CPU in media processing

But?

• High performance

• Variable packet sizes

Outline

• DVP

• Eclipse DVP subsystem

• Eclipse architecture

– Model of computation

– Generic architecture

– Coprocessor shell interface

– Shell communication interface

– Architecture instantiation

• Eclipse application programming

• Simulator

• Status

Generic architecture: inter-processor communication

• On-chip, dedicated network for inter-processor communication:

– Medium grain functions

– High bandwidth (up to several GB/s)

– Keep data transport on-chip

• Use DVP-bus for off-chip communication only

Generic architecture: communication network

[Figure: a CPU and two coprocessors attached to the communication network]

Generic architecture: memory

• Shared, single address space, memory model

– Flexible access

– Software programming model

• Centralized wide memory

– Flexible buffer allocation

– Fits well with stream processing

• Single wide memory-bus for communication

– Simple and cost effective

Generic architecture: shared on-chip memory

[Figure: a CPU and two coprocessors connected through the communication network to a shared memory]

Generic architecture: task level interface

Partition functionality between application-dependent core and generic support.

Introduce the (co-)processor shell:

• Shell is responsible for application configuration, task scheduling, data transport and synchronization

• Shell (parameterized) micro-architecture is re-used for each coprocessor instance

• Allow future updates of communication network while re-using (co-)processor core design

• Implementations in HW or SW

Generic architecture: layering

[Figure: three layers. Computation layer: CPU and coprocessors. Generic support layer: shells (Shell-HW, Shell-SW), attached to the processors through the task-level interface. Communication network layer: communication network and memory, attached to the shells through the communication interface.]

Task level interface: five primitives

Multitasking, synchronization, and data transport:

• int GetTask( location, blocked, error, &task_info)

• bool GetSpace ( port_id, n_bytes)

• Read( port_id, offset, n_bytes, &byte_vector)

• Write( port_id, offset, n_bytes, &byte_vector)

• PutSpace ( port_id, n_bytes)

GetSpace is used for both get_data and get_room calls.

PutSpace is used for both put_data and put_room calls.

The processor has the initiative, the shell answers.
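As an illustration of how a task might combine four of these primitives in one processing step, here is a minimal sketch. The argument types, the toy `Port` structure standing in for the shell, and the helpers `invert_step`, `setup_ports` and `output_bytes` are all assumptions; the slide only gives the primitive names and argument lists. The buffers are linear (no wrap-around) to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { PORT_IN = 0, PORT_OUT = 1, NUM_PORTS = 2, BUF_SIZE = 16 };

/* Toy stand-in for the shell: one linear buffer per port. 'space' means
 * available data on an input port and available room on an output port. */
typedef struct {
    unsigned char buf[BUF_SIZE];
    size_t pos;     /* current access point */
    size_t space;   /* bytes the task may still claim from pos onwards */
} Port;

static Port ports[NUM_PORTS];

bool GetSpace(int port_id, size_t n_bytes) {
    return n_bytes <= ports[port_id].space;      /* inquiry only */
}

void Read(int port_id, size_t offset, size_t n_bytes, unsigned char *byte_vector) {
    memcpy(byte_vector, ports[port_id].buf + ports[port_id].pos + offset, n_bytes);
}

void Write(int port_id, size_t offset, size_t n_bytes, const unsigned char *byte_vector) {
    memcpy(ports[port_id].buf + ports[port_id].pos + offset, byte_vector, n_bytes);
}

void PutSpace(int port_id, size_t n_bytes) {
    ports[port_id].pos += n_bytes;               /* commit: move access point */
    ports[port_id].space -= n_bytes;
}

/* One processing step of a hypothetical task: invert a 4-byte block.
 * Returns false, and touches nothing, if data or room is missing. */
bool invert_step(void) {
    unsigned char block[4];
    if (!GetSpace(PORT_IN, sizeof block) || !GetSpace(PORT_OUT, sizeof block))
        return false;
    Read(PORT_IN, 0, sizeof block, block);
    for (size_t i = 0; i < sizeof block; i++)
        block[i] = (unsigned char)~block[i];
    Write(PORT_OUT, 0, sizeof block, block);
    PutSpace(PORT_IN, sizeof block);             /* release consumed data */
    PutSpace(PORT_OUT, sizeof block);            /* publish produced data */
    return true;
}

/* Helper for experiments: preload n input bytes (n <= BUF_SIZE) and
 * grant the output port n bytes of room. */
void setup_ports(const unsigned char *data, size_t n) {
    memset(ports, 0, sizeof ports);
    memcpy(ports[PORT_IN].buf, data, n);
    ports[PORT_IN].space = n;
    ports[PORT_OUT].space = n;
}

const unsigned char *output_bytes(void) { return ports[PORT_OUT].buf; }
```

The step first asks for space on both ports and only then reads and writes, so a refused inquiry leaves no partial work behind. This matches the pattern in which the processor takes the initiative and the shell merely answers.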

Task level interface: port IO

[Figure: four views of the ‘data tape’ seen by Task A through one port. Labels in the drawings: n_bytes1, n_bytes2, offset.]

a: Initial situation of ‘data tape’ with current access point

b: Inquiry action provides window on requested space

c: Read/Write actions on contents

d: Commit action moves access point ahead

Task level interface: communication through streams

Kahn model:

[Figure: Task A and Task B connected by a stream]

Implementation with shared circular buffer:

[Figure: circular buffer shared by A and B. Legend: space filled with data; empty space; granted window for writer; granted window for reader.]

The shell takes care that the access windows have no overlap

Task level interface: multicast

Forked streams:

[Figure: Task A writes a stream that is read by both Task B and Task C. In the shared buffer: space filled with data; empty space; granted window for writer; granted window for reader B; granted window for reader C.]

The task implementations are fixed (HW or SW).

Application configuration is a shell responsibility.
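One way to picture the bookkeeping behind a forked stream is sketched below. It assumes the shell tracks, per stream, how many bytes the writer has committed and how many bytes each reader has released; the `ForkedStream` structure and both functions are hypothetical and not taken from the slides. The point is that the writer's room is bounded by the slowest reader, so no granted windows can overlap.

```c
#include <stddef.h>

#define MAX_READERS 4

/* Hypothetical bookkeeping for one forked stream in a circular buffer. */
typedef struct {
    size_t size;                    /* buffer size in bytes */
    size_t written;                 /* total bytes committed by the writer */
    size_t consumed[MAX_READERS];   /* total bytes released by each reader */
    size_t n_readers;
} ForkedStream;

/* Data visible to one reader: everything written that it has not released. */
size_t reader_data(const ForkedStream *s, size_t reader) {
    return s->written - s->consumed[reader];
}

/* Room for the writer: limited by the slowest reader, so the writer
 * window never overlaps data that some reader may still access. */
size_t writer_room(const ForkedStream *s) {
    size_t max_pending = 0;
    for (size_t r = 0; r < s->n_readers; r++) {
        size_t pending = reader_data(s, r);
        if (pending > max_pending) max_pending = pending;
    }
    return s->size - max_pending;
}
```

With a 64-byte buffer, 40 bytes written, reader B fully caught up and reader C 30 bytes behind, the writer may claim at most 34 bytes. Neither reader task needs to know that the other exists.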

Task level interface: characteristics

• Linear (fifo) synchronization order is enforced

• Random access read/write inside acquired window through offset argument

• Shells operate on unformatted sequences of bytes. Any semantic interpretation is left to the processor

• A task is not aware of where its streams connect to, or of other tasks sharing the same processor

• The shell maintains the application graph structure

• The shell takes care of: fifo size, fifo memory location, wrap-around addressing, caching, cache coherency, bus alignment

Task level interface: multi-tasking

int GetTask( location, blocked, error, &task_info)

• Non-preemptive task scheduling

• Coprocessor provides explicit task-switch moments

• Task switches separate ‘processing steps’ (Granularity: tens or hundreds of clock cycles)

• Shell is responsible for task selection and administration

• Coprocessor provides feedback to the shell on task progress

Generic architecture: generic support

[Figure: three layers. Computation layer: CPU and coprocessors. Generic support layer: shells (Shell-HW, Shell-SW), attached to the processors through the task-level interface. Communication network layer: communication network and memory, attached to the shells through the communication interface.]

Generic support: the Shell

The shell takes care of:

• The application graph structure, supporting run-time reconfiguration

• The local memory map and data transport (fifo size, fifo memory location, wrap-around addressing, caching, cache coherency, bus alignment)

• Task scheduling and synchronization

The distributed implementation:

• Allows fast interaction with local coprocessor

• Creates a scalable solution

Generic support: synchronization

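[Figure: Coprocessor A and Coprocessor B, each with its own shell, attached to the communication network.]

• Coprocessor A calls PutSpace( port, n ); its shell updates its local counter (space -= n) and sends the message putspace( gsid, n ).

• The shell of Coprocessor B receives the message and updates its local counter (space += n).

• Coprocessor B calls GetSpace( port, m ); its shell compares m with its local space value.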

• PutSpace and GetSpace return after local update or inquiry.

• Delay in messaging does not affect functional correctness.
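A minimal sketch of why message delay is harmless, assuming each shell keeps a local `space` counter per port and that updates travel as messages (the `ShellPort` and `Message` types and the three functions are illustrative assumptions): until the message arrives the receiving shell simply underestimates its space, so a late message can delay a grant but never grant bytes that are not there.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-port administration kept locally in a shell (hypothetical layout). */
typedef struct { size_t space; } ShellPort;

/* A putspace message in flight on the communication network. */
typedef struct { bool pending; size_t n; } Message;

/* Committing side: give up n bytes locally (n <= local->space is assumed)
 * and notify the remote shell. Returns right after the local update. */
void put_space(ShellPort *local, Message *link, size_t n) {
    local->space -= n;
    link->n = link->pending ? link->n + n : n;   /* coalesce undelivered updates */
    link->pending = true;
}

/* Inquiring side: answered from the local counter only, no network wait. */
bool get_space(const ShellPort *local, size_t m) {
    return m <= local->space;
}

/* Network delivery, which may happen arbitrarily late. */
void deliver(Message *link, ShellPort *remote) {
    if (link->pending) {
        remote->space += link->n;
        link->pending = false;
    }
}
```

Between `put_space` and `deliver` the remote `get_space` answers false for the not-yet-announced bytes, which only postpones the reader.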

Generic support: application configuration

[Figure: a coprocessor and its shell on the communication network. The shell holds two tables:]

• Stream table, indexed by Stream_id: addr, size, space, gsid, ...

• Task table, indexed by Task_id: info, budget, ..., str_id

Shell tables are accessible through a PI-bus interface

Generic support: data transport caching

• Translate byte-oriented coprocessor interface to wide and aligned bus transfers.

• Separate caches for read and write.

• Direct mapped: two adjacent words per port

• Coherency is enforced as side-effect of GetSpace and PutSpace

• Support automatic prefetching and preflushing
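The translation from byte-oriented requests to aligned bus transfers comes down to a small address calculation, sketched here. The 16-byte word size is an assumption (it corresponds to a 128-bit bus), and `WordSpan` and `covering_words` are hypothetical names.

```c
#include <stddef.h>

#define WORD_BYTES 16u   /* assumption: 128-bit bus words */

typedef struct { size_t first_word_addr; size_t n_words; } WordSpan;

/* Aligned bus words that cover the byte range [addr, addr + n_bytes). */
WordSpan covering_words(size_t addr, size_t n_bytes) {
    WordSpan span = { 0, 0 };
    if (n_bytes == 0) return span;
    size_t first = addr / WORD_BYTES;
    size_t last = (addr + n_bytes - 1) / WORD_BYTES;
    span.first_word_addr = first * WORD_BYTES;
    span.n_words = last - first + 1;
    return span;
}
```

A 4-byte read at byte address 30 touches the words starting at 16 and 32. A request no longer than one word touches at most two adjacent words, which fits a cache of two adjacent words per port.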

Generic support: cache coherency

Figure 1

a: Read fetches word entirely inside granted window

b: Read fetches word which extends outside window, but inside known available space

c: Read fetches word which extends into dirty space

GetSpace provides ownership on requested space.

[Figure legend: read request; memory transfer units (words); GetSpace window; available space known by shell]

Generic support: task scheduling

A simple task scheduler runs locally in each shell:

• Observes empty/full states of fifos and task blocking

• Round-Robin selection of ‘runnable’ tasks

• Parameterized ‘compute resource’ budgets per task

• Temporary disabling of tasks for reconfiguration at specified locations in the data stream

Task scheduling: computation budget

• Computation budget = maximum number of time slices allowed per task selection

– Relative budget value controls compute resource partitioning over tasks

– Absolute budget value controls task switch frequency, influencing overhead of state save & restore

• Running budget is set to the computation budget each time the task is selected in round-robin order

• The running budget is decremented with a fixed clock period, once every time slice

Task scheduling algorithm

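[Flowchart with two entry points, GetTask and Clock Event]

On GetTask:

1. If RunningBudget > 0 & Runnable (Y): return TaskId

2. Otherwise (N): TaskId++ mod NrTasks, repeated until Runnable[TaskId]

3. RunningBudget = Budget[TaskId]; return TaskId

On Clock Event:

1. RunningBudget--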
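The flowchart on this slide can be sketched in C as follows. The `Scheduler` structure, the fixed `NR_TASKS` and the function names are assumptions. Two choices go beyond the slide and are marked in the comments: the search gives up after one full round and returns -1, and the running budget is not decremented below zero.

```c
#include <stdbool.h>

#define NR_TASKS 3

typedef struct {
    int  task_id;               /* currently selected task */
    int  running_budget;        /* time slices left for that task */
    int  budget[NR_TASKS];      /* computation budget per task */
    bool runnable[NR_TASKS];    /* maintained elsewhere from stream state */
} Scheduler;

/* GetTask: keep the current task while it has budget and is runnable,
 * otherwise advance round-robin to the next runnable task.
 * Returns -1 if no task is runnable (a choice of this sketch). */
int get_task(Scheduler *s) {
    if (s->running_budget > 0 && s->runnable[s->task_id])
        return s->task_id;
    for (int tries = 0; tries < NR_TASKS; tries++) {
        s->task_id = (s->task_id + 1) % NR_TASKS;
        if (s->runnable[s->task_id]) {
            s->running_budget = s->budget[s->task_id];
            return s->task_id;
        }
    }
    return -1;
}

/* Clock event: one time slice has elapsed (clamped at zero in this sketch). */
void clock_event(Scheduler *s) {
    if (s->running_budget > 0) s->running_budget--;
}
```

Because the budget is reloaded only when a task is newly selected, a task with budget 3 keeps the coprocessor for three time slices before the round-robin pointer moves on.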

Task scheduling algorithm: dynamic workload

Shell does not interpret media data but performs a best guess

• Space: the amount of available data/room in the stream buffer

• Blocked flag: true if insufficient space on the last inquiry

• Schedule flag: If false, a task may be selected even when Space = 0 (data dependent stream selection)

• Task Enable flag: true if the task is configured to be active

Task scheduling algorithm: Runnable criterion

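$$\text{Runnable} = \text{TaskEnable} \;\&\; \bigwedge_{\text{Streams} \in \text{Task}} \big(\, !\text{Schedule} \;|\; (\text{Space} \neq 0 \;\&\; !\text{Blocked}) \,\big)$$

$$\text{Blocked} = \; !\text{GetSpace}(\text{nbytes})$$

$$\text{Blocked} = \text{false} \quad \text{when an external PutSpace increases Space}$$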
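The formula on this slide is garbled in the transcript. Read together with the flag definitions on the previous slide, it appears to say that a task is runnable when it is enabled and, for each of its streams, either the Schedule flag is false or the stream has non-zero space and is not blocked. The sketch below implements that reading; the `StreamState` layout and the function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t space;      /* available data/room in the stream buffer */
    bool   blocked;    /* insufficient space on the last inquiry */
    bool   schedule;   /* if false, the stream never prevents selection */
} StreamState;

/* Per-stream term: !Schedule | (Space != 0 & !Blocked) */
bool stream_ok(const StreamState *st) {
    return !st->schedule || (st->space != 0 && !st->blocked);
}

/* Runnable = TaskEnable & (stream_ok for every stream of the task) */
bool runnable(bool task_enable, const StreamState *streams, size_t n_streams) {
    if (!task_enable) return false;
    for (size_t i = 0; i < n_streams; i++)
        if (!stream_ok(&streams[i])) return false;
    return true;
}

/* Blocked = !GetSpace(nbytes): record the outcome of an inquiry. */
bool get_space(StreamState *st, size_t n_bytes) {
    bool granted = n_bytes <= st->space;
    st->blocked = !granted;
    return granted;
}

/* An external PutSpace increases Space and clears Blocked. */
void external_put_space(StreamState *st, size_t n_bytes) {
    st->space += n_bytes;
    st->blocked = false;
}
```

The Blocked flag is what makes this a best guess: the shell does not know how many bytes the task will ask for next, so it assumes any non-zero space is enough until an inquiry fails, and then waits for the space to grow.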

Task scheduling: parallel implementation

Task selection background process:

1. For each task, check if it is runnable, based on available space in the stream buffers

2. Select a new task from the list of runnable tasks, round-robin

Provide an immediate answer to a GetTask inquiry:

– Continue current task if its computation budget is not depleted

– Otherwise, start pre-selected next task.

Selection of next task may lag behind on buffer status:

– Only the active task decreases space in the stream buffer

– All incoming PutSpace messages increase space in the buffer

Task scheduler implementation

[Figure: task scheduler inside the shell, next to the coprocessor. Stream table fields: Space, Blocked, Schedule, TaskId, ... Task table fields: Enable, Runnable, Budget, ... Other labels: PutSpace, GetSpace, GetTask, Runnable?, Task Selection, NextTask, Active Task, TaskId, RunningBudget, Decrement Budget.]

Generic support: internal view

[Figure: the shell sits between the coprocessor and the communication network. Internal blocks: DTW, DTR, SS, TS. Additional label: Sync.]

Generic architecture: communication network

[Figure: three layers. Computation layer: CPU and coprocessors. Generic support layer: shells (Shell-HW, Shell-SW), attached to the processors through the task-level interface. Communication network layer: communication network and memory, attached to the shells through the communication interface.]

Communication network: characteristics

• Synchronization messages are passed through a token ring, allowing one message per clock cycle

• Fifos are mapped in a shared on-chip memory, allowing flexible application configuration.

• Data transport is implemented with a wide data bus:

– DTL based bus protocol

– Separately arbitrated busses for read and write

– Independently pipelined for efficient single-word transfers

• All communication paths are uni-directional and pipelined, allowing the insertion of clock-domain bridges

Communication network

[Figure: the communication network connects the shells through a token ring and through a dual DTL bus, with an arbiter in front of the SRAM.]

Communication network: clock domains

• VLIW CPU wants low and fixed latency for memory access.

• CPU and memory can run at high clock rate.

• Synthesized coprocessors and long bus must run at lower clock rate.

Example Eclipse instantiation

[Figure: example Eclipse instantiation, split into a 300 MHz clock domain and a 150 MHz clock domain. Blocks: three coprocessors, each with a shell; CPU64 (I$, D$) with a shell; arbiter; local memory; EB; DVP hub to the DVP bus; PI bridge to the PI bus. Labels: local bus, 2 x 128 bits @ 150 MHz; 32 Kbyte, 128 bit words; 128 bits @ 300 MHz; 64 bits @ 150 MHz; 32 bits @ 75 MHz.]
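The width and clock labels in this figure translate into peak bandwidths with a one-line calculation. The sketch assumes one word transferred per clock cycle on each bus, which the slide does not state; `peak_mb_per_s` is a hypothetical helper.

```c
/* Peak bandwidth in MB/s (10^6 bytes per second) of n_buses parallel buses
 * that each move one word of width_bits per clock cycle.
 * Assumption: one transfer per cycle, no arbitration or stall losses. */
unsigned long peak_mb_per_s(unsigned n_buses, unsigned width_bits, unsigned clock_mhz) {
    return (unsigned long)n_buses * (width_bits / 8u) * clock_mhz;
}
```

Under that assumption the 2 x 128 bits @ 150 MHz local bus and the 128 bits @ 300 MHz path both peak at 4800 MB/s, which is in line with the "several GB/s" stated for the on-chip network, while the 64 bits @ 150 MHz and 32 bits @ 75 MHz links peak at 1200 MB/s and 300 MB/s.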

Outline

• DVP

• Eclipse DVP subsystem

• Eclipse architecture

• Eclipse application programming

– Coprocessor definition

– System software

• Simulator

• Status

Coprocessor definition: starting point

[Figure: processes A, B and C connected by FIFOs. Labels: Process, FIFO, Read, Execute, Write.]

• Model of computation: Kahn Process Networks

• YAPI: simple API to transform C programs into Kahn models

• Expose parallelism and communication

• Decisions on grain sizes for processes and data

• Adopted by various groups in Philips for application modeling

Coprocessor definition: process

[Figure: refinement flow. Labels: Application C code; Generic YAPI; Eclipse Tailored YAPI; Function and Control (repeated for several processes); Coproc.]

Page 67:

Coprocessor definition: control

• Define processing steps by inserting GetTask, breaking up process iterations.

• Choose explicit synchronization moments.

• Implement state saving around GetTask calls.

• Discern different data types that share a stream.

• Discern different functions to handle the data.

Page 68:

Coprocessor definition: packets

Packets wrap data; packet headers indicate data type

[Figure: packet layouts. Labels: Type, Payload, NBytes, Byte 0, Byte 1, Byte 2, and the values 0 and 1.]
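The exact bit layout of the packet header does not survive in this transcript. The sketch below therefore assumes a layout purely for illustration: a 2-byte header whose first byte is the type and whose second byte is either the location identifier (location packets) or the payload length (data packets). The helper names `IsLocation`, `PayLoad`, and `NBytes` are taken from the coprocessor example; `kTypeLocation`, `MakeLocationPacket`, and `MakeDataPacket` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ASSUMPTION: byte 0 = type; byte 1 = location identifier (location packets)
// or payload length in bytes (data packets). Not confirmed by the slides.
constexpr std::uint8_t kTypeLocation = 0;  // hypothetical type code

using Packet = std::vector<std::uint8_t>;

Packet MakeLocationPacket(std::uint8_t location_id) {
    return Packet{kTypeLocation, location_id};
}

// Payloads longer than 255 bytes do not fit this assumed 1-byte length field.
Packet MakeDataPacket(std::uint8_t type, const std::vector<std::uint8_t>& payload) {
    Packet p{type, static_cast<std::uint8_t>(payload.size())};
    p.insert(p.end(), payload.begin(), payload.end());
    return p;
}

bool IsLocation(const Packet& p) { return p.at(0) == kTypeLocation; }
std::uint8_t PayLoad(const Packet& p) { return p.at(1); }  // location identifier
std::uint8_t NBytes(const Packet& p) { return p.at(1); }   // data payload length
```

With this assumed layout, a consumer can decide how many further bytes to request after reading only the 2-byte header.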

Page 69:

Coprocessor definition: location packets

Packets of type ‘location’:

• Payload holds unique identifier denoting location in the stream.

• Used for application reconfiguration at specified points in the data processing.

• All tasks forward location packets to output streams.

• Location identifiers are passed to the shell via GetTask.

• The shell compares a location identifier with its corresponding field in the task table. When these match:
  • The task is disabled.
  • The shell sends an interrupt to the CPU.

• Location identifiers also serve as debug breakpoints.
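The matching rule in the bullets above can be sketched as follows. `ShellModel`, `TaskEntry`, and its field names are hypothetical; the slides only state that the shell compares the reported identifier with a task-table field, disables the task, and interrupts the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical task-table row: the slides do not list the actual fields.
struct TaskEntry {
    bool enabled;
    std::uint32_t stop_location;  // location identifier to match against
};

class ShellModel {
public:
    explicit ShellModel(std::vector<TaskEntry> tasks) : tasks_(std::move(tasks)) {}

    // Called with the location identifier a task passed in via GetTask.
    void ReportLocation(std::size_t task_id, std::uint32_t location) {
        TaskEntry& task = tasks_.at(task_id);
        if (task.enabled && location == task.stop_location) {
            task.enabled = false;  // the task is disabled
            ++cpu_interrupts_;     // stands in for raising a CPU interrupt
        }
    }

    bool Enabled(std::size_t task_id) const { return tasks_.at(task_id).enabled; }
    int CpuInterrupts() const { return cpu_interrupts_; }

private:
    std::vector<TaskEntry> tasks_;
    int cpu_interrupts_ = 0;
};
```

After the interrupt, the CPU can reconfigure the application while the task is guaranteed to be stopped at a known point in the stream.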

Page 70:

Coprocessor definition: example

while( true )
{
    tid = GetTask( location, blocked, error, &task_info);
    if (!tid) return;
    blocked = !GetSpace( IN, 2) || !GetSpace( OUT, 2);
    if (blocked) return;

    // handle location packets
    Read( IN, 0, 2, &packet);
    if (IsLocation( packet))
    {
        location = PayLoad( packet);
        Write( OUT, 0, 2, packet);
        PutSpace( IN, 2);
        PutSpace( OUT, 2);
        return;
    }

    // handle real data ...

Page 71:

Coprocessor definition: example

    // handle real data
    size = NBytes( packet);
    blocked = !GetSpace( IN, 2 + size) || !GetSpace( OUT, OUTSIZE);
    if (blocked) return;

    Read( IN, 2, size, &in_data);
    PutSpace( IN, 2 + size);

    error = Compute( task_info, in_data, &out_data);

    Write( OUT, 0, 2 + OUTSIZE, Packet( TYPE, OUTSIZE, out_data));
    PutSpace( OUT, 2 + OUTSIZE);
}
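The example relies on a window-based access pattern: ask whether a number of bytes is available, access them at an offset, then commit. A minimal sketch of one stream buffer with that behavior follows. It is an assumption-laden model, not the shell implementation: the slides use a single `GetSpace`/`PutSpace` pair with a port identifier, whereas this sketch splits the producer and consumer sides into the hypothetical names `GetRoom`/`PutData` and `GetData`/`PutRoom` for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical circular stream buffer; capacity must be greater than zero.
class StreamBuffer {
public:
    explicit StreamBuffer(std::size_t capacity) : buf_(capacity) {}

    // Producer side: check for room, write at an offset, then commit.
    bool GetRoom(std::size_t n) const { return buf_.size() - filled_ >= n; }
    void Write(std::size_t offset, const std::uint8_t* src, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            buf_[(head_ + offset + i) % buf_.size()] = src[i];
    }
    void PutData(std::size_t n) {
        head_ = (head_ + n) % buf_.size();
        filled_ += n;
    }

    // Consumer side: check for data, read at an offset, then release.
    bool GetData(std::size_t n) const { return filled_ >= n; }
    void Read(std::size_t offset, std::uint8_t* dst, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = buf_[(tail_ + offset + i) % buf_.size()];
    }
    void PutRoom(std::size_t n) {
        tail_ = (tail_ + n) % buf_.size();
        filled_ -= n;
    }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0;    // next write position
    std::size_t tail_ = 0;    // next read position
    std::size_t filled_ = 0;  // committed bytes not yet released
};
```

Separating the availability check from the commit lets a task return early when it would block, without having consumed or produced anything.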

Page 72:

System software

Different types of software:

• Media processing software kernels: TM-CPU software with media operations and communication/synchronization primitives.

• Runtime support: task scheduler, quality-of-service control.

• System re-configuration: network programming, memory management.

Page 73:

Outline

• DVP

• Eclipse DVP subsystem

• Eclipse architecture

• Eclipse application programming

• Simulator
  • Software architecture
  • Retargetability
  • Flexibility
  • Performance metrics

• Status

Page 74:

Simulation objective

• Verification and validation of the Eclipse architecture

• Architecture design space exploration

• Application development platform

• Starting point for hardware development

• Collaboration with LEP (Sandra)

• Transfer to PS-DVI (Dr. Evil)

Page 75:

Simulator toolchain

Configuration excerpt:

Create Vld
Create Dct
Create Mc
Dct{
  NTasks: 2
  Shell{
    NStreams: 2
    Dtr.NPorts : 1
  }
}

[Figure labels: Application setup, Architecture setup, Simulation mode, Performance metrics, Wave forms, Debug traces]

Debug traces:

7: [Eclipse.Input.Shell.Ts.Computation] CoprocGetTask: location_id=0x0 blocked=0
8: [Eclipse.Input.Coproc.Computation] GetTask: location_id=0x0 blocked=0 new task_id=1 task_info=0
8: [Eclipse.Input.Coproc.Computation] GetSpace: port_id=0 size=130
10: [Eclipse.Input.Coproc.Computation] Write: port_id=0 size=4 offset=0 data=0x457f801f
11: [Eclipse.Input.Shell.Dtw.Computation] CoprocWrite: size=4 offset=0 data=0x457f801f
12: [Eclipse.Input.Coproc.Computation] Write: port_id=0 size=4 offset=4 data=0x0201464c
13: [Eclipse.Input.Shell.Dtw.Computation] CoprocWrite: size=4 offset=4 data=0x0201464c
13: [Eclipse.Output.Shell.Ts.Computation] CoprocGetTask: location_id=0x0 blocked=0
14: [Eclipse.Input.Coproc.Computation] Write: port_id=0 size=4 offset=8 data=0x00000000

Simulation mode:

eclipse_sim -d2 -c1000 -l1 -DTHREADLEVEL=2

Page 76:

Simulator flexibility: simulation modes

Modes of execution

• Sequential execution: application development with functional verification

• Timed execution: system-level performance analysis

• TSS execution: hardware development

All execution modes are implemented in one code base.

Only the interfaces differentiate between these modes.

Page 77:

Simulator: modeled hardware architecture

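[Figure: a coprocessor and its shell, which contains Dtr, Dtw, Ss, Ts, and Sync blocks, together with a transport network and a sync network. Operations labeled on the figure: Read, Write, GetSpace, PutSpace, GetTask.]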

Page 78:

Simulator software architecture

[Figure: the coprocessor, the shell (Dtr, Dtw, Ss, Ts, Sync), the transport network, and the sync network, connected through interface objects labeled IF whose ends are marked m and s.]

Page 79:

Simulator software architecture: shell

[Figure: shell internals. Dtr, Dtw, Ss, Ts, and Sync blocks connected through interface objects labeled IF whose ends are marked m and s.]

Page 80:

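Simulator components

[Figure: UML class diagram. Component declares Setup(), Init(), MicroscopeRead(), MicroscopeWrite(), and Run(). Other classes shown: CompositeComponent, LeafComponent, Eclipse, ShellClient, Coproc, Dtw, Dtr, Ss, Sync, Ts, Transport, Interface (multiplicity 0..*), Protocol (multiplicity 1), and the coprocessors Dct, Vld, Mc, and Rlsq. A 1..* multiplicity is shown between CompositeComponent and its parts.]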
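The class names on this slide suggest a composite design pattern. A minimal sketch of that pattern follows, assuming that a composite simply forwards `Run()` to its children. Only `Run()` is modeled; the run counter, `Add`, and `CountLeafRuns` are hypothetical, and which simulator classes are composites or leaves is inferred rather than stated.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Component {
public:
    virtual ~Component() = default;
    virtual void Run() = 0;
};

// Leaf: does the actual work; here it only counts invocations.
class LeafComponent : public Component {
public:
    explicit LeafComponent(int* run_counter) : run_counter_(run_counter) {}
    void Run() override { ++*run_counter_; }

private:
    int* run_counter_;
};

// Composite: owns 1..* components and forwards Run() to each of them.
class CompositeComponent : public Component {
public:
    void Add(std::unique_ptr<Component> child) { children_.push_back(std::move(child)); }
    void Run() override {
        for (auto& child : children_) child->Run();
    }

private:
    std::vector<std::unique_ptr<Component>> children_;
};

// Builds top{ leaf, shell{ leaf, leaf } } and runs it `iterations` times.
int CountLeafRuns(int iterations) {
    int runs = 0;
    auto shell = std::make_unique<CompositeComponent>();
    shell->Add(std::make_unique<LeafComponent>(&runs));
    shell->Add(std::make_unique<LeafComponent>(&runs));
    CompositeComponent top;
    top.Add(std::make_unique<LeafComponent>(&runs));
    top.Add(std::move(shell));
    for (int i = 0; i < iterations; ++i) top.Run();
    return runs;
}
```

The benefit of the pattern is that a simulator can treat a single block and a whole subsystem uniformly through the `Component` interface.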

Page 81:

Simulator: sequential execution

• Very fast functional verification

• One single thread of control

• Communication through function calls

• Statistics, e.g. number of reads, cache misses, …

• Compiles and runs without TSS

Page 82:

Simulator: sequential execution implementation

Simulate()
{
    for ( execution=0; execution<100; execution++ )
    {
        Component->Run();
    }
}

[Figure: class diagram. SequentialSimulator (Simulate()) is associated with 1..* Component (Run()).]

Page 83:

Simulator: timed execution

• Performance metrics

• Full communication protocols

• Sequential C-code via multi-threading

• Run time definition of threads

• Compiles and runs without TSS

Page 84:

Simulator: timed execution implementation

Simulate()
{
    for ( cycle=0; cycle<1000; cycle++ )
    {
        ComponentThread->JumpThread();
    }
}

Thread()
{
    while( 1 )
    {
        Component->Run();
    }
}

[Figure: class diagram. ThreadingSimulator (Simulate()) is associated with 1..* ComponentThread (Thread()); each ComponentThread is associated with 1 Component (Run()).]

Page 85:

Timed execution: Execute()

void Dct::Thread()
{
    while( 1 )
    {
        Dct();
        Execute(64);
    }
}

void Execute(int delay)
{
    while( delay > 0 )
    {
        delay--;
        JumpMain();
    }
}

void MainScheduler()
{
    for (int cycle=0; cycle < 10000; cycle++)
    {
        Dct->JumpThread();
        Vld->JumpThread();
        Mc->JumpThread();
    }
}
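The slides do not show how `JumpThread()` and `JumpMain()` are implemented. One possible way to obtain the same hand-over behavior in portable C++ is a strict ping-pong between the scheduler and a worker thread, sketched below. The mutex and condition-variable mechanism, the boolean return values, and `SimulateDct` are assumptions for illustration; the original simulator may use a lighter-weight coroutine mechanism.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical ComponentThread: exactly one side (scheduler or component) runs at a time.
// The body must return as soon as JumpMain() or Execute() returns false.
class ComponentThread {
public:
    using Body = std::function<void(ComponentThread&)>;

    explicit ComponentThread(Body body)
        : worker_([this, body] {
              if (WaitForTurn()) body(*this);
              Finish();
          }) {}

    ~ComponentThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
            thread_turn_ = true;  // wake the component so it can exit
        }
        cv_.notify_all();
        worker_.join();
    }

    // Scheduler side: run the component until it yields again.
    void JumpThread() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (finished_) return;
        thread_turn_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return !thread_turn_; });
    }

    // Component side: yield to the scheduler; false means "shut down".
    bool JumpMain() {
        std::unique_lock<std::mutex> lock(mutex_);
        thread_turn_ = false;
        cv_.notify_all();
        cv_.wait(lock, [this] { return thread_turn_; });
        return !stop_;
    }

    // Component side: consume `delay` simulated cycles, one yield per cycle.
    bool Execute(int delay) {
        while (delay > 0) {
            --delay;
            if (!JumpMain()) return false;
        }
        return true;
    }

private:
    bool WaitForTurn() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return thread_turn_; });
        return !stop_;
    }
    void Finish() {
        std::lock_guard<std::mutex> lock(mutex_);
        finished_ = true;
        thread_turn_ = false;
        cv_.notify_all();
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool thread_turn_ = false;
    bool stop_ = false;
    bool finished_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Counts how many 64-cycle blocks a Dct-like component starts within `cycles` scheduler cycles.
int SimulateDct(int cycles) {
    int blocks = 0;
    {
        ComponentThread dct([&blocks](ComponentThread& self) {
            while (true) {
                ++blocks;                       // stands in for Dct()
                if (!self.Execute(64)) return;  // 64 cycles per block
            }
        });
        for (int cycle = 0; cycle < cycles; ++cycle) dct.JumpThread();
    }  // destructor stops and joins the worker before `blocks` is read
    return blocks;
}
```

The component code stays sequential, as in the slide's `Dct::Thread()`, while the scheduler advances simulated time one cycle per `JumpThread()` call.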

Page 86:

Timed execution: Read()

void Dct::Thread()
{
    while( 1 )
    {
        …
        DtrInterface->Read(0,0,8,data);
        …
    }
}

void DtrInterface::Read(int port, int offset, int size, DataT &data)
{
    PortOut.Set( port );
    OffsetOut.Set( offset );
    SizeOut.Set( size );
    RequestOut.Set( !RequestOut );
    while ( AckIn.Get() != RequestOut )
        JumpMain();
    data = DataIn.Get();
}

void DtrInterface::Poll()
{
    if ( RequestIn.Get() != AckOut )
    {
        int port = PortIn.Get();
        int offset = OffsetIn.Get();
        int size = SizeIn.Get();
        DataT data[size];
        Dtr->Read( port, offset, size, data );
        DataOut.Set( data );
        AckOut.Set( RequestIn.Get() );
    }
}

void Dtr::Read(int port, int offset, int size, DataT &data)
{
    …
    // Get data from cache
    data = …
}
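The Read() code above uses a toggle-style request/acknowledge handshake: the master flips the request signal, and the transfer is complete once the acknowledge signal equals it again. A minimal single-threaded sketch of that protocol follows. `Signals`, `DtrModel`, the fixed response latency, and the made-up data value are all assumptions; the polling loop stands in for the repeated `JumpMain()` calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wires between the interface (master) and the Dtr (slave).
struct Signals {
    bool request = false;
    bool ack = false;
    int port = 0;
    int offset = 0;
    int size = 0;
    std::uint32_t data = 0;
};

// Slave side: answers a pending request after `latency` polls (one poll per cycle).
class DtrModel {
public:
    explicit DtrModel(int latency) : latency_(latency) {}

    void Poll(Signals& s) {
        if (s.request == s.ack) return;    // no request pending
        if (++waited_ < latency_) return;  // still busy
        s.data = 0x1000u + static_cast<std::uint32_t>(s.offset);  // made-up data
        s.ack = s.request;                 // acknowledge by matching the toggle
        waited_ = 0;
    }

private:
    int latency_;
    int waited_ = 0;
};

// Master side: issue a read and return the number of cycles it took.
int Read(Signals& s, DtrModel& dtr, int port, int offset, int size, std::uint32_t& data) {
    s.port = port;
    s.offset = offset;
    s.size = size;
    s.request = !s.request;  // toggle to signal a new request
    int cycles = 0;
    while (s.ack != s.request) {  // in the simulator this loop would call JumpMain()
        dtr.Poll(s);
        ++cycles;
    }
    data = s.data;
    return cycles;
}
```

Because request and acknowledge only need to be compared, neither side has to reset a signal between transfers, so consecutive reads work with alternating signal levels.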

Page 87:

Simulator: TSS execution

• Dynamic binding of TSS code to the simulator

• Run time definition of TSS module boundaries

• Thread model inside TSS module

• TSS port creation

• Automatic Netlist generation

Page 88:

Simulator: TSS execution implementation

Clock()
{
    ComponentThread->JumpThread();
}

Thread()
{
    while( 1 )
    {
        Component->Run();
    }
}

[Figure: class diagram with classes TssSimulator (Simulate()), TssModule (Clock()), ComponentThread (Thread()), and Component (Run()); multiplicities 1 and 1..* are shown.]

Page 89:

TSS: module boundaries

[Figure: Vld and Mc coprocessors, each with a shell containing Dtr, Dtw, Ss, Ts, and Sync blocks, and the transport network.]

Vld.ModuleName : Vld

Vld.Shell.ModuleName : VldShell

Mc.ModuleName : Mc

Mc.Shell{

Dtr.ModuleName : McShellDtr

Dtw.ModuleName : McShellDtw

Ss.ModuleName : McShellSs

Ts.ModuleName : McShellTs

Sync.ModuleName : McShellSync

}

Transport.ModuleName : Transport

Page 90:

TSS: module boundaries

[Figure: two coprocessors with their shells (Dtr, Dtw, Ss, Ts, Sync) and the transport network.]

ModuleName : Eclipse

Page 91:

TSS: co-simulation TSS-Verilog

[Figure: Vld and Mc coprocessors with their shells (Dtr, Dtw, Ss, Ts, Sync) and the transport network; a separate Ts block is also shown.]

Page 92:

Simulator retargetability

[Figure: the toolchain overview from the "Simulator toolchain" slide, repeated: configuration excerpt (Create Vld, Create Dct, Create Mc, Dct{ NTasks: 2 ... }), application setup, architecture setup, simulation mode (eclipse_sim -d2 -c1000 -l1 -DTHREADLEVEL=2), performance metrics, wave forms, and debug traces.]

Page 93:

Simulator retargetability: Eclipse instantiation

Create Vld
Create Dct
Create Mc
Dct{
  NTasks: 2
  Shell{
    NStreams: 2
    Dtr.NPorts : 1
  }
}

[Figure: the resulting instantiation, showing Vld, Dct, and Mc coprocessors, three shells, and the transport network.]

Page 94:

Coprocessor instantiation

[Figure: UML class diagram of coprocessor creation. Coproc declares Init(), Run(), GetTask(), Read(), Write(), GetSpace(), PutSpace(), and Execute(); Dct, Rlsq, Vld, and Mc each provide Run(). CoprocFactory declares CreateCoproc(), as do DctFactory, RlsqFactory, McFactory, and VldFactory; arrows labeled "Creates" and "Register" are shown. CoprocFactoryRegistry declares Register() and GetCoprocFactory(Name), with a 1..* association qualified by Name. Example input: Create Vld, Create Vld, Create Mc. A Shell block is also shown.]
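The class names on this slide point to a factory registry: each coprocessor type registers a factory under a name, and a "Create <Name>" line looks the factory up and asks it for an instance. A minimal sketch of that idea follows. The method names `Register`, `GetCoprocFactory`, `CreateCoproc`, and `Run` come from the slide; their signatures, the string return value of `Run()`, and the free function `Create` are assumptions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

class Coproc {
public:
    virtual ~Coproc() = default;
    virtual std::string Run() = 0;  // returns a label here; the real signature is not shown
};
class Dct : public Coproc {
public:
    std::string Run() override { return "Dct"; }
};
class Vld : public Coproc {
public:
    std::string Run() override { return "Vld"; }
};

class CoprocFactory {
public:
    virtual ~CoprocFactory() = default;
    virtual std::unique_ptr<Coproc> CreateCoproc() const = 0;
};
class DctFactory : public CoprocFactory {
public:
    std::unique_ptr<Coproc> CreateCoproc() const override { return std::make_unique<Dct>(); }
};
class VldFactory : public CoprocFactory {
public:
    std::unique_ptr<Coproc> CreateCoproc() const override { return std::make_unique<Vld>(); }
};

class CoprocFactoryRegistry {
public:
    void Register(const std::string& name, std::unique_ptr<CoprocFactory> factory) {
        factories_[name] = std::move(factory);
    }
    // Returns nullptr when no factory is registered under `name`.
    const CoprocFactory* GetCoprocFactory(const std::string& name) const {
        auto it = factories_.find(name);
        return it == factories_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<CoprocFactory>> factories_;
};

// Hypothetical handler for a "Create <Name>" configuration line.
std::unique_ptr<Coproc> Create(const CoprocFactoryRegistry& registry, const std::string& name) {
    const CoprocFactory* factory = registry.GetCoprocFactory(name);
    if (factory == nullptr) return nullptr;
    return factory->CreateCoproc();
}
```

With this structure, adding a coprocessor type to the simulator only requires registering one more factory; the configuration reader itself does not change.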

Page 95:

Architecture setup

[Figure: the UML class diagram from the "Simulator components" slide, repeated: Component with Setup(), Init(), MicroscopeRead(), MicroscopeWrite(), and Run(); CompositeComponent; LeafComponent; Interface; Protocol; and the concrete simulator classes.]

Page 96:

Retargetability: application configuration

Dct.Shell{

Ss.StreamTable{

TASK_ID: 1

BUF_SPACE : 0x100

}

}
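The nested configuration blocks shown on these slides read naturally as scoped key paths. The sketch below flattens such text into dotted keys, for example `Dct.Shell.Ss.StreamTable.TASK_ID`. The grammar is inferred from the excerpts only (a line ending in `{` opens a scope, `}` closes it, `Key : Value` assigns); `FlattenConfig` and `Trim` are hypothetical names, and the real simulator's parser is not shown in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Strips leading and trailing whitespace.
std::string Trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Flattens scoped "Name{ Key : Value }" text into dotted-key/value pairs.
// ASSUMPTION: one statement per line, as in the slide excerpts.
std::map<std::string, std::string> FlattenConfig(const std::string& text) {
    std::map<std::string, std::string> out;
    std::vector<std::string> scope;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = Trim(line);
        if (line.empty()) continue;
        if (line == "}") {  // close the innermost scope
            if (!scope.empty()) scope.pop_back();
            continue;
        }
        if (line.back() == '{') {  // open a new scope
            scope.push_back(Trim(line.substr(0, line.size() - 1)));
            continue;
        }
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // e.g. "Create Vld" lines are skipped
        std::string key;
        for (const std::string& s : scope) key += s + ".";
        key += Trim(line.substr(0, colon));
        out[key] = Trim(line.substr(colon + 1));
    }
    return out;
}
```

Applied to the application configuration excerpt above, this yields one entry for `TASK_ID` and one for `BUF_SPACE`, each prefixed with `Dct.Shell.Ss.StreamTable.`.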

Page 97:

Application setup

[Figure: the UML class diagram from the "Simulator components" slide, repeated: Component with Setup(), Init(), MicroscopeRead(), MicroscopeWrite(), and Run(); CompositeComponent; LeafComponent; Interface; Protocol; and the concrete simulator classes.]

Page 98:

Simulator output

[Figure: the toolchain overview from the "Simulator toolchain" slide, repeated: configuration excerpt (Create Vld, Create Dct, Create Mc, Dct{ NTasks: 2 ... }), application setup, architecture setup, simulation mode (eclipse_sim -d2 -c1000 -l1 -DTHREADLEVEL=2), performance metrics, wave forms, and debug traces.]

Page 99:

Simulation output: wave forms

Page 100:

Simulator output: performance data collection

• Collection of critical performance indicators

• Subset of performance indicators implemented in HW in stream and task tables

• Used for:
  • Architecture evaluation at silicon design time
  • Application tuning at application design time
  • QoS resource management at run-time

Page 101:

Viewing performance data

Page 102:

Viewing performance data: processor dynamics

Page 103:

Viewing performance data: processor metrics

Page 104:

Viewing performance data: buffer filling

Page 105:

Outline

• DVP

• Eclipse DVP subsystem

• Eclipse architecture

• Eclipse application programming

• Simulator

• Status

Page 106:

Status

[Figure: project milestones plotted against two axes, abstraction (high to low) and cost (low to high), with a region labeled "Alternative realizations". Milestones:]

Initial architecture study (1997)

Feasibility study (October 1998)

Generic architecture definition (August 1999)

Specific architecture definition (February 2000)

Specific architecture implementation (July 2000)

Page 107:

Current status

• Eclipse documentation
  • Concepts
  • Design path
  • Implementation

• Applications:
  • Coprocessor functional models for MPEG2 HD/SD decoding (Vld, Mc, Idct, Rlsq) supporting downscaling
  • MPEG2 encoder generic Yapi
  • MPEG4, 3D Gfx scheduled for 2001
  • Natural Motion anticipated

Page 108:

Simulator status

• Simulator framework:
  • Retargetable and flexible through design patterns
  • Re-use of methodology, design patterns, implementation (Sandra, QoS, TSSA-2)

• Simulator hardware model:
  • Functional, bit-level accurate model of shells
  • Abstract model of transport network and coprocessors

• Simulator toolchain:
  • Approx. 25,000 lines of C++ code, 250 files (CVS version management, multi-platform makefile structure, automatic source documentation)

• Integration testing phase

• Submitted to CRE 2001

Page 109:

Conclusion

• Eclipse fits neatly in DVP system level architecture

• Flexibility through:
  • Application (re-)configuration
  • Medium-grain HW / SW interaction
  • Co-processor multi-tasking (without runtime CPU control)

• Cost-effectiveness through:
  • HW / SW balancing
  • Time-shared co-processor use

• Tools for application configuration, simulation, and performance analysis are alive

Page 110:

Acknowledgements

Persons from several groups in PRLE:

• IPA (Lippmann): Evert-Jan Pol, Jos van Eijndhoven, Martijn Rutten, Anup Gangwar

• ESAS (van Utteren): Pieter van der Wolf, Om Prakash Gangwal, Gerben Essink

• IT (Dijkstra): Koen Meinds

• Video processing & Visual Perception (Depovere): Gerben Hekstra, Egbert Jaspers, Erik van der Tol, Martijn van Balen

• Digital Design & Test (Niessen): Manish Garg