CASTNESS’11: Computer Architectures and Software Tools for Numerical Embedded Scalable Systems
Workshop & School: Roma, January 17-18th 2011

Frédéric ROUSSEAU
Professor at University Joseph Fourier – Grenoble (France), TIMA Lab - SLS
46 av. Félix Viallet – 38000 Grenoble – France
[email protected]

Communication Synthesis in Low Level Software for Hierarchical Heterogeneous Systems



Page 2

TIMA Laboratory - Frédéric ROUSSEAU -

CASTNESS’11

Roma January 18th

Context of MPSoC

An increasing number of processors: 380 processors on chip in 2015 (ITRS)

Heterogeneity is the trend (good FLOPS/W ratio):
• In High Performance Computing: TOP500 (Nov. 2010), 2 heterogeneous architectures in the top 3; GREEN500 (June 2010), 3 heterogeneous architectures in the top 3
• In the embedded world: TI OMAP, Nexperia, D940, …

A hierarchical structure is mandatory:
• 3 levels: tile, chip, system (multi-chip)
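A minimal C sketch of how a tool might exploit the three hierarchy levels, assuming each processor is located by a hypothetical (chip, tile, cpu) triple: a channel is classified by the highest level that its two endpoints do not share. The types `proc_loc` and `com_level` and the function `classify_channel` are illustrative names, not part of the presented flow.

```c
#include <assert.h>

/* Hypothetical location of a processor in the 3-level hierarchy. */
typedef struct {
    int chip; /* chip index within the system    */
    int tile; /* tile index within the chip      */
    int cpu;  /* processor index within the tile */
} proc_loc;

/* Hierarchy level that a channel has to cross. */
typedef enum {
    COM_INTRA_PROC, /* both tasks on the same processor   */
    COM_INTRA_TILE, /* different processors, same tile    */
    COM_INTER_TILE, /* different tiles, same chip         */
    COM_INTER_CHIP  /* different chips (system level)     */
} com_level;

com_level classify_channel(proc_loc src, proc_loc dst)
{
    /* indices are assumed to be non-negative */
    assert(src.chip >= 0 && dst.chip >= 0);
    if (src.chip != dst.chip)
        return COM_INTER_CHIP;
    if (src.tile != dst.tile)
        return COM_INTER_TILE;
    if (src.cpu != dst.cpu)
        return COM_INTRA_TILE;
    return COM_INTRA_PROC;
}
```

For instance, two tasks on different processors of the same tile yield `COM_INTRA_TILE`, the intra-tile inter-processor case used in the LQCD results.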


Page 3

Communication in hierarchical structure

Challenges in communication synthesis:
• Hierarchy and HW should be transparent to the system designer
• Complexity of the infrastructure and of its abstraction: heterogeneity of tile, chip and system; specific processors (VLIW); Non Uniform Memory Access (NUMA); multiple levels of hierarchy; use of complex network interfaces
• Efficient use of the communication infrastructure
• Control of the limited resources (memory)

TIMA is in charge of providing the low level software, including communication synthesis.


Page 4

Binary code generation flow

[Figure: Y-chart binary code generation flow. FRONT-END: the application & source code of tasks, the architecture and the mapping go through parsing of input models and SW component selection (Com, OS) from the SW component libraries. BACK-END: compilation and linking tools produce the binary.]

Page 5

Binary code generation flow

[Figure: Y-chart binary code generation flow (as on page 4), annotated with the communication inputs: communication paths (architecture), FIFOs of the KPN model (application), and the path<->FIFO association (mapping).]

Page 6

Outline

• Introduction
• HW communication paths
• Software components for communication
• Conclusion

Page 7

The need for HW paths

Introduction to HW paths:
• HW components used for communications (data transfers)
• Use (or not) of specific components (DMA, …)
• Intermediate memories
• These HW paths are given by the architecture designer

Why do we need these HW paths?
• Communication synthesis
• System designers want control over communication

Where do we use these HW paths?
• In simulation (architecture exploration, cf. DOL methodology)
• In the mapping
• Perspectives: analysis and verification …

Page 8

Read and write paths for intra-tile communication

[Figure: a tile with processors CPU1, CPU2, CPU3, memories Mem1, Mem2, Mem3, network interfaces NI1 to NI4 and Networks 1 to 4.]

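$$WP(\mathit{CPU}, \mathit{Mem}) = \{\, \mathit{LocalMem}_0 \cdot \mathit{Network}_0 \cdot (\mathit{NI}_i, \mathit{Network}_{i+1})^{*} \cdot \mathit{Mem} \,\}$$

where WP is the set of write paths from a CPU to a memory, $\cdot$ denotes concatenation and $^{*}$ zero or more repetitions of an (NI, network) pair.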
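Reading the formula as a regular expression over component kinds (our interpretation: "⋅" is concatenation and "*" allows zero or more (NI, Network) hops), checking that a candidate path is a valid intra-tile write path takes a single pass. The enum `hw_kind` and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kinds of HW components appearing in a path. */
typedef enum { HW_LOCAL_MEM, HW_NETWORK, HW_NI, HW_MEM } hw_kind;

/* Returns true if path[0..n) matches
 * LocalMem_0 . Network_0 . (NI_i, Network_{i+1})* . Mem */
bool is_intra_tile_write_path(const hw_kind *path, size_t n)
{
    assert(path != NULL || n == 0);
    /* Shortest match is LocalMem, Network, Mem; each hop adds 2 items. */
    if (n < 3 || (n - 3) % 2 != 0)
        return false;
    if (path[0] != HW_LOCAL_MEM || path[1] != HW_NETWORK)
        return false;
    /* Each repetition of the starred group is an (NI, Network) pair. */
    for (size_t i = 2; i + 1 < n; i += 2) {
        if (path[i] != HW_NI || path[i + 1] != HW_NETWORK)
            return false;
    }
    return path[n - 1] == HW_MEM;
}
```

The number of (NI, Network) pairs is the number of intermediate networks crossed before reaching the target memory.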

Page 9

Read and write paths for inter-tile communication

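$$WP(\mathit{CPU}, \mathit{Network}_{MT}) = \{\, \mathit{LocalMem}_0 \cdot \mathit{Network}_0 \cdot (\mathit{NI}_i, \mathit{Network}_{i+1})^{*} \cdot \mathit{NI}_{MT} \cdot \mathit{Network}_{MT} \,\}$$

where $\mathit{NI}_{MT}$ and $\mathit{Network}_{MT}$ are the network interface and the network at the multi-tile level.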

[Figure: a tile (CPU1 to CPU3, Mem1 to Mem3, NI1 to NI4, Networks 1 to 4) with network interfaces NI5 and NI6 and Multi-Tile Networks 1 and 2.]
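The inter-tile pattern appends NI_MT ⋅ Network_MT to the intra-tile prefix. As an alternative to a hand-written loop, a table-driven finite automaton makes every allowed transition explicit; this is a generic technique chosen here for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical component kinds; *_MT denote multi-tile components. */
typedef enum {
    HW_LOCAL_MEM, HW_NETWORK, HW_NI, HW_NI_MT, HW_NETWORK_MT, HW_KIND_COUNT
} hw_kind;

enum { S_START, S_LMEM, S_NET, S_NI, S_NI_MT, S_ACCEPT, S_REJECT, S_COUNT };

/* DFA for LocalMem_0 . Network_0 . (NI_i, Network_{i+1})* . NI_MT . Network_MT
 * Rows: current state; columns: kind of the next component. */
static const int next_state[S_COUNT][HW_KIND_COUNT] = {
    /*             LMEM      NET       NI        NI_MT     NET_MT   */
    [S_START]  = { S_LMEM,   S_REJECT, S_REJECT, S_REJECT, S_REJECT },
    [S_LMEM]   = { S_REJECT, S_NET,    S_REJECT, S_REJECT, S_REJECT },
    [S_NET]    = { S_REJECT, S_REJECT, S_NI,     S_NI_MT,  S_REJECT },
    [S_NI]     = { S_REJECT, S_NET,    S_REJECT, S_REJECT, S_REJECT },
    [S_NI_MT]  = { S_REJECT, S_REJECT, S_REJECT, S_REJECT, S_ACCEPT },
    [S_ACCEPT] = { S_REJECT, S_REJECT, S_REJECT, S_REJECT, S_REJECT },
    [S_REJECT] = { S_REJECT, S_REJECT, S_REJECT, S_REJECT, S_REJECT },
};

bool is_inter_tile_write_path(const hw_kind *path, size_t n)
{
    assert(path != NULL || n == 0);
    int s = S_START;
    for (size_t i = 0; i < n && s != S_REJECT; i++) {
        assert((int)path[i] < HW_KIND_COUNT);
        s = next_state[s][path[i]];
    }
    return s == S_ACCEPT; /* the whole path must be consumed */
}
```

Adding a new kind of component only means adding a column to the table, which keeps the checker close to the formula.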

Page 10

How to use these HW paths?

Hypotheses:
• All HW paths are listed in the architecture model
• In the mapping, each channel of the application model should be associated with one HW path; a protocol may also be given

Communication synthesis consists of:
• Parsing the architecture and mapping models
• Selecting the SW components
• Specializing the SW components (e.g. FIFO size, base address, …)
• Finally, providing source code ready to be compiled and linked
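The synthesis steps can be sketched in C as follows, assuming the architecture and mapping models have already been parsed into a hypothetical `mapping_entry` table (one entry per KPN channel, with its associated HW path). The driver names, fields and emitted initializer format are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing the architecture and mapping models. */
typedef enum { PATH_INTRA_PROC, PATH_INTRA_TILE, PATH_INTER_TILE } path_kind;

typedef struct {
    const char   *channel;   /* channel name in the application model */
    path_kind     path;      /* HW path associated in the mapping     */
    unsigned      fifo_size; /* FIFO size in bytes                    */
    unsigned long base;      /* base address of the FIFO buffer       */
} mapping_entry;

/* Selection: pick a SW component (driver) for the HW path.
 * Driver names are illustrative only. */
static const char *select_driver(path_kind p)
{
    switch (p) {
    case PATH_INTRA_PROC: return "local_fifo";
    case PATH_INTRA_TILE: return "sw_fifo";
    case PATH_INTER_TILE: return "rdma";
    }
    return NULL;
}

/* Specialization and generation: emit one C initializer describing the
 * specialized driver instance. Returns what snprintf returns. */
int emit_channel_config(const mapping_entry *m, char *out, size_t cap)
{
    const char *drv = select_driver(m->path);
    assert(drv != NULL);
    return snprintf(out, cap,
                    "{ .name = \"%s\", .driver = \"%s\", "
                    ".size = %u, .base = 0x%lx },\n",
                    m->channel, drv, m->fifo_size, m->base);
}
```

Concatenating the emitted lines would give a configuration table that the back-end compiles and links together with the selected SW components.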


Page 11

Outline

• Introduction
• HW communication paths
• Software components for communication
• Conclusion

Page 12

Software stack

Application
• 1 task per process
• Source code of the tasks

OS
• Task and driver management
• Virtual file system (VFS)
• HW access only via the HAL

COM
• Based on the VFS

HAL
• Interface for HW access: interrupts, locks, caches, endianness, …

Page 13

Software stack: write function

function t1_behavior(Channel c1) begin

channel_write(c1, buffer, len);

end

int main() {
    Channel c1;
    Thread t1;
    // Communication channel initialization
    c1 = channel_init("/dev/fifo.0");
    // Task initialization
    t1 = thread_create(…, t1_behavior);
    …
}
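A runnable approximation of this `main`, assuming POSIX threads stand in for `thread_create` and a two-entry device table stands in for the VFS lookup. `Channel`, `channel_init`, `dev_table` and `run_application` are hypothetical, and `channel_write` is a stub that only counts the bytes it receives.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical channel handle: an index into a device table. */
typedef struct { int desc; } Channel;

/* Hypothetical device names produced by communication synthesis. */
static const char *const dev_table[] = { "/dev/fifo.0", "/dev/fifo.1" };

/* Resolve a device name to a descriptor; desc is -1 if unknown. */
Channel channel_init(const char *name)
{
    Channel c = { -1 };
    for (size_t i = 0; i < sizeof dev_table / sizeof dev_table[0]; i++)
        if (strcmp(dev_table[i], name) == 0)
            c.desc = (int)i;
    return c;
}

static size_t bytes_written; /* stub state: bytes "sent" so far */

static void channel_write(Channel c, const char *buffer, size_t len)
{
    assert(c.desc >= 0);
    (void)buffer; /* a real driver would copy the data */
    bytes_written += len;
}

/* Task body, one task per thread as in t1_behavior. */
static void *t1_behavior(void *arg)
{
    Channel *c1 = arg;
    const char buffer[] = "token";
    channel_write(*c1, buffer, sizeof buffer);
    return NULL;
}

/* Same steps as main() above: initialize the channel, start t1, wait. */
size_t run_application(void)
{
    Channel c1 = channel_init("/dev/fifo.0");
    pthread_t t1;
    if (pthread_create(&t1, NULL, t1_behavior, &c1) != 0)
        return 0;
    pthread_join(t1, NULL);
    return bytes_written;
}
```

`pthread_join` also makes the thread's update of `bytes_written` visible to the caller.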

Page 14

Software stack: write function

function channel_write(Channel c, char *buffer, int len) begin

vfs_write(c->desc, buffer, len);

end

function t1_behavior(Channel c1) begin

channel_write(c1, buffer, len);

end

Page 15

Software stack: write function

function vfs_write(Vfile f, char *buffer, int len) begin

f->stream->write(f->id, buffer, len);

end

Driver choice (software FIFO inter-CPU, rendez-vous, …)

function channel_write(Channel c, char *buffer, int len) begin

vfs_write(c->desc, buffer, len);

end
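The driver choice at this level can be sketched with a table of function pointers: `vfs_write` only follows `f->stream->write`, so the same application code works whether the bound driver is a SW FIFO or a rendez-vous. The structure layout and the stub drivers are assumptions; here `Vfile` is a struct passed by pointer rather than the slide's handle type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver interface: each driver exports a write function. */
typedef struct {
    const char *name;
    int (*write)(int id, const char *buffer, int len);
} stream_ops;

/* Hypothetical VFS file: the driver bound at synthesis time and the
 * channel identifier passed to that driver. */
typedef struct {
    const stream_ops *stream;
    int id;
} Vfile;

static int fifo_calls, rdv_calls; /* call counters for the stubs */

/* Stub drivers: a real SW FIFO would copy into shared memory, a real
 * rendez-vous would block until the reader is ready. */
static int sw_fifo_write(int id, const char *buffer, int len)
{
    (void)id; (void)buffer;
    fifo_calls++;
    return len;
}

static int rendezvous_write(int id, const char *buffer, int len)
{
    (void)id; (void)buffer;
    rdv_calls++;
    return len;
}

const stream_ops sw_fifo_ops    = { "sw_fifo",    sw_fifo_write };
const stream_ops rendezvous_ops = { "rendezvous", rendezvous_write };

/* The VFS ignores which protocol is behind the file: it dispatches
 * through the write pointer of the bound driver. */
int vfs_write(const Vfile *f, const char *buffer, int len)
{
    assert(f != NULL && f->stream != NULL && f->stream->write != NULL);
    return f->stream->write(f->id, buffer, len);
}
```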

Page 16

Software stack: write function

function vfs_write(Vfile f, char *buffer, int len) begin

f->stream->write(f->id, buffer, len);

end

function fifo_write(char *buffer, int len) begin
config = getConfiguration();

HAL_WRITE (buffer, config->writeptr, len);

end

Page 17

function fifo_write(char *buffer, int len) begin
config = getConfiguration();

HAL_WRITE (buffer, config->writeptr, len);

end


Software stack: write function

function HAL_WRITE(char *from, char *to, int len) begin

// May use a DMA
end
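A minimal sketch of the bottom of the stack, assuming a circular buffer in shared memory: `fifo_write` receives its specialized configuration as a parameter (instead of the slide's `getConfiguration()`) and delegates every copy to `HAL_WRITE`, here a plain `memcpy` where a platform HAL could program a DMA transfer. The `fifo_config` fields are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HAL_WRITE: copy len bytes from 'from' to 'to'. A plain memcpy here;
 * a platform HAL could start a DMA transfer instead. */
static void HAL_WRITE(const char *from, char *to, size_t len)
{
    memcpy(to, from, len);
}

/* Hypothetical specialized configuration of one SW FIFO instance. */
typedef struct {
    char  *base;     /* start of the FIFO buffer in shared memory */
    size_t size;     /* buffer capacity in bytes                  */
    size_t writeptr; /* offset of the next byte to write          */
    size_t count;    /* bytes stored and not yet read             */
} fifo_config;

/* Write len bytes into the circular buffer. Returns len, or 0 when
 * there is not enough free space (no partial writes). */
size_t fifo_write(fifo_config *config, const char *buffer, size_t len)
{
    assert(config != NULL && config->base != NULL && config->size > 0);
    if (len > config->size - config->count)
        return 0;
    size_t first = config->size - config->writeptr; /* room before wrap */
    if (first > len)
        first = len;
    HAL_WRITE(buffer, config->base + config->writeptr, first);
    HAL_WRITE(buffer + first, config->base, len - first); /* wrapped part */
    config->writeptr = (config->writeptr + len) % config->size;
    config->count += len;
    return len;
}
```

A complete driver would also need the read side, sharing `count` with the reader under a HAL lock; it is omitted here.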

Page 18

The need for a driver library

One driver for each HW path is not realistic:
• Too much development

Only a few drivers, corresponding to a few HW paths:
• Need for driver configurability: memory addresses; platform resources (locks, timer, …); exotic configurations when using specific network interfaces (DNP!)

=> Trade-off between efficiency and the number of HW paths covered

Page 19

About the selected HW path

Each driver should be specialized:
• To respect the selected HW path
• With the right configuration
• To access all the HW components mentioned in the HW path

BUT it has to be compatible with the HAL:
• The HAL has a limited number of interfaces (and limited HW access)
• Efficiency
• Easier porting to another platform

Difficult to respect the HW paths given in the mapping:
• Due to the HAL (usually minimal, but expected to be optimal)
• Local memory not necessarily respected by compilers

Page 20

Available protocols

For the D940 platform (ARM & mAgicV processors)

Intra-tile:
• SW FIFO
• Rendez-vous in synchronous mode

Inter-tile:
• Sockets
• RDMA protocols (eager and rendez-vous)

Page 21

Example of results

LQCD application (from INFN):
• About 50 processes and 100 channels

Protocols used:
• Intra-tile: rendez-vous
• Inter-tile: eager for small messages, rendez-vous otherwise
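The inter-tile rule reduces to a size test. The threshold below is a placeholder, since the slides give no value, and the comments summarize the usual motivation for switching between the two RDMA protocols rather than details of the D940 implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PROTO_EAGER, PROTO_RENDEZVOUS } rdma_protocol;

/* Hypothetical threshold in bytes; a real value would be tuned to the
 * receive buffers available on the remote tile. */
#define EAGER_LIMIT 1024u
static_assert(EAGER_LIMIT > 0u, "eager limit must be positive");

/* Eager: the sender pushes the data at once into a receive buffer
 * reserved in advance on the remote tile (low latency for small
 * messages, but it costs reserved memory).
 * Rendez-vous: sender and receiver first agree on the destination
 * buffer, then the data is transferred (one extra handshake, but large
 * messages need no pre-reserved intermediate buffer). */
rdma_protocol select_inter_tile_protocol(size_t msg_size)
{
    return msg_size <= EAGER_LIMIT ? PROTO_EAGER : PROTO_RENDEZVOUS;
}
```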

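Number of channels per category (intra-processor, inter-processor within a tile, inter-tile), number of drivers and number of driver specializations, per mapping:

Mapping          | Intra-proc | Intra-tile, inter-proc | Inter-tile | #Drivers | Specializations
1 tile, ARM      | 96         | 0                      | 0          | 1        | 96
1 tile, ARM+DSP  | 80         | 16                     | 0          | 2        | 112
2 tiles, ARM     | 72         | 0                      | 24         | 2        | 120
2 tiles, ARM+DSP | 56         | 16                     | 24         | 3        | 136
8 tiles, ARM     | 34         | 0                      | 62         | 2        | 156
8 tiles, ARM+DSP | 18         | 16                     | 62         | 3        | 174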

Page 22

What next for EURETILE?

WP4: Distributed Hardware Dependent Software Generation
• OS, HAL, communication mechanisms
• 3 main topics: SW requirements of brain-inspired applications with many processes; fault tolerance aware of the capabilities provided by the HW; real-time aspects
• Interesting solution: task migration, but it is challenging: heterogeneity of the architecture, NUMA, message passing, semi-centralized architecture


Page 23

Conclusion & perspectives

Communication synthesis in multi-tile platforms:
• Formalization of multi-tile communications
• Introduction of HW paths
• Development of a communication driver library
• Automatic selection and configuration of drivers

What is actually implemented may differ from what was decided:
• HAL constraints

Communication is the basis for task migration in a message-passing system

