CASTNESS‘11 Computer Architectures and Software Tools for Numerical Embedded Scalable Systems
Workshop & School: Roma January 17-18th 2011
Frédéric ROUSSEAU
Professor at University Joseph Fourier – Grenoble (France), TIMA Lab - SLS
46 av. Félix Viallet – 38000 Grenoble – [email protected]
Communication Synthesis in Low Level Software
for Hierarchical Heterogeneous Systems
TIMA Laboratory - Frédéric ROUSSEAU -
CASTNESS’11
Roma January 18th
Context of MPSoC
• An increasing number of processors: 380 processors on chip in 2015 (ITRS)
• Heterogeneity is the trend (good FLOPS/W ratio)
  • In High Performance Computing: TOP500 (Nov. 2010), 2 heterogeneous architectures in the top 3; GREEN500 (June 2010), 3 heterogeneous architectures in the top 3
  • In the embedded world: TI OMAP, Nexperia, D940, …
A hierarchical structure is mandatory
• 3 levels: tile, chip, system (multi-chip)
Communication in a hierarchical structure
Challenges in communication synthesis
• Hierarchy and HW should be transparent to the system designer
• Complexity of the infrastructure and of its abstraction: heterogeneity of tile, chip and system; specific processors (VLIW); Non-Uniform Memory Access (NUMA); multiple levels of hierarchy; complex network interfaces
• Efficient use of the communication infrastructure
• Control of limited resources (memory)
TIMA is in charge of providing the low-level software, including communication synthesis
Binary code generation flow
[Figure: Y-chart flow. Inputs: the application (source code of each task), the architecture, and the mapping. FRONT-END: parsing of the input models, then selection of SW components (Com, OS) from the SW component libraries. BACK-END: compilation and linking tools, producing the binary.]
Binary code generation flow (continued)
[Figure: the same flow, annotated with the communication paths, the FIFOs of the KPN application model, and the association path <-> FIFO made in the mapping.]
Outline
• Introduction
• HW communication paths
• Software components for communication
• Conclusion
The need for HW paths
Introduction to HW paths
• HW components used for communication (data transfers)
• Whether or not specific components are used (DMA, …)
• Intermediate memories
• These HW paths are given by the architecture designer
Why do we need these HW paths?
• Communication synthesis
• System designers want to have control over communication
Where do we use these HW paths?
• In simulation (architecture exploration, cf. the DOL methodology)
• In the mapping
• Perspectives: analysis and verification …
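Read and write paths for intra-tile communication
[Figure: a tile with processors CPU1, CPU2, CPU3, memories Mem1, Mem2, Mem3, network interfaces NI1 to NI4, and networks Network 1 to Network 4]
$$WP(CPU, Mem) = \{\, LocalMem_0 \cdot Network_0 \cdot (NI_i, Network_{i+1})^{*} \cdot Mem \,\}$$
where WP denotes a write path and $(\ldots)^{*}$ means zero or more repetitions.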
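Read and write paths for inter-tile communication
$$WP(CPU, Network_{MT}) = \{\, LocalMem_0 \cdot Network_0 \cdot (NI_i, Network_{i+1})^{*} \cdot NI_{MT} \cdot Network_{MT} \,\}$$
where MT stands for multi-tile.
[Figure: the tile of the previous slide (CPU1 to CPU3, Mem1 to Mem3, NI1 to NI4, Network 1 to Network 4) together with network interfaces NI5 and NI6 and the multi-tile networks Multi-Tile Network 1 and Multi-Tile Network 2]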
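Both write-path formulas read like regular expressions over the sequence of components a transfer crosses: a common prefix (LocalMem0, Network0, then zero or more (NI, Network) hops), followed either by Mem (intra-tile) or by NI_MT and Network_MT (inter-tile). The C sketch below checks a candidate path against that shape. The `hw_kind` encoding and `is_valid_write_path` are hypothetical, only component kinds are checked (not their indices), and this is not claimed to be the representation used by the TIMA tools.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the HW components crossed by a transfer. */
typedef enum {
    HW_LOCAL_MEM,   /* LocalMem_0: memory local to the writing CPU  */
    HW_NETWORK,     /* Network_i: network inside the tile            */
    HW_NI,          /* NI_i: network interface between two networks  */
    HW_MEM,         /* Mem: destination memory inside the tile       */
    HW_NI_MT,       /* NI_MT: interface to the multi-tile network    */
    HW_NETWORK_MT   /* Network_MT: multi-tile network                */
} hw_kind;

/* Returns true if path[0..n) has the shape
 *   intra-tile: LocalMem_0 . Network_0 . (NI_i, Network_{i+1})* . Mem
 *   inter-tile: LocalMem_0 . Network_0 . (NI_i, Network_{i+1})* . NI_MT . Network_MT
 * Only component kinds are checked, not component indices. */
bool is_valid_write_path(const hw_kind *path, size_t n, bool inter_tile)
{
    /* Common prefix: LocalMem_0 . Network_0 */
    if (n < 2 || path[0] != HW_LOCAL_MEM || path[1] != HW_NETWORK)
        return false;
    size_t k = 2;

    /* Zero or more (NI, Network) hops */
    while (k + 1 < n && path[k] == HW_NI && path[k + 1] == HW_NETWORK)
        k += 2;

    /* The suffix depends on the kind of path */
    if (inter_tile)
        return k + 2 == n && path[k] == HW_NI_MT && path[k + 1] == HW_NETWORK_MT;
    return k + 1 == n && path[k] == HW_MEM;
}
```

For example, {HW_LOCAL_MEM, HW_NETWORK, HW_NI, HW_NETWORK, HW_MEM} is a valid intra-tile path, while {HW_LOCAL_MEM, HW_NETWORK, HW_NI, HW_MEM} is rejected because the network interface is not followed by a network.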
How are these HW paths used?
Hypotheses
• All HW paths are listed in the architecture model
• In the mapping, each channel of the application model should be associated with one HW path; a protocol may also be given
Communication synthesis consists of
• Parsing the architecture and mapping models
• Selecting the SW components
• Specializing the SW components (e.g., FIFO size, base address, …)
• And then providing source code ready to be compiled and linked
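The synthesis steps above can be pictured as a loop over the mapping entries: pick a driver for each channel, specialize it with per-channel parameters, and print the result as compilable source. The sketch below handles one entry. Everything in it is hypothetical: the `mapping_entry` fields, the driver names (`sw_fifo`, `rdv`, `rdma_eager`), the default driver per path class, and the format of the generated initializer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing the architecture and mapping models. */
typedef enum { PATH_INTRA_TILE, PATH_INTER_TILE } path_class;

typedef struct {
    const char   *channel;      /* channel name in the application model    */
    path_class    path;         /* class of the HW path given in the mapping */
    const char   *protocol;     /* protocol from the mapping, or NULL        */
    unsigned long base_address; /* buffer base address (specialization)      */
    unsigned      fifo_size;    /* FIFO size in bytes (specialization)       */
} mapping_entry;

/* Step 2: select a driver from a (hypothetical) driver library. */
static const char *select_driver(const mapping_entry *m)
{
    /* A protocol imposed by the mapping takes precedence. */
    if (m->protocol != NULL && strcmp(m->protocol, "rendez-vous") == 0)
        return "rdv";
    /* Otherwise use the default driver for the HW path class;
     * unknown protocol names also fall back to this default. */
    return m->path == PATH_INTER_TILE ? "rdma_eager" : "sw_fifo";
}

/* Steps 3 and 4: specialize the driver and emit it as a C initializer.
 * Returns snprintf's result (length of the full generated line). */
int emit_channel_config(const mapping_entry *m, char *out, size_t len)
{
    return snprintf(out, len, "{ \"%s\", &%s_driver, 0x%lx, %u },",
                    m->channel, select_driver(m),
                    m->base_address, m->fifo_size);
}
```

For a made-up intra-tile channel `c1` with base address `0x20000000` and a 1024-byte FIFO, `emit_channel_config` produces `{ "c1", &sw_fifo_driver, 0x20000000, 1024 },`. Emitting one such line per channel yields a static table that is compiled and linked with the drivers, so no model parsing is needed on the target.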
Outline
• Introduction
• HW communication paths
• Software components for communication
• Conclusion
Software stack
Application
• 1 task per process
• Source code of the tasks
OS
• Task and driver management
• Virtual file system (VFS)
• HW access only via the HAL
COM
• Based on the VFS
HAL
• Interface for HW access: interrupts, locks, caches, endianness, …
Software stack: write function
function t1_behavior(Channel c1) begin
…
channel_write(c1, buffer, len);
end
int main() {
    Channel c1;
    Thread t1;
    // Communication channel initialization
    c1 = channel_init("/dev/fifo.0");
    // Task initialization
    t1 = thread_create(…, t1_behavior);
    …
}
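From the task's point of view, a channel behaves like a FIFO with blocking reads, as in a Kahn process network (KPN). The sketch below emulates the `main`/`t1_behavior` pair with POSIX threads and a bounded, blocking in-memory FIFO. `channel`, `chan_init`, `chan_write`, `chan_read` and `CHAN_CAPACITY` are hypothetical stand-ins for `channel_init`/`channel_write` and for the OS, VFS and driver layers beneath them; values are `int`s rather than byte buffers to keep it short.

```c
#include <pthread.h>
#include <stddef.h>

#define CHAN_CAPACITY 8  /* hypothetical FIFO depth, in ints */

/* Bounded FIFO: writers block when it is full, readers block when it is empty. */
typedef struct {
    int             buf[CHAN_CAPACITY];
    size_t          head, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
} channel;

void chan_init(channel *c)
{
    c->head = c->count = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->not_empty, NULL);
    pthread_cond_init(&c->not_full, NULL);
}

void chan_write(channel *c, int v)
{
    pthread_mutex_lock(&c->lock);
    while (c->count == CHAN_CAPACITY)           /* wait for free space */
        pthread_cond_wait(&c->not_full, &c->lock);
    c->buf[(c->head + c->count) % CHAN_CAPACITY] = v;
    c->count++;
    pthread_cond_signal(&c->not_empty);
    pthread_mutex_unlock(&c->lock);
}

int chan_read(channel *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->count == 0)                       /* KPN-style blocking read */
        pthread_cond_wait(&c->not_empty, &c->lock);
    int v = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAPACITY;
    c->count--;
    pthread_cond_signal(&c->not_full);
    pthread_mutex_unlock(&c->lock);
    return v;
}

/* Task t1: writes the values 1..100 to its output channel. */
void *t1_behavior(void *arg)
{
    channel *c1 = arg;
    for (int i = 1; i <= 100; i++)
        chan_write(c1, i);
    return NULL;
}
```

A `main` mirroring the slide would call `chan_init(&c1)`, start the task with `pthread_create(&t1, NULL, t1_behavior, &c1)`, and consume the values with `chan_read(&c1)`. Bounding the FIFO makes the writer wait when the reader falls behind, which keeps memory use fixed.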
Software stack: write function
function channel_write(Channel c, char *buffer, int len) begin
…
vfs_write(c->desc, buffer, len);
end
Software stack: write function
function vfs_write(Vfile f, char *buffer, int len) begin
…
f->stream->write(desc->id,buffer, len);
end
Driver choice (software FIFO inter-CPU, rendez-vous, …)
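`vfs_write` does not need to know which protocol is behind a file: the driver can be bound when the device is opened (for instance `/dev/fifo.0` in `channel_init`), and every later write dispatches through a function pointer. A minimal sketch of that indirection, assuming a hypothetical `driver_ops` interface, a static device table such as communication synthesis could generate, stub drivers, and a made-up `/dev/rdv.0` device; the real VFS and driver interfaces of the TIMA stack may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface: every driver exposes the same entry points. */
typedef struct {
    const char *name;
    int (*write)(int id, const char *buffer, int len);
} driver_ops;

/* Stub drivers; a real driver would move the data along its HW path. */
static int sw_fifo_write(int id, const char *buffer, int len)
{
    (void)id; (void)buffer;
    return len;                 /* pretend all bytes were queued */
}

static int rdv_write(int id, const char *buffer, int len)
{
    (void)id; (void)buffer;
    return len;                 /* pretend the rendez-vous completed */
}

static const driver_ops sw_fifo_driver = { "sw_fifo", sw_fifo_write };
static const driver_ops rdv_driver     = { "rendez-vous", rdv_write };

/* Static device table, e.g. generated by communication synthesis. */
static const struct {
    const char       *path;
    const driver_ops *ops;
    int               id;
} devices[] = {
    { "/dev/fifo.0", &sw_fifo_driver, 0 },
    { "/dev/rdv.0",  &rdv_driver,     0 },
};

/* An open VFS file: the driver ("stream") bound at open time, plus its id. */
typedef struct {
    const driver_ops *stream;
    int               id;
} vfile;

/* Resolve a device path; returns 0 on success, -1 if the device is unknown. */
int vfs_open(const char *path, vfile *f)
{
    for (size_t i = 0; i < sizeof devices / sizeof devices[0]; i++) {
        if (strcmp(devices[i].path, path) == 0) {
            f->stream = devices[i].ops;
            f->id = devices[i].id;
            return 0;
        }
    }
    return -1;
}

/* Dispatch to whichever driver was bound when the file was opened. */
int vfs_write(vfile *f, const char *buffer, int len)
{
    return f->stream->write(f->id, buffer, len);
}
```

With this design, adding a protocol means adding a `driver_ops` instance and a table entry; `channel_write` and the application code stay unchanged.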
Software stack: write function
function fifo_write(char *buffer, int len) begin
config = getConfiguration();
…
HAL_WRITE (buffer, config->writeptr, len);
end
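A software FIFO between two CPUs is commonly built as a single-producer/single-consumer ring buffer in shared memory, with a write index owned by the producer and a read index owned by the consumer. The sketch below assumes that design. Unlike the slide's `fifo_write`, it receives its configuration as a parameter instead of calling `getConfiguration()`, and it copies the bytes itself instead of calling `HAL_WRITE`; `fifo_config`, `FIFO_SIZE` and `fifo_read` are hypothetical. C11 acquire/release atomics order the data copy before the index update; on a real heterogeneous platform that job (barriers, uncached or coherent accesses) would fall to the HAL.

```c
#include <stdatomic.h>
#include <stddef.h>

#define FIFO_SIZE 16u   /* hypothetical size in bytes; must be a power of two */

/* Per-channel configuration produced by specialization (hypothetical layout). */
typedef struct {
    unsigned char  data[FIFO_SIZE];
    _Atomic size_t writeptr;   /* total bytes written; updated by the producer only */
    _Atomic size_t readptr;    /* total bytes read; updated by the consumer only    */
} fifo_config;

/* Non-blocking write (len >= 0): copies as many bytes as fit, returns the count. */
int fifo_write(fifo_config *cfg, const char *buffer, int len)
{
    size_t w = atomic_load_explicit(&cfg->writeptr, memory_order_relaxed);
    size_t r = atomic_load_explicit(&cfg->readptr, memory_order_acquire);
    size_t space = FIFO_SIZE - (w - r);
    size_t n = (size_t)len < space ? (size_t)len : space;

    for (size_t i = 0; i < n; i++)
        cfg->data[(w + i) & (FIFO_SIZE - 1)] = (unsigned char)buffer[i];

    /* Publish the data before advancing the write pointer. */
    atomic_store_explicit(&cfg->writeptr, w + n, memory_order_release);
    return (int)n;
}

/* Non-blocking read (len >= 0): copies as many bytes as are available. */
int fifo_read(fifo_config *cfg, char *buffer, int len)
{
    size_t r = atomic_load_explicit(&cfg->readptr, memory_order_relaxed);
    size_t w = atomic_load_explicit(&cfg->writeptr, memory_order_acquire);
    size_t avail = w - r;
    size_t n = (size_t)len < avail ? (size_t)len : avail;

    for (size_t i = 0; i < n; i++)
        buffer[i] = (char)cfg->data[(r + i) & (FIFO_SIZE - 1)];

    /* Release the slots only after the data has been copied out. */
    atomic_store_explicit(&cfg->readptr, r + n, memory_order_release);
    return (int)n;
}
```

The indices count bytes monotonically and are reduced modulo the power-of-two size only when `data` is accessed, so `w - r` is always the fill level, even after the counters wrap around.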
Software stack: write function
function HAL_WRITE(char *from, char *to, int len) begin
// May use DMA
end
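`HAL_WRITE` is the place where the choice between a CPU copy and a DMA transfer can be hidden from the drivers. A sketch, assuming a hypothetical DMA hook registered by the platform (`hal_register_dma`) and an arbitrary size threshold (`HAL_DMA_THRESHOLD`); neither comes from the talk. Unlike the slide's version, this one returns whether the DMA was used.

```c
#include <stddef.h>
#include <string.h>

#define HAL_DMA_THRESHOLD 64  /* hypothetical: below this size a CPU copy is assumed cheaper */

/* Optional DMA hook installed by the platform HAL; returns 0 on success. */
typedef int (*dma_copy_fn)(void *to, const void *from, size_t len);
static dma_copy_fn hal_dma_copy = NULL;

void hal_register_dma(dma_copy_fn fn)
{
    hal_dma_copy = fn;
}

/* Copy len bytes from 'from' to 'to'.
 * Returns 1 if the DMA performed the copy, 0 if the CPU did (or len <= 0). */
int HAL_WRITE(const char *from, char *to, int len)
{
    if (len <= 0)
        return 0;
    if (hal_dma_copy != NULL && len >= HAL_DMA_THRESHOLD &&
        hal_dma_copy(to, from, (size_t)len) == 0)
        return 1;
    memcpy(to, from, (size_t)len);   /* CPU fallback */
    return 0;
}
```

Keeping this decision in the HAL keeps drivers portable; the price is that a driver cannot request a particular HW component (for example a given DMA engine) unless the HAL exposes it.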
The need for a driver library
One driver per HW path is not realistic
• Too much development
Only a few drivers, corresponding to a few HW paths
• Drivers must therefore be configurable: memory addresses; platform resources (locks, timers, …); exotic configurations when specific network interfaces are used (DNP!)
=> Trade-off between efficiency and the number of paths represented
About the selected HW path
Each driver should be specialized
• To respect the selected HW path
• With the right configuration
• To access all HW components mentioned in the HW path
BUT it has to be compatible with the HAL
• The HAL has a limited number of interfaces (and limited HW access)
• Efficiency vs. ease of porting to another platform
It is difficult to respect the HW paths given in the mapping
• Because of the HAL (usually minimal, yet expected to be optimal)
• Local memory placement is not necessarily respected by compilers
Available protocols
For the D940 platform (ARM & mAgicV processors)
• Intra-tile: SW FIFO; rendez-vous in synchronous mode
• Inter-tile: sockets; RDMA protocols (eager and rendez-vous)
Example of results
LQCD application (from INFN)
• About 50 processes and 100 channels
Protocols used
• Intra-tile: rendez-vous
• Inter-tile: eager for small messages, rendez-vous otherwise
Mapping | Intra-proc channels | Intra-tile inter-proc channels | Inter-tile channels | #Drivers | #Specializations
1 tile, ARM | 96 | 0 | 0 | 1 | 96
1 tile, ARM+DSP | 80 | 16 | 0 | 2 | 112
2 tiles, ARM | 72 | 0 | 24 | 2 | 120
2 tiles, ARM+DSP | 56 | 16 | 24 | 3 | 136
8 tiles, ARM | 34 | 0 | 62 | 2 | 156
8 tiles, ARM+DSP | 18 | 16 | 62 | 3 | 174
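The protocol policy used in this experiment reduces to a size threshold on inter-tile messages. With eager, the sender pushes the payload immediately and the receiver must buffer it; with rendez-vous, the sender first waits until the receiver is ready, which costs a handshake but avoids buffering large messages in intermediate memory. A sketch of the selection rule, assuming a hypothetical 1 KiB threshold (the talk does not give the value) and hypothetical type names:

```c
#include <stddef.h>

typedef enum { PROTO_RENDEZVOUS, PROTO_EAGER } protocol;
typedef enum { SCOPE_INTRA_TILE, SCOPE_INTER_TILE } channel_scope;

#define EAGER_MAX_BYTES 1024u  /* hypothetical eager/rendez-vous threshold */

/* Policy of the LQCD experiment:
 *  - intra-tile: always rendez-vous
 *  - inter-tile: eager for small messages, rendez-vous otherwise */
protocol select_protocol(channel_scope s, size_t msg_bytes)
{
    if (s == SCOPE_INTER_TILE && msg_bytes <= EAGER_MAX_BYTES)
        return PROTO_EAGER;
    return PROTO_RENDEZVOUS;
}
```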
What next for EURETILE?
WP4: Distributed Hardware-Dependent Software generation
• OS, HAL, communication mechanisms
• 3 main topics: SW requirements of brain-inspired, many-process applications; fault-tolerance-aware capabilities provided by the HW; real-time aspects
• An interesting solution is task migration, but it is challenging: heterogeneity of the architecture, NUMA, message passing, semi-centralized architecture
Conclusion & perspectives
Communication synthesis in multi-tile platforms
• Formalization of multi-tile communications
• Introduction of HW paths
• Development of communication driver library
• Automatic selection and configuration of drivers
What is actually implemented may differ from what was decided
• HAL constraints
Communication is the basis for task migration in a message-passing system