DAL: Programming Efficient and Fault-Tolerant Applications for Many-Core Systems
Iuliana Bacivarov1, Ikbel Belaid2, Andrea Biagioni3, Ashraf El Antably2, Nicolas Fournel2, Ottorino Frezza3, Jovana Jovic4, Rainer Leupers4, Francesca Lo Cicero3, Alessandro Lonardo3, Luis Murillo4, Pier Stanislao Paolucci3, Devendra Rai1, Davide Rossetti3, Frédéric Rousseau2,
Lars Schor1, Christoph Schumacher4, Francesco Simula3, Lothar Thiele1, Laura Tosoratto3, Piero Vicini3, Hoeseok Yang1
References
• L. Schor, I. Bacivarov, H. Yang, L. Thiele: “Worst-Case Temperature Guarantees for Real-Time Applications on Multi-Core Systems”, Proc. Real-Time and Embedded Technology and Applications Symposium (RTAS), Apr. 2012.
• K. Huang, W. Haid, I. Bacivarov, M. Keller, L. Thiele: “Embedding Formal Performance Analysis into the Design Cycle of MPSoCs for Real-time Multimedia Applications”, ACM Transactions on Embedded Computing Systems, 2012.
• A. Chagoya-Garzon, N. Poste, F. Rousseau: “Semi-Automation of Configuration Files Generation for Heterogeneous Multi-Tile Systems”, Proc. Computer Software and Application Conference (COMPSAC), July 2011.
• I. Bacivarov, W. Haid, K. Huang, L. Thiele: “Methods and Tools for Mapping Process Networks onto Multi-Processor Systems-On-Chip”, Handbook of Signal Processing Systems, Springer, pages 1007-1040, Oct. 2010.
• C. Schumacher, R. Leupers, D. Petras and A. Hoffmann: “parSC: Synchronous Parallel SystemC Simulation on Multi-Core Host Architectures”, Proc. Int’l Conf. on Hardware/Software Codesign and System Synthesis (CODES/ISSS), Oct. 2010.
• R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, P.S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto, P. Vicini: “apeNET+: High Bandwidth 3D Torus Direct Network for PetaFLOPS Scale Commodity Clusters”, Proc. Int’l Conf. on Computing in High Energy and Nuclear Physics (CHEP), Oct. 2010.
• Stefan Bleuler, Marco Laumanns, Lothar Thiele, Eckart Zitzler: “PISA - A Platform and Programming Language Independent Interface for Search Algorithms”, Evolutionary Multi-Criterion Optimization (EMO), Apr. 2003.
Vision
• Effective Many-Tile System-Level Programming Environment
• Fault-Tolerance at System-Level
Application Specification
• Hierarchical model of computation
• FSM controls multiple concurrent Kahn process networks
• Events causing scenario transition
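The scenario model above can be illustrated with a small sketch: a finite state machine whose states are scenarios, where each scenario names the process networks that run concurrently in it. The transition table, the application names (`app_a`, `app_b`), and the helper `transition_actions` are hypothetical; the poster does not state which event triggers which transition.

```python
from dataclasses import dataclass, field

# Hypothetical: which process networks are active in each scenario.
ACTIVE = {
    "Scenario 1": {"app_a"},
    "Scenario 2": {"app_a", "app_b"},
    "Scenario 3": {"app_b"},
}

# Hypothetical transition table: (scenario, event) -> next scenario.
TRANSITIONS = {
    ("Scenario 1", "e1"): "Scenario 2",
    ("Scenario 2", "e3"): "Scenario 3",
    ("Scenario 3", "e7"): "Scenario 1",
}


@dataclass
class ScenarioFSM:
    """FSM whose states are scenarios; events trigger scenario transitions."""
    current: str
    transitions: dict = field(default_factory=dict)

    def fire(self, event):
        """Apply an event; events with no matching transition are ignored."""
        self.current = self.transitions.get((self.current, event), self.current)
        return self.current


def transition_actions(old, new):
    """Return (networks to start, networks to stop) for a scenario change."""
    started = sorted(ACTIVE[new] - ACTIVE[old])
    stopped = sorted(ACTIVE[old] - ACTIVE[new])
    return started, stopped


fsm = ScenarioFSM("Scenario 1", TRANSITIONS)
trace = [fsm.fire(e) for e in ["e1", "e7", "e3", "e7"]]
print(trace)
print(transition_actions("Scenario 1", "Scenario 2"))
```

Keeping the set of active networks per scenario separate from the transition table means a scenario change reduces to a set difference: networks to start and networks to stop.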
Contact
1 Computer Engineering and Networks Laboratory, ETH Zurich, Switzerland, email: [email protected]
2 System Level Synthesis Group, Laboratoire TIMA, France, email: [email protected]
3 INFN Roma, Italy, email: [email protected]
4 Institute for Comm. Technologies and Embedded Systems, RWTH-Aachen University, Germany, email: [email protected]
EURETILE: http://euretile.roma1.infn.it
DAL: http://www.tik.ee.ethz.ch/~euretile
Platform Specification
• Scalable many-tile platform
• Hierarchical communication infrastructure:
  First level (cortical columns): instruction-level parallelism, intra-process parallelism
  Second level (cortical areas): process-network parallelism
  Third level (neo-cortex): concurrent applications
Run-Time Environment
• Hierarchically centralized system of controllers:
  Many-core architecture divided into several clusters
  Each cluster has a single (local) cluster controller
  Cluster controllers are under the control of a main controller
• Cluster controller:
  Receives events from running applications
  Produces commands to the distributed system that lead to pausing/stopping/starting of tasks and queues
Hardware-dependent Software (HdS)
• Applications and run-time environment are independent of the target platform
• HdS: software stack that abstracts the hardware from the application code:
  Operating system (OS)
  Communication primitives (middleware)
  Hardware abstraction layer (HAL)
• Multiprocessor targets:
  Intel x86-based HPC platform
  RISC (IRISC-based) embedded platform
Virtual EURETILE Platform (VEP)
• Virtual platform scalable to many simulated tiles/cores
• Non-intrusive controllability and visibility, making debugging and programming easier than on real hardware
• Fault injection and many-tile concurrency debug frameworks
• Two simulation modes: fast host-compiled and accurate ISS-based
EURETILE HPC platform
• Based on QUonG (Quantum chromodynamics ON GPU)
• PC mesh based on Intel multi-core CPUs, accelerated with high-end GPUs and interconnected via a 3D torus network
• Communication over a custom interconnect (APEnet+: DNPs on an FPGA-based PCI Express card)
Software-programmable accelerators in the form of ASIPs
[Figure: application scenario FSM with states Scenario 1, Scenario 2, and Scenario 3. Transition labels: e1, e2; e3; e4; e5, e6; e7; e8, e9.]
Design Space Exploration
• Each “application scenario” is linked to one set of mappings
• Mapping optimization using PISA and EXPO
• MOEA (multi-objective evolutionary algorithm) module to compute the Pareto front of optimal mappings
• Performance analysis using MPA (modular performance analysis) framework to provide real-time behavior and temperature guarantees while optimizing average power consumption, data throughput, and latency
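To make the notion of a Pareto front of mappings concrete, here is a minimal sketch that filters a handful of candidate mappings by dominance over two objectives that are both minimized (power and latency). It is a brute-force filter for illustration only, not the evolutionary search performed by the MOEA module; the `Mapping` type, the candidate names, and all numbers are made up.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical candidate mapping with two objectives, both minimized."""
    name: str
    power: float
    latency: float


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    no_worse = a.power <= b.power and a.latency <= b.latency
    better = a.power < b.power or a.latency < b.latency
    return no_worse and better


def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    Mapping("m1", power=2.0, latency=9.0),
    Mapping("m2", power=3.0, latency=5.0),
    Mapping("m3", power=4.0, latency=6.0),  # worse than m2 in both objectives
    Mapping("m4", power=6.0, latency=2.0),
]
front = pareto_front(candidates)
print([m.name for m in front])
```

An objective that is maximized, such as data throughput, can be folded into the same scheme by negating it before the comparison.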
[Figure: design space exploration flow. Labels: EXPO (mapping generation and variation); candidate mapping; modular performance analysis (MPA); Analysis; MOEA (evolutionary algorithm); Pareto optimal solutions; Mapping 1, Mapping 2, Mapping 3.]
Controller Mechanism
[Figure: scenario FSM (Scen. 1, Scen. 2, Scen. 3; transition labels e1, e2; e3; e4; e5, e6; e7; e8, e9) and controller. Labels: events; dynamic selection of mapping; fault-tolerance (dynamic remapping); virtual mapping; physical mapping; commands for state transactions.]
[Figure: vision overview. Labels: many-tile hardware (DNP, MEM, ASIP, DSP, RISC); “cortical columns”, “cortical areas”, “neo-cortex”; dynamic applications; dynamic mapping; high-temperature; fault. Parallelism: instruction-level, task-level, application-level. Model of Computation (MoCs): sequential; parallel; dynamic, concurrent, scalable.]
Conservative Analysis
• Providing reliability guarantees on mappings at design time (e.g., maximum temperature)
Over-Provisioning
• For non-critical applications
• Empty (disengaged) tiles and clusters
• In case of a fault: restart application on an empty tile or cluster
Task Duplication
• For safety-critical applications
• Distinction between pure computation processes and I/O processes
• Duplication of computation processes
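The restart strategy listed above (on a fault, restart the application on an empty tile or cluster) can be sketched as a remapping step over a table from applications to tiles. The function name `restart_on_spare`, the tile names, and the choice of always taking the first spare are assumptions for illustration, not the behavior of the actual run-time environment.

```python
def restart_on_spare(mapping, spares, faulty_tile):
    """Move every application on faulty_tile to the first spare tile.

    mapping: dict application -> tile; spares: list of empty tiles.
    Returns (new mapping, remaining spares). The inputs are not modified.
    """
    victims = [app for app, tile in mapping.items() if tile == faulty_tile]
    if not victims:
        # Nothing was running on the faulty tile, so nothing is restarted.
        return dict(mapping), list(spares)
    if not spares:
        raise RuntimeError("no empty tile left to restart on")
    remaining = list(spares)
    target = remaining.pop(0)
    new_mapping = {app: (target if tile == faulty_tile else tile)
                   for app, tile in mapping.items()}
    return new_mapping, remaining


mapping = {"app_a": "tile0", "app_b": "tile1"}
new_mapping, remaining = restart_on_spare(mapping, ["tile2", "tile3"], "tile0")
print(new_mapping, remaining)
```

Each restart consumes one spare, so the number of disengaged tiles bounds how many faults this scheme can absorb.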
[Figure labels: system-level performance analysis; model calibration; fault; main controller; cluster controller.]
Supported commands:
Stop application
Start application
Pause application
Resume application
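One way to read the four supported commands is as transitions of a per-application state, issued by a main controller and executed by the cluster controller that hosts the application. The sketch below follows that reading; the state names (`stopped`, `running`, `paused`), the class names, and the rule that an invalid command is rejected are assumptions, not part of the poster.

```python
from enum import Enum


class Command(Enum):
    START = "start"
    STOP = "stop"
    PAUSE = "pause"
    RESUME = "resume"


# Assumed state machine: (command, current state) -> next state.
NEXT_STATE = {
    (Command.START, "stopped"): "running",
    (Command.PAUSE, "running"): "paused",
    (Command.RESUME, "paused"): "running",
    (Command.STOP, "running"): "stopped",
    (Command.STOP, "paused"): "stopped",
}


class ClusterController:
    """Local controller tracking the state of the applications in its cluster."""

    def __init__(self, apps):
        self.state = {app: "stopped" for app in apps}

    def execute(self, command, app):
        """Apply a command; return False if it is invalid in the current state."""
        key = (command, self.state[app])
        if key not in NEXT_STATE:
            return False
        self.state[app] = NEXT_STATE[key]
        return True


class MainController:
    """Top-level controller that forwards commands to the hosting cluster."""

    def __init__(self, clusters):
        self.clusters = clusters  # cluster name -> ClusterController

    def dispatch(self, command, app):
        for controller in self.clusters.values():
            if app in controller.state:
                return controller.execute(command, app)
        raise KeyError(app)


main = MainController({
    "cluster0": ClusterController(["app_a"]),
    "cluster1": ClusterController(["app_b"]),
})
results = [
    main.dispatch(Command.START, "app_a"),
    main.dispatch(Command.PAUSE, "app_a"),
    main.dispatch(Command.START, "app_a"),   # invalid while paused
    main.dispatch(Command.RESUME, "app_a"),
]
print(results, main.clusters["cluster0"].state)
```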
System Structure
[Figure: software stack. Labels: application; HdS API; HdS; DNA-OS; HAL; HAL API; APEnet+.]
DNA-OS
• Component-based operating system
• Provides high-level mechanisms such as:
  Threads (based on POSIX.1-2001 API standard)
  Semaphores
  Dynamic memory (malloc and free)
  Inputs/outputs management
• Low memory footprint
• Minimal impact on the overall performance
[Figure: Virtual EURETILE Platform. Labels: Tile (COM, Mem., Periph., Bus, DNP); Host-Compiled HAL (AED) or Target ISS (IRISC); Link Probe; four tiles connected through DNPs; Many-tile Debug Framework; Fault Injection Framework; VEP-EX; deadlock, lost packet…; Link down, CPU fault…]
[Figure: mapping example for Scenario 1. Labels: P 1, P 2, P 3; C 1, C 2; RISC, DSP, BUS; b risc, b dsp, b bus; a 1, a 3, a 1 1, a 2 3, a 1 2, a 2 2.]