Time-Aware Correct-By-Construction Systems Design
Time-Aware Correct-By-Construction Systems Design SCS Seminar, December 4, 2014
David Broman Associate Professor, KTH Royal Institute of Technology
Assistant Research Engineer, University of California, Berkeley
Part I Time-Aware Systems Design: a Vision
Part II Predictable Processors for Mixed-Criticality Systems
David Broman [email protected]
Agenda
Part I Time-Aware Systems Design: a Vision
Part II Predictable Processors for Mixed-Criticality Systems
Part I
Time-Aware Systems Design: a Vision
What is a Time-Aware System?
Time-aware systems are systems where time or timing affects the correctness of the system behavior.
Real-time systems are time-aware. For instance, execution of tasks must finish within certain deadlines.
Simulation systems can be time-aware, but are not necessarily real-time.
Cyber-Physical Systems are time-aware and real-time. Emphasis on networks and the interaction between cyber and physical.
Distributed systems can be time-aware without being CPS.
Time-Aware Systems - Examples
Cyber-Physical Systems (CPS)
• Aircraft
• Automotive
• Process industry and industrial automation
Time-Aware Simulation Systems
• Physical simulations (Simulink, Modelica, etc.)
Time-Aware Distributed Systems
• Time-stamped distributed systems (e.g., Google Spanner)
Time-Aware Systems Design
[Figure: a system, consisting of a physical system (the plant) and a cyber system (embedded computation + networking) connected through sensors and actuators, is turned into a model by modeling. The model combines an equation-based model of the physical plants with platforms, a network, computations, and delays expressed in various models of computation (MoC), and supports simulation with timing properties.]
Time-Aware Systems Design
[Figure: the system and model diagram from the previous slide (physical plant, sensors, actuators, platforms, network, computations, and delays), now also showing the steps back from model to system: physical prototyping and compiling/synthesizing.]
Challenge: Compile/synthesize the model’s cyber part, such that the simulated model and the behavior of the real system coincide. The main challenge is to guarantee correct timing behavior.
Model fidelity problem
“Ensuring that the model accurately represents the real system”
What is our goal?
“Everything should be made as simple as possible, but not simpler” (attributed to Albert Einstein)
Execution time should be as short as possible, but not shorter
[Figure: timeline showing a task, its deadline, and the slack between the end of the task and the deadline.]
No point in making the execution time shorter, as long as the deadline is met.
Minimize the slack
Objective: Minimize area, memory, energy.
Challenge: Still guarantee to meet all timing constraints.
A Story…
Fly-by-wire technology controlled by software.
Safety critical → Rigorous validation and certification
Success?
They have to purchase and store microprocessors for at least 50 years of production and maintenance…
Why?
Apparently, the software does not specify the behaviour that has been validated and certified!
Programming Model and Time
Timing is not part of the software semantics
Correct execution of programs (e.g., in C, C++, C#, Java, Scala, Haskell, OCaml) has nothing to do with how long things take to execute.
Traditional Approach
• Programming model
• Timing is dependent on the hardware platform
Our Objective
• Programming model
• Make time an abstraction within the programming model
• Timing is independent of the hardware platform (within certain constraints)
Time-Aware Tool Chain Vision
Modeling Languages
Modelyze (Broman and Siek, 2012)
Ptolemy II (Eker et al., 2003)
Simulink/Stateflow (MathWorks)
Modelica (Modelica Association)
Programming Languages
Real-Time Euclid (Kligerman & Stoyenko, 1986)
Real-time Concurrent C (Gehani and Ramamritham, 1991)
Assembly Languages
The assembly languages for today's processors lack the notion of time
PRET Machines at UC Berkeley (see Part II)
Giotto and E machine (Henzinger et al., 2003)
Time-Aware Tool Chain Vision
Programming Languages
Timed C (work-in-progress: C extended with timing constructs)
Difficult to compute WCET (e.g., determine loop bounds and infeasible paths)
Assembly Languages
PRET ISA
Time-Aware Tool Chain Vision
Programming Languages
Timed C (work-in-progress: C extended with timing constructs)
PRETIL
- Abstracting away memory hierarchy (scratchpad, DRAM, etc.)
- Expose timing constructs
Our current work-in-progress is an extension to LLVM
Assembly Languages
PRET ISA
Time-Aware Tool Chain Vision
Programming Languages
Timed C (work-in-progress: C extended with timing constructs)
PRETIL
- Abstracting away memory hierarchy (scratchpad, DRAM, etc.)
- Expose timing constructs
Time-Aware Compilation
Assembly Languages
PRET ISA
Other (non-PRET) ISA
Time-Aware Systems Design
Research Objective: Develop methodologies, algorithms, and a time-aware tool chain that change the way we develop these kinds of systems using a correct-by-construction approach.
Area 1: Programming Languages and APIs with timing constraints
- Modelyze
- Timed C
- Functional Mock-up Interface (FMI)
Area 2: Time-aware compilation/synthesis
- LLVM-based time-aware compiler
- WCET analysis
Area 3: Predictable Architectures and Clock Synchronization
- PRET processors
- Clock Synchronization
[Figure: a modeling/programming language is input to the time-aware tool chain (compilation/synthesis), which targets platforms connected by clock synchronization.]
Part II
Predictable Processors for Mixed-Criticality Systems
* This part highlights key aspects of two papers that will appear in RTAS 2014 (April 15-17, Berlin), authored by the following persons:
Michael Zimmer David Broman Chris Shaver Edward A. Lee
Yooseong Kim David Broman Jian Cai Aviral Shrivastava
Modern Systems with Many Processor Platforms
Aerospace
Modern aircraft have many computer-controlled systems
• Engine control
• Electric power control
• Radar system
• Navigation system
• Flight control
• Environmental control system, etc.
Automotive
Modern cars have many ECUs (Electronic Control Units)
• Airbag control
• Door control
• Electric power steering control
• Power train control
• Speed control
• Battery management, etc.
Over 80 ECUs in a high-end model (Albert and Jones, 2010)
Mixed-Criticality Systems
Federated Approach: Each processor has its own task
[Figure legend: Task, Processor Platform]
Issues with too many processors
• High cost
• Space and weight
• Energy consumption
Consolidate into fewer processors
Required for Safety
• Spatial isolation between tasks
• Temporal isolation between tasks (necessary to meet deadlines)
Mixed-Criticality Systems
Consolidating into fewer processors requires spatial and temporal isolation between tasks for safety…
…but such safety requirements are only needed for highly critical tasks
Mixed-Criticality Challenge
Reconcile the conflicting requirements of:
• Partitioning (for safety)
• Sharing (for efficient resource usage)
(Burns & Davis, 2013)
Our solution
FlexPRET Softcore
Fine-grained multithreaded processor platform (thread-interleaved) implemented on an FPGA
Flexible schedule (1 to 8 active threads) and scheduling frequency (1, 1/2, 2/3, 1/4, 1/8, etc.)
Hard real-time threads (HRTT) with predictable timing behavior
• Thread-interleaved pipeline (no pipeline hazards)
• Scratchpad memory instead of cache
Soft real-time threads (SRTT) with cycle stealing from HRTT
WCET-Aware Scratchpad Memory (SPM) Management
Automatic DMA transfer of code to SPM
Optimal mapping for minimizing WCET
Related Work
Software Scheduling for Mixed Criticality
• Reservation-based partitioning, ARINC 653
• First priority-based MC (Vestal, 2007)
• Sporadic task scheduling (Baruah and Vestal, 2008)
• Slack scheduling (Niz et al., 2009)
• Review of MC area, 168 references (Burns & Davis, 2013)
WCET Analysis
• WCET-aware compiler (Falk & Lokuciejewski, 2010)
• Detection of loop bounds and infeasible paths (Gustafsson et al., 2006)
• Cache analysis (Ferdinand & Wilhelm, 1999)
• WCET Survey (Wilhelm et al., 2008)
Predictable and Multithreaded Processors
• PRET idea (Edwards and Lee, 2007)
• PTARM (Liu et al., 2012)
• Patmos (Schoeberl et al., 2011)
• JOP (Schoeberl, 2008)
• XMOS X1 (May, 2009)
• MERASA, MC on multicore (Ungerer, 2010)
Scratchpad Memory Management
• Average case SPM methods for SMM (Bai et al., 2013; Jung et al., 2010; Pabalkar et al., 2008; Baker et al., 2010)
• Static SPM WCET methods (Keinaorge 2008, Platzar 2012)
• SPM management at basic block level (Puaut & Pais, 2007)
Several EU projects related to Mixed-Criticality: MultiPARTES, Recomp, CERTAINTY, Proxima,…
Flexible Scheduling with Cycle Stealing
• FlexPRET allows arbitrary interleaving
• Soft real-time threads (SRTT) can steal cycles from hard real-time threads (HRTT)
[Figure: example execution (read top to bottom, left to right); legend: HRTT, SRTT]
Task A (hard): frequency 2/4 = 1/2
Task B (hard): frequency 1/4
Task C (soft): frequency 1/4 + cycle stealing
When task B finishes, its cycles are used by task C (soft thread)
Tasks A and B are temporally isolated
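The interleaving described on this slide can be imitated with a small host-side simulation. The sketch below is not FlexPRET's hardware scheduler: the `thread_t` struct, the slot table, and the functions `schedule_cycle` and `simulate` are hypothetical names chosen for illustration. It assumes a repeating four-slot schedule (A, B, A, C) and that a slot whose owner has no work left is handed to the first soft thread that still has work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread descriptor for a host-side simulation. */
typedef struct {
    char name;       /* label used in the trace */
    bool soft;       /* true for an SRTT, false for an HRTT */
    int  remaining;  /* cycles of work left; 0 means finished */
} thread_t;

/* Decide which thread executes in one cycle.
 * slot_owner is the index of the thread that owns this slot in the
 * static schedule. Returns the index of the thread that ran, or -1
 * if the cycle stays idle. */
int schedule_cycle(thread_t *threads, size_t n, size_t slot_owner)
{
    size_t run = slot_owner;
    if (threads[run].remaining == 0) {
        /* The owner has nothing to do: a soft thread may steal the cycle. */
        run = n;
        for (size_t i = 0; i < n; i++) {
            if (threads[i].soft && threads[i].remaining > 0) {
                run = i;
                break;
            }
        }
        if (run == n) {
            return -1;
        }
    }
    threads[run].remaining--;
    return (int)run;
}

/* Write the name of the thread that ran in each cycle into trace
 * ('-' for an idle cycle). trace must hold cycles + 1 characters. */
void simulate(thread_t *threads, size_t n,
              const size_t *slots, size_t n_slots,
              char *trace, size_t cycles)
{
    for (size_t c = 0; c < cycles; c++) {
        int r = schedule_cycle(threads, n, slots[c % n_slots]);
        trace[c] = (r < 0) ? '-' : threads[r].name;
    }
    trace[cycles] = '\0';
}
```

With A (hard, 6 cycles of work), B (hard, 1 cycle), and C (soft, 5 cycles), eight simulated cycles give the trace `ABACACAC`: the second B slot is used by C once B has finished, while A keeps every one of its own slots, which is the temporal isolation the slide refers to.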
C level programming using real-time
D. Timing Instructions
New timing instructions augment the RISC-V ISA for expressing real-time semantics. In contrast to previous PRET architectures supporting timing instructions [14], [18], [21], our design is targeted for mixed-critical systems.
The FlexPRET processor contains an internal clock that counts the number of elapsed nanoseconds since the processor was booted. The current time is stored in a 64-bit register, meaning that the processor can be active for 584 years without the clock counter wrapping around. Two new instructions can be used to get the current time: get time high GTH r1 and get time low GTL r2 store the higher and lower 32 bits in register r1 and r2, respectively. When GTL is executed, the processor stores internally the higher 32 bits of the clock and then returns this stored value when executing GTH. As a consequence, executing GTL followed by GTH is atomic, as long as the instruction order is preserved.
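The 584-year figure and the GTL-then-GTH ordering can be checked with a short host-side sketch. This is a software model under stated assumptions, not the hardware implementation: `sim_clock_t`, `sim_gtl`, `sim_gth`, `combine_time`, and `years_until_wrap` are hypothetical names, and a year is taken to be 365 days.

```c
#include <stdint.h>

/* Hypothetical software model of the 64-bit nanosecond clock. */
typedef struct {
    uint64_t now_ns;        /* free-running nanosecond counter */
    uint32_t latched_high;  /* upper half captured by the last GTL */
} sim_clock_t;

/* Model of GTL: return the lower 32 bits and latch the upper 32 bits. */
uint32_t sim_gtl(sim_clock_t *c)
{
    c->latched_high = (uint32_t)(c->now_ns >> 32);
    return (uint32_t)c->now_ns;
}

/* Model of GTH: return the upper half latched by the preceding GTL,
 * so the pair is consistent even if the clock advanced in between. */
uint32_t sim_gth(const sim_clock_t *c)
{
    return c->latched_high;
}

/* Join the two 32-bit halves into one 64-bit nanosecond value. */
uint64_t combine_time(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Whole years before a 64-bit nanosecond counter wraps around,
 * assuming 365-day years. */
uint64_t years_until_wrap(void)
{
    const uint64_t ns_per_year = 365ULL * 24 * 60 * 60 * 1000000000ULL;
    return UINT64_MAX / ns_per_year;
}
```

In this model `years_until_wrap()` evaluates to 584, matching the figure in the text. The latch matters when the low half is about to overflow: reading the low word first and the latched high word second never pairs an old low half with a newer high half.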
To provide a lower bound on the execution time for a code fragment, the RISC-V ISA is extended with a delay until instruction DU r1,r2, where r1 is the higher 32 bits and r2 is the lower 32 bits of an absolute time value. Semantically, the thread is delayed (replays this instruction) until the current time becomes greater than or equal to the time value specified by r1 and r2. However, in contrast to previous processors supporting timing instructions (e.g., PTARM [14], [18]), the clock cycles are not wasted, but can instead be utilized for other SRTTs.
To provide an upper bound on execution time without constantly polling, a task needs to be interrupted. Instruction exception on expire EE r1,r2 enables a timer exception that is executed when the current time exceeds r1,r2. The jump address is specified by setting a control register with MTPCR (move to program control register). Only one exception per thread can be active at any point in time; nested exceptions must be implemented in software. The instruction deactivate exception on expire DE deactivates the timer exception.
Exception on expire can be used for many purposes, such as detecting and handling a deadline miss, implementing a preemptive scheduler, or performing timed I/O. By first issuing an exception on expire and then executing a new thread sleep TS instruction, the clock cycles for the sleeping thread can be utilized by other active SRTTs. Another use of exception on expire is for anytime algorithms, that is, algorithms that can be interrupted at any point in time and that return a better solution the longer they are executed.
E. Memory Hierarchy
For spatial isolation between threads, FlexPRET allows threads to read anywhere in memory, but only write to certain regions. The regions are specified by control registers that can only be set by a thread in supervisory mode with MTPCR. Virtual memory is a standard and suitable approach, but FlexPRET currently uses a different scheme for simplicity. There is one control register for the upper address of a shared region (which starts at the bottom of data memory) and two control registers per thread for the lower and upper addresses of a thread-specific region. Memory is divided into 1kB regions, and a write only succeeds if the address is within the shared or thread-specific region. By specifying all thread-specific regions and the shared region to be disjoint, each thread will have both private memory and access to shared memory.
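The write check described above can be sketched in a few lines. This is an illustrative model only: the struct `write_regions_t` and the function `write_allowed` are hypothetical, bounds are expressed as 1 kB region indices, and upper bounds are treated as exclusive, which the text does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

#define REGION_BYTES 1024u  /* memory is divided into 1 kB regions */

/* Hypothetical model of the protection registers seen by one thread.
 * All bounds are region indices; upper bounds are exclusive
 * (an assumption of this sketch). */
typedef struct {
    uint32_t shared_upper;  /* shared region is [0, shared_upper) */
    uint32_t thread_lower;  /* thread region is [thread_lower, thread_upper) */
    uint32_t thread_upper;
} write_regions_t;

/* Return true if a write to byte offset addr (measured from the bottom
 * of data memory) falls in the shared or the thread-specific region. */
bool write_allowed(const write_regions_t *r, uint32_t addr)
{
    uint32_t region = addr / REGION_BYTES;
    bool in_shared = region < r->shared_upper;
    bool in_thread = region >= r->thread_lower && region < r->thread_upper;
    return in_shared || in_thread;
}
```

For example, with a one-region shared area and a thread region covering regions 2 and 3, writes to offsets 0 and 2048 are accepted while a write to offset 1024 is rejected, because region 1 belongs to neither area (it could be another thread's private region).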
For timing predictability, FlexPRET uses scratchpad memories [22]. These are local memories that have a separate address space from main memory and are explicitly controlled by software; all valid memory accesses always succeed and are single cycle, unlike caches where execution time depends on cache state. There is active research in scratchpad memory management techniques to reduce WCET [23]. Instructions are stored in instruction scratchpad memory (I-SPM) and data is stored separately in data scratchpad memory (D-SPM). Scratchpad memories are not required; caches could be used instead if the reduction in fine-grained predictability is acceptable. We envision a hybrid approach where HRTTs use scratchpads and SRTTs use caches for future versions of FlexPRET.
F. Programming, Compilation, and Timing Analysis
FlexPRET can be programmed using low-level programming languages, such as C, that are augmented with constructs for expressing temporal semantics. FlexPRET can be an integral part of a precision timed infrastructure [24] that includes languages and compilers with a ubiquitous notion of time. Such a complete infrastructure with timing-aware compilers is outside the scope of this paper; instead, we use a RISC-V port of the gcc compiler and implement the new timing instructions using inline assembly. The following code fragment illustrates how a simple periodic control loop can be implemented.
    1 int h,l;                    // High and low 32-bit values
    2 get_time(h,l);              // Current time in nanoseconds
    3 while(1){                   // Repeat control loop forever
    4   add_ms(h,l,10);           // Add 10 milliseconds
    5   exception_on_expire(h,l,missed_deadline_handler);
    6   compute_task();           // Sense, compute, and actuate
    7   deactivate_exception();   // Deadline met
    8   delay_until(h,l);         // Delay until next period
    9 }
Before the control loop is executed, the current time (in nanoseconds) is stored in variables h and l (line 2). The time is incremented by 10 ms (line 4) and a timer exception is enabled (line 5), followed by task execution (line 6). If a deadline is missed, an exception handler missed_deadline_handler is called. To force a lower bound on the timing loop, the execution is delayed until the time period has elapsed (line 8); the cycles during the delay can be used by an active SRTT. Functions get_time, exception_on_expire, deactivate_exception, and delay_until implement the new RISC-V timing instructions using inline assembly.
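The behavior of this periodic loop can be explored on an ordinary host with a simulated clock. The sketch below is not the paper's macro implementation (which uses inline assembly on FlexPRET): `add_ms_split` and `count_missed_deadlines` are hypothetical helpers, the clock starts at zero, and a deadline counts as missed when the simulated time exceeds it, mirroring the "exceeds" wording used for exception on expire.

```c
#include <stddef.h>
#include <stdint.h>

#define NS_PER_MS 1000000u

/* Add ms milliseconds to a 64-bit nanosecond time that is held as two
 * 32-bit halves, propagating the carry from the low to the high half. */
void add_ms_split(uint32_t *high, uint32_t *low, uint32_t ms)
{
    uint64_t t = ((uint64_t)*high << 32) | *low;
    t += (uint64_t)ms * NS_PER_MS;
    *high = (uint32_t)(t >> 32);
    *low = (uint32_t)t;
}

/* Simulate n iterations of the periodic loop. exec_ns[i] is the time
 * the task takes in iteration i. Returns the number of missed deadlines
 * and stores the simulated time after the last iteration in *end_ns. */
int count_missed_deadlines(const uint64_t *exec_ns, size_t n,
                           uint32_t period_ms, uint64_t *end_ns)
{
    uint32_t h = 0, l = 0;   /* next deadline, split as in the listing */
    uint64_t now = 0;        /* simulated current time in nanoseconds */
    int missed = 0;

    for (size_t i = 0; i < n; i++) {
        add_ms_split(&h, &l, period_ms);
        uint64_t deadline = ((uint64_t)h << 32) | l;
        now += exec_ns[i];                /* compute_task() */
        if (now > deadline) {
            missed++;                     /* timer exception would fire */
        } else {
            now = deadline;               /* delay_until(h, l) */
        }
    }
    *end_ns = now;
    return missed;
}
```

With a 10 ms period and execution times of 4 ms, 12 ms, and 3 ms, the second iteration overruns its deadline (it ends at 22 ms against a 20 ms deadline), so one miss is counted; the third iteration still finishes before 30 ms and the loop is back on its original period grid.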
To have full control over timing, real-time applications can be implemented as bare-metal software, using only lightweight libraries for hardware interaction. As a scheduling design methodology, we propose that tasks with the highest criticality level (e.g. A in DO-178C [4]) are assigned individual HRTTs, thus providing both temporal and spatial isolation. The next-highest criticality level tasks (e.g. B in DO-178C) also use HRTTs, but several tasks can share the same thread, thus reducing the hardware enforced isolation. Lower criticality tasks (e.g. C, D, E in DO-178C) can then share SRTTs
• Currently using a GCC port for RISC-V when compiling programs with C inline assembly macros.
• Work-in-progress on an LLVM-based WCET-aware compiler
1-2: Get time in nanoseconds (64 bits)
5: Add an exception handler (immediate detection of missed deadline)
6: Compute
7-8: Deactivate and delay (force lower bound)
NOTE: The delay until (DU) instruction is used for cycle stealing
Software Managed Multicores
WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores
Yooseong Kim*†, David Broman*‡, Jian Cai†, and Aviral Shrivastava*†
* University of California, Berkeley, {yooseongkim, davbr, aviral}@berkeley.edu
† Arizona State University, {yooseong.kim, jian.cai, aviral.shrivastava}@asu.edu
‡ Linköping University, [email protected]
Abstract: Software Managed Multicore (SMM) architectures have advantageous scalability, power efficiency, and predictability characteristics, making SMM particularly promising for real-time systems. In SMM architectures, each core can only access its scratchpad memory (SPM); any access to main memory is done explicitly by DMA instructions. As a consequence, dynamic code management techniques are essential for loading program code from the main memory to SPM. Current state-of-the-art dynamic code management techniques for SMM architectures are, however, optimized for average-case execution time, not worst-case execution time (WCET), which is vital for hard real-time systems. In this paper, we present two novel WCET-aware dynamic SPM code management techniques for SMM architectures. The first technique is optimal and based on integer linear programming (ILP), whereas the second technique is a heuristic that is sub-optimal, but scalable. Experimental results with benchmarks from the Mälardalen WCET suite and the MiBench suite show that our ILP solution can reduce the WCET estimates by up to 80% compared to previous techniques. Furthermore, our heuristic can, for most benchmarks, find the same optimal mappings within one second on a 2GHz dual core machine.
I. INTRODUCTION
In real-time [1] and cyber-physical [2] systems, timing is a correctness criterion, not just a performance factor. Execution of program tasks must be completed within certain timing constraints, often referred to as deadlines. When real-time systems are used in safety-critical applications, such as automobiles or aircraft, missing a deadline can cause devastating, life-threatening consequences. Computing safe upper bounds of a task’s worst-case execution time (WCET) is essential to guarantee the absence of missed deadlines.
Real-time systems are becoming more and more complex with increasing performance demands. Performance improvements in recent processor designs have mainly been driven by the multicore paradigm because of power and temperature limitations with single-core designs [3]. Some recent real-time systems architectures are moving towards multicore [4] or multithreaded [5], [6] designs. However, coherent caches, which are popular in traditional multicore platforms, are not a good fit for real-time systems. Coherent caches make WCET analysis difficult and result in pessimistic WCET estimates [7].
This work was supported in part by the iCyPhy Research Center (Industrial Cyber-Physical Systems, supported by IBM and United Technologies), the Swedish Research Council (#623-2011-955), and the Center for Hybrid and Embedded Software Systems (CHESS) at UC Berkeley (supported by the National Science Foundation, NSF awards #0720882 (CSR-EHS: PRET), #1035672 (CPS: Medium: Timing Centric Software), and #0931843 (ActionWebs), the Naval Research Laboratory (NRL #N0013-12-1-G015), and the following companies: Bosch, National Instruments, and Toyota).
[Figure: cores with SPMs, DMA units, and main memory; panels (a) and (b).]
Fig. 1. (a) SMM architecture vs. (b) traditional architecture with SPM. Cores cannot access main memory directly in SMM architecture. All code and data must be present in SPM at the time of execution.
SMM (Software Managed Multicore) architectures [8], [9] are a promising alternative for real-time systems. In SMM, each core has a scratchpad memory (SPM), so-called local memory, as shown in Fig. 1(a). A core can only access its SPM in an SMM architecture, as opposed to the traditional architecture in Fig. 1(b) where a core can access both main memory and SPM with different latencies. Accesses to the main memory must be done explicitly through the use of direct memory access (DMA) instructions. The absence of coherency makes such architectures scalable and simpler to design and verify compared to traditional multicore architectures [3]. An example of an SMM architecture is the Cell processor that is used in Playstation 3 [10].
If all code and data of a task can fit in the SPM, the timing model of memory accesses is trivial: each load and store always takes a constant number of clock cycles. However, if not all of the code or data fits in the SPM, it must be dynamically managed by executing DMA instructions at runtime. Dynamic code management strongly affects timing and must consequently be an integral part of WCET analysis.
In traditional architectures that have SPMs, cores can directly access main memory, though it takes a longer time to access main memory than the SPM. In such architectures, the question is what to bring into the SPM to reduce the WCET of a task. This approach is not, however, feasible in SMM architectures because all relevant code must be present in the SPM at the time of execution. For this reason, existing WCET-aware dynamic code management techniques for SPMs [11], [12] (which select part of the code to be loaded in the SPM and keep the rest in the main memory) are not applicable in SMM architectures.
There exists previous work on developing dynamic code
This is the author-prepared accepted version. © 2014 IEEE. The published version is: Yooseong Kim, David Broman, Jian Cai, and Aviral Shrivastava. WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores. Proceedings of the 20th IEEE Real-Time and Embedded Technology and Application Symposium (RTAS), Berlin, Germany, April 15-17, 2014.
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
In FlexPRET, HRTT can only access Scratchpad memory (SPM) directly.
Problem: How can we dynamically load code from the main memory to SPM such that WCET is minimized?
Traditional use of SPM: static allocation (partitioning) and direct access to main memory.
Software Managed Multicore (SMM): cores can only access the SPM; DMA is needed to reach main memory.
Examples: • Cell processor • FlexPRET
WCET-Aware Scratchpad Allocation: main idea
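[Figure: the SPM is divided into regions R1, R2, R3, …, RN. The functions F1, F2, F3, F4, F5, F6, …, FM reside in main memory. A function-to-region mapping assigns functions to regions.]
N: number of regions in SPM.
M: number of functions.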
Task 1: Given a function-to-region mapping, compute the WCET.
Task 2: Find an optimal mapping that minimizes the WCET.
Contribution:
• Formalized an optimal solution using ILP
• Developed a scalable, but sub-optimal heuristic
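The two tasks can be illustrated with a deliberately simplified sketch. It is not the ILP formulation or the heuristic from the paper: it evaluates a single hypothetical sequence of executing functions as a stand-in for the worst-case path, uses an assumed DMA cost model, and searches all mappings by brute force, which only works for a handful of functions. All names (`load_cycles`, `mapping_cost`, `best_mapping`) and all numbers are assumptions for illustration.

```python
from itertools import product


def load_cycles(size_bytes, setup=20, per_byte=1):
    """Assumed DMA cost of loading one function into its region."""
    return setup + per_byte * size_bytes


def mapping_cost(calls, sizes, mapping):
    """Task 1 (simplified): DMA cycles for one execution sequence.

    A function is reloaded whenever its region currently holds a
    different function (or nothing at all).
    """
    resident = {}  # region index -> function currently loaded there
    total = 0
    for func in calls:
        region = mapping[func]
        if resident.get(region) != func:
            total += load_cycles(sizes[func])
            resident[region] = func
    return total


def best_mapping(calls, sizes, num_regions, spm_size):
    """Task 2 (simplified): exhaustive search over all mappings."""
    funcs = sorted(sizes)
    best = None
    for assignment in product(range(num_regions), repeat=len(funcs)):
        mapping = dict(zip(funcs, assignment))
        # Each region must be as large as the largest function mapped to it.
        needed = sum(
            max((sizes[f] for f in funcs if mapping[f] == r), default=0)
            for r in range(num_regions)
        )
        if needed > spm_size:
            continue  # this mapping does not fit in the SPM
        cost = mapping_cost(calls, sizes, mapping)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Hypothetical program: main alternately calls f and g, returning each time.
sizes = {"main": 100, "f": 60, "g": 60}
calls = ["main", "f", "main", "g", "main", "f", "main", "g"]
cost, mapping = best_mapping(calls, sizes, num_regions=2, spm_size=160)
```

In this toy instance the search gives `main` its own region and lets `f` and `g` share the other, so returning to `main` never triggers a reload. The number of mappings grows as N^M, which hints at why an exhaustive search is impractical and a scalable heuristic is useful.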
More info on this topic
Michael Zimmer, David Broman, Chris Shaver, and Edward A. Lee. FlexPRET: A Processor Platform for Mixed-Criticality Systems. Proceedings of the 20th IEEE Real-Time and Embedded Technology and Application Symposium (RTAS), Berlin, Germany, April 15-17, 2014.
Yooseong Kim, David Broman, Jian Cai, and Aviral Shrivastava. WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores. Proceedings of the 20th IEEE Real-Time and Embedded Technology and Application Symposium (RTAS), Berlin, Germany, April 15-17, 2014.
WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores
Yooseong Kim*†, David Broman*‡, Jian Cai†, and Aviral Shrivastava*†
* University of California, Berkeley, {yooseongkim, davbr, aviral}@berkeley.edu
† Arizona State University, {yooseong.kim, jian.cai, aviral.shrivastava}@asu.edu
‡ Linköping University, [email protected]
Abstract: Software Managed Multicore (SMM) architectures have advantageous scalability, power efficiency, and predictability characteristics, making SMM particularly promising for real-time systems. In SMM architectures, each core can only access its scratchpad memory (SPM); any access to main memory is done explicitly by DMA instructions. As a consequence, dynamic code management techniques are essential for loading program code from the main memory to SPM. Current state-of-the-art dynamic code management techniques for SMM architectures are, however, optimized for average-case execution time, not worst-case execution time (WCET), which is vital for hard real-time systems. In this paper, we present two novel WCET-aware dynamic SPM code management techniques for SMM architectures. The first technique is optimal and based on integer linear programming (ILP), whereas the second technique is a heuristic that is sub-optimal, but scalable. Experimental results with benchmarks from Mälardalen WCET suite and MiBench suite show that our ILP solution can reduce the WCET estimates up to 80% compared to previous techniques. Furthermore, our heuristic can, for most benchmarks, find the same optimal mappings within one second on a 2GHz dual core machine.
I. INTRODUCTION
In real-time [1] and cyber-physical [2] systems, timing is a correctness criterion, not just a performance factor. Execution of program tasks must be completed within certain timing constraints, often referred to as deadlines. When real-time systems are used in safety-critical applications, such as automobiles or aircraft, missing a deadline can cause devastating, life-threatening consequences. Computing safe upper bounds of a task’s worst-case execution time (WCET) is essential to guarantee the absence of missed deadlines.
Real-time systems are becoming more and more complex with increasing performance demands. Performance improvements in recent processor designs have mainly been driven by the multicore paradigm because of power and temperature limitations with single-core designs [3]. Some recent real-time systems architectures are moving towards multicore [4] or multithreaded [5], [6] designs. However, coherent caches, which are popular in traditional multicore platforms, are not a good fit for real-time systems. Coherent caches make WCET analysis difficult and result in pessimistic WCET estimates [7].
This work was supported in part by the iCyPhy Research Center (Industrial Cyber-Physical Systems, supported by IBM and United Technologies), the Swedish Research Council (#623-2011-955), and the Center for Hybrid and Embedded Software Systems (CHESS) at UC Berkeley (supported by the National Science Foundation, NSF awards #0720882 (CSR-EHS: PRET), #1035672 (CPS: Medium: Timing Centric Software), and #0931843 (ActionWebs), the Naval Research Laboratory (NRL #N0013-12-1-G015), and the following companies: Bosch, National Instruments, and Toyota).
Fig. 1. (a) SMM architecture vs. (b) traditional architecture with SPM. Cores cannot access main memory directly in SMM architecture. All code and data must be present in SPM at the time of execution.
Conclusions
Some key takeaway points:
• Time-aware systems are systems where time or timing affects the correctness of the system behavior.
• Cyber-physical systems (CPS) are time-aware, but systems without physical plants can also be time-aware (e.g., distributed time-stamped systems).
• Overall objective: Develop a new methodology, algorithms, and a tool chain that are time-aware and use a correct-by-construction approach.
• Mixed-criticality systems can be designed using predictable processors.
Thanks for listening!