Programming Safety-Critical Embedded Systems
![Page 1: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/1.jpg)
Programming Safety-Critical Embedded Systems
Work mainly by Sidharta Andalam and Eugene Yip
Main supervisor: Dr. Partha Roop (UoA)
Advisor: Dr. Alain Girault (INRIA)
![Page 2: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/2.jpg)
Outline
• Introduction
• Synchronous Languages
• PRET-C
• ForeC
![Page 3: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/3.jpg)
![Page 4: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/4.jpg)
Introduction
• Safety-critical systems:
– Perform specific real-time tasks.
– Comply with strict safety standards [IEC 61508, DO-178].
– Time-predictability is useful in real-time designs.
[Figure labels: Embedded Systems; Safety-critical concerns; Timing/Functionality requirements; Timing analysis.]
[Paolieri et al 2011] Towards Functional-Safe Timing-Dependable Real-Time Architectures.
![Page 5: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/5.jpg)
Introduction
[Figure: languages and frameworks arranged by domain of application (embedded to desktop) and by processor type (single-core, multicore, manycore): C, RTOS (VxWorks), UPC, X10, Intel Cilk Plus, SharC, Grace, SHIM, Sigma C, ForkLight, Esterel, SCADE, Simulink, Protothreads, OpenMP, OpenCL, Pthreads, ParC, PRET-C, ForeC.]
![Page 6: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/6.jpg)
![Page 7: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/7.jpg)
Synchronous Languages
• Deterministic concurrency (formal semantics).
– Concurrent control behaviours.
– Typically compiled away.
• Execution model similar to digital circuits.
– Threads execute in lock-step to a global clock.
– Threads communicate via instantaneous signals.
[Figure: timeline of global ticks 1 to 4, with inputs and outputs at each tick.]
[Benveniste et al 2003] The Synchronous Languages 12 Years Later.
![Page 8: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/8.jpg)
Synchronous Languages
[Figure: global ticks mapped onto physical time (1s, 2s, 3s, 4s), marking the time for a tick and the reaction time within it.]
• Time for a tick: specified by the system’s timing requirements.
• Must validate: max(Reaction time) < min(Time for each tick)
[Benveniste et al 2003] The Synchronous Languages 12 Years Later.
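The validation condition above can be sketched as a small check in plain C. This is only an illustration: the function names, the clock-frequency parameter, and the choice of nanoseconds are assumptions, since the slides state only the inequality itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts a worst-case reaction time given in clock cycles to nanoseconds.
 * Rounds up so that the check below stays conservative.
 * Assumption: cycles * 1e9 fits in 64 bits. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_hz)
{
    assert(clock_hz > 0);
    return (cycles * 1000000000ULL + clock_hz - 1) / clock_hz;
}

/* Returns true when max(reaction time) < min(time for each tick). */
bool meets_tick_deadline(uint64_t wcrt_cycles, uint64_t clock_hz,
                         uint64_t min_tick_ns)
{
    return cycles_to_ns(wcrt_cycles, clock_hz) < min_tick_ns;
}
```

For instance, a worst-case reaction of 20,000 cycles on a 100 MHz clock takes 200,000 ns, which fits in a 1 ms tick. The inequality is strict, so a reaction time equal to the tick length is rejected.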
![Page 9: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/9.jpg)
Synchronous Languages
• Esterel, Lustre, Signal
• Synchronous extensions to C:
– PRET-C
– Reactive Shared Variables
– Synchronous C
– Esterel C Language
Retain the essence of C and add deterministic concurrency and thread communication.
[Roop et al 2009] Tight WCRT Analysis of Synchronous C Programs.
[Boussinot 1993] Reactive Shared Variables Based Systems.
[Hanxleden et al 2009] SyncCharts in C - A Proposal for Light-Weight, Deterministic Concurrency.
[Lavagno et al 1999] ECL: A Specification Environment for System-Level Design.
![Page 10: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/10.jpg)
![Page 11: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/11.jpg)
PRET-C
Stages
1. PRET-C: Simple synchronous extension to C (using macros).
2. TCCFG: Intermediate format.
3. TCCFG’: Updated after cache analysis.
4. Model Checking: Binary search for the WCRT.
PRET-C source:
void main() {
    while(1) {
        abort
            PAR(sampler,display);
        when(reset);
        EOT;
    }
}
[Figure: tool flow from PRET-C to TCCFG, then cache analysis and the model checker, with the WCRT as the final output.]
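Stage 4 can be pictured as a binary search over candidate bounds. The sketch below is an assumption about how such a search might be organised, not the PRET-C tool's actual interface. The `wcrt_oracle` callback stands in for a model-checker query ("does every tick finish within `bound` cycles?"), and `example_oracle` is a stand-in used only to exercise the search.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical oracle: returns true when the bound is proven to hold.
 * It must be monotone: if it accepts a bound, it accepts every larger one. */
typedef bool (*wcrt_oracle)(unsigned long bound, void *ctx);

/* Returns the smallest bound in [lo, hi] that the oracle accepts.
 * Precondition: the oracle accepts hi. */
unsigned long search_wcrt(unsigned long lo, unsigned long hi,
                          wcrt_oracle holds, void *ctx)
{
    assert(lo <= hi);
    assert(holds(hi, ctx));
    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (holds(mid, ctx))
            hi = mid;       /* bound holds: the WCRT is at most mid */
        else
            lo = mid + 1;   /* bound violated: the WCRT exceeds mid */
    }
    return lo;
}

/* Stand-in oracle for illustration: pretends the true WCRT is *ctx. */
bool example_oracle(unsigned long bound, void *ctx)
{
    return bound >= *(const unsigned long *)ctx;
}
```

Each iteration costs one oracle query, so the number of model-checker runs grows with the logarithm of the search range.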
![Page 12: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/12.jpg)
PRET-C
• Simple set of synchronous extensions to C:
– Light-weight multi-threading.
– Macro-based implementation.
– Thread-safe shared memory accesses.
– Amenable to timing analysis for ensuring time-predictability.
![Page 13: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/13.jpg)
PRET-C
| Statement | Description |
| --- | --- |
| ReactiveInput I | Declares I as a reactive input coming from the environment. |
| ReactiveOutput O | Declares O as a reactive output emitted to the environment. |
| PAR(T1, ..., Tn) | Synchronously executes threads T1 to Tn in parallel. Thread Ti has higher execution priority than Ti+1. |
| EOT | Marks the end of a tick. |
| [weak] abort P when C | Terminates P when C is true. |

The semantics of PRET-C is presented in structural operational style, along with proofs of reactivity and determinism [IEEE TC 2013 March].
![Page 14: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/14.jpg)
PRET-C
Code
...
PAR(T1,T2)
...
T1: A; EOT; C; EOT
T2: B; EOT; D; EOT
[Figure: execution of T1 and T2 over time. A and B run in the first global tick, C and D in the second; each EOT marks a thread's local tick.]
![Page 15: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/15.jpg)
![Page 16: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/16.jpg)
Introduction
• Safety-critical systems:
– Shift from single-core to multicore processors.
– Cheaper, better power vs. execution performance.
[Figure: multicore architecture with cores Core0 to Coren connected by a system bus to shared resources.]
[Blake et al 2009] A Survey of Multicore Processors.
[Cullmann et al 2010] Predictability Considerations in the Design of Multi-Core Embedded Systems.
![Page 17: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/17.jpg)
Introduction
• Parallel programming:
– From supercomputers to mainstream computers.
– Frameworks designed for systems without resource constraints or safety concerns.
• Optimised for average-case performance (FLOPS), not time-predictability.
– Threaded programming model.
• Pthreads, OpenMP, Intel Cilk Plus, ParC, ...
• Non-deterministic thread interleaving makes understanding and debugging hard.
[Lee 2006] The Problem with Threads.
![Page 18: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/18.jpg)
Introduction
• Parallel programming:
– Programmer responsible for shared resources.
– Concurrency errors:
• Deadlock, race condition, atomicity violation, order violation.
[McDowell et al 1989] Debugging Concurrent Programs.
[Lu et al 2008] Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics.
![Page 19: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/19.jpg)
Introduction
• Synchronous languages
– Esterel, Lustre, Signal
– Synchronous extensions to C:
• PRET-C
• Reactive Shared Variables
• Synchronous C
• Esterel C Language
Sequential execution semantics. Unsuitable for parallel execution.
[Roop et al 2009] Tight WCRT Analysis of Synchronous C Programs.
[Boussinot 1993] Reactive Shared Variables Based Systems.
[Hanxleden et al 2009] SyncCharts in C - A Proposal for Light-Weight, Deterministic Concurrency.
[Lavagno et al 1999] ECL: A Specification Environment for System-Level Design.
![Page 20: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/20.jpg)
Introduction
• Synchronous languages
– Compilation produces sequential programs. Unsuitable for parallel execution.
![Page 21: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/21.jpg)
ForeC
• ForeC (pronounced “foresee”).
• C-based, multi-threaded, synchronous language. Inspired by PRET-C and Esterel.
• Deterministic parallel execution on embedded multicores.
• Fork/join parallelism and shared memory thread communication.
• Program behaviour independent of chosen thread scheduling.
![Page 22: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/22.jpg)
ForeC
[Figure: ForeC tool flow across the programming, compilation, and timing analysis phases. Labels: ForeC source code, thread distribution, CCFG, static scheduling, compiled program, CCFG with assembly, architecture model, reachability, computed WCRT.]
![Page 23: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/23.jpg)
ForeC
• Additional constructs to C:
– pause: Synchronisation barrier. Pauses the thread’s execution until all threads have paused.
– par( st1, ..., stn ): Forks each statement to execute as a parallel thread. Each statement is implicitly scoped.
– [weak] abort st when [immediate] exp: Preempts the statement st when exp evaluates to a non-zero value. exp is evaluated in each global tick before st is executed.
![Page 24: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/24.jpg)
ForeC
• Additional variable type-qualifiers to C:
– input and output: Declares a variable whose value is updated from, or emitted to, the environment at each global tick.
![Page 25: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/25.jpg)
ForeC
• Additional variable type-qualifiers to C:
– shared: Declares a shared variable that can be accessed by multiple threads.
![Page 26: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/26.jpg)
ForeC
• Additional variable type-qualifiers to C:
– shared: Declares a shared variable that can be accessed by multiple threads.
1. Threads make local copies of shared variables that they may use at the start of their local ticks.
2. Threads only modify their local copies during execution.
3. If a par statement terminates:
• Modified copies from the child threads are combined (using a commutative & associative function) and assigned to the parent.
4. If the global tick ends:
• The modified copies are combined and assigned to the actual shared variables.
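The copy, modify, and combine steps for a shared variable can be imitated in plain C. This is a sequential simulation under simplifying assumptions, not ForeC itself or its runtime: every thread is assumed to modify its copy, each modification is a simple increment, and `run_tick` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*combine_fn)(int, int);

/* Combine function: commutative and associative. */
int plus(int copy1, int copy2) { return copy1 + copy2; }

/* Simulates one global tick for a single shared int.
 * Thread i starts from a private copy of `shared`, adds increments[i] to it,
 * and all modified copies are combined when the tick ends. */
int run_tick(int shared, const int *increments, size_t n_threads,
             combine_fn combine)
{
    assert(n_threads > 0);
    int result = 0;
    for (size_t i = 0; i < n_threads; ++i) {
        int copy = shared;          /* local copy taken at tick start */
        copy += increments[i];      /* thread modifies only its copy */
        result = (i == 0) ? copy : combine(result, copy);
    }
    return result;                  /* value assigned to the shared variable */
}
```

With a shared value of 1 and two threads adding 1 and 2, the copies become 2 and 3 and the combined result is 5. Because `plus` is commutative and associative, the order in which the copies are visited does not change the result.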
![Page 27: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/27.jpg)
Execution Example
shared int sum = 1 combine with plus;

int plus(int copy1, int copy2) {
    return (copy1 + copy2);
}

void main(void) {
    par(f(1), f(2));
}

void f(int i) {
    sum = sum + i;
    pause;
    ...
}
Annotations: shared variable (sum); commutative and associative combine function (plus); fork-join (par); synchronisation (pause).
![Page 28: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/28.jpg)
Execution Example 1
Global: sum = 1
![Page 29: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/29.jpg)
Execution Example 1
Global tick start
Global: sum = 1
![Page 30: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/30.jpg)
Execution Example 1
Global tick start
Global: sum = 1
Local (f1): sum1 = 1
Local (f2): sum2 = 1
![Page 31: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/31.jpg)
Execution Example 1
Global tick start
Global: sum = 1
Local (f1): sum1 = 1, then sum1 = 2
Local (f2): sum2 = 1, then sum2 = 3
![Page 32: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/32.jpg)
Execution Example 1
Global tick start
Global: sum = 1
Local (f1): sum1 = 1, then sum1 = 2
Local (f2): sum2 = 1, then sum2 = 3
Global tick end
![Page 33: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/33.jpg)
Execution Example 1
Global tick start
Global: sum = 1
Local (f1): sum1 = 1, then sum1 = 2
Local (f2): sum2 = 1, then sum2 = 3
Global tick end
Global: sum = 5
![Page 34: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/34.jpg)
Execution Example 1
Global tick start
Global: sum = 1
Local (f1): sum1 = 1, then sum1 = 2
Local (f2): sum2 = 1, then sum2 = 3
Global tick end
Global: sum = 5
Global tick start
Local (f1): sum1 = 5, ...
Local (f2): sum2 = 5, ...
![Page 35: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/35.jpg)
Execution Example 2
Sum a set of data.
shared int v=0 combine with plus;
int[4] data={1,2,3,4};

void main(void) {
    f(data);
}
void f(int *data) {
    par(add(0,data), add(2,data));
}
void add(int x, int *data) {
    v=data[x] + data[x+1];
}
![Page 36: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/36.jpg)
Execution Example 2
Sum sets of data in parallel.
shared int v=0 combine with plus;
int[4] data={1,2,3,4};
int[4] data1={5,6,7,8};

void main(void) {
    f(data);
}
void f(int *data) {
    par(add(0,data), add(2,data));
}
void add(int x, int *data) {
    v=data[x] + data[x+1];
}
![Page 37: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/37.jpg)
Execution Example 2
Sum sets of data together in parallel.
shared int v=0 combine with plus;
int[4] data={1,2,3,4};
int[4] data1={5,6,7,8};

void main(void) {
    par(f(data), f(data1));
}
void f(int *data) {
    par(add(0,data), add(2,data));
}
void add(int x, int *data) {
    v=data[x] + data[x+1];
}
![Page 38: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/38.jpg)
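Execution Example 2
[Figure: thread tree. main forks two f threads and each f forks two add threads; a single shared variable v is shown for all of them.]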
![Page 39: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/39.jpg)
Execution Example 2
[Figure: the same thread tree (main, two f threads, four add threads), now with two shared variables v.]
![Page 40: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/40.jpg)
Execution Example 2
int[4] data={1,2,3,4};
int[4] data1={5,6,7,8};

void main(void) {
    par(f(data), f(data1));
}
void f(int *data) {
    shared int v=0 combine with plus;
    par(add(0,data,&v), add(2,data,&v));
}
void add(int x, int *data, shared int *const v combine with +) {
    *v=data[x] + data[x+1];
}
![Page 41: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/41.jpg)
Execution Example
Shared variables:
– Threads modify local copies of shared variables.
• Isolation of thread execution allows threads to truly execute in parallel.
• Thread interleaving does not affect the program’s behaviour.
– Prevents most concurrency errors.
• Deadlock, race condition: No locks.
• Atomicity and order violations: Local copies.
– Copies for a shared variable can be split into groups and combined in parallel.
![Page 42: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/42.jpg)
Execution Example
Shared variables:
– Programmer has to define a suitable combine function for each shared variable.
• Must ensure the combine function is indeed commutative & associative.
– Notion of “combine functions” is not entirely new:
• Intel Cilk Plus, OpenMP, MPI, UPC, X10
• Esterel, Reactive Shared Variables
[Intel Cilk Plus] http://software.intel.com/en-us/intel-cilk-plus
[OpenMP] http://openmp.org
[MPI] http://www.mcs.anl.gov/research/projects/mpi/
[Unified Parallel C] http://upc.lbl.gov/
[X10] http://x10-lang.org/
[Berry et al 1992] The Esterel Synchronous Programming Language: Design, Semantics and Implementation.
[Boussinot 1993] Reactive Shared Variables Based Systems.
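Since the programmer is responsible for the combine function's properties, a small test harness can catch obvious mistakes. The sketch below is plain C and is not part of ForeC. The helper names are hypothetical, and the check covers only a small integer range, so a pass is evidence rather than proof.

```c
#include <assert.h>
#include <stdbool.h>

typedef int (*combine_fn)(int, int);

/* Checks f(a, b) == f(b, a) for every a, b in [lo, hi]. */
bool is_commutative(combine_fn f, int lo, int hi)
{
    assert(lo <= hi);
    for (int a = lo; a <= hi; ++a)
        for (int b = lo; b <= hi; ++b)
            if (f(a, b) != f(b, a))
                return false;
    return true;
}

/* Checks f(f(a, b), c) == f(a, f(b, c)) for every a, b, c in [lo, hi]. */
bool is_associative(combine_fn f, int lo, int hi)
{
    assert(lo <= hi);
    for (int a = lo; a <= hi; ++a)
        for (int b = lo; b <= hi; ++b)
            for (int c = lo; c <= hi; ++c)
                if (f(f(a, b), c) != f(a, f(b, c)))
                    return false;
    return true;
}

/* Candidate combine functions used to exercise the checks. */
int plus(int a, int b)   { return a + b; }
int minus(int a, int b)  { return a - b; }
int max_of(int a, int b) { return a > b ? a : b; }
```

Addition and maximum pass both checks over a small range, while subtraction fails both, so it would make the combined result depend on the order in which copies are merged.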
![Page 43: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/43.jpg)
Execution Example
Shared variables:
– Notion of “combine functions” is not entirely new:
• Intel Cilk Plus: cilk::reducer_op, cilk::holder_op
• OpenMP: shared var, reduction(operator: var)
• MPI: MPI_Reduce, MPI_Gather
• UPC: shared var, collectives
• X10: Aggregates
![Page 44: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/44.jpg)
Execution Example
Shared variables:
– Notion of “combine functions” is not entirely new:
• Esterel: valued signals, combine operator
• Reactive Shared Variables: shared var, combine operator
![Page 45: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/45.jpg)
Shared Variable Design Patterns
• Point-to-point
• Broadcast
• Software pipelining
• Divide and conquer
– Scatter/Gather
– Map/Reduce
![Page 46: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/46.jpg)
Point-to-point
shared int sum = 0 combine with plus;

void main(void) {
    par( f(), g() );
}

void f(void) {
    while (1) {
        sum = comp1();
        pause;
    }
}

void g(void) {
    while (1) {
        comp2(sum);
        pause;
    }
}
• New value of sum is received in the next global tick.
• Combine operation is not required.
![Page 47: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/47.jpg)
Broadcast
shared int sum = 0 combine with plus;

void main(void) {
    par( f(), g(), g() );
}

void f(void) {
    while (1) {
        sum = comp1();
        pause;
    }
}

void g(void) {
    while (1) {
        comp2(sum);
        pause;
    }
}
• Multiple receivers.
• New value of sum is received in the next global tick.
• Combine operation is not required.
![Page 48: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/48.jpg)
Software Pipelining
shared int s1 = 0, s2 = 0 combine with plus;

void main(void) {
    par( stage1(), stage2(), stage3() );
}

void stage1(void) {
    while (1) {
        s1 = comp1();
        pause;
    }
}
void stage2(void) {
    pause;
    while (1) {
        s2 = comp2(s1);
        pause;
    }
}
void stage3(void) {
    pause;
    pause;
    while (1) {
        comp3(s2);
        pause;
    }
}
• Outputs from each stage are buffered.
• Use the delayed behaviour of shared variables to buffer each stage.
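The one-tick delay that buffers each pipeline stage can be traced with a sequential simulation in plain C. The stage computations here are hypothetical stand-ins chosen so the values are easy to follow, and `run_pipeline` is an illustrative name rather than anything generated by the ForeC compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage computations. */
int comp1(int tick) { return tick + 1; }   /* produces 1, 2, 3, ... */
int comp2(int x)    { return x * 10; }

/* Runs the three-stage pipeline for n_ticks global ticks and records what
 * stage 3 reads in each tick. Entries for ticks 0 and 1 stay 0 because
 * stage 3 is still pausing. */
void run_pipeline(int *consumed, size_t n_ticks)
{
    assert(consumed != NULL);
    int s1 = 0, s2 = 0;                   /* committed shared values */
    for (size_t t = 0; t < n_ticks; ++t) {
        int s1_next = comp1((int)t);      /* stage 1 writes its copy */
        int s2_next = s2;
        if (t >= 1)
            s2_next = comp2(s1);          /* stage 2 sees last tick's s1 */
        consumed[t] = (t >= 2) ? s2 : 0;  /* stage 3 sees last tick's s2 */
        s1 = s1_next;                     /* tick end: commit the copies */
        s2 = s2_next;
    }
}
```

Running five ticks gives stage 3 the values 10, 20, and 30 in ticks 2, 3, and 4: each result arrives two ticks after stage 1 produced its input, which is the buffering effect described above.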
![Page 49: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/49.jpg)
Divide and Conquer
Count the number of edges in an image.
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
    par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
    while (1) {
        edges = 0;
        for (i = start; i < end; ++i) {
            ... image[i] ... ;
            edges++;
        }
        pause;
    }
}
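The divide-and-conquer pattern can be sketched in plain C by giving each worker a private count and adding the counts afterwards. Several details are assumptions made only for illustration: the edge test (`is_edge`, with a threshold of 50), the inclusive index ranges, and the sequential calls that stand in for the two parallel threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define IMAGE_SIZE 1024u

/* Hypothetical edge test: the jump to the next pixel exceeds 50. */
bool is_edge(const int *image, size_t i)
{
    return i + 1 < IMAGE_SIZE && abs(image[i + 1] - image[i]) > 50;
}

/* One worker: counts edges in [start, end] into its own private copy. */
int analyse(const int *image, size_t start, size_t end)
{
    assert(start <= end && end < IMAGE_SIZE);
    int edges = 0;
    for (size_t i = start; i <= end; ++i)
        if (is_edge(image, i))
            edges++;
    return edges;
}

/* Combine function for the private copies. */
int plus(int copy1, int copy2) { return copy1 + copy2; }

/* Splits the image into two halves and combines the two private counts. */
int count_edges(const int *image)
{
    int copy1 = analyse(image, 0, 511);
    int copy2 = analyse(image, 512, 1023);
    return plus(copy1, copy2);
}
```

Because each worker touches only its own counter, the two calls to `analyse` could run on different cores without locks; only the final `plus` needs both results.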
![Page 50: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/50.jpg)
Scheduling
• Light-Weight Static Scheduling:
– Take advantage of multicore performance while delivering time-predictability.
– Generate code to execute directly on hardware (bare metal/no OS).
– Thread allocation and scheduling order on each core decided at compile time by the programmer.
• Develop a WCRT-aware scheduling heuristic.
• Thread isolation allows for scheduling flexibility.
– Cooperative (non-preemptive) scheduling.
![Page 51: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/51.jpg)
Scheduling
• Cores synchronise to fork/join threads and end each global tick.
• One core performs housekeeping tasks at the end of the global tick:
– Combining shared variables.
– Emitting outputs.
– Sampling inputs and triggering the next global tick.
![Page 52: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/52.jpg)
Results
Multicore simulator (Xilinx MicroBlaze):
– Based on http://www.jwhitham.org/c/smmu.html and extended to be cycle-accurate and support multiple cores and a TDMA bus.
[Figure: cores Core0 to Coren, each with its own data memory and instruction memory, connected through a TDMA shared bus to a global memory. Labels: 16KB, 16KB, 32KB; 1 cycle, 5 cycles; TDMA slot of 5 cycles/core (bus schedule round = 5 * no. cores).]
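The TDMA bus arithmetic can be made concrete with a short C sketch. The 5-cycle slot comes from the slide; everything else is an assumption made for illustration, in particular that slots are handed out in core order starting with core 0 at cycle 0. The function names are hypothetical.

```c
#include <assert.h>

#define SLOT_CYCLES 5u   /* from the slide: 5 cycles per core */

/* Length of one bus schedule round: 5 * number of cores. */
unsigned round_length(unsigned n_cores)
{
    assert(n_cores > 0);
    return SLOT_CYCLES * n_cores;
}

/* Core that owns the bus at absolute cycle t (assumed slot order: 0, 1, ...). */
unsigned bus_owner(unsigned long t, unsigned n_cores)
{
    return (unsigned)((t % round_length(n_cores)) / SLOT_CYCLES);
}

/* Cycles from t until the start of `core`'s next slot (0 if it starts at t). */
unsigned cycles_until_slot(unsigned long t, unsigned core, unsigned n_cores)
{
    assert(core < n_cores);
    unsigned round = round_length(n_cores);
    unsigned phase = (unsigned)(t % round);
    unsigned start = core * SLOT_CYCLES;
    return (start + round - phase) % round;
}
```

Under these assumptions a core that just misses the start of its slot waits one cycle less than a full round, so the waiting time is bounded and grows linearly with the number of cores.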
![Page 53: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/53.jpg)
WCRT Execution Results
Able to achieve speed-ups for all programs. The benefit of multicore execution diminishes with an increasing number of cores due to overheads (bus, memory accesses, scheduling routines).
[Figure: five plots of WCRT against number of cores: FmRadio (1 to 4 cores), Fly by Wire (1 to 7), Life (1 to 8), Matrix (1 to 8), 802.11a (1 to 10).]
![Page 54: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/54.jpg)
Programming PTARM using ForeC
shared int sum = 1 combine with plus;

int plus(int copy1, int copy2) {
    return (copy1 + copy2);
}

void main(void) {
    par(f(1), f(2));
}

void f(int i) {
    sum = sum + i;
    pause;
    ...
}
![Page 55: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/55.jpg)
Execution of ForeC
int main(void) {
    SET_THREAD_LOCATION(0, _pt_hwt0);
    SET_THREAD_LOCATION(1, _pt_hwt1);
    SET_THREAD_LOCATION(2, _pt_idle);
    SET_THREAD_LOCATION(3, _pt_idle);

_pt_hwt0:
    initialize code;
    goto main;

_pt_hwt1:
    wait for par;
    goto f(2);

_pt_idle:
    goto _pt_idle;
continues ...
![Page 56: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/56.jpg)
Execution of ForeC
int main(void) {
    SET_THREAD_LOCATION(0, _pt_hwt0);
    SET_THREAD_LOCATION(1, _pt_hwt1);
    SET_THREAD_LOCATION(2, _pt_idle);
    SET_THREAD_LOCATION(3, _pt_idle);

_pt_hwt0:
    initialize code;
    goto main;

_pt_hwt1:
    wait for par;
    goto f_2;

_pt_idle:
    goto _pt_idle;
continues ...
![Page 57: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/57.jpg)
Execution of ForeC
main:
    fork f_1 and f_2;
_par_resume:
    return 0;

f_1:
    sum = 1;
    synchronization code;
    thread termination code;

f_2:
    sum = 2;
    synchronization code;
    thread termination code;
}
![Page 58: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/58.jpg)
Non-Realtime Threads in ForeC
• A non-realtime thread (NRT):
– No strict timing requirements.
– Possibly unbounded execution time.
– Asynchronous computation.
– E.g., file archiving, compression, data analysis.
![Page 59: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/59.jpg)
Non-Realtime Threads in ForeC
Splitting the execution time of NRTs into periods.
• Guarantee f() to execute for at least min_t and at most max_t in each global tick.
– When the period elapses, the execution pauses.
– Execution resumes in the next global tick.
// Non-realtime thread.
void nrt(void) {
    do {
        f();
    } until (min_t, max_t);
}
![Page 60: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/60.jpg)
Non-Realtime Threads in ForeC
// Non-realtime thread.
void nrt(void) {
    // Set deadline equal to
    // the current time + min_t.
    setDeadline(min_t);

    // Enable timing exception
    // and register a handler.
    enableException(max_t, handler);

    // Execute the body.
    f();

    // The body is finished executing.
    // Disable the timing exception.
    disableException();
    goto end;

    // Timing exception handler.
    handler: {
        // Save the execution context.
        pause;
        setDeadline(min_t);
        // Restore the execution context.
    }
    end:;
}

// Non-realtime thread.
void nrt(void) {
    do {
        f();
    } until (min_t, max_t);
}
![Page 61: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/61.jpg)
PTARM modifications
• Boot-up
– Modified to allow loading of multiple hardware threads.
• Exceptions
– Added the exception handler in the boot loader.
• Context Saving
– Modified VHDL to save PC to LR.
– Saves registers onto the stack in the exception routine.
![Page 62: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/62.jpg)
Tick Precise Allocation Device
Matthew Kuo
Main supervisor: Partha Roop
![Page 63: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/63.jpg)
Introduction
[Figure: cache positioned between performance and timing precision.]
• Traditionally, caches:
– Bridge the memory gap.
– Small, fast piece of memory.
• Temporal locality
• Spatial locality
– Hardware controlled.
• Hard real-time systems:
– Compute the WCRT.
• Needs to model the architecture.
• Cache models:
– Complex
– Not tight
![Page 64: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/64.jpg)
Introduction
[Figure: scratchpad positioned between performance and timing precision.]
• Small piece of memory.
• Software controlled.
• Requires an allocation algorithm:
– ILP
– Greedy
• Hard real-time systems:
– Easy to compute a tight WCRT.
– Reduces the average-case performance.
• May also be worse than a cache for worst-case performance.
• Not as efficient as caches.
![Page 65: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/65.jpg)
Introduction
[Figure: cache and scratchpad positioned between performance and timing precision.]
![Page 66: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/66.jpg)
Introduction
[Figure: cache, scratchpad, and TickPAD positioned between performance and timing precision.]
![Page 67: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/67.jpg)
Tick Precise Allocation Device
• TickPAD: Tick Precise Allocation Device
• Memory controller
– Hybrid between caches and scratchpads
• Software-controlled memory like a scratchpad
• Hardware-controlled features
• Hard real-time synchronous programs
![Page 68: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/68.jpg)
TickPAD System Specifications
[Figure: one cache line holds 4 instructions (4 x 32 bits) at offsets 0x00, 0x04, 0x08, 0x0C and takes 1 burst transfer from main memory.]
• Buffers are 1 cache line in size.
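The line geometry above (4 instructions of 32 bits, so 16 bytes per line) determines how an instruction address splits into a line base and a block offset. The helper names in this plain C sketch are hypothetical, and the masks follow only from the 16-byte line size; the actual TickPAD tag layout is not given in the slides.

```c
#include <assert.h>
#include <stdint.h>

enum {
    INSTR_BYTES = 4,                        /* one 32-bit instruction */
    LINE_INSTRS = 4,                        /* instructions per cache line */
    LINE_BYTES  = INSTR_BYTES * LINE_INSTRS /* 16 bytes per line */
};

/* Address of the first instruction in the line that contains addr. */
uint32_t line_base(uint32_t addr)
{
    return addr & ~(uint32_t)(LINE_BYTES - 1);
}

/* Byte offset of addr within its line: 0x00, 0x04, 0x08, or 0x0C. */
uint32_t block_offset(uint32_t addr)
{
    return addr & (uint32_t)(LINE_BYTES - 1);
}

/* Index (0 to 3) of the instruction at addr within its line. */
uint32_t instr_index(uint32_t addr)
{
    assert(addr % INSTR_BYTES == 0);   /* instructions are word aligned */
    return block_offset(addr) / INSTR_BYTES;
}
```

For example, address 0x1C lies in the line starting at 0x10, at offset 0x0C, and is the fourth instruction of that line.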
![Page 69: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/69.jpg)
TickPAD – scratchpad memory for synchronous programs
![Page 70: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/70.jpg)
TickPAD – scratchpad memory for synchronous programs
To accelerate linear code
![Page 71: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/71.jpg)
TickPAD – scratchpad memory for synchronous programs
• For predictable temporal locality
– Statically allocated
• Dynamically loaded
![Page 72: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/72.jpg)
TickPAD – scratchpad memory for synchronous programs
• Stores the resumption addresses of active threads
• Stores the instructions at the resumption address of the next active thread
– To reduce context switching overhead at state/tick boundaries
![Page 73: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/73.jpg)
TickPAD – scratchpad memory for synchronous programs
• Stores a set of commands to be executed by the TickPAD controller.
– Command: the type of operation
– Address: the PC value at which the command is activated
– Operand: stores data needed for the command
• A buffer to store operands fetched from main memory
– For commands requiring 2+ operands
![Page 74: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/74.jpg)
Spatial Memory Pipeline
• Exploit spatial locality
– Predictably prefetch the next line of instructions
[Figure: TickPAD block diagram. Labelled blocks: Main Memory, Command Buffer, Spatial Memory Pipeline (SMP Buffer 1, SMP Buffer 2), Associative Loop Memory, Tick FIFO, Control Logic, Instruction Check and demultiplexers. Labelled signals: Address[32], Instruction[32], ADDR[TAG], ADDR[Block Offset], TAG, WriteEn, hasBranch, ToggleBranch, clk.]
![Page 75: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/75.jpg)
Spatial Memory Pipeline
[Figure: timing diagram with rows for the Execute Buffer, the Fetch Buffer and Processor Execution over the instruction lines 310, 320, 330 and 3B0. The diagram is divided into a linear-code phase and a branch phase, with Fetching, Stall and Disabled intervals marked. Line 310 holds the instructions at 310, 314, 318 and 31C. The TickPAD block diagram is shown alongside.]
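The double-buffering idea behind the Spatial Memory Pipeline can be sketched as a small cycle-count model. This is only an illustration under stated assumptions, not the TickPAD hardware behaviour: the function name `smp_cycles`, the 4-cycle execution time per line and the 5-cycle burst fetch are hypothetical. The only values taken from the slides are the line size (4 instructions of 32 bits) and the example addresses.

```python
LINE_BYTES = 16  # one line = 4 instructions x 32 bits (from the specification slide)


def smp_cycles(line_trace, exec_cycles=4, fetch_cycles=5):
    """Return (total cycles, stall cycles) for a trace of line addresses.

    Assumed model: while one buffer executes a line, the other buffer
    prefetches the next sequential line. A branch to a non-sequential line
    cannot use the prefetched line, so the target is fetched only after the
    current line has finished executing.
    """
    if not line_trace:
        return 0, 0
    # The first line must be fetched before anything can execute.
    total = stalls = fetch_cycles
    for cur, nxt in zip(line_trace, line_trace[1:]):
        total += exec_cycles
        if nxt == cur + LINE_BYTES:
            # Linear code: the fetch overlapped with execution of `cur`.
            wait = max(0, fetch_cycles - exec_cycles)
        else:
            # Branch: the full fetch latency is exposed as a stall.
            wait = fetch_cycles
        total += wait
        stalls += wait
    total += exec_cycles  # execute the last line
    return total, stalls


# Lines 310, 320, 330 followed by a branch to 3B0, as in the diagram.
print(smp_cycles([0x310, 0x320, 0x330, 0x3B0]))
```

Because every quantity in the model is a constant, the stall cost of each line is known statically, which is the property a WCRT analysis needs.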
![Page 87: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/87.jpg)
Command Table
• A lookup table to dynamically load:
– Tick Instruction Buffer
– Tick Queue
– Associative Loop Memory
• Statically allocated
• Commands are executed when the PC matches the address stored in the command
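The PC-matching behaviour of the command table can be sketched in software as a lookup keyed by address. The field names follow the Command / Address / Operand description on the earlier slide; the operation names, addresses and operands below are hypothetical placeholders, and the real table is a hardware structure rather than a Python dictionary.

```python
from collections import namedtuple

# One command table entry: operation type, activating PC value, operand.
Command = namedtuple("Command", ["op", "address", "operand"])

# Hypothetical, statically allocated table contents (illustration only).
COMMAND_TABLE = [
    Command("STORE_TICK_ADDRESS_QUEUE", 0x2C0, 0x2F0),
    Command("LOAD_TICK_INSTRUCTION_BUFFER", 0x310, None),
    Command("STORE_LOOP_MEMORY", 0x500, 0x500),
]


def build_index(table):
    """Group the commands by the PC value that activates them."""
    index = {}
    for cmd in table:
        index.setdefault(cmd.address, []).append(cmd)
    return index


def commands_for(index, pc):
    """Return the commands activated at this PC (empty list if none)."""
    return index.get(pc, [])


index = build_index(COMMAND_TABLE)
for pc in (0x2C0, 0x2C4, 0x310):
    print(hex(pc), [cmd.op for cmd in commands_for(index, pc)])
```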
![Page 88: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/88.jpg)
TickPAD Design flow
[Figure: TickPAD design flow in three numbered steps. Stages: Graph Construction, TickPAD Allocation Analysis, TickPAD Timing Analysis (Reachability Analysis). Artifacts: PRET-C Program, TCCFG, TickPAD Configuration File, Updated TCCFG, Worst Case Reaction Time.]
![Page 91: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/91.jpg)
Command Table Allocation
| Node | Command | Address |
|---|---|---|
| FORK | Store Tick Address Queue x N | Address of FORK |
| EOT | Store Tick Address Queue; Load Tick Instruction Buffer | Address of EOT |
| KILL | Load Tick Instruction Buffer | Address of KILL |
| Loops | Discard Loop Associative Memory; Store Loop Associative Memory | Address at start of loop |
![Page 96: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/96.jpg)
Tick Address Queue Tick Instruction Buffer
• Reduce cost of context switching
• Make context switching points appear as linear code
– Paired using Spatial Memory Pipeline
• Tick Queue: stores an ordered list of the resumption addresses of each thread (for example, 2B0)
• Tick Buffer: stores the instructions of the next active thread (for example, 2B0, 2B4, 2B8, 2BC)
![Page 100: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/100.jpg)
Tick Address Queue Tick Instruction Buffer
Commands:
1. Discard and Store Associative Loop Memory
2. Fetch Tick Address Queue and Fill Tick Instruction Buffer
3. Load Tick Address Queue
[Figure: TCCFG of the example program with nodes at addresses 2A0, 2B0, 2C0, 2F0, 300, 310, 4F0, 500, 510, 520, 980, 990 and 9A0, animated over slides 100 to 108. The state shown on each slide is listed below.]

| Slide | PC | Tick Address Queue | Tick Instruction Buffer |
|---|---|---|---|
| 100 | 2B0 | (empty) | (empty) |
| 101 | 2C0 | *980 *4F0 *2F0 | (empty) |
| 102 | 2C0 | *980 *4F0 | 2F0 |
| 103 | 2F0 | *980 *4F0 | 2F0 |
| 104 | 300 | *980 *4F0 | 2F0 |
| 105 | 310 | *310 *980 *4F0 | 2F0 |
| 106 | 310 | *310 *980 | 4F0 |
| 107 | 310 | *310 *980 | 4F0 |
| 108 | 4F0 | *310 *980 | 4F0 |
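The queue and buffer behaviour in this example can be replayed with a short software sketch. It assumes the Tick Address Queue is a plain FIFO and that the Tick Instruction Buffer is tracked only by the line address it holds; the class name `TickPadSketch` and its methods are hypothetical and stand in for the hardware commands.

```python
from collections import deque


class TickPadSketch:
    """Software stand-in for the Tick Address Queue and Tick Instruction Buffer."""

    def __init__(self):
        self.queue = deque()  # FIFO of thread resumption addresses
        self.buffer = None    # address of the line held in the instruction buffer

    def store(self, address):
        """Store a resumption address at the tail of the queue."""
        self.queue.append(address)

    def load(self):
        """Move the head of the queue into the instruction buffer."""
        self.buffer = self.queue.popleft() if self.queue else None
        return self.buffer


pad = TickPadSketch()

# FORK at PC 2C0: store the start address of each forked thread.
for start in (0x2F0, 0x4F0, 0x980):
    pad.store(start)
pad.load()
print(hex(pad.buffer), [hex(a) for a in pad.queue])

# EOT at PC 310: store the resumption address, then load the next thread.
pad.store(0x310)
pad.load()
print(hex(pad.buffer), [hex(a) for a in pad.queue])
```

In this sketch the head of the queue is the leftmost element, whereas the slides appear to draw the head on the right (2F0 leaves the queue first).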
![Page 109: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/109.jpg)
Associative Loop Memory
• Statically allocated
– Greedy
– ILP
• Fetches loop before executing
– Predictable: easy and tight to model
– Exploits temporal locality
![Page 110: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/110.jpg)
Results
![Page 111: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/111.jpg)
Results
• 8.5% compared to locked scratchpad memory
• 12.3% compared to thread-interleaved scratchpad
![Page 112: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/112.jpg)
Results
![Page 113: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/113.jpg)
Results - Synthesis
![Page 114: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/114.jpg)
Conclusions
• C-based synchronous languages for writing deterministic, time-predictable software.
– PRET-C: Single-cores
– ForeC: Multicores
• Can achieve WCRT speedup while providing time-predictability.
• Very precise and fast timing analysis for PRET-C and ForeC programs using reachability.
![Page 115: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/115.jpg)
Conclusions
• A new time-precise memory architecture: TickPAD
• Showed that the use of TickPAD is comparable to using cache and scratchpad memories
• Future direction
– The use of TickPAD for data caches
– Implement TickPAD on Precise Timed Architecture
![Page 116: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/116.jpg)
Questions?
![Page 117: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/117.jpg)
Outline
• Introduction
• ForeC Language
• Timing Analysis
• Results
• Conclusions
![Page 118: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/118.jpg)
Timing Analysis
Compute the program’s worst-case reaction time (WCRT).
[Figure: timeline of physical time (1s, 2s, 3s, 4s) showing the reaction time inside the time for each tick.]
Must validate: max(Reaction time) < min(Time for each tick)
The time for each tick is specified by the system’s timing requirements.
[Benveniste et al 2003] The Synchronous Languages 12 Years Later.
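The validation condition above can be written as a one-line check. The function name `meets_timing` and the numbers are hypothetical; in practice the left-hand side is the statically computed WCRT rather than a list of measured reaction times.

```python
def meets_timing(reaction_times, tick_times):
    """True if max(Reaction time) < min(Time for each tick)."""
    return max(reaction_times) < min(tick_times)


# Hypothetical reaction times and tick lengths, in clock cycles.
print(meets_timing([120, 95, 143], [150, 150, 160]))  # worst reaction fits
print(meets_timing([120, 95, 151], [150, 150, 160]))  # worst reaction too slow
```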
![Page 119: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/119.jpg)
Timing Analysis
Existing approaches for synchronous programs:
• Integer Linear Programming (ILP)
• “Coarse-grained” Reachability (Max-Plus)
• Model Checking
One existing approach for analysing the WCRT of synchronous programs on multicores:
• [Ju et al 2010] Timing Analysis of Esterel Programs on General-Purpose Multiprocessors.
• Uses ILP, no tightness result, all experiments performed on a 4-core processor.
![Page 120: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/120.jpg)
Timing Analysis
Existing approaches for synchronous programs.
• Integer Linear Programming (ILP)
– Execution time of the program described as a set of integer equations.
– Solving ILP is NP-complete.
[Ju et al 2010] Timing Analysis of Esterel Programs on General-Purpose Multiprocessors.
![Page 121: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/121.jpg)
Timing Analysis
Existing approaches for synchronous programs.
• “Coarse-grained” Reachability (Max-Plus)
– Compute the WCRT of each thread.
– Using the thread WCRTs, the WCRT of the program is computed.
– Assumes there is a global tick where all threads execute their worst-case.
[M. Boldt et al 2008] Worst Case Reaction Time Analysis of Concurrent Reactive Programs.
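A minimal sketch of the coarse-grained idea, assuming all threads run one after another on a single core so that their worst cases add up. The thread names and local-tick costs are hypothetical, and the sketch ignores scheduling overheads.

```python
def thread_wcrt(local_tick_costs):
    """WCRT of one thread: its most expensive local tick."""
    return max(local_tick_costs)


def max_plus_wcrt(threads):
    """Program WCRT, assuming every thread hits its worst tick together."""
    return sum(thread_wcrt(costs) for costs in threads.values())


# Hypothetical local-tick costs (clock cycles) of two threads.
threads = {"f1": [10, 50], "f2": [40, 5]}
print(max_plus_wcrt(threads))
```

The result is safe but can be pessimistic: in this example f1's 50-cycle tick and f2's 40-cycle tick may never occur in the same global tick.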
![Page 122: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/122.jpg)
Timing Analysis
Existing approaches for synchronous programs.
• Model Checking
– Computes the execution time along all possible execution paths.
– State-space explosion problem.
– Binary search: Check the WCRT is less than “x”.
– Trades off analysis time for precision.
– Counter example: Execution trace for the WCRT.
[P. S. Roop et al 2009] Tight WCRT Analysis of Synchronous C Programs.
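The binary search over the bound “x” can be sketched as follows. The oracle `wcrt_at_most` stands in for one model-checking run and is hypothetical; here it is faked with a fixed value so that the sketch is runnable.

```python
def tightest_bound(wcrt_at_most, lo, hi):
    """Smallest x in [lo, hi] for which wcrt_at_most(x) holds.

    wcrt_at_most(x) must be monotonic: once it holds for some x,
    it holds for every larger x.
    """
    if not wcrt_at_most(hi):
        raise ValueError("upper limit is below the WCRT")
    while lo < hi:
        mid = (lo + hi) // 2
        if wcrt_at_most(mid):
            hi = mid        # property holds: try a tighter bound
        else:
            lo = mid + 1    # property violated: the bound must grow
    return lo


# Stand-in oracle: pretend the true WCRT is 137 cycles.
def oracle(x):
    return 137 <= x


print(tightest_bound(oracle, 0, 1000))
```

Each iteration costs one model-checking run, so stopping the search early trades precision for analysis time.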
![Page 123: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/123.jpg)
Timing Analysis
Proposed “fine-grained” Reachability approach:
• Only consider local ticks that can execute together in the same global tick.
• Timed execution trace for the WCRT.
• To handle the state-space explosion:
– Reduce the program’s CCFG before analysis.
[Figure: Program binary (annotated) → Reconstruct the program’s CCFG → Find all global ticks (Reachability) → WCRT]
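A minimal sketch of the fine-grained idea: explore only the combinations of local ticks that are reachable together, and take the most expensive one. The thread encoding (a dictionary from local tick to its cost and successor ticks) and the costs are hypothetical, and the sketch assumes a single core so that costs within a global tick add up.

```python
from itertools import product


def fine_grained_wcrt(threads, start):
    """Return (WCRT, number of reachable global ticks).

    threads: one dict per thread, mapping local tick -> (cost, successors).
    start:   tuple holding the first local tick of each thread.
    """
    seen, worklist, worst = {start}, [start], 0
    while worklist:
        state = worklist.pop()
        # Cost of this global tick: all local ticks that run together.
        cost = sum(t[s][0] for t, s in zip(threads, state))
        worst = max(worst, cost)
        # Next global ticks: every combination of local successors.
        for nxt in product(*(t[s][1] for t, s in zip(threads, state))):
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return worst, len(seen)


# Hypothetical threads: each alternates between two local ticks.
f1 = {"a": (10, ["b"]), "b": (50, ["a"])}
f2 = {"x": (40, ["y"]), "y": (5, ["x"])}
print(fine_grained_wcrt([f1, f2], ("a", "x")))
```

For these threads the expensive ticks "b" (50) and "x" (40) never execute in the same global tick, so the result is 55 cycles rather than the 90 obtained by adding the per-thread worst cases.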
![Page 124: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/124.jpg)
Timing Analysis
Programs executed on the following multicore architecture:
[Figure: Core 0 to Core n, each with its own data memory and instruction memory, connected by a TDMA shared bus to a global memory.]
![Page 125: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/125.jpg)
Timing Analysis
Computing the execution time:
1. Overlapping of thread execution time from parallelism and inter-core synchronisations.
2. Scheduling overheads.
3. Variable delay in accessing the shared bus.
![Page 126: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/126.jpg)
Timing Analysis
1. Overlapping of thread execution time from parallelism and inter-core synchronisations.
• An integer counter to track each core’s execution time.
• Synchronisation occurs when forking/joining, and ending the global tick.
• Advance the execution time of participating cores.
[Figure: thread main forks f1 and f2; main and f1 are allocated to Core 1 and f2 to Core 2, shown as a thread allocation, a CCFG and per-core timelines.]
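The per-core counters can be sketched as below. The function names and cycle counts are hypothetical; the sketch assumes that a synchronisation advances every participating core to the latest of their counters and ignores the cost of the synchronisation itself.

```python
def advance(core_time, core, cycles):
    """Add the execution time of a thread segment to one core's counter."""
    core_time[core] += cycles


def synchronise(core_time, cores):
    """Advance all participating cores to the latest counter among them."""
    latest = max(core_time[c] for c in cores)
    for c in cores:
        core_time[c] = latest
    return latest


core_time = {1: 0, 2: 0}
advance(core_time, 1, 20)       # main runs on core 1 up to the fork
synchronise(core_time, [1, 2])  # fork: core 2 waits for core 1
advance(core_time, 1, 30)       # f1 on core 1
advance(core_time, 2, 45)       # f2 on core 2, in parallel with f1
synchronise(core_time, [1, 2])  # join: core 1 waits for core 2
print(core_time)
```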
![Page 127: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/127.jpg)
Timing Analysis
2. Scheduling overheads.
– Synchronisation: Fork/join and global tick.
• Via global memory.
– Thread context-switching.
• Copying of shared variables at the start of the thread’s local tick via global memory.
[Figure: timelines of Core 1 (main, f1) and Core 2 (f2) over one global tick, with synchronisation and thread context-switch overheads marked.]
![Page 128: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/128.jpg)
Timing Analysis
2. Scheduling overheads.
– Required scheduling routines statically known.
– Analyse the scheduling control-flow.
– Compute the execution time for each scheduling overhead.
[Figure: timelines of Core 1 (main, f1) and Core 2 (f2) with the scheduling overheads marked.]
![Page 129: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/129.jpg)
Timing Analysis
3. Variable delay in accessing the shared bus.
– Global memory accessed by scheduling routines.
– TDMA bus delay has to be considered.
[Figure: timelines of Core 1 (main, f1) and Core 2 (f2) shown against the TDMA bus slots, which alternate between core 1 and core 2 (1, 2, 1, 2, ...).]
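The variable TDMA delay can be sketched as a function of the time at which a core requests the bus. The slot length of 5 cycles per core and the round length (5 * number of cores) come from the simulator slide; the function name, the 0-based core numbering, the 5-cycle access length and the rule that an access must fit entirely inside the core's own slot are assumptions made for illustration.

```python
def tdma_wait(t, core, n_cores, slot=5, access=5):
    """Cycles a core waits at time t before its bus access may start.

    Cores are numbered from 0. Core i owns cycles [i*slot, (i+1)*slot)
    of every round of slot * n_cores cycles.
    """
    if access > slot:
        raise ValueError("access does not fit in one slot")
    round_len = slot * n_cores
    slot_start = core * slot
    pos = t % round_len
    if slot_start <= pos and pos + access <= slot_start + slot:
        return 0  # inside its own slot with enough time left
    # Otherwise wait for the start of the core's next slot.
    return (slot_start - pos) % round_len


# Two cores: worst-case wait over one bus round for core 0.
print(max(tdma_wait(t, 0, 2) for t in range(10)))
```

Because the delay depends only on the request time modulo the round length, the analysis can compute it exactly from each core's execution-time counter.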
![Page 132: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/132.jpg)
Timing Analysis
CCFG optimisations:
– merge: Reduces the number of CFG nodes that need to be traversed.
– merge-b: Reduces the number of alternate paths in the CFG. (Reduces the number of global ticks)
– Precision of the analysis is unaffected because we are not performing value analysis to prune infeasible paths.
![Page 133: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/133.jpg)
Timing Analysis
CCFG optimisations:
– merge: Reduces the number of CFG nodes that need to be traversed.
– merge-b: Reduces the number of alternate paths in the CFG. (Reduces the number of global ticks)
[Figure: example CFG with node costs 1, 4, 3 and 1. After merge, the two paths cost 1 + 3 = 4 and 1 + 4 + 1 = 6. After merge-b, a single node with cost = 6 remains.]
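The arithmetic of the two optimisations can be sketched with two helper functions. This assumes that merge adds up the costs of nodes along a straight-line path and that merge-b keeps the most expensive of the alternative paths; the function names are hypothetical, and the exact graph shape on the slide is not recoverable from the transcript.

```python
def merge(chain_costs):
    """Collapse a straight-line chain of CFG nodes into one node."""
    return sum(chain_costs)


def merge_b(alternative_costs):
    """Collapse alternative paths into one node with the worst-case cost."""
    return max(alternative_costs)


path_a = merge([1, 3])     # 1 + 3 = 4
path_b = merge([1, 4, 1])  # 1 + 4 + 1 = 6
print(path_a, path_b, merge_b([path_a, path_b]))
```

Taking the maximum loses no precision here because the analysis does not use value analysis to rule out either path, so the more expensive one would be assumed in any case.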
![Page 134: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/134.jpg)
Outline
• Introduction
• ForeC Language
• Timing Analysis
• Results
• Conclusions
![Page 135: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/135.jpg)
Results
For the proposed reachability-based timing analysis, we demonstrate:
– the precision of the computed WCRT.
– the efficiency of the analysis, in terms of analysis time.
![Page 136: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/136.jpg)
Results
Timing analysis tool:
[Figure: Program binary (annotated) → Program CCFG (optimisations) → Fine-grained Reachability (Proposed) or Coarse-grained Reachability (Max-Plus), taking into account the 3 factors → WCRT]
![Page 137: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/137.jpg)
Results
Multicore simulator (Xilinx MicroBlaze):
– Based on http://www.jwhitham.org/c/smmu.html and extended to be cycle-accurate and support multiple cores and a TDMA bus.
[Figure: Core 0 to Core n, each with its own data memory and instruction memory, connected by a TDMA shared bus to a global memory. Annotations on the figure: 16KB, 16KB, 32KB, 5 cycles, 1 cycle, and 5 cycles/core for the bus (bus schedule round = 5 * no. cores).]
![Page 138: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/138.jpg)
Results
Benchmark programs.
• Mix of control/data computations, thread structure and computation load.
* [Pop et al 2011] A Stream-Computing Extension to OpenMP.
# [Nemer et al 2006] A Free Real-Time Benchmark.
![Page 139: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/139.jpg)
Results
• Each benchmark program was distributed over a varying number of cores.
– Up to the maximum number of parallel threads.
• Observed the WCRT:
– Test vectors to elicit different execution paths.
• Computed the WCRT:
– Proposed
– Max-Plus
![Page 140: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/140.jpg)
802.11a Results
Observed:
• WCRT decreases until 5 cores.
• Global memory increasingly expensive.
• Scheduling overheads.
[Figure: WCRT (clock cycles, 0 to 200,000) against number of cores (1 to 10) for Observed, Proposed and MaxPlus.]
![Page 141: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/141.jpg)
802.11a Results
Proposed:
• ~2% over-estimation.
• Benefit of fine-grained reachability.
[Figure: WCRT (clock cycles, 0 to 200,000) against number of cores (1 to 10) for Observed, Proposed and MaxPlus.]
![Page 142: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/142.jpg)
142
802.11a Results
Max-Plus:
• Loss of execution context: Uses only the thread WCRTs.
• Assumes one global tick where all threads execute their worst-case.
• Max execution time of the scheduling routines.
[Plot: WCRT (clock cycles, 0 to 200,000) vs. cores (1 to 10); series: Observed, Proposed, MaxPlus.]
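The three Max-Plus assumptions can be made concrete with a small calculation. The slides do not give the Max-Plus formula, so this is only one plausible reading, sketched in C: thread costs on the same core add up ("plus"), cores run in parallel so the slowest core decides the tick length ("max"), and every thread is charged the maximum scheduling cost. The function name, parameters and the per-thread charging of `sched_max` are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical max-plus style bound for one global tick.
 * thread_wcrt[i]: WCRT of thread i in clock cycles.
 * core_of[i]:     core that thread i is statically allocated to.
 * sched_max:      assumed maximum cost of the scheduling routines,
 *                 charged once per thread. */
unsigned long maxplus_tick_bound(const unsigned long thread_wcrt[],
                                 const int core_of[],
                                 size_t n_threads, int n_cores,
                                 unsigned long sched_max) {
    unsigned long worst = 0;
    for (int core = 0; core < n_cores; ++core) {
        unsigned long core_time = 0;
        for (size_t i = 0; i < n_threads; ++i) {
            assert(core_of[i] >= 0 && core_of[i] < n_cores);
            if (core_of[i] == core) {
                /* "plus": threads on one core execute back to back */
                core_time += thread_wcrt[i] + sched_max;
            }
        }
        /* "max": the tick ends when the slowest core finishes */
        if (core_time > worst) {
            worst = core_time;
        }
    }
    return worst;
}
```

Because every thread contributes its own worst case in the same tick, the bound ignores which execution paths can actually coincide, which is the loss of execution context named on the slide.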
![Page 143: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/143.jpg)
143
802.11a Results
Both approaches:
• Estimation of synchronisation cost is conservative. Assumed that the receive only starts after the last sender.
[Plot: WCRT (clock cycles, 0 to 200,000) vs. cores (1 to 10); series: Observed, Proposed, MaxPlus.]
![Page 144: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/144.jpg)
144
802.11a Results
[Plot: analysis time (seconds, 0 to 2,500) vs. cores (1 to 10); series: Proposed.]
Max-Plus takes less than 2 seconds.
![Page 145: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/145.jpg)
145
802.11a Results
[Plot: analysis time (seconds, 0 to 2,500) vs. cores (1 to 10); series: Proposed, Proposed (merge).]
Max-Plus takes less than 2 seconds.
merge:
• Reduction of ~9.34x
![Page 146: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/146.jpg)
146
802.11a Results
[Plot: analysis time (seconds, 0 to 2,500) vs. cores (1 to 10); series: Proposed, Proposed (merge), Proposed (merge-b).]
Max-Plus takes less than 2 seconds.
merge:
• Reduction of ~9.34x
merge-b:
• Reduction of ~342x
• Less than 7 sec.
![Page 147: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/147.jpg)
147
Results
Reduction in states → reduction in analysis time
Number of global ticks explored.
![Page 148: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/148.jpg)
148
Results
Proposed:
• ~1 to 8% over-estimation.
• Loss in precision mainly from over-estimating the synchronisation costs.
[Plots vs. cores; series: Observed, Proposed, MaxPlus. Panels: FmRadio (1 to 4 cores), Fly by Wire (1 to 7 cores), Life (1 to 8 cores), Matrix (1 to 8 cores).]
![Page 149: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/149.jpg)
149
Results
Max-Plus:
• Over-estimation very dependent on program structure.
• FmRadio and Life very imprecise. Loops iterating over par statement(s) multiple times. Over-estimations accumulate.
• Matrix quite precise. Executes in one global tick. Thus, thread WCRT assumption is valid.
[Plots vs. cores; series: Observed, Reachability, MaxPlus. Panels: FmRadio (1 to 4 cores), Fly by Wire (1 to 7 cores), Life (1 to 8 cores), Matrix (1 to 8 cores).]
![Page 150: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/150.jpg)
150
Results
• Our tool generates a timed execution trace for the computed WCRT:
– For each core: Thread start/end time, context-switching, fork/join, ...
– Can be used to tune the thread distribution.
• Was used to manually find good thread distributions for each benchmark program.
![Page 151: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/151.jpg)
Outline
• Introduction
• ForeC Language
• Timing Analysis
• Results
• Conclusions
![Page 152: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/152.jpg)
Conclusions
• ForeC language for deterministic parallel programming of embedded multicores.
• Based on the synchronous framework, but amenable to parallel execution.
• Can achieve WCRT speedup while providing time-predictability.
• Very precise and fast timing analysis for parallel programs using reachability.
![Page 153: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/153.jpg)
Future work
• Complete the formal semantics of ForeC.
• Automatic WCRT-aware scheduling.
• Cache hierarchy.
• Prune additional infeasible paths using value analysis.
[Diagram: tool flow through the Programming, Compilation and Timing Analysis phases. Boxes: ForeC source code, Thread distribution, CCFG, Static scheduling, Compiled program, CCFG with assembly, Architecture model, Reachability, Computed WCRT.]
![Page 154: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/154.jpg)
154
Questions?
![Page 155: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/155.jpg)
155
Design Patterns
• Point-to-point
• Broadcast
• Software pipelining
• Divide and conquer
– Scatter/Gather
– Map/Reduce
![Page 156: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/156.jpg)
156
Point-to-point
shared int sum = 0 combine with plus;

void main(void) {
  par( f(), g() );
}

void f(void) {
  while (1) {
    sum = comp1();
    pause;
  }
}

void g(void) {
  while (1) {
    comp2(sum);
    pause;
  }
}

New value of sum is received in the next global tick.
Combine operation is not required.
157
Broadcast
shared int sum = 0 combine with plus;

void main(void) {
  par( f(), g(), g() );
}

void f(void) {
  while (1) {
    sum = comp1();
    pause;
  }
}

void g(void) {
  while (1) {
    comp2(sum);
    pause;
  }
}

Multiple receivers.
Combine operation is not required.
New value of sum is received in the next global tick.
![Page 158: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/158.jpg)
158
Software Pipelining
shared int s1 = 0, s2 = 0 combine with plus;

void main(void) {
  par( stage1(), stage2(), stage3() );
}

void stage1(void) {
  while (1) {
    s1 = comp1();
    pause;
  }
}

void stage2(void) {
  pause;
  while (1) {
    s2 = comp2(s1);
    pause;
  }
}

void stage3(void) {
  pause; pause;
  while (1) {
    comp3(s2);
    pause;
  }
}

Outputs from each stage are buffered.
Use the delayed behaviour of shared variables to buffer each stage.
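The buffering effect of the delayed shared variables can be traced with a plain C simulation. It is a sketch under the assumption that a write in one tick becomes visible at the start of the next tick; the bodies of `comp1` and `comp2` are hypothetical (the slides leave them unspecified), and `comp1` takes the tick number here only to produce distinguishable values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage bodies; the slides leave comp1 and comp2 unspecified. */
static int comp1(int tick) { return tick + 1; }
static int comp2(int s1)   { return 10 * s1; }

/* out[t] records the value of s2 that stage3 passes to comp3() in tick t,
 * or -1 while stage3 is still executing its leading pauses. */
void pipeline(int out[], int n_ticks) {
    assert(out != NULL && n_ticks >= 0);
    int s1 = 0, s2 = 0;                     /* global copies */
    for (int t = 0; t < n_ticks; ++t) {
        int s1_seen = s1, s2_seen = s2;     /* values visible during tick t */
        int s1_next = comp1(t);             /* stage1 writes s1 every tick */
        int s2_next = (t >= 1) ? comp2(s1_seen) : s2;  /* stage2 skips tick 0 */
        out[t] = (t >= 2) ? s2_seen : -1;   /* stage3 skips ticks 0 and 1 */
        s1 = s1_next;                       /* writes become visible at the */
        s2 = s2_next;                       /* start of the next tick */
    }
}
```

Over five ticks the third stage observes `{-1, -1, 10, 20, 30}`: the item produced by stage1 in tick 0 reaches stage3 in tick 2, so each shared variable acts as a one-tick buffer between stages.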
![Page 159: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/159.jpg)
159
Divide and Conquer
input int[1024] image;
int edges = 0;

void main(void) {
  analyse(0, 1023);
}

void analyse(int start, int end) {
  while (1) {
    edges = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

Count the number of edges in an image.
Sequential 1
![Page 160: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/160.jpg)
160
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    edges = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

Parallel 1
![Page 161: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/161.jpg)
161
Divide and Conquer
input int[1024] image;
int edges = 0;

void main(void) {
  analyse(0, 1023);
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

Keep a running total of the number of edges in an image.
For the parallel version, it is not as easy as this.
Sequential 2
![Page 162: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/162.jpg)
162
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

Expected: edges = (1+2) + (1+2) = 6
Parallel 2
![Page 163: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/163.jpg)
163
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| edges = 0 | edges = 0, then edges = 1 | edges = 0, then edges = 2 |

Expected: edges = (1+2) + (1+2) = 6
Parallel 2
![Page 164: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/164.jpg)
164
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| edges = 0 | edges = 0, then edges = 1 | edges = 0, then edges = 2 |
| edges = 3 | | |

Expected: edges = (1+2) + (1+2) = 6
Parallel 2
![Page 165: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/165.jpg)
165
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| edges = 0 | edges = 0, then edges = 1 | edges = 0, then edges = 2 |
| edges = 3 | edges = 3, then edges = 4 | edges = 3, then edges = 5 |

Expected: edges = (1+2) + (1+2) = 6
Parallel 2
![Page 166: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/166.jpg)
166
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| edges = 0 | edges = 0, then edges = 1 | edges = 0, then edges = 2 |
| edges = 3 | edges = 3, then edges = 4 | edges = 3, then edges = 5 |
| edges = 9 | | |

Expected: edges = (1+2) + (1+2) = 6
Parallel 2
![Page 167: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/167.jpg)
167
Divide and Conquer
input int[1024] image;
shared int edges = 0 combine with plus;

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges++;
    }
    pause;
  }
}

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| edges = 0 | edges = 0, then edges = 1 | edges = 0, then edges = 2 |
| edges = 3 | edges = 3, then edges = 4 | edges = 3, then edges = 5 |
| edges = 9 | | |

Expected: edges = (1+2) + (1+2) = 6
We should track the running total separately from the number of new edges.
Parallel 2
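The miscount in the trace can be reproduced in plain C. This sketch assumes the behaviour the trace shows: each thread starts a tick with a copy of the global `edges`, increments its copy, and the plus combine adds the two copies at the end of the tick. The arrays `found_a` and `found_b` are hypothetical inputs giving the number of edges each thread finds per tick.

```c
#include <assert.h>
#include <stddef.h>

/* Replays the "Parallel 2" trace: a running total kept directly in a
 * shared variable that is combined with plus.
 * found_a[t], found_b[t]: edges found by each thread in tick t. */
int running_total_plus(const int found_a[], const int found_b[],
                       size_t n_ticks) {
    assert(found_a != NULL && found_b != NULL);
    int edges = 0;                        /* global copy */
    for (size_t t = 0; t < n_ticks; ++t) {
        int copy_a = edges + found_a[t];  /* local copy of analyse(0,511) */
        int copy_b = edges + found_b[t];  /* local copy of analyse(512,1023) */
        edges = copy_a + copy_b;          /* combine with plus */
    }
    return edges;
}
```

With one and two edges found per tick, the first tick gives 3 as expected, but the second gives 4 + 5 = 9 instead of 6, because the previous total is carried in both copies and therefore counted twice.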
![Page 168: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/168.jpg)
168
Divide and Conquer
input int[1024] image;
typedef struct { int total; int new; } Edges;
shared Edges edges = { .total = 0, .new = 0 } combine with accum;

Edges accum(Edges copy1, Edges copy2) {
  copy1.total = copy1.total + copy1.new + copy2.new;
  copy1.new = 0;
  return copy1;
}

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    edges.new = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges.new++;
    }
    pause;
  }
}

Expected: edges = (1+2) + (1+2) = 6
Parallel 3
![Page 169: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/169.jpg)
169
Divide and Conquer
input int[1024] image;
typedef struct { int total; int new; } Edges;
shared Edges edges = { .total = 0, .new = 0 } combine with accum;

Edges accum(Edges copy1, Edges copy2) {
  copy1.total = copy1.total + copy1.new + copy2.new;
  copy1.new = 0;
  return copy1;
}

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    edges.new = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges.new++;
    }
    pause;
  }
}

Expected: edges = (1+2) + (1+2) = 6

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| { .total=0, .new=0 } | { .total=0, .new=0 }, then { .total=0, .new=1 } | { .total=0, .new=0 }, then { .total=0, .new=2 } |

Parallel 3
![Page 170: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/170.jpg)
170
Divide and Conquer
input int[1024] image;
typedef struct { int total; int new; } Edges;
shared Edges edges = { .total = 0, .new = 0 } combine with accum;

Edges accum(Edges copy1, Edges copy2) {
  copy1.total = copy1.total + copy1.new + copy2.new;
  copy1.new = 0;
  return copy1;
}

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    edges.new = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges.new++;
    }
    pause;
  }
}

Expected: edges = (1+2) + (1+2) = 6

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| { .total=0, .new=0 } | { .total=0, .new=0 }, then { .total=0, .new=1 } | { .total=0, .new=0 }, then { .total=0, .new=2 } |
| { .total=3, .new=0 } | | |

Parallel 3
![Page 171: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/171.jpg)
171
Divide and Conquer
input int[1024] image;
typedef struct { int total; int new; } Edges;
shared Edges edges = { .total = 0, .new = 0 } combine with accum;

Edges accum(Edges copy1, Edges copy2) {
  copy1.total = copy1.total + copy1.new + copy2.new;
  copy1.new = 0;
  return copy1;
}

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    edges.new = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges.new++;
    }
    pause;
  }
}

Expected: edges = (1+2) + (1+2) = 6

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| { .total=0, .new=0 } | { .total=0, .new=0 }, then { .total=0, .new=1 } | { .total=0, .new=0 }, then { .total=0, .new=2 } |
| { .total=3, .new=0 } | { .total=3, .new=0 }, then { .total=3, .new=1 } | { .total=3, .new=0 }, then { .total=3, .new=2 } |

Parallel 3
![Page 172: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/172.jpg)
172
Divide and Conquer
input int[1024] image;
typedef struct { int total; int new; } Edges;
shared Edges edges = { .total = 0, .new = 0 } combine with accum;

Edges accum(Edges copy1, Edges copy2) {
  copy1.total = copy1.total + copy1.new + copy2.new;
  copy1.new = 0;
  return copy1;
}

void main(void) {
  par( analyse(0, 511), analyse(512, 1023) );
}

void analyse(int start, int end) {
  while (1) {
    edges.new = 0;
    for (int i = start; i <= end; ++i) {
      ... image[i] ... ;
      edges.new++;
    }
    pause;
  }
}

Expected: edges = (1+2) + (1+2) = 6

| Global | Local: analyse(0,511) | Local: analyse(512,1023) |
|---|---|---|
| { .total=0, .new=0 } | { .total=0, .new=0 }, then { .total=0, .new=1 } | { .total=0, .new=0 }, then { .total=0, .new=2 } |
| { .total=3, .new=0 } | { .total=3, .new=0 }, then { .total=3, .new=1 } | { .total=3, .new=0 }, then { .total=3, .new=2 } |
| { .total=6, .new=0 } | | |

Parallel 3
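The trace can be replayed in plain C to confirm that the `accum` combine function yields the expected total. This is a sketch rather than ForeC: the per-tick copying and combining is written out by hand, the field `new` is renamed `fresh` so the block also compiles as C++, and the arrays `found_a` and `found_b` are hypothetical inputs giving the edges each thread finds per tick.

```c
#include <assert.h>
#include <stddef.h>

/* fresh plays the role of the slide's .new field. */
typedef struct { int total; int fresh; } Edges;

/* Combine function from the slide: add both threads' new edges to the
 * running total once, then clear the per-tick count. */
Edges accum(Edges copy1, Edges copy2) {
    copy1.total = copy1.total + copy1.fresh + copy2.fresh;
    copy1.fresh = 0;
    return copy1;
}

/* Replays the "Parallel 3" trace for n_ticks global ticks. */
Edges running_total_accum(const int found_a[], const int found_b[],
                          size_t n_ticks) {
    assert(found_a != NULL && found_b != NULL);
    Edges edges = { 0, 0 };              /* global copy */
    for (size_t t = 0; t < n_ticks; ++t) {
        Edges copy_a = edges;            /* local copy of analyse(0,511) */
        Edges copy_b = edges;            /* local copy of analyse(512,1023) */
        copy_a.fresh = found_a[t];       /* edges.new = 0; ... edges.new++ */
        copy_b.fresh = found_b[t];
        edges = accum(copy_a, copy_b);   /* end of tick: combine with accum */
    }
    return edges;
}
```

With one and two edges found per tick, the total is 3 after the first tick and 6 after the second, because the previous total enters the combine only once.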
![Page 173: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/173.jpg)
Introduction
• Existing parallel programming solutions.
– Shared memory model.
  • OpenMP, Pthreads
  • Intel Cilk Plus, Threading Building Blocks
  • Unified Parallel C, ParC, X10
– Message passing model.
  • MPI, SHIM
– Provide ways to manage shared resources but do not prevent concurrency errors.
[OpenMP] http://openmp.org
[Pthreads] https://computing.llnl.gov/tutorials/pthreads/
[X10] http://x10-lang.org/
[Intel Cilk Plus] http://software.intel.com/en-us/intel-cilk-plus
[Intel Threading Building Blocks] http://threadingbuildingblocks.org/
[Unified Parallel C] http://upc.lbl.gov/
[Ben-Asher et al] ParC – An Extension of C for Shared Memory Parallel Processing.
[MPI] http://www.mcs.anl.gov/research/projects/mpi/
[SHIM] SHIM: A Language for Hardware/Software Integration.
![Page 174: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/174.jpg)
Introduction
• Deterministic runtime support.
– Pthreads
  • dOS, Grace, Kendo, CoreDet, Dthreads.
– OpenMP
  • Deterministic OMP
– Concept of logical time.
– Each logical time step broken into an execution and communication phase.
[Bergan et al 2010] Deterministic Process Groups in dOS.
[Olszewski et al 2009] Kendo: Efficient Deterministic Multithreading in Software.
[Bergan et al 2010] CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution.
[Liu et al 2011] Dthreads: Efficient Deterministic Multithreading.
[Aviram 2012] Deterministic OpenMP.
![Page 175: Programming Safety-Critical Embedded Systems](https://reader035.fdocuments.in/reader035/viewer/2022062218/56816345550346895dd3d5e6/html5/thumbnails/175.jpg)
ForeC Language
• Behaviour of shared variables is similar to:
– Intel Cilk+ (Reducers)
– Unified Parallel C (Collectives)
– DOMP (Workspace consistency)
– Grace (Copy-on-write)
– Dthreads (Copy-on-write)