
Mapping: Applications → Platforms
- Sessions 7-9 -

Peter Marwedel
TU Dortmund, Informatik 12 (Fakultät für Informatik, Technische Universität Dortmund)
Germany

Slides use Microsoft cliparts. All Microsoft restrictions apply.

P. Marwedel, Informatik 12, TU Dortmund, 2008

Schedule of the course

Time Monday Tuesday Wednesday Thursday Friday

09:30-11:00

1: Orientation, introduction

2: Models of computation + specs


9: Mapping of applications to platforms

13: Memory aware compilation


11:00 Brief break Brief break Brief break Brief break

11:15-12:30

6: Lab*: Ptolemy

10: Lab*: Scheduling

14: Lab*: Mem. opt.

18: Lab*: Mem. opt.

12:30 Lunch Lunch Lunch Lunch Lunch

14:00-15:20



11: High-level optimizations*


19: WCET & compilers*

15:20 Break Break Break Break Break

15:40-17:00

4: Lab*: Kahn process networks




20: Wrap-up

* Dr. Heiko Falk





Hypothetical design flow

Specifications

Embedded System HW

Standard Software, Real-Time Operating Systems

Mapping of applications to execution platforms

Evaluation

Testing

Optimization of Embedded Systems

Application knowledge

Scope of mapping algorithms

Useful terms from hardware synthesis:

Resource allocation: decision concerning type and number of available resources.

Resource assignment: mapping Task → (hardware) resource.

xx-to-yy binding: describes a mapping from the behavioral to the structural domain, e.g. task-to-processor binding, variable-to-memory binding.

Scheduling: mapping Tasks → task start times. Sometimes, resource assignment is considered to be included in scheduling.

Real-time scheduling

Assume that we are given a task graph G=(V,E).

Def.: A schedule τ of G is a mapping τ: V → T of a set of tasks V to start times from domain T.

[Figure: tasks V1 to V4 of G=(V,E) mapped to start times on the time axis t]

Typically, schedules have to respect a number of constraints, incl. resource constraints, dependency constraints, and deadlines. Scheduling = finding such a mapping.

Hard and soft deadlines

Def.: A time-constraint (deadline) is called hard if not meeting that constraint could result in a catastrophe [Kopetz, 1997].

All other time constraints are called soft.

We will focus on hard deadlines.





Periodic and aperiodic tasks

Def.: Tasks which must be executed once every p units of time are called periodic tasks. p is called their period. Each execution of a periodic task is called a job.

All other tasks are called aperiodic.

Def.: Tasks requesting the processor at unpredictable times are called sporadic, if there is a minimum separation between the times at which they request the processor.





Preemptive and non-preemptive scheduling

Non-preemptive schedulers:

Tasks are executed until they are done.

Response time for external events may be quite long.

Preemptive schedulers: to be used if
- some tasks have long execution times, or
- the response time for external events is required to be short.

Dynamic/online scheduling

Dynamic/online scheduling: processor allocation decisions (scheduling) are made at run-time, based on the information about the tasks that have arrived so far.

Static/offline scheduling

Static/offline scheduling: scheduling that takes a priori knowledge about arrival times, execution times, and deadlines into account. The dispatcher allocates the processor when interrupted by a timer. The timer is controlled by a table generated at design time.

In a time-triggered system, the temporal control structure of all tasks is established a priori by off-line support-tools.

Cost functions

Cost function: Different algorithms aim at minimizing different functions.

Def.: Maximum lateness $= \max_{\text{all tasks}} (\text{completion time} - \text{deadline})$

It is < 0 if all tasks complete before their deadlines.

[Figure: timeline of tasks T1 and T2 illustrating the maximum lateness]

Classes of mapping algorithms considered in this course

Classical scheduling algorithms: mostly for independent tasks & ignoring communication, mostly for mono- and homogeneous multiprocessors

Resource access protocols

Dependent tasks as considered in architectural synthesis: initially designed in a different context, but applicable

Hardware/software partitioning: dependent tasks, heterogeneous systems, focus on resource assignment

Design space exploration using genetic algorithms: heterogeneous systems, incl. communication modeling

Aperiodic scheduling - Scheduling with no precedence constraints -

Let {T_i} be a set of tasks. Let:

c_i be the execution time of T_i,

d_i be the deadline interval, that is, the time between T_i becoming available and the time until which T_i has to finish execution,

ℓ_i be the laxity or slack, defined as ℓ_i = d_i - c_i,

f_i be the finishing time.

[Figure: timeline for task T_i showing c_i, d_i and ℓ_i]

Uniprocessor with equal arrival times

Preemption is useless.

Earliest Due Date (EDD): Execute task with earliest due date (deadline) first.

EDD requires all tasks to be sorted by their (absolute) deadlines. Hence, its complexity is O(n log(n)).
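The EDD rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all tasks arrive at t = 0, each task is given as (name, execution time, absolute deadline), and the function name `edd_schedule` and the task values are hypothetical, not taken from the slides.

```python
from typing import List, Tuple

def edd_schedule(tasks: List[Tuple[str, int, int]]):
    """tasks: (name, execution time c, absolute deadline d); all arrive at t = 0.
    Returns the schedule as (name, start, finish) triples and the maximum lateness."""
    # Sorting by non-decreasing deadline is the O(n log n) step.
    order = sorted(tasks, key=lambda task: task[2])
    t = 0
    schedule = []
    max_lateness = float("-inf")
    for name, c, d in order:
        start, finish = t, t + c
        schedule.append((name, start, finish))
        max_lateness = max(max_lateness, finish - d)  # lateness = f - d
        t = finish
    return schedule, max_lateness

# Hypothetical task set: (name, c, d)
tasks = [("T1", 1, 3), ("T2", 1, 10), ("T3", 1, 7), ("T4", 3, 8), ("T5", 2, 5)]
schedule, lmax = edd_schedule(tasks)
print([name for name, _, _ in schedule], lmax)
```

For this task set the maximum lateness is negative, i.e. all deadlines are met.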


Optimality of EDD

EDD is optimal, since it follows Jackson's rule: Given a set of n independent tasks, any algorithm that executes the tasks in order of non-decreasing (absolute) deadlines is optimal with respect to minimizing the maximum lateness.

Proof (see Buttazzo, 2002):

Let σ be a schedule produced by any algorithm A.

If A ≠ EDD, then there exist tasks T_a, T_b with d_a ≤ d_b such that T_b immediately precedes T_a in σ.

Let σ' be the schedule obtained by exchanging T_a and T_b.

Exchanging Ta and Tb cannot increase lateness

Max. lateness for T_a and T_b in σ is L_max(a,b) = f_a - d_a (T_a finishes last and has the earlier deadline).

Max. lateness for T_a and T_b in σ' is L'_max(a,b) = max(L'_a, L'_b).

Two possible cases:

1. L'_a ≥ L'_b: L'_max(a,b) = f'_a - d_a < f_a - d_a = L_max(a,b), since T_a starts earlier in schedule σ'.

2. L'_a ≤ L'_b: L'_max(a,b) = f'_b - d_b = f_a - d_b ≤ f_a - d_a = L_max(a,b), since f_a = f'_b and d_a ≤ d_b.

⇒ L'_max(a,b) ≤ L_max(a,b)

[Figure: schedule σ with T_b before T_a and schedule σ' with T_a before T_b; f_a = f'_b]

EDD is optimal

Any schedule σ with lateness L can be transformed into an EDD schedule σ_n with lateness L_n ≤ L, which is the minimum lateness.

⇒ EDD is optimal (q.e.d.)

Earliest Deadline First (EDF) - Horn’s Theorem -

Different arrival times: Preemption potentially reduces lateness.

Theorem [Horn74]: Given a set of n independent tasks with arbitrary arrival times, any algorithm that at any instant executes the task with the earliest absolute deadline among all the ready tasks is optimal with respect to minimizing the maximum lateness.





Earliest Deadline First (EDF) - Algorithm -

Earliest deadline first (EDF) algorithm:

Each time a new ready task arrives:
- It is inserted into a queue of ready tasks, sorted by their absolute deadlines. The task at the head of the queue is executed.
- If a newly arrived task is inserted at the head of the queue, the currently executing task is preempted.

A straightforward approach with sorted lists (full comparison with existing tasks for each arriving task) requires run-time O(n^2) (less with binary search or bucket arrays).

[Figure: sorted queue of ready tasks feeding the executing task]
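The queue-based EDF algorithm can be sketched as a unit-time simulation. This is an illustrative sketch only: it swaps the sorted list for a binary heap (`heapq`), assumes integer arrival and execution times, and uses a hypothetical task set and function name (`edf`).

```python
import heapq

def edf(tasks):
    """tasks: list of (name, arrival, exec_time, absolute_deadline), integer times.
    Returns (timeline, finish): who runs in each time unit, and finishing times."""
    pending = sorted(tasks, key=lambda task: task[1])   # by arrival time
    ready = []                # heap of [deadline, name, remaining]; head = earliest deadline
    timeline, finish = [], {}
    t, i = 0, 0
    while i < len(pending) or ready:
        # Insert newly arrived tasks into the ready queue.
        while i < len(pending) and pending[i][1] <= t:
            name, _, c, d = pending[i]
            heapq.heappush(ready, [d, name, c])
            i += 1
        if not ready:
            timeline.append(None)   # processor idle
            t += 1
            continue
        job = ready[0]              # the head of the queue executes (preempting if it changed)
        timeline.append(job[1])
        job[2] -= 1
        t += 1
        if job[2] == 0:
            heapq.heappop(ready)
            finish[job[1]] = t
    return timeline, finish

# Hypothetical task set: (name, arrival, execution time, absolute deadline)
timeline, finish = edf([("T1", 0, 10, 33), ("T2", 4, 3, 28), ("T3", 5, 10, 29)])
print(finish)
```

In this run T2 preempts T1 at t = 4 because of its earlier deadline, while T3 (arriving at t = 5 with a later deadline than T2) does not preempt T2.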





Earliest Deadline First (EDF) - Example -

Later deadline ⇒ no preemption

Earlier deadline ⇒ preemption

Least laxity (LL), Least Slack Time First (LST)

Priorities = decreasing function of the laxity (the less laxity, the higher the priority); dynamically changing priority; preemptive.

[Figure: example schedule annotated with the laxities ℓ of the tasks]

Scheduling with precedence constraints

Task graph and possible schedule:





Simultaneous Arrival Times: The Latest Deadline First (LDF) Algorithm

LDF [Lawler, 1973]: reads the task graph and, among the tasks with no successors, inserts the one with the latest deadline into a queue. It then repeats this process, each time choosing the task with the latest deadline among the tasks whose successors have all been selected.

At run-time, the tasks are executed in the generated total order. LDF is non-preemptive and is optimal for mono-processors.

If no local deadlines exist, LDF performs just a topological sort.
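A minimal Python sketch of LDF follows. It assumes that the schedule is built from its tail, as in Buttazzo's description of Lawler's algorithm: the task selected first is executed last, so the selection order is reversed before it is returned. The function name `ldf` and the task graph are hypothetical.

```python
def ldf(deadlines, succ):
    """deadlines: {task: absolute deadline}; succ: {task: iterable of successors}.
    Returns the execution order (first task to run comes first)."""
    remaining = set(deadlines)
    selected = []                       # filled from the end of the schedule
    while remaining:
        # Candidates: tasks whose successors have all been selected already.
        candidates = [v for v in remaining
                      if all(s not in remaining for s in succ.get(v, ()))]
        pick = max(candidates, key=lambda v: (deadlines[v], v))  # latest deadline
        selected.append(pick)
        remaining.remove(pick)
    return selected[::-1]               # reverse: last selected runs first

# Hypothetical task graph with unit execution times
deadlines = {"J1": 2, "J2": 5, "J3": 4, "J4": 3, "J5": 5, "J6": 6}
succ = {"J1": ["J2", "J3"], "J2": ["J4", "J5"], "J3": ["J6"]}
order = ldf(deadlines, succ)
print(order)
```

With unit execution times, every task in this order finishes no later than its deadline.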





Asynchronous Arrival Times: Modified EDF Algorithm

This case can be handled with a modified EDF algorithm. The key idea is to transform the problem from a given set of dependent tasks into a set of independent tasks with different timing parameters [Chetto90]. This algorithm is optimal for mono-processor systems.

If preemption is not allowed, the heuristic algorithm developed by Stankovic and Ramamritham can be used.
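The transformation of timing parameters used by the modified EDF algorithm can be written as follows, in the form given by Buttazzo. The notation is an assumption for illustration: r_i is the release time and d_i the absolute deadline of T_i, c_i its execution time, and T_i → T_j means that T_i is an immediate predecessor of T_j.

```latex
r_j^{*} = \max\Bigl(r_j,\ \max_{T_i \rightarrow T_j} \bigl(r_i^{*} + c_i\bigr)\Bigr),
\qquad
d_i^{*} = \min\Bigl(d_i,\ \min_{T_i \rightarrow T_j} \bigl(d_j^{*} - c_j\bigr)\Bigr)
```

Release times are modified starting from the tasks without predecessors, deadlines starting from the tasks without successors. The modified task set is then scheduled with plain EDF.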





Overview

Scheduling of aperiodic tasks with real-time constraints: table with some known algorithms

                  | Equal arrival times, non-preemptive | Arbitrary arrival times, preemptive
Independent tasks | EDD (Jackson)                       | EDF (Horn)
Dependent tasks   | LDF (Lawler)                        | EDF* (Chetto)

© L. Thiele, ETH Zürich, 2006

Periodic scheduling

For periodic scheduling, the best that we can do is to design an algorithm which will always find a schedule if one exists.

A scheduler is defined to be optimal iff it will find a schedule if one exists.

Periodic scheduling

Let

p_i be the period of task T_i,

c_i be the execution time of T_i,

d_i be the deadline interval, that is, the time between a job of T_i becoming available and the time until which the same job of T_i has to finish execution,

ℓ_i be the laxity or slack, defined as ℓ_i = d_i - c_i.

[Figure: one period p_i of T_i showing d_i, c_i and ℓ_i]

Average utilization

Average utilization:

$$\mu = \sum_{i=1}^{n} \frac{c_i}{p_i}$$

Necessary condition for schedulability (with m = number of processors):

$$\mu \le m$$

Independent tasks: Rate monotonic (RM) scheduling

Most well-known technique for scheduling independent periodic tasks [Liu, 1973].

Assumptions:

All tasks that have hard deadlines are periodic.

All tasks are independent.

d_i = p_i, for all tasks.

c_i is constant and is known for all tasks.

The time required for context switching is negligible.

For a single processor and for n tasks, the following inequality holds for the average utilization µ:

$$\mu = \sum_{i=1}^{n} \frac{c_i}{p_i} \le n\,(2^{1/n} - 1)$$

Rate monotonic (RM) scheduling - The policy -

RM policy: The priority of a task is a monotonically decreasing function of its period.

At any time, a highest priority task among all those that are ready for execution is allocated.

Theorem: If all RM assumptions are met, schedulability is guaranteed.
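The utilization condition among the RM assumptions is easy to check mechanically. The sketch below is illustrative: the function names and the task sets, given as (c, p) pairs, are hypothetical. Note that the bound is a sufficient condition; a task set that exceeds it is not guaranteed by this test, but it is not necessarily unschedulable either.

```python
def utilization(tasks):
    """tasks: list of (execution time c, period p). Returns mu = sum(c_i / p_i)."""
    return sum(c / p for c, p in tasks)

def rm_bound(n):
    """Utilization bound n * (2^(1/n) - 1) for n tasks."""
    return n * (2 ** (1.0 / n) - 1)

def rm_guaranteed(tasks):
    """True if the utilization bound guarantees RM schedulability."""
    return utilization(tasks) <= rm_bound(len(tasks))

# Hypothetical task sets
print(rm_guaranteed([(1, 4), (1, 5), (2, 10)]))   # mu = 0.65
print(rm_guaranteed([(2, 5), (4, 7)]))            # mu = 34/35
```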





Maximum utilization for guaranteed schedulability

Maximum utilization as a function of the number of tasks:

$$\mu = \sum_{i=1}^{n} \frac{c_i}{p_i} \le n\,(2^{1/n} - 1), \qquad \lim_{n \to \infty} n\,(2^{1/n} - 1) = \ln(2)$$

Example of RM-generated schedule

T1 preempts T2 and T3. T2 and T3 do not preempt each other.

Case of failing RM scheduling

Task 1: period 5, execution time 2
Task 2: period 7, execution time 4
µ = 2/5 + 4/7 = 34/35 ≈ 0.97 > 2(2^{1/2} - 1) ≈ 0.828

[Figure: RM schedule with a missed deadline; the missing computations are scheduled in the next period]

Intuitively: why does RM fail?

No problem if p_2 = m·p_1, m ∈ ℕ:

[Figure: T2 fits into the time left by T1]

Otherwise:

[Figure: RM switches to T1 too early, despite the early deadline for T2; T2 should be completed first]

Critical instants

Definition: A critical instant of a task is the time at which the release of a task will produce the largest response time.

Lemma: For any task, the critical instant occurs if that task is simultaneously released with all higher priority tasks.

Proof: Let T = {T_1, …, T_n} be periodic tasks with ∀i: p_i ≤ p_{i+1}.

Source: G. Buttazzo, Hard Real-time Computing Systems, Kluwer, 2002

Critical instants (1)

Response time of T_n is delayed by tasks T_i of higher priority:

[Figure: T_n and T_i on a common time axis; response time c_n + 2c_i]

Delay may increase if T_i starts earlier:

[Figure: T_i released earlier; response time c_n + 3c_i]

Maximum delay achieved if T_n and T_i start simultaneously.

Critical instants (2)

Repeating the argument for all i = 1, … n-1:

The worst case response time of a task occurs when it is released simultaneously with all higher-priority tasks. q.e.d.

Schedulability is checked at the critical instants.

If all tasks of a task set are schedulable at their critical instants, they are schedulable at all release times.

Observation helps designing examples
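One way to check schedulability at the critical instant is the standard response-time iteration for fixed-priority scheduling. This technique is not on the slides; it is named here only as an illustration of the observation above. The sketch assumes tasks given as (c, p) pairs sorted by increasing period (decreasing RM priority), with deadline = period; the function name `response_time` is hypothetical.

```python
def response_time(tasks, idx):
    """Worst-case response time of tasks[idx] when released together with all
    higher-priority tasks tasks[:idx]. Returns None if it exceeds the period."""
    c, p = tasks[idx]
    r = c
    while True:
        # Interference: each higher-priority task T_j runs ceil(r / p_j) times within r.
        nxt = c + sum(-(-r // pj) * cj for cj, pj in tasks[:idx])
        if nxt > p:
            return None        # deadline (= period) missed at the critical instant
        if nxt == r:
            return r           # fixed point reached
        r = nxt

print(response_time([(1, 4), (2, 6), (3, 12)], 2))   # hypothetical task set
print(response_time([(2, 5), (4, 7)], 1))            # periods 5 and 7, execution times 2 and 4
```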





Properties of RM scheduling

RM scheduling is based on static priorities. This allows RM scheduling to be used in a standard OS, such as Windows NT.

No idle capacity is needed if ∀i: p_{i+1} = F·p_i for some integer F, i.e. if the period of each task is a multiple of the period of the next higher priority task. Schedulability is then also guaranteed if µ ≤ 1.

A huge number of variations of RM scheduling exists.

In the context of RM scheduling, many formal proofs exist.

Summary

Mapping of applications to platforms

Scheduling algorithms for aperiodic task sets
• Earliest Due Date (EDD)
• Earliest Deadline First (EDF)
• Least Laxity (LL)
• Latest Deadline First (LDF)

Scheduling algorithms for periodic task sets
• Rate monotonic scheduling (RMS)





EDF

EDF can also be applied to periodic scheduling.

EDF is optimal for every period
⇒ EDF is optimal for periodic scheduling
⇒ EDF must be able to schedule the example in which RMS failed.
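The claim can be tried out with a small unit-time simulation of the failing RM example (periods 5 and 7, execution times 2 and 4). This is a sketch under simplifying assumptions: integer times, all tasks released at t = 0, deadline = period, and a job that misses its deadline is reported and dropped (rather than continued in the next period). The function name `simulate` is hypothetical.

```python
def simulate(tasks, horizon, policy):
    """tasks: list of (c, p) with deadline = period; policy: "RM" or "EDF".
    Returns a list of (task index, missed absolute deadline)."""
    jobs = []      # each job: [task index, absolute deadline, remaining time]
    misses = []
    for t in range(horizon):
        # Release a new job of every task whose period starts at t.
        for i, (c, p) in enumerate(tasks):
            if t % p == 0:
                jobs.append([i, t + p, c])
        # Unfinished jobs whose deadline has passed are reported and dropped.
        for job in [j for j in jobs if j[1] <= t]:
            misses.append((job[0], job[1]))
            jobs.remove(job)
        if jobs:
            if policy == "RM":    # static priority: shortest period first
                job = min(jobs, key=lambda j: (tasks[j[0]][1], j[1]))
            else:                 # EDF: earliest absolute deadline first
                job = min(jobs, key=lambda j: (j[1], j[0]))
            job[2] -= 1           # run the chosen job for one time unit
            if job[2] == 0:
                jobs.remove(job)
    return misses

tasks = [(2, 5), (4, 7)]
rm_misses = simulate(tasks, 35, "RM")
edf_misses = simulate(tasks, 35, "EDF")
print(rm_misses[:1], edf_misses)
```

Under RM the first miss is the deadline of task 2 (index 1) at t = 7; under EDF no deadline is missed within the hyperperiod of 35 time units.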





Comparison EDF/RMS

RMS: [Figure: RMS schedule of the example]

EDF: [Figure: EDF schedule of the example]

T2 not preempted, due to its earlier deadline.

EDF: Properties

EDF requires dynamic priorities

EDF cannot be used with a standard operating system just providing static priorities.





Comparison RMS/EDF

| RMS | EDF
Priorities | Static | Dynamic
Works with std. OS with fixed priorities | Yes | No
Uses full computational power of processor | No, just up to µ = n(2^{1/n} - 1) | Yes
Possible to exploit full computational power of processor without provisioning for slack | No | Yes

Sporadic tasks

If sporadic tasks were connected to interrupts, the execution time of other tasks would become very unpredictable.

Introduction of a sporadic task server, periodically checking for ready sporadic tasks.

Sporadic tasks are essentially turned into periodic tasks.

Resource access protocols

Critical sections: sections of code at which exclusive access to some resource must be guaranteed. Can be guaranteed with semaphores S or “mutexes”.

P(S) checks the semaphore to see if the resource is available and, if yes, sets S to “used”. Uninterruptible operation! If no, the calling task has to wait.

V(S) sets S to “unused” and starts a sleeping task (if any).

[Figure: Task 1 and Task 2 each enclose their critical section in P(S) ... V(S); mutually exclusive access to the resource guarded by S]
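The P(S)/V(S) pattern maps directly onto Python's `threading.Semaphore`, where `acquire()` plays the role of P and `release()` the role of V. The sketch below is illustrative only; the shared counter and the function name `task` are hypothetical.

```python
import threading

S = threading.Semaphore(1)        # binary semaphore guarding the shared resource
shared = {"counter": 0}

def task(n_iterations):
    for _ in range(n_iterations):
        S.acquire()               # P(S): wait until the resource is free, then mark it used
        try:
            shared["counter"] += 1    # critical section
        finally:
            S.release()           # V(S): mark the resource unused, wake a waiting task

# Task 1 and Task 2 compete for the resource guarded by S.
threads = [threading.Thread(target=task, args=(10_000,)) for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(shared["counter"])
```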





Priority inversion

The priority of T1 is assumed to be higher than the priority of T2. If T2 requests exclusive access first (at t0), T1 has to wait until T2 releases the resource (time t3), thus inverting the priority.

In this example: duration of inversion bounded by length of critical section of T2.

Duration of priority inversion with >2 tasks can exceed the length of any critical section

Priority of T1 > priority of T2 > priority of T3. T2 preempts T3: T2 can prevent T3 from releasing the resource.

[Figure legend: critical section, normal execution]

Solutions

Disallow preemption during the execution of all critical sections.

Simple, but creates unnecessary blocking as unrelated tasks may be blocked.


[Figure: timeline of T1, T2 and T3; T1 blocked]

The MARS Pathfinder problem (1)

“But a few days into the mission, not long after Pathfinder started gathering meteorological data, the spacecraft began experiencing total system resets, each resulting in losses of data. The press reported these failures in terms such as "software glitches" and "the computer was trying to do too many things at once".” …

The MARS Pathfinder problem (2)

“Pathfinder contained an "information bus", … a shared memory area used for passing information between different components of the spacecraft.

A bus management task ran frequently with high priority to move certain kinds of data in and out of the information bus. Access to the bus was synchronized with mutual exclusion locks (mutexes).

The meteorological data gathering task ran as an infrequent, low priority thread, … When publishing its data, it would acquire a mutex, do writes to the bus, and release the mutex. …

The spacecraft also contained a communications task that ran with medium priority.”

High priority: retrieval of data from shared memory
Medium priority: communications task
Low priority: thread collecting meteorological data

Coping with priority inversion: the priority inheritance protocol

Tasks are scheduled according to their active priorities. Tasks with the same priorities are scheduled FCFS.

If task T1 executes P(S) and exclusive access is already granted to T2: T1 will become blocked. If priority(T2) < priority(T1): T2 inherits the priority of T1, and T2 resumes. Rule: a task inherits the highest priority of the tasks blocked by it.

When T2 executes V(S), its priority is decreased to the highest priority of the tasks still blocked by it. If no other task is blocked by T2: priority(T2) := original value. The highest priority task so far blocked on S is resumed.

Transitive: if T2 blocks T1 and T1 blocks T0, then T2 inherits the priority of T0.
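The bookkeeping behind these rules can be sketched as a small data model. This is not a scheduler and not the API of any real operating system; the classes `Task` and `Mutex` and the functions `P` and `V` are hypothetical, and larger numbers are assumed to mean higher priority.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base = priority      # nominal priority (larger = more urgent)
        self.active = priority    # active priority used for scheduling
        self.blocked_on = None    # mutex this task is waiting for, if any
        self.held = []            # mutexes currently held

class Mutex:
    def __init__(self, name):
        self.name = name
        self.owner = None
        self.waiters = []

def P(task, m):
    """Request m. Returns True if granted, False if the task becomes blocked."""
    if m.owner is None:
        m.owner = task
        task.held.append(m)
        return True
    m.waiters.append(task)
    task.blocked_on = m
    # Propagate the priority along the chain of blocking tasks (transitivity).
    holder = m.owner
    while holder is not None and holder.active < task.active:
        holder.active = task.active
        holder = holder.blocked_on.owner if holder.blocked_on else None
    return False

def V(task, m):
    """Release m, lower the releasing task's priority, resume the best waiter."""
    task.held.remove(m)
    m.owner = None
    still_blocked = [w.active for h in task.held for w in h.waiters]
    task.active = max([task.base] + still_blocked)
    if m.waiters:
        nxt = max(m.waiters, key=lambda w: w.active)
        m.waiters.remove(nxt)
        nxt.blocked_on = None
        m.owner = nxt
        nxt.held.append(m)

T1, T2, T3 = Task("T1", 3), Task("T2", 2), Task("T3", 1)
a, b = Mutex("a"), Mutex("b")
P(T3, a)                 # T3 enters its critical section
P(T2, b)                 # T2 holds b ...
P(T2, a)                 # ... and blocks on a: T3 inherits priority 2
P(T1, b)                 # T1 blocks on b: T2 and, transitively, T3 inherit priority 3
inherited = T3.active
V(T3, a)                 # T3 falls back to its base priority; T2 now owns a
print(inherited, T3.active, T2.active)
```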





Example

How would priority inheritance affect our example with 3 tasks?

T3 inherits the priority of T1 and T3 resumes.

Priority Inheritance Protocol (PIP)

Example with nested critical sections


[Figure: timeline t1 to t3 for tasks T1, T2 and T3 with nested critical sections on semaphores a and b, showing the P(a), P(b), V(b), V(a) calls and the active priorities P1 to P3]


Example of transitive priority inheritance:

[Figure: timeline t1 to t3 for tasks T1, T2 and T3 using semaphores a and b]

T1 blocked by T2, T2 blocked by T3

⇒ T3 inherits priority from T1 via T2

© L. Thiele, ETH Zürich, 2006. Source: G. Buttazzo


Problem: Deadlock


[Figure: timeline of T1 and T2, which both acquire the semaphores a and b in nested P(a)/P(b) ... V(a)/V(b) sections and end up in a deadlock]

© L. Thiele, ETH Zürich, 2006. Source: G. Buttazzo

Priority inversion on Mars

Priority inheritance also solved the Mars Pathfinder problem: the VxWorks operating system used in the Pathfinder implements a flag for the calls to mutex primitives. This flag allows priority inheritance to be set to “on”. When the software was shipped, it was set to “off”.

The problem on Mars was corrected by using the debugging facilities of VxWorks to change the flag to “on”, while the Pathfinder was already on Mars.

[Jones, 1997]

Remarks on priority inheritance protocol

Possible large number of tasks with high priority.

Possible deadlocks.

Ongoing debate about problems with the protocol: Victor Yodaiken: Against Priority Inheritance, http://www.fsmlabs.com/articles/inherit/inherit.html

Finds application in Ada: during rendezvous, task priority is set to the maximum.

More sophisticated protocol: priority ceiling protocol.

Impact on access methods for remote objects

Software packages for access to remote objects; example: CORBA (Common Object Request Broker Architecture). Information is sent to the Object Request Broker (ORB) via a local stub. The ORB determines the location to be accessed and sends information via the IIOP I/O protocol.

Access times not predictable.

Real-time (RT-) CORBA

A very essential feature of RT-CORBA is to provide end-to-end predictability of timeliness in a fixed priority system.

This involves respecting thread priorities between client and server for resolving resource contention, and bounding the latencies of operation invocations.

Thread priorities might not be respected when threads obtain mutually exclusive access to resources. RT-CORBA includes provisions for bounding the time during which such priority inversion can happen.

Priority management for primitives for mutually exclusive access to resources. The priority inheritance protocol must be available in implementations of RT-CORBA.





Classification of Scheduling Problems

Scheduling
  Independent tasks: RMS, EDF, LLF
  Dependent tasks
    1 processor: LDF
    Resource constrained: LS
    Time constrained: FDS
    Unconstrained: ASAP, ALAP

Dependent tasks

The problem of deciding whether or not a schedule exists for a set of dependent tasks and a given deadline is NP-complete in general [Garey/Johnson].

Strategies:

1. Add resources, so that scheduling becomes easier

2. Split problem into static and dynamic part so that only a minimum of decisions need to be taken at run-time.

3. Use scheduling algorithms from high-level synthesis

Taskgraph

Assumption: execution time = 1 for all tasks

[Figure: task graph with nodes a to n and z]

As soon as possible (ASAP) scheduling

ASAP: All tasks are scheduled as early as possible

Loop over (integer) time steps: Compute the set of unscheduled tasks for which all

predecessors have finished their computation Schedule a selected subset of these tasks to start at the

current time step.
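The ASAP loop can be sketched in Python as follows. The sketch assumes unit execution times, unlimited resources (every ready task is started) and an acyclic graph given as a predecessor map; the function name `asap` and the small task graph are hypothetical.

```python
def asap(pred):
    """pred: {task: list of predecessors}; unit execution times, acyclic graph.
    Returns {task: start time step}."""
    start = {}
    step = 0
    while len(start) < len(pred):
        # Unscheduled tasks all of whose predecessors have finished by this step.
        ready = [v for v in pred if v not in start
                 and all(u in start and start[u] + 1 <= step for u in pred[v])]
        for v in ready:
            start[v] = step
        step += 1
    return start

# Hypothetical task graph: a -> c, b -> c, c -> d, a -> e
pred = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}
asap_start = asap(pred)
print(asap_start)
```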





As soon as possible (ASAP) scheduling: Example

[Figure: ASAP schedule of the task graph, time steps τ=0 to τ=5]

As-late-as-possible (ALAP) scheduling

ALAP: All tasks are scheduled as late as possible

Start at last time step*:

Schedule tasks with no successors and tasks for which all successors have already been scheduled.

* Generate a list, starting at its end
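ALAP mirrors ASAP, starting at the last time step and walking backwards. The sketch assumes unit execution times, an acyclic graph given as a successor map, and a given last time step; the function name `alap` and the task graph are hypothetical.

```python
def alap(succ, last_step):
    """succ: {task: list of successors}; unit execution times, acyclic graph.
    Returns {task: start time step}, with sink tasks placed at last_step."""
    start = {}
    step = last_step
    while len(start) < len(succ):
        # Tasks with no successors, or whose successors all start after this step.
        ready = [v for v in succ if v not in start
                 and all(s in start and start[s] >= step + 1 for s in succ[v])]
        for v in ready:
            start[v] = step
        step -= 1
    return start

# Hypothetical task graph: a -> c, b -> c, c -> d, a -> e
succ = {"a": ["c", "e"], "b": ["c"], "c": ["d"], "d": [], "e": []}
alap_start = alap(succ, last_step=2)
print(alap_start)
```

In this graph, task e starts at step 2 under ALAP; it is the only task that has freedom to move between the earliest and the latest possible schedule.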





As-late-as-possible (ALAP) scheduling: Example

[Figure: ALAP schedule of the task graph, time steps τ=0 to τ=5, starting at the last time step]

(Resource constrained) List Scheduling

List scheduling: extension of ALAP/ASAP method

Preparation:

Topological sort of task graph G=(V,E)

Computation of priority of each task:

Possible priorities u:

• Number of successors

• Longest path

• Mobility = τ(ALAP schedule) - τ(ASAP schedule)

Source: Teich: Dig. HW/SW Systeme





Mobility as a priority function

[Figure: ASAP and ALAP schedules of the task graph (time steps τ=0 to τ=5); tasks with small mobility are urgent, tasks with large mobility are less urgent]

Mobility is not very precise.

Algorithm

List(G(V,E), B, u) {
  i := 0;
  repeat {
    Compute set of candidate tasks A_i;
    Compute set of not terminated tasks G_i;
    Select S_i ⊆ A_i of maximum priority r such that
      |S_i| + |G_i| ≤ B;   (* resource constraint *)
    foreach (v_j ∈ S_i): τ(v_j) := i;   (* set start time *)
    i := i + 1;
  } until (all nodes are scheduled);
  return (τ);
}

Complexity: O(|V|)

May be repeated for different task/processor classes.
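The list scheduling loop above can be sketched in Python. The sketch assumes unit execution times (so the set G_i of not terminated tasks is always empty), an acyclic graph given as a predecessor map, and priorities u supplied by the caller; the function name `list_schedule` and the example graph are hypothetical.

```python
def list_schedule(pred, u, B):
    """pred: {task: predecessors}; u: {task: priority}; B: number of resources.
    Unit execution times. Returns {task: start time step}."""
    start = {}
    i = 0
    while len(start) < len(pred):
        # A_i: unscheduled tasks whose predecessors have all finished.
        A = [v for v in pred if v not in start
             and all(p in start and start[p] + 1 <= i for p in pred[v])]
        G = []   # G_i: tasks still running at step i; empty for unit execution times
        A.sort(key=lambda v: (-u[v], v))        # highest priority first
        for v in A[:B - len(G)]:                # resource constraint |S_i| + |G_i| <= B
            start[v] = i                        # set start time
        i += 1
    return start

# Hypothetical task graph; u = length of the longest path to the end of the graph
pred = {"a": [], "b": [], "c": [], "d": ["a"], "e": ["b", "c"], "f": ["d", "e"]}
u = {"a": 3, "b": 3, "c": 3, "d": 2, "e": 2, "f": 1}
schedule = list_schedule(pred, u, B=2)
print(schedule)
```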





Example

Assuming B=2, unit execution time and u: path length

u(a)=u(b)=4
u(c)=u(f)=3
u(d)=u(g)=u(h)=u(j)=2
u(e)=u(i)=u(k)=1
∀i: G_i=0

[Figure: task graph with nodes a to k and the resulting list schedule for time steps τ=0 to τ=5]

Modified example based on J. Teich

(Time constrained) Force-directed scheduling

Goal: balanced utilization of resources
Based on spring model
Originally proposed for high-level synthesis *

* [Pierre G. Paulin, J.P. Knight: Force-directed scheduling in automatic data path synthesis, Design Automation Conference (DAC), 1987, pp. 195-202]

© ACM

Phase 1: Generation of ASAP and ALAP Schedule

[Figure: ASAP schedule and ALAP schedule of the task graph, time steps τ=0 to τ=5]

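Phase 2: Computation of Distribution Graphs D(i)

R(j) = {ASAP(j), …, ALAP(j)} (the time steps between the ASAP and the ALAP start time of task j)

$$P(j,i) = \begin{cases} \dfrac{1}{|R(j)|} & \text{if } i \in R(j) \\ 0 & \text{otherwise} \end{cases}$$

$$D(i) = \sum_{j} P(j,i)$$

[Figure: ASAP and ALAP schedules of the task graph and the resulting distribution graph D(i) over the time steps i]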
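The distribution graph follows directly from the ASAP and ALAP start times. The sketch below uses exact fractions to avoid rounding; the function name `distribution_graph` and the start times are hypothetical.

```python
from fractions import Fraction

def distribution_graph(asap, alap):
    """asap, alap: {task: start time step}. Returns {time step i: D(i)}."""
    steps = range(min(asap.values()), max(alap.values()) + 1)
    D = {i: Fraction(0) for i in steps}
    for j in asap:
        R = range(asap[j], alap[j] + 1)       # R(j): feasible time steps of task j
        for i in R:
            D[i] += Fraction(1, len(R))       # P(j, i) = 1 / |R(j)|
    return D

# Hypothetical ASAP and ALAP start times; only task e has mobility
asap_start = {"a": 0, "b": 0, "c": 1, "e": 1, "d": 2}
alap_start = {"a": 0, "b": 0, "c": 1, "e": 2, "d": 2}
D = distribution_graph(asap_start, alap_start)
print(D)
```

Task e contributes 1/2 to each of the two steps it may occupy, so D(1) = D(2) = 3/2 while D(0) = 2.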





Next: computation of “forces”

Direct forces push each task into the direction of lower values of D(i).

Impact of direct forces on dependent tasks taken into account by indirect forces.

Balanced resource usage ≈ smallest forces.

For our simple example and time constraint = 6: result = ALAP schedule

[Figure: distribution graph D(i) and the resulting schedule of the task graph]

Overall approach

procedure forceDirectedScheduling;
begin
  AsapScheduling;
  AlapScheduling;
  while not all tasks scheduled do
  begin
    select task T with smallest total force;
    schedule task T at time step minimizing forces;
    recompute forces;
  end;
end

May be repeated for different task/processor classes.

Not sufficient for today's complex, heterogeneous hardware platforms.

Trend: multiprocessor systems-on-a-chip (MPSoCs)

[Figure not reproduced.] Source: http://www.mpsoc-forum.org/2007/slides/Hattori.pdf

Multiprocessor systems-on-a-chip (MPSoCs) (2)

[Figures not reproduced.] Source: http://www.mpsoc-forum.org/2007/slides/Hattori.pdf

Summary

Mapping of applications to platforms

Scheduling algorithms for periodic task sets
• Earliest Deadline First (EDF)

Preemptive scheduling + mutexes: priority inversion
The priority inheritance protocol (PIP) reduces problems. However, PIP adds to the complexity. Better avoid this effect altogether (e.g. by using a DF MoC).

Scheduling for dependent task sets
• ASAP, ALAP, list scheduling, force-directed scheduling

Architectures of Multiprocessor Systems on a Chip (MPSoCs)





Hardware/software partitioning

Functionality to be implemented in software or in hardware?

Need to consider special purpose hardware in the long run? “No” for fixed functionality, but “yes” in general, since “By the time MPEG-n can be implemented in software, MPEG-n+1 has been invented” [de Man].

Hardware/software partitioning: approach

[Niemann: Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998] (comprehensive mathematical model)

[Figure: a specification is mapped onto processor P1, processor P2 and hardware]

Inputs to COOL:
1. Target technology
2. Design constraints
3. Required behavior

Steps of the COOL partitioning algorithm (1)

1. Translation of the behavior into an internal graph model

2. Translation of the behavior of each node from VHDL into C

3. Compilation
• All C programs compiled for the target processor,
• computation of the resulting program size,
• estimation of the resulting execution time (simulation input data might be required)

4. Synthesis of hardware components: for all leaf nodes, application-specific hardware is synthesized. High-level synthesis is sufficiently fast.


5. Flattening of the hierarchy:

• Granularity used by the designer is maintained.

• Cost and performance information added to the nodes.

• Precise information required for partitioning is pre-computed

6. Generating and solving a mathematical model of the optimization problem:

• Integer programming (IP) model for optimization. Optimal with respect to the cost function (approximates communication time)


7. Iterative improvements: Adjacent nodes mapped to the same hardware component are now merged.

8. Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.





An integer linear programming model for HW/SW partitioning

Notation:

Index set I denotes task graph nodes.

Index set L denotes task graph node types, e.g. square root, DCT or FFT.

Index set KH denotes hardware component types, e.g. hardware components for the DCT or the FFT.

Index set J of hardware component instances.

Index set KP denotes processors. All processors are assumed to be of the same type.

An ILP model for HW/SW partitioning

X_{i,k} = 1 if node v_i is mapped to hardware component type k ∈ KH and 0 otherwise.

Y_{i,k} = 1 if node v_i is mapped to processor k ∈ KP and 0 otherwise.

NY_{ℓ,k} = 1 if at least one node of type ℓ is mapped to processor k ∈ KP and 0 otherwise.

T is a mapping from task graph nodes to their types: T: I → L

The cost function accumulates the cost of hardware units:

C = cost(processors) + cost(memories) + cost(application specific hardware)

Constraints

Operation assignment constraints

$$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$$

All task graph nodes have to be mapped either in software or in hardware.

Variables are assumed to be integers.

Additional constraints to guarantee they are either 0 or 1:

$$\forall i \in I: \forall k \in KH: X_{i,k} \le 1$$

$$\forall i \in I: \forall k \in KP: Y_{i,k} \le 1$$

Operation assignment constraints (2)

$$\forall \ell \in L,\ \forall i: T(v_i) = c_\ell,\ \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$$

For all types ℓ of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of ℓ.

Decision variables must also be 0/1 variables:

$$\forall \ell \in L,\ \forall k \in KP: NY_{\ell,k} \le 1$$

Scheduling

[Figure: task graph with nodes v1 to v11 (including the blocks FIR1 and FIR2) mapped to processor p1 and ASIC h1, connected by communication channel c1. Alternative execution orders are shown: v7 before v8 or v8 before v7 on p1, e3 before e4 or e4 before e3 on c1, and v3 before v4 or v4 before v3 for FIR2 on h1.]

Scheduling / precedence constraints

For all nodes v_i1 and v_i2 that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable b_{i1,i2} with
b_{i1,i2} = 1 if v_i1 is executed before v_i2 and
b_{i1,i2} = 0 otherwise.

Define constraints of the type
(end-time of v_i1) ≤ (start time of v_i2) if b_{i1,i2} = 1 and
(end-time of v_i2) ≤ (start time of v_i1) if b_{i1,i2} = 0

Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.

Approach just fixes the order of execution and avoids the complexity of computing start times during optimization.

Example

HW types H1, H2 and H3 with costs of 20, 25, and 30.
Processors of type P.
Tasks T1 to T5.

Execution times:

T | H1 | H2 | H3 | P
1 | 20 |    |    | 100
2 |    | 20 |    | 100
3 |    |    | 12 | 10
4 |    |    | 12 | 10
5 | 20 |    |    | 100


Operation assignment constraints for this example:

$$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$$

X_{1,1} + Y_{1,1} = 1 (task 1 mapped to H1 or to P)
X_{2,2} + Y_{2,1} = 1
X_{3,3} + Y_{3,1} = 1
X_{4,3} + Y_{4,1} = 1
X_{5,1} + Y_{5,1} = 1


Assume types of tasks are ℓ = 1, 2, 3, 3, and 1.

$$\forall \ell \in L,\ \forall i: T(v_i) = c_\ell,\ \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$$

Functionality 3 to be implemented on processor if node 4 is mapped to it.

Other equations

Time constraints, leading to: application-specific hardware required for time constraints under 100 time units.

Cost function:
C = 20 #(H1) + 25 #(H2) + 30 #(H3) + cost(processor) + cost(memory)

Result

For a time constraint of 100 time units and cost(P) < cost(H3):

Solution (educated guessing):
T1 → H1
T2 → H2
T3 → P
T4 → P
T5 → H1
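For an example of this size, the guessed solution can be cross-checked by brute-force enumeration instead of an ILP solver. The sketch below is not the COOL model; it rests on simplifying assumptions that are not stated on the slides: all tasks execute strictly one after the other, one hardware instance per used type is sufficient, memory cost is ignored, and the processor cost `COST_P = 25` is a hypothetical value chosen only to satisfy cost(P) < cost(H3).

```python
from itertools import product

HW_COST = {"H1": 20, "H2": 25, "H3": 30}
COST_P = 25          # hypothetical; only cost(P) < cost(H3) is given
TIME_LIMIT = 100
# task: (hardware type, execution time on that hardware, execution time on P)
TASKS = {"T1": ("H1", 20, 100), "T2": ("H2", 20, 100), "T3": ("H3", 12, 10),
         "T4": ("H3", 12, 10), "T5": ("H1", 20, 100)}

def evaluate(mapping):
    """Returns (total time, cost) of a mapping {task: target}."""
    # Assumption: strictly sequential execution, so times simply add up.
    time = sum(TASKS[t][2] if target == "P" else TASKS[t][1]
               for t, target in mapping.items())
    used = set(mapping.values())
    cost = sum(HW_COST[h] for h in used if h != "P")
    cost += COST_P if "P" in used else 0
    return time, cost

best = None
for choice in product([False, True], repeat=len(TASKS)):   # True = map to software
    mapping = {t: ("P" if sw else TASKS[t][0]) for t, sw in zip(TASKS, choice)}
    time, cost = evaluate(mapping)
    if time <= TIME_LIMIT and (best is None or cost < best[0]):
        best = (cost, mapping)

print(best)
```

Under these assumptions the cheapest feasible mapping coincides with the solution given above.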





Application example

Audio lab (mixer, fader, echo, equalizer, balance units); slow SPARC processor; 1µ ASIC library. Allowable delay of 22.675 µs (~ 44.1 kHz)

[Figure: SPARC processor, ASIC (Compass, 1 µ) and external memory]

Outdated technology; just a proof of concept.

Design space for audio lab

Everything in software: 72.9 µs, 0 λ²
Everything in hardware: 3.06 µs, 457.9 × 10^6 λ²
Lowest cost for given sample rate: 18.6 µs, 78.4 × 10^6 λ²
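The selection among these design points can be replayed with a few lines of arithmetic. The sketch below derives the per-sample deadline from the 44.1 kHz sample rate and picks the smallest design that meets it. The label "mixed HW/SW partition" for the third point and the tuple layout are assumptions made for illustration.

```python
# Design points from the slide: (description, delay in microseconds, area in 10^6 units).
design_points = [
    ("everything in software", 72.9, 0.0),
    ("everything in hardware", 3.06, 457.9),
    ("mixed HW/SW partition", 18.6, 78.4),
]

SAMPLE_RATE_HZ = 44_100
deadline_us = 1e6 / SAMPLE_RATE_HZ  # about 22.68 microseconds per sample

# Keep only the designs that process one sample within the deadline.
feasible = [p for p in design_points if p[1] <= deadline_us]
# Among those, the one with the smallest hardware area is the cheapest.
cheapest = min(feasible, key=lambda p: p[2])
print(cheapest[0])  # mixed HW/SW partition
```

The pure software solution misses the deadline by more than a factor of three, while the pure hardware solution meets it at almost six times the area of the partitioned design.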





HW/SW partitioning in the context of mapping applications to processors

Handling of heterogeneous systems

Handling of task dependencies

Considers communication (at least in COOL)

Considers memory sizes etc (at least in COOL)

For COOL: just homogeneous processors

No link to scheduling theory

Still handles just a single processor type





Survey of Mapping Techniques

1st Workshop on Mapping Applications to MPSoCs, Rheinfels castle, June 2008
Information: http://www.artist-embedded.org/artist/-Mapping-of-Applications-to-MPSoCs-.html

Automatic parallelization of C code (see talk by Heiko Falk)

Work at ETH Zürich (SPEA2, Lothar Thiele)

Work at RWTH Aachen (MAPS, Leupers)

Work at Leiden University (Daedalus, Ed Deprettere)

Mapping to the CELL processor (U. Bologna and others)

Work at IMEC (D. Verkest)





Daedalus Design-flow

[Figure: Daedalus design flow. A sequential application is converted by automatic parallelization (KPNgen) into a parallel application specification (Kahn process network). System-level design space exploration (Sesame) works on high-level models, using the platform specification, the mapping specification and a library of IP cores to explore, modify and select instances. System-level synthesis (ESPAM) turns the system-level specification into an RTL-level specification (RTL-level models), which Xilinx Platform Studio (XPS) processes into the multi-processor system on chip (MP-SoC): synthesizable VHDL and C/C++ code for processors. The tools are connected through a common XML interface.]

© E. Deprettere,U. Leiden





JPEG/JPEG2000 case study

Example architecture instances for a single-tile JPEG encoder:

2 MicroBlaze processors (50KB)
1 MicroBlaze, 1 HW DCT (36KB)
6 MicroBlaze processors (120KB)
4 MicroBlaze, 2 HW DCT (68KB)

[Figure: block diagrams of the four instances, showing the mapping of the Vin, DCT, Q, VLE and Vout stages to processors and hardware DCT cores together with the buffer sizes (2KB to 32KB).]






Sesame DSE results: Single JPEG encoder DSE






Work at Zürich
Mapping Scenario: Overview

Given
1. specification of the task structure (task model) = for each flow the corresponding tasks to be executed
2. different usage scenarios (flow model)

Sought
processor implementation (resource model) = architecture* + task mapping + scheduling

Objectives:
1. maximize performance
2. minimize cost

Subject to:
1. memory constraints
2. delay constraints

*: 2 cases:
1. fixed architecture
2. architecture to be designed

based on Thiele’s slides

(performance model)
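With two conflicting objectives (maximize performance, minimize cost) there is usually no single best implementation, only a set of non-dominated trade-offs. The sketch below shows just the dominance test and a naive Pareto filter. It is not SPEA2 or any other evolutionary algorithm, it ignores the memory and delay constraints, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if it is no worse in both objectives and better in one.

    Each point is (performance, cost): performance is maximized, cost minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (performance, cost) values for four candidate implementations.
candidates = [(10, 50), (8, 30), (7, 40), (12, 90)]
print(pareto_front(candidates))  # [(10, 50), (8, 30), (12, 90)]
```

The candidate (7, 40) is dropped because (8, 30) is both faster and cheaper. The remaining three are genuine trade-offs between performance and cost.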





Design Space

[Figure: design space. Computation templates (DSP, RISC, FPGA, SDRAM, Cipher, LookUp) and communication templates are combined with scheduling/arbitration policies (proportional share, WFQ, static, dynamic, fixed priority, EDF, TDMA, FCFS) into candidate architectures. Architecture # 1 and Architecture # 2 are two such candidates.]

Which architecture is better suited for our application?

© L. Thiele, ETHZ





Practical problem in automotive design

Which processor should run the software?





EXPO – Tool architecture (1)

[Figure: exploration cycle between MOSES, EXPO and SPEA 2. Labels: task graph, scenario graph, flows & resources; system architecture; performance values; selection of “good” architectures.]

© L. Thiele, ETHZ





SYMTA/S: System optimization using evolutionary algorithms

[R. Ernst et al.: A framework for modular analysis and exploration of heterogeneous embedded systems, Real-time Systems, 2006, p. 124]





Summary

Mapping applications for complex heterogeneous multiprocessor platforms needs

allocation (if hardware is not fixed)

binding of tasks to resources

scheduling

Approaches presented

HW/SW codesign tool COOL

Daedalus (briefly)

Symta/S (briefly)

SPEA2: Evolutionary algorithms in use at ETH Zürich





Brief break (if on schedule)

Q&A?