Synthesis of Embedded Software for Reactive Systems Jordi Cortadella Universitat Politècnica de...

Synthesis of Embedded Software for Reactive Systems

Jordi CortadellaUniversitat Politècnica de Catalunya, Barcelona

Joint work with:

Robert Clarisó, Alex Kondratyev, Luciano Lavagno, Claudio Passerone and Yosinori Watanabe (UPC, Cadence Berkeley Labs, Politecnico di Torino)

System Design

External IP provider(e.g. software modem)

Internal IP provider(e.g. MPEG2 engine)

System designer(Set-top box)

Contents provider(e.g. TV broadcast company)

Methodologyfor

Platform-basedSystem Design

Platform provider(e.g. Semiconductor company)

µP

DSPCom

s

MPEG2Engine

customGraphics

: Requirements specification, Testbench

: Functional and performance models

(with agreed interfaces and abstraction levels)

etropolisMetropolis Project

• Goal: develop a formal design environment– Design methodologies: abstraction levels, design problem formulations

– EDA: formal methods for automatic synthesis and verification,

a modeling mechanism: heterogeneous semantics, concurrency

• Participants:– UC Berkeley (USA): methodologies, modeling, formal methods

– CMU (USA): formal methods

– Politecnico di Torino (Italy): modeling, formal methods

– Universitat Politècnica de Catalunya (Spain): modeling, formal methods

– Cadence Berkeley Labs (USA): methodologies, modeling, formal methods

– Philips (Netherlands): methodologies (multi-media)

– Nokia (USA, Finland): methodologies (wireless communication)

– BWRC (USA): methodologies (wireless communication)

– BMW (USA): methodologies (fault-tolerant automotive controls)

– Intel (USA): methodologies (microprocessors)

Metropolis Framework

Design

Constraints

Function

Specification

Architecture

Specification

Metropolis Infrastructure

• Design methodology• Meta model of computation• Base tools - Design imports - Meta model compiler - Simulation

Metropolis Formal Methods:Synthesis/Refinement

Metropolis Formal Methods:Analysis/Verification

Outline

• The problem– Synthesis of concurrent specifications

for sequential processors

– Compiler optimizations across processes

• Previous work: Dataflow networks– Static scheduling of SDF networks

– Code and data size optimization

• Quasi-Static Scheduling of process networks– Petri net representation of process networks

– Scheduling and code generation

• Open problems

Environmental controller

AC Dehumidifier Alarm

Temperature

Humidity

ENVIRONMENTALCONTROLLER

TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA

AC-on DRYER-on ALARM-on


TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


TEMP-FILTER

float sample, last;last = 0;forever { sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; WRITE(TDATA, sample); }}


TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


TEMP-FILTER

float sample, last;last = 0;forever { sample = READ(TSENSOR); if (|sample - last| > DIF) { last = sample; WRITE(TDATA, sample); }}

HUMIDITY-FILTER

float h, max;forever { h = READ(HSENSOR); if (h > MAX) WRITE(HDATA, h);}


TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


CONTROLLER

float tdata, hdata;forever { select(TDATA,HDATA) { case TDATA: tdata = READ(TDATA); if (tdata > TFIRE) WRITE(ALARM-on,10); else if (tdata > TMAX)

WRITE(AC-on, tdata-TMAX); case HDATA: hdata = READ(HDATA); if (hdata > HMAX) WRITE(DRYER-on, 5); }}

TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


TsensorT-FILTERwakes up

T-FILTERexecutes

T-FILTERsleeps

HsensorH-FILTERwakes up

H-FILTERexecutes &sends datato HDATA

H-FILTERsleeps

CONTROLLERwakes up

CONTROLLERexecutes &reads data

from HDATA...

Environ. Processes OS

Operating system

Compiler optimizations

• Instruction level

• Basic blocks

• Intra-procedural(across basic blocks)

• Inter-procedural

• Inter-process ?

• a = b*16 a = b >> 4

• common subexpr.,copy propagation

• loop invariants,induction variables

• inline expansion,parameter propagation

• channel optimizations,OS overhead reduction

Each optimization enables further optimizations at lower levels

Partial evaluation (example)

Specification:

subsets (n,k) = n! / (k! * (n-k)!)

________________________________________________

int subsets (int n, int k)

{ return fact(n) / (fact(k) * fact(n-k)); }

int pairs (int n)

{ return subsets (n,2);}

... print (pairs(x+1)) ...... print (pairs(x+1)) ...

... print (pairs(5)) ...

Partial evaluation (compiler optimizations)


Specification:

subsets (n,k) = n! / (k! * (n-k)!)

________________________________________________



int pairs (int n)


... print ((x+1)*x / 2) ...... print ((x+1)*x / 2) ...

... print (pairs(5)) ...Partial evaluation (compiler optimizations)


Specification:

subsets (n,k) = n! / (k! * (n-k)!)

________________________________________________



int pairs (int n)


... print ((x+1)*x / 2) ...... print ((x+1)*x / 2) ...

... print (10) ...

Inter-process partial evaluation

forever {

n = read (A);

write (B,n);

write (C, n-2);

write (D, 2);}

forever {

x = read (E);

y = read (F);

z = read (G);

write (H, x/(y*z));}

x!

A

H

x!

x!

n

pairs (n)


forever {

n = read (A);

write (B,n);

write (C, n-2);

write (D, 2);}

forever {

x = read (E);

y = read (F);

z = read (G);


x!

A

H

x!

x!

No chances for optimization


forever {

n = read (A);

write (B,n);

write (C, n-2);

write (D, 2);}

forever {

x = read (E);

y = read (F);

z = read (G);


x!

A

H

x!

x!2...2 2...2


forever {

n = read (A);

write (B,n);

write (C, n-2);

write (G, 2);}

forever {

x = read (E);

y = read (F);

z = read (G);


x!

A

H

x!

2...2 2...2


forever {

n = read (A);

write (B,n);

write (C, n-2);

write (G, *);}

forever {

x = read (E);

y = read (F);

read (G);

write (H, x/(y*2));}

x!

A

H

x!

• Copy propagation across processes• Channel G only synchronizes (token available)


forever {

n = read (A);

write (B,n);

write (C, n-2);

write (G, *);}

forever {

x = read (E);

y = read (F);

read (G);


x!

A

H

x!

By scheduling operations properly, FIFOs may become variables(one element per FIFO, at most)


forever {

n = read (A);

v1 = n;

v3 = n-2;

x = v2;

y = v4;


x!

A

H

x!

v1 v2

v3 v4


forever { n = read (A); v1 = n; v2 = fact (v1); x = v2; v3 = n-2; v4 = fact (v3); y = v4; write (H, x/(y*2));}

A

H

And now we can apply conventional compiler optimizations


forever { n = read (A); x = fact (n); y = fact (n-2);


A

H

If some “clever” theorem prover could realize that

fact(n) = n*(n-1)*fact(n-2)

the following code could be derived ...


forever {

n = read (A);

write (H,n*(n-1)/*2);

}

A

H


forever {

n = read (A);

write (B,n);

write (C, n-2);

write (D, 2);}

forever {

x = read (E);

y = read (F);

z = read (G);


x!

A

H

x!

x!

This was the original specification of the system !


This is the final implementation after inter-process optimization:

• Only one process (no context switching overhead)

• Channels substituted by variables (no communication overhead)

forever {

n = read (A);

write (H,n*(n-1)/*2);

}

A

H

TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


Operating system

Goal:Goal: improve performance, code size power consumption, ...

• Reduce operating system overhead

• Reduce communication overhead

How?: Do as much as possible statically and automatically

• Scheduling

• Compiler optimizations

Outline







• Open problems

Dataflow networks

• Powerful mechanism for data-dominated systems

• (Often stateless) actors perform computation

• Unbounded FIFOs perform communication via sequences of tokens carrying values– (matrix of) integer, float, fixed point– image of pixels, …..

• Determinacy: – unique output sequences given unique input sequences

– Sufficient condition: blocking read(process cannot test input queues for emptiness)

Intuitive semantics

• Example: FIR filter– single input sequence i(n)

– single output sequence o(n)

– o(n) = c1 * i(n) + c2 * i(n-1)

c1

+ o

i i(-1)

c2

Examples of Dataflow actors

• SDF: Static Dataflow: fixed number of input and output tokens

• BDF: Boolean Dataflow control token determines number of consumed and produced tokens

+

1

11

FFT1024 1024 10 1

merge selectT F

FT

Static scheduling of DF• Key property of DF networks: output sequences do not depend on

firing sequence of actors (marked graphs)

• SDF networks can be statically scheduled at compile-time – execute an actor when it is known to be fireable– no overhead due to sequencing of concurrency– static buffer sizing

• Different schedules yield different – code size– buffer size– pipeline utilization

Balance equations

• Number of produced tokens must equal number of consumed tokens on every edge (channel)

• Repetitions (or firing) vector v of schedule S: number of firings of each actor in S

• v(A) ·np = v(B) ·nc

must be satisfied for each edge

np nc

A B A Bnp nc

Balance equations

B C

A3

1

1

1

2

2

11

• Balance for each edge:– 3 v(A) - v(B) = 0

– v(B) - v(C) = 0

– 2 v(A) - v(C) = 0

– 2 v(A) - v(C) = 0

Balance equations

• M ·v = 0iff S is periodic

• Full rank (as in this case) • no non-zero solution • no periodic schedule

(too many tokens accumulate on AB or BC)

3 -1 00 1 -12 0 -12 0 -1

M =

B C

A3

1

1

1

22

11

Balance equations

• Non-full rank• infinite solutions exist (linear space of dimension 1)

• Any multiple of v = |1 2 2|T satisfies the balance equations• ABCBC and ABBCC are minimal valid schedules• ABABBCBCCC is non-minimal valid schedule

2 -1 00 1 -12 0 -12 0 -1

M =

B C

A2

1

1

1

22

11

Static SDF scheduling

• Main SDF scheduling theorem (Lee ‘86):

– A connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1

– If M has rank n-1 then there exists a unique smallest integer solution v to

M v = 0

Deadlock

• If no actor is firable in a state before reaching the initial state, no valid schedule exists (Lee’86)

B C

A

2

1

1

1

2

1

Schedule: (2A) B C

Deadlock !Deadlock !

Deadlock

• If no actor is firable in a state before reaching the initial state, no valid schedule exists (Lee’86)

B C

A

2

1

1

1

2

1

Schedule: (2A) B C

Code size minimization

• Assumptions (based on DSP architecture):– subroutine calls expensive– fixed iteration loops are cheap

(“zero-overhead loops”)

• Global optimum: single appearance schedulee.g. ABCBC A (2BC), ABBCC A (2B) (2C)

• may or may not exist for an SDF graph…

• buffer minimization relative to single appearance schedules

(Bhattacharyya ‘94, Lauwereins ‘96, Murthy ‘97)

• Assumption: no buffer sharing• Example:

v = | 100 100 10 1|T

• Valid SAS: (100 A) (100 B) (10 C) D• requires 210 units of buffer area

• Better (factored) SAS: (10 (10 A) (10 B) C) D• requires 30 units of buffer area, but…• requires 21 loop initiations per period (instead of 3)

Buffer size minimization

C D1 10

A

B10

10

1

1

Scheduling more powerful DF• SDF is limited in modeling power • More general DF is too powerful

– non-Static DF is Turing-complete (Buck ‘93) – bounded-memory scheduling is not always possible

• Boolean Data Flow: Quasi-Static Scheduling of special “patterns”– if-then-else, repeat-until, do-while

• Dynamic Data Flow: run-time scheduling– may run out of memory or deadlock at run time

• Kahn Process Networks: quasi-static scheduling using Petri nets – conservative: schedulable network may be declared unschedulable

Outline







• Open problems

The problem• Given: a network of Kahn processes

– Kahn process: sequential function + ports– communication: port-based, point-to-point, uni-

directional, multi-rate

• Find: a single sequential task– functionally equivalent to the original

network (modulo concurrency)– threads driven by input stimuli

(no OS intervention)

TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


Init()

last = 0;

Tsensor()

sample = READ(TSENSOR);

if (|sample - last| > DIF) {

last = sample;

if (sample > TFIRE)

WRITE(ALARM-on,10);

else if (sample > TMAX)

WRITE(AC-on,sample-TMAX);

}

Hsensor()

h = READ(HSENSOR);

if (h > MAX)

WRITE(DRYER-on,5);

Event-driven threads

Reset

The scheduling procedure

1. Specify a network of processes– process: C + communication operations– netlist: connection between ports

2. Translate to the computational model: Petri nets

3. Find a “schedule” on the Petri net

4. Translate the schedule to a task

TEMP-FILTER

float sample, last;last = 0;while (1) { sample = READ(TSENSOR); if (|sample - last|> DIF) { last = sample; WRITE(TDATA, sample); }}

TSENSOR

sample = READ(TSENSOR)

last = sample;WRITE(TDATA,sample)

TDATA

last = 0

TEMPFILTER

TSENSOR

TDATA

T

F

HUMIDITY-FILTER

float h, max;last = 0;while (1) { h = READ(HSENSOR); if (h > MAX) WRITE(HDATA, h);}

HUMIDITYFILTER

HSENSOR

HDATA

HSENSOR

h = READ(HSENSOR)

WRITE(HDATA,h)

HDATA

T

Fh > MAX ?

HDATA

WRITE(ALARM-on,10)

T

F

h > MAX ?

TDATA

hdata = READ(HDATA)tdata = READ(TDATA)

WRITE(AC-on,tdata-TMAX)

T

WRITE(DRYER-on,5)

Ftdata > TFIRE?

tdata > TMAX?

T

F

hdata > HMAX?

CONTROLLERwhile(1) { select(TDATA,HDATA) { case TDATA: tdata = READ(TDATA); if (tdata > TFIRE) WRITE(ALARM-on, 10); else if (tdata > TMAX) WRITE(AC-on, tdata-TMAX); case HDATA: hdata = READ(HDATA, hdata); if (hdata > HMAX) WRITE(DRYER-on, 5);}}

WRITE(ALARM-on,10)

T

F

h > MAX ?

TDATA



T

WRITE(DRYER-on,5)

Ftdata > TFIRE?

tdata > TMAX?

T

F

hdata > HMAX?

HSENSOR

h = READ(HSENSOR)

WRITE(HDATA,h)

T

F

TSENSOR



last = 0

|sample-last| > dif ?T

F

HDATA

h > MAX ?

Petri nets for Kahn process networks

Sequential processes (1 token per process)Input/Output ports (communication with the environment)Channels (point-to-point communication between processes)

Petri nets for Kahn process networks

True False True False

Data-dependent choices• Conservative assumption (any outcome is possible)

Schedule

• Infinite state space

• Schedule properties:• Finite (no infinite resources)• Inputs served infinitely often• All choice outcomes covered

Schedule

• Finding the optimal schedule is computationally expensive

• Heuristics are required• token count minimization• guidance by T-invariants (cycles)

Code generation

I1

I2

system

T F

TF

I1 I2

I1 I2

I1 I2

Initialization

Await state

Choice Generated code:

• ISRs driven by input stimuli (I1 and I2)

• Each tasks contains threads from one await state to another await state

Code generation

I1

I2

system

T F

TF

I1 I2

I1 I2

I1 I2

Generated code:



Code generation

I1

I2

system

Generated code:



C0

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

S1

S2 S3C11

I1I2

I1 I2

I1 I2

T

F

Code generation

C0

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

S1

S2 S3C11

I1I2

I1 I2

I1 I2

T

F

enum state {S1, S2, S3} S;

Code generation


Init () { C0(); S = S1; return;}

C0

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

S1

S2 S3C11

I1I2

I1 I2

I1 I2

T

F

C0

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

S1

S2 S3C11

I1I2

I1 I2

I1 I2

T

F

Code generation

C1

C2

C3

C5

C6

C7

S1

S2 S3C11

I1

I1

I1


ISR1 () { switch(S) { case S1: C1(); C2(); S=S2; return; case S2: C3(); C2(); return; case S3: C6(); C7(); C11(); C5(); return;} }

C0

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

S1

S2 S3C11

I1I2

I1

I1 I2

T

F

Code generation


ISR2 () { switch(S) { case S1: C4(); C5(); S=S3; break; case S2: C10(); C11(); C5(); S=S3; return; case S3: if (C8()) { C7(); C11(); C5(); return; } else { C9(); S = S1; return;} } }

C4

C5

C7

C8

C9

C10

S1

S2 S3C11

I2

I2

I2

Code generation



ISR1 () { switch(S) { case S1: C1(); C2(); S=S2; return; case S2: C3(); C2(); return; case S3: C6(); C7(); C11(); C5(); return;} } ISR2 () { switch(S) { case S1: C4(); C5(); S=S3; break; case S2: C10(); C11(); C5(); S=S3; return; case S3: if (C8()) { C7(); C11(); C5(); return; } else { C9(); S = S1; return;} } }

C0

C1

C2

C3

C4

C5

C6

C7

C8

C9

C10

S1

S2 S3C11

I1I2

I1 I2

I1 I2

T

F

Code generation




Init ()

ISR1 ()

ISR2 ()

Reset

I1

I2

S


AC Dehumidifier Alarm

Temperature

Humidity

ENVIRONMENTALCONTROLLER

TEMPFILTER

HUMIDITYFILTER

CONTROLLER

TSENSOR HSENSOR

HDATATDATA


WRITE(ALARM-on,10)

T

F

h > MAX ?

TDATA



T

WRITE(DRYER-on,5)

Ftdata > TFIRE?

tdata > TMAX?

T

F

hdata > HMAX?

HSENSOR

h = READ(HSENSOR)

WRITE(HDATA,h)

T

F

TSENSOR



last = 0

|sample-last| > dif ?T

F

HDATA

h > MAX ?

Et h > MAX ?

TDATA

I D

F

Jt

HSENSOR

G

Hf

TSENSOR

B

Ct

A

HDATA

p1

p2

p3

p4

p5

p6

p7

p8

Ef

p0

p9

p10

Cf

Jf

(p0 p8 p9)

(p1 p8 p9)

(p1 p3 p8 p9)

(p2 p8 p9)

(p1 p8 p9 TDATA)

(p1 p4 p8)

(p1 p5 p8)

(p1 p6 p8 p9)

(p2 p7 p9)

(p1 p8 p9 HDATA)

(p1 p8 p10)

A

B

Ct

Cf

D

Ef

Et

F

G

Ht

I

Hf

JfJt

TSENSOR HSENSOR

await state

(p0 p8 p9)

(p1 p8 p9)

(p1 p3 p8 p9)

(p2 p8 p9)

(p1 p8 p9 TDATA)

(p1 p4 p8)

(p1 p5 p8)

(p1 p6 p8 p9)

(p2 p7 p9)

(p1 p8 p9 HDATA)

(p1 p8 p10)

A

B

Ct

Cf

D

Ef

Et

F

G

Ht

I

Hf

JfJt

TSENSOR HSENSOR

(p0 p8 p9)

(p1 p8 p9)

(p1 p3 p8 p9)

(p2 p8 p9)

(p1 p8 p9 TDATA)

(p1 p4 p8)

(p1 p5 p8)

(p1 p6 p8 p9)

(p2 p7 p9)

(p1 p8 p9 HDATA)

(p1 p8 p10)

A

B

Ct

Cf

D

Ef

Et

F

G

Ht

I

Hf

JfJt

TSENSOR HSENSOR

TEMP-FILTERHUMIDITY-FILTER

CONTROLLER

Tsensor()

{



last = sample;

WRITE (TDATA,sample);

tdata = READ (TDATA);

if (tdata > TFIRE)

WRITE(ALARM-on,10);

else if (tdata > TMAX)

WRITE(AC-on,tdata-TMAX);

}

}

Channel elimination

Code generation and optimization

Tsensor()

{

READ(TSENSOR,sample,1);


last = sample;

WRITE (TDATA,sample,1);

READ (TDATA,tdata,1);

if (tdata > TFIRE)

WRITE(ALARM-on,10);

else if (tdata > TMAX)

WRITE(AC-on,tdata-TMAX);

}

}

Copy propagationtdata = sample


Tsensor()

{

READ(TSENSOR,sample,1);


last = sample;

WRITE (TDATA,sample);

tdata = READ (TDATA);

if (sample > TFIRE)

WRITE(ALARM-on,10);



}

}


Init()

last = 0;

Tsensor()



last = sample;

if (sample > TFIRE)

WRITE(ALARM-on,10);



}

Hsensor()

h = READ(HSENSOR);

if (h > MAX)

WRITE(DRYER-on,5);

Event-driven threads

Reset

Application example: ATM Switch

Input cells: accept?

Output cells: emit?

• No static schedule due to:– Inputs with independent rates

(need Real-Time dynamic scheduling) – Data-dependent control

(can use Quasi-Static Scheduling)

Functional Decomposition

4 Tasks(+ 1 arbiter)

Accept/discard cell

Clock divider

Output time selector

Output cell enabler

Minimal (QSS) Decomposition

2 Tasks

Input cell processing

Output cell processing

ATM: experimental results

Sw Implementation QSS Functional partitioning

Number of tasks 2 5

Lines of C code 1664 2187

Clock cycles 197,526 249,726

4+1 Tasks 2 Tasks

Functional partitioning QSS

Producer-Filter-Consumer Example

controller

filterproducer consumer

init

Req AckCoeff

Pixels Pixels

pixels

Experimental Results

# of clock cycles

size of channels

4-task implementation

1-task implementation

Open problems

• Is a system schedulable ? (decidability)

• False paths in concurrent systems(data dependencies)

• Synthesis for multi-processors

• Abstraction / partitioning

• and many others ...

(Quasi) Static Scheduling approaches

• Lee et al. ‘86: Static Data Flow: cannot specify data-dependent control

• Buck et al. ‘94: Boolean Data Flow: undecidable schedulability check, heuristic pattern-based algorithm

• Thoen et al. ‘99: Event graph: no schedulability check, no task minimization

• Lin ‘97: Safe Petri Net: no schedulability check, single-rate, reachability-based algorithm

• Thiele et al. ‘99: Bounded Petri Net: partial schedulability check, reachability-based algorithm

• Cortadella et al. ‘00: General Petri Net: maybe undecidable schedulability check, balance equation-based algorithm

t = READ(D)s = s + tj = j + 1

j = 0

s = 0

i = 0

i = i + 1

WRITE(D,data[i])

td

tf

te

tb

ta

tc

TF

T F

p1

p5

p6

p7p4

p3

p2i<3

j<3

False paths

Choices are correlated

#WRITES = #READS

i = j

Multi-processor allocation




Init ()

ISR1 ()

ISR2 ()

Reset

I1

I2

S

Processor 1Processor 1

Processor 2Processor 2

State and data are shared:

• Mutual exclusion required

Conclusions• Reactive systems

– OS required to control concurrency– Processes are often reused in different environments

• Static and Quasi-Static Scheduling minimize run-time overhead by automatic partitioning the system functions into input-driven threads– No context switch required (OS overhead is reduced)– Compiler optimizations across processes

• Much more research is needed:– strategies to find schedules (decidability ?)– false paths in concurrent systems– what about multiple processors?– ...

Synthesis of Embedded Software for Reactive Systems Jordi Cortadella Universitat Politècnica de...

Documents

Transcript of Synthesis of Embedded Software for Reactive Systems Jordi Cortadella Universitat Politècnica de...