Time-Predictable Execution of Embedded Software on Multi-core Platforms

Post on 02-Jan-2016


TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS

Sudipta Chattopadhyay

under the guidance of A/P Abhik Roychoudhury

1

EMBEDDED SYSTEMS

2

REAL-TIME CONSTRAINTS

3

Embedded system

Hard real-time

Soft real-time

TIMING ANALYSIS

Hard real-time systems require absolute timing guarantees

System-level analysis

Single-task analysis

Worst-case execution time (WCET) analysis

An upper bound on execution time for all possible inputs

A sound over-approximation is obtained by static analysis

4

WCET ANALYSIS

Program

Control flow graph

Micro-architectural modeling

WCET of basic blocks

Infeasible path constraints

Loop bound constraints

Path analysis

WCET bound

5

ARCHITECTURE

Core 1 Core n

L1 cache L1 cache

Shared L2 cache

Memory

Shared bus

Resource sharing

6

OVERVIEW

7

Dissertation work (Time-predictable execution in multi-core)

Unified cache

Shared cache

Shared cache + shared bus

A multi-core WCET tool

Cache related preemption delay analysis

Coherence miss modeling

Shared scratchpad allocation

Core 1 Core n

L1 cache L1 cache

Shared L2 cache

Memory

Shared bus Resource sharing

Main Memory

L1 instruction cache

Instr. accesses

Data accesses

Bus

L1 data cache

L2 unified cache

Processor

Conflicts with different instruction and data memory blocks

MICRO-ARCHITECTURAL MODELING

Single core: pipeline, cache, branch predictor

Multi core: shared cache, shared bus

8

(AI+MC) MC > RTSS’10 = RTSS’10

COMPARISON

9

Work | Micro-arch. level technique | Program level technique | Precision | Scalability
Classical abstract interpretation (AI) | AI | AI | × | √
Classical model checking (MC) | MC | MC | √ | ×
RTS’00 (aiT, Chronos) | AI | Integer linear programming | Can be improved | √
RTSS’10 | AI | MC | Can be improved | _
Our approach | (AI+MC) | Integer linear programming | > RTS’00 | = RTS’00

IMPRECISION IN ABSTRACT INTERPRETATION

p1 p2

Cache state = C1

Cache state = C2

Joined Cache state = C3

10

Abstract cache set after one path: a, b

Abstract cache set after the other path: b, x

Joined cache state: b

Path p1 or path p2?

Joined cache state loses information about paths p1 and p2
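The loss of information at the join can be illustrated with a minimal sketch of the usual must-analysis join for an abstract LRU cache set: a block stays guaranteed only if both incoming states guarantee it, and it keeps the older of its two age bounds. The dictionary encoding and the concrete ages are assumptions made for illustration; they are not taken from the slides.

```python
def must_join(state_a, state_b):
    """Join two abstract 'must' cache sets (block -> upper bound on LRU age).

    A block is guaranteed to be cached after the join only if it is
    guaranteed on both incoming paths; its age bound is the larger one.
    """
    return {
        block: max(age, state_b[block])
        for block, age in state_a.items()
        if block in state_b
    }


# Hypothetical 2-way cache set; age 0 is the youngest line.
c1 = {"a": 0, "b": 1}  # abstract state after path p1
c2 = {"b": 0, "x": 1}  # abstract state after path p2
c3 = must_join(c1, c2)
print(c3)  # {'b': 1}
```

After the join, nothing records whether a or x is cached, which is the path information that the joined state C3 no longer carries.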

MODEL CHECKING ALONE ?

A path-sensitive search

Path-sensitive search is expensive: path explosion

Worse, combined with possible cache states

p1 p2

Cache state = C1

Cache state = C2

11


12

Abstract LRU cache set: a, b

Abstract LRU cache set: b, x

State explosion

CACHE ANALYSIS

Program

Micro-architectural modeling: pipeline analysis, branch predictor modeling, cache analysis by abstract interpretation

Analysis outcome

Refine by model checker: all checked / timeout

WCET of basic blocks

Infeasible path constraints

Loop bound constraints

Path analysis (IPET)

13

Refinement by model checker can be terminated at any point

Model checker refinement steps are inherently parallel

Each model checker refinement step checks a lightweight assertion property

REFINEMENT (INTER-CORE)

14

Task: start, m, m, exit

Conflicting task: m1 (after x < y), m2 (after x == y)

Infeasible: the path taking both x < y and x == y

Cache: ≠m, ≠m (young)

Cache hit

Cache miss (spurious)

REFINEMENT (INTER-CORE)

Task: start, m, m, exit

Conflicting task: m1 (after x < y), m2 (after x == y)

Infeasible: the path taking both x < y and x == y

C_m++ (increment conflict) at m1

C_m++ (increment conflict) at m2

assert (C_m <= 1)

Verified: m is a cache hit

15
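A minimal sketch of the conflict-counter idea on this slide: the conflicting task is instrumented so that every access conflicting with m increments C_m, and the property C_m <= 1 is then checked on all feasible paths. Here an exhaustive search over a small, hypothetical input range stands in for the model checker (the slides use CBMC); the function name and the input range are assumptions for illustration.

```python
from itertools import product


def conflicts_to_m(x, y):
    """Run the instrumented conflicting task once and return the counter C_m."""
    c_m = 0
    if x < y:
        c_m += 1  # access to m1 conflicts with m
    if x == y:
        c_m += 1  # access to m2 conflicts with m
    return c_m


# Exhaustive search over a small input range stands in for the model checker.
INPUTS = range(-4, 5)
worst = max(conflicts_to_m(x, y) for x, y in product(INPUTS, repeat=2))
assert worst <= 1  # the property "assert (C_m <= 1)" holds on every path
print(worst)  # 1
```

No input satisfies both x < y and x == y, so the path that would produce two conflicts is never executed and the assertion holds.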

REFINEMENT (WHY IT WORKS?)

16

Path 2

Cache miss: m

m’: conflict to m

C_m++ (increment conflict)

Property: assert (C_m <= 0)

Does not affect the value of C_m

x < y

x == y

EXPERIMENTAL SETUP (CHRONOS TOOLKIT)

17

C source

GCC (SimpleScalar)

Binary code

CFG

Micro-architectural modeling: cache, pipeline, branch prediction

Micro-architectural constraints

Flow constraints

ILP

WCET

CBMC (C bounded model checking)

EXPERIMENTAL RESULT

18

EXPERIMENTAL RESULT

19

L1 cache L1 cache

Shared L2 cache

WCET

4-way associative, 8 KB

Direct-mapped, 256 bytes

Average time = 70 secs

Tasks: cnt, jfdctint, edn, fir, fdct, ndes

EXTENSION USING SYMBOLIC EXECUTION

Conflicting task

m1

m2

x < y

x == y

m1

m2

C_m++

Increment conflict

C_m++

Increment conflict

assert (C_m <= 1)

x < y

constraint

solver

x = y x = y

x < y x ≥ y

x < y ˄ x = y

unknown

NO

assert (C_m <= 1)

satisfied

abort

20

EXTENSION USING KLEE

21

C source

GCC (SimpleScalar)

Binary code

CFG

Micro-architectural modeling: cache, pipeline, branch prediction

Micro-architectural constraints

Flow constraints

ILP

WCET

CBMC/KLEE

A GENERIC FRAMEWORK

Three different architectural/application settings

Intra task (WCET in single core): cache conflict

Inter task (Cache Related Preemption Delay analysis): high priority and low priority tasks, cache conflict

Inter core (WCET in multi-core): task in Core 1 and task in Core 2 with L1 caches and a shared L2 cache, cache conflict

22

MICRO-ARCHITECTURAL MODELING

Single core: pipeline, cache, branch predictor

Multi core: shared cache, shared bus

23

TASK-LEVEL INTERFERENCE

Timeline

T3

T2

T1

T1

T2

T3

Task interference graph

24

Core 1 Core n

L1 cache L1 cache

Shared L2 cache

T1 T2 T3

Shared bus

Tasks

SHARED CACHE + TDMA SHARED BUS

T1

T2

T3

T4

Core 1 slot

Core 2 slot

Core 1 slot

Core 2 slot

T1

T2

T3

T4

L2 miss due to T2

Disjoint lifetime

WAIT

T4

25

Core 1 Core 2

L1 cache L1 cache

Shared L2 cache

Shared bus

Task graphs

Time Division Multiple Access (TDMA)

T1 T2

T3 T4

Bus access

Bus access

OVERVIEW OF THE FRAMEWORK

L1 cache analysis

L2 cache analysis

Filter

L1 cache analysis

L2 cache analysis

Initial interference

L2 conflict analysis

Filter

Bus-aware analysis

WCRT computation

Interference changes?

Yes: iterate again

No: estimated WCRT

Task interference monotonically decreases

26
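The iteration in this framework can be sketched as a fixed-point loop: estimate finish times under the current interference, drop task pairs whose lifetimes can no longer overlap, and repeat until nothing changes. The sketch below is a toy model under loud assumptions: fixed start times, a cost model that charges a constant penalty per interfering task, and the names `refine_interference`, `tasks` and `penalty` are all hypothetical and not part of the actual analysis.

```python
def refine_interference(tasks, penalty):
    """Iterate until the task interference graph stops shrinking.

    tasks: name -> (core, start_time, base_wcet); all values are hypothetical.
    penalty: extra cycles charged per interfering task (toy cost model).
    Returns (finish times, final set of interfering task pairs).
    """
    names = sorted(tasks)
    # Initial interference: any two tasks on different cores may conflict.
    interference = {
        frozenset((a, b))
        for a in names
        for b in names
        if a < b and tasks[a][0] != tasks[b][0]
    }
    while True:
        finish = {}
        for name, (_core, start, base) in tasks.items():
            n_conflicts = sum(1 for pair in interference if name in pair)
            finish[name] = start + base + penalty * n_conflicts
        # Keep only the pairs whose lifetimes [start, finish) still overlap.
        refined = set()
        for pair in interference:
            a, b = sorted(pair)
            if tasks[a][1] < finish[b] and tasks[b][1] < finish[a]:
                refined.add(pair)
        if refined == interference:  # interference did not change: done
            return finish, interference
        interference = refined  # always a subset, so the loop terminates


tasks = {
    "T1": (1, 0, 10),
    "T2": (2, 0, 10),
    "T3": (1, 40, 10),
    "T4": (2, 40, 10),
}
finish, interference = refine_interference(tasks, penalty=5)
print(finish)
print(sorted(sorted(pair) for pair in interference))
```

Because each round can only remove pairs, the interference set shrinks monotonically, which is what guarantees termination in this sketch.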

EVALUATION (2-CORE)

One core runs statemate; the other core runs the program under evaluation

27

EVALUATION (4-CORE)

The 4 cores run either (edn, adpcm, compress, statemate) or (matmult, fir, jfdctint, statemate), one program per core

28

MICRO-ARCHITECTURAL MODELING

Single core: pipeline, cache, branch predictor

Interactions

Multi core: shared cache, shared bus

29

TIMING ANOMALY (SHARED CACHE)

(Figure: branching sequences of cache hit/miss outcomes)

May not be the worst-case path

30

BASELINE ABSTRACTION – TIMING INTERVAL

Representing each pipeline stage as a timing interval

Pipeline stages of each instruction: IF, ID, EX, WB, CM

Structural dependency

R1 := R2 + 5

R5 := R1 * R7

R3 := R5 * 5

Contention

A fixed-point analysis derives the timing of each stage as an interval

31

start = [3,7], latency = [1,3], finish = [4,10]

End = Start + cache miss latency interval
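The interval rule on this slide is plain interval addition: lower bounds add to lower bounds and upper bounds to upper bounds. A minimal sketch with the slide's numbers follows; the tuple representation and the function name `add_intervals` are assumptions for illustration.

```python
def add_intervals(start, latency):
    """Add two closed intervals given as (lower, upper) tuples."""
    return (start[0] + latency[0], start[1] + latency[1])


start = (3, 7)    # earliest and latest start time of the stage
latency = (1, 3)  # best-case and worst-case latency
finish = add_intervals(start, latency)
print(finish)  # (4, 10)
```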

TDMA SHARED BUS ANALYSIS

Time Division Multiple Access (TDMA)

Offset abstraction

TDMA schedule (two rounds): Core 0, Core 1, Core 0, Core 1

T (core 1): offset, delay

T’ (core 0): offset, delay = 0

32
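Under a TDMA schedule, the bus delay of a request depends only on its offset within the round. The sketch below assumes equal-length slots assigned to cores in index order and a request that must complete inside its own core's slot; the function name and all parameters are hypothetical, chosen only to illustrate the offset abstraction.

```python
def tdma_delay(offset, core, slot_length, num_cores, access_time):
    """Cycles a bus request waits before it can start in its core's slot.

    offset: request time modulo the round length (slot_length * num_cores).
    Assumes access_time <= slot_length, so a request always fits in a slot.
    """
    if access_time > slot_length:
        raise ValueError("request can never fit inside one slot")
    round_length = slot_length * num_cores
    slot_start = core * slot_length
    slot_end = slot_start + slot_length
    # The request is granted at once if it fits in the remaining slot time.
    if slot_start <= offset and offset + access_time <= slot_end:
        return 0
    # Otherwise it waits for the start of the core's next slot.
    return (slot_start - offset) % round_length


# Two cores, 10-cycle slots, 5-cycle bus accesses (hypothetical numbers).
print(tdma_delay(3, core=0, slot_length=10, num_cores=2, access_time=5))  # 0
print(tdma_delay(3, core=1, slot_length=10, num_cores=2, access_time=5))  # 7
print(tdma_delay(8, core=0, slot_length=10, num_cores=2, access_time=5))  # 12
```

The third call shows a request that arrives inside its own slot but too late to finish, so it waits for the next round.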

LOOP CONSTRUCT

How do we define bus context?

Pipeline stages (IF, ID, EX, WB, CM) of the previous iteration and the current iteration

Property: If the bus offsets of the cross-iteration edges do not change, WCET of the loop iteration cannot change

33

LOOP CONSTRUCT

Bus context flow graph

C1

C2

C3

C4

C5 C3C5

Property: If Ci = Cj, then C(i+k) = C(j+k) for any k > 0

34

Ci = bus context of the loop body at the i-th iteration

LOOP CONSTRUCT

C1

C2

C3

C4

Compute WCET for each bus context

E(C1) = number of times context C1 is executed

Generate linear constraints:

E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound

E(C1) ≥ E(C2)

Bus context flow graph

35

loop bound

Program

Control flow graph

Micro-architectural modeling

WCET of basic blocks

Infeasible path constraints

Loop bound constraints

Path analysis

ILP solver

ILP = Integer Linear Programming
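As a rough illustration of how the bus-context constraints feed the path analysis, the sketch below maximizes the loop's total execution time subject to the two linear constraints shown for contexts C1 to C4. Exhaustive enumeration stands in for an ILP solver, and the per-context WCET values and the loop bound are made-up numbers, not results from the slides.

```python
from itertools import product


def loop_wcet(context_wcet, loop_bound):
    """Maximize sum(WCET(Ci) * E(Ci)) under the bus-context constraints.

    Constraints (from the slide):
      E(C1) + E(C2) + E(C3) + E(C4) <= loop bound
      E(C1) >= E(C2)
    Brute-force enumeration replaces the ILP solver in this sketch.
    """
    names = list(context_wcet)
    best_cost, best_counts = -1, None
    for counts in product(range(loop_bound + 1), repeat=len(names)):
        e = dict(zip(names, counts))
        if sum(counts) > loop_bound:
            continue  # violates the loop-bound constraint
        if e["C1"] < e["C2"]:
            continue  # violates E(C1) >= E(C2)
        cost = sum(context_wcet[n] * e[n] for n in names)
        if cost > best_cost:
            best_cost, best_counts = cost, e
    return best_cost, best_counts


# Hypothetical per-context WCETs (cycles) and loop bound.
wcet = {"C1": 50, "C2": 70, "C3": 40, "C4": 55}
cost, counts = loop_wcet(wcet, loop_bound=10)
print(cost, counts)
```

With these numbers the most expensive context C2 cannot be charged for every iteration, because the flow constraint ties its count to that of C1.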

BRANCH PREDICTION + CACHE

m’

m

m

Branch location

Maximum number of speculated instructions

JOIN

Unclear cache access

Cache content

Cache content

36

Cache conflict

EXPERIMENTAL SETUP (CHRONOS TOOLKIT)

C source

GCC (SimpleScalar)

Binary code

CFG

Micro-architectural modeling: private cache, pipeline, branch prediction, shared cache, shared bus

Micro-architectural constraints

Flow constraints

ILP

WCET

37

EVALUATION (CACHE + PIPELINE)

jfdctint

statemate

Imprecision of shared cache analysis

38

Core 1 Core 2

Vertically partition

Core 1

Core 2

Horizontally partition

EVALUATION (CACHE + PIPELINE + SPECULATION)

Imprecision of modeling speculation

39

EVALUATION (BUS + PIPELINE)

Imprecision of shared bus analysis

Imprecision of path analysis

40

RECAP

41

Dissertation work (Time-predictable execution in multi-core)

Unified cache

Shared cache

Shared cache + shared bus

A multi-core WCET tool

Cache related preemption delay analysis

Coherence miss modeling

Shared scratchpad allocation

Core 1 Core n

L1 data cache

L1 data cache

Shared L2 cache

Memory

Shared bus

Coherence miss traffic

Stale data items

Core 1 Core n

L1 cache L1 cache

Shared L2 cache

High priority task

Low priority task

Cache conflict

Task

c

PE-0 PE-1 PE-N

SPM-0 SPM-1 SPM-N

Shared off-chip data bus

Off-chip memory

External Memory Interface

……

Fast on-chip communication media

PERSPECTIVE

42

Time-predictable execution in single-core

Time-predictable execution in multi-core

Resource sharing (cache and bus)

Data sharing (cache coherence)

Testing

Static analysis

Shared cache

Shared bus

Cache coherence

Customized hardware

Shared scratchpad

ARM Cortex A9 MPCore

Samsung Exynos

Nvidia Tegra II (smart phones)

Time Division Multiple Access

Aethereal network-on-chip

Sony PSP

IBM Cell

PERSPECTIVE

Spurious counter example

Abstraction

Property

Concrete domain

Verifier

Abstraction refinement

Functionality Verification

Verified

SLAM

(Microsoft)

BLAST

(UC Berkeley)

MAGIC

(CMU) Abstract

domain in abstract

Interpretation (AI)

AI

Concrete domain

May be spurious

Generate

Quantitative property

Path-sensitive Verification

Quantitative Verification

Refinement

Anytime verification of quantitative properties

FUTURE WORK

44

Battery life

Mobile devices

x < y

x == y

m1

m2

x < y

x = y x = y

x < y x ≥ y

assert (C_m <= 1)

Symbolic execution

Static performance analysis + testing

Performance testing

abort

Energy analysis of software

Energy-aware software testing

x < y ˄ x ≠ y

Input

(Quantitative property e.g. cache conflict)

THANK YOU

45

My sincere thanks to all the Examiners, and especially the anonymous Examiner 1 for his comment on symbolic execution