Time-Predictable Execution of Embedded Software on Multi-core Platforms


Page 1: Time-Predictable Execution of Embedded Software on Multi-core Platforms

TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS

Sudipta Chattopadhyay

under the guidance of A/P Abhik Roychoudhury


Page 2: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EMBEDDED SYSTEMS


Page 3: Time-Predictable Execution of Embedded Software on Multi-core Platforms

REAL-TIME CONSTRAINTS

Embedded system

Hard real-time

Soft real-time

Page 4: Time-Predictable Execution of Embedded Software on Multi-core Platforms

TIMING ANALYSIS

Hard real-time systems require absolute timing guarantees
  System-level analysis
  Single-task analysis

Worst-case execution time (WCET) analysis
  An upper bound on execution time for all possible inputs
  A sound over-approximation is obtained by static analysis

Page 5: Time-Predictable Execution of Embedded Software on Multi-core Platforms

WCET ANALYSIS

Figure (WCET analysis flow): program; micro-architectural modeling; control flow graph; WCET of basic blocks; infeasible path constraints; loop bound constraints; path analysis; WCET bound.
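The path analysis step in this flow combines per-block WCETs with loop bounds and infeasible path constraints. The following is a minimal sketch, not the tool's implementation: the loop shape, the block costs, and the `max_then` constraint are all hypothetical, and exhaustive search over execution counts stands in for the solver a real analyzer would use.

```python
from itertools import product

# Hypothetical per-block WCETs in cycles (not taken from the slides).
COST = {"entry": 5, "header": 2, "then": 20, "else": 8, "latch": 1, "exit": 3}


def wcet_bound(cost, loop_bound, max_then=None):
    """Maximize total cost over execution counts that satisfy the constraints.

    The loop body takes either the "then" or the "else" block per iteration.
    max_then models an assumed infeasible-path constraint on the "then" block.
    """
    best = 0
    for n_then, n_else in product(range(loop_bound + 1), repeat=2):
        iterations = n_then + n_else
        if iterations > loop_bound:          # loop bound constraint
            continue
        if max_then is not None and n_then > max_then:
            continue                         # infeasible path constraint
        total = (cost["entry"] + cost["exit"]
                 + (iterations + 1) * cost["header"]  # header also runs on loop exit
                 + n_then * cost["then"]
                 + n_else * cost["else"]
                 + iterations * cost["latch"])
        best = max(best, total)
    return best


unrefined = wcet_bound(COST, loop_bound=10)
refined = wcet_bound(COST, loop_bound=10, max_then=3)
```

Adding the infeasible-path constraint can only lower the bound, which is why such constraints improve precision without affecting soundness.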

Page 6: Time-Predictable Execution of Embedded Software on Multi-core Platforms

ARCHITECTURE

Figure (multi-core architecture): Core 1 to Core n, each with an L1 cache; shared L2 cache; shared bus; memory. Label: resource sharing.

Page 7: Time-Predictable Execution of Embedded Software on Multi-core Platforms

OVERVIEW

Dissertation work (time-predictable execution in multi-core)
  Unified cache
  Shared cache
  Shared cache + shared bus
  A multi-core WCET tool
  Cache-related preemption delay analysis
  Coherence miss modeling
  Shared scratchpad allocation

Figure (multi-core architecture): Core 1 to Core n, each with an L1 cache; shared L2 cache; shared bus; memory. Label: resource sharing.

Figure (unified cache): processor; L1 instruction cache (instruction accesses); L1 data cache (data accesses); L2 unified cache; bus; main memory.

Conflicts with different instruction and data memory blocks

Page 8: Time-Predictable Execution of Embedded Software on Multi-core Platforms

MICRO-ARCHITECTURAL MODELING

Single core: pipeline, cache, branch predictor
Multi-core: shared cache, shared bus

Page 9: Time-Predictable Execution of Embedded Software on Multi-core Platforms

COMPARISON

Work | Micro-arch. level technique | Program level technique | Precision | Scalability
Classical abstract interpretation (AI) | AI | AI | × | √
Classical model checking (MC) | MC | MC | √ | ×
RTS’00 (aiT, Chronos) | AI | Integer linear programming | Can be improved | √
RTSS’10 | AI | MC | Can be improved | _
Our approach | AI+MC | Integer linear programming | > RTS’00 | = RTS’00
Our approach | AI+MC | MC | > RTSS’10 | = RTSS’10

Page 10: Time-Predictable Execution of Embedded Software on Multi-core Platforms

IMPRECISION IN ABSTRACT INTERPRETATION

Figure: paths p1 and p2 merge at a program point; cache state = C1 on one path, cache state = C2 on the other, joined cache state = C3.
Abstract cache set in C1: a, b
Abstract cache set in C2: b, x
Joined cache state: b

Path p1 or path p2?
The joined cache state loses information about paths p1 and p2
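The information loss at the join can be reproduced with the classical must-analysis join for LRU caches: keep only blocks present in both states, each with the larger age bound. This is a minimal sketch; the concrete ages assigned to a, b, and x are assumptions, since the slide only marks the young end of each set.

```python
def must_join(state1, state2):
    """Join two abstract must-cache states.

    Each state maps a memory block to an upper bound on its LRU age.
    A block survives the join only if both states guarantee it is cached,
    and it keeps the larger (more pessimistic) age bound.
    """
    common = state1.keys() & state2.keys()
    return {block: max(state1[block], state2[block]) for block in common}


c1 = {"a": 0, "b": 1}   # assumed ages along path p1
c2 = {"b": 0, "x": 1}   # assumed ages along path p2
c3 = must_join(c1, c2)  # only b is guaranteed to be cached after the join
```

After the join, neither a nor x is known to be cached, although one of them always is on any concrete path.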

Page 11: Time-Predictable Execution of Embedded Software on Multi-core Platforms

MODEL CHECKING ALONE ?

A path-sensitive search
Path-sensitive search is expensive: path explosion
Worse, combined with possible cache states

Figure: paths p1 and p2; cache state = C1; cache state = C2.

Page 12: Time-Predictable Execution of Embedded Software on Multi-core Platforms

MODEL CHECKING ALONE ?

A path-sensitive search
Path-sensitive search is expensive: path explosion
Worse, combined with possible cache states

Figure: paths p1 and p2 followed by further branches; the abstract LRU cache sets (a, b) and (b, x) are carried separately along each path. Label: state explosion.

Page 13: Time-Predictable Execution of Embedded Software on Multi-core Platforms

CACHE ANALYSIS

Figure (analysis flow): program; micro-architectural modeling (pipeline analysis, branch predictor modeling, cache analysis by abstract interpretation); analysis outcome; refine by model checker until all checked or timeout; WCET of basic blocks; infeasible path constraints; loop bound constraints; path analysis (IPET).

Refinement by the model checker can be terminated at any point
Model checker refinement steps are inherently parallel
Each model checker refinement step checks a lightweight assertion property
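The "can be terminated at any point" property can be sketched as a loop that tries to upgrade one access at a time and leaves everything unchecked as a miss. This is an illustrative sketch only: `refine`, `check`, and the time budget are hypothetical names, and `check` stands in for one model checker call.

```python
import time


def refine(unclassified, check, budget_seconds):
    """Anytime refinement of accesses that abstract interpretation could not classify.

    check(access) stands in for one model checker run and returns True when the
    access is proven to be a cache hit. Accesses not checked before the deadline
    conservatively stay misses, so stopping early is always sound.
    """
    deadline = time.monotonic() + budget_seconds
    result = {}
    for access in unclassified:
        if time.monotonic() >= deadline:
            result[access] = "miss"      # timeout: keep the safe classification
        elif check(access):
            result[access] = "hit"       # refinement succeeded
        else:
            result[access] = "miss"
    return result


outcome = refine(["m1", "m2"], check=lambda access: access == "m1", budget_seconds=10)
```

Because each call to `check` is independent of the others, the loop body could also be distributed across workers, which matches the remark that the refinement steps are inherently parallel.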

Page 14: Time-Predictable Execution of Embedded Software on Multi-core Platforms

REFINEMENT (INTER-CORE)

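Figure: a task accesses memory block m twice between start and exit. A conflicting task accesses m1 under the condition x < y and m2 under the condition x == y; m1 and m2 (both ≠ m) map to the same cache set as m. The path through both m1 and m2 is infeasible, so the cache miss reported for the second access to m is spurious. Labels: cache hit; cache miss; infeasible; spurious; young.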

Page 15: Time-Predictable Execution of Embedded Software on Multi-core Platforms

REFINEMENT (INTER-CORE)

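Figure: a task accesses memory block m twice between start and exit. In the conflicting task, the accesses to m1 (under x < y) and m2 (under x == y) are each instrumented with C_m++ (increment conflict), and the path through both is infeasible. The property assert (C_m <= 1) is verified, so the second access to m is a cache hit. Label: young.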
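The instrumentation on this slide can be mimicked in a few lines. This is a hedged sketch: exhaustive enumeration over a small input range stands in for the model checker, and the 2-way associativity is an assumption used only to explain why a bound of 1 implies a hit.

```python
from itertools import product

ASSOCIATIVITY = 2  # assumption: m survives in the set if at most 1 conflict occurs


def conflicting_task(x, y):
    """Instrumented conflicting task; returns C_m, the number of conflicts to m."""
    c_m = 0
    if x < y:
        c_m += 1   # access to m1, instrumented as C_m++
    if x == y:
        c_m += 1   # access to m2, instrumented as C_m++
    return c_m


def assertion_holds(bound, values=range(-4, 5)):
    """Check assert (C_m <= bound) for every input pair in a bounded range."""
    return all(conflicting_task(x, y) <= bound
               for x, y in product(values, repeat=2))


# The two branches are mutually exclusive, so C_m never exceeds 1.
m_is_hit = assertion_holds(ASSOCIATIVITY - 1)
```

A path-insensitive analysis would count both m1 and m2 as conflicts and report a miss; the assertion check rules that out.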

Page 16: Time-Predictable Execution of Embedded Software on Multi-core Platforms

REFINEMENT (WHY IT WORKS?)

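Figure: a program with branches x < y and x == y, accesses to m, and an access to m’ that conflicts with m and is instrumented with C_m++ (increment conflict). Property: assert (C_m <= 0). Code that does not affect the value of C_m is irrelevant to the property. Labels: path 2; cache miss.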

Page 17: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EXPERIMENTAL SETUP (CHRONOS TOOLKIT)

Figure (Chronos toolkit flow): C source; GCC (SimpleScalar); binary code; CFG; micro-architectural modeling (cache, pipeline, branch prediction); micro-architectural constraints; flow constraints; ILP; WCET; CBMC (C bounded model checking).

Page 18: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EXPERIMENTAL RESULT


Page 19: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EXPERIMENTAL RESULT

Figure (WCET results chart): L1 caches and shared L2 cache; cache configurations labeled "4-way associative, 8 KB" and "direct-mapped, 256 bytes"; axis labels: WCET, tasks.
Tasks: cnt, jfdctint, edn, fir, fdct, ndes
Average time = 70 seconds

Page 20: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EXTENSION USING SYMBOLIC EXECUTION

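Figure: the instrumented conflicting task (m1 under x < y, m2 under x == y, each followed by C_m++ to increment the conflict count, then assert (C_m <= 1)) next to its symbolic execution tree. The tree branches on x < y versus x ≥ y and then on x = y. The path condition x < y ∧ x = y is passed to the constraint solver, which answers NO, so that path is aborted; on the remaining paths assert (C_m <= 1) is satisfied. Other label: unknown.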
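The path-by-path reasoning of symbolic execution can be sketched as follows. This is not how KLEE or any real engine is implemented: a bounded search for a witness stands in for the constraint solver, all paths are enumerated up front instead of being pruned at each branch, and the names `BRANCHES`, `satisfiable`, and `explore` are hypothetical.

```python
from itertools import product

# Each branch guards one conflicting access that is instrumented with C_m++.
BRANCHES = [
    ("x < y", lambda x, y: x < y),     # guards the access to m1
    ("x == y", lambda x, y: x == y),   # guards the access to m2
]


def satisfiable(constraints, values=range(-3, 4)):
    """Stand-in for a constraint solver: bounded search for a satisfying input."""
    return any(all(c(x, y) for c in constraints)
               for x, y in product(values, repeat=2))


def explore():
    """Return the largest C_m over feasible paths and the aborted path conditions."""
    max_c_m, aborted = 0, []
    for taken in product([True, False], repeat=len(BRANCHES)):
        constraints, labels = [], []
        for (label, pred), is_taken in zip(BRANCHES, taken):
            if is_taken:
                constraints.append(pred)
                labels.append(label)
            else:
                constraints.append(lambda x, y, p=pred: not p(x, y))
                labels.append(f"not ({label})")
        if not satisfiable(constraints):
            aborted.append(" and ".join(labels))   # infeasible path: abort
            continue
        max_c_m = max(max_c_m, sum(taken))          # one C_m++ per taken branch
    return max_c_m, aborted


max_conflicts, aborted_paths = explore()
```

Only the path that takes both branches is aborted, so the largest conflict count on any feasible path is 1 and assert (C_m <= 1) holds.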

Page 21: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EXTENSION USING KLEE

Figure (Chronos toolkit flow, extended): C source; GCC (SimpleScalar); binary code; CFG; micro-architectural modeling (cache, pipeline, branch prediction); micro-architectural constraints; flow constraints; ILP; WCET; CBMC/KLEE.

Page 22: Time-Predictable Execution of Embedded Software on Multi-core Platforms

A GENERIC FRAMEWORK

Three different architectural/application settings
  Intra-task (WCET in single core): cache conflict within a task
  Inter-task (cache-related preemption delay analysis): cache conflict between a high-priority task and a low-priority task
  Inter-core (WCET in multi-core): cache conflict in the shared L2 cache between a task in Core 1 and a task in Core 2

Page 23: Time-Predictable Execution of Embedded Software on Multi-core Platforms

MICRO-ARCHITECTURAL MODELING

Single core: pipeline, cache, branch predictor
Multi-core: shared cache, shared bus

Page 24: Time-Predictable Execution of Embedded Software on Multi-core Platforms

TASK-LEVEL INTERFERENCE

Figure: timeline of tasks T1, T2, and T3; task interference graph over T1, T2, and T3; Core 1 to Core n with L1 caches, a shared L2 cache, and a shared bus running the tasks T1, T2, and T3.

Page 25: Time-Predictable Execution of Embedded Software on Multi-core Platforms

SHARED CACHE + TDMA SHARED BUS

Figure: Core 1 and Core 2 with L1 caches, a shared L2 cache, and a shared bus arbitrated by Time Division Multiple Access (TDMA) with alternating Core 1 and Core 2 slots. The task graphs contain T1, T2, T3, and T4. Timeline labels: bus access; WAIT; L2 miss due to T2; disjoint lifetime (T4).

Page 26: Time-Predictable Execution of Embedded Software on Multi-core Platforms

OVERVIEW OF THE FRAMEWORK

Figure (analysis flow): for each core, L1 cache analysis, filter, and L2 cache analysis; then L2 conflict analysis (starting from the initial interference), bus-aware analysis, and WCRT (worst-case response time) computation. If the interference changes (Yes), the analysis iterates; otherwise (No) it reports the estimated WCRT.

Task interference monotonically decreases
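The iteration in this framework can be illustrated with a deliberately simplified model. Everything below is an assumption made for the sketch: tasks on a core run back to back, each interfering task adds a fixed `penalty` to a task's WCET, and two tasks interfere only if their lifetimes (earliest start, latest finish) overlap. The actual framework derives these quantities from cache and bus analysis.

```python
def overlaps(a, b):
    """True if two half-open lifetimes (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def estimate_wcrt(cores, penalty):
    """Iterate until the task interference no longer changes.

    cores: one list per core of (name, bcet, wcet) tuples, executed in order.
    Returns the estimated WCRT and the final interference sets.
    """
    core_of = {name: i for i, tasks in enumerate(cores) for name, _, _ in tasks}
    # Initial interference: every task on another core may interfere.
    interference = {n: {m for m in core_of if core_of[m] != core_of[n]}
                    for n in core_of}
    while True:
        lifetime = {}
        for tasks in cores:
            earliest, latest = 0, 0
            for name, bcet, wcet in tasks:
                start = earliest
                latest += wcet + penalty * len(interference[name])
                lifetime[name] = (start, latest)
                earliest += bcet
        wcrt = max(end for _, end in lifetime.values())
        # Drop pairs with disjoint lifetimes; the sets can only shrink.
        refined = {n: {m for m in interference[n]
                       if overlaps(lifetime[n], lifetime[m])}
                   for n in interference}
        if refined == interference:
            return wcrt, interference
        interference = refined


cores = [[("T1", 10, 12), ("T2", 10, 12)], [("T3", 50, 52), ("T4", 10, 12)]]
wcrt, final_interference = estimate_wcrt(cores, penalty=4)
```

Since each round only removes interference pairs, the loop terminates, and the WCRT estimate of every round is at most that of the previous round.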

Page 27: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EVALUATION (2-CORE)

One core runs statemate; the other core runs the program under evaluation

Page 28: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EVALUATION (4-CORE)

The four cores run either (edn, adpcm, compress, statemate) or (matmult, fir, jfdctint, statemate), one program per core

Page 29: Time-Predictable Execution of Embedded Software on Multi-core Platforms

MICRO-ARCHITECTURAL MODELING

Single core: pipeline, cache, branch predictor
Multi-core: shared cache, shared bus
Interactions between the single-core and multi-core components

Page 30: Time-Predictable Execution of Embedded Software on Multi-core Platforms

TIMING ANOMALY (SHARED CACHE)

Figure: tree of possible shared cache access outcomes (hit or miss at each access). Label on one path: may not be the worst-case path.

Page 31: Time-Predictable Execution of Embedded Software on Multi-core Platforms

BASELINE ABSTRACTION – TIMING INTERVAL

Representing each pipeline stage as a timing interval

Figure (execution graph): each instruction passes through the pipeline stages IF, ID, EX, WB, and CM; edges are labeled structural dependency and contention. Example instructions:
R1 := R2 + 5
R5 := R1 * R7
R3 := R5 * 5

A fixed-point analysis derives the timing of each stage as an interval

Example: start = [3,7], latency = [1,3], finish = [4,10]
End = Start + cache miss latency interval
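The interval arithmetic behind this example is small enough to show directly. A minimal sketch, where the `Interval` class is a hypothetical helper and the numbers are the ones on the slide:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of possible times, in cycles."""
    lo: int
    hi: int

    def __add__(self, other):
        # Earliest end = earliest start + shortest latency, and likewise for latest.
        return Interval(self.lo + other.lo, self.hi + other.hi)


start = Interval(3, 7)
latency = Interval(1, 3)
finish = start + latency   # Interval(lo=4, hi=10)
```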

Page 32: Time-Predictable Execution of Embedded Software on Multi-core Platforms

TDMA SHARED BUS ANALYSIS

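Time Division Multiple Access (TDMA)
Offset abstraction

Figure: TDMA schedule of alternating Core 0 and Core 1 slots, grouped into rounds. A bus request issued by core 1 at time T falls at some offset within the round and is delayed until core 1's slot. A request issued by core 0 at time T’ falls inside core 0's own slot, so delay = 0.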
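The offset abstraction works because the bus delay depends only on where a request falls within the TDMA round, not on absolute time. A minimal sketch under stated assumptions: slots have equal length and are assigned to cores in index order, an access must finish inside the issuing core's own slot, and the function name and parameters are hypothetical.

```python
def tdma_delay(time, core, num_cores, slot_length, access_latency):
    """Cycles a bus request issued at `time` waits before it can start."""
    if access_latency > slot_length:
        raise ValueError("an access must fit inside one slot")
    round_length = num_cores * slot_length
    offset = time % round_length               # position within the round
    slot_start = core * slot_length
    slot_end = slot_start + slot_length
    if slot_start <= offset and offset + access_latency <= slot_end:
        return 0                               # request fits in the current own slot
    # Otherwise wait until the core's slot begins again.
    return (slot_start - offset) % round_length


# Two cores, 10-cycle slots, 4-cycle bus accesses.
own_slot = tdma_delay(3, core=0, num_cores=2, slot_length=10, access_latency=4)
other_slot = tdma_delay(3, core=1, num_cores=2, slot_length=10, access_latency=4)
too_late = tdma_delay(8, core=0, num_cores=2, slot_length=10, access_latency=4)
```

Two requests with the same offset always get the same delay, for example times 3 and 23 in this configuration.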

Page 33: Time-Predictable Execution of Embedded Software on Multi-core Platforms

LOOP CONSTRUCT

How do we define bus context?

Figure (execution graph across loop iterations): instructions of the previous iteration and the current iteration pass through the pipeline stages IF, ID, EX, WB, and CM, connected by cross-iteration edges.

Property: If the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change

Page 34: Time-Predictable Execution of Embedded Software on Multi-core Platforms

LOOP CONSTRUCT

Bus context flow graph

Figure: bus contexts C1, C2, C3, C4, and C5, where C5 = C3 closes a cycle.

Ci = bus context of the loop body at the i-th iteration
Property: If Ci = Cj, then Ci+k = Cj+k for any k > 0
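The property means the sequence of bus contexts is fully determined once a context repeats, so only the distinct contexts need to be analyzed. A minimal sketch, assuming a hypothetical deterministic `next_context` function; the example transitions are made up so that the fifth context equals the third, as in the figure.

```python
def bus_contexts(first_context, next_context, loop_bound):
    """Collect distinct bus contexts until one repeats or the loop bound is hit.

    Returns the list of distinct contexts and the index of the context that the
    sequence cycles back to (None if no repetition occurs within the bound).
    """
    seen = {}
    contexts = []
    context = first_context
    for i in range(loop_bound):
        if context in seen:
            return contexts, seen[context]
        seen[context] = i
        contexts.append(context)
        context = next_context(context)
    return contexts, None


# Hypothetical offsets: C1=0, C2=5, C3=8, C4=2, and C5 equals C3 again.
transitions = {0: 5, 5: 8, 8: 2, 2: 8}
contexts, cycle_start = bus_contexts(0, transitions.__getitem__, loop_bound=100)
```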

Page 35: Time-Predictable Execution of Embedded Software on Multi-core Platforms

LOOP CONSTRUCT

Bus context flow graph

Figure: bus contexts C1, C2, C3, and C4 within the loop bound.

Compute WCET for each bus context
E(C1) = number of times context C1 is executed
Generate linear constraints:
E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
E(C1) ≥ E(C2)

Figure (WCET analysis flow): program; micro-architectural modeling; control flow graph; WCET of basic blocks; infeasible path constraints; loop bound constraints; path analysis; ILP solver.
ILP = Integer Linear Programming
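The two constraints listed on this slide can be turned into a tiny optimization problem. This is only a sketch: the per-context WCET values are hypothetical, only the two constraints shown on the slide are encoded, and brute-force enumeration stands in for the ILP solver.

```python
from itertools import product


def loop_wcet(context_wcet, loop_bound):
    """Maximize sum of WCET(Ci) * E(Ci) subject to the slide's two constraints."""
    names = list(context_wcet)
    best_value, best_counts = 0, dict.fromkeys(names, 0)
    for counts in product(range(loop_bound + 1), repeat=len(names)):
        e = dict(zip(names, counts))
        if sum(counts) > loop_bound:     # E(C1) + E(C2) + E(C3) + E(C4) <= loop bound
            continue
        if e["C1"] < e["C2"]:            # E(C1) >= E(C2)
            continue
        value = sum(context_wcet[n] * e[n] for n in names)
        if value > best_value:
            best_value, best_counts = value, e
    return best_value, best_counts


# Hypothetical WCET per bus context, in cycles.
wcet_per_context = {"C1": 30, "C2": 60, "C3": 40, "C4": 35}
bound, counts = loop_wcet(wcet_per_context, loop_bound=6)
```

Without the constraint E(C1) ≥ E(C2), the maximum would assume the most expensive context C2 in every iteration; the constraint forces each C2 execution to be paired with a C1 execution.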

Page 36: Time-Predictable Execution of Embedded Software on Multi-core Platforms

BRANCH PREDICTION + CACHE

Figure: branch location; maximum number of speculated instructions; memory blocks m and m’; the cache contents of the two paths are combined by JOIN. Labels: unclear cache access; cache conflict.

Page 37: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EXPERIMENTAL SETUP (CHRONOS TOOLKIT)

Figure (Chronos toolkit flow): C source; GCC (SimpleScalar); binary code; CFG; micro-architectural modeling (private cache, pipeline, branch prediction, shared cache, shared bus); micro-architectural constraints; flow constraints; ILP; WCET.

Page 38: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EVALUATION (CACHE + PIPELINE)

Figure (results charts): benchmarks jfdctint and statemate. Label: imprecision of shared cache analysis.
Shared cache partitioning between Core 1 and Core 2: vertical partition; horizontal partition

Page 39: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EVALUATION (CACHE + PIPELINE + SPECULATION)

Figure (results chart). Label: imprecision of modeling speculation.

Page 40: Time-Predictable Execution of Embedded Software on Multi-core Platforms

EVALUATION (BUS + PIPELINE)

Figure (results chart). Labels: imprecision of shared bus analysis; imprecision of path analysis.

Page 41: Time-Predictable Execution of Embedded Software on Multi-core Platforms

RECAP

Dissertation work (time-predictable execution in multi-core)
  Unified cache
  Shared cache
  Shared cache + shared bus
  A multi-core WCET tool
  Cache-related preemption delay analysis
  Coherence miss modeling
  Shared scratchpad allocation

Figure (coherence miss modeling): Core 1 to Core n, each with an L1 data cache; shared L2 cache; shared bus; memory. Labels: coherence miss traffic; stale data items.

Figure (cache-related preemption delay): Core 1 to Core n with L1 caches and a shared L2 cache; cache conflict between a high-priority task and a low-priority task.

Figure (shared scratchpad allocation): processing elements PE-0 to PE-N with scratchpads SPM-0 to SPM-N; fast on-chip communication media; external memory interface; shared off-chip data bus; off-chip memory.

Page 42: Time-Predictable Execution of Embedded Software on Multi-core Platforms

PERSPECTIVE

Time-predictable execution in single-core
Time-predictable execution in multi-core
  Resource sharing (cache and bus)
  Data sharing (cache coherence)
Testing; static analysis
Shared cache; shared bus; cache coherence; customized hardware; shared scratchpad
Examples shown: ARM Cortex-A9 MPCore; Samsung Exynos; Nvidia Tegra II (smartphones); Time Division Multiple Access; Aethereal network-on-chip; Sony PSP; IBM Cell

Page 43: Time-Predictable Execution of Embedded Software on Multi-core Platforms

PERSPECTIVE

Functionality verification
  Figure (abstraction refinement loop): concrete domain; abstraction; property; verifier; verified; spurious counterexample; abstraction refinement
  Tools: SLAM (Microsoft), BLAST (UC Berkeley), MAGIC (CMU)

Quantitative verification
  Figure: concrete domain; abstract domain in abstract interpretation (AI); AI generates a quantitative property that may be spurious; path-sensitive verification; refinement

Anytime verification of quantitative properties

Page 44: Time-Predictable Execution of Embedded Software on Multi-core Platforms

FUTURE WORK

Battery life; mobile devices
Energy analysis of software
Energy-aware software testing
Symbolic execution
Static performance analysis + testing
Performance testing

Figure: symbolic execution tree for the branches x < y and x == y guarding the accesses m1 and m2, with branch labels x < y, x ≥ y, and x = y, the path condition x < y ∧ x ≠ y, and assert (C_m <= 1) as the quantitative property (e.g. cache conflict). Labels: input; abort.

Page 45: Time-Predictable Execution of Embedded Software on Multi-core Platforms

THANK YOU

My sincere thanks to all the Examiners and especially the anonymous Examiner 1 for his comment on symbolic execution