Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores


Page 1: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Zheng Wu

Page 2: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Outline

• Background
• Motivation
• Analysis Framework
• Intra-Core Cache Analysis
• Cache Conflict Analysis
• Optimization Techniques
• WCRT Analysis
• Experiment Setup
• Experiment Results
• Contribution
• Conclusion

Page 3: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Background

• Hard Real-time Systems
  – Worst-case execution time (WCET): essential input for schedulability analysis
• Static Program Analysis
  – Program path modeling: infeasible path / loop bound detection
  – Micro-architecture modeling: instruction/data cache, branch prediction, out-of-order pipeline

[Figure: distribution of execution times (x-axis: Execution Time; y-axis: Distribution), with labels Actual, Observed, Observed WCET and Actual WCET.]

Page 4: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Background

• Concurrent Programs
  – Task interaction: control/data dependency, preemption
  – Resource contention: shared cache multi-core architectures
• Problem: shared L2 instruction cache contention

[Figure: multi-core architecture. Core 1 … Core n each contain a CPU and an L1 cache; the cores share one L2 cache.]

Page 5: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

System Model

• Message Sequence Chart (MSC)

[Figure: an MSC with a time axis and three processes (Process 1, Process 2, Process 3) mapped onto Core 1 and Core 2; tasks fm1, fm2, fm4, fr0, fr1, fs0; legend: task, message communication.]

Page 6: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

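Problem: Shared L2 cache conflicts

• Tasks on different cores conflict in the shared L2 cache.

[Figure: the MSC (Process 1, Process 2, Process 3 on Core 1 and Core 2; tasks fm1, fm2, fm4, fr0, fr1, fs0) next to an L2 cache with Set 0 to Set 3; memory blocks m and m’ of concurrent tasks contend in the L2 cache.]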

Page 7: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Related Work

• J. Yan and W. Zhang, RTAS 2008
  – T and T’ run on different cores and conflict for cache set C.
  – In the worst case, all accesses from T and T’ to C are treated as cache misses.

[Figure: the MSC (Process 1, Process 2, Process 3; tasks fm1, fm2, fm4, fr0, fr1, fs0) on Core 1 and Core 2; memory blocks m1, …, mk and m’1, …, m’k from the two cores map to Set C of the L2 cache (drawn between Set C - 1 and Set C + 1) and are all marked as misses.]

Page 8: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Motivation

• Task Execution Lifetime
  – Scenario 1: overlapping lifetimes, so the two tasks may conflict in the shared cache
  – Scenario 2: disjoint lifetimes, so no conflicts

[Figure: Task 1 on Core 1 and Task 2 on Core 2, each drawn from its start time to its end time, once per scenario.]
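The lifetime argument can be sketched as a simple interval test. This is only an illustration: the `Lifetime` class, the `may_conflict` function and the numeric values are hypothetical names and data, not taken from the slides, and lifetimes are assumed to be closed intervals so that touching endpoints are conservatively counted as overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lifetime:
    """Execution lifetime of a task as a closed interval [start, end]."""
    start: int  # earliest time the task may start
    end: int    # latest time the task may finish


def may_conflict(a: Lifetime, b: Lifetime) -> bool:
    """Tasks on different cores can only conflict in the shared cache
    if their lifetimes overlap."""
    return a.start <= b.end and b.start <= a.end


task1 = Lifetime(start=0, end=50)
scenario1_task2 = Lifetime(start=30, end=80)  # overlapping lifetime
scenario2_task2 = Lifetime(start=60, end=90)  # disjoint lifetime

print(may_conflict(task1, scenario1_task2))  # True
print(may_conflict(task1, scenario2_task2))  # False
```

Treating touching intervals as overlapping keeps the check on the safe side: it may report a conflict that cannot happen, but never hides one.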

Page 9: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Analysis Framework

[Figure: framework flowchart. Intra-core cache analysis for Core 1 … Core n and the initial task interference feed the L2 cache conflict analysis, which is followed by the WCRT analysis. Decision "Task interference changes?": yes, repeat the analysis; no, output the estimated WCRT.]

Page 10: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

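Analysis Framework

[Figure: framework flowchart. Intra-core cache analysis for Core 1 … Core n and the initial task interference feed the L2 cache conflict analysis, which is followed by the WCRT analysis. Decision "Interference changes?": yes, the modified task interference is fed back into the L2 cache conflict analysis; no, output the estimated WCRT.]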

Page 11: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Intra-core Cache Analysis

• Must Analysis: Always Hit (AH)
  – Memory blocks guaranteed to be present in the cache
• May Analysis: Always Miss (AM)
  – Computes the memory blocks that may be present in the cache; a block outside this set is guaranteed to be absent, so its access always misses
• Persistence Analysis
  – Memory blocks never evicted from the cache after the first iteration, so only the first access can miss
• Others: Non Classified (NC)

Reference: H. Theiling, C. Ferdinand, and R. Wilhelm. Fast and precise WCET prediction by separated cache and path analyses. RTS 2000.
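As an illustration of the must analysis, the sketch below shows the standard abstract update and join for a single LRU cache set, where each block carries an upper bound on its age. It is a minimal sketch, not the presenter's implementation: the function names and the 1-based age convention (age 1 is the most recently used block) are assumptions chosen for this example.

```python
from typing import Dict

# Abstract must-state of one cache set: block -> upper bound on its LRU age.
AbstractSet = Dict[str, int]


def must_update(state: AbstractSet, block: str, assoc: int) -> AbstractSet:
    """Abstract LRU update of one cache set when `block` is accessed."""
    # A block that is not in the state is treated as older than any cached block.
    old_age = state.get(block, assoc + 1)
    new_state: AbstractSet = {}
    for other, age in state.items():
        if other == block:
            continue
        # Only blocks younger than the accessed block grow older.
        aged = age + 1 if age < old_age else age
        if aged <= assoc:  # a block whose age bound exceeds assoc may be evicted
            new_state[other] = aged
    new_state[block] = 1  # the accessed block becomes the youngest
    return new_state


def must_join(s1: AbstractSet, s2: AbstractSet) -> AbstractSet:
    """Merge two paths: keep blocks present on both, with the larger age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}


def is_always_hit(state: AbstractSet, block: str) -> bool:
    """An access is Always Hit if the block is in the must-state before it."""
    return block in state


state: AbstractSet = {}
for accessed in ["a", "b"]:
    state = must_update(state, accessed, assoc=2)
print(state)                      # {'a': 2, 'b': 1}
print(is_always_hit(state, "a"))  # True
```

The may analysis is the dual: it keeps blocks present on either path with the smaller age bound, and a block missing from the may-state is classified Always Miss.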

Page 12: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

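Intra-core Cache Analysis

• L1 Intra-core Cache Analysis
  – Always Hit (AH), Always Miss (AM), Non Classified (NC)
• L2 Intra-core Cache Analysis

[Figure: the L1 cache analysis classifies each access as AH, AM or NC; this classification decides whether the access reaches the L2 cache (access / not access), and the L2 cache analysis again classifies the accesses as AH, AM or NC.]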

Page 13: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

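Analysis Framework

[Figure: framework flowchart. Intra-core cache analysis for Core 1 and Core 2 and the initial task interference feed the L2 cache conflict analysis, which is followed by the WCRT analysis. Decision "Interference changes?": yes, the modified task interference is fed back into the L2 cache conflict analysis; no, output the estimated WCRT.]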

Page 14: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

L2 Cache Conflict Analysis

• Initial task interference graph

Page 15: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

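L2 Cache Conflict Analysis

• Analyze each cache set individually
• Intra-core L2 analysis
  – Always miss
  – Non classified
  – Always hit

[Figure: task T has memory blocks m0, m1 and m2, m3 mapped to Set i and Set j of the L2 cache; the conflicting tasks have blocks m’0, m’1. The set shared with the conflicting tasks is marked Non classified, the other set Always hit.]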
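A per-set conflict check of this kind might look as follows. This is a hedged sketch for a direct-mapped L2 cache: it assumes that only Always hit blocks need to be downgraded (to Non classified) when a conflicting task uses the same cache set, and that Always miss and Non classified blocks keep their classification. All names and the set numbering are hypothetical.

```python
from typing import Dict, Set


def apply_l2_conflicts(
    intra_core: Dict[str, str],    # block -> "AH" | "AM" | "NC" (intra-core L2 result)
    cache_set_of: Dict[str, int],  # block -> index of the L2 cache set it maps to
    conflicting_sets: Set[int],    # L2 sets touched by conflicting tasks
) -> Dict[str, str]:
    """Downgrade Always hit blocks whose cache set is shared with a conflicting task."""
    updated = {}
    for block, classification in intra_core.items():
        if classification == "AH" and cache_set_of[block] in conflicting_sets:
            updated[block] = "NC"  # the block may have been evicted by another core
        else:
            updated[block] = classification
    return updated


# Task T: m0, m1 map to set 0 (standing in for Set i), m2, m3 to set 1 (Set j).
intra = {"m0": "AH", "m1": "AH", "m2": "AH", "m3": "AH"}
mapping = {"m0": 0, "m1": 0, "m2": 1, "m3": 1}
# The conflicting tasks' blocks m'0, m'1 are assumed to map to set 0 only.
print(apply_l2_conflicts(intra, mapping, conflicting_sets={0}))
# {'m0': 'NC', 'm1': 'NC', 'm2': 'AH', 'm3': 'AH'}
```

Because each cache set is handled independently, blocks in sets that no conflicting task touches keep their intra-core classification.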

Page 16: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Optimization for Set Associativity

• Consider memory block age: LRU replacement
  – age(m): upper bound on the age of memory block m

[Figure: a 4-way LRU cache set (ages 1 to 4) holding blocks m0, m1 and m2 of task T, all Always hit in the intra-core analysis; the conflicting tasks contribute 2 memory blocks (m’0, m’1). Without optimization the three blocks become Non classified, Non classified, Non classified; with optimization they become Always hit, Always hit, Non classified.]
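One reading of the age-based optimization that is consistent with this example is: a block stays Always hit as long as its age bound plus the number of conflicting blocks mapped to the same set does not exceed the associativity. The sketch below encodes that rule; the rule itself, the function names, and the assignment of ages 1, 2, 3 to m0, m1, m2 are assumptions made for illustration.

```python
def classify_without_optimization(num_conflicting_blocks: int) -> str:
    """Any conflicting block in the set downgrades an Always hit block."""
    return "NC" if num_conflicting_blocks > 0 else "AH"


def classify_with_optimization(age: int, num_conflicting_blocks: int, assoc: int) -> str:
    """Keep the block Always hit if it survives even when every conflicting
    block is loaded into the set and ages it by one position each."""
    return "AH" if age + num_conflicting_blocks <= assoc else "NC"


ASSOC = 4                              # 4-way set, ages 1..4
ages = {"m0": 1, "m1": 2, "m2": 3}     # assumed age bounds of task T's blocks
CONFLICTING = 2                        # m'0 and m'1 from the conflicting tasks

for block, age in ages.items():
    print(block,
          classify_without_optimization(CONFLICTING),
          classify_with_optimization(age, CONFLICTING, ASSOC))
# m0 NC AH
# m1 NC AH
# m2 NC NC
```

With a direct-mapped cache (associativity 1) the optimized rule degenerates to the unoptimized one, since any conflicting block pushes the age past 1.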

Page 17: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

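Analysis Framework

[Figure: framework flowchart. Intra-core cache analysis for Core 1 and Core 2 and the initial task interference feed the L2 cache conflict analysis, which is followed by the WCRT analysis. Decision "Interference changes?": yes, the modified task interference is fed back into the L2 cache conflict analysis; no, output the estimated WCRT.]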

Page 18: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

BCET and WCET Analysis

L1 cache   L2 cache   Best-Case   Worst-Case
AH         ---        L1 hit      L1 hit
AM         AH         L2 hit      L2 hit
AM         AM         L2 miss     L2 miss
AM         NC         L2 hit      L2 miss
NC         AH         L1 hit      L2 hit
NC         AM         L1 hit      L2 miss
NC         NC         L1 hit      L2 miss

• Access latency for the best case and the worst case (table above)
  – Assumption: no timing anomalies with other architecture features
• BCET and WCET: shortest (longest) path
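The classification table can be turned into a lookup from (L1, L2) classification to best-case and worst-case access latency. The sketch below uses the latencies listed in the experiment parameters (L1 hit 1 cycle, L2 hit 10 cycles, memory access 100 cycles); charging an L2 miss as a flat 100 cycles is an assumption, as are all identifiers.

```python
from typing import Optional, Tuple

# Latency in cycles of the level that finally serves the access.
# Assumption: an L2 miss costs the memory access latency and nothing more.
LATENCY = {"L1 hit": 1, "L2 hit": 10, "L2 miss": 100}

# (L1 classification, L2 classification) -> (best-case outcome, worst-case outcome)
OUTCOME = {
    ("AH", None): ("L1 hit", "L1 hit"),
    ("AM", "AH"): ("L2 hit", "L2 hit"),
    ("AM", "AM"): ("L2 miss", "L2 miss"),
    ("AM", "NC"): ("L2 hit", "L2 miss"),
    ("NC", "AH"): ("L1 hit", "L2 hit"),
    ("NC", "AM"): ("L1 hit", "L2 miss"),
    ("NC", "NC"): ("L1 hit", "L2 miss"),
}


def access_latency(l1: str, l2: Optional[str] = None) -> Tuple[int, int]:
    """Return (best-case, worst-case) latency in cycles for one access."""
    if l1 == "AH":
        l2 = None  # the access never reaches L2, so its L2 class is irrelevant
    best, worst = OUTCOME[(l1, l2)]
    return LATENCY[best], LATENCY[worst]


print(access_latency("AH"))        # (1, 1)
print(access_latency("AM", "NC"))  # (10, 100)
print(access_latency("NC", "NC"))  # (1, 100)
```

Summing the best-case values along the shortest path and the worst-case values along the longest path then gives per-task BCET and WCET contributions from the cache, under the stated no-timing-anomaly assumption.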

Page 19: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

WCRT Analysis

• Compute the earliest and latest ready and finish times of each task
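A simplified way to compute these four times is a single pass over the task dependency graph in topological order. The sketch below is deliberately reduced: it assumes every task becomes ready when its last predecessor finishes and then runs without preemption or waiting for a core, which the slides do not state. The task names, dependencies and execution times are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Tuple

Times = Tuple[int, int, int, int]  # (earliest ready, earliest finish, latest ready, latest finish)


def compute_times(preds: Dict[str, List[str]],
                  bcet: Dict[str, int],
                  wcet: Dict[str, int]) -> Dict[str, Times]:
    """Propagate ready/finish bounds through the dependency graph."""
    times: Dict[str, Times] = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        ps = preds.get(task, [])
        earliest_ready = max((times[p][1] for p in ps), default=0)
        latest_ready = max((times[p][3] for p in ps), default=0)
        times[task] = (
            earliest_ready,
            earliest_ready + bcet[task],  # best case: start at once, run for BCET
            latest_ready,
            latest_ready + wcet[task],    # worst case: start late, run for WCET
        )
    return times


# Hypothetical example: t4 waits for both t2 and t3, which both wait for t1.
preds = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
bcet = {"t1": 10, "t2": 20, "t3": 5, "t4": 10}
wcet = {"t1": 15, "t2": 30, "t3": 40, "t4": 12}

times = compute_times(preds, bcet, wcet)
print(times["t4"])                          # (30, 40, 55, 67)
print(max(t[3] for t in times.values()))    # 67, the latest finish time overall
```

The interval from a task's earliest ready time to its latest finish time can serve as its execution lifetime when checking which tasks may overlap.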

Page 20: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

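Putting Together

[Figure: iteration loop. The initial task interference feeds the L2 cache conflict analysis and WCRT analysis, which produce a new interference graph. Decision "Change?": yes, repeat with the new interference graph; no, output the estimated WCRT.]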
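The loop can be sketched as a fixed-point iteration. Everything below is a toy stand-in under stated assumptions: the two analysis functions are placeholders for the real L2 conflict and WCRT analyses, ready times are fixed constants, a task simply pays a flat penalty when it interferes with any other task, and all names and numbers are hypothetical.

```python
from itertools import combinations

BASE_WCET = {"A": 50, "B": 30}  # cycles without shared-cache conflicts (hypothetical)
CONFLICT_PENALTY = 20           # extra cycles if the task has any conflict (hypothetical)
READY = {"A": 0, "B": 80}       # fixed ready times on two different cores (hypothetical)


def analyze_conflicts(interference):
    """Toy L2 conflict analysis: per-task WCET given the interfering pairs."""
    return {
        task: BASE_WCET[task]
        + (CONFLICT_PENALTY if any(task in pair for pair in interference) else 0)
        for task in BASE_WCET
    }


def analyze_wcrt(wcet):
    """Toy WCRT analysis: lifetimes, overall WCRT, and the refined interference."""
    lifetime = {t: (READY[t], READY[t] + wcet[t]) for t in wcet}
    interference = frozenset(
        frozenset((a, b))
        for a, b in combinations(sorted(wcet), 2)
        if lifetime[a][0] <= lifetime[b][1] and lifetime[b][0] <= lifetime[a][1]
    )
    return max(end for _, end in lifetime.values()), interference


def estimate_wcrt(initial_interference, max_rounds=100):
    """Repeat both analyses until the interference no longer changes."""
    interference = initial_interference
    for _ in range(max_rounds):
        wcrt, refined = analyze_wcrt(analyze_conflicts(interference))
        if refined == interference:
            return wcrt
        interference = refined
    raise RuntimeError("interference did not stabilize")


# Start pessimistically: every pair of tasks is assumed to interfere.
initial = frozenset({frozenset(("A", "B"))})
print(estimate_wcrt(initial))  # 110
```

In this toy run the first round yields lifetimes [0, 70] and [80, 130], which are disjoint, so the interference is dropped and the second round settles at a WCRT of 110 instead of 130.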

Page 21: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Experiment Parameters

• Cache access latency
  – L1 hit: 1 cycle, L2 hit: 10 cycles, memory access: 100 cycles
• Various core numbers
  – 1 core, 2 cores and 4 cores
• Various cache configurations
  – Cache size, block size, associativity
• Real-world benchmark: DEBIE

Page 22: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Experiment Parameters

• Real-world benchmark: DEBIE
  – Space debris monitoring software
  – 8 MSCs, 35 tasks

[Figure: Code Size Distribution. Bar chart; x-axis: Task Code Size (bins 0-1k, 1k-2k, 2k-4k, 4k-8k, 8k-16k, 16k-); y-axis: # of tasks (0 to 12).]

Page 23: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Experimental Results

• Comparison with Yan-Zhang (RTAS 2008)
• Direct-mapped cache only

[Figure: (a) WCRT Comparison and (b) Inter-core Eviction Comparison. Both panels compare Yan-Zhang's Method with Our Method; x-axis: Core Configuration (L1: 2KB) with 1-core, L2: 8KB; 2-core, L2: 16KB; 4-core, L2: 32KB; y-axes: Estimated WCRT (million) for (a) and Inter-core Evictions for (b).]

Page 24: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Experimental Results

• Vary L1 and L2 size

[Figure: (a) Varying L1 Size and (b) Varying L2 Size. Both panels compare Yan-Zhang's Method with Our Method; y-axis: Estimated WCRT (million). Panel (a): Core Configuration (2-core, L2: 16KB), L1 sizes 512B, 1KB, 2KB, 4KB. Panel (b): Core Configuration (2-core, L1: 2KB), L2 sizes 4KB, 8KB, 16KB, 32KB.]

Page 25: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Experimental Results

• Set associative cache optimizations

[Figure: Estimated WCRT (million) without optimization and with optimization; x-axis: Core Configuration (2-core, L1: 2KB) with 1-way, 2-way, 4-way and 8-way associativity. Inset: the cache set example with blocks m0, m1, m2 at ages 1 to 4, classified Non classified, Non classified, Non classified without optimization and Always hit, Always hit, Non classified with optimization.]

Page 26: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Experimental Results

• Runtime of our iterative analysis

[Figure: Analysis Time (sec), axis 0 to 30, against Shared L2 Cache Size (2KB, 4KB, 8KB, 16KB, 32KB) for L1: 2x512B, L1: 2x1KB and L1: 2x2KB.]

Page 27: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Conclusion

• WCRT analysis of concurrent programs running on shared cache multi-cores.
• Use task lifetimes to identify real conflicts.
• Optimizations for set associative caches.
• Experiments: tighter WCRT estimates than the state of the art.
• Future work: data caches, other replacement policies.

Page 28: Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores

Thank You!