Multi-core scheduling optimizations for soft real-time ...

33
Multi-core scheduling optimizations for soft real-time applications a cooperation aware approach co-authors: Liria Matsumoto Sato Patrick Bellasi William Fornaciari Wolfgang Betz Lucas De Marchi sponsors:

Transcript of Multi-core scheduling optimizations for soft real-time ...

Page 1: Multi-core scheduling optimizations for soft real-time ...

Multi-core scheduling optimizationsfor soft real-time applications

a cooperation aware approach

co-authors:

Liria Matsumoto Sato

Patrick Bellasi

William Fornaciari

Wolfgang Betz

Lucas De Marchi

sponsors:

Page 2: Multi-core scheduling optimizations for soft real-time ...

Agenda

IntroductionMotivationObjectives

AnalysisOptimization descriptionExperimental resultsConclusions & future works

Page 3: Multi-core scheduling optimizations for soft real-time ...

Introduction – motivation

Page 4: Multi-core scheduling optimizations for soft real-time ...

Introduction – motivation

SMP + RTMultiple processing units inside a processorDeterminism

Parallel Programming ParadigmsData Level Parallelism (DLP)

Competitive tasks

Task Level Parallelism (TLP)Cooperative tasks

Page 5: Multi-core scheduling optimizations for soft real-time ...

Introduction – motivationDLP TLP

T1

T0

T2

T3

T0

T1

T2

Characterization:SynchronizationCommunication

Page 6: Multi-core scheduling optimizations for soft real-time ...

Introduction – motivation

Linux RT scheduler (mainline)Run as soon as possible (based on prio)⇔ Use as many CPUs as possibleOk for DLP!

But, what about TLP?Anyway, why do we care about TLP?

Page 7: Multi-core scheduling optimizations for soft real-time ...

Objectives

Study the behavior of RT Linux scheduler for cooperative tasks

Optimize the RT schedulerSmooth integration into mainline kernel

Don't throw away everything

Page 8: Multi-core scheduling optimizations for soft real-time ...

Analysis – benchmark

Simulation of a scenario where SW replaces HW

Multimedia-likeMixed workload:

DLP + TLPChallenge: map N

tasks to M cores optimally

Page 9: Multi-core scheduling optimizations for soft real-time ...

Analysis – metrics

ThroughputSample mean

time

DeterminismSample variance

Page 10: Multi-core scheduling optimizations for soft real-time ...

Analysis – metrics

Influencing factorsPreemption-disabledIRQ-disabledLocality of wake-up

Intel i7 920:4 cores2.8 GHz

Page 11: Multi-core scheduling optimizations for soft real-time ...

Analysis – locality of wake-upMigrations

wave0: 112010101010101010101030111301010101010131010321010033 wave1: 033333333333333333333313202133333333333303333202332121 wave2: 022022222222222222222222233302222222222222222213322330 wave3: 111101010101010101010101011110101010101010101010101002 mixer0: 023033333333333333333333333202333333333333333333322233 mixer1: 103302201010101010101010101010330101010101010101010101 mixer2: 230201010101010101010101012020101010101010101010101021monitor: 111122222222222222222222231112222222222222222233322303

a) migration patterns (wave0, mixer1 and mixer2)

Page 12: Multi-core scheduling optimizations for soft real-time ...

Analysis – locality of wake-upMigrations

wave0: 302121331102303323303311032320011021111111120101010101 wave1: 223303013333212010121032121012333210320230333333333333 wave2: 033332232302003111201211330021322233323330220222222222 wave3: 110211120211101030031121003030120101101111111010101010 mixer0: 322210301110320231310303021213120013220320230333333333 mixer1: 001003321202220131103203212130320331201031033022010101 mixer2: 230101020201202020121020021010130101201202302010101010monitor: 113022133333131312032133303222223233123111111222222222

b) occasional migrations

Page 13: Multi-core scheduling optimizations for soft real-time ...

Analysis – locality of wake-up

Cache-miss rate measurements

1 CPU 2 CPUs 4 CPUs Increase (1 - 4)

Load 7.58% 8.99% 9.44% +24.5%

Store 9.29% 9.78% 11.62% +25.1%

Page 14: Multi-core scheduling optimizations for soft real-time ...

Analysis – conclusion

Why do we care about TLP?Common parallelization technique

What about TLP?Current state of Linux scheduler is not as

good as we want

Page 15: Multi-core scheduling optimizations for soft real-time ...

Solution – benchmarkAbstraction:

One application level

sends data to another

Serialization (task cooperation)

Parallelization (task com

petition)

w0

w1

w2

w3

m0

m1

m2

Page 16: Multi-core scheduling optimizations for soft real-time ...

Solution – benchmarkAbstraction:

One application level

sends data to another

Reality:

shared buffers +

synchronization

Serialization (task cooperation)

Parallelization (task com

petition)

w0

w1

w2

w3

m0

m1

m2

buffer

Page 17: Multi-core scheduling optimizations for soft real-time ...

Solution – benchmarkAbstraction:

One application level

sends data to another

Reality:

shared buffers +

synchronization

Dependencies:

Define dependencies

among tasks in the

opposite way of

data flow

Serialization (task cooperation)

Parallelization (task com

petition)

w0

w1

w2

w3

m0

m1

m2

Page 18: Multi-core scheduling optimizations for soft real-time ...

If data do not go to tasks, then the tasks go to where data were produced

Make tasks run on same CPU of their

dependencies

Solution – ideaDependency followers

Page 19: Multi-core scheduling optimizations for soft real-time ...

Measurement tools

Ftracesched_switch tool

(Carsten Emde)*gtkwaveperfadhoc scripts

* http://www.osadl.org/Single-View.111+M5d51b7830c8.0.html

Page 20: Multi-core scheduling optimizations for soft real-time ...

Solution – task-affinityDependency followers

Task-affinity:

selection of the CPU in which a task (e.g. m0) executes takes into consideration the CPUs in which its dependencies (e.g. w0 and w1) ran last time.

w0

w1

w2

w3

m0

m1

m2

Page 21: Multi-core scheduling optimizations for soft real-time ...

Solution – task-affinityimplementation

2 lists inside each task_struct:taskaffinity_listfollowme_list

2 system calls to add/delete affinities:sched_add_taskaffinitysched_del_taskaffinity

Page 22: Multi-core scheduling optimizations for soft real-time ...

1 core 2 cores 4 cores

5.00%

6.00%

7.00%

8.00%

9.00%

10.00%

11.00%

12.00%

9.29%

9.78%

11.62%

10.17%

8.68%

10.35%

mainline task-aff inity

% o

f sto

re c

ach

e-m

iss

1 core 2 cores 4 cores

5.00%

5.50%

6.00%

6.50%

7.00%

7.50%

8.00%

8.50%

9.00%

9.50%

10.00%

7.58%

8.99%

9.44%

7.92% 7.88%

8.58%

mainline task-af f inity

% o

f lo

ad

ca

che

-mis

s

Experimental resultsCache-miss rates

More data/instruction hit

the L1 cache

less cache lines are invalidated

Measurements without and with task-affinity

Page 23: Multi-core scheduling optimizations for soft real-time ...

Experimental resultsWhat exactly to evaluate?

w0

w1

w2

w3

m0

m1

m2

Cache-miss rate is not exactly what we want to optimize

Optimization objectives:Lower the time to

produce a single sample

Increase determinism on production of several samples

Page 24: Multi-core scheduling optimizations for soft real-time ...

w ave0 w ave1 w ave2 w ave3 mixer0 mixer1 mixer2

8

10

12

14

16

18

20

8.81

9.479.21 9.19

16.34

15.41

17.55

9.15

9.649.47

9.84

17.39

16.69

17.82

task-aff inity mainline

ave

rag

e e

xecu

tion

tim

e (

µs

)Experimental resultsAverage execution time of each task

w0

w1

w2

w3

m0

m1

m2

Page 25: Multi-core scheduling optimizations for soft real-time ...

w ave0 w ave1 w ave2 w ave3 mixer0 mixer1 mixer2

0

0.5

1

1.5

2

2.5

0.22

0.31 0.290.36

0.61

0.380.35

0.570.51 0.48

0.83

1.45

2.19

0.47

task-aff inity mainline

vari

an

ce o

f th

e e

xecu

tion

tim

e (

µs

)²Experimental resultsVariance of execution time of each task

w0

w1

w2

w3

m0

m1

m2

Page 26: Multi-core scheduling optimizations for soft real-time ...

Experimental resultsProduction time of each single sample

mainline task-affinity

Results obtained for 150,000 samples

Page 27: Multi-core scheduling optimizations for soft real-time ...

Experimental resultsProduction time of each single sample

Empiric repartition functionReal-time metric (normal distribution):

average + 2 * standard deviation

Page 28: Multi-core scheduling optimizations for soft real-time ...

02

46

810

1214

1618

2022

2426

2830

3234

3638

4042

4446

4850

5254

5658

6062

6466

6870

7274

7678

8082

8486

8890

9294

9698

100102

104106

108110

112114

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

90.00%

100.00%

time (us)

% o

f sa

mp

les

41.2 µs

01

23

45

67

8910

1112

1314

1516

1718

1920

2122

2324

2526

2728

2930

3132

3334

3536

3738

3940

4142

4344

4546

4748

4950

51

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

90.00%

100.00%

time (us)

% o

f sa

mp

les

38.96 µs

Max: 114Min: 29Avg: 37.8Variance: 3.22

Max: 51Min: 35Avg: 38.0Variance: 0.21

mainline

task-affinity

Page 29: Multi-core scheduling optimizations for soft real-time ...

Experimental resultssummary

~15x

Page 30: Multi-core scheduling optimizations for soft real-time ...

Conclusion & future works

Average execution time is almost the sameDeterminism for real-time applications is

improved

Future works:Better focus on temporal localityImprove task-affinity configurationTest on other architecturesClean up the repository

Page 31: Multi-core scheduling optimizations for soft real-time ...

Conclusion & future works

Still a Work In ProgressGit repository:

git://git.politreco.com/linux-lcs.git

Contact:[email protected]@gmail.com

Page 32: Multi-core scheduling optimizations for soft real-time ...

Q & A

Page 33: Multi-core scheduling optimizations for soft real-time ...

Solution – Linux schedulerDependency followers

Linux scheduler: Change the decision process of the CPU in which a task executes when it is woken up

Mainscheduler

Periodicscheduler

CoreScheduler

CPUi

Contextswitch

RT IdleFair

RR FIFO Normal

Batch

Idle

Selecttask

Schedulerclasses

Schedulerpolicies