
Source: 2018.rtss.org/wp-content/uploads/2018/12/6-4.pdf

www.bsc.es

NoCo: ILP-based Worst-Case Contention Estimation for Mesh Real-Time Manycores

39th IEEE Real-Time Systems Symposium (RTSS 2018)

December 13th, Nashville, USA

Jordi Cardona (1,2), Carles Hernandez (1), Enrico Mezzetti (1), Jaume Abella (1) and Francisco J. Cazorla (1,3)

(1) Barcelona Supercomputing Center (BSC)

(2) Universitat Politècnica de Catalunya (UPC)

(3) IIIA-CSIC


Critical Real-Time Embedded Systems

Used in industries like: Avionics, Space, Railway

Require:

– Functional Correctness

– Timing Correctness

Need to provide evidence against the safety standards

– Avionics: DO-178B/C

– Automotive: ISO 26262

Validation & Verification (V&V) process


Increasing Performance Needs in CRTES

New software implementing complex functionalities

– Complex AI algorithms

– Manage Huge amounts of data

Performance needs increase significantly

Autonomous driving

ARM predicts that the performance requirements of ADAS will grow 100x from 2016 to 2024


Covering high-performance needs

How to deliver the performance needed by CRTES software in an efficient way?

– “Embrace” high-performance hardware coming from mainstream market

• Multicores and Manycores

• Caches

• Accelerators

• Networks on chip (NoCs)

Examples: SnapDragon (automotive), Nvidia Pascal (automotive), Kalray MPPA-256 (aviation)


The other side of the coin …

High-performance (complex) hardware complicates timing analysis, i.e. deriving WCET estimates for tasks

Source of the problem: contention

– Must be bounded and reduced

– Worst-case Contention Delay (WCD) → Worst Case Execution Time (WCET)

[Figure: 2x2 2D mesh with 4 cores]


Related work

Real-Time Specific NoC designs:

– Provide contention-free NoCs that are easy to V&V

– Do not scale well (bad average performance in general)

– High adoption costs for industry

Wormhole NoC designs (wNoC)

– Best-effort wormhole NoCs (wormhole switching)

• Used in commercial off-the-shelf processors (low costs for industry)

• More difficult to derive upper bounds (can be very pessimistic)

– Optimize parameters of these NoCs

» Mapping, Routing, Bandwidth distribution, …


Worst Case Execution Time (WCET) - ZLL

WCET = f(ZLL,WCD)

– Zero Load Latency (ZLL) = f(distance)

1. Mapping

3 hops

5 hops
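The dependence of ZLL on distance can be illustrated with a small sketch. It assumes minimal routing on a 2D mesh, so the hop count between a core and the target node is the Manhattan distance. The function names, the target location and the `cycles_per_hop` parameter are hypothetical and not taken from the slides.

```python
def hops(src, dst):
    """Links traversed between two mesh nodes under minimal routing.

    Nodes are (x, y) coordinates; the result is the Manhattan distance.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def zero_load_latency(src, dst, cycles_per_hop=1):
    """Hypothetical ZLL model: a fixed cost per hop and no contention."""
    return hops(src, dst) * cycles_per_hop


target = (0, 0)  # assumed location of the target node
print(hops((2, 1), target))  # 3
print(hops((3, 2), target))  # 5
```

Under this model, mapping a task with many memory accesses closer to the target node lowers its ZLL, which is why mapping appears as the first parameter.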


Worst Case Execution Time (WCET)

WCET = f(ZLL,WCD)

– Zero Load Latency (ZLL) = f(distance)

1. Mapping

– Worst case Contention Delay (WCD) = f(Routing, Arbitration)

2. Routing

3. Bandwidth weighted allocation (walloc)


Worst Case Execution Time (WCET) - WCD


[Figures: 3x3 mesh flows mapping using XY; 3x3 mesh flows mapping using XY-YX combination]
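To make the effect of routing on contention concrete, the sketch below builds XY and YX paths on a mesh and counts the links two flows have in common. It is only an illustration: the coordinates, the target node at (0, 0) and the helper names are assumptions, and shared links are used here as a rough proxy for where contention can occur.

```python
def route(src, dst, order="XY"):
    """Return the list of directed links (node, node) from src to dst.

    order="XY" moves along X first and then Y; order="YX" does the opposite.
    Both are minimal and deterministic.
    """
    x, y = src
    path = []
    for axis in order:
        if axis == "X":
            while x != dst[0]:
                nx = x + (1 if dst[0] > x else -1)
                path.append(((x, y), (nx, y)))
                x = nx
        else:
            while y != dst[1]:
                ny = y + (1 if dst[1] > y else -1)
                path.append(((x, y), (x, ny)))
                y = ny
    return path


def shared_links(route_a, route_b):
    """Links used by both routes, i.e. where the two flows can contend."""
    return set(route_a) & set(route_b)


target = (0, 0)
a_xy = route((2, 2), target, "XY")
b_xy = route((1, 2), target, "XY")
b_yx = route((1, 2), target, "YX")
print(len(shared_links(a_xy, b_xy)))  # 3
print(len(shared_links(a_xy, b_yx)))  # 0
```

In this toy case, switching one flow from XY to YX removes all link sharing with the other flow, which is the kind of gain an XY-YX combination can offer.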


Worst Case Execution Time (WCET) - WCD


[Figure: 2x2 2D mesh XY flows mapping. RR arbitration: WCD = 15. Weighted mesh arbitration (WRR): WCD = 10]

WCET is affected by all three parameters: Mapping, Routing and Walloc
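Why weights matter can be seen in a deliberately simplified model, which is not the WCD model behind the figure and does not reproduce its numbers. Assume a chain of routers forwarding traffic towards one target, where each router merges one upstream port with one local injection port. Plain round-robin gives each port half of the link, so distant flows are squeezed at every merge; weighting the upstream port by the number of flows behind it evens out the shares. The function name and the model itself are assumptions.

```python
from fractions import Fraction


def chain_shares(n_flows, weighted):
    """Bandwidth share of each flow on the last link of a router chain.

    Flow 0 is injected farthest from the target; flow n_flows - 1 is
    injected at the last router. Shares are exact fractions summing to 1.
    """
    shares = [Fraction(1)]  # flow 0 is alone on the first link
    for k in range(1, n_flows):
        # At router k the upstream port carries k flows, the local port 1.
        if weighted:
            upstream = Fraction(k, k + 1)  # weight proportional to flow count
        else:
            upstream = Fraction(1, 2)      # round-robin: equal share per port
        shares = [s * upstream for s in shares] + [1 - upstream]
    return shares


print(chain_shares(3, weighted=False))  # shares 1/4, 1/4, 1/2
print(chain_shares(3, weighted=True))   # shares 1/3, 1/3, 1/3
```

A smaller guaranteed share means a longer worst-case wait per access, so in this model the farthest flow benefits most from weighted arbitration.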


Parameters are inter-dependent

WCET = f(ZLL, WCD) = f(Mapping, Routing, Walloc)

Optimizing each parameter individually, or in pairs, does not provide a globally optimal NoC configuration, just a locally optimal one.

All the parameters need to be optimized at the same time

[Figure: routing constraints, mapping constraints, bandwidth constraints]


Our proposal: NoCo

Given:

– Workload (Tasks)

– Wormhole Mesh NoC configuration

NoCo optimizes:

– The WCET of applications, by finding the best mesh configuration:

• Mapping

• Routing

• Weights allocation (Walloc)

NoCo uses:

– Stochastic exploration to optimize routing

– Integer Linear Programming (ILP) to optimize Mapping and Walloc


Agenda

Introduction and Motivation

Background and problem analysis

NoCo: Stochastic/ILP model

Evaluation

Conclusions


PROPOSAL: NOCO


NoCo Optimization Framework

– Routing

• Stochastic

• Generates random routes and passes them to the ILP optimizer

– Placement, walloc

• Integer Linear Programming (ILP)

• Placement and Walloc are optimized for each routing

– Selection of the best setup

[Figure: Approach/Concept. Route generation (stochastic, random route selection) feeds the ILP-based Mapping and Walloc optimization; the resulting optimized configurations are compared by NoC performance to select the best configuration]
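The overall flow described on this slide can be sketched as a sampling loop around a per-routing optimizer. This is a minimal sketch, not the authors' implementation: `explore`, `optimize` and the tuple it returns are hypothetical names, and the toy optimizer below merely stands in for the ILP stage.

```python
import random


def explore(routings, optimize, n_samples, seed=0):
    """Sample routings at random, optimize each one, and keep the best.

    `optimize(routing)` is a hypothetical callback standing in for the ILP
    stage: it returns (max_wcet, mapping, weights) for the given routing.
    """
    rng = random.Random(seed)
    sample = rng.sample(routings, min(n_samples, len(routings)))
    best = None
    for routing in sample:
        max_wcet, mapping, weights = optimize(routing)
        if best is None or max_wcet < best[0]:
            best = (max_wcet, routing, mapping, weights)
    return best


# Toy stand-in for the ILP stage: routing 7 is arbitrarily the best one.
def toy_optimize(routing):
    return abs(routing - 7) + 100, "mapping-%d" % routing, "weights-%d" % routing


best = explore(list(range(16)), toy_optimize, n_samples=16)
print(best[:2])  # (100, 7)
```

Keeping the optimizer behind a callback mirrors the split on the slide: route generation is independent of how mapping and walloc are solved for each candidate.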


NoCo Proposal

Problem description:

– Tasks information

• Execution Time Observed (ETO)

• Memory Accesses

– NoC information:

• Target Node location

• Number of routers

– Constraints (only one task to each core)

[Figure: main stages of the NoCo framework (mapping and walloc)]

NoCo Proposal

Routing: stochastic exploration

– Randomly generate routing configurations

• Minimal distance routing policies

• Deterministic routing policies (e.g. XY, YX)

• Deadlock avoidance

– Prohibiting certain turns (no cycles)

[Figure: main stages of the NoCo framework (mapping and walloc)]
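One way to read "routing configuration" is an assignment of either XY or YX to every node of the mesh. That reading is an assumption, although it matches the counts 2^9 = 512 and 2^16 = 65536 that the deck quotes for its routing populations. The sketch below enumerates or samples such configurations; it does not check deadlock freedom, and all names are hypothetical.

```python
import itertools
import random

POLICIES = ("XY", "YX")


def all_routings(n_nodes):
    """Every assignment of one policy per node: 2**n_nodes configurations."""
    return itertools.product(POLICIES, repeat=n_nodes)


def random_routing(n_nodes, rng):
    """One configuration drawn uniformly at random."""
    return tuple(rng.choice(POLICIES) for _ in range(n_nodes))


rng = random.Random(0)
print(sum(1 for _ in all_routings(9)))  # 512
print(len(random_routing(9, rng)))  # 9
```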


NoCo Proposal

Routing: random sampling (finite population)

– C = probability that none of the top X% routes is in the random sample.

The probability of having at least 1 routing from the top 1% of routings in a sample of size 1000 is 1 - 0.000043 = 0.999957 (99.99%)

[Figure: routings ordered from worst to best, with the top 1% and 0.1% marked]
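The quoted value is consistent with C = (1 - X)^n for X = 1% and n = 1000, which treats the draws as independent. The sketch below checks that arithmetic and also gives the without-replacement variant for a finite population; the function names are hypothetical, and the population split used in the finite version is up to the caller.

```python
def miss_probability(top_fraction, n_samples):
    """Chance that n independent uniform draws all miss the top fraction."""
    return (1.0 - top_fraction) ** n_samples


def miss_probability_finite(population, n_top, n_samples):
    """Same quantity when drawing without replacement from a finite population."""
    if n_samples > population - n_top:
        return 0.0  # the sample is too large to avoid every top routing
    p = 1.0
    for i in range(n_samples):
        p *= (population - n_top - i) / (population - i)
    return p


c = miss_probability(0.01, 1000)
print(f"{c:.6f}")      # 0.000043
print(f"{1 - c:.6f}")  # 0.999957
```

Sampling without replacement can only lower the miss probability, so the independent-draw formula is the more conservative of the two.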


NoCo Proposal

Stochastic Routing

– It stochastically guarantees finding one of the best routing solutions at low cost (without exploring all the possible routings)

• With 330 samples out of 2^16 = 65536 routings (0.5% of the population), it finds the best routing

[Figure: main stages of the NoCo framework (mapping and walloc)]

NoCo Proposal

Mapping and Walloc ILP optimization

[Figure: main stages of the NoCo framework]

NoCo Proposal

ILP model

– Objective function:

• Parallel applications: the WCET of the application is determined by the WCET of the slowest thread

[Figure: per-core WCETs W_C1, W_C2, W_C3, W_C3]

NoCo Proposal

ILP model

– Compute WCET:

• Bandwidth and WCD modeling

[Equation figure not recovered. Annotated terms: number of memory accesses; WCET in isolation; BW distribution constraints from the routing configuration; path flows mapping from the routing configuration]

NoCo Proposal

ILP model

– Compute WCET:

• Routing rules (encoded in Boolean matrices):

– Bandwidth distribution

– Path restrictions

• Other restrictions:

– Each task is assigned to one core

– Each core can run only one task

– BW assigned to each core > 0.0

– WCD of all tasks > 0.0

– Total BW in the mesh must be 1
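The shape of the mapping problem (one task per core, one core per task, minimize the WCET of the slowest task) can be shown on a tiny instance. The sketch below is a brute-force stand-in for the ILP, not the model from the slides: it assumes a WCET of the form `isolation + accesses * access_latency[core]`, takes the per-core worst-case access latency as a given input, and leaves the bandwidth weights out. All names and numbers are made up for illustration.

```python
from itertools import permutations


def best_mapping(tasks, access_latency):
    """Exhaustively search task-to-core mappings, minimizing the maximum WCET.

    tasks: list of (wcet_in_isolation, memory_accesses), one entry per task.
    access_latency: assumed worst-case cycles per memory access for each core.
    Each permutation assigns exactly one task to each core and vice versa.
    """
    def max_wcet(mapping):
        return max(
            isolation + accesses * access_latency[core]
            for (isolation, accesses), core in zip(tasks, mapping)
        )

    cores = range(len(tasks))
    mapping = min(permutations(cores), key=max_wcet)
    return max_wcet(mapping), mapping


tasks = [(100, 10), (100, 1), (50, 5)]  # made-up numbers
access_latency = [2, 10, 5]             # made-up numbers
print(best_mapping(tasks, access_latency))  # (120, (0, 1, 2))
```

Exhaustive search grows factorially with the number of cores, which is one reason to hand this stage to an ILP solver once the mesh is larger than a few nodes.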


NoCo Proposal

Stochastic + ILP model

– Local solutions:

• Provides WCET of each task

• Mapping

• Bandwidth distribution (arbitration weights)

– Post-processing (minimum WCET) → Global solution

[Figure: main stages of the NoCo framework]


EVALUATION


Evaluation

Cycle-accurate Simulator

– SoCLib simulator integrated with gNoCSim

Benchmarks

– Key parameter: frequency of access to the NoC for loads/stores

Workloads

– Cover the range shown for Mediabench and EEMBC auto

– MIX Benchmarks (e.g. MIX1 => ABCDEFFGH)


Evaluation: impact of optimizing each parameter

Incremental Evaluation

Optimization versions evaluated

[Table: columns Routing, Weights and Mapping, grouped under "NoC configuration" and "ILP Optimizations"; rows labelled "Baseline" and "NoCo"; cell contents not recoverable]

– Static-base (RR)

– Static-opt (WRR)

– Map

– Map + Walloc

– Map + Walloc + R

Results

Incremental Evaluation

– Static-opt (WRR) vs Static-base (RR)

[Figure: Effect of incremental optimizations: mapping, walloc and routing (3x3 heterogeneous workloads). Annotated values: 16%, -1%]

Results

Incremental Optimizations

– Map vs Static-base (RR)

[Figure: Effect of incremental optimizations: mapping, walloc and routing (3x3 heterogeneous workloads). Annotated values: 30%, 17%, 23%]

Results

Incremental Optimizations

– Map_Walloc vs Static-base (RR)

[Figure: Effect of incremental optimizations: mapping, walloc and routing (3x3 heterogeneous workloads). Annotated values: 37%, 31%, 41%]

Results

Incremental Optimizations

– Map_Walloc_Routing (NoCo) vs XY_RR

[Figure: Effect of incremental optimizations: mapping, walloc and routing (3x3 heterogeneous workloads). Annotated values: 50%, 46%, 40%, 23%, 14%, 9%, 3%]


CONCLUSIONS


Conclusions

Optimizing NoC to reduce WCET is a multidimensional problem

– Zero Load Latency (Mapping)

– Worst-case Contention Delay (Routing and Arbitration)

Some state-of-the-art proposals optimize one, or a combination, of the parameters above that increase the WCET of applications.

We propose NoCo, a stochastic/ILP hybrid solution that optimizes at the same time:

– Routing (XY, YX combinations)

– Arbitration (Walloc)

– Applications’ mapping

NoCo reduces the maxWCET of heterogeneous tasks in 3x3 meshes by between 40% and 50% with respect to the XY-RR configuration.


BACKUP


Reliability of stochastic method

Random Routing vs Optimal Routing

[Figure: two panels, 2^9 = 512 routings and 2^16 = 65536 routings. Recoverable annotations: 94.6%, 88.7%, 100 samples, 98.5%, 330 samples, "best solution in fifth examples"]

Reliability of stochastic method

Improvement in homogeneous tasks running in parallel

– Reducing max WCET of all tasks

[Figures: rILP(m,w) maxWCET results for 9 threads; rILP(m,w) maxWCET results for 16 threads]

rILP (9 threads):

– Avg w.r.t. RR: 74%

– Avg w.r.t. WRR: 26%

rILP (16 threads):

– Avg w.r.t. RR: 88%

– Avg w.r.t. WRR: 29%

Reliability of stochastic method

Improvement in heterogeneous tasks running in parallel

– Reducing summation of max WCET of all tasks

[Figures: rILP(m,w) sumWCET results for 9 threads; rILP(m,w) sumWCET results for 16 threads]

rILP (9 threads):

– Avg w.r.t. RR: 26%

– Avg w.r.t. WRR: 19%

rILP (16 threads):

– Avg w.r.t. RR: 30%

– Avg w.r.t. WRR: 23%