
Distributed Operation Layer: Efficient and Predictable KPN-Based Design Flow

Iuliana Bacivarov, Wolfgang Haid, Kai Huang, and Lothar Thiele

ETH Zürich, Switzerland


CASA, ESWEEK – DOL: Efficient and Predictable Design Flow Iuliana Bacivarov

Efficiency vs. Predictability?

Efficiency is…
… speed-up
… scalability
… small memory
… portability
… small effort

Distributed Operation Layer (DOL): efficient and predictable system-level MPSoC design flow

Predictability is…
… analyzability
… guarantees
… fast estimates
… good estimates
… early in design


Distributed Operation Layer

Reduce “accidental complexity” in design by raising the level of abstraction and automation


Distributed Operation Layer

System specification: abstract MoC (KPN) vs. BSP

Performance analysis: system-level (formal) analysis vs. complete system simulation

Design space exploration: automated system-level exploration vs. trial-and-error

(Software) synthesis: automated synthesis on various MPSoCs (possible due to formal MoC)

Reduce “accidental complexity” in design by raising the level of abstraction and automation


Outline

Introduction
Distributed operation layer design flow
  Specification
  Synthesis
  Design space exploration
  Performance analysis
Some experimental results
Conclusions


DOL Software System-Level Design Flow

Goals
  Efficiency
  Predictability

Challenges
  Scalable specification
  Automated synthesis
  System-level design space exploration
  Analytic performance evaluation

Strengths
  Abstraction
  Automation

[Figure: DOL design flow. Inputs: application specification (XML & C), architecture specification (XML), mapping specification (XML). Steps: functional simulation generation, simulation on workstation, system synthesis (HdS generation), simulation on virtual platform, analysis model generation, evaluation on workstation, design space exploration. Edge labels: calibration data, back-annotation, performance data, test & debug.]


System Specification

Roles
  Express data and functional parallelism in application
  Specify mapping of application on target architecture

Challenges
  Scalability
  Platform-independence

Formal MoC – basis for efficient and predictable design



Programming Model

Model of computation: Kahn process network
Coordination: XML with performance annotations
Functionality: C/C++ with the DOL-specific programming API
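As a rough illustration of the model of computation (not of the DOL API itself), a Kahn process network can be sketched as threads that communicate only through FIFO channels with blocking reads. The process names, the `None` end-of-stream marker, and the use of Python rather than C are assumptions made for this sketch.

```python
import queue
import threading

END = None  # hypothetical end-of-stream marker, not part of DOL


def producer(out_ch, n):
    """Source process: writes 0..n-1, then the end marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(END)


def square(in_ch, out_ch):
    """Filter process: blocking read, compute, write."""
    while True:
        token = in_ch.get()  # blocks until a token is available
        if token is END:
            out_ch.put(END)
            return
        out_ch.put(token * token)


def consumer(in_ch, result):
    """Sink process: collects tokens until the end marker arrives."""
    while True:
        token = in_ch.get()
        if token is END:
            return
        result.append(token)


def run_network(n):
    c1, c2 = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    result = []
    threads = [
        threading.Thread(target=producer, args=(c1, n)),
        threading.Thread(target=square, args=(c1, c2)),
        threading.Thread(target=consumer, args=(c2, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


output = run_network(5)
print(output)  # [0, 1, 4, 9, 16]
```

Because the processes interact only through blocking reads on FIFOs, the result does not depend on how the threads are scheduled. This determinism is the property that makes a KPN attractive as a formal basis.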


Programming Model – Scalability

Scalability: “iterators” for large, multi-tile descriptions

<process name="src">
  <port type="output" name="out"/>
  <source type="c" location="src.c"/>
</process>

<iterator variable="i" range="N">
  <process name="src">
    <append function="i"/>
    <port type="output" name="out"/>
    <source type="c" location="src.c"/>
  </process>
</iterator>
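One way to picture what an iterator stands for is to expand it into N concrete processes. The sketch below does this with Python's standard XML library. The `_<index>` naming of the expanded processes and the way `N` is supplied are assumptions made for illustration; the slide does not specify how DOL flattens iterators.

```python
import copy
import xml.etree.ElementTree as ET

ITERATOR_XML = """
<iterator variable="i" range="N">
  <process name="src">
    <append function="i"/>
    <port type="output" name="out"/>
    <source type="c" location="src.c"/>
  </process>
</iterator>
"""


def expand_iterator(xml_text, constants):
    """Return one copy of each <process> per iterator value."""
    iterator = ET.fromstring(xml_text)
    variable = iterator.get("variable")
    count = constants[iterator.get("range")]
    expanded = []
    for value in range(count):
        for template in iterator.findall("process"):
            proc = copy.deepcopy(template)
            suffix = ""
            # findall returns a list, so removing children here is safe.
            for append in proc.findall("append"):
                if append.get("function") == variable:
                    suffix += "_" + str(value)  # assumed naming scheme
                proc.remove(append)
            proc.set("name", template.get("name") + suffix)
            expanded.append(proc)
    return expanded


processes = expand_iterator(ITERATOR_XML, {"N": 3})
names = [p.get("name") for p in processes]
print(names)  # ['src_0', 'src_1', 'src_2']
```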


Abstract Platform Modeling

Elements
  Structure: processors, peripherals, memories, buses, etc.
  Interconnect: explicit read and write communication paths
  Performance data: e.g. latency and bandwidth of HW communication


Abstract Platform – Scalability

Specification: XML, including “iterators” capability


Mapping Specification

Scheduling

Constraints

Mapping

Binding
  Processes to processors
  SW channels to HW paths
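A minimal sketch of what such a binding amounts to, using hypothetical process, processor, channel, and path names (none are taken from DOL). Each process is bound to one processor and each SW channel to one HW path. The binding is treated as consistent when every channel's path starts at the writer's processor and ends at the reader's.

```python
# All names below are hypothetical and only illustrate the idea of binding.
channels = {"c1": ("producer", "filter"), "c2": ("filter", "consumer")}
hw_paths = {
    "local_0": ("arm_0", "arm_0"),      # (source processor, target processor)
    "bus_0_to_1": ("arm_0", "arm_1"),
}
process_binding = {"producer": "arm_0", "filter": "arm_0", "consumer": "arm_1"}
channel_binding = {"c1": "local_0", "c2": "bus_0_to_1"}


def check_binding(channels, hw_paths, process_binding, channel_binding):
    """Return a list of inconsistencies; an empty list means the binding is consistent."""
    errors = []
    for name, (writer, reader) in channels.items():
        path = channel_binding.get(name)
        if path not in hw_paths:
            errors.append(name + ": not bound to a known HW path")
            continue
        source, target = hw_paths[path]
        if process_binding.get(writer) != source:
            errors.append(name + ": path does not start at the writer's processor")
        if process_binding.get(reader) != target:
            errors.append(name + ": path does not end at the reader's processor")
    return errors


errors = check_binding(channels, hw_paths, process_binding, channel_binding)
print(errors)  # []
```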


System Synthesis

Role
  Close the gap between system-level specification and implementation

Challenges
  Achieve desired performance
  Handle deadlocks, starvation, and data races
  Preserve KPN semantics

Automatic software synthesis – essential for efficient design



DOL Synthesis

Synthesis
  Functional synthesis: SystemC untimed, native execution model generation
  Software synthesis: HdS generation for MPARM, Atmel DIOPSIS, CELL

Strategy
  Source-to-source code generators from DOL KPN to implementation
  Automatic generation of “glue code”: processes and channels implementation, bootstrapping, and scheduling



Functional Synthesis

Synthesis
  DOL processes and FIFOs: SystemC threads and channels
  SystemC main file: bootstrapping and scheduling

Features
  Execution: native, un-timed
  Debugging: standard tools, e.g., gdb
  Performance data extraction: monitor READ/WRITE/FIRE

Automatic synthesis of DOL KPN in functional SystemC
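The idea of extracting performance data by monitoring READ/WRITE/FIRE events can be sketched as follows. This is plain Python, not the generated SystemC: the event names come from the slide, while the class, the process names P1 and P2, and the fixed firing order are assumptions. Blocking behavior is deliberately left out to keep the sketch short.

```python
from collections import Counter, deque


class MonitoredFifo:
    """FIFO channel that counts READ and WRITE events per process."""

    def __init__(self, events):
        self._tokens = deque()
        self._events = events

    def write(self, process, token):
        self._events[(process, "WRITE")] += 1
        self._tokens.append(token)

    def read(self, process):
        self._events[(process, "READ")] += 1
        return self._tokens.popleft()


def fire(process, events, body):
    """Run one firing of a process and count it as a FIRE event."""
    events[(process, "FIRE")] += 1
    return body()


events = Counter()
fifo = MonitoredFifo(events)
received = []

# Hypothetical static schedule: fire P1 three times, then P2 three times.
for value in range(3):
    fire("P1", events, lambda v=value: fifo.write("P1", v))
for _ in range(3):
    fire("P2", events, lambda: received.append(fifo.read("P2")))

print(received)      # [0, 1, 2]
print(dict(events))
```

Counts of this kind (firings per process, tokens per channel) are the sort of data a later analysis step could consume.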

[Figure: processes P1, P2, and P3 as SystemC threads with SystemC ports, connected by SystemC channels through write() and read(); a scheduler calls P1.fire(), P2.fire(), and P3.fire().]


DOL Software Synthesis

MPARM: multi-ARM tiles connected by NoC
Atmel Diopsis 940: each tile is an ARM9 + DSP connected by an AMBA bus; several tiles connected via NoC
Cell BE: PowerPC and 8 SPEs connected via ring bus

[Figure: Cell BE block diagram. The PPE (PPU with L1 cache and L2 cache), eight SPEs (each with SPU, LS, and MFC), and the MIC to main storage memory are connected by the element interconnect bus (EIB).]

Legend:
LS: Local Store
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processor Element
PPU: Power Processor Unit
SPE: Synergistic Processor Element
SPU: Synergistic Processor Unit

[Figure: tile-based platform. Tiles (ARM core, SP, x-bar, DRAM ctrl, NI) are connected through NoC switches.]


Design Space Exploration

Role
  Find Pareto-optimal mappings of an application on target architecture

Challenges
  Multiple contradictory objectives
  Exhaustive search not feasible
  Instruction-accurate simulation too slow for design space exploration

System-level automated design space exploration – the key element of an efficient design
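To make “Pareto-optimal” concrete: when all objectives are minimized, a mapping is Pareto-optimal if no other mapping is at least as good in every objective and strictly better in at least one. The sketch below filters a list of objective vectors accordingly. The candidate values are invented for illustration, and the simple quadratic-time filter is not the search strategy used by the DOL tools.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (delay, cost) pairs for six candidate mappings.
candidates = [(10, 1), (6, 2), (7, 2), (4, 3), (4, 4), (3, 5)]
front = pareto_front(candidates)
print(front)  # [(10, 1), (6, 2), (4, 3), (3, 5)]
```

Here (7, 2) is dropped because (6, 2) has lower delay at the same cost, and (4, 4) is dropped because (4, 3) has the same delay at lower cost.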



Mapping Optimization Framework

Control & GUI: EXPO (https://www.tik.ee.ethz.ch/expo), a tool to explore the design space for network processor architectures
Interface: PISA (https://www.tik.ee.ethz.ch/pisa), Platform and language independent Interface for Search Algorithms
SPEA2 (Strength Pareto Evolutionary Algorithm)
MPA (Modular Performance Analysis), http://www.mpa.ethz.ch


EXPO-PISA Illustration

[Figure: scatter plot of explored design points; axes: max. processor load and max. bus load.]


Performance Analysis

Roles
  Feedback for developer
  Verification of single designs
  Decision basis for design space exploration

Challenges
  Accuracy
  Speed

Formal performance analysis – the key element of a predictable design



DOL Performance Analysis

Goal: design real-time systems (multi-media, signal processing)
Method: Modular Performance Analysis (MPA), http://www.mpa.ethz.ch
Challenge: integrate MPA in DOL
  Generate MPA model from high-level spec
  Calibrate MPA model

[Figure: DOL design flow with MPA analysis model generation as the analysis step; inset plots of #(events) over interval Δ.]


Modular Performance Analysis (MPA)*

Model: based on Network Calculus; streams and resources are modeled by arrival and service curves
Output: worst-case bounds on system properties
(Large) MPSoC extensions: complex activation schemes, timing correlations, blocking semantics, cyclic dependencies
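As a small worked example of such worst-case bounds, here is a textbook Network Calculus case (not MPA output). The stream is bounded by a token-bucket arrival curve α(Δ) = b + r·Δ, and the resource offers a rate-latency service curve β(Δ) = R·max(0, Δ − T) with r ≤ R. The delay is then bounded by the largest horizontal distance between the curves, T + b/R, and the backlog by the largest vertical distance, b + r·T. The parameter values are made up for the sketch.

```python
def arrival(delta, b, r):
    """Token-bucket arrival curve: burst b plus long-term rate r."""
    return b + r * delta


def service(delta, R, T):
    """Rate-latency service curve: nothing for T time units, then rate R."""
    return R * max(0.0, delta - T)


def delay_bound(b, r, R, T):
    """Closed-form maximum horizontal distance between the two curves."""
    if r > R:
        raise ValueError("unbounded: arrival rate exceeds service rate")
    return T + b / R


def backlog_bound(b, r, R, T):
    """Closed-form maximum vertical distance between the two curves."""
    if r > R:
        raise ValueError("unbounded: arrival rate exceeds service rate")
    return b + r * T


def numeric_backlog(b, r, R, T, horizon, steps):
    """Cross-check: maximum vertical distance sampled on a grid."""
    grid = (k * horizon / steps for k in range(steps + 1))
    return max(arrival(d, b, r) - service(d, R, T) for d in grid)


b, r, R, T = 4.0, 1.0, 2.0, 3.0   # hypothetical parameters
print(delay_bound(b, r, R, T))    # 5.0
print(backlog_bound(b, r, R, T))  # 7.0
print(numeric_backlog(b, r, R, T, horizon=10.0, steps=1000))
```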

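[Figure: MPA model with processes P1, P2, P3 and channels FIFO1, FIFO2. The input stream a leaves as a’; the resources RISC, BUS, and DSP provide bRISC, bBUS, and bDSP, with outputs b’RISC, b’BUS, and b’DSP.]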

*http://www.mpa.ethz.ch


Modeling in MPA

[Figure: MPA modeling of a process, intra-processor communication, inter-processor communication, and complex computation.]


MPA Model Generation

Automatic MPA model generation in 2 steps
  Framework-independent model (XML format)
  Framework-specific model (Matlab script)

Challenges
  Relation between DOL spec and MPA model
  Sequential evaluation of parallel MPA model
  Accurate parameters


MPA Model Calibration

Goal: collect accurate performance data from simulation
Problem: too slow during design space exploration
Strategy: collect parameters beforehand, with “calibration mappings”


… A Few Results

Executing MJPEG decoder on MPARM*

[Figure: MPARM architecture with ARM tiles 1 to N connected by a bus; each tile contains an ARM core, scratchpad memory, DMA controller, MMS, and instruction and data memory. An (optimal) mapping is shown.]

*MPARM - virtual simulation platform of U. Bologna


Design Space Exploration

Set-up: PISA* and EXPO* (SPEA2)
Objectives
  1. end-to-end delay (upper bound in MPA)
  2. cost (additive model)
Population: 60 individuals x 50 generations
Pareto front: 6 solutions
Search time: ~2 hours
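A back-of-the-envelope reading of these numbers, assuming that every individual is evaluated once per generation and that the search time is dominated by those evaluations (neither is stated on the slide):

```python
individuals = 60
generations = 50
search_time_s = 2 * 3600  # "~2 hours", taken as exactly 2 h for the estimate

evaluations = individuals * generations
seconds_per_evaluation = search_time_s / evaluations

print(evaluations)             # 3000
print(seconds_per_evaluation)  # 2.4
```

Under these assumptions, each candidate mapping costs on the order of a few seconds to evaluate.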

[Figure: end-to-end delay vs. cost for the current population; points labeled 1 proc., 2 procs., 3 procs., and 4 procs.]

*EXPO - https://www.tik.ee.ethz.ch/expo

*PISA - https://www.tik.ee.ethz.ch/pisa


Performance Analysis

Mapping MJPEG decoder on 3-tile MPARM


… Some Performance Figures: Speed

Model calibration: time-expensive (usual for all flows); cannot be included in the design space exploration loop

Model generation and performance analysis in MPA: seconds; reasonable for design space exploration


… Some Performance Figures: Accuracy

Differences: ~20%
  Some MPA operators do not produce tight bounds
  Simulation cannot provide actual worst/best-case behavior

…but system model and underlying architecture are well suited for analyzing this application!

[Table columns: Observed (simulation), Estimated bounds (MPA)]


… Some More Performance Figures

The DOL framework is mainly implemented in Java (available at http://www.tik.ee.ethz.ch/~shapes)

Code size of different parts of the design flow:


Conclusions

“Accidental complexity” can be considerably reduced, resulting in a design flow that is both efficient and predictable, by…
  …using a fixed MoC (KPN) (vs. BSP approaches)
  …formal performance analysis (vs. simulation)
  …automated, system-level design space exploration (vs. ad-hoc, manual techniques that include synthesis)

Complete SW design flow (specification, synthesis, design space exploration, performance analysis) available: http://www.tik.ee.ethz.ch/~shapes


http://www.tik.ee.ethz.ch/~shapes


[email protected]

http://www.tik.ee.ethz.ch/~shapes

Thank You!

Questions?