Methods for Evaluation of Embedded Systems Simon Künzli, Alex Maxiaguine Institute TIK, ETH Zurich.
Methods for Evaluation of Embedded Systems
Simon Künzli, Alex Maxiaguine
Institute TIK, ETH Zurich
System-Level Analysis
[Figure: architecture with RISC, DSP, LookUp and Cipher components]
Applications:
• IP Telephony
• Secure FTP
• Multimedia streaming
• Web browsing
Questions:
• Memory?
• Clock rate?
• Bus load?
• Packet delays?
• Resource utilization?
Problems for Performance Estimation
[Figure: system with RISC, DSP, SDRAM and Arbiter]
• Distributed processing of applications on different resources
• Interaction of different applications on different resources
• Heterogeneity, HW-SW
A “nice-to-have” performance model
• measuring what we want
• high accuracy
• high speed
• full coverage
• based on unified formal specification model
• composability & parameterization
• reusable across different abstraction levels, or at least easy to refine
Overview of Existing Approaches
[Figure: existing approaches placed on a speed vs. accuracy plot: Thiele, Ernst, Givargis, Lahiri, Benini, RTL, SPADE, Jerraya]
Discrete-event Simulation
System Model
• Architecture and Behavior
• Components/Actors/Processes
• Communication channels/Signals
Event Scheduler
• Event queue
  - future events (e.g. signal changes)
  - actions to be executed
[Figure: event queue, © The MathWorks]
Accuracy vs. Speed:
How many events are simulated?
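The event scheduler outlined on this slide can be sketched as a priority queue of time-stamped events. This is a minimal illustration, not code from any of the cited tools; the names `Simulator`, `schedule`, `run` and `make_clock` are assumptions made for the example.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete-event kernel: an event queue ordered by time stamp."""

    def __init__(self):
        self.now = 0.0
        self.events_processed = 0
        self._queue = []                # heap of (time, seq, action)
        self._seq = itertools.count()   # tie-breaker for equal time stamps

    def schedule(self, delay, action):
        """Insert a future event: run `action` `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action))

    def run(self, until):
        """Pop events in time order and execute the attached actions."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()
            self.events_processed += 1


def make_clock(sim, period, log):
    """Hypothetical component that re-schedules itself every `period` units."""
    def tick():
        log.append(sim.now)
        sim.schedule(period, tick)
    return tick


sim = Simulator()
ticks = []
sim.schedule(0.0, make_clock(sim, 2.0, ticks))
sim.run(until=10.0)
print(ticks, sim.events_processed)
```

Halving `period` roughly doubles `events_processed` for the same simulated interval, which is the accuracy vs. speed trade-off: simulation cost grows with the number of events.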
Discrete-event Simulation
“The design space”:
Time resolution
Modeling communication
Modeling timing of data-dependent execution
…
Time Resolution
[Figure: a signal x(t) over the instants t1 to t7, shown once in continuous time and once in discrete time; accuracy increases from discrete to continuous time]
• Continuous time
  - e.g. gate-level simulation
• Discrete time or “cycle-accurate”
  - e.g. Register Transfer Level (RTL) simulation
  - system-level performance analysis
Modeling communication
• Pin-level model
  - all signals are modeled explicitly
  - often combined with RTL
• Transaction-level model (TLM)
  - protocol details are abstracted, e.g. burst mode transfers
• A TLM simulator of the AMBA bus is 100x faster than a pin-level model
  (Caldari et al., Transaction-Level Models for AMBA Bus Architecture Using SystemC 2.0, DATE 2003)
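[Figure: pin-level model, where C1 and C2 are connected by a ready signal and data signals d0, d1, d2; transaction-level model, where C1 issues a <write> transaction to C2 and receives true/false]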
Modeling timing of data-dependent execution
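Problem:
• How to model timing of data-dependent functionality inside a component?
Possible solution: Estimate and annotate delays in the functional/behavioral model:
[Figure: flow chart of a component between the ports in and out: a=read(in); if a > b then task1() with delay d1, else task2() with delay d2; write(out,c)]
    a = read(in);
    if (a > b) {
        task1();
        delay(d1);   /* annotated execution time of task1 */
    } else {
        task2();
        delay(d2);   /* annotated execution time of task2 */
    }
    write(out, c);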
• this approach works well for HW but may be too coarse for modeling SW
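To make the delay-annotation idea concrete, here is a small sketch in which the functional model runs natively and only accumulates the annotated delays as simulated time. The task bodies, the threshold `b` and the delay values `d1`, `d2` are placeholders chosen for illustration, not values from the slides.

```python
def run_component(inputs, b, d1, d2):
    """Functional model with annotated delays.

    Returns the produced outputs and the accumulated simulated time.
    """
    time = 0
    outputs = []
    for a in inputs:            # a = read(in)
        if a > b:
            c = a - b           # stands in for task1()
            time += d1          # delay(d1)
        else:
            c = b - a           # stands in for task2()
            time += d2          # delay(d2)
        outputs.append(c)       # write(out, c)
    return outputs, time


outputs, elapsed = run_component([5, 1, 7, 2], b=3, d1=10, d2=4)
print(outputs, elapsed)
```

The simulated time depends on which branch each input takes, but every execution of a branch is charged the same fixed delay. That granularity is one reason the approach may be too coarse for software, where the cost of a branch can itself vary with the data.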
HW/SW Co-simulation Options
Application SW...
• … is delay-annotated and executes natively on the workstation as part of the HW simulator
• … is compiled for the target processor, and its code is used as stimulus for a processor model that is part of the HW simulator
• … is not part of the HW simulator: a complete separation of Application and Architecture models
Processor Models: Simulation Environment
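[Figure: the application SW (C/C++) is compiled into program code that runs on a processor model (RTL, microarchitecture simulator or ISS); a wrapper connects the processor model to the HW simulator that models the rest of the system]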
Processor Models
• RTL model
  - cycle-accurate or continuous time
  - all the details are modeled (e.g. synthesizable)
• Microarchitecture simulator
  - cycle-accurate model
  - models pipeline effects, etc.
  - can be generated automatically (e.g. Liberty, LISA…)
• Instruction set simulator (ISS)
  - provides instruction count
  - functional models of instructions
  - e.g. SimpleScalar
Multiprocessor System Simulator
L. Benini, U Bologna
[Figure: SystemC model of a multiprocessor system in which each cycle-accurate ISS is embedded through a SystemC wrapper]
Comparison of HW/SW Co-simulation techniques
Technique                                Simulator speed (instructions/sec)
continuous time (nano-second accurate)   1 – 100
cycle-accurate                           50 – 1000
instruction level                        2000 – 20,000
J. Rowson, Hardware/Software Co-Simulation, Proceedings of the 31st DAC, USA, 1994
HW/SW Co-simulation Options
Application SW...
• … is delay-annotated and executes natively on the workstation as part of the HW simulator
• … is compiled for the target processor, and its code is used as stimulus for a processor model that is part of the HW simulator
• … is not part of the HW simulator: a complete separation of Application and Architecture models
Independent Application and Architecture Models (“Separation of Concerns”)
[Figure: the application model (workload) is mapped onto the architecture model (resources: RISC, DSP, SRAM)]
Co-simulation of Application and Architecture Models
Basic principle:
• Application (or functional) simulator drives architecture (or hardware) simulator
• The models interact via traces of actions
• The traces are produced on-line or off-line
Advantages:
• system-level view
• flexible choice of abstraction level
• the models and the mapping can be easily altered
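The separation of the two models can be sketched as follows: an application model emits a trace of abstract actions, and an architecture model assigns a cost to each action. The action names, payload sizes and cycle counts below are hypothetical, and the trace is produced off-line for simplicity.

```python
def application_trace(n_frames):
    """Application model: emits a trace of abstract actions (off-line)."""
    trace = []
    for _ in range(n_frames):
        trace += [("read", 64), ("execute", "decode"), ("write", 64)]
    return trace


def simulate_architecture(trace, cycles_per_byte, exec_cycles):
    """Architecture model: charges a latency for every traced action."""
    cycles = 0
    for action, arg in trace:
        if action in ("read", "write"):
            cycles += arg * cycles_per_byte     # communication cost
        elif action == "execute":
            cycles += exec_cycles[arg]          # computation cost
        else:
            raise ValueError(f"unknown action: {action}")
    return cycles


trace = application_trace(3)
# The same trace is replayed on two candidate mappings.
risc_cycles = simulate_architecture(trace, 2, {"decode": 500})
dsp_cycles = simulate_architecture(trace, 2, {"decode": 120})
print(risc_cycles, dsp_cycles)
```

Because the trace is independent of the architecture parameters, changing the mapping only requires replaying the trace, not re-running the application.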
Trace-driven Simulation
SPADE: System level Performance Analysis and Design space Exploration
Application model
Architecture model
P. Lieverse et al., U Delft & Philips
Trace-driven Simulation (SPADE)
Lieverse et al., U Delft & Philips
Going away from discrete-event simulation…
Analysis for Communication Systems
Lahiri et al., UC San Diego
A two-step approach:
1. simulation without communication (e.g. using ISS)
2. analysis for different communication architectures
K. Lahiri, UCSD
Overview
K. Lahiri, UCSD
Analytical Methods for Power Estimation
• Givargis et al., UC Riverside
• Analytical models for power consumption of:
  - Caches
  - Buses
• Two-step approach for fast power evaluation:
  - collect intermediate data using simulation
  - use equations to rapidly predict power
  - couple with a fast bus estimation approach
Approach Overview Givargis, UC Riverside
• Bus equation:
  - m items/second (denotes the traffic N on the bus)
  - n bits/item
  - k bit wide bus
  - bus-invert encoding
  - random data assumption
[Equation: bus power P_bus as a function of C, m, n and k]
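The ingredients listed for the bus model (m, n, k, bus-invert encoding, random data) are enough for a rough estimate of switching power. The sketch below is not the equation from the slide; it is an independent estimate under stated assumptions: the bus has k data lines plus one invert line, each transfer flips every line with probability 1/2 before encoding, bus-invert coding limits the flips to min(h, k+1-h), and each transition costs ½·C·V² with a per-line capacitance `c_line` and a supply voltage `v_dd` (the voltage is an assumption that does not appear in the parameter list).

```python
from math import ceil, comb


def expected_transitions(k):
    """Expected line transitions per transfer on a k-bit bus with bus-invert
    coding and uniformly random data (k data lines plus one invert line)."""
    lines = k + 1
    total = sum(comb(lines, h) * min(h, lines - h) for h in range(lines + 1))
    return total / 2 ** lines


def bus_power(m, n, k, c_line, v_dd):
    """Rough average bus power in watts for m items/s of n bits each."""
    transfers_per_second = m * ceil(n / k)           # n-bit item needs ceil(n/k) transfers
    energy_per_transition = 0.5 * c_line * v_dd ** 2
    return transfers_per_second * expected_transitions(k) * energy_per_transition


e8 = expected_transitions(8)
print(e8)                                            # below 4, the unencoded average
print(bus_power(m=1e6, n=32, k=8, c_line=1e-12, v_dd=3.3))
```

For an 8-bit bus the encoded average stays below the 4 transitions per transfer that unencoded random data would cause, which is the saving bus-invert coding is designed to provide.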
Experiment Setup Givargis, UC Riverside
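[Figure: tool flow from a C program through the ISS, trace generator, cache simulator and bus simulator; the outputs are performance plus power, with separate blocks for CPU power, I/D cache power and memory power]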
• Dinero [Edler, Hill]
• CPU power [Tiwari96]
Analytical Method
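[Figure: CPU1 (scheduling discipline 1, with e1 and e2) and CPU2 (scheduling discipline 2, with e3 and e4) are connected by event streams whose models are unknown (?)]
Workload?
• periodic (period T)
• periodic with jitter (period T, jitter J)
• periodic with burst (period T, burst length b, minimum interarrival time t)
• sporadic (minimum interarrival time t)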
Event Model Interface Classification Ernst, TU Braunschweig
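[Figure: conversions between the event models, each marked either as a lossless EMIF or as an EMIF to a less expressive model. Edge labels: jitter = 0; burst length (b) = 1; t = T - J; t = T; t = t; T = T, t = T, b = 1; T = T, J = 0]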
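The parameter conversions named on this slide can be written as small functions between event-model records. Which label belongs to which conversion is inferred from the parameter names and is therefore an assumption; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Periodic:
    T: float                    # period


@dataclass(frozen=True)
class PeriodicWithJitter:
    T: float                    # period
    J: float                    # jitter


@dataclass(frozen=True)
class PeriodicWithBurst:
    T: float                    # period
    b: int                      # burst length
    t: float                    # minimum interarrival time


@dataclass(frozen=True)
class Sporadic:
    t: float                    # minimum interarrival time


def periodic_to_jitter(ev):
    """Assumed lossless: T = T, J = 0."""
    return PeriodicWithJitter(T=ev.T, J=0.0)


def periodic_to_burst(ev):
    """Assumed lossless: T = T, t = T, b = 1."""
    return PeriodicWithBurst(T=ev.T, b=1, t=ev.T)


def periodic_to_sporadic(ev):
    """To a less expressive model: t = T."""
    return Sporadic(t=ev.T)


def jitter_to_sporadic(ev):
    """To a less expressive model: t = T - J (requires J < T)."""
    if ev.J >= ev.T:
        raise ValueError("jitter must be smaller than the period")
    return Sporadic(t=ev.T - ev.J)


def burst_to_sporadic(ev):
    """To a less expressive model: t = t."""
    return Sporadic(t=ev.t)


print(jitter_to_sporadic(PeriodicWithJitter(T=10.0, J=3.0)))
```

Converting to a sporadic model keeps only the minimum distance between events, so the period information is lost; that is the sense in which such an interface leads to a less expressive model.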
Example: EMIFs & EAFs
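[Figure: the same system of CPU1 (scheduling discipline 1, with e1 and e2) and CPU2 (scheduling discipline 2, with e3 and e4); on the event streams between the components, either an event model interface (EMIF) or an event adaptation function (EAF) is needed]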
Use standard scheduling analysis for single components.
General Framework
[Figure: general framework. A functional task model (functional units T1, T2, T3, event streams, load scenarios) and an architecture model (resource units ARM9, DSP) are linked by mapping relations. They are abstracted into an abstract task model (abstract functional units, abstract event streams, abstract load scenarios) and an abstract architecture (abstract resource units), together with abstract components (run-time environment).]
Event & Resource Models
• use arrival curves to capture event streams
• use service curves to capture processing capacity
[Figure: upper and lower arrival curves (u, l), plotted as # of packets (1 to 3) over time t (0 to 2). Annotations: max: 1 packet, min: 0 packets; max: 2 packets, min: 0 packets; max: 3 packets, min: 1 packet]
Analysis for a Single Component
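[Figure: a single component takes the incoming arrival curves (α^l, α^u) and service curves (β^l, β^u) and produces the outgoing arrival curves and the remaining service curves]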
Analysis – Bounds on Delay & Memory
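[Figure: upper arrival curve α^u and lower service curve β^l plotted together; the delay d is the maximum horizontal distance and the backlog b is the maximum vertical distance between the two curves]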
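The two bounds can be computed directly from sampled curves. In the usual network-calculus formulation, the backlog is bounded by sup_t {α^u(t) − β^l(t)} and the delay by sup_t inf{τ ≥ 0 : α^u(t) ≤ β^l(t + τ)}. The sketch below evaluates both on integer time steps; the example curves (a burst of 2 packets plus 1 packet per time unit, served at 2 packets per time unit after a latency of 3) are made up for illustration.

```python
def backlog_bound(alpha_u, beta_l):
    """Maximum vertical distance between the sampled curves."""
    return max(a - s for a, s in zip(alpha_u, beta_l))


def delay_bound(alpha_u, beta_l):
    """Maximum horizontal distance between the sampled curves."""
    worst = 0
    for t, demand in enumerate(alpha_u):
        tau = 0
        # Advance until the guaranteed service covers the demand seen at t.
        while t + tau < len(beta_l) and beta_l[t + tau] < demand:
            tau += 1
        if t + tau == len(beta_l):
            raise ValueError("horizon too short to bound the delay")
        worst = max(worst, tau)
    return worst


HORIZON = 20
alpha_u = [2 + t for t in range(HORIZON + 1)]               # upper arrival curve
beta_l = [max(0, 2 * (t - 3)) for t in range(HORIZON + 1)]  # lower service curve

b = backlog_bound(alpha_u, beta_l)
d = delay_bound(alpha_u, beta_l)
print(b, d)
```

For these curves the results agree with the closed-form values for a token-bucket arrival curve and a rate-latency service curve: backlog 2 + 1·3 = 5 packets and delay 3 + 2/2 = 4 time units.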
Comparison between Different Approaches
Simulation-Based
• can answer virtually any question about performance
• can model arbitrarily complex systems
• average case (single instance)
• time-consuming
• accurate
Analytical Methods
• possibilities to answer questions limited by method
• restricted by underlying models
• good coverage (worst case)
• fast
• coarse
Example: IBM Network Processor
Comparison RTC vs. Simulation
[Figure: utilization [%] versus line speed (100 Mbps to 400 Mbps in steps of 50 Mbps), comparing simulation with the analytical method for OPB, PLB write and PLB read]
Experiment Results Givargis, UC Riverside
[Figure: execution time (sec), axis from 0 to 0.3, for the configurations Conf 0 to Conf 9]
• Diesel application’s performance
• Blue is obtained using full simulation
• Red is obtained using our equations
• 4% error, 320x faster
Concluding Remarks
Backup
Metropolis Framework
Cadence Berkeley Lab & UC Berkeley