EC6703 - Embedded Real Time Systems, Dr.D.Rukmanidevi
8/6/2019 1
EC6703 -Embedded Real Time Systems
Dr.D.Rukmanidevi
Professor
R.M.D. Engineering College
DESIGNING WITH COMPUTING PLATFORMS
Designing with microprocessors.
Development and debugging.
System-level performance analysis.
System architectures
Architectures and components:
software;
hardware.
Some software is very hardware-dependent.
Hardware platform architecture
Contains several elements:
CPU;
bus;
memory;
I/O devices: networking, sensors, actuators, etc.
How big/fast must each one be?
Software architecture
Functional description must be broken into pieces:
division among people;
conceptual organization;
performance;
testability;
maintenance.
Hardware and software architectures
Hardware and software are intimately related:
software doesn’t run without hardware;
how much hardware you need is determined by the software requirements:
speed;
memory.
Evaluation boards
Basic Platform chip and a variety of I/O Devices
Designed by CPU manufacturer or others.
Includes CPU, memory, some I/O devices.
May include prototyping section.
CPU manufacturer often gives out the evaluation board netlist, which can be used as a starting point for your custom board design.
BeagleBoard
The BeagleBoard is a low-cost board for embedded systems developed as an open-source platform. It includes an ARM Cortex-A8 processor, several built-in I/O devices, and many connectors (flash memory, video, and audio). It is primarily intended to support software development and to serve as a starting point for a product design.
Adding logic to a board
Programmable logic devices (PLDs) provide low/medium density logic.
Field-programmable gate arrays (FPGAs) provide more logic and multi-level logic.
Application-specific integrated circuits (ASICs) are manufactured for a single purpose.
The PC as a platform
Advantages:
cheap and easy to get;
rich and familiar software environment.
Disadvantages:
requires a lot of hardware resources;
not well-adapted to real-time;
more power-hungry;
more expensive.
Typical PC hardware platform
[Figure: block diagram of a typical PC hardware platform showing the CPU, CPU bus, memory, DMA controller, timers, interrupt controller, bus interfaces, a high-speed bus, a low-speed bus, and attached devices.]
The CPU provides basic computational facilities.
RAM is used for program storage.
ROM holds the boot program.
A DMA controller provides DMA capabilities.
Timers are used by the operating system for a variety of purposes.
A high-speed bus, connected to the CPU bus through a bridge, allows fast devices to communicate efficiently with the rest of the system.
A low-speed bus provides an inexpensive way to connect simpler devices and may be necessary for backward compatibility as well.
Typical busses
PCI: standard for high-speed interfacing
33 or 66 MHz.
PCI Express.
USB (Universal Serial Bus), Firewire (IEEE 1394): relatively low-cost serial interface with high speed.
Software elements
IBM PC uses BIOS (Basic I/O System) to implement low-level functions:
boot-up;
minimal device drivers.
BIOS has become a generic term for the lowest-level system software.
Example: StrongARM
StrongARM system includes:
CPU chip (3.686 MHz clock)
system control module (32.768 kHz clock):
• real-time clock;
• operating system timer;
• general-purpose I/O;
• interrupt controller;
• power manager controller;
• reset controller.
Host/target design
Use a host system to prepare software for the target system:
[Figure: host system connected to the target system by a serial line.]
Host-based tools
Cross compiler:
compiles code on host for target system.
Cross debugger:
displays target state, allows target system to be controlled.
Debugging Techniques
The serial port (or USB) can be used not only for development debugging but also for diagnosing problems in the field.
A breakpoint lets the user specify an address at which the program’s execution is to break; once execution stops, the user can examine and change the system state.
LEDs can be used to show error conditions, to show when the code enters certain routines, or to show idle-time activity.
The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. It allows you to stop execution, examine CPU state, and modify registers.
A logic analyzer is an array of low-grade oscilloscopes.
Logic analyzers
A logic analyzer is an array of low-grade oscilloscopes.
Logic analyzer architecture
[Figure: logic analyzer architecture. Blocks: unit under test (UUT), sample memory, microprocessor, controller, system clock, clock generator, state or timing mode select, vector address, display, keypad.]
Debugging Challenges
Logical errors in software can be hard to track down, but errors in real-time code can create problems that are even harder to diagnose.
Real-time programs are required to finish their work within a certain amount of time; if they run too long, they can create very unexpected behavior.
The exact results of missing real-time deadlines depend on the detailed characteristics of the I/O devices and the nature of the timing violation. This makes debugging real-time problems especially difficult.
Boundary scan
Simplifies testing of multiple chips on a board.
Registers on pins can be configured as a scan chain.
Used for debuggers, in-circuit emulators.
How to exercise code
Run on host system.
Run on target system.
Run in instruction-level simulator.
Run on cycle-accurate simulator.
Run in hardware/software co-simulation environment.
Debugging real-time code
Bugs in drivers can cause non-deterministic behavior in the foreground program.
Bugs may be timing-dependent.
System-level performance analysis
Performance depends on all the elements of the system:
CPU.
Cache.
Bus.
Main memory.
I/O device.
[Figure: system block diagram showing the CPU, cache, and memory.]
Bandwidth as performance
Bandwidth applies to several components:
Memory.
Bus.
CPU fetches.
Different parts of the system run at different clock rates.
Different components may have different widths (bus, memory).
Bandwidth and data transfers
Video frame: 320 x 240 x 3 = 230,400 bytes.
Transfer in 1/30 sec.
Transfer 1 byte/μsec, 0.23 sec per frame.
Too slow.
Increase bandwidth:
Increase bus width.
Increase bus clock rate.
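The frame-transfer arithmetic can be checked with a short C sketch. The function names and the rule of rounding up to whole bus cycles are illustrative assumptions, not part of the slides.

```c
#include <assert.h>

/* Bytes in one frame: width x height pixels, bytes_per_pixel bytes each. */
long frame_bytes(long width, long height, long bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Time in microseconds to move `bytes` over a link that carries
   `bytes_per_cycle` bytes every `cycle_us` microseconds. */
double transfer_time_us(long bytes, long bytes_per_cycle, double cycle_us)
{
    long cycles;
    assert(bytes >= 0 && bytes_per_cycle > 0);
    cycles = (bytes + bytes_per_cycle - 1) / bytes_per_cycle; /* round up */
    return cycles * cycle_us;
}
```

With one byte per microsecond, a 320 x 240 x 3 frame takes 230,400 μs, far more than the roughly 33,333 μs available per frame at 30 frames/sec. Widening the link to 4 bytes per cycle at the same clock gives 57,600 μs, which is still too slow, so the clock rate would also have to rise.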
Bus bandwidth
T: # bus cycles.
P: time/bus cycle.
Total time for transfer: $t = TP$.
D: data payload length.
$O_1 + O_2$ = overhead $O$.
W: bus width.
[Figure: one bus transfer of width W, made up of overhead $O_1$, data payload D, and overhead $O_2$.]
$T_{basic}(N) = (D + O)N / W$
Bus burst transfer bandwidth
T: # bus cycles.
P: time/bus cycle.
Total time for transfer: $t = TP$.
D: data payload length.
$O_1 + O_2$ = overhead $O$.
B: burst length (number of transfers per burst).
[Figure: one burst of B transfers, each of width W, sharing a single overhead O.]
$T_{burst}(N) = (BD + O)N / (BW)$
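The two bus formulas can be compared numerically. The sketch below is a direct transcription of T_basic and T_burst into C; the function names and the use of double for all quantities are assumptions made for illustration.

```c
#include <assert.h>

/* Bus cycles to move n data units one transfer at a time:
   T_basic(N) = (D + O) * N / W. */
double t_basic(double n, double d, double o, double w)
{
    assert(w > 0);
    return (d + o) * n / w;
}

/* Bus cycles when b transfers share one overhead:
   T_burst(N) = (B*D + O) * N / (B*W). */
double t_burst(double n, double d, double o, double w, double b)
{
    assert(w > 0 && b > 0);
    return (b * d + o) * n / (b * w);
}

/* Wall-clock time t = T * P for a cycle count and a cycle period. */
double transfer_time(double t_cycles, double p)
{
    return t_cycles * p;
}
```

For N = 1024, D = 1, O = 3, W = 4, the basic transfer needs 1024 cycles, while a burst of B = 4 needs 448 cycles because the overhead is paid once per burst rather than once per transfer. With B = 1 the burst formula reduces to the basic one.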
Memory aspect ratios
[Figure: the same memory capacity organized with different aspect ratios: 64 M x 1, 16 M x 4, 8 M x 8.]
Memory access times
Memory component access time comes from the chip data sheet.
Page modes allow faster access for successive transfers on the same page.
If data doesn’t fit naturally into physical words:
$A = [(E/w) \bmod W] + 1$
COMPONENTS FOR EMBEDDED PROGRAMS
Components that are commonly used in embedded software: the state machine, the circular buffer, and the queue.
State machines are well suited to reactive systems such as user interfaces; circular buffers and queues are useful in digital signal processing.
Software state machine
Software state machine
State machine keeps internal state as a variable, changes state based on inputs.
Uses:
control-dominated code;
reactive systems.
State machine example
[Figure: seat belt controller state machine with states idle, seated, belted, and buzzer. Transition labels (input/output) in the figure: seat/timer on; no seat/-; belt/-; no belt/timer on; no belt and no timer/-; belt/buzzer off; belt/buzzer on; no seat/buzzer off.]
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
switch (state) {
case IDLE: if (seat) { state = SEATED; timer_on = TRUE; }
break;
case SEATED: if (belt) state = BELTED;
else if (timer) state = BUZZER;
break;
…
}
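The fragment above elides the remaining cases. A minimal complete sketch follows; the IDLE and SEATED cases mirror the slide, while the BELTED and BUZZER transitions, the output flags, and all names (belt_ctrl, belt_step) are assumptions inferred from the state diagram rather than code from the slides.

```c
#include <assert.h>
#include <stdbool.h>

enum belt_state { IDLE, SEATED, BELTED, BUZZER };

/* Controller state plus the two outputs it drives. */
struct belt_ctrl {
    enum belt_state state;
    bool timer_on;
    bool buzzer_on;
};

/* Advance the machine by one step.
   seat:  a person is sitting in the seat
   belt:  the seat belt is fastened
   timer: the timer has expired */
void belt_step(struct belt_ctrl *m, bool seat, bool belt, bool timer)
{
    assert(m != 0);
    switch (m->state) {
    case IDLE:
        if (seat) { m->state = SEATED; m->timer_on = true; }
        break;
    case SEATED:
        if (belt) m->state = BELTED;
        else if (timer) { m->state = BUZZER; m->buzzer_on = true; }
        break;
    case BELTED:
        /* assumed: leaving the seat resets, unbuckling restarts the timer */
        if (!seat) m->state = IDLE;
        else if (!belt) { m->state = SEATED; m->timer_on = true; }
        break;
    case BUZZER:
        /* assumed: buckling up or leaving the seat silences the buzzer */
        if (belt) { m->state = BELTED; m->buzzer_on = false; }
        else if (!seat) { m->state = IDLE; m->buzzer_on = false; }
        break;
    }
}
```

Keeping the state and outputs in one struct lets the step function be called once per sampling period from a main loop or a timer interrupt, which is the usual way such a reactive controller is driven.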
circular buffer
Commonly used in signal processing:
new data constantly arrives;
each datum has a limited lifetime.
Use a circular buffer to hold the data stream.
[Figure: data stream d1 ... d7 with a window of current data that slides forward by one sample from time t to time t+1.]
Circular buffer
[Figure: a data stream x1 ... x6 observed at times t1, t2, t3, and a four-entry circular buffer that first holds x1 to x4 and is then overwritten in turn by x5, x6, x7.]
Circular buffers
Indexes locate currently used data, current input data:
[Figure: at time t1 the buffer holds d1, d2, d3, d4, with a use index and an input index marking positions in the buffer; at time t1+1 the buffer holds d5, d2, d3, d4, with d5 written over d1.]
Circular buffer implementation: FIR filter
int circ_buffer[N], circ_buffer_head = 0;
int c[N]; /* coefficients */
…
int f, ibuf, ic;
for (f=0, ibuf=circ_buffer_head, ic=0;
     ic<N; ibuf=(ibuf==N-1?0:ibuf+1), ic++)
  f = f + c[ic]*circ_buffer[ibuf];
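The loop above only computes one output; it does not show how a new sample enters the buffer. The following is a minimal sketch of both steps. The names (struct fir, fir_put, fir_output), the filter length NTAPS, and the convention that coefficient 0 weights the newest sample are assumptions for illustration.

```c
#include <assert.h>

#define NTAPS 4  /* hypothetical filter length */

struct fir {
    int coeff[NTAPS];   /* coeff[0] weights the newest sample */
    int sample[NTAPS];  /* circular buffer of recent inputs */
    int head;           /* index of the newest sample */
};

/* Overwrite the oldest sample with x and make it the new head. */
void fir_put(struct fir *f, int x)
{
    assert(f != 0);
    f->head = (f->head == NTAPS - 1) ? 0 : f->head + 1;
    f->sample[f->head] = x;
}

/* Sum coeff[k] * x[n-k], walking backwards from the newest sample. */
int fir_output(const struct fir *f)
{
    int acc = 0, i, k;
    assert(f != 0);
    i = f->head;
    for (k = 0; k < NTAPS; k++) {
        acc += f->coeff[k] * f->sample[i];
        i = (i == 0) ? NTAPS - 1 : i - 1;   /* wrap around */
    }
    return acc;
}
```

Because only the head index moves, adding a sample costs one store and one index update no matter how long the filter is; no data is shifted.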
Queues
Elastic buffer: holds data that arrives irregularly.
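A queue of this kind is often built on a fixed array with a head index and a count. The sketch below is one possible form; the capacity QSIZE, the names, and the choice to reject new data when full are all assumptions, since the slides give no implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define QSIZE 8  /* hypothetical capacity */

struct queue {
    int data[QSIZE];
    int head;   /* index of the oldest element */
    int count;  /* number of elements currently stored */
};

void queue_init(struct queue *q)
{
    assert(q != 0);
    q->head = 0;
    q->count = 0;
}

/* Add x at the tail; returns false (and drops x) if the queue is full. */
bool enqueue(struct queue *q, int x)
{
    assert(q != 0);
    if (q->count == QSIZE) return false;
    q->data[(q->head + q->count) % QSIZE] = x;
    q->count++;
    return true;
}

/* Remove the oldest element into *x; returns false if the queue is empty. */
bool dequeue(struct queue *q, int *x)
{
    assert(q != 0 && x != 0);
    if (q->count == 0) return false;
    *x = q->data[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    return true;
}
```

Unlike the circular buffer, which always holds a fixed window of samples, the number of items in the queue grows and shrinks as the producer and consumer run at different rates, so both the full and empty cases have to be reported to the caller.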
Models of programs
Source code is not a good representation for programs:
clumsy;
leaves much information implicit.
Compilers derive intermediate representations to manipulate and optimize the program.
Data flow graph
DFG: data flow graph.
Does not represent control.
Models basic block: code with a single entry and a single exit.
Describes the minimal ordering requirements on operations.
Single assignment form
Original basic block:
x = a + b;
y = c - d;
z = x * y;
y = b + d;
Single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
Data flow graph
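Single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
[Figure: DFG for this block. Inputs a and b feed a + node producing x; c and d feed a - node producing y; x and y feed a * node producing z; b and d feed a + node producing y1.]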
DFGs and partial orders
Partial order:
a+b, c-d; b+d, x*y
Can do pairs of operations in any order.
[Figure: the same DFG, with inputs a, b, c, d and results x, y, z, y1.]
Control-data flow graph
CDFG: represents control and data.
Uses data flow graphs as components.
Two types of nodes:
decision;
data flow.
Data flow node
Encapsulates a data flow graph:
Write operations in basic block form for simplicity.
x = a + b;
y = c + d;
Control
[Figure: two equivalent forms of a decision node: a two-way branch on cond with T and F outputs, and a multiway branch on value with outputs v1, v2, v3, v4.]
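for loop
for loop:
for (i=0; i<N; i++)
  loop_body();
equivalent:
i=0;
while (i<N) {
  loop_body(); i++; }
[Figure: CDFG of the loop. The node i=0 leads to the decision i<N; the T branch executes loop_body() and returns to the decision; the F branch exits the loop.]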
Basic Compilation Techniques
Compilation flow.
Basic statement translation.
Basic optimizations.
Interpreters and just-in-time compilers.
Compilation
Compilation strategy (Wirth):
compilation = translation + optimization
Compiler determines quality of code:
use of CPU resources;
memory access scheduling;
code size.
Basic compilation phases
High-level language code
→ parsing, symbol table generation, semantic analysis
→ machine-independent optimizations
→ instruction-level optimizations and code generation
→ assembly
Parsing breaks the code into statements and expressions.
A symbol table is generated, which includes all the named objects in the program.
Statement translation and optimization
Source code is translated into intermediate form such as CDFG.
CDFG is transformed/optimized.
CDFG is translated into instructions with optimization decisions.
Instructions are further optimized.
Arithmetic expressions
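expression: a*b + 5*(c-d)
[Figure: DFG of the expression. a and b feed a * node; c and d feed a - node; 5 and the - result feed a second * node; the two * results feed the + node. The four operation nodes are numbered 1 to 4.]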
Arithmetic expressions, cont’d.
code for a*b + 5*(c-d):
ADR r4,a       ; get address of a
LDR r1,[r4]    ; load a
ADR r4,b       ; get address of b
LDR r2,[r4]    ; load b
MUL r3,r1,r2   ; a*b
ADR r4,c
LDR r1,[r4]    ; load c
ADR r4,d
LDR r5,[r4]    ; load d
SUB r6,r1,r5   ; c-d
MOV r2,#5
MUL r7,r6,r2   ; 5*(c-d)
ADD r8,r7,r3   ; a*b + 5*(c-d)
Control code generation
if (a+b > 0)
x = 5;
else
x = 7;
[Figure: CDFG with a decision node a+b>0 whose two branches lead to x=5 and x=7.]
Control code generation, cont’d.
ADR r5,a
LDR r1,[r5]    ; load a
ADR r5,b
LDR r2,[r5]    ; load b
ADD r3,r1,r2   ; a+b
CMP r3,#0      ; set flags for the test
BLE label3     ; a+b <= 0: take the else branch
MOV r3,#5      ; true branch: x = 5
ADR r5,x
STR r3,[r5]
B stmtent
label3 MOV r3,#7   ; false branch: x = 7
ADR r5,x
STR r3,[r5]
stmtent ...
Procedure linkage
Need code to:
call and return;
pass parameters and results.
Parameters and returns are passed on stack.
Procedures with few parameters may use registers.
Procedure stacks
proc1(int a) {
  proc2(5);
}
[Figure: stack holding the frame for proc1 and, in the direction of growth, the frame for proc2. FP (frame pointer) marks the end of the last frame; SP (stack pointer) marks the end of the current frame. The parameter 5 is accessed relative to SP.]
ARM procedure linkage
APCS (ARM Procedure Call Standard):
r0-r3 pass parameters into procedure. Extra parameters are put on stack frame.
r0 holds return value.
r4-r7 hold register variables.
r11 is frame pointer, r13 is stack pointer.
r10 holds limiting address on stack size to check for stack overflows.
Data structures
Different types of data structures use different data layouts.
Some offsets into data structure can be computed at compile time, others must be computed at run time.
One-dimensional arrays
C array name points to 0th element:
[Figure: a points to a[0], which is followed in memory by a[1] and a[2].]
a[i] = *(aptr + i), where aptr points to a[0].
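The equivalence between indexing and pointer arithmetic can be shown in a few lines of C. The function names are hypothetical and exist only to put the two access styles side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Read element i by indexing. */
int get_indexed(const int *a, size_t i)
{
    assert(a != 0);
    return a[i];
}

/* Read element i by explicit pointer arithmetic; the compiler
   scales i by sizeof(int) to get the byte offset. */
int get_pointer(const int *aptr, size_t i)
{
    assert(aptr != 0);
    return *(aptr + i);
}
```

Both functions compile to the same address computation: base address plus i times the element size. When i is a constant the whole offset is known at compile time; otherwise the multiply (or shift) happens at run time.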
Two-dimensional arrays
Row-major layout (the layout C uses), for an array with N rows and M columns:
a[0,0]
a[0,1]
...
a[1,0]
a[1,1]
...
a[i,j] is stored at offset i*M+j from a[0,0].
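The offset formula i*M+j can be exercised on a flat array. In this sketch MCOLS plays the role of M; the names and the sizes are assumptions chosen for illustration.

```c
#include <assert.h>

#define NROWS 2  /* N: hypothetical number of rows */
#define MCOLS 3  /* M: hypothetical number of columns */

/* Offset of element (i, j) in a row-major array with m columns. */
int row_major_index(int i, int j, int m)
{
    assert(i >= 0 && j >= 0 && j < m);
    return i * m + j;
}

/* Read element (i, j) from a flat array of NROWS * MCOLS ints. */
int get2d(const int *flat, int i, int j)
{
    assert(flat != 0 && i < NROWS);
    return flat[row_major_index(i, j, MCOLS)];
}
```

Only the number of columns appears in the formula, which is why C requires every dimension except the first to be known when a multidimensional array is passed to a function.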
Structures
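Fields within structures are static offsets:
struct mystruct {
  int field1;   /* offset 0, 4 bytes */
  char field2;  /* offset 4 */
};
struct mystruct a, *aptr = &a;
[Figure: aptr points to field1, which occupies 4 bytes; field2 follows at byte offset 4, that is, at *((char *)aptr + 4).]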
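The static offsets can be inspected with the standard offsetof macro from <stddef.h>. The helper read_field2 is a hypothetical name; it spells out by hand the base-plus-offset access that a compiler generates for aptr->field2.

```c
#include <assert.h>
#include <stddef.h>

struct mystruct {
    int field1;
    char field2;
};

/* Read field2 the way generated code does: base address plus a
   constant byte offset known at compile time. */
char read_field2(const struct mystruct *aptr)
{
    assert(aptr != 0);
    return *((const char *)aptr + offsetof(struct mystruct, field2));
}
```

The offset of field2 equals sizeof(int), which is 4 on the 32-bit targets the slide assumes but may differ on other platforms, so portable code should use offsetof rather than a literal 4.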
Expression simplification
Constant folding:
8+1 = 9
Algebraic:
a*b + a*c = a*(b+c)
Strength reduction:
a*2 = a<<1
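Each of these rewrites must leave the result unchanged. The sketch below spot-checks that for unsigned operands; the function names are hypothetical, and unsigned arithmetic is an assumption made here because left-shifting a negative signed value is undefined in C.

```c
#include <assert.h>

/* Constant folding: the compiler evaluates 8+1 once, at compile time. */
unsigned folded(void)
{
    return 8 + 1;
}

/* Spot-check that the algebraic rewrite and the strength reduction
   give the same value as the original expressions. */
void check_rewrites(unsigned a, unsigned b, unsigned c)
{
    assert(a * b + a * c == a * (b + c));  /* two multiplies become one */
    assert(a * 2 == a << 1);               /* multiply becomes a shift */
}
```

Unsigned arithmetic wraps modulo a power of two, so both identities hold even when the intermediate products overflow.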
Dead code elimination
Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
Can be eliminated by analysis of control flow, constant folding.
[Figure: control flow graph in which the test on DEBUG (constant 0) makes the branch to dbg(p1) unreachable.]
Code that will never be executed can be safely removed from the program.
Procedure inlining
The C++ programming language provides an inline construct that tells the compiler to generate inline code for a function.
int foo(int a, int b, int c) { return a + b - c; }
z = foo(w,x,y);
becomes
z = w + x - y;
An inlined procedure does not have a separate procedure body and procedure linkage; it is generated in expanded form whenever possible.
Inlining eliminates the procedure linkage instructions. However, when a cache is present, having multiple copies of the function body may actually slow down the fetches of these instructions.
Inlining also increases code size, and memory may be precious.
Loop transformations
Goals:
reduce loop overhead;
increase opportunities for pipelining;
improve memory system performance.
Loop unrolling
Unrolling replicates the code inside a loop body a number of times. It reduces loop overhead and enables some other optimizations.
Original loop:
for (i=0; i<4; i++)
  a[i] = b[i] * c[i];
Unrolled by a factor of 2:
for (i=0; i<2; i++) {
  a[i*2] = b[i*2] * c[i*2];
  a[i*2+1] = b[i*2+1] * c[i*2+1];
}
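The slide's example works because the trip count 4 is a multiple of the unrolling factor. A sketch for an arbitrary length n follows; the function names and the clean-up step for odd n are assumptions added for illustration.

```c
#include <assert.h>

/* Reference version: one multiply per iteration. */
void mul_rolled(int *a, const int *b, const int *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by 2: half as many loop tests and increments.
   The final `if` handles an odd n. */
void mul_unrolled2(int *a, const int *b, const int *c, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i + 1 < n; i += 2) {
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
    }
    if (i < n)
        a[i] = b[i] * c[i];
}
```

The unrolled body also gives the compiler two independent multiplies to schedule together, which is one of the other optimizations that unrolling enables, at the cost of larger code.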
Loop fusion and distribution
Fusion combines two loops into one:
for (i=0; i<N; i++) a[i] = b[i] * 5;
for (j=0; j<N; j++) w[j] = c[j] * d[j];
becomes
for (i=0; i<N; i++) {
  a[i] = b[i] * 5; w[i] = c[i] * d[i];
}
Distribution breaks one loop into two.
Changes optimizations within loop body.
Loop tiling
Breaks one loop into a nest of loops.
Changes order of accesses within array.
Changes cache behavior.
Loop tiling example
Original loop nest:
for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    c[i] = a[i][j]*b[i];
Tiled with 2 x 2 tiles:
for (i=0; i<N; i+=2)
  for (j=0; j<N; j+=2)
    for (ii=i; ii<min(i+2,N); ii++)
      for (jj=j; jj<min(j+2,N); jj++)
        c[ii] = a[ii][jj]*b[ii];
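Tiling must visit exactly the same elements as the original nest, only in a different order. The sketch below applies the same tiling pattern to a matrix-vector product, a different kernel from the slide's, chosen here because its result makes the equivalence easy to test. DIM, TILE, and the function names are assumptions.

```c
#include <assert.h>

#define DIM 5   /* hypothetical size, deliberately not a multiple of TILE */
#define TILE 2  /* hypothetical tile edge */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Untiled matrix-vector product: y = a * x. */
void matvec(int y[DIM], int a[DIM][DIM], const int x[DIM])
{
    int i, j;
    for (i = 0; i < DIM; i++) {
        y[i] = 0;
        for (j = 0; j < DIM; j++)
            y[i] += a[i][j] * x[j];
    }
}

/* Same product, visiting a in TILE x TILE blocks. */
void matvec_tiled(int y[DIM], int a[DIM][DIM], const int x[DIM])
{
    int i, j, ii, jj;
    assert(TILE > 0);  /* a zero tile size would never advance */
    for (i = 0; i < DIM; i++)
        y[i] = 0;
    for (i = 0; i < DIM; i += TILE)
        for (j = 0; j < DIM; j += TILE)
            for (ii = i; ii < min_int(i + TILE, DIM); ii++)
                for (jj = j; jj < min_int(j + TILE, DIM); jj++)
                    y[ii] += a[ii][jj] * x[jj];
}
```

The inner loops start at the tile origin (ii = i, jj = j) and stop at the smaller of the tile edge and the array bound, so partial tiles at the border are handled without reading out of range. Within one tile the same few entries of x are reused, which is the change in cache behavior that tiling aims for.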
Register allocation
Goals:
choose register to hold each variable;
determine lifespan of variable in the register.
Basic case: within basic block.
Register lifetime graph
w = a + b; (t=1)
x = c + w; (t=2)
y = c + d; (t=3)
[Figure: register lifetime graph plotting each variable a, b, c, d, w, x, y against time steps 1, 2, 3 to show when it must be held in a register.]
Instruction scheduling
Non-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast.
In pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands.
Software pipelining
Schedules instructions across loop iterations.
Reduces instruction latency in iteration i by inserting instructions from iteration i+1.
Instruction selection
May be several ways to implement an operation or sequence of operations.
Represent operations as graphs, match possible instruction sequences onto graph.
[Figure: an expression graph made of a * node feeding a + node, and the instruction templates that can cover it: MUL (*), ADD (+), and MADD (* followed by +).]
Using your compiler
Understand various optimization levels (-O1, -O2, etc.)
Look at mixed compiler/assembler output.
Modifying compiler output requires care:
correctness;
loss of hand-tweaked code.
Interpreters and Just-In-Time (JIT) compilers
Interpreter: translates and executes program statements on-the-fly.
JIT compiler: sits between an interpreter and a stand-alone compiler; it compiles small sections of code into instructions during program execution.
Eliminates some translation overhead.
Often requires more memory.