Embedded Systems Design: A Unified d /S f d i Hardware/Software ...
Transcript of Embedded Systems Design: A Unified d /S f d i Hardware/Software ...
Embedded Systems Design: A Unified d /S f d iHardware/Software Introduction
Chapter 3 General Purpose Processors:Chapter 3 General-Purpose Processors: Software
1
IntroductionIntroduction
• General Purpose Processor• General-Purpose Processor– Processor designed for a variety of computation tasks
Low unit cost in part because manufacturer spreads NRE– Low unit cost, in part because manufacturer spreads NRE over large numbers of units
• Motorola sold half a billion 68HC05 microcontrollers in 1996 alone– Carefully designed since higher NRE is acceptable
• Can yield good performance, size and powerh i k / hi h– Low NRE cost, short time-to-market/prototype, high
flexibility• User just writes software; no processor designUser just writes software; no processor design
– a.k.a. “microprocessor” – “micro” used when they were implemented on one or a few chips rather than entire rooms
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
2
Basic ArchitectureBasic Architecture
C t l it d• Control unit and datapath– Note similarity to
Processor
Control unit Datapath
ALUNote similarity to single-purpose processor
U
R i t
Controller Control/Status
• Key differences– Datapath is general
Control unit doesn’t
Registers
– Control unit doesn t store the algorithm –the algorithm is “ d” i h
IRPC
I/O“programmed” into the memory Memory
I/O
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
3
Datapath OperationsDatapath Operations
• Load• Load– Read memory location
into register
Processor
Control unit Datapath
ALU
• ALU operation– Input certain registers
through ALU store
U
R i t
Controller Control/Status
+1
through ALU, store back in register
• Store
Registers
10 11– Write register to
memory locationIRPC
I/O
Memory
I/O
10...
...11
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
4
Control UnitControl Unit
• Control unit: configures the datapath g poperations
– Sequence of desired operations (“instructions”) stored in memory –“program”
Processor
Control unit Datapath
ALUprogram • Instruction cycle – broken into
several sub-operations, each one clock cycle, e.g.:
U
R i t
Controller Control/Status
clock cycle, e.g.:– Fetch: Get next instruction into IR– Decode: Determine what the
instruction means
Registers
– Fetch operands: Move data from memory to datapath register
– Execute: Move data through the ALU
IRPC
I/O
R0 R1
ALU– Store results: Write data from
register to memoryMemory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
5
Control Unit Sub-OperationsControl Unit Sub Operations
F t h• Fetch– Get next instruction
i t IR
Processor
Control unit Datapath
ALUinto IR– PC: program
counter always
U
R i t
Controller Control/Status
counter, always points to next instruction
Registers
– IR: holds the fetched instruction
IRPC
I/O
R0 R1100 load R0, M[500]
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
6
Control Unit Sub-OperationsControl Unit Sub Operations
D d• Decode– Determine what the
i t ti
Processor
Control unit Datapath
ALUinstruction means U
R i t
Controller Control/Status
Registers
IRPC
I/O
R0 R1100 load R0, M[500]
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
7
Control Unit Sub-OperationsControl Unit Sub Operations
F t h d• Fetch operands– Move data from
t d t th
Processor
Control unit Datapath
ALUmemory to datapath register
U
R i t
Controller Control/Status
Registers
10IRPC
I/O
R0 R1100 load R0, M[500]
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
8
Control Unit Sub-OperationsControl Unit Sub Operations
E t• Execute– Move data through
th ALU
Processor
Control unit Datapath
ALUthe ALU– This particular
instruction does
U
R i t
Controller Control/Status
instruction does nothing during this sub-operation
Registers
10pIRPC
I/O
R0 R1100 load R0, M[500]
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
9
Control Unit Sub-OperationsControl Unit Sub Operations
St lt• Store results– Write data from
i t t
Processor
Control unit Datapath
ALUregister to memory– This particular
instruction does
U
R i t
Controller Control/Status
instruction does nothing during this sub-operation
Registers
10pIRPC
I/O
R0 R1100 load R0, M[500]
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
10
Instruction CyclesInstruction Cycles
Processor
Control unit Datapath
ALU
PC=100Fetch ops
Exec. Store results
Fetch DecodeU
R i t
Controller Control/Status
clk
Registers
10IRPC
I/O
R0 R1load R0, M[500]100
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
11
Instruction CyclesInstruction Cycles
Processor
Control unit Datapath
ALU
PC=100Fetch Decode Fetch
opsExec. Store
results U
R i t
Controller Control/Status
clk
PC=101F h
+1
S Registers
10
Fetch Fetch ops
11
Exec. Store results
clk
Decode
IRPC
I/O
R0 R1inc R1, R0101
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
12
Instruction CyclesInstruction Cycles
Processor
Control unit Datapath
ALU
PC=100Fetch Decode Fetch
opsExec. Store
results U
R i t
Controller Control/Status
clk
PC=101F h S Registers
1110
Fetch Decode Fetch ops
Exec. Store results
clk
IRPC
I/O
R0 R1
PC=102store M[501], R1
Fetch Fetch Exec. Store l
Decode
102
Memory
I/O
10...
...load R0, M[500] 500
501
100inc R1, R0101
store M[501], R1102
ops
11
resultsclk
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
13
Architectural ConsiderationsArchitectural Considerations
N bit• N-bit processor– N-bit ALU, registers,
b d t
Processor
Control unit Datapath
ALUbuses, memory data interfaceEmbedded: 8 bit 16
U
R i t
Controller Control/Status
– Embedded: 8-bit, 16-bit, 32-bit common
– Desktop/servers: 32-
Registers
Desktop/servers: 32bit, even 64
• PC size determines
IRPC
I/OPC size determines address space Memory
I/O
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
14
Architectural ConsiderationsArchitectural Considerations
Cl k f• Clock frequency– Inverse of clock
i d
Processor
Control unit Datapath
ALUperiod– Must be longer than
longest register to
U
R i t
Controller Control/Status
longest register to register delay in entire processor
Registers
p– Memory access is
often the longest
IRPC
I/O
Memory
I/O
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
15
Pipelining: Increasing Instruction h hThroughput
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8WashNon-pipelined Pipelined
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8Dry
TimeTimenon-pipelined dish cleaning pipelined dish cleaning
Fetch-instr.
Decode
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Fetch ops.
Execute
Store res
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Pipelined
Instruction 1Store res. 1 2 3 4 5 6 7 8
Timepipelined instruction execution
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
16
Superscalar and VLIW ArchitecturesSuperscalar and VLIW Architectures
P f b i d b• Performance can be improved by:– Faster clock (but there’s a limit)– Pipelining: slice up instruction into stages, overlap stages– Multiple ALUs to support more than one instruction stream
• Superscalar– Scalar: non-vector operations– Fetches instructions in batches, executes as many as possible e c es s uc o s b c es, e ecu es s y s poss b e
• May require extensive hardware to detect independent instructions– VLIW: each word in memory has multiple independent instructions
R li th il t d t t d h d l i t ti• Relies on the compiler to detect and schedule instructions• Currently growing in popularity
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
17
Two Memory ArchitecturesTwo Memory Architectures
Processor Processor• Princeton
– Fewer memory wires
• HarvardProgram memory
Data memory Memory(program and data)
• Harvard– Simultaneous
program and data
Harvard Princeton
memory access
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
18
Cache MemoryCache Memory
M b l• Memory access may be slow• Cache is small but fast
P
Fast/expensive technology, usually on the same chip
memory close to processor– Holds copy of part of memory
Processor
– Hits and misses Cache
Memory
Slower/cheaper technology, usually on a different chip
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
19
Programmer’s ViewProgrammer s View
P d ’t d d t il d d t di f hit t• Programmer doesn’t need detailed understanding of architecture– Instead, needs to know what instructions can be executed
• Two levels of instructions:• Two levels of instructions:– Assembly level– Structured languages (C, C++, Java, etc.)g g ( , , , )
• Most development today done using structured languages– But, some assembly level programming may still be necessary– Drivers: portion of program that communicates with and/or controls
(drives) another device• Often have detailed timing considerations extensive bit manipulation• Often have detailed timing considerations, extensive bit manipulation• Assembly level may be best for these
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
20
Assembly-Level InstructionsAssembly Level Instructions
opcode operand1 operand2Instruction 1 opcode operand1 operand2
opcode operand1 operand2
opcode operand1 operand2
Instruction 1
Instruction 2
Instruction 3 opcode operand1 operand2
opcode operand1 operand2
...
Instruction 3
Instruction 4
I t ti S t• Instruction Set– Defines the legal set of instructions for that processor
• Data transfer: memory/register register/register I/O etc• Data transfer: memory/register, register/register, I/O, etc.• Arithmetic/logical: move register through ALU and back• Branches: determine next PC value when not just PC+1
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
21
A Simple (Trivial) Instruction SetA Simple (Trivial) Instruction Set
MOV Rn, direct 0000 Rn direct Rn = M(direct)
Assembly instruct. First byte Second byte Operation
MOV @Rn, Rm 0010 Rn
MOV direct, Rn 0001 Rn direct M(direct) = Rn
Rm M(Rn) = Rm
ADD Rn, Rm 0100 RmRn Rn = Rn + Rm
MOV Rn, #immed. 0011 Rn immediate Rn = immediate
SUB Rn, Rm 0101 Rm Rn = Rn - Rm
JZ Rn, relative 0110 Rn relative PC = PC+ relative( l if R i 0)
Rn
opcode operands
(only if Rn is 0)
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
22
Addressing ModesAddressing Modes
Addressing Register-file Memory
Immediate Data
Operand fieldg
modeg
contentsy
contents
Data
Immediate
Register-direct
Data
Register address
Registerindirect
Direct
Register address
Memory address
Memory address Data
DataDirect
Indirect
Memory address
Memory address
Data
Memory address
Data
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
23
Programmer ConsiderationsProgrammer Considerations
P d d t• Program and data memory space– Embedded processors often very limited
64 b 2 6 b f A ( d bl )• e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• Registers: How many are there?– Only a direct concern for assembly-level programmers
• I/O– How communicate with external signals?
• Interruptsp
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
24
Development EnvironmentDevelopment Environment
D l t• Development processor– The processor on which we write and debug our programs
ll C• Usually a PC
• Target processor– The processor that the program will run on in our embedded
system Oft diff t f th d l t• Often different from the development processor
Development processor Target processor
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
25
Software Development ProcessSoftware Development Process
C ilC File C File Asm.
File
• Compilers– Cross compiler
Compiler
File
Assembler
• Runs on one processor, but generates code for
Linker
Binary File
Binary File
Binary File
another
• AssemblersLinker
Exec. File
Library Debugger
Profiler
• Linkers• Debuggers
Implementation Phase Verification Phase
Debuggers• Profilers
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
26
Running a ProgramRunning a Program
If d l t i diff t th t t h• If development processor is different than target, how can we run our compiled code? Two options:– Download to target processor– Simulate
• Simulation– One method: Hardware description language
• But slow, not always available
– Another method: Instruction set simulator (ISS)• Runs on development processor, but executes instructions of target
processor
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
27
Instruction Set Simulator For A Simple Processor
#include <stdio h> }#include <stdio.h>typedef struct {
unsigned char first_byte, second_byte;} instruction;
instruction program[1024]; //instruction memory//
}}return 0;
}
int main(int argc, char *argv[]) {unsigned char memory[256]; //data memory
void run_program(int num_bytes) {
int pc = -1;unsigned char reg[16], fb, sb;
FILE* ifs;
If( argc != 2 ||(ifs = fopen(argv[1], “rb”) == NULL ) {
return –1;g g[ ], , ;
while( ++pc < (num_bytes / 2) ) {fb = program[pc].first_byte;sb = program[pc].second_byte;switch( fb >> 4 ) {
case 0: reg[fb & 0x0f] = memory[sb]; break;
}
if (run_program(fread(program,sizeof(program)))==0){
print_memory_contents();return(0);case 0: reg[fb & 0x0f] = memory[sb]; break;
case 1: memory[sb] = reg[fb & 0x0f]; break;case 2: memory[reg[fb & 0x0f]] =
reg[sb >> 4]; break;case 3: reg[fb & 0x0f] = sb; break;case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
return(0);}else return(-1);
}
case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;case 6: pc += sb; break;default: return –1;
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
28
Testing and DebuggingTesting and Debugging
(a) (b) • ISSImplementation
PhaseImplementation
Phase
( ) (b) • ISS – Gives us control over time –
set breakpoints, look at register values, set values,
Verification Phase
Debugger
Development processor
register values, set values, step-by-step execution, ...
– But, doesn’t interact with real environment
Emulator
Debugger/ ISS • Download to board
– Use device programmer– Runs in real environment but
External tools
Runs in real environment, but not controllable
• Compromise: emulatorRuns in real environment at
VerificationPhase
Programmer– Runs in real environment, at
speed or near– Supports some controllability
from the PC
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
29
o t e C
Application-Specific Instruction-Set (AS )Processors (ASIPs)
• General purpose processors• General-purpose processors– Sometimes too general to be effective in demanding
applicationapplication• e.g., video processing – requires huge video buffers and operations
on large arrays of data, inefficient on a GPPB i l h hi h NRE– But single-purpose processor has high NRE, not programmable
• ASIPs targeted to a particular domain• ASIPs – targeted to a particular domain– Contain architectural features specific to that domain
• e g embedded control digital signal processing video processinge.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
– Still programmable
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
30
A Common ASIP: MicrocontrollerA Common ASIP: Microcontroller
• For embedded control applications• For embedded control applications– Reading sensors, setting actuators– Mostly dealing with events (bits): data is present, but not in huge
amounts– e.g., VCR, disk drive, digital camera (assuming SPP for image
compression), washing machine, microwave ovenp ), g ,• Microcontroller features
– On-chip peripheralsi l di i l i l i i• Timers, analog-digital converters, serial communication, etc.
• Tightly integrated for programmer, typically part of register space– On-chip program and data memory– Direct programmer access to many of the chip’s pins– Specialized instructions for bit-manipulation and other low-level
operationsEmbedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis31
operations
Another Common ASIP: Digital Signal ( S )Processors (DSP)
F i l i li ti• For signal processing applications– Large amounts of digitized data, often streaming– Data transformations must be applied fast– e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features– Several instruction execution units– Multiple-accumulate single-cycle instruction, other instrs.– Efficient vector operations – e.g., add two arrays
• Vector ALUs, loop buffers, etc.
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
32
Trend: Even More Customized ASIPsTrend: Even More Customized ASIPs
• In the past microprocessors were acquired as chips• In the past, microprocessors were acquired as chips• Today, we increasingly acquire a processor as Intellectual
Property (IP)p y ( )– e.g., synthesizable VHDL model
• Opportunity to add a custom datapath hardware and a few i i d l f i icustom instructions, or delete a few instructions
– Can have significant performance, power and size impacts– Problem: need compiler/debugger for customized ASIPProblem: need compiler/debugger for customized ASIP
• Remember, most development uses structured languages• One solution: automatic compiler/debugger generation
– e g www tensillica com– e.g., www.tensillica.com• Another solution: retargettable compilers
– e.g., www.improvsys.com (customized VLIW architectures)
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
33
Designing a General Purpose ProcessorDesigning a General Purpose Processor
N t thi b dd d Declarations:FSMD
• Not something an embedded system designer normally would do
Declarations:bit PC[16], IR[16];bit M[64k][16], RF[16][16];
Reset
Fetch
D d
IR=M[PC];PC=PC+1
PC=0;
fwould do– But instructive to see how
simply we can build one top d
Decode
Mov1 RF[rn] = M[dir]op = 0000
M[di ] RF[ ]
to Fetch
from states below
down– Remember that real processors
aren’t usually built this way
Mov2
Mov30010
0001M[dir] = RF[rn]
M[rn] = RF[rm]
to Fetch
to Fetchy y• Much more optimized, much
more bottom-up design
Mov4
Add0100
0011RF[rn]= imm
RF[rn] =RF[rn]+RF[rm]
to Fetch
to Fetch
Aliases:op IR[15..12]rn IR[11..8]rm IR[7..4]
dir IR[7..0]imm IR[7..0]rel IR[7..0]
Sub
Jz0110
0101RF[rn] = RF[rn]-RF[rm]
PC=(RF[rn]=0) ?rel :PC
to Fetch
to Fetch
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
34
Architecture of a Simple MicroprocessorArchitecture of a Simple Microprocessor
• Storage devices for each• Storage devices for each declared variable
– register file holds each of the variables
Datapath
ControllerRFwa RFw
2x1 muxRFsTo all input control signals
Control unit 1 0
variables• Functional units to carry out the
FSMD operationsOne ALU carries out every
Controller(Next-state and
controllogic; state register)
RF (16)RFwe
RFr1a
RFr1e
From all output control signals
16– One ALU carries out every required operation
• Connections added among the components’ ports
IRPC
RFr2a
RFr2eRFr1 RFr2
ALUALUs
ALUz
PCld
PCinc
PCclr
16Irld
components ports corresponding to the operations required by the FSM Uniq e identifiers created for
ALUz
3x1 muxMsMweMre
2 01
• Unique identifiers created for every control signal MemoryA D
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
35
A Simple MicroprocessorA Simple Microprocessor
MS=10;Irld=1;Mre=1;PCinc=1;
PCclr=1;Reset
Fetch
Decode
IR=M[PC];PC=PC+1
PC=0;
from states b l
Datapath
2x1 muxRFsTo all input contro
Control unit 1 0
RFwa=rn; RFwe=1; RFs=01;Ms=01; Mre=1;
RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
PCinc 1;
Mov1 RF[rn] = M[dir]
Mov20001
op = 0000
M[dir] = RF[rn]
to Fetch
to Fetch
below
Controller(Next-state and
controllogic; state
register)
RF (16)
RFwa
RFwe
RFr1a
RFwl signals
From all output control
RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
RFwa=rn; RFwe=1; RFs=10;
Mov3
Mov40011
0010M[rn] = RF[rm]
RF[rn]= imm
to Fetch
to FetchIRPC
g )RFr1e
RFr2a
RFr2eRFr1 RFr2
ALUs
PCld
PCinc
PC l
signals
16Irld
RFwa=rn; RFwe=1; RFs=00;RFr1a=rn; RFr1e=1;RFr2a=rm; RFr2e=1; ALUs=00RFwa=rn; RFwe=1; RFs=00;RFr1a=rn; RFr1e=1;RFr2a=rm; RFr2e=1; ALUs=01PCld= ALUz;
Add
Sub
Jz
0101
0100RF[rn] =RF[rn]+RF[rm]
RF[rn] = RF[rn]-RF[rm]
PC=(RF[rn]=0) ?rel :PC
to Fetch
to Fetch
ALUALUs
ALUz
PCclr
3x1 muxMsMweMre
2 01
FSM operations that replace the FSMD operations after a datapath is created
;RFrla=rn;RFrle=1;
Jz0110
PC (RF[rn] 0) ?rel :PCto Fetch
FSMDMemoryA D
You just built a simple microprocessor!
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
36