01 dsp intro_1
Uploaded by ghulam-raza
An introduction to DSPs
Examples of DSP applications
Why a DSP?
Characteristics of a DSP
Architectures
DSP example: mobile phone
DSP example: mobile phone with video camera
DSP: applications
Why a DSP?
• The answer is simple: we want an architecture optimized for digital signal processing
• Some versions are further optimized for specific applications
- e.g. very low power consumption for mobile phones
What is the difference between a DSP and a general-purpose processor? (1/4)
Memory architecture and bus
• The first processors (in the 1940s) had a Harvard architecture: separate memories for program and data
• But it is complex -> it was soon replaced by the von Neumann architecture: no real distinction between program and data, which share one memory (an instruction has two fields: operation and data)
• Problem: the processor cannot access instructions and data simultaneously
• To improve performance: Harvard architecture again!
In particular:
- separate memories and buses for program and data
- possibly, another separate bus for DMA
What is the difference between a DSP and a general-purpose processor? (2/4)
A DSP is often used to implement a linear filter.
For sampled (discrete-time) signals, the convolution integral becomes a sum:
$$y_n = \sum_i x_{n-i}\, h_i$$
- if the number of terms is finite: FIR filter (finite impulse response);
- otherwise: IIR filter (infinite impulse response),
- which can be realized using two finite sums (M and N finite):
$$y_n = \sum_{i=0}^{M} x_{n-i}\, b_i + \sum_{i=1}^{N} y_{n-i}\, a_i$$
What is the difference between a DSP and a general-purpose processor? (3/4)
• A common operation in an FIR or IIR filter is A = B·C + D, so we need:
- a hardware multiplier (introduced in DSPs in the 1970s)
- a multiply-and-accumulate in a single clock cycle: the MAC instruction
Actually, the MAC sits inside a loop, so we also need a zero-overhead loop:
- hardware for address generation (memory is not accessed randomly)
- loop management
- auto-increment; circular addressing
• Other possible hardware:
- hardware saturation
- instructions to perform a division quickly
- bit reversal for the FFT
What is the difference between a DSP and a general-purpose processor? (4/4)
Other possible features:
• Data are often 16 or 8 bits wide (e.g., audio or images)
- a 32-bit ALU can be split into two 16-bit ALUs or four 8-bit ALUs -> 2 or 4 operations in parallel
• Several ALUs working in parallel
• Fixed-point ALUs, or 16-bit ALUs, to reduce power consumption and cost
• Optimized versions:
- cost: for consumer applications
- power: for mobile applications
- for specific applications, e.g. electric motor control
• Example: the 'C30 (Texas Instruments, 1982)
• Example: FIR filter using a 'C30
Note: several of these features, which originated in DSPs, have been ported to general-purpose processors
• E.g.: the cache in the Pentium processor is Harvard-like (separate instruction and data caches)
• Another example: several units working in parallel, and splittable ALUs (see the MMX extensions) in the Pentium 4 processor
Pipeline…
• Example of a 4-stage pipeline (TI 'C30)
• Each instruction takes 4 clock cycles to complete, but (normally) it can start just 1 cycle after the previous one (its data are needed only 3 cycles later)
Pipeline: branch (e.g. on the 'C30)
• Standard branch: the pipeline is flushed so that the PC is handled correctly -> 4 cycles
• Delayed branch: the pipeline is not flushed, and the 3 following instructions are fetched (and executed) before the PC is modified
-> only 1 cycle needed!
BRD label ; delayed branch
MPYF ; executed
ADDF ; executed
SUBF ; executed
AND ; not executed
…
…
label MPYF ; fetched after SUBF
…
Two architectures
• To exploit instruction-level parallelism (ILP), there are two possible architectures:
- Superscalar: the parallelism is managed dynamically by the hardware
- Very Long Instruction Word (VLIW): the parallelism is managed statically by the compiler
What is the problem?
• Dependences in data or control can generate conflicts:
- on data (an instruction needs the result of a previous instruction, but the result is not ready yet), or
- on control (a conditional jump, but the condition is not ready yet)
-> pipeline stall
Superscalar
• The analysis of independent instructions is done dynamically by the hardware (which is complex!)
• Instructions can be executed out of order; then their completion (commit) is done in order, to update the CPU state correctly
VLIW
• Very Long Instruction Word (VLIW): the parallelism is managed statically by the compiler
• The analysis of independent instructions is performed statically, at compile time;
- the instructions that can be executed in parallel are packed into long instructions and sent to the various functional units in order
• Convenient solution for DSP programs (fixed-length loops, few conditional operations); less convenient for general-purpose applications
• Simpler hardware! But a specific compilation is needed for each platform
• Deterministic behaviour -> execution times can be computed exactly