Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.

ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Outline/objectives

• Identify the most important DSP processor architecture features and how they relate to DSP applications

• Understand the types of code appropriate for DSP implementation

What is a DSP?

• A specialized microprocessor for real-time DSP applications

– Digital filtering (FIR and IIR)

– FFT

– Convolution, matrix multiplication, etc.

[Figure: signal chain. Analog input → ADC → digital input → DSP → digital output → DAC → analog output]

Hardware used in DSP

                     ASIC        FPGA      GPP       DSP
Performance          Very high   High      Medium    Medium-high
Flexibility          Very low    High      High      High
Power consumption    Very low    Low       Medium    Low-medium
Development time     Long        Medium    Short     Short

Common DSP features

• Harvard architecture

• Dedicated single-cycle Multiply-Accumulate (MAC) instruction (hardware MAC units)

• Single-Instruction Multiple-Data (SIMD) or Very Long Instruction Word (VLIW) architecture

• Pipelining

• Saturation arithmetic

• Zero-overhead looping

• Hardware circular addressing

• Cache

• DMA

Harvard Architecture

• Physically separate memories and paths for instruction and data

[Figure: separate DATA MEMORY and PROGRAM MEMORY blocks]

Single-Cycle MAC unit

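[Figure: MAC unit. A multiplier forms each product a_i x_i, which is added to the previous products (a_{i-1} x_{i-1}, ...) held in a register.]

$\sum_{i=0}^{n-1} a_i x_i$

Can compute a sum of n products in n cycles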
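The sum of products above can be sketched in C as follows. This is a minimal illustration, not taken from the slides: the function name `mac_sum` and the 16-bit input / 32-bit accumulator widths are assumptions chosen for the example. Each loop iteration is one multiply-accumulate step, which a DSP with a single-cycle MAC unit can complete in one cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of n products a[i] * x[i] (hypothetical helper for illustration).
 * The 32-bit accumulator is wider than the 16-bit operands, but it can
 * still overflow for long vectors of large values. */
int32_t mac_sum(const int16_t *a, const int16_t *x, size_t n)
{
    int32_t acc = 0;                      /* accumulator register */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * x[i];      /* acc = acc + a_i * x_i */
    }
    return acc;
}
```

For example, with a = {1, 2, 3} and x = {4, 5, 6}, `mac_sum(a, x, 3)` returns 1*4 + 2*5 + 3*6 = 32 after three multiply-accumulate steps.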

Single Instruction - Multiple Data (SIMD)

• A technique for data-level parallelism by employing a number of processing elements working in parallel
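As a rough sketch of what data-level parallelism means, the C function below applies the same addition to several independent elements. The name `add_lanes` and the lane count of 4 are assumptions made for illustration. Plain C runs the lanes one after another; SIMD hardware would perform them in the same cycle, typically through compiler vectorization or processor-specific intrinsics that are not shown here.

```c
#include <stdint.h>

#define LANES 4   /* number of processing elements; illustrative value */

/* One "SIMD operation": the same addition on LANES independent elements.
 * No lane reads a result produced by another lane, so all lanes could
 * execute in parallel on suitable hardware. */
void add_lanes(const int32_t b[LANES], const int32_t c[LANES], int32_t a[LANES])
{
    for (int lane = 0; lane < LANES; lane++) {
        a[lane] = b[lane] + c[lane];
    }
}
```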

Very Long Instruction Word (VLIW)

• A technique for instruction-level parallelism by executing instructions without dependencies (known at compile-time) in parallel

• Example of a single VLIW instruction:

F=a+b; c=e/g; d=x&y; w=z*h;

[Figure: one VLIW instruction word with four slots: F=a+b | c=e/g | d=x&y | w=z*h]

CISC vs. RISC vs. VLIW

Pipelining

• DSPs commonly feature deep pipelines

• TMS320C6x processors have 3 pipeline stages with a number of phases (cycles):

– Fetch

  • Program Address Generate (PG)

  • Program Address Send (PS)

  • Program ready wait (PW)

  • Program receive (PR)

– Decode

  • Dispatch (DP)

  • Decode (DC)

– Execute

  • 6 to 10 phases

Saturation Arithmetic

• Operations like addition and multiplication have a fixed result range

• Results that would normally overflow or underflow are clamped to the maximum and minimum allowed value, respectively

• Associativity and distributivity no longer apply

• Signed one-byte (range -128 to 127) saturation arithmetic examples:

  • 64 + 69 = 127

  • -127 - 5 = -128

  • (64 + 70) - 25 = 102 ≠ 64 + (70 - 25) = 109
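The clamping behaviour can be sketched in C for signed one-byte values as below. The helper names `sat8`, `sat_add8` and `sat_sub8` are hypothetical. On a DSP this is done by the arithmetic hardware itself; the explicit comparisons here only emulate it.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed one-byte range. */
static int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;   /* overflow  -> 127  */
    if (v < INT8_MIN) return INT8_MIN;   /* underflow -> -128 */
    return (int8_t)v;
}

/* Saturating addition and subtraction on signed bytes. */
int8_t sat_add8(int8_t a, int8_t b) { return sat8((int32_t)a + b); }
int8_t sat_sub8(int8_t a, int8_t b) { return sat8((int32_t)a - b); }
```

With these helpers, `sat_add8(64, 69)` returns 127 and `sat_sub8(-127, 5)` returns -128. The order of evaluation matters: `sat_sub8(sat_add8(64, 70), 25)` returns 102, because the intermediate sum is clamped to 127, while `sat_add8(64, sat_sub8(70, 25))` returns 109.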

Examples

• Perform the following operations using one-byte saturation arithmetic:

  • 0x77 + 0x99 =

  • 0x4 * 0x42 =

  • 0x3 * 0x51 =

Zero Overhead Looping

• Hardware support for loops with a constant number of iterations using hardware loop counters and loop buffers

• No branching

• No loop overhead

• No pipeline stalls or branch prediction

• No need for loop unrolling

Hardware Circular Addressing

• Hardware support for circular buffers: a data structure implementing a fixed-length queue of fixed-size objects, where objects are added at the head of the queue and removed from the tail

• Requires at least 2 pointers (head and tail)

• Extensively used in digital filtering

$y[n] = a_0 x[n] + a_1 x[n-1] + \dots + a_k x[n-k]$

[Figure: circular buffer holding x[n-1], x[n-2], x[n-3], shown at Cycle 1 and Cycle 2]
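A software sketch of a circular buffer feeding an FIR filter of the form y[n] = a0 x[n] + a1 x[n-1] + ... + ak x[n-k] is given below. The names `fir_state` and `fir_step` and the tap count of 4 are assumptions for illustration. Because the buffer is always full, this version keeps only a head index (the tail is implicitly the slot after the head); the modulo updates stand in for what hardware circular addressing does as part of address generation.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TAPS 4   /* k + 1 coefficients; value chosen for illustration */

typedef struct {
    int32_t history[NUM_TAPS];  /* circular buffer of recent inputs */
    size_t  head;               /* index of the newest sample x[n] */
} fir_state;

/* Store x[n] in the circular buffer and return
 * y[n] = a[0]*x[n] + a[1]*x[n-1] + ... + a[k]*x[n-k]. */
int32_t fir_step(fir_state *s, const int32_t a[NUM_TAPS], int32_t x_n)
{
    /* Advance the head with wrap-around and overwrite the oldest sample. */
    s->head = (s->head + 1) % NUM_TAPS;
    s->history[s->head] = x_n;

    int32_t acc = 0;
    size_t idx = s->head;
    for (size_t j = 0; j < NUM_TAPS; j++) {
        acc += a[j] * s->history[idx];          /* a_j * x[n-j] */
        idx = (idx + NUM_TAPS - 1) % NUM_TAPS;  /* step back one sample */
    }
    return acc;
}
```

Starting from a zero-initialized state with coefficients {1, 2, 3, 4}, feeding the inputs 10, 20, 30, 40, 50 yields 10, 40, 100, 200, 300. On the fifth call the sample 10 has been overwritten, so no data is ever shifted in memory.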

Direct Memory Access (DMA)

• The feature that allows peripherals to access main memory without the intervention of the CPU

• Typically, the CPU initiates DMA transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete.

• Can create cache coherency problems (the data in the cache may be different from the data in the external memory after DMA)

• Requires a DMA controller

Cache memory

• Separate instruction and data L1 caches (Harvard architecture)

• Cache coherence protocols required, since most systems use DMA

DSP vs. Microcontroller

• DSP

– Harvard architecture

– VLIW/SIMD (parallel execution units)

– No bit-level operations

– Hardware MACs

– DSP applications

• Microcontroller

– Mostly von Neumann architecture

– Single execution unit

– Flexible bit-level operations

– No hardware MACs

– Control applications

Examples

• Estimate how long the following code fragment will take to execute on:

– A general-purpose processor with a 1 GHz operating frequency, five-stage pipelining, 5 cycles required for multiplication and 1 cycle for addition

– A DSP running at 500 MHz with zero-overhead looping, 6 independent ALUs and 2 independent single-cycle MAC units

for (i=0; i<8; i++) {
    a[i] = 2*i + 3;
    b[i] = 3*i + 5;
}

Review Questions

• Which of the following code fragments is appropriate for SIMD implementation?

Fragment 1:

a[0]=b[0]+c[0];
a[2]=b[2]+c[2];
a[4]=b[4]+c[4];
a[6]=b[6]+c[6];

Fragment 2:

a[0]=b[0]&c[0];
a[0]=b[0]%c[0];
a[0]=b[0]+c[0];
a[0]=b[0]/c[0];

• Can the following instructions be merged into one VLIW instruction? If not, into how many?

– a=b+c;

– d=c/e;

– f=d&a;

– g=b%c;

Review Questions

• Which of the following is not a typical DSP feature?

– Dedicated multiplier/MAC

– Von Neumann memory architecture

– Pipelining

– Saturation arithmetic

• Which implementation would you choose for lowest power consumption?

– ASIC

– FPGA

– General-Purpose Processor

– DSP

Examples

• How many VLIW instructions does the following program fragment require if there are two independent data paths (a, b), with 3 ALUs and 1 MAC available in each, and 8 instructions per word? How many cycles will it take to execute if these are the first instructions in the program and all instructions require 1 cycle, assuming the pipeline architecture of slide 10 with 6 phases of execution?

ADD a1,a2,a3 ;a3 = a1+a2
SUB b1,b3,b4 ;b4 = b1-b3
MUL a2,a3,a5 ;a5 = a2*a3
MUL b3,b4,b2 ;b2 = b3*b4
AND a7,a0,a1 ;a1 = a7 AND a0
MUL a3,a4,a5 ;a5 = a3*a4
OR a6,a3,a2 ;a2 = a6 OR a3

References

• R. Chassaing, “DSP Applications Using C and the TMS320C6x DSK”, Wiley, 2002

• Texas Instruments, TMS320C64x datasheets

• Analog Devices, ADSP-21xx Processors
