DSP Processors Introduction

Overview

What is a Digital Signal Processor (DSP)?

Processor Trends – Architectures

Signal Processing Hardware Trends – Other Processor Options

What is available in the market?

How to Choose a DSP?

Conclusions

What is a DSP?

Digital Signal Processors are microprocessors specifically designed to handle digital signal processing tasks.

DSPs are designed to operate in real time.

DSPs must therefore also have a predictable execution time, so that processing is guaranteed to keep up with the rate at which input samples arrive.
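Predictable timing matters because a real-time system has a fixed cycle budget for each sample. A minimal sketch in C, assuming samples are processed one at a time; the function name `cycles_per_sample` and the example rates are illustrative assumptions, not values from this document:

```c
#include <stdint.h>

/* Hypothetical helper: the number of whole processor cycles available
   per input sample, assuming one sample is processed per sample period. */
uint32_t cycles_per_sample(uint32_t clock_hz, uint32_t sample_rate_hz)
{
    return clock_hz / sample_rate_hz;   /* integer division keeps whole cycles only */
}
```

For instance, a 300 MHz clock and a 48 kHz sample rate leave 6250 cycles per sample, and the worst-case execution path, not the average one, has to fit in that budget.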

Processor Trends - Architectures

Hardware Units in DSP Processors

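Multiplier-Accumulator (MAC) Unit

The most common operation in digital signal processing is array multiplication: multiplying corresponding elements of two arrays and summing the products.

Consider, for example, the implementation of an FIR (finite impulse response) digital filter, one of the most common DSP techniques.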
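As a worked form of this operation, here is the standard FIR definition for an N-tap filter, written with the coefficient names a0, a1, … that this section uses for the circular-buffer example:

```latex
y[n] = \sum_{k=0}^{N-1} a_k \, x[n-k]
     = a_0 x[n] + a_1 x[n-1] + \cdots + a_{N-1} x[n-N+1]
```

Each output sample therefore needs N multiplications whose products are summed, that is, N multiply-accumulate steps.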

To implement this operation in real time, we need a hardware multiplier unit that produces the result of a multiplication in a single clock cycle.

We also need to add (accumulate) the products, so we need an adder.

Together, the multiplier and the accumulating adder form a MAC unit, one of the mandatory requirements of a programmable DSP.
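The multiply-accumulate loop that a MAC unit executes can be sketched in C as follows. This is an illustrative sketch, not code for any particular DSP: the function name `fir_mac` and the use of `float` are assumptions (fixed-point DSPs commonly use 16-bit operands with a wider accumulator).

```c
#include <stddef.h>

/* Hypothetical helper: one FIR output sample,
   acc = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1],
   where x[k] holds input sample x[n-k] (most recent first). */
float fir_mac(const float *a, const float *x, size_t n)
{
    float acc = 0.0f;              /* plays the role of the accumulator register */
    for (size_t k = 0; k < n; ++k) {
        acc += a[k] * x[k];        /* one multiply-accumulate per filter tap */
    }
    return acc;
}
```

On a DSP with a single-cycle MAC unit, each iteration of this loop can map to one MAC instruction.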

Circular Buffers:

To calculate an output sample, we must have access to a certain number of the most recent samples from the input.

For example, suppose we use eight coefficients in this filter, a0, a1, …, a7. This means we must know the values of the eight most recent samples from the input signal, x[n], x[n-1], …, x[n-7].

These eight samples must be stored in memory and continually updated as new samples are acquired.

Circular buffering is the standard way to manage these stored samples: instead of shifting all eight samples in memory each time a new one arrives, only the new sample is written and a pointer is moved.

In this example, the circular buffer occupies eight consecutive memory locations, 20041 to 20048.

The idea of circular buffering is that the end of this linear array is connected to its beginning: memory location 20041 is viewed as being next to 20048, just as 20044 is next to 20045.

We keep track of the array with a pointer that indicates where the most recent sample resides.

When a new sample is acquired, it replaces the oldest sample in the array, and the pointer is moved one address ahead.
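A minimal circular buffer along these lines can be sketched in C. Array indices 0 to 7 stand in for memory locations 20041 to 20048; the type `CircBuf` and the functions `cb_init`, `cb_push` and `cb_get` are hypothetical names chosen for this sketch. DSP hardware with circular addressing performs the wrap-around automatically, whereas here the `%` operator does it in software.

```c
#include <stddef.h>

#define CB_LEN 8  /* eight slots, standing in for addresses 20041..20048 */

typedef struct {
    float  buf[CB_LEN];
    size_t newest;            /* index of the most recent sample */
} CircBuf;

void cb_init(CircBuf *cb)
{
    for (size_t i = 0; i < CB_LEN; ++i) cb->buf[i] = 0.0f;
    cb->newest = CB_LEN - 1;  /* so the first push lands at index 0 */
}

/* Advance the pointer one slot (wrapping from the last slot to the first)
   and overwrite the oldest sample with the new one. */
void cb_push(CircBuf *cb, float sample)
{
    cb->newest = (cb->newest + 1) % CB_LEN;
    cb->buf[cb->newest] = sample;
}

/* Return x[n-k], the sample k steps older than the newest (0 <= k < CB_LEN). */
float cb_get(const CircBuf *cb, size_t k)
{
    return cb->buf[(cb->newest + CB_LEN - k) % CB_LEN];
}
```

After ten samples 1, 2, …, 10 have been pushed, the buffer holds samples 3 to 10, and `cb_get(&cb, 0)` returns the newest one.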

Four parameters are needed to manage a circular buffer:

First, a pointer that indicates the start of the circular buffer in memory (in this example, 20041).

Second, a pointer indicating the end of the array (e.g., 20048), or a variable that holds its length (e.g., 8).

Third, the step size of the memory addressing.

These three values define the size and configuration of the circular buffer and do not change during program operation.

Fourth, the pointer to the most recent sample, which must be modified as each new sample is acquired.

There must be program logic that controls how this fourth value is updated based on the first three values.
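The update logic for the fourth parameter can be sketched as a single function of the other three. `cb_next_addr` is a hypothetical helper, and it assumes the pointer always lies inside the buffer (start <= ptr < start + len):

```c
/* Hypothetical helper: advance a circular-buffer pointer.
   start, len and step are the three fixed parameters;
   ptr is the fourth, changing value. The modulo wraps the
   pointer from the end of the buffer back to its start. */
unsigned long cb_next_addr(unsigned long ptr, unsigned long start,
                           unsigned long len, unsigned long step)
{
    return start + (ptr - start + step) % len;
}
```

With the example values (start 20041, length 8, step 1), the pointer moves from 20044 to 20045, and from 20048 back to 20041.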

Modified Bus Structures and Memory Access Schemes

Multiple Access Memory:

The number of memory accesses per clock cycle can also be increased by using a high-speed memory that permits more than one access per clock period (e.g., DARAM, dual-access RAM).

Multiple-access RAM can be connected to the processing unit of a Harvard architecture.

Multiported Memory:

A multiported memory has several independent ports, so program and data can share one memory chip and still be accessed in the same cycle. This dispenses with the need to store the program and data in two different memory chips.

However, multiported memories are more expensive.

Processor Trends - Architectures

Processor architecture trends include:

VLIW

Advanced Super Harvard architectures

SIMD

Simplified instruction sets (RISC), for higher clock speeds and compatibility

More complex instruction sets (CISC) for higher performance

Mixed-width instruction sets to reduce memory usage

Deeper pipelines to enable higher clock speeds

DSP-enhanced GPPs (general-purpose processors)

Architecture Evolution

In the traditional Von Neumann architecture, there is only a single memory and a single bus for transferring data into and out of the CPU, so instruction fetches and data accesses must take turns on that bus.

In the Harvard architecture, there are separate memories for data and program, with a separate bus for each.

Since the buses operate independently, program instructions and data can be fetched at the same time, improving speed.

Another improvement is the Super Harvard architecture.

A handicap of the basic Harvard design is that the data memory bus is busier than the program memory bus: an operation such as a multiply-accumulate needs two data values (for example, a coefficient and an input sample) for each instruction fetched.

To improve upon this situation, we start by relocating part of the "data" to program memory.

For instance, we might place the filter coefficients in program memory, while keeping the input signal in data memory.

However, DSP algorithms generally spend most of their execution time in loops.

This means that the same set of program instructions will continually pass from program memory to the CPU.

The Super Harvard architecture takes advantage of this situation by including an instruction cache in the CPU.

This is a small memory that holds about 32 of the most recent program instructions. While a loop runs from the cache, the program memory bus is free to fetch data such as the filter coefficients.
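To see why these refinements matter for an FIR filter, consider the memory transfers one filter tap needs: one instruction, one coefficient and one input sample. The C sketch below counts bus cycles per tap under a deliberately simplified model (an assumption, not a cycle-accurate description of any real processor): each bus moves one word per cycle, separate buses run in parallel, and in the Super Harvard case the instruction comes from the cache and the coefficient from program memory. The names `Arch` and `cycles_per_tap` are hypothetical.

```c
/* Simplified model: cycles per FIR tap equal the load on the busiest bus. */
typedef enum { VON_NEUMANN, HARVARD, SUPER_HARVARD } Arch;

int cycles_per_tap(Arch arch)
{
    int bus_a = 0;  /* the single bus (Von Neumann) or the program memory bus */
    int bus_b = 0;  /* the data memory bus (Harvard variants only) */
    switch (arch) {
    case VON_NEUMANN:    /* instruction, coefficient and sample share one bus */
        bus_a = 3;
        break;
    case HARVARD:        /* instruction on program bus; coefficient and sample on data bus */
        bus_a = 1; bus_b = 2;
        break;
    case SUPER_HARVARD:  /* instruction from cache; coefficient on program bus; sample on data bus */
        bus_a = 1; bus_b = 1;
        break;
    }
    return bus_a > bus_b ? bus_a : bus_b;
}
```

Under this model the counts are 3, 2 and 1 bus cycles per tap respectively, which is the motivation for the two Harvard refinements described above.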

An I/O controller is connected to data memory; this is how the signals enter and exit the system.

Most of these processors contain both serial and parallel communication ports.

Dedicated hardware allows these data streams to be transferred directly into memory (Direct Memory Access, or DMA), without having to pass through the CPU's registers.

This type of high-speed I/O is a key characteristic of DSPs.

Some DSPs have onboard analog-to-digital and digital-to-analog converters, a feature called mixed signal.

Exploiting ILP - VLIW

ILP (Instruction Level Parallelism): the ability to perform multiple operations (or instructions) from a single instruction stream in parallel.

Exploiting ILP

ILP is a set of design techniques that speed up programs by executing several RISC-style operations in parallel, such as memory loads and stores, integer additions and floating-point multiplications.

These operations are taken from a single stream of execution rather than from parallel tasks.

Available ILP: the parallelism inherent in a region of the code.

Achievable ILP: the parallelism the hardware can actually provide.

ILP hardware: hardware can offer ILP in several ways.

Several of the functional units found in a processor can execute at the same time, so operations are allowed to execute simultaneously on each of the functional units.

Separate register banks for integer and floating-point data help with this by reducing potential hardware resource conflicts.

Multiple copies of the functional units, possibly accessing different register files to add register bandwidth, can be added so that they execute in parallel.

Functional units with a latency longer than one cycle can be pipelined.

For example, the floating-point and cache operations can be pipelined so that a new one can be initiated each cycle, even though each might take several cycles to finish.
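The benefit of pipelining a multi-cycle functional unit can be quantified with a simple model, assuming independent operations, one new operation issued per cycle and no stalls. The helper names are hypothetical:

```c
/* Cycles to complete n independent operations, each with a latency of
   `latency` cycles, on one functional unit. */
unsigned unpipelined_cycles(unsigned n, unsigned latency)
{
    return n * latency;                     /* each operation waits for the previous one */
}

unsigned pipelined_cycles(unsigned n, unsigned latency)
{
    /* first result after `latency` cycles, then one more result every cycle */
    return n == 0 ? 0 : latency + (n - 1);
}
```

For ten floating-point multiplies with a three-cycle latency, this gives 30 cycles without pipelining and 12 cycles with it.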

General ILP Organization

[Figure: general ILP organization of a CPU, with instruction memory, an instruction fetch unit, an instruction decode unit, five functional units (FU-1 to FU-5), a register file, a bypassing network and data memory.]
Exploiting ILP

Example: consider the following code, executed one operation at a time:

Cycle 1: add t3 = t1, t2

Cycle 2: store [addr 0] = t3

Cycle 3: fmul f6 = f7, f14

Cycle 4: (waiting)

Cycle 5: (waiting)

Cycle 6: fmul f7 = f7, f15

Cycle 7: (waiting)

Cycle 8: (waiting)

Cycle 9: add t1 = p2, p7

Cycle 10: add t5 = p2, p10

Cycle 11: add t4 = t1, t5

Cycle 12: store [addr 1] = t4

If we have three integer units, one floating-point unit and one load/store unit, then the code can be arranged as:

Cycle 1: add t3 = t1, t2; add t1 = p2, p7; add t5 = p2, p10; fmul f6 = f7, f14

Cycle 2: add t4 = t1, t5; fmul f7 = f7, f15; store [addr 0] = t3

Cycle 3: store [addr 1] = t4

All eight operations are now issued in three cycles instead of twelve.

VLIW

VLIW = Very Long Instruction Word architecture

Instruction format:

operation 1 | operation 2 | operation 3 | operation 4 | operation 5
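One way to picture this five-operation format is as a C struct with one slot per functional unit. The encoding is purely hypothetical: the opcodes, the register fields and the slot assignment (three integer slots, one floating-point slot and one load/store slot, mirroring the earlier ILP example machine) are assumptions, not any real instruction set. It also shows a cost of a simple fixed-width VLIW encoding: slots the compiler cannot fill still occupy instruction memory as NOPs.

```c
#include <stddef.h>

typedef enum { OP_NOP, OP_ADD, OP_FMUL, OP_STORE } Opcode;

typedef struct {
    Opcode op;
    int dst, src1, src2;        /* register numbers (illustrative only) */
} Operation;

#define VLIW_SLOTS 5
typedef struct {
    Operation slot[VLIW_SLOTS]; /* slots 0-2: integer, 3: floating point, 4: load/store */
} VliwWord;

/* Count how many slots in one instruction word do useful work. */
size_t busy_slots(const VliwWord *w)
{
    size_t n = 0;
    for (size_t i = 0; i < VLIW_SLOTS; ++i)
        if (w->slot[i].op != OP_NOP) ++n;
    return n;
}
```

Cycle 1 of the rearranged example would fill four of the five slots, leaving the load/store slot as a NOP.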

VLIW processors have a number of processing units (data paths), i.e., several ALUs, MAC units, shifters, etc.

The very long instruction word is fetched from memory and specifies the operands and the operation to be performed by each of the data paths.

The multiple functional units share a common multiported register file for fetching operands and storing results.

The performance gains that can be achieved with a VLIW architecture depend on the degree of parallelism in the algorithm selected for a DSP application and on the number of functional units.

Throughput will be higher only if the algorithm involves the execution of independent operations.

It is the compiler that determines the available ILP and schedules operations onto the functional units.
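Deciding which operations are independent is the core of the compiler's scheduling job. Below is a simplified sketch of the test a scheduler might apply before placing two operations in the same instruction word. The `ThreeRegOp` type and `can_issue_together` function are hypothetical, and allowing write-after-read assumes every slot reads its operands at issue, before any slot writes its result:

```c
#include <stdbool.h>

/* Simplified three-register operation: dst = f(src1, src2). */
typedef struct { int dst, src1, src2; } ThreeRegOp;

/* True if `later` may share an instruction word with `earlier`:
   no read-after-write (true) dependence and no write-after-write
   (output) dependence between them. */
bool can_issue_together(ThreeRegOp earlier, ThreeRegOp later)
{
    bool raw = later.src1 == earlier.dst || later.src2 == earlier.dst;
    bool waw = later.dst == earlier.dst;
    return !raw && !waw;
}
```

In the earlier ILP example, `add t3 = t1, t2` and `add t1 = p2, p7` pass this test and share cycle 1, while `add t4 = t1, t5` must wait for `add t1 = p2, p7`.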

A VLIW Architecture with 7 FUs

[Figure: a VLIW architecture with seven functional units (three integer FUs, two load/store units and two floating-point FUs), an integer register file, a floating-point register file, instruction memory and data memory.]
SIMD

SIMD (Single Instruction, Multiple Data)

A single stream of instructions is broadcast to a number of processors, called processing elements (PEs).

All processors execute the same program but operate on different data.

Nodes have mesh or hypercube connectivity.

Each PE can exchange values with its neighbors, and has a few registers, some local memory and an ALU.
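The SIMD execution model, one instruction applied by every processing element to its own data, can be modelled sequentially in C. The lane count and the name `simd_mac` are assumptions for illustration; on SIMD hardware the lanes operate in the same cycle, whereas this loop only reproduces the result:

```c
#include <stddef.h>

#define LANES 4  /* number of processing elements (assumed) */

/* Model of one SIMD instruction "acc += a * b": the same operation is
   applied in every lane, and each lane works only on its own data. */
void simd_mac(float acc[LANES], const float a[LANES], const float b[LANES])
{
    for (size_t lane = 0; lane < LANES; ++lane)
        acc[lane] += a[lane] * b[lane];
}
```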

An SIMD Organization

[Figure: SIMD execution method. Along the time axis, instructions 1, 2, 3, …, n are issued one after another, and each is executed by all nodes (node 1, node 2, …, node K).]
Architecture Trends – The Down Side

VLIW, SIMD and deep pipelines can increase:

Memory use

Energy consumption

Code generation complexity and programming difficulty

Simple instruction sets often increase memory usage, because more instructions are needed to accomplish a given task.

Complex instruction sets hinder compatibility, and maintaining compatibility can bring messy compromises.

Summary

Each processor makes different trade-offs, depending on its target application.

Top speed is often not the goal.

DSP Hardware Trends

Today’s system engineers have a wealth of options for implementing DSP tasks:

GPPs – General-Purpose Processors

DSPs

Application-specific DSPs

Customizable cores

ASSPs – Application-Specific Standard Products

ASICs – Application-Specific Integrated Circuits

FPGAs – Field-Programmable Gate Arrays

How to Choose?

Performance Analysis

Comparing benchmarking approaches

Benchmarking approaches

How to Benchmark?

Simplified metrics

E.g., MIPS, MOPS, MMACS (millions of instructions, operations or multiply-accumulates per second)

Full DSP Applications

E.g., a V.90 modem

DSP algorithm “kernel” benchmarks

E.g., FIR filter, FFT, etc.
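Simplified metrics such as MMACS are usually peak figures computed from the clock rate and the hardware resources. A hypothetical helper, assuming every MAC unit completes one MAC per cycle:

```c
/* Peak MMACS = clock rate in MHz x number of MACs completed per cycle. */
double peak_mmacs(double clock_mhz, unsigned macs_per_cycle)
{
    return clock_mhz * macs_per_cycle;
}
```

A processor clocked at 300 MHz with two MAC units would be quoted at 600 MMACS, whether or not its buses can supply operands fast enough to sustain that rate, which is one reason such figures are a weak basis for comparison.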

Algorithm Kernel Benchmarks

Most benchmarks are based on DSP algorithm kernels.

DSP algorithm kernels are the most computationally intensive portions of DSP applications.

Examples include FFTs, IIR and FIR filters, and Viterbi decoders.

Benchmark results are used together with application profiling to predict overall performance.
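Combining kernel benchmarks with an application profile can be sketched as a weighted sum: the profile supplies how often each kernel runs, and the benchmarks supply the cycles per call. The function and its inputs are hypothetical, and the model ignores control code, cache effects and interrupts:

```c
#include <stddef.h>

/* Predicted cycles per frame = sum over kernels of
   (calls per frame) x (benchmarked cycles per call). */
unsigned long predicted_cycles(const unsigned long *calls_per_frame,
                               const unsigned long *cycles_per_call,
                               size_t n_kernels)
{
    unsigned long total = 0;
    for (size_t k = 0; k < n_kernels; ++k)
        total += calls_per_frame[k] * cycles_per_call[k];
    return total;
}
```

For example, 10 calls of a 100-cycle kernel plus 4 calls of a 250-cycle kernel predict 2000 cycles per frame.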

[Figure: application profile, a pie chart of an example application: IDCT 39%, Window 25%, Other 25%, Denorm 11%.]
Advantages

Relevant: chosen by analysis of real DSP applications.

Kernels are short, allowing functionality to be precisely specified, and benchmarks to be implemented and optimized in a reasonable amount of time.

Disadvantages

It is not practical to implement all important algorithms.

Kernels do not reflect application-level optimizations and trade-offs.

Emerging Benchmarking Challenges

New technologies create performance-analysis challenges:

Multi-core Devices

DSP-enhanced FPGAs

Application-specific processors

Customizable processors

Reconfigurable processors

Emerging challenges

Evolving applications and tools also lead to new challenges:

Increasing reliance on C compilers

Technologies not well served by kernel benchmarks, such as FPGAs and application-specific processors

Practicality concerns can be partly addressed by:

Using off-the-shelf applications wherever available, or

Using simplified applications

What is available in the market?

Latest Processors

High-performance processors

Texas Instruments TMS320C64xx

StarCore SC140

Low-power processors

Texas Instruments TMS320C55xx

Analog Devices Blackfin (ADSP-BF53x)

General-purpose/DSP processors

Intel PXA2xx

Texas Instruments OMAP5910

DSP Speed

Speed performance scores:

1. TI ‘C5502 (300 MHz): 1460
2. ADI ‘BF53x (600 MHz): 3360
3. TI ‘C6414 (720 MHz): 6480
4. StarCore SC140 (300 MHz): 3430
5. Intel PXA2xx (400 MHz): 930
What factors affect DSP Speed?

Parallelism

How many parallel operations can be performed per cycle

Instruction Set

Suitability for the task at hand

Clock Speed

Data types

Data Bandwidth

Pipeline Depth

Instruction latencies

Support for DSP-oriented features:

DSP addressing modes

Zero-overhead looping

Saturation, scaling, rounding
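Saturation is a good example of why DSP-oriented features affect speed. The C sketch below performs a saturating 16-bit addition. On a DSP this clamping is typically done by the hardware as part of the add, whereas on a processor without that feature the comparisons cost extra instructions. The name `sat_add16` is hypothetical:

```c
#include <stdint.h>

/* Saturating 16-bit addition: a result that overflows is clamped to the
   largest positive or negative value instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact: cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

Without saturation, 30000 + 10000 would typically wrap around to a large negative value, which in an audio signal is heard as a loud click rather than mild clipping.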

Memory Use

Memory use comparison (bytes, lower is better; instruction widths in bits in parentheses):

1. TI ‘C55xx (8/16/32/48): 146
2. ADI ‘BF53x (16/32/64): 140
3. TI ‘C64xx (32): 256
4. StarCore SC140 (16/32): 144
5. Intel PXA2xx (16/32): 140
What factors affect Memory use?

A processor’s memory usage is affected by:

Instruction Set

Wider instructions take more memory.

Mixed-width instruction sets are becoming popular: they use short, simple instructions for simple tasks and longer instructions for more complex tasks.

Suitability of the instruction set for the task at hand
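The effect of mixed-width instructions on code size can be estimated with simple arithmetic. The instruction counts and the 16-bit/32-bit encoding widths below are hypothetical, chosen only to illustrate the trade-off:

```c
/* Code size for a program of n_simple simple and n_complex complex
   instructions under two hypothetical encodings. */
unsigned fixed_width_bytes(unsigned n_simple, unsigned n_complex)
{
    return (n_simple + n_complex) * 4u;      /* every instruction is 32 bits */
}

unsigned mixed_width_bytes(unsigned n_simple, unsigned n_complex)
{
    return n_simple * 2u + n_complex * 4u;   /* 16-bit simple, 32-bit complex */
}
```

A program of 80 simple and 20 complex instructions takes 400 bytes with fixed 32-bit instructions but 240 bytes with the mixed-width encoding.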

Architecture

VLIW, SIMD and deeper pipelines may encourage optimizations, such as loop unrolling, that increase memory use in order to obtain speed-optimized code.

Compiler quality (for compiled code)
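Loop unrolling is a typical example of trading memory for speed. The hypothetical `dot_unrolled4` below keeps four independent partial sums, giving a VLIW or SIMD processor four multiply-accumulates it can schedule in parallel, at the cost of a larger loop body. To keep the sketch short it assumes the length is a multiple of 4; regrouping floating-point sums this way can also change rounding slightly compared with a sequential loop.

```c
#include <stddef.h>

/* 4-way unrolled dot product; assumes n is a multiple of 4. */
float dot_unrolled4(const float *a, const float *x, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;  /* independent accumulators */
    for (size_t k = 0; k < n; k += 4) {
        s0 += a[k]     * x[k];
        s1 += a[k + 1] * x[k + 1];
        s2 += a[k + 2] * x[k + 2];
        s3 += a[k + 3] * x[k + 3];
    }
    return (s0 + s1) + (s2 + s3);
}
```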

Energy Efficiency

Energy efficiency (higher is better):

1. TI ‘C5502 (300 MHz, 1.26 V): 11.8
2. ADI ‘BF53x (600 MHz, 1.2 V): 16.9
3. TI ‘C6414 (300 MHz, 1.0 V): 16.1
4. Motorola MSC8101 (SC140) (300 MHz, 1.5 V): 13.7
5. Intel PXA2xx (400 MHz, 1.0 V): 2.6
What factors affect Energy efficiency?

A processor’s energy efficiency is affected by:

Speed

Fabrication process, voltage, circuit design and logic design

Hardware implementation

Memory usage

Compiler quality (for compiled code)
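The link between supply voltage and energy can be made concrete with the standard first-order model of CMOS dynamic power. This is a textbook approximation, not derived from the benchmark data here, and it ignores leakage. In it, α is the switching activity, C the switched capacitance, V_dd the supply voltage and f the clock frequency:

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f,
\qquad
E_{\mathrm{cycle}} = \frac{P_{\mathrm{dyn}}}{f} \approx \alpha \, C \, V_{dd}^{2}
```

With everything else equal, running at 1.0 V instead of 1.5 V reduces the energy per cycle by a factor of (1.5 / 1.0)^2 = 2.25.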

Cost Performance

Cost performance (higher is better):

1. TI ‘C5502 (300 MHz, $10): 146.2
2. ADI ‘BF53x (600 MHz, $6): 375.9
3. TI ‘C6414 (300 MHz, $45): 98.3
4. Motorola MSC8101 (SC140) (300 MHz, $118): 29
5. Intel PXA2xx (400 MHz, $27): 25.6
What factors affect Cost Performance?

Speed

Chip Cost, which is affected by

Fabrication process

Size of on-chip memory – influenced by processor’s memory usage

On-chip peripherals

Manufacturing volume

But good cost-performance results do not necessarily mean a chip is suitable for applications with severe cost constraints.

Users do not want to pay for more performance than they need.
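The point that users do not want to pay for surplus performance can be expressed as a selection rule: pick the cheapest part that meets the speed requirement, rather than the part with the best speed-to-price ratio. A sketch with a hypothetical `Chip` type and made-up candidate data:

```c
#include <stddef.h>

typedef struct {
    const char *name;   /* hypothetical part name */
    double speed;       /* benchmark speed score */
    double price;       /* unit price in dollars */
} Chip;

/* Index of the cheapest chip whose speed meets `required`, or -1 if none.
   Speed-per-dollar is deliberately ignored: surplus speed is not worth paying for. */
int cheapest_adequate(const Chip *chips, size_t n, double required)
{
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (chips[i].speed < required) continue;   /* too slow for the application */
        if (best < 0 || chips[i].price < chips[best].price)
            best = (int)i;
    }
    return best;
}
```

With candidates A (speed 4000, $6), B (1200, $4) and C (900, $2) and a requirement of 1000, this picks B, even though A has the best speed per dollar.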

Conclusions

The pace of DSP processor architecture innovation has accelerated greatly.

New processor types are increasingly competitive:

DSP-enhanced general-purpose processors

Multiprocessor chips

Customizable cores

Non-processor approaches are increasingly competitive:

DSP-enhanced FPGAs

Architectural options are expanding.

Today’s DSP-oriented processors cannot be meaningfully compared using simplified metrics.

Relevant, meaningful benchmark results are essential for processor evaluation.

There is no ideal processor.

Fastest does not mean best.

The “best” processor depends on the details of the application.

Different architectural approaches make different performance trade-offs; understanding these trade-offs is key to selecting a processor.