Embedded systems programming using digital signal...
Digital Signal Processing (DSP) is the method of processing signals and data in order to enhance, modify, or analyze those signals to determine specific information content. A typical DSP system (Figure 1) consists of a processor and other hardware used to convert outside analog signals to digital form and possibly back to analog (continuous) form. There is nothing mystical about DSPs. Think of a DSP as an application-specific microprocessor. These devices are good at digital signal algorithms, which usually do not run well on general-purpose processors because of their complexity. The software development issues associated with these devices, however, are similar to those of other general-purpose processors.
DSPs are becoming common in all product areas, including military and aerospace systems, embedded applications, and PC-based systems.
Is your application right for a DSP?

Engineers developed digital signal processors to solve a particular class of problem. DSP is a way to represent signals as ordered sequences of numbers, together with techniques to process those sequences. Some of the important reasons to do signal processing include elimination or reduction of unwanted interference, estimation of signal characteristics, and transformation of signals to produce more useful information. Some of the common applications of DSP include:

■ Radar and sonar
■ Communications systems
■ Process control
■ Image processing
■ Audio applications
Each DSP application is different. One of the first tasks of the system designer is to determine how much processor is required to perform the job. Figure 2 shows typical performance ranges for some DSP applications. Simple control-based applications do not need high-performance DSPs, whereas higher performance ranges are required for applications such as radar and sonar.
Choosing a processor

Digital signal processors are in many ways similar to general-purpose processors. One of the main differences between the two is that the DSP is typically optimized to process certain signal processing functions, such as filtering and Fast Fourier Transforms (FFTs), very fast. Many steps in these functions are executed with single-cycle instructions in a DSP. Engineers designed DSPs with scalability in mind. Many real-time signal processing functions decompose their processing over many processors in order to gain a significant improvement in processing time. This decomposition can operate either temporally or spatially (Figure 3). Processors that house these functions need fast inter-processor communication. In a DSP, this fast communication is usually accomplished using high-speed I/O ports and shared memory.
There is an increasingly large number of options available to system designers for a general-purpose DSP. These include:

■ Application-Specific Integrated Circuits (ASICs): These devices can function as DSP co-processors but are not very flexible for many general-purpose signal processing needs.
■ RISC processors: The extremely fast clock speed of these devices allows them to perform well in certain DSP applications. Scalability and other real-time (predictability) issues still remain for these devices; DSPs are specifically made to handle real-time, deterministic applications.
■ Field Programmable Gate Arrays (FPGAs): These devices are very fast and can do certain DSP functions very quickly. However, they are also more difficult to develop for than a DSP, where a simple program can do the same function.
■ Host signal processing: This area is becoming more popular in the DSP arena. Host signal processing generally refers to executing DSP algorithms on a PC (referred to as the host). Many of the lower-end multimedia applications work this way, but PC-based DSP still lags behind a true DSP solution for high-performance DSP applications.

Figure 1. A DSP system
Figure 2. Performance ranges for DSP applications (from KHz-range control tasks to MHz-range radar and sonar)
Figure 3. Spatial and temporal decomposition

Reprinted from DSP Engineering / Summer 2001. Copyright 2001 DSP Engineering.
Because of the complexity of today's signal processing applications and the need to upgrade often, a programmable device such as a DSP has become an attractive alternative to a customized hardware solution.

Several factors should be considered when making a processor selection. Some of them are:
■ Cost
■ Scalability
■ Programming requirements
■ Algorithm complexity and type
■ Tools support
■ Time to market
■ Performance
■ Power consumption
■ Memory usage
■ I/O requirements
Until recently, the application dictated which processor to choose. Complex signal processing algorithms required DSPs because of their built-in signal processing architecture, which made performance for these classes of algorithms much better. Control-based or finite state machine software applications called for a general-purpose processor. However, the speed of general-purpose processors has now increased to the point where many signal processing applications previously unable to run on a general-purpose processor can now execute with excellent performance. In addition, general-purpose processor manufacturers have been adding more DSP-like instructions and capabilities to their processors to capture the increasing market for signal processing applications. DSP manufacturers, likewise, are adding more CISC-like instructions to their processors to provide full system solutions (control software as well as signal processing software) on a single chip.
Some recent studies show general-purpose processors performing better than their DSP counterparts in some algorithm benchmarks. However, in general, it takes many more instruction cycles for a general-purpose processor to implement a signal processing algorithm than it does for a DSP.
One thing to keep in mind is the programming complexity of the processor you choose. A superscalar processor is much harder to program at the assembly level than a single-pipeline processor. Chip designers and vendors are providing more sophisticated development tools to alleviate some of these problems. However, if performance and throughput are important factors in the application and assembly language is the programming choice, development time could go up significantly depending on the processor architecture (Table 1).
Another factor in processor choice is the memory required for an application. In RISC processors, the memory required to run an application typically goes up, because it takes more RISC instructions to execute a particular algorithm than on a conventional CISC processor. In some cases, the increase can be dramatic. All of the research, benchmarking, and prototyping should be used to determine memory requirements for a particular application. There is always the possibility of trading performance for memory: memory can be optimized at the cost of application performance. Since DSP algorithms involve performing tight loops of operations over many data points, optimizing just these small kernels yields a substantial performance improvement.
How much is enough?

Many processor manufacturers advertise the speed of their processors in terms of how many operations or instructions they can perform per unit of time (usually a second). Although these figures might be true for an ideal case, the actual performance is often much lower. What is more important is how fast the application and algorithms will run on the device within the rest of the system. A designer who relies on advertised benchmarks should choose benchmarks that are similar to the algorithms that will be used in the design; this makes a big difference when trying to determine how much processor the design needs. It is especially true when using some of the higher-performance DSPs with optimizing compilers. Subtle differences in algorithm structure can be the difference in triggering the compiler to optimize a particular piece of code, and the resulting performance measurements can be off by orders of magnitude.
DSPs come in many varieties: some are very good at computing FFTs, others are very good at I/O, and so on. Determine the most critical aspects of the design and attempt to match them to a processor, if possible. Attempt to match the following DSP features to the application:
Table 1. General-purpose processor features

General-purpose processor feature | Advantages for DSP | Disadvantages | Possible solutions
SIMD instruction set extensions | Increased function execution rate | Poor instruction execution efficiency (requires restructuring algorithms) | —
Addition of specialized DSP instructions | Faster DSP execution | More complex architecture | —
Higher clock speeds | Higher execution rate | Higher power consumption | Instead use H/W enhancements such as MAC units
Advanced architectures | Higher execution rate | Prediction time of a program becomes harder to estimate | Use a simulator to estimate execution time (if one exists)
■ CPU: CPUs can be fixed point or floating point (for more scientific applications), and can be optimized for FFT computation as well as other features.
■ Direct Memory Access: DMA is used in applications demanding high data rates and I/O. A DSP designed for high-rate data transfer will have one or more DMA controllers that can be used to transfer data without the intervention of the CPU.
■ Memory access: The basic types of processor architecture are von Neumann and Harvard. Von Neumann architectures are the traditional design, using one interface to data and program space. The Harvard architecture uses two buses to allow simultaneous access to data and program space in one cycle. This setup effectively results in an instruction executed in a single cycle.
■ On-chip memory: Internal memory is a valuable resource for DSPs. This memory is used to store intermediate variables and is much faster to access than external memory. Effective management and use of on-chip memory may result in significant performance improvements.
■ I/O port: DSPs designed to support high data throughput have one or more communication ports to allow fast transfer of data in and out of the processor. I/O ports are generally controlled by an associated DMA controller, allowing data to be streamed in and out of the processor while the CPU is busy crunching data. In real-time applications for the military, support of multiprocessing configurations and the need for high-bandwidth I/O are important.
Real-Time Operating Systems (RTOS)

One of the key elements driving DSP solutions to higher and higher levels of performance has been the evolution of RTOSs. In fact, some would argue that operating systems have evolved to the point that developing code for multiprocessor DSP applications is a trivial extension of programming a single processor. It is now becoming advantageous to purchase a commercial off-the-shelf (COTS) RTOS instead of developing an operating system in house. Real-time operating systems are now being built specifically for DSPs. The main features of these operating systems include:
■ Preemptive, priority-based real-time multitasking
■ Deterministic critical times
■ Time-out parameters on blocking primitives
■ Memory management
■ Synchronization mechanisms
■ Inter-process communication mechanisms
■ Special memory allocation for DSPs (on-chip)
■ Low interrupt latency
■ Asynchronous, device-independent, low-overhead I/O
Tools

DSP processors, like many general-purpose processors, come with a standard set of tools provided by the chip manufacturer. Third-party vendors supply enhanced tool suites, which generally are the standard tool suite with an interactive GUI wrapped around it. There are other tools that can be useful for developing DSP-based systems, including simulators and emulators. These two tools are the topic of this section.

Simulators

Software simulators are available for many common DSPs. These tools let the engineer begin development and integration of software without the DSP and associated hardware. Simulators are more common in DSP applications because the algorithms that typically run on a DSP are complex and mathematically oriented, which leaves many areas open for errors in design and implementation. Simulators also allow the engineer to examine the device operation easily, without having to buy the device in advance.
Software simulators for DSPs generally consist of a high-level language debugger and the actual DSP simulation engine, a software model of the DSP device. Simulators are very useful in the early phases of software development. These tools are relatively slow due to the all-software implementation of the DSP device, so one would not want to completely simulate very large applications. However, for prototyping and proof of concept, simulators are very helpful.

Engineers use the typical instruction-level simulator for high-level functional verification. These simulators, although relatively fast in execution rate, should not be used for performance analysis. They provide the following capabilities to the software designer:

■ Analysis of software functionality
■ Code-tracing capability
■ Analysis and porting of operating systems
Another tool very useful to the DSP developer is the cycle-accurate simulator, or VHDL simulator. These tools are not always available to the designer. Whereas most simulators are an instruction-accurate implementation of the device, a VHDL-level simulation is usually a cycle-accurate implementation of the device and possibly of some of its peripherals. These tools model delays in memory accesses, pipeline stalls, and all the other hardware-related behavior that instruction-level simulators ignore. Therefore, to obtain accurate execution estimates, a cycle-accurate simulator is the preferred tool: it allows precise timing measurements and captures system behavior. Although slower because of the processing required to simulate every cycle of the processor, these tools offer a couple of big advantages:

■ Modeling of all aspects of the target processor (pipeline, cache, memory access, etc.)
■ Capability to attach external peripherals to the simulator
In the past, in-circuit emulators were the only tool available to assess system performance, and they required the design to be committed to silicon. With the power of today's PCs and low-end workstations, simulation is now possible at a relatively cheap price, and much of the functionality can be simulated and verified before ever committing to the design. Because simulators are becoming more accurate, many programmers today use simulation to verify much of their designs.

Most of today's simulators allow simulation of entire systems. This includes, but is not limited to:

■ The processor
■ On-chip peripherals
■ System-level peripherals
■ Other peripheral hardware devices
■ The operating system
■ Application software

Simulation can be done at various levels of abstraction, depending on the phase of a program. There is an accuracy-versus-performance trade-off when modeling at different levels of abstraction. In addition, regardless of the level of abstraction, keep in mind that simulators cannot model everything and should not be a replacement for running on the real hardware.
Emulators

Another very useful tool for DSP developers is the emulator. The purpose of an emulator is to give the engineer access to the DSP(s) and its peripherals in a non-intrusive way to aid in debugging operations and hardware/software integration. Emulators allow engineers easy access to hardware registers and memory, allowing reading and writing of these locations. Emulators also support other common functions, such as breakpoints, single-stepping, and benchmarking. Most emulators are non-intrusive both spatially and temporally. Spatially non-intrusive means the emulator does not require any additional hardware or software in the target environment. Temporally non-intrusive means the emulator does not prevent the processor or system from executing at its full speed. These two requirements are very important when performing hardware/software integration.
Because of the shrinking die size in DSP processors (as well as other chips), manufacturers are now starting to put emulation logic in the chip itself. A common chip/emulator interconnect standard in use today is the Joint Test Action Group (JTAG) interface. This interface provides the ability to perform board-level testing and requires some on-chip logic to implement (Figure 4).

Emulation tools can also support parallel processing applications. This is one area where DSP emulation tools provide a big advantage over general-purpose processors. In parallel processing systems, the scan interconnection is daisy-chained between the various processors. An emulator controls each of the DSPs, and a multitasking operating system controls each of the separate DSPs in a separate window (the number of windows can become a little cumbersome as the number of devices being emulated grows) (Figure 5).
There are many documented cases of software developers screaming that the hardware is broken because their software, which ran fine on the simulator, is not working on the emulator. In many cases, it was the software that was broken. Emulators catch many timing-related problems in software that are not (and cannot be) found on a simulator, because simulators are only instruction-level accurate; running on the real hardware is a completely different story. The development environment for DSP applications can be either a PC or a workstation; tools exist for both platforms.
Programming Issues

There are many reasons why programming in assembly language should be avoided:

■ It is not very portable.
■ It increases time to market.
■ It is harder to maintain.
■ It is harder to write.

In some industries, there are even requirements limiting assembly language to a certain percentage of the total code, mainly for maintainability and life-cycle cost reasons. DSP manufacturers are beginning to recognize the limitations in this area and are designing and bringing to market more sophisticated tools for their DSPs, as well as more portable languages. It is now common to see C/C++ compilers for DSPs as well as Java environments. Parallel devices and VLIW architectures make the job of efficient assembly programming an order of magnitude harder. It is almost impossible to manually pipeline and optimize an algorithm on a VLIW device without a cycle-accurate simulator and a lot of trial and error.
To alleviate these problems, tools are being developed to shield the developer from having to worry about the explicit parallelism in the chip architecture. For example, an assembly language optimizer developed for the TMS320C6xx VLIW devices allows a programmer to write a serial assembly language implementation and then make it parallel to run efficiently on the device. The programmer does not need to allocate specific registers (a virtual register set is used). Although this tool eliminates much of the complexity of assembly language programming for these devices, it still cannot totally replace manual programming.
Even when using a high-order language such as C, efficient implementations of algorithms can sometimes make a solution extremely difficult to get working correctly. Especially in DSP applications, because of the algorithmic nature of many of them, the phrase "make it run correctly, then make it run fast" should be the approach when developing algorithmically intensive applications, if at all possible. The template for developing efficient real-time code, shown in Figure 6, consists of iterations of C-level optimization, followed by assembly language as a last resort.
In summary

Developing DSP applications typically involves a real-time application, for which DSPs are well suited. Using simulation, emulation, and other modeling techniques, such as rate monotonic analysis, allows early analysis of the system before committing to hardware and software. Throughput estimates are a main driver for determining how much hardware will be required to implement the solution. Even with careful use of these tools, there will likely be surprises along the way that cause throughput to go up. Early analysis will, hopefully, provide the necessary time to recover from these surprises (Figure 7).

Figure 4. JTAG boundary scan emulation
Figure 5. Emulation for a multiuser, multiprocessor system
Figure 6. Code development model for efficient DSP programming
Figure 7. Life cycle of a throughput estimate
Figure 8. Components of a DSP application
Figure 9. Development steps for DSP applications (algorithm design, code development, block diagram, code generation, compile, link, debug, optimize)
A DSP application has several components (Figure 8):

■ The application
■ An operating system (COTS or in-house)
■ A set of libraries and Application Programming Interfaces (APIs) (COTS or in-house)
■ The DSP processor and hardware
■ The development host, operating system, and debugger
■ Simulation, emulation, and modeling tools

The steps involved in the development of DSP systems are shown in Figure 9. Tools are available to help the software designer through these steps and to integrate them into an effective development environment.
Robert Oshana is a project manager at Texas Instruments and has been developing embedded systems for more than 18 years. Oshana teaches a variety of graduate-level software engineering courses at Southern Methodist University. He has contributed frequently to Embedded Systems Programming and is a regular speaker at the Embedded Systems Conference. Oshana has MSEE, MBA, and MSCS degrees.
For further information about the company and its products, visit the Web site at www.ti.com.
References

Blalock, Garrick, "General Purpose uPs for DSP applications: consider the tradeoffs," EDN, October 23, 1997.
Hakkarainen, Harri, “Evaluating the TMS320C62xx for Comm Applications,” Communication Systems Design, October 1997.
Levy, Markus, “Virtual Processors and the Reality of Software Simulation,” EDN, January 15, 1998.
Mayer, John H., “Sophisticated tools bring real-time DSP applications to market,” Military and Aerospace Electronics, January 1998.
Stearns, Samuel D. and Ruth David, Signal Processing Algorithms in MATLAB, Prentice Hall, 1996.