Embedded systems programming using digital signal...
Digital Signal Processing (DSP) is the method of processing signals and data in order to enhance, modify, or analyze those signals to determine specific information content. A typical DSP system (Figure 1) consists of a processor and other hardware used to convert outside analog signals to digital form and possibly back to analog (continuous) form. There is nothing mystical about DSPs. Think of a DSP as an application-specific microprocessor. These devices are good at digital signal algorithms, which usually do not run well on general-purpose processors because of their complexity. The software development issues associated with these devices, however, are similar to those of other general-purpose processors.
DSPs are becoming common in all product areas, including military and aerospace systems, embedded applications, and PC-based systems.
Is your application right for a DSP?

Engineers developed digital signal processors to solve a particular class of problem. DSP is a way to represent signals as ordered sequences of numbers, together with techniques to process those sequences. Some of the important reasons to do signal processing include elimination or reduction of unwanted interference, estimation of signal characteristics, and transformation of signals to produce more useful information. Some of the common applications of DSP include:

■ Radar and sonar
■ Communications systems
■ Process control
■ Image processing
■ Audio applications
Each DSP application is different. One of the first tasks of the system designer is to determine how much processor is required to perform the job. Figure 2 shows typical performance ranges for some DSP applications. Simple control-based applications do not need high-performance DSPs, whereas higher performance ranges are required for applications such as radar and sonar.
Choosing a processor

Digital signal processors are in many ways similar to general-purpose processors. One of the main differences between the two is that the DSP is typically optimized to process certain signal processing functions, such as filtering and Fast Fourier Transforms (FFTs), very fast. Many steps in these functions are executed with single-cycle instructions in a DSP. Engineers designed DSPs with scalability in mind. Many real-time signal processing functions decompose their processing over many processors in order to gain a significant improvement in processing time. This decomposition can operate either temporally or spatially (Figure 3). Processors that house these functions need fast inter-processor communication. In a DSP, this fast communication is usually accomplished using high-speed I/O ports and shared memory.
There is an increasingly large number of options available to system designers for a general-purpose DSP. These include:

■ Application-Specific Integrated Circuits (ASICs): These devices can function as DSP co-processors but are not very flexible for many general-purpose signal processing needs.
■ RISC processors: The extremely fast clock speed of these devices allows them to perform well in certain DSP applications. Scalability and other real-time (predictability) issues still remain for these devices; DSPs are specifically made to handle real-time, deterministic applications.
■ Field Programmable Gate Arrays (FPGAs): These devices are very fast and can do certain DSP functions very quickly. However, they are also more difficult to develop for than a DSP, where a simple program can do the same function.
■ Host signal processing: This area is becoming more popular in the DSP arena. Host signal processing generally refers to executing DSP algorithms on a PC (referred to as the host). Many of the lower-end multimedia applications work this way, but PC-based DSP still lags behind a true DSP solution for high-performance DSP applications.

Figure 1. A DSP system
Figure 2. Performance ranges for DSP applications (from KHz-range control tasks to MHz-range radar and sonar)
Figure 3. Spatial and temporal decomposition

Reprinted from DSP Engineering / Summer 2001. Copyright 2001 DSP Engineering.
Because of the complexity of today's signal processing applications and the need to upgrade often, a programmable device such as a DSP has become an attractive alternative to a customized hardware solution.

Several factors should be considered when making a processor selection. Some of them are:
■ Cost
■ Scalability
■ Programming requirements
■ Algorithm complexity and type
■ Tools support
■ Time to market
■ Performance
■ Power consumption
■ Memory usage
■ I/O requirements
Until recently, the application dictated which processor to choose. Complex signal processing algorithms required DSPs because of their built-in signal processing architecture, which made performance for these classes of algorithms much better. Control-based or finite state machine software applications called for a general-purpose processor. However, the speed of general-purpose processors has now increased to the point where many signal processing applications previously unable to run on a general-purpose processor can now execute with excellent performance. In addition, general-purpose processor manufacturers have been adding more DSP-like instructions and capabilities to their processors to capture the increasing market for signal processing applications. DSP manufacturers, likewise, are adding more CISC-like instructions to their processors to provide full system solutions (control software as well as signal processing software) on a single chip.
Some recent studies show general-purpose processors performing better than their DSP counterparts in some algorithm benchmarks. However, in general, it takes many more instruction cycles for a general-purpose processor to implement a signal processing algorithm than it does for a DSP.
One thing to keep in mind is the programming complexity of the processor you choose. A superscalar processor is much harder to program at the assembly level than a single-pipeline processor. Chip designers and vendors are providing more sophisticated development tools to alleviate some of these problems. However, if performance and throughput are important factors in the application and assembly language is the programming choice, development time could go up significantly depending on the processor architecture (Table 1).
Another factor in processor choice is the memory required for an application. In RISC processors, the memory required to run an application typically goes up, because it takes more RISC instructions to execute a particular algorithm than on a conventional CISC processor. In some cases, the increase can be dramatic. All of the research, benchmarking, and prototyping should be used to determine memory requirements for a particular application. There is always the possibility of trading performance for memory: memory can be optimized at the cost of application performance. Since DSP algorithms involve performing tight loops of operations over many data points, optimizing just these small kernels yields a substantial performance improvement.
How much is enough?

Many processor manufacturers advertise the speed of their processors in terms of how many operations or instructions they can perform per unit of time (usually a second). Although these figures might be true for an ideal case, the actual performance is often much lower. What is more important is how fast the application and algorithms will run on the device within the rest of the system. A designer who relies on advertised benchmarks should choose benchmarks that are similar to the algorithms that will be used in the design; this makes a big difference when trying to determine how much processor the design needs. It is especially true when using some of the higher-performance DSPs with optimizing compilers. Subtle differences in algorithm structure can be the difference in triggering the compiler to optimize a particular piece of code, and the resulting performance measurements can be off by orders of magnitude.
DSPs come in many varieties: some are very good at computing FFTs, others are very good at I/O, and so on. Determine the most critical aspects of the design and attempt to match them to a processor, if possible. Attempt to match the following DSP features to the application:
Table 1. General-purpose processor features

General-purpose processor feature | Advantages for DSP | Disadvantages | Possible solutions
SIMD instruction set extensions | Increased function execution rate | Poor instruction execution efficiency (requires restructuring algorithms) | —
Addition of specialized DSP instructions | Faster DSP execution | More complex architecture | —
Higher clock speeds | Higher execution rate | Higher power consumption | Instead use H/W enhancements such as MAC units
Advanced architectures | Higher execution rate | Prediction time of a program becomes harder to estimate | Use a simulator to estimate execution time (if one exists)
■ CPU: CPUs can be fixed point or floating point (for more scientific applications), and can be optimized for FFT computation as well as other features.
■ Direct Memory Access: DMA is used in applications demanding high data rates and I/O. A DSP designed for high-rate data transfer will have one or more DMA controllers that can be used to transfer data without the intervention of the CPU.
■ Memory access: The basic types of processor architecture are von Neumann and Harvard. Von Neumann architectures are the traditional design, using one interface to data and program space. The Harvard architecture uses two buses to allow simultaneous access to data and program space in one cycle. This setup effectively results in an instruction executed in a single cycle.
■ On-chip memory: Internal memory is a valuable resource for DSPs. This memory is used to store intermediate variables and is much faster to access than external memory. Effective management and use of on-chip memory may result in significant performance improvements.
■ I/O port: DSPs designed to support high data throughput have one or more communication ports to allow fast transfer of data in and out of the processor. I/O ports are generally controlled by an associated DMA controller, allowing data to be streamed in and out of the processor while the CPU is busy crunching data. In real-time applications for the military, support of multiprocessing configurations and the need for high-bandwidth I/O are important.
Real-Time Operating Systems (RTOS)

One of the key elements driving DSP solutions to higher and higher levels of performance has been the evolution of RTOSs. In fact, some would argue that operating systems have evolved to the point that developing code for multiprocessor DSP applications is a trivial extension of programming a single processor. It is now becoming advantageous to purchase a commercial off-the-shelf (COTS) RTOS instead of developing an operating system in house. Real-time operating systems are now being built specifically for DSPs. The main features of these operating systems include:
■ Preemptive, priority-based real-time multitasking
■ Deterministic critical times
■ Time-out parameters on blocking primitives
■ Memory management
■ Synchronization mechanisms
■ Inter-process communication mechanisms
■ Special memory allocation for DSPs (on-chip)
■ Low interrupt latency
■ Asynchronous, device-independent, low-overhead I/O
Tools

DSP processors, like many general-purpose processors, come with a standard set of tools provided by the chip manufacturer. Third-party vendors supply enhanced tool suites, which generally are the standard tool suite with an interactive GUI wrapped around it. There are other tools that can be useful for developing DSP-based systems, including simulators and emulators. These two tools are the topic of this section.

Simulators

Software simulators are available for many common DSPs. These tools let the engineer begin development and integration of software without the DSP and associated hardware. Simulators are more common in DSP applications because the algorithms that typically run on a DSP are complex and mathematically oriented, which leaves many areas open for errors in design and implementation. Simulators also allow the engineer to examine the device operation easily, without having to buy the device in advance.
Software simulators for DSPs generally consist of a high-level language debugger and the actual DSP simulation engine, a software model of the DSP device. Simulators are very useful in the early phases of software development. These tools are relatively slow due to the all-software implementation of the DSP device, so one would not want to completely simulate very large applications. However, for prototyping and proof of concept, simulators are very helpful.

Engineers use the typical instruction-level simulator for high-level functional verification. These simulators, although relatively fast in execution rate, should not be used for performance analysis. They provide the following capabilities to the software designer:

■ Analysis of software functionality
■ Code-tracing capability
■ Analysis and porting of operating systems
Another tool very useful to the DSP developer is the cycle-accurate simulator, or VHDL simulator. These tools are not always available to the designer. Whereas most simulators are an instruction-accurate implementation of the device, a VHDL-level simulation is usually a cycle-accurate implementation of the device and possibly of some of its peripherals. These tools model delays in memory accesses, pipeline stalls, and all the other hardware-related behavior that instruction-level simulators ignore. Therefore, to obtain accurate execution estimates, a cycle-accurate simulator is the preferred tool: it allows precise timing measurements and captures system behavior. Although slower because of the processing required to simulate every cycle of the processor, these tools offer a couple of big advantages:

■ Modeling of all aspects of the target processor (pipeline, cache, memory access, etc.)
■ Capability to attach external peripherals to the simulator
In the past, in-circuit emulators were the only tool available to assess system performance, and they required the design to be committed to silicon. With the power of today's PCs and low-end workstations, simulation is now possible at a relatively cheap price, and much of the functionality can be simulated and verified before ever committing to the design. Because simulators are becoming more accurate, many programmers today use simulation to verify much of their designs.

Most of today's simulators allow simulation of entire systems. This includes, but is not limited to:

■ The processor
■ On-chip peripherals
■ System-level peripherals
■ Other peripheral hardware devices
■ The operating system
■ Application software

Simulation can be done at various levels of abstraction, depending on the phase of a program. There is an accuracy-versus-performance trade-off when modeling at different levels of abstraction. In addition, regardless of the level of abstraction, keep in mind that simulators cannot model everything and should not be a replacement for running on the real hardware.
Emulators

Another very useful tool for DSP developers is the emulator. The purpose of an emulator is to give the engineer access to the DSP(s) and its peripherals in a non-intrusive way to aid in debugging operations and hardware/software integration. Emulators allow engineers easy access to hardware registers and memory, allowing reading and writing of these locations. Emulators also support other common functions, such as breakpoints, single-stepping, and benchmarking. Most emulators are non-intrusive both spatially and temporally. Spatially non-intrusive means the emulator does not require any additional hardware or software in the target environment. Temporally non-intrusive means the emulator does not prevent the processor or system from executing at its full speed. These two requirements are very important when performing hardware/software integration.
Because of the shrinking die size in DSP processors (as well as other chips), manufacturers are now starting to put emulation logic in the chip itself. A common chip/emulator interconnect standard in use today is the Joint Test Action Group (JTAG) interface. This interface provides the ability to perform board-level testing and requires some on-chip logic to implement (Figure 4).

Emulation tools can also support parallel processing applications. This is one area where DSP emulation tools provide a big advantage over general-purpose processors. In parallel processing systems, the scan interconnection is daisy-chained between the various processors. An emulator controls each of the DSPs, and a multitasking operating system controls each of the separate DSPs in a separate window (the number of windows can become a little cumbersome as the number of devices being emulated grows) (Figure 5).
There are many documented cases of software developers screaming that the hardware is broken because their software, which ran fine on the simulator, is not working on the emulator. In many cases, it was the software that was broken. Emulators catch many timing-related problems in software that are not (and cannot be) found on a simulator, because simulators are only instruction-level accurate; running on the real hardware is a completely different story. The development environment for DSP applications can be either a PC or a workstation; tools exist for both platforms.
Programming Issues

There are many reasons why programming in assembly language should be avoided:

■ It is not very portable.
■ It increases time to market.
■ It is harder to maintain.
■ It is harder to write.

In some industries, there are even requirements limiting assembly language to a certain percentage of the total code, mainly for maintainability and life-cycle cost reasons. DSP manufacturers are beginning to recognize the limitations in this area and are designing and bringing to market more sophisticated tools for their DSPs, as well as more portable languages. It is now common to see C/C++ compilers for DSPs as well as Java environments. Parallel devices and VLIW architectures make the job of efficient assembly programming an order of magnitude harder. It is almost impossible to manually pipeline and optimize an algorithm on a VLIW device without a cycle-accurate simulator and a lot of trial and error.
To alleviate these problems, tools are being developed to shield the developer from having to worry about the explicit parallelism in the chip architecture. For example, an assembly language optimizer developed for the TMS320C6xx VLIW devices allows a programmer to write a serial assembly language implementation and then make it parallel to run efficiently on the device. The programmer does not need to allocate specific registers (a virtual register set is used). Although this tool eliminates much of the complexity of assembly language programming for these devices, it still cannot totally replace manual programming.
Even when using a high-order language such as C, efficient implementations of algorithms can sometimes make a solution extremely difficult to get working correctly. Especially in DSP applications, because of the algorithmic nature of many of them, the phrase "make it run correctly, then make it run fast" should be the approach when developing algorithmically intensive applications, if at all possible. The template for developing efficient real-time code, shown in Figure 6, consists of iterations of C-level optimization, followed by assembly language as a last resort.
In summary

Developing DSP applications typically involves a real-time application, for which DSPs are well suited. Using simulation, emulation, and other modeling techniques, such as rate monotonic analysis, allows early analysis of the system before committing to hardware and software. Throughput estimates are a main driver for determining how much hardware will be required to implement the solution. Even with careful use of these tools, there will likely be surprises along the way that cause throughput to go up. Early analysis will, hopefully, provide the necessary time to recover from these surprises (Figure 7).

Figure 4. JTAG boundary scan emulation
Figure 5. Emulation for a multiuser, multiprocessor system
Figure 6. Code development model for efficient DSP programming
Figure 7. Life cycle of a throughput estimate
Figure 8. Components of a DSP application
Figure 9. Development steps for DSP applications (algorithm design, code development, block diagram, code generation, compile, link, debug, optimize)
A DSP application has several components (Figure 8):

■ The application
■ An operating system (COTS or in-house)
■ A set of libraries and Application Programming Interfaces (APIs) (COTS or in-house)
■ The DSP processor and hardware
■ The development host, operating system, and debugger
■ Simulation, emulation, and modeling tools

The steps involved in the development of DSP systems are shown in Figure 9. Tools are available to help the software designer through these steps and to integrate them into an effective development environment.
Robert Oshana is a project manager at Texas Instruments and has been developing embedded systems for more than 18 years. Oshana teaches a variety of graduate-level software engineering courses at Southern Methodist University. He has contributed frequently to Embedded Systems Programming and is a regular speaker at the Embedded Systems Conference. Oshana has MSEE, MBA, and MSCS degrees.
For further information about the company and its products, visit the Web site at www.ti.com.
References

Blalock, Garrick, "General Purpose uPs for DSP applications: consider the tradeoffs," EDN, October 23, 1997.
Hakkarainen, Harri, “Evaluating the TMS320C62xx for Comm Applications,” Communication Systems Design, October 1997.
Levy, Markus, “Virtual Processors and the Reality of Software Simulation,” EDN, January 15, 1998.
Mayer, John H., “Sophisticated tools bring real-time DSP applications to market,” Military and Aerospace Electronics, January 1998.
Stearns, Samuel D. and Ruth David, Signal Processing Algorithms in MATLAB, Prentice Hall, 1996.