Salvetat Richard · Barrel Shifter Sixteen 16-bit Math registers / Eight 32-bit Math Registers Two...
The World Leader in High-Performance Signal Processing Solutions
Developing with Digital Signal Processor
Salvetat Richard
Captronic – INSAVALOR Seminar
10 March 2011
DSP Training Course, March 2011, Lyon
Agenda
Introduction
Section 1: DSP “Demystification”
Section 2: DSP Core Architecture Overview
Section 3: DSP Core Example: Blackfin BF53x
Section 4: Memory Manager / DMA
Section 5: Program Sequencer / Interrupt Manager
Section 6: Embedded I/O
Section 7: Software Environment
Section 8: C Compiler, Rules for Optimal Software
Conclusion
Hands-on Lab Exercises
Section 1: DSP “Demystification”
Digital sampling of an analogue signal
So what problem has to be solved?
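[Figure: sampling signal chain. An analogue signal of frequency fa passes through a BPF or LPF into an N-bit ADC sampled at fs, then through the DSP, then an N-bit DAC clocked at fs, and finally a BPF or LPF. An inset plots amplitude A against time t.]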
Most DSP algorithms can be expressed with MAC:
$$Y = \sum_{i=1}^{\text{count}} a_i \cdot x_i$$
Typical DSP Algorithms
Complex Fourier Transform:
$$X_m = \sum_{n=0}^{N-1} x_n \cdot e^{-j 2\pi n m / N}$$

Infinite Impulse Response (IIR) Filters:
$$y_n = \sum_{k=0}^{N} a_k \cdot x_{n-k} + \sum_{k=1}^{N} b_k \cdot y_{n-k}$$

Convolution:
$$y_n = \sum_{k=0}^{N_k + N_n - 2} h_k \cdot x_{n-k}$$
Mathematical Primitive
Basic Problem of most DSP Algorithms
Multiply and Accumulate (MAC) operation:
$$y = \sum_{n=0}^{N-1} a_n \cdot x_n$$
    y := 0;
    FOR n := 0 TO N-1 DO
        y := y + a[n]·x[n];
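The MAC loop above can be sketched in Python to make the data flow explicit. This is only an illustrative model, not DSP code; the function name `mac` is an assumption introduced here.

```python
def mac(a, x):
    """Return sum(a[n] * x[n]) using an explicit multiply-accumulate loop."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same length")
    y = 0
    for a_n, x_n in zip(a, x):
        y += a_n * x_n  # one multiply-accumulate per coefficient/sample pair
    return y
```

On a DSP each loop iteration maps to a single MAC instruction, which is why the architecture is built around this primitive.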
The Harvard Architecture
Harvard Architecture: Simultaneous Access of Data and Instruction
Modified Harvard Architecture: Single-cycle Access of 2 Pieces of Data and 1 Instruction (Three-Bus Performance)
[Figure: a DSP/µProcessor connected by separate address and data buses to Data Storage (DM) and Program Storage (PM).]
What are the characteristics of a good DSP?
• Fast, flexible arithmetic computation units
• Unconstrained data flow to and from the computation units
• Extended precision and dynamic range in the computation units
• Dual address generators
• Efficient program sequencing
• Ease of programming
• Efficient I/O processing
Floating vs. Fixed Point Processors
Applications which require the following need a floating point processor:
• High precision
• Wide dynamic range
• High signal-to-noise ratio
• Ease of use
Drawbacks of floating point processors:
• Higher power consumption
• Can be more expensive
• Can be slower than fixed-point counterparts and larger in size
General Purpose DSP vs. DSP in ASIC
Application Specific Integrated Circuits (ASICs) are semiconductors designed for dedicated functions. The advantages and disadvantages of using ASICs are listed below.
Advantages:
• High throughput
• Lower silicon area
• Lower power consumption
• Improved reliability
• Reduction in system noise
• Low overall system cost
Disadvantages:
• High investment cost
• Less flexibility
• Long time from design to market
Section 2: DSP Core Architecture Overview
Different Needs? Multiple Families
[Figure: processor families plotted by performance against power.
 • TigerSHARC: high performance
 • SHARC: low cost, floating point
 • ADSP-21xx: power efficient, fixed point
 • Blackfin: media enabled, fixed point
 Application areas named in the figure: wired voice, wireless voice, VOIP/VON, industrial control; image compression, 3G terminals, digital still/video camera, MMOIP, biometrics; 2.5G/3G infrastructure, medical imaging, industrial imaging, multiprocessing; audio, infotainment, industrial.]
Configurable Memory System
• Supports a cache memory model and an SRAM memory model
• Sustained dual data accesses for DSP applications
• Supports accesses of 8-, 16-, and 32-bit data
• Separate multi-ported L1 instruction and data memories
[Figure: the processor core connects to L1 instruction SRAM & cache, L1 data SRAM & cache, and scratchpad SRAM; a DMA block and L2 instruction & data SRAM sit behind the L1 memories.]
The Blackfin Core – Microcontroller and DSP?
[Figure: core block diagram with three blocks. Address Arithmetic Unit: DAG0, DAG1, I0-I3, L0-L3, B0-B3, M0-M3, P0-P5, FP, SP. Data Arithmetic Unit: R0-R7, two 16-bit multipliers, four 8-bit video ALUs, barrel shifter, 40-bit accumulators Acc0 and Acc1. Sequencer.]
Data Arithmetic Unit:
• Two 16-bit multipliers
• Two 32-bit ALUs
• Four 8-bit video ALUs
• Barrel shifter
• Sixteen 16-bit math registers / eight 32-bit math registers
Address Arithmetic Unit:
• Two DAGs, byte addressing
• Eight 32-bit pointer registers
• Four sets of 32-bit Index, Modify, Length, Base registers
Sequencer:
• 16-bit instructions
• 32-bit instructions
• Multi-issue, 64-bit instructions
• Interlocked pipeline
ADSP-BF531/BF532/BF533 Enhanced Blackfin Processors
Great Performance Value
• Highest frequency (600 MHz)
• Highest MMAC/$
• Lowest mW/MMAC of any conventional DSP (0.15 mW/MMAC)
High System Integration
• Video I/O connects directly to ITU-R 656 encoders and decoders
• SPORTs support 8 channels of I2S audio
• Core voltage regulator
• Microcontroller features include WDT, RTC, SDRAM controller
[Figure: block diagram.
 Core: Blackfin processor core, up to 600 MHz; PLL; dynamic power management; switching regulator.
 Memory: up to 80 KBytes PM; 32 KBytes PM ROM; up to 64 KBytes DM; a 4 KBytes block (label not recoverable); enhanced DMA.
 System peripherals: SDRAM and FLASH/SRAM interfaces, RTC, watchdog, JTAG.
 User peripherals: 1 SPI, 1 UART, 3 timers, 16 GPIO, 2 enhanced SPORTs, PPI video I/O.]
Registers and Data Types
32-bit Fixed-Point Formats

Signed integer (bit 31 is the sign bit)
  Bit      31      30     29     ...  2      1      0
  Weight   -2^31   2^30   2^29   ...  2^2    2^1    2^0
  Examples: -3 = 0xfffffffd, 3 = 0x00000003

Unsigned integer
  Bit      31      30     29     ...  2      1      0
  Weight   2^31    2^30   2^29   ...  2^2    2^1    2^0
  Examples: 2^31 = 0x80000000, 3 = 0x00000003

Signed fractional (bit 31 is the sign bit)
  Bit      31      30     29     ...  2      1      0
  Weight   -2^0    2^-1   2^-2   ...  2^-29  2^-30  2^-31
  Examples: .25 = 0x20000000, .75 = 0x60000000

Unsigned fractional
  Bit      31      30     29     ...  2      1      0
  Weight   2^-1    2^-2   2^-3   ...  2^-30  2^-31  2^-32
  Examples: .25 = 0x40000000, .75 = 0xc0000000
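The bit weights above can be checked with a short Python sketch that converts between real numbers and the two fractional formats. The helper names `to_q31`, `from_q31` and `to_uq32` are assumptions made for this illustration.

```python
def to_q31(value):
    """Encode a real number in [-1, 1) as a signed fractional (1.31) word."""
    if not -1.0 <= value < 1.0:
        raise ValueError("signed fractional format covers [-1, 1)")
    raw = int(round(value * (1 << 31)))
    raw = min(raw, (1 << 31) - 1)      # rounding may reach +1.0; clamp it
    return raw & 0xFFFFFFFF            # two's complement bit pattern

def from_q31(word):
    """Decode a 32-bit signed fractional (1.31) word to a float."""
    signed = word - (1 << 32) if word & 0x80000000 else word
    return signed / (1 << 31)

def to_uq32(value):
    """Encode a real number in [0, 1) as an unsigned fractional word."""
    if not 0.0 <= value < 1.0:
        raise ValueError("unsigned fractional format covers [0, 1)")
    return min(int(round(value * (1 << 32))), 0xFFFFFFFF)
```

The same value 0.25 therefore has two different bit patterns depending on whether bit 31 is treated as a sign bit.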
Floating-Point Formats
32-bit (bits 39..8):   s | e7 ... e0 | 1.f22 ... f0
40-bit (bits 39..0):   s | e7 ... e0 | 1.f30 ... f8 f7 ... f0
The leading 1 of the significand is a hidden bit.

Type       Exponent      Fraction   Value                        Example      Representation
Normal     1 ≤ e ≤ 254   any        (-1)^s × (1.f) × 2^(e-127)   0x3f800000   1.0
Zero       0             0          zero                         0x00000000   0.0
Infinity   255           0          infinity                     0x7f800000   1.#INF
NAN        255           non-zero   undefined                    0x7f800001   1.#NAN
Denormal   0             any        zero                         0x00000001   1.#IND
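As a rough illustration of the table above, the following Python sketch splits a 32-bit word into sign, exponent and fraction and classifies it. The function name `classify` and the returned labels are assumptions; denormals are returned as zero because that is how the table lists them.

```python
def classify(word):
    """Return (type, value) for a 32-bit floating-point bit pattern."""
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent == 255:
        if fraction == 0:
            return ("infinity", float("-inf") if sign else float("inf"))
        return ("nan", None)                     # value is undefined
    if exponent == 0:
        # zero, or a denormal that the table treats as zero
        return ("zero" if fraction == 0 else "denormal", 0.0)
    # normal number: the hidden bit contributes the leading 1
    magnitude = (1 + fraction / (1 << 23)) * 2.0 ** (exponent - 127)
    return ("normal", -magnitude if sign else magnitude)
```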
Integer/Fractional Fixed-Point Multiplication
[Figure: multiplier fixed-point result placement. Labels recovered from the figure:
 • The result register spans bits 79..0 and is split into MR2 (79..64), MR1 (63..32) and MR0 (31..0).
 • Integer multiplier, MRF or MRB placement: OVERFLOW, INTEGER RESULT, INTEGER RESULT, with the binary point marked.
 • Integer multiplier, register file placement: a 32-bit INTEGER RESULT in a ureg followed by 8 bits of ZEROS; the OVERFLOW is lost and MV is set.
 • Fractional multiplier, MRF or MRB placement: OVERFLOW, FRACTIONAL RESULT, with the binary point marked.
 • Fractional multiplier, register file placement: a 32-bit FRACTIONAL RESULT in a ureg followed by 8 bits of ZEROS; the UNDERFLOW is lost and MV is set.]
Section 3: DSP Core Example
Accessing Registers
There are two ways to access registers on the ADSP-2153x:
• The majority of registers are memory mapped (MMRs) and must be accessed indirectly
  • Core MMRs are used to configure the core registers; they are listed in Appendix A of the HRM
  • System MMRs are used to configure all other peripherals; they are listed in Appendix B of the HRM
  • The addresses of the core and system MMRs are part of the def21533.h and defblkfin.h header files
  • MMRs can only be accessed in Supervisor mode
• The remaining registers are accessed directly, by name
System Registers
[Figure: register overview. Contents recovered from the figure:]
System registers:
  LC0, LC1   Loop Counter
  LT0, LT1   Loop Top
  LB0, LB1   Loop Bottom
  ASTAT      Arithmetic Status
  RETS       Subroutine Return
  RETI       Interrupt Return
  RETX       Exception Return
  RETN       NMI Return
  RETE       Emulation Return
  SYSCFG     System Config
  SEQSTAT    Sequencer Status
Address registers (bits 31..0 each):
  I0-I3, L0-L3, B0-B3, M0-M3
  P0-P5, FP, SP, USP
Data registers:
  R0-R7, each split into halves Rn.H (bits 31..16) and Rn.L (bits 15..0)
  Accumulators A0 and A1, each split into Ax.X, Ax.H and Ax.L
Legend: Data Registers R0-R7; Index Registers I0-I3; Pointer Registers P0-P5; a marker for registers with Supervisor mode access only.
Micro Signal Architecture Core
The Blackfin DSP core is based on the Micro Signal Architecture, jointly developed with Intel Corporation.
[Figure: core block diagram with the Address Arithmetic Unit (DAG0, DAG1, I0-I3, L0-L3, B0-B3, M0-M3, P0-P5, FP, SP), the Data Arithmetic Unit (R0-R7, multipliers, video ALUs, barrel shifter, 40-bit accumulators Acc0 and Acc1) and the Sequencer.]
Data Arithmetic Unit:
• Two 16-bit multipliers
• Two 32/40-bit ALUs
• Four 8-bit video ALUs
• Barrel shifter
• Sixteen 16-bit math registers / eight 32-bit math registers
Address Arithmetic Unit:
• Two DAGs, byte addressing
• Eight 32-bit pointer registers
• Four sets of 32-bit Index, Modify, Length, Base registers
Sequencer:
• 16-bit instructions
• 32-bit instructions
• Multi-issue, 64-bit instructions
Arithmetic Logic Unit (ALU)
[Figure: Data Arithmetic Unit with registers R0-R7, two 16-bit multipliers, four 8-bit video ALUs, the barrel shifter and the 40-bit accumulators A0 and A1.]
Two ALUs operate on 16-bit, 32-bit, and 40-bit input operands and output 16-bit, 32-bit, and 40-bit results.
Functions:
• Fixed-point addition and subtraction
• Addition and subtraction of immediate values
• Accumulation and subtraction of multiplier results
• Logical AND, OR, NOT, XOR, bitwise XOR, Negate
• Functions: ABS, MAX, MIN, Round, division primitives
Features:
• Supports conditional instructions
• 8-bit video ALU operations
ALU Operations: Single 16-Bit Operations
Single 16-bit addition and subtraction operations:
• Any two 16-bit register halves may be used as inputs.
• One 16-bit result is deposited in the designated 16-bit register half.
General form:
    Dreg_lo_hi = Dreg_lo_hi + Dreg_lo_hi;
Example:
    R6.H = R3.H + R2.L;
[Figure: single 16-bit addition. One half of R2 and one half of R3 (bits 31..16 or 15..0) are added into one half of R6.]
ALU Operations: Dual 16-Bit Operations
Dual 16-bit addition and subtraction operations:
• Any two 32-bit registers may be used as inputs.
• Two 16-bit results are deposited in the designated 32-bit register.
General form:
    Dreg = Dreg +|+ Dreg;    Dreg = Dreg -|- Dreg;
    Dreg = Dreg +|- Dreg;    Dreg = Dreg -|+ Dreg;
Example:
    R6 = R2 +|- R3;
[Figure: dual 16-bit addition. The high halves of R2 and R3 are added and the low halves subtracted; both 16-bit results are packed into R6.]
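The dual 16-bit form can be modelled in Python as two independent 16-bit operations on packed halves. This is a sketch under simplifying assumptions: results wrap modulo 2^16 and no saturation option is modelled; the function name `dual16` is hypothetical.

```python
MASK16 = 0xFFFF

def dual16(a, b, op_hi, op_lo):
    """Model Dreg = a op_hi|op_lo b on packed 16-bit halves (wrapping)."""
    def apply(op, x, y):
        if op not in ("+", "-"):
            raise ValueError("op must be '+' or '-'")
        return (x + y if op == "+" else x - y) & MASK16

    hi = apply(op_hi, (a >> 16) & MASK16, (b >> 16) & MASK16)
    lo = apply(op_lo, a & MASK16, b & MASK16)
    return (hi << 16) | lo   # no carry propagates between the halves
```

For example, `dual16(r2, r3, "+", "-")` corresponds to `R6 = R2 +|- R3;` in this model.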
ALU Operations: Quad 16-Bit Operations
Quad 16-bit addition and subtraction operations:
• Any two 32-bit registers may be used as inputs.
• Four 16-bit results are deposited in two designated 32-bit registers.
General form:
    Dreg = Dreg +|+ Dreg, Dreg = Dreg -|- Dreg;
    Dreg = Dreg +|- Dreg, Dreg = Dreg -|+ Dreg;
Example:
    R3 = R0 +|+ R1, R2 = R0 -|- R1;
[Figure: quad 16-bit addition. The halves of R0 and R1 are added into R3 and subtracted into R2.]
ALU Operations: Single 32-Bit Operations
Single 32-bit addition and subtraction operations:
• Any two 32-bit registers may be used as inputs.
• One 32-bit result is deposited in the designated 32-bit register.
General form:
    Dreg = Dreg + Dreg;
    Dreg = Dreg - Dreg;
Example:
    R6 = R2 + R3;
[Figure: 32-bit addition. R2 and R3 (bits 31..0) are added into R6.]
ALU Operations: Dual 32-Bit Operations
Dual 32-bit addition and subtraction operations:
• Any two 32-bit registers may be used as inputs.
• Two 32-bit results are deposited in the two designated 32-bit registers.
General form:
    Dreg = Dreg + Dreg, Dreg = Dreg - Dreg;
Example:
    R3 = R1 + R2, R4 = R1 - R2;
[Figure: dual 32-bit operation. R1 and R2 are added into R3 and subtracted into R4.]
Other ALU Operations
Rounding instructions:
    Dreg_lo_hi = Dreg (RND);    e.g. R1.L = R5 (RND);
Pointer register example instructions:
    P5 = P3 + P0;    // add two 32-bit pointer registers
    P5 += -4;        // add immediate value to P register
Video ALU instructions:
• 4 adds or subtracts with 8-bit inputs
32-bit ALU Logical Operations
AND    General form: Dreg = Dreg & Dreg;    Example: R4 = R4 & R3;
NOT    General form: Dreg = ~Dreg;          Example: R3 = ~R4;
OR     General form: Dreg = Dreg | Dreg;    Example: R4 = R4 | R3;
XOR    General form: Dreg = Dreg ^ Dreg;    Example: R4 = R4 ^ R3;
ASTAT Register
Arithmetic status is held in the ASTAT register:
  AZ        Zero Result
  AN        Negative Result
  AC0       ALU0 Carry
  AC1       ALU1 Carry
  AV0       A0 Overflow
  AV1       A1 Overflow
  CC        Condition Code
  AQ        Quotient Bit
  RND_MOD   Rounding Mode
Condition Code (CC) Bit in ASTAT
• The CC bit is used in several instructions; the action taken depends on the value of CC
    IF CC JUMP here;    // if CC = 1, jump to label "here"
    IF CC R3 = R0;      // perform move if CC = 1
• The CC bit value is based on a comparison of two registers, pointers or accumulators
    CC = R3 == R2;
    CC = R3 < R2;
• The CC bit can be moved to and from a data register or ASTAT bit
    CC = R1;
    R1 = CC;
    CC = AZ;
Multiply-Accumulators (MAC)
[Figure: Data Arithmetic Unit with registers R0-R7, two 16-bit multipliers, four 8-bit video ALUs, the barrel shifter and the 40-bit accumulators A0 and A1.]
Two identical MACs:
• Each can perform fixed-point multiplication and multiply-and-accumulate operations on 16-bit fixed-point input data, and outputs 32-bit or 40-bit results depending on the destination.
Functions:
• Multiplication
• Multiply-and-accumulate with addition (optional rounding)
• Multiply-and-accumulate with subtraction (optional rounding)
• Dual versions of the above
Features:
• Saturation of accumulator results
• Optional rounding of multiplier results
Multiplication Modes: Fractional Mode
Mode 1: fractional mode
• Multiplier assumes all numbers are in 1.15 format
• Multiplier automatically shifts the product 1 bit left before accumulation (result forced to 1.31 format)
Example:
    A0 = R0.L * R1.L;    /* no option given: the default data type, signed fractional */
[Figure: R0.L = 0x4000 (0.5) times R1.L = 0x4000 (0.5) gives A0 = 0x00 2000 0000 (0.25), shown across A0.X, A0.H and A0.L; A0.H holds 0x2000. The figure also marks the overflow and underflow ends of the accumulator.]
Multiplication Modes: Integer Mode
Mode 2: integer mode
• Multiplier assumes all numbers are in 16.0 format
• No automatic left shift necessary
Example:
    A0 = R0.L * R1.L (IS);
[Figure: R0.L = 0x4000 (2^14) times R1.L = 0x4000 (2^14) gives A0 = 0x00 1000 0000 (2^28), shown across A0.X, A0.H and A0.L; A0.L holds 0x0000. The figure also marks the overflow end of the accumulator.]
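The difference between the two multiplier modes comes down to the 1-bit left shift. The Python sketch below models a signed 16 x 16 multiply into a 40-bit accumulator; the function names are assumptions, and hardware saturation (for example of 0x8000 * 0x8000 in fractional mode) is deliberately not modelled.

```python
MASK40 = (1 << 40) - 1

def sign16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def multiply(x, y, mode="fractional"):
    """Return the 40-bit accumulator pattern for x * y in the given mode."""
    product = sign16(x) * sign16(y)
    if mode == "fractional":
        product <<= 1          # 1.15 * 1.15 gives 2.30; shift to reach 1.31
    elif mode != "integer":
        raise ValueError("mode must be 'fractional' or 'integer'")
    return product & MASK40    # sign-extended into the 8 extension bits
```

With both inputs at 0x4000 the fractional result is 0x00 2000 0000 and the integer result is 0x00 1000 0000, matching the two figures.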
Multiply Operations: Example Instructions
Example input operand combinations. An accumulator, a data register, or a half-register can be the destination:
    A0 = R2.L * R3.L;
    R0 = R2.L * R3.H;
    R0.H = R2.H * R3.L;
    A1 = R2.H * R3.H;
[Figure: for each instruction, the selected halves of R2 and R3 feed a multiplier whose product goes to A0, A1, R0, or a half of R0.]
The default data type is signed fractional. Options:
• (FU) fractional unsigned
• (IS) integer signed
• (IU) integer unsigned
MAC Operations: Example Instructions
Example input operand combinations:
    A0 += R2.L * R3.L;
    A0 -= R2.L * R3.H;
    A1 += R2.H * R3.L;
    A1 += R2.H * R3.H;
[Figure: for each instruction, the selected halves of R2 and R3 are multiplied and the product is added to or subtracted from A0 or A1.]
Multiply and MAC Operations
When the result is transferred from the accumulator to a 16-bit data register:
    R4.L = (A0 += R2.L * R3.L);
    R4.H = (A1 += R2.L * R3.L);
[Figure: halves of R2 and R3 are multiplied, accumulated into A0 or A1, and the result is copied to a half of R4.]
Multiply and MAC Operations
When the result is transferred from the accumulator to a 32-bit data register:
    R0 = (A0 += R2.L * R3.H);
    R1 = (A1 += R2.L * R3.H);
• When A0 is used, the destination must be an even data register, e.g. R0, R2, R4, R6
• When A1 is used, the destination must be an odd data register, e.g. R1, R3, R5, R7
• In both cases, the accumulate can be removed or replaced by a subtraction
[Figure: halves of R2 and R3 are multiplied, accumulated into A0 or A1, and the result is copied to R0 or R1.]
Dual Multiply Operations Example
Both multipliers can be used in the same operation to double the throughput. The same two 32-bit input registers must be used.
    A1 = R2.H * R3.H, A0 = R2.L * R3.L;
    R0 = R2.H * R3.H, R1 = R2.L * R3.L;
32-bit data register destinations must be used in pairs, e.g. R0:R1, R2:R3, R4:R5 or R6:R7.
[Figure: the halves of R2 and R3 feed both multipliers; the products go to A1 and A0, or to R0 and R1.]
Dual MAC Operations Example
Both MACs can be used in the same operation to double the MAC throughput. The same two 32-bit input registers must be used (R2 and R3 in this example).
    A1 -= R2.H * R3.H, A0 += R2.L * R3.L;
[Figure: the high halves of R2 and R3 are multiplied and subtracted from A1; the low halves are multiplied and added to A0.]
In both cases, the accumulate and subtraction are interchangeable.
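To show why dual MACs double the throughput, the Python sketch below computes a dot product over packed pairs of 16-bit samples, using one accumulator per half as in the dual MAC instruction. It is an illustrative model only: it uses integer mode, accumulates with `+=` on both accumulators, ignores the 40-bit accumulator width and saturation, and the names `pack` and `dual_mac_dot` are assumptions.

```python
def sign16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def pack(high, low):
    """Pack two 16-bit samples into one 32-bit word (high half, low half)."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

def dual_mac_dot(a_words, x_words):
    """Dot product where each loop pass performs two MACs, one per half."""
    acc0 = acc1 = 0
    for a, x in zip(a_words, x_words):
        acc1 += sign16(a >> 16) * sign16(x >> 16)   # A1 += a.H * x.H
        acc0 += sign16(a) * sign16(x)               # A0 += a.L * x.L
    return acc0 + acc1                              # combine the partial sums
```

A vector of N samples then needs only N/2 loop passes, at the cost of one final addition of the two accumulators.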
32-Bit Multiplication Example
This instruction needs 5 CCLK cycles and works only in integer mode:
    R2 *= R3;
[Figure: R2 and R3 are multiplied and the result is written back to R2.]
Barrel-Shifter (Shifter)
[Figure: Data Arithmetic Unit with registers R0-R7, two 16-bit multipliers, four 8-bit video ALUs, the barrel shifter and the 40-bit accumulators A0 and A1.]
The shifter performs bitwise shifting for 16-bit, 32-bit or 40-bit inputs and yields 16-bit, 32-bit, or 40-bit outputs.
Functions:
• Arithmetic Shift: The Arithmetic Shift instruction shifts a registered number a specified distance and direction while preserving the sign of the original number. The sign bit value back-fills the left-most bit positions vacated by the arithmetic right shift.
• Logical Shift: The Logical Shift instruction logically shifts a registered number a specified distance and direction. Logical shifts discard any bits shifted out of the register and backfill vacated bits with zeros.
• Rotate: The Rotate instruction rotates a registered number through the CC bit a specified distance and direction.
• Bit Operations
• Field Extract and Deposit
Arithmetic Shift Example
Immediate shift magnitude:
    R3.L = R0.H >>> 7;             /* arithmetic right shift, half word */
    R5 = R2 << 24 (S);             /* arithmetic left shift */
Registered shift magnitude:
    R3.L = ashift R0.H by R7.L;    /* arithmetic shift, half word */
    A0 = ashift A0 by R7.L;        /* arithmetic shift, accumulator */
Logical Shift Example
Pointer shift, fixed magnitude:
    P3 = P2 >> 1;                  /* pointer right shift by 1 */
    P0 = P1 << 2;                  /* pointer left shift by 2 */
Data shift, immediate shift magnitude:
    R3.L = R0.L >> 4;              /* data right shift, half word register */
    R3 = R0 << 12;                 /* data left shift, 32-bit word */
    A0 = A0 << 7;                  /* accumulator left shift */
Data shift, registered shift magnitude:
    R3.H = lshift R0.L by R2.L;    /* logical shift, half word register */
    A1 = lshift A1 by R7.L;        /* logical shift, accumulator */
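The practical difference between an arithmetic and a logical right shift shows up only for negative values. A minimal Python sketch for 32-bit words follows; the function names are assumptions, and only right shifts by 0 to 31 bits are modelled.

```python
MASK32 = 0xFFFFFFFF

def lshift_right32(value, count):
    """Logical right shift: vacated upper bits are filled with zeros."""
    return (value & MASK32) >> count

def ashift_right32(value, count):
    """Arithmetic right shift: vacated upper bits copy the sign bit."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32   # Python's >> on ints keeps the sign
```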
Rotate Example Instruction
Immediate rotate magnitude:
    R4 = rot R1 by 8;       /* rotate left by 8 */
    A0 = rot A0 by -5;      /* rotate right by 5 */
Registered rotate magnitude:
    R4 = rot R1 by R2.L;    /* rotate by value in R2.L */
    A1 = rot A1 by R7.L;    /* rotate by value in R7.L */
Each bit that rotates out of the register is stored in the CC bit.
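Rotating "through the CC bit" can be pictured as rotating a 33-bit ring made of CC followed by the 32 register bits. The Python sketch below models that reading; it is an interpretation of the description above rather than a cycle-accurate model, and the name `rot32` is hypothetical. Positive counts rotate left, negative counts rotate right.

```python
MASK33 = (1 << 33) - 1

def rot32(value, cc, count):
    """Rotate a 32-bit value through CC; return (new_value, new_cc)."""
    ring = ((cc & 1) << 32) | (value & 0xFFFFFFFF)   # CC sits above bit 31
    shift = count % 33                               # right rotate == left by 33-n
    ring = ((ring << shift) | (ring >> (33 - shift))) & MASK33
    return ring & 0xFFFFFFFF, ring >> 32
```

Rotating left by one moves bit 31 into CC and the old CC into bit 0, which is the behaviour the last sentence above describes.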
Bit Operations Example Instructions
Bit Clear: BITCLR(R2, 3);
Bit Set: BITSET(R2, 7);
Bit Toggle: BITTGL(R2, 24);
Bit Test: CC = BITTST (R7, 15);
Bit Test: CC = !BITTST (R3, 0);
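For readers more used to C-style masks, the four bit operations above correspond to the following Python sketch on 32-bit words. The lower-case function names mirror the instruction names but are otherwise assumptions of this example.

```python
MASK32 = 0xFFFFFFFF

def bitclr(value, bit):
    """Clear one bit, as BITCLR does."""
    return value & ~(1 << bit) & MASK32

def bitset(value, bit):
    """Set one bit, as BITSET does."""
    return (value | (1 << bit)) & MASK32

def bittgl(value, bit):
    """Toggle one bit, as BITTGL does."""
    return (value ^ (1 << bit)) & MASK32

def bittst(value, bit):
    """Return the tested bit (0 or 1), the value BITTST copies into CC."""
    return (value >> bit) & 1
```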
Field Extract and Deposit Example
Bit field extraction:
    R7 = extract (R4, R3.L) (z);    // zero-extended
    R7 = extract (R4, R3.L) (x);    // sign-extended
Bit field deposit:
    R7 = deposit (R4, R3);          // zero-extended
    R7 = deposit (R4, R3) (x);      // sign-extended
Section 4: Memory Management / DMA
Address Registers
[Figure: address register file, bits 31..0 each: I0-I3, L0-L3, B0-B3, M0-M3, P0-P5, FP, SP, USP.]
• One set of general purpose pointer registers
  • P0-P5, SP and FP
• One set of DSP addressing index registers
  • I0-I3, B0-B3, L0-L3, M0-M3
• All addresses are byte addresses
• SP points to the supervisor stack in Supervisor mode and to the user stack in User mode
• USP is accessible in Supervisor mode only; it allows access to the user stack location while in Supervisor mode
Data Types
• 8-bit bytes: signed or unsigned integers
• 16-bit half-words (little endian): signed or unsigned integers, signed fractional (1.15)
• 32-bit words (little endian): signed or unsigned integers, signed fractional (1.31)
Addressing Methods
• Indirect addressing
  • Index registers (32-bit and 16-bit accesses)
  • Pointer registers P0-P5 (32-bit, 16-bit, and 8-bit accesses)
  • Stack and frame pointer registers (32-bit accesses)
• Types of address modify
  • Post-modify
  • Pre-modify using the stack pointer
  • Pre-modify without update
• Circular buffering / modulo addressing
  • Base registers for circular buffer wrap-arounds
• Bit reversal
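Circular buffering can be sketched in Python as a post-modify step that wraps the index back into the buffer. The model below assumes the usual roles of the Index, Modify, Base and Length registers, treats a length of zero as "circular addressing disabled", and assumes the modifier is no larger than the buffer length; the function name `circular_update` is hypothetical.

```python
def circular_update(index, modify, base, length):
    """Return the index after a post-modify, wrapped into [base, base+length)."""
    if length == 0:
        return index + modify          # linear addressing, no wrap-around
    index += modify
    if index >= base + length:
        index -= length                # ran past the end: wrap to the start
    elif index < base:
        index += length                # ran before the start: wrap to the end
    return index
```

Stepping through an 8-byte buffer at a hypothetical address 0x1000 in 4-byte steps alternates between 0x1000 and 0x1004 without any compare-and-branch in the inner loop.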
Post-Modify and Pre-Modify Operations
Post-modify instructions:
• 32-bit accesses
    R0 = [P0++];         /* increments P0 by 4 after the read */
    R0 = [P1 ++ P2];     /* increments P1 by P2 after reading a 32-bit word from P1 */
• 16-bit accesses
    R0 = W[I0--];        /* decrements I0 by 2 after the read */
    R0 = W[I2 ++ M2];    /* increments I2 by M2 after reading a 16-bit word from I2 only */
• 8-bit accesses
    R0 = B[P0++];        /* increments P0 by 1 after the read */
    R2 = B[P4 ++ P5];    /* increments P4 by P5 after reading a byte from P4 only */
Analogous store instructions exist.
The only pre-modify instruction with update uses the Stack Pointer:
    [--SP] = R0;         /* decrements SP by 4, then writes the value in R0 to the updated location in SP */
Stack Instructions
Push instruction: [--SP] = src_reg;
• The push instruction stores the contents of a specified register or registers in the stack
• The instruction first pre-decrements the stack pointer to the next available location in the stack
• The push multiple instruction allows multiple registers to be placed on the stack with a single instruction
Pop instruction: dest_reg = [SP++];
• The pop instruction loads the contents of the stack indexed by the current stack pointer into a specified register
• The instruction post-increments the stack pointer to the next occupied location in the stack before concluding
• The pop multiple instruction allows multiple registers to be popped from the stack with a single instruction
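The pre-decrement push and post-increment pop can be modelled with a few lines of Python. The `Stack` class, its dictionary-backed memory and the start address used in the usage note are assumptions of this sketch; only 32-bit pushes and pops are modelled.

```python
class Stack:
    """Descending stack: SP always points at the last occupied 32-bit slot."""

    def __init__(self, top):
        self.sp = top        # initial stack pointer (one past the first slot)
        self.memory = {}     # byte address -> 32-bit word

    def push(self, value):
        """Model [--SP] = value: decrement SP by 4, then store."""
        self.sp -= 4
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        """Model value = [SP++]: load, then increment SP by 4."""
        value = self.memory[self.sp]
        self.sp += 4
        return value
```

For example, starting from a hypothetical `Stack(0xFFB01000)`, two pushes leave SP at 0xFFB00FF8, and two pops return the values in reverse order and restore SP.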
Memory & Interface
[Figure: memory overview.
 Internal memory: 32K bytes instruction ROM; 32K bytes instruction SRAM; 16K bytes instruction SRAM/cache; 32K bytes data SRAM/cache; 4K bytes scratchpad SRAM.
 External memory, reached through the External Memory Interface (labelled 16): four 1M byte asynchronous banks and 16M to 128M bytes of synchronous memory.]
BF531 Memory Levels
Internal L1 memory:
• Closest to the processor
• Can be configured as cache or SRAM
• Smallest memory capacity (32KB instruction, 20KB data)
• Single-cycle access
External L2 memory:
• Off chip
• Largest memory capacity (synchronous and asynchronous)
• Slowest access time
Configurable Memory
• As processor speeds increase (300 MHz to 1 GHz), it becomes increasingly difficult to have large memories running at full speed.
• Two methods can be used to fill the L1 memory: caching and dynamic downloading. Blackfin supports both.
• Microcontrollers have typically used the caching method, as they have large programs often residing in external memory and determinism is not as important.
• DSPs have typically used dynamic downloading, as they need direct control over which code runs in the fastest memory.
• Blackfin allows the programmer to choose one or both methods to optimize system performance.
BF531 Internal SRAM
[Figure: the processor connects to four L1 blocks: L1 Code SRAM/Cache (32KB), L1 Data Bank A SRAM/Cache (8KB), L1 Data Bank B SRAM/Cache (8KB), and L1 Scratch SRAM (4KB). Address ranges marked in the figure: 0xFF80 0000 to 0xFF80 3FFF, 0xFF90 0000 to 0xFF90 3FFF, 0xFFA0 0000 to 0xFFA0 3FFF, and 0xFFB0 0000 to 0xFFB0 0FFF.]
Why Do Blackfin DSPs Have Cache?
To allow users to take advantage of single-cycle memory without having to move instructions and/or data "manually":
• On-chip and off-chip L2 memory can be used to hold large programs and data sets
• The paths to and from L1 memory are optimized to perform with cache enabled
• Cache automatically optimizes code that reuses recently used or nearby data
Memory hierarchy:
• Internal L1 memory: smallest capacity, single-cycle access
• Internal L2 memory: larger capacity, higher latency
• External L2 memory: largest capacity, highest latency
L1 Instruction Memory
16 KB cache:
• 4-way set associative with arbitrary locking of ways
• LRU replacement
• No DMA access when configured as cache
16 KB SRAM:
• Four 4KB single-ported sub-banks
• Allows simultaneous DMA access to different banks
[Figure: instruction, DMA and fill ports connect to four 4KB sub-banks.]
L1 Data Memory
• Two 16KB banks
• Each bank can be cache or SRAM
• 4KB scratch SRAM (the stack can be located here for fast context switching)
[Figure: ports Data 0, Data 1, DMA A, DMA B, Fill A and Fill B connect to 16KB Bank A, 16KB Bank B and the 4KB SRAM.]
16KB Super-bank Data Architecture
• Four 4KB single-ported sub-banks
• Multi-ported data access when using different sub-banks
When used as cache:
• Each bank is 2-way set-associative
• No DMA access
When used as SRAM:
• Dual data access
• DMA access
[Figure: ports Data 0, Data 1, DMA and Fill connect to four 4KB sub-banks.]
Cache Hits and Misses
• A cache hit occurs when the address for an instruction fetch request from the core matches a valid entry in the cache.
• When a cache hit occurs, the target 64-bit instruction word is sent to the instruction alignment unit, where it is stored in one of two 64-bit instruction buffers.
• When a cache miss occurs, the instruction memory unit generates a cache line-fill access to retrieve the missing cache line from internal or external L2 memory.
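The hit/miss decision and LRU replacement can be illustrated with a small Python model of a set-associative cache. The defaults follow the 16 KB, 4-way instruction cache described earlier; the 32-byte line size is an assumption of this sketch (the slides do not state it), as are the class and method names. Way locking is not modelled.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Tag store of a set-associative cache with LRU replacement."""

    def __init__(self, size=16 * 1024, ways=4, line=32):
        self.ways = ways
        self.line = line
        self.num_sets = size // (ways * line)
        # one ordered map per set: oldest (least recently used) tag first
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        line_number = address // self.line
        index = line_number % self.num_sets
        tag = line_number // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)       # mark as most recently used
            return True
        if len(entries) == self.ways:
            entries.popitem(last=False)    # evict the least recently used way
        entries[tag] = True                # line fill from L2 memory
        return False
```

With these defaults, addresses 4096 bytes apart map to the same set, so a fifth such address evicts the least recently used of the first four.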
Cacheability Protection Lookaside Buffers (CPLBs)
• Divide the entire Blackfin memory map into regions that have cacheability and protection properties
• 16 pages in instruction memory plus 16 pages in data memory
• Page sizes: 1KB, 4KB, 1MB, 4MB
• Each CPLB has 2 associated registers:
  • 32-bit start address: ICPLB_ADDRx, DCPLB_ADDRx
  • Cache/protection properties: ICPLB_DATAx, DCPLB_DATAx
[Figure: memory pages starting at 0x0000, 0x1000 and 0x1400 are described by CPLB0, CPLB1 and CPLB2, using ICPLB_ADDR0/ICPLB_DATA0, ICPLB_ADDR1/ICPLB_DATA1 and ICPLB_ADDR2/ICPLB_DATA2.]
Using CPLBs
• Cache enabled: CPLBs must be used to define cacheability properties
• Cache disabled: CPLBs can be used to protect pages of memory
• In both cases, a valid CPLB must exist before an access to a specific memory location is attempted. Otherwise, an exception is generated.
• User and Supervisor mode protection is available without using CPLBs.
Example Protection Operation
• Set up CPLBs to define regions and properties, including CPLBs to protect system registers and MMRs and to assign exception handling memory
• Disable all memory other than the desired memory space
• Execute code
• If code tries to access memory that has been disabled or protected, a memory protection violation occurs as an exception
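The protection flow above can be sketched as a table lookup in Python. Everything here is a simplified model: the `Cplb` tuple, its `cacheable` and `writable` fields and the `ProtectionViolation` exception are hypothetical names, not the layout of the real ICPLB/DCPLB registers.

```python
from collections import namedtuple

# One entry per page: start address, page size in bytes, and two properties
Cplb = namedtuple("Cplb", ["start", "size", "cacheable", "writable"])

class ProtectionViolation(Exception):
    """Stands in for the exception raised on an illegal access."""

def check_access(cplbs, address, write=False):
    """Return the cacheability of the page holding address, or raise."""
    for entry in cplbs:
        if entry.start <= address < entry.start + entry.size:
            if write and not entry.writable:
                raise ProtectionViolation("write to protected page")
            return entry.cacheable
    raise ProtectionViolation("no valid CPLB covers this address")
```

An access is allowed only if some valid entry covers the address, which mirrors the rule that a valid CPLB must exist before the access is attempted.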
Example CPLB Setup
[Figure: two columns, "Instruction CPLB setup" and "Data CPLB setup". The column assignment of each entry is not recoverable; the entries shown are, in extraction order:
 • Async: non-cacheable, one 4MB page
 • SDRAM: cacheable, eight 4MB pages
 • L1 Instruction: non-cacheable, 1MB page
 • L2 Memory: cacheable, 1MB page
 • PCI: non-cacheable, two 4MB pages
 • Async: non-cacheable, one 4MB page
 • SDRAM: cacheable, eight 4MB pages
 • MMRs: non-cacheable, 4MB page
 • L1 Data: non-cacheable, one 4MB page
 • Scratchpad: non-cacheable, 4KB page
 • L2 Memory: cacheable, 1MB page
 • Async: cacheable, two 4MB pages
 • Async: cacheable, one 4MB page
 • Reset area (e.g. 0xEF000000): 4MB page]
Direct Memory Access (DMA)
• Transfers data within memory or between memory and a peripheral
• Core-independent 8-bit, 16-bit, or 32-bit transfers
• Support for separate data and error interrupts
• Support for two-dimensional (2D) DMA
  • Interrupt at end of row or buffer
• Descriptor-based DMA
  • Set of registers stored within memory
  • Flexible descriptor structure
  • Can chain together multiple DMA sequences
• Register-based DMA
  • Directly program DMA control registers
  • On completion, the original values can automatically be updated for continuous transfer
12 Prioritized DMA Channels
• Default DMA channel priorities exist (PPI is highest)
• These priorities can be reprogrammed based on the application

DMA Channel             Default Peripheral Mapping   Reprogrammable
0 (highest priority)    PPI                          YES
1                       SPORT0 RX                    YES
2                       SPORT0 TX                    YES
3                       SPORT1 RX                    YES
4                       SPORT1 TX                    YES
5                       SPI                          YES
6                       UART RX                      YES
7                       UART TX                      YES
8                       MemDMA Stream 0 TX           NO
9                       MemDMA Stream 0 RX           NO
10                      MemDMA Stream 1 TX           NO
11 (lowest priority)    MemDMA Stream 1 RX           NO
ADSP-BF531/2/3 DMA Performance Increases
• DMA controller write-back to memory descriptor not required
  • Reduces core intervention
• Autobuffer available in all channels
  • Increases performance and eliminates core interaction
• Descriptor load time GREATLY reduced
• Memory DMA throughput increased
  • Write from source to destination as soon as it is read
• Additional memory DMA channels
  • Reduces core intervention
• Prioritized DMA channels
  • Default DMA channel priorities will exist (PPI is highest)
  • Customer will be able to re-program these priorities based on application (similar to the way interrupts can be re-programmed)
ADSP-BF531/2/3 DMA Engine
• Two descriptor-based methods
  • “Linked-list” model (similar to ADSP-BF535)
  • “Array of descriptors” model (descriptors follow one another until the array ends)
• Register-based
  • Autobuffer (similar to ADSP-BF535)
• Interrupts
  • Separate levels for DMA completion and DMA error
  • Allows separate ISRs for each
ADSP-BF531/2/3 Descriptor Enhancements
• Flexible size allows up to 9 elements within each descriptor
  • Customer can use only what is needed
• In addition to the ADSP-BF535 source, destination, and number of transfers model, the ADSP-BF531/2/3 DMA controller supports:
  • One-dimensional strides (e.g. skip every n samples)
  • Two-dimensional transfers (Xcount, Xmodify, Ycount, Ymodify)
Scalable Descriptors of the ADSP-BF531/2/3
• DMA descriptors are variable-sized data structures whose contents are loaded into DMA registers.
• The sequence of registers in the descriptor is fixed, while the length of the descriptor is programmable.
• The user may select a descriptor size from one register (Base Address Low) to nine registers (all the DMA parameters).
• The three variations depend on what kind of Next Descriptor Pointer is included:
  • None (called "Descriptor Array mode")
  • The low 16 bits of the Next Descriptor pointer (Descriptor List, Small Model)
  • All 32 bits of the Next Descriptor pointer (Descriptor List, Large Model)
ADSP-BF531/2/3 Descriptor Definition
• NextDscPtrLow, NextDscPtrHigh: link pointer to next descriptor
• BaseAddrLow, BaseAddrHigh: base address of current buffer
• DMA Configuration Register
• Xcount: inner loop count
• Xmodify: inner loop address increment, in bytes (a signed, 2's complement value)
• Ycount: outer loop count (2D only)
• Ymodify: outer loop address increment, in bytes (a signed, 2's complement value)
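The way the count and modify values combine in a 2-D transfer can be sketched as an address generator in Python. This model assumes that Ymodify is applied in place of Xmodify after the last element of each row; that reading, the function name `dma_2d_addresses` and the example frame layout are assumptions of this sketch.

```python
def dma_2d_addresses(start, xcount, xmodify, ycount, ymodify):
    """Return the byte addresses visited by a 2-D DMA transfer, in order."""
    addresses = []
    address = start
    for _row in range(ycount):
        for column in range(xcount):
            addresses.append(address)
            last_in_row = column == xcount - 1
            # the end-of-row step uses Ymodify, every other step uses Xmodify
            address += ymodify if last_in_row else xmodify
    return addresses
```

For instance, pulling a 2 x 2 block of bytes out of a hypothetical frame with 8-byte rows uses Xcount = 2, Xmodify = 1, Ycount = 2 and Ymodify = 7, which visits offsets 0, 1, 8 and 9.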
Examples of ADSP-BF531/2/3 “Flex DMA”
• A single, linear buffer that stops upon completion
• A linear buffer with stride greater than 1
• A circular, auto-refreshing buffer that interrupts on each full buffer
• A similar buffer that interrupts on fractional buffers (e.g. 1/2, 1/4)
• 1-D DMA using a set of identical ping-pong buffers defined by a linked ring of 3-word descriptors, each containing { link pointer, 32-bit address }
• 1-D DMA using a linked list of 5-word descriptors containing { link pointer, 32-bit address, length, config } (BF535 style)
• 2-D DMA using an array of 1-word descriptors specifying only the base DMA address within a common data page
• 2-D DMA using a linked list of 9-word descriptors specifying everything
Operation of 2-D Direct Memory Access
[Figure: data elements A through P are captured and stored in linear L2 memory; a 2-D DMA with programmable X and Y count and modify values then moves the block A, B, I, J to L1 memory.]
2-D DMA significantly decreases software overhead in video applications!
Section 5: Program Sequencer / Interrupt Manager
Program Sequencer Features
The Program Sequencer controls all program flow:
Maintains Loops, Subroutines, Jumps, Idle, Interrupts and Exceptions
Contains an 8-stage instruction pipeline
Includes Zero-Overhead Loop Registers
Stage DSP Mars 2011, Lyon
Sequencer-Related Registers

Register Name       Description
RETX                Return Address Register: Exception Return
RETN                Return Address Register: NMI Return
RETI                Return Address Register: Interrupt Return
RETE                Return Address Register: Emulation Return
RETS                Return Address Register: Subroutine Return
LC0, LC1            Zero-Overhead Loop Registers: Loop Counters
LT0, LT1            Zero-Overhead Loop Registers: Loop Tops
LB0, LB1            Zero-Overhead Loop Registers: Loop Bottoms
FP, SP              Frame Pointer, Stack Pointer
SYSCFG              System Configuration Register
CYCLES, CYCLES2     Cycle Counters
Stage DSP Mars 2011, Lyon
Program Flow Instructions

Instruction                  Function
JUMP                         Unconditional branch
IF CC JUMP, IF !CC JUMP      Conditional branch
CALL                         Subroutine call
RTS, RTI, RTX, RTN, RTE      Return from subroutine, interrupt, exception, NMI, emulation
LSETUP                       Set up hardware loop

JUMP (P5);            /* indirect jump instruction */
JUMP (PC + P3);       /* indirect jump with offset (PC-relative) */
CALL (P5);            /* RETS register is loaded with address of instruction after call */
CALL (PC + P3);       /* RETS register is loaded with address of instruction after call */
IF CC JUMP <label>;   /* jump on condition CC = 1 */
CALL <label>;         /* OK within 24-bit offset from PC */
Stage DSP Mars 2011, Lyon
Conditional Execution – CC Bit
Condition Code Flag (CC bit) resolves branch direction
5 ways to access CC to control program flow:
• Conditional branch is resolved by value of CC
• Dreg value can be copied to CC, and vice versa
• Status flag can be copied into CC, and vice versa, e.g. CC = AV1;
• CC can be set to result of a Preg comparison
• CC can be set to result of a Dreg comparison, e.g. CC = R3 == R2;
Stage DSP Mars 2011, Lyon
SSYNC and CSYNC instructions

The SSYNC instruction synchronizes "the System": it executes everything in the processor pipeline and completes all pending reads and writes from peripherals. Until SSYNC completes, no further instructions can enter the pipeline.

The CSYNC instruction synchronizes "the Core": it executes everything in the processor pipeline. CSYNC is typically used after Core MMR writes to prevent imprecise behavior.
Stage DSP Mars 2011, Lyon
Multi-Cycle Instructions

Multi-cycle instructions will not execute faster through rescheduling.
! See EE-171 Appnote for a complete list of stalls and multicycle instructions

A 32-bit multiply operation is available:
    R0 *= R1;                 // 5 cycles to execute

The Push Multiple and Pop Multiple instructions take n cycles to complete, where n is the number of registers pushed or popped, assuming L1 memory:
    [--SP] = (R7:0, P5:0);    // 14 cycles to execute
Stage DSP Mars 2011, Lyon
10-stage super-pipeline
The sequencer ensures that the pipeline is fully interlocked and that all the data hazards are hidden from the programmer
If executing an instruction that requires data to be fetched, the pipeline will stall until that data is available
BF533 Execution Pipeline
Stage DSP Mars 2011, Lyon
Instruction Pipeline

Pipeline Stage                Description
Instruction Fetch 1 (IF1)     Start instruction memory access
Instruction Fetch 2 (IF2)     Finish L1 instruction memory access and align instruction
Instruction Decode (DEC)      Decode instruction
Address Calculation (AC)      Calculate data addresses and branch target address
Execute 1 (EX1)               Read data and start access of data memory
Execute 2 (EX2)               Finish access of data memory and start execution of dual-cycle instructions
Execute 3 (EX3)               Execute single-cycle instruction
Write Back (WB) / Commit      Write states to Data and Pointer register files and process events
Stage DSP Mars 2011, Lyon
Pipeline Events
• Stall
  − A latency stall condition can occur when two instructions require extra cycles to complete because they are close to each other in the assembly program. Other stalls can be memory- or loop-related. Stalls can be diagnosed with the Pipeline Viewer, and can be remedied with some rescheduling.
• Kill
  − Instructions after a branch are invalidated in the pipeline, because they will have entered the pipeline before the actual branch instruction gets serviced.
• Multicycle instruction
  − Some instructions take more than one cycle to complete. These extra cycles cannot be avoided without removing the instruction that caused them.
! See EE-171 Appnote for a complete list of stalls and multicycle instructions.
Stage DSP Mars 2011, Lyon
Some Examples of Stall Conditions
• Use of a Preg loaded in the previous instruction causes a 3-cycle stall:
    P0 = [P1++];
    R0 = [P0];
• Use of a Preg which was transferred from a Dreg in the previous instruction causes a 3-cycle stall:
    P0 = R0;
    P1 = P0 + P2;
• Use of a DAG register which was transferred from a Dreg in the previous instruction causes a 3-cycle stall:
    I0 = R0;
    R1 = [I0++];
• Back-to-back multiplication where the result of the first multiplication is used as an operand of the second multiplication causes a 1-cycle stall:
    R0 = A1 += R1.L * R2.L;
    R1 = A1 += R0.L * R2.L;
Stage DSP Mars 2011, Lyon
Avoiding Pipeline Stalls
Most common numeric operations have no instruction latency
Application note EE-171 available on avoiding stalls
Gives instruction combinations with associated stall info
VDSP++ 3.0 Pipeline Viewer highlights Stall, Kill conditions
Stage DSP Mars 2011, Lyon
Change of Instruction Flow
When a change of flow happens, a new address is presented to the Instruction Memory Unit
There will be a minimum of three cycles before the new instructions appear in the decoder (except when utilizing the hardware loop buffers)
When an instruction in a given pipeline stage is killed, all the instructions in stages above it will also be killed
Stage DSP Mars 2011, Lyon
Hardware Loop Buffers
• The BF53x DSP provides two sets of dedicated registers to support two zero-overhead nested loops
• One way to load these registers is by using the Loop Setup (LSETUP) instruction
• If the desired loop size exceeds the largest LSETUP size in the table above, LT[1:0], LB[1:0], LC[1:0] can be set manually
• If more than 2 nested loops are required, the stack must be used
Stage DSP Mars 2011, Lyon
Operating Modes
User mode causes exceptions when protected resources are accessed. May be used for algorithm/application code
Supervisor mode has unprotected access to all resources. May be used for O/S kernel, device drivers, debuggers, ISRs
Emulator (or Debug) mode has supervisor abilities and is accessible via JTAG
Operating Modes provide a feature to implement RTOS architectures and multitasking schemes. Smaller applications may simply run in Supervisor mode all the time.
Stage DSP Mars 2011, Lyon
Operating Modes
[Figure: mode transition diagram with the states RESET, Supervisor mode (system code and event handlers), User mode (application code), Emulator/Debug mode and Sleep. Transition labels: Reset inactive; Interrupt or Exception; RTI, RTX; Emulation Event; RTE; Interrupt]
Stage DSP Mars 2011, Lyon
Events (Interrupts / Exceptions)
The Event Controller manages 5 types of Events:
• Emulation (via SW or external pin)
• Reset (via SW or external pin)
• Non-Maskable Interrupt (NMI), for events that require immediate processor attention (via SW or external pin)
• Exception
• Interrupts
  − Global Interrupt Enable
  − Hardware Error
  − Core Timer
  − 9 General-Purpose Interrupts for servicing peripherals
Stage DSP Mars 2011, Lyon
Event Processing Flow
[Figure: event processing flow for "Event A". System interrupt sources: RTC, USB, PCI, SPORT0 RX, SPORT0 TX, SPORT1 RX, SPORT1 TX, SPI0, SPI1, UART0 RX, UART0 TX, UART1 RX, UART1 TX, TIMER0, TIMER1, TIMER2, PF A, PF B, MEMDMA, WATCHDOG. System-level blocks: SIC_ISR, SIC_IWR ("Awake from IDLE!"), SIC_IMASK, SIC_IARx. Core events: EMU, RST, NMI, EVSW, IVHW, IVTMR, IVG7 to IVG15. Core-level blocks: ILAT, IMASK, IPEND, EVT, core pipeline. Callouts: keep "ILAT" cleared ("0"), clear "IPEND", "Clear SIC_ISR"]

Interrupt service routine shown in the figure (at 0xFF80 0000):
    [--SP] = P4;
    [--SP] = R4;
    R4.L = 0x1010;
    P4.H = HI(FLAG_C);
    P4.L = LO(FLAG_C);
    W[P4] = R4;
    R4 = [SP++];
    P4 = [SP++];
    RTI;

Disabling and re-enabling interrupts:
    CLI R0;   // "0000"
    STI R0;   // IMASK = R0;
Stage DSP Mars 2011, Lyon
Event Processing Flow
1. SIC_ISR logs Interrupt A's request and tracks system interrupts that are asserted but not yet serviced
2. SIC_IWR checks if it should wake the core from an idled state based on this interrupt request
3. SIC_IMASK masks off or enables interrupts from peripherals at the system level. If Interrupt A isn't masked, request proceeds to Step 4
4. SIC_IARx MMRs determine core priority of Interrupt A
5. ILAT adds Interrupt A to its log of interrupts latched by the core but not yet being serviced
6. IMASK masks off or enables events of different core priorities. If IVGx event corresponding to Interrupt A isn't masked, request proceeds to Step 7
7. Event Vector Table (EVT) accessed to look up appropriate vector for Interrupt A's interrupt service routine (ISR)
8. When event vector for Interrupt A has entered core pipeline, the appropriate IPEND bit is set, which clears the respective ILAT bit.
9. When Interrupt A's ISR is finished, RTI instruction clears the appropriate IPEND bit. But the relevant SIC_ISR bit isn't cleared unless the ISR clears the mechanism that generated Interrupt A.
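The nine steps can be traced with a small bit-mask model. This is a simplified host-side sketch, not the hardware's behaviour in every detail: the struct `event_model` and all function names are hypothetical, the wake-up path through SIC_IWR is left out, and the only priority rule modelled is that a lower IVG number means a higher priority.

```c
#include <stdint.h>

#define NUM_SYS_IRQS 24u  /* assumed number of peripheral interrupt sources */

/* Hypothetical model of the registers named in the steps above. */
typedef struct {
    uint32_t sic_isr;               /* system interrupts asserted, not yet cleared */
    uint32_t sic_imask;             /* 1 = peripheral interrupt enabled            */
    uint8_t  sic_iar[NUM_SYS_IRQS]; /* IVG number assigned to each peripheral      */
    uint16_t ilat;                  /* events latched by the core                  */
    uint16_t imask;                 /* 1 = core event enabled                      */
    uint16_t ipend;                 /* events currently being serviced             */
} event_model;

/* Steps 1, 3, 4 and 5: unmasked system interrupts are mapped to an IVG level
 * and latched in ILAT. */
void latch_system_irqs(event_model *m)
{
    uint32_t passed = m->sic_isr & m->sic_imask;
    for (unsigned i = 0; i < NUM_SYS_IRQS; i++) {
        if (passed & (1u << i)) {
            m->ilat |= (uint16_t)(1u << m->sic_iar[i]);
        }
    }
}

/* Step 6: returns the highest-priority latched and unmasked event, or -1 if
 * none may run because an equal or higher-priority event is being serviced. */
int next_event(const event_model *m)
{
    uint16_t ready = m->ilat & m->imask;
    for (int ivg = 0; ivg < 16; ivg++) {
        if (m->ipend & (1u << ivg)) {
            return -1;
        }
        if (ready & (1u << ivg)) {
            return ivg;
        }
    }
    return -1;
}

/* Step 8: entering the ISR sets the IPEND bit and clears the ILAT bit. */
void enter_isr(event_model *m, int ivg)
{
    m->ipend |= (uint16_t)(1u << ivg);
    m->ilat  &= (uint16_t)~(1u << ivg);
}

/* Step 9: RTI clears the IPEND bit; SIC_ISR is deliberately left untouched,
 * because only clearing the interrupt source in the peripheral resets it. */
void return_from_isr(event_model *m, int ivg)
{
    m->ipend &= (uint16_t)~(1u << ivg);
}
```

A typical trace would set one bit in `sic_isr`, call `latch_system_irqs`, pick the level with `next_event`, and then call `enter_isr` and `return_from_isr` around the handler. If the handler never clears the peripheral's request, the `sic_isr` bit stays set and the next call to `latch_system_irqs` latches the event again, which is the situation step 9 warns about.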
Stage DSP Mars 2011, Lyon
Nested Interrupts

The state of the processor needs to be saved onto the stack:

ISR:
    [--SP] = RETI;              // Interrupts enabled
    [--SP] = ASTAT;
    [--SP] = FP;
    [--SP] = (Rx,Ax,Px,Ix);
    ...
    (Rx,Ax,Px,Ix) = [SP++];
    FP = [SP++];
    ASTAT = [SP++];
    RETI = [SP++];              // Interrupts disabled
    SSYNC;                      // Wait until RETI load takes effect
    RTI;                        // Interrupts enabled
Section 6: I/O Embedded
Stage DSP Mars 2011, Lyon
Serial Ports
[Figure: SPORT signals. Transmit side: Primary TX, Secondary TX, Tx Clock, Tx Sync. Receive side: Primary RX, Secondary RX, Rx Clock, Rx Sync]
• Two dual-channel synchronous ports supporting 8 stereo I2S channels
• Supports 3- to 32-bit data widths
• 100 MHz operation from external clock
• SCLK/2 operation from internal clock (up to 66 MHz)
Stage DSP Mars 2011, Lyon
ADSP-21533 SPORTs
• Two synchronous serial ports
• Fully independent receive and transmit, double buffered
• Internal or externally generated serial clocks and frame syncs
• Programmable internal/external frame syncs
• Built-in hardware for µ-law & A-law companding
• Support for multi-channel TDM interfaces
• Dedicated DMA engine capable of chained transfers
• Generates optional interrupts
• Operates up to 1/2 System bus clock rate (SCLK)
Stage DSP Mars 2011, Lyon
ADSP-21533 Serial PORTs Features
Interrupt-driven, single-word transfers to/from on-chip memory controlled by ADSP-21535 core
Block word transfers to/from memory controlled by DMA controller
Several modes of operation:
• Programmable serial word length, 3 to 16 bits
• Either MSB or LSB first
• Early Frame Sync
• Late Frame Sync
• No Frame Sync
• 128 time slot multi-channel capability for TDM interfaces (for example, a T1 interface)
Stage DSP Mars 2011, Lyon
Serial Port - Block Diagram
Stage DSP Mars 2011, Lyon
Serial Clock Divisor
Used For Internally generated Serial Clock
Frame Sync Divisor
Used for internally generated Frame Sync
Number of Serial CLK cycles between FS assertions (= xFSDIV + 1)
Serial Clock
Stage DSP Mars 2011, Lyon
Serial Port Timing Characteristics: Early vs. Late Framing

Early framing: frame sync precedes data by one cycle
Late framing: frame sync checked on first bit only

[Figure: timing diagram showing xCLK, Late Frame Sync, Early Frame Sync and Data (B3 B2 B1 B0 ...)]
Stage DSP Mars 2011, Lyon
Multichannel Operation
[Figure: timing diagram showing RCLK, DT = DR (bits D3 D2 D1 D0 in each of Slot 0, Slot 1 and Slot 2), RFS and TFS. Example: receive on Slots 0 and 2, transmit on Slot 1]

TDM method where serial data is sent or received on different channels sharing the same serial bus
• Up to 128 channels can be enabled for transmit or receive
• RFS signals start of frame
• TFS is used as Transmit Data Valid (TDV) for external logic. Active only during transmit channels
Stage DSP Mars 2011, Lyon
Example for Multichannel Connection
[Figure: three ADSP-21535 devices (Master, Slave 1, Slave 2) on a shared multichannel serial bus; each device shows RCLK, TCLK, RFS, TFS (TDV), DT and DR pins]
Stage DSP Mars 2011, Lyon
BF53x SPI Features
• 1 SPI-Compatible Ports
• 4 Pin Interface (MOSI, MISO, ~SPISS, SCK)
• Master and Slave Mode Operation
  − Supports Multimaster Environments
• Can Use 16 GP Flag Pins As Slave-Select Lines
  − 2 Slave Select Input Pins (One Per SPI Port)
  − 14 Slave Select Output Pins (Seven Per SPI Port)
• Gated SPI Clock (Only Active During Transfers)
• DMA Support
  − Each Port Has One DMA Channel (Input or Output)
• Programmable Baud Rate
• Programmable Clock Polarity and Phase
• Programmable Serial Word Length (8 or 16 Bits)
Stage DSP Mars 2011, Lyon
A Closer Look At How The Data Is Moved
Shift Registers Simultaneously Shift Data In And Out
Stage DSP Mars 2011, Lyon
Universal Asynchronous Receiver/Transmitter Interface (UART)
UART options
• 5-8 data bits
• 1, 1½ or 2 stop bits
• None, even or odd parity
• Baud rate = SCLK / (16 * DIVISOR)
BF 53x
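The baud-rate relation can be turned around to pick a divisor for a target rate. The helpers below are a minimal sketch with hypothetical names (`uart_divisor`, `uart_actual_baud`); they only apply Baud rate = SCLK / (16 * DIVISOR) and round to the nearest integer divisor, and they do not touch any UART register.

```c
#include <stdint.h>

/* Nearest integer DIVISOR for a target baud rate, from
 * baud = SCLK / (16 * DIVISOR). Both arguments must be non-zero. */
uint32_t uart_divisor(uint32_t sclk_hz, uint32_t baud)
{
    uint32_t denom = 16u * baud;
    return (sclk_hz + denom / 2u) / denom;
}

/* Baud rate actually obtained with a given non-zero divisor
 * (integer division, fraction discarded). */
uint32_t uart_actual_baud(uint32_t sclk_hz, uint32_t divisor)
{
    return sclk_hz / (16u * divisor);
}
```

With an assumed SCLK of 100 MHz and a target of 115200 baud, the divisor comes out as 54 and the resulting rate as 115740 baud, within 0.5 % of the target.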
Stage DSP Mars 2011, Lyon
UART Features
• Full Duplex, Industry Standard 16450 Compatible
• Asynchronous Serial Communication (7-12 Bits / Word)
  − 1 Low-Going Start Bit
  − 5-8 Data Bits
  − None or 1 Parity Bit (Even or Odd Parity Supported)
  − 1 or 2 Stop Bits
• Data Is ALWAYS Transmitted or Received LSB First
• DMA Capable
  − Supports Separate TX and RX DMA Master Channels
  − Used In Either Programmed I/O Mode or In DMA Mode
• Data Is Double Buffered On Both Ends
  − Transmit Holding (THR) & Receive Buffer (RBR) Registers
  − Transmit Shift (TSR) & Receive Shift (RSR) Registers
  − These Pairs Allow For Synchronization Delay
Stage DSP Mars 2011, Lyon
BF-533 Core Timer
[Figure: CCLK drives TSCALE (8 bit) and TCOUNT (32 bit); TPERIOD (32 bit) holds the reload value; output is IRQ 6]
• Used to generate interrupts at multiples of the CCLK period
• 32-bit tick timer
• Dedicated Interrupt Priority 6 (fixed)
• Autoreload is optional
• Interrupt rate = CCLK / ((TSCALE + 1) x TPERIOD)
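As a worked example of the core timer arithmetic, the sketch below assumes that the counter decrements once every TSCALE + 1 core clocks and raises its interrupt after TPERIOD decrements, so that one timer period lasts (TSCALE + 1) x TPERIOD core clock cycles. The function names are hypothetical and the code only performs the calculation.

```c
#include <stdint.h>

/* Core clock cycles between two timer interrupts, assuming one count
 * every (tscale + 1) CCLK cycles and tperiod counts per interrupt. */
uint64_t core_timer_period_cycles(uint8_t tscale, uint32_t tperiod)
{
    return ((uint64_t)tscale + 1u) * tperiod;
}

/* Resulting interrupt frequency in Hz for a given core clock.
 * tperiod must be non-zero. */
double core_timer_irq_hz(double cclk_hz, uint8_t tscale, uint32_t tperiod)
{
    return cclk_hz / (double)core_timer_period_cycles(tscale, tperiod);
}
```

For an assumed CCLK of 500 MHz, TSCALE = 4 and TPERIOD = 100000, the period is 500000 core cycles and the interrupt frequency is 1 kHz.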
Stage DSP Mars 2011, Lyon
BF533: Watchdog Timer
The watchdog timer can be used to improve system reliability by generating an event to the Blackfin core if the timer expires before being updated by software.
The event generated can be programmed to be:
• a reset (software reset takes place)
• a nonmaskable interrupt
• a general-purpose interrupt
The Watchdog timer is clocked by the system clock (SCLK).
Stage DSP Mars 2011, Lyon
BF533 Watchdog Timer
[Figure: SCLK clocks WDOG_STAT (32 bit, readable); WDOG_CNT (32 bit) holds the reload value; WDOG_CTL (16 bit) controls the timer; outputs are IRQ, NMI and Reset]
• Needs to be serviced by software periodically
• Any write to WDOG_STAT triggers reload from WDOG_CNT
• Unique pattern of 4 bits required to disable watchdog timer
• No auto-reload. Sticky W1C bit indicates expiration
• Halted in emulator mode
• Watchdog Interval = WDOG_CNT / SCLK
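The interval relation Watchdog Interval = WDOG_CNT / SCLK can be inverted to find the count for a wanted timeout. The helper below is a small sketch under that relation only; its name is hypothetical, and it saturates at the largest 32-bit value because WDOG_CNT is a 32-bit register.

```c
#include <stdint.h>

/* Count value for a timeout given in milliseconds, from
 * interval = WDOG_CNT / SCLK. Saturates at the 32-bit maximum. */
uint32_t wdog_cnt_for_timeout_ms(uint32_t sclk_hz, uint32_t timeout_ms)
{
    uint64_t cnt = (uint64_t)sclk_hz * timeout_ms / 1000u;
    if (cnt > UINT32_MAX) {
        cnt = UINT32_MAX;
    }
    return (uint32_t)cnt;
}
```

At an assumed SCLK of 100 MHz, a 500 ms timeout needs a count of 50,000,000. At 133 MHz the longest interval a 32-bit count can express is a little over 32 seconds, so a request for 60 s saturates.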
Stage DSP Mars 2011, Lyon
BF533 RTC
RTC Interrupt may be issued by 6 different events
Dedicated RTC interrupt registers RTC_ICTL and RTC_ISTAT
Interval Interrupts (once per second, minute or 24 hours)
Alarm function supported by RTC_ALARM register
Alarm IRQ issued when hours, minutes and seconds match
Day IRQ issued when also 8-bit day matches
Additional Stopwatch functionality
Additional 8-bit counter decrements every minute
Interrupt when reaches zero, no autoreload
Stage DSP Mars 2011, Lyon
Three Peripheral Timers of the BF53X
• Three identical timers
• Timer Modes
  − Pulse Width Modulation (PWM_OUT)
  − Width and Period Capture (WDTH_CAP)
  − External Event Counter (EXT_CLK)
• Dedicated Pins TMR2, TMR1, TMR0
• One programmable interrupt each
• Three 32-bit registers each (2 x 16 Bit)
  − Width
  − Period
  − Counter (read-only)
• One 16-bit Configuration Register each
• One 16-bit Status Register each (sticky)
  − IRQ, Timer Enable, Overflow
Stage DSP Mars 2011, Lyon
Programmable Flags
• Features:
  − 16 bi-directional programmable flags
  − Write-1-to-set Flag Set register
  − Write-1-to-clear Flag Clear register
  − Two independent interrupt channels
  − Level or edge sensitive trigger of input source
  − Rising or falling edge trigger of input source
  − Single edge or both edges trigger of input source
Stage DSP Mars 2011, Lyon
Flag Pin Functionality
• 16 bi-directional general-purpose I/O pins available
• Each can be configured as an output, input, or an interrupt pin
• PF7:0 operate as MSEL6:0 and DF for PLL at reset
• PF15:0 also function as 2 Slave Select input pins SPISSx (Slave Mode) and 14 Slave Select Output Pins SPIxSELy (Master Mode)
• Two Interrupt Requests (FLAGA/FLAGB)
[Figure: ADSP-21535 flag pins PF0 to PF15 with their alternate functions (SPISS0, SPISS1, SPI0SEL1 ... SPI1SEL7, MSEL0 ... MSEL6, DF)]
Stage DSP Mars 2011, Lyon
Parallel Peripheral Interface
[Figure: Video Port (PPI) with 2-D DMA, with clock and syncs, connected to appliances; external clock up to 65 MHz]
• Supports glueless ITU-R 656 Video Converter Interface directly at 27 MHz
• Provides general-purpose parallel ADC / DAC interface at up to 65 MSPS
• Can be alternately configured as 16 general-purpose Flag In/Out pins
• Integral 2-D DMA capability: significant reduction in software overhead for video applications
Stage DSP Mars 2011, Lyon
Parallel Peripheral Interface (PPI)
• Bi-directional interface
  − 4 dedicated pins (PPI3 to PPI0)
  − 12 multiplexed PF pins (PF4 to PF15)
  − PPI_CLK (always sourced externally) operating up to SCLK/2
• Supports bit-parallel ITU-R 656 recommendation
• Supports a flexible General Purpose mode
  − Up to 16 data lines, 1 clock, 3 Frame Syncs
  − Frame Syncs 1 & 2 muxed with Timer 1 & 2 pins
  − Frame Sync 3 muxed with PF3
Stage DSP Mars 2011, Lyon
Examples of General-Purpose I/O Modes
Output, 1 Sync
Input, 3 Sync
Stage DSP Mars 2011, Lyon
Frame Capture General-Purpose Mode
Stage DSP Mars 2011, Lyon
General-Purpose PPI Modes
GP MODE            PPI_FS1   PPI_FS2   PPI_FS3   DATA
Input, 1 Sync      Input     ---       ---       Input
Infinite Capture   Input     ---       ---       Input
Input, 3 Syncs     Input     Input     Input     Input
Frame Capture      Output    Output    ---       Input
Output, 1 Sync     Output    ---       ---       Output
Output, 3 Syncs    Output    Output    Output    Output
Stage DSP Mars 2011, Lyon
Additional PPI Features
• Can optionally ignore Field 2 (in Active Field Mode)
• 4 control signal polarity choices (H, V, CLK)
• Can skip even or odd data elements
• Supports 16-bit data packing mode
• Supports 32-bit DMA mode (2 bursts of 16-bit DMA)
• Works hand-in-hand with 2-D DMA Engine
• Can interrupt at end of row, frame, or partial-frame transfer
Stage DSP Mars 2011, Lyon
PPI GP Input Modes
[Figure: a video source drives the PPI with 8-16 bits of data (PPIx), CLK (PPI_CLK), HSYNC (PPI_FS1), VSYNC (PPI_FS2) and FIELD (PPI_FS3); an A/D converter drives the PPI with 8-16 bits of data (PPIx), CLK (PPI_CLK) and HSYNC (PPI_FS1)]
Stage DSP Mars 2011, Lyon
PPI GP Output Modes
[Figure: the PPI drives a video display with 8-16 bits of data (PPIx), CLK (PPI_CLK), HSYNC (PPI_FS1), VSYNC (PPI_FS2) and FIELD (PPI_FS3); the PPI drives a D/A converter with 8-16 bits of data (PPIx), CLK (PPI_CLK) and HSYNC (PPI_FS1)]
Stage DSP Mars 2011, Lyon
Possible Data Transfer Scenarios
[Figure: data and control paths in two scenarios. Blocks in the first: ADC, PPI, DMA, L1 Memory, Filtering, DMA, SPORT. Blocks in the second: ADC, PPI, DMA, SDRAM, L1 Memory, DMA, SPORT, DMA, External Processor]
Stage DSP Mars 2011, Lyon
Connection to ADC/DAC (Bidirectional interface)
[Figure: ADSP-BF533 connected to an AD9975. PPI signals PPIx, PPI_FS1, PPI_FS2 and PPI_CLK connect to the AD9975 RX/TX path (ADIOx), TX_EN, RX_EN and RX/TX CLK. SPI signals MISO, MOSI and SCLK connect to the AD9975 SPI pins SDATA, SDO and SCK]
• External or AD9975 internal clock
• Serial link for configuration (modes, power-down, ...)
• Frames controlled by timer interrupts
• Needs software switching between Frame Capture and Output modes
Stage DSP Mars 2011, Lyon
ADSP-21533 Booting
Stage DSP Mars 2011, Lyon
Booting Methods

BMODE Pins [2:0]   Description
000                Execute from 16-bit External ASYNC Bank0 memory (Bypass Boot ROM)
001                Boot from 8/16-bit PROM/Flash
010                Boot from an 8-bit Addressable SPI0 Device
011                Boot from a 16-bit Addressable SPI0 Device
100-111            Reserved
Stage DSP Mars 2011, Lyon
Behavior Upon RESET (ADSP-21533)
• BMODE = 000: Jump to 16-bit external memory for execution
• BMODE = 001, 010, 011: Jump to on-chip boot ROM
Stage DSP Mars 2011, Lyon
Boot Sequence
On-chip boot ROM loads N bytes into the start of L2 memory and, once completed, jumps to the start of L2 to begin execution
[Figure: the PROM/Flash or SPI device holds, starting at 0x0, a 4-byte header (N) followed by N bytes of application code or a 2nd-stage loader; the on-chip boot ROM (0xEF00 0000) of the Blackfin copies them to L2 memory (0xF000 0000)]
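What the boot ROM does with such an image can be sketched on a host machine. The function below is a hypothetical model, not the actual boot ROM code: the name `boot_load` is invented, L2 memory is represented by a plain byte array, and the 4-byte header is assumed to hold N in little-endian byte order, which the slide does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the boot copy: reads the 4-byte header N
 * (little-endian byte order is an assumption), then copies the N bytes that
 * follow it to the start of the l2 buffer. Returns N, or 0 if the image is
 * truncated or does not fit. */
size_t boot_load(const uint8_t *image, size_t image_len,
                 uint8_t *l2, size_t l2_size)
{
    if (image_len < 4) {
        return 0;
    }
    uint32_t n = (uint32_t)image[0]
               | ((uint32_t)image[1] << 8)
               | ((uint32_t)image[2] << 16)
               | ((uint32_t)image[3] << 24);
    if (n == 0 || n > image_len - 4 || n > l2_size) {
        return 0;
    }
    memcpy(l2, image + 4, n);
    return n; /* execution would continue at the start of l2 */
}
```

The payload may be the application itself or a 2nd-stage loader that then fetches the rest of the application.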
Stage DSP Mars 2011, Lyon
Creating a Loader File
Using the Loader Property Page under Project Options
Stage DSP Mars 2011, Lyon
Dynamic Power Management
Stage DSP Mars 2011, Lyon
Dynamic Power Management - Variable Frequency
[Figure: CLKIN feeds the PLL (1x - 64x); the PLL output is divided by 1, 2, 4 or 8 to give CCLK and by 1 to 15 to give SCLK. Callouts: "Dynamic modification on the fly" and "Dynamic modification requires PLL sequencing"]
• SCLK <= CCLK
• SCLK <= 133 MHz
Stage DSP Mars 2011, Lyon
Power Management: Variable Voltage
[Figure: on-chip regulator with VREF and VDDCTRL. VDDEXT (2.25 V -> 3.6 V) supplies the external components; VDDINT supplies the DSP internal circuit. External parts shown: 10 µH inductor, 10 µF tantalum or electrolytic capacitor, 0.1 µF ceramic capacitor, Uz = 4 V]
On-chip Voltage Regulation
• Generates core voltage from external 2.25 V to 3.6 V input
• Core voltage programmable in 50 mV increments
• Optional bypass
• Minimal external components required
Stage DSP Mars 2011, Lyon
Power Management Options
Low Active Power
• Flexible power management with selectable power-down for each peripheral section
• Dynamic Power Management allows dynamic modification of both frequency and voltage (power management companion chip: ADP3053)
Low Standby Power
• 4 power modes
• Real Time Clock with alarm and wakeup features
Stage DSP Mars 2011, Lyon
Dynamic Power Management
Blackfin DSPs Optimize Power Consumption
[Figure: power consumption and Vdd plotted against time t. Phases of DSP operation at 1.5 V / 300 MHz, 1.3 V / 225 MHz and 1.0 V / 100 MHz are separated by regulator transition and PLL settling phases. Two curves are compared: "Just vary the frequency" and "Vary the voltage and frequency"]
Stage DSP Mars 2011, Lyon
Power Management States
Mode         Relative Power Savings   Notes
Full On      Min                      Max performance
Active       Low                      Full core operation at CLKIN/2. System DMA to L1 supported. PLL is bypassed.
Sleep        High                     Core idle. All clocks to core disabled. SCLK enabled.
Deep Sleep   Max                      Core idle. Async. peripherals enabled for limited operation. Exit only via HW reset or RTC interrupt.
Stage DSP Mars 2011, Lyon
Clock States in Different Power Modes
Mode         PLL                   PLL Bypassed?                            Core Clock (CCLK)   System Clock (SCLK)
Full On      Enabled               No                                       Enabled             Enabled
Active       Enabled or Disabled   Yes                                      Enabled             Enabled
Sleep        Enabled               Yes or No (depending on previous mode)   Disabled            Enabled
Deep Sleep   Disabled              Yes                                      Disabled            Disabled
Stage DSP Mars 2011, Lyon
System Clocking
2 internal clock domains: CCLK and SCLK
CCLK is derived from PLL or directly from CLKIN pin
SCLK is divided down from CCLK & must not exceed 133 MHz
CLKIN can be driven from external oscillator (not a crystal)
Programmable PLL supports 1x to 31x frequency multiplication, enabling high-speed operation with low-frequency clock inputs
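The clocking constraints can be checked before a PLL setting is written. The sketch below is a planning helper under stated assumptions: the names `clock_plan`, `plan_clocks`, `msel`, `csel` and `ssel` are chosen for illustration, both CCLK and SCLK are modelled as integer divisions of the PLL output (CLKIN times the multiplier), the core divider is limited to 1, 2, 4 or 8 and the system divider to 1 to 15. The multiplier range is not checked because the slides quote different upper limits.

```c
#include <stdint.h>

#define SCLK_MAX_HZ 133000000u  /* SCLK must not exceed 133 MHz */

typedef struct {
    uint64_t vco_hz;   /* PLL output: CLKIN x multiplier          */
    uint64_t cclk_hz;  /* core clock                              */
    uint64_t sclk_hz;  /* system clock                            */
    int      valid;    /* 1 if SCLK <= CCLK and SCLK <= 133 MHz   */
} clock_plan;

/* Computes CCLK and SCLK for a multiplier (msel), core divider (csel) and
 * system divider (ssel). Returns an all-zero, invalid plan if a divider is
 * outside the assumed ranges. */
clock_plan plan_clocks(uint32_t clkin_hz, uint32_t msel,
                       uint32_t csel, uint32_t ssel)
{
    clock_plan p = {0, 0, 0, 0};
    int csel_ok = (csel == 1 || csel == 2 || csel == 4 || csel == 8);
    int ssel_ok = (ssel >= 1 && ssel <= 15);

    if (msel == 0 || !csel_ok || !ssel_ok) {
        return p;
    }
    p.vco_hz  = (uint64_t)clkin_hz * msel;
    p.cclk_hz = p.vco_hz / csel;
    p.sclk_hz = p.vco_hz / ssel;
    p.valid   = (p.sclk_hz <= p.cclk_hz) && (p.sclk_hz <= SCLK_MAX_HZ);
    return p;
}
```

With an assumed 27 MHz CLKIN and a multiplier of 22, the PLL output is 594 MHz; a system divider of 5 gives an SCLK of 118.8 MHz, which passes, while a divider of 4 gives 148.5 MHz and is rejected.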
Stage DSP Mars 2011, Lyon
Programming PLL Transitions
1) Program the wakeup event to bring the core out of idle state
2) Program the new PLL parameters
3) Disable interrupts
4) Flush the pipeline, and place the core into an idle state
5) Reinstate interrupts
6) Have now transitioned into the new PLL mode
Stage DSP Mars 2011, Lyon
IDLE state
• After executing the IDLE and SSYNC instructions, the DSP core stops executing instructions, retains the contents of the pipeline and waits for an interrupt or wakeup. PLL, CCLK and SCLK continue running
• 2 ways to leave an IDLE state:
  − The DSP services an interrupt. After executing the RTI instruction, the DSP returns to the instruction after the IDLE; SSYNC;
  − A peripheral wakes the DSP up (based on SIC_IWR settings), but no interrupt occurs. The DSP returns to the instruction that follows IDLE; SSYNC;
• Note that the IDLE instruction relates to power-down modes too
Section 7: Software Environment: VDSP++ 5v0
Stage DSP Mars 2011, Lyon
Assembler Parses Input Source Code And Creates Object Files
• Raw Assembly Input (.ASM File)
• Compiler-Generated Assembly Input (.S File)
• Object File Output (.DOJ File)
Preprocessor Substitutes Code/Data For Macros/Constants
Preprocessor Directives:
• #define - define a macro or constant
• #undef - undo macro definition
• #if, #endif - conditional assembly
• #else, #elif - multiple conditional blocks
• #ifdef, #if defined - condition based on macro definition
• #ifndef - condition based on macro not defined
• #include - include source code from another file
Stage DSP Mars 2011, Lyon
Assembler
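[Figure: the assembler file (.asm), data file (.dat) and header file (.h) go through the preprocessor, which produces an intermediate file (.is); the assembler then produces an object file (.doj, binary) and a listing file (.lst, ASCII)]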
Stage DSP Mars 2011, Lyon
#include <def21535.h>
#include "myheader.h"
#ifdef mydef
R0 += 1;
#else
R0 += -1;
#endif

Assembler Property Page
[Figure: Assembler Property Page with callouts]
• If you want to get the intermediate .is file, select here
• Depending on definitions (here: mydef), you can select different code
• If chosen, a listing file will be created
• If chosen, you are able to debug in the source code
Stage DSP Mars 2011, Lyon
Directives for Assembler
Directive Function
.ALIGN Specifies a byte alignment requirement
.SECTION Mark start and end of section
.VAR Declare variable or buffer (32 bit)
.BYTE Declare 8 bit data
.BYTE2 Declare 16 bit data
.EXTERN Reference to a symbol declared in another file
.GLOBAL Make a symbol visible to other source files
Stage DSP Mars 2011, Lyon
Sections in Assembler Files
.SECTION data_a;
.BYTE data_array[N];

.SECTION data_b;
.VAR coeff_array[N];
.VAR x = 0x12345689;

.SECTION program;
_main:
    B0 = data_array;
    L0 = length(data_array);
    . . .

The .SECTION directive assigns data and code to sections
- Multiple sections may be used within a single source file
- Any section name may be chosen
Stage DSP Mars 2011, Lyon
The defbf532.h Header File
• Allows programmer to use symbols for memory-mapped registers
• Located in: \\VisualDSP\Blackfin\include\
• To include it use:
    #include <defbf532.h> or
    #include <Cdefbf532.h>
• Example:
    P0.H = HI(EBIU_SDRRC);
    P0.L = LO(EBIU_SDRRC);
    R0 = 0x2345;
    [P0] = R0;    // Write 0x2345 to EBIU_SDRRC
Stage DSP Mars 2011, Lyon
Assembler Source File Example

#include <defbf532.h>
#define N 20                          // replace N by 20
.GLOBAL start;
.SECTION data1;                       // data in L1 A memory
.BYTE2 buffer[N] = "fill.dat";        // initialize data from file
.SECTION data2;                       // data in L1 B memory
.VAR xy = 0x12345678;                 // initialize var with 32-bit value
.SECTION/PM program;                  // instructions in L1 instruction memory
start:
    I0.L = LO(buffer);                // load index register with low address word of array
    I0.H = HI(buffer);                // load index register with high address word of array
    B0 = I0;                          // load base register with address
    L0 = N*2;                         // size of array (circular buffer!) in bytes: N 16-bit words
    R0 = 0 (Z);
    P2 = N;
    LSETUP (loopstart, loopend) LC0 = P2;   // set up loop
loopstart:
    R0 += 1;                          // 1st instruction in loop
loopend:
    W[I0++] = R0.L;                   // last instruction in loop (16-bit store)
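The comment "circular buffer!" refers to the way the index register wraps once a base (B0) and a length (L0) are set. A C model of that post-modify step might look as follows; the function name `circ_post_modify` is hypothetical, and the sketch assumes the step is never larger than the buffer length.

```c
#include <stdint.h>

/* Hypothetical model of post-modify addressing inside the circular buffer
 * [base, base + length). index plays the role of I0, base of B0 and length
 * of L0 (all in bytes). Assumes |step| <= length. */
uint32_t circ_post_modify(uint32_t index, uint32_t base,
                          uint32_t length, int32_t step)
{
    int64_t next = (int64_t)index + step;

    if (next >= (int64_t)base + length) {
        next -= length;            /* ran past the end: wrap to the start */
    } else if (next < (int64_t)base) {
        next += length;            /* ran before the start: wrap to the end */
    }
    return (uint32_t)next;
}
```

For a buffer of 8 bytes at an assumed address 0x100, stepping by 2 from 0x106 wraps to 0x100, and stepping by -2 from 0x100 wraps to 0x106.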
Stage DSP Mars 2011, Lyon
Listing file (.lst)

Page 1  .\test.asm
ADI easmblkfn (2.1.5.0) 02 Apr 2002 15:32:00

offset  opcode  line
======  ======  ====
                   1  #include <def21535.h>;
                   2  #define N 20                 //replace N by 20
                   3  .GLOBAL start;
                   4  .SECTION data_a;             //data in L1 memory bank A
                   5  .VAR buffer[N]="fill.dat";   //initialise data from file
                   5
                   6  .SECTION data_b;             //data in L1 memory bank B
                   7  .VAR x = 0x12345678;         //initialise variable
                   8  .SECTION L2_program;         //instructions in L2 memory
     0             9  start: I0 = buffer (z);      //get low address word of array
     0  90e1       9
     2  0000       9
     4  50e1      10  I0.H = buffer;               //get high address word of array
     6  0000      10
     8  8036      11  B0=I0;                       //load base register
     a  3ce1      12  L0=N*4;                      // size of array (circular buffer!) in bytes
     c  5000      12
     e  0060      13  R0=0;
    10  a068      14  P0=N;
    12  b0e0      15  lsetup(loopstart,loopend) LC1 = P0;  // setup loop
    14  0000      15
    16            16  loopstart: R0 += 1;          // 1st instruction in loop
    16  0864      16
    18            17  loopend: [I0++]=R0;          // last instruction in loop
    18  009e      17

Columns: offset within the specified section, generated opcode, line number in the source code, source code.
Stage DSP Mars 2011, Lyon
Macros

#define mymacro(x,y) R0 = x; R1 = y; R2 = R0 + R1
.SECTION program;
start:
    mymacro(0x4,P0);
    [I0++] = R2;

The preprocessor will create the following:
start:
    R0 = 0x4 (Z); R1 = P0; R2 = R0 + R1;
    [I0++] = R2;

The final semicolon goes either at the end of the macro definition or after the macro call.
Stage DSP Mars 2011, Lyon
Linker
• Generates a complete executable DSP program (.dxe)
• Resolves all external references
• Assigns addresses to relocatable code and data spaces
• Generates optional memory map
• Output can be read by loader, simulator, and debugger
• Controlled by linker commands contained in a linker description file (LDF)
Stage DSP Mars 2011, Lyon
Linker
[Figure: the linker takes object files (.DOJ), library files (.DLB) and linker description files (.LDF) and produces a memory image file (.DXE, binary) and a memory map file (.MAP, ASCII)]
Stage DSP Mars 2011, Lyon
The Linker Description File (LDF)
The link process is controlled by a linker command language
Provides complete specification of mapping between the linker's input files and its output.
It controls:
• input files
• output file
• target memory configuration
Preprocessor Support
Stage DSP Mars 2011, Lyon
The LDF consists of three primary parts:
• Global Commands
  − Defines architecture or processor
  − Directory search paths
  − Libraries and object files to include
• Memory Description
  − Defines memory sections
• Link Project Commands
  − Link against object file list
  − Output file name
  − Mapping of input sections to memory segments
Stage DSP Mars 2011, Lyon
Example Global Commands
ARCHITECTURE (ADSP-21535) // Processor Used
SEARCH_DIR( $ADI_DSP\Blackfin\lib ) // Directories to search for files
$OBJECTS = bootup.doj, $COMMAND_LINE_OBJECTS; // Macro listing all command line objects and bootup
Stage DSP Mars 2011, Lyon
Linker Description File Macros
• $COMMAND_LINE_OBJECTS: List of objects (.DOJ) and libraries (.DLB) passed on command line.
• $COMMAND_LINE_OUTPUT_FILE: Output executable file name specified on the command line with the -o switch.
• $ADI_DSP: Path to VisualDSP installation directory.
• $macro: User-defined macro for a list of files, e.g. $OBJECTS
Stage DSP Mars 2011, Lyon
Example LDF: Global Commands & Memory Description

ARCHITECTURE (BF-533)
SEARCH_DIR ($ADI_DSP\Blackfin\lib)
$OBJECTS = $COMMAND_LINE_OBJECTS;
MEMORY
{
    seg_data_a   { TYPE(RAM) START(0xFF800000) END(0xFF803FFF) WIDTH(8) }
    seg_data_b   { TYPE(RAM) START(0xFF900000) END(0xFF903FFF) WIDTH(8) }
    seg_data_scr { TYPE(RAM) START(0xFFB00000) END(0xFFB00FFF) WIDTH(8) }
    seg_prog_L1  { TYPE(RAM) START(0xFFA00000) END(0xFFA03FFF) WIDTH(8) }
    seg_prog_L2  { TYPE(RAM) START(0xF0000000) END(0xF003FFFF) WIDTH(8) }
}

The first three lines are the global commands. Each MEMORY entry gives the segment name, start address, end address and memory width.
Stage DSP Mars 2011, Lyon
Example LDF (cont'd): Link Commands

PROCESSOR p0
{
    OUTPUT( $COMMAND_LINE_OUTPUT_FILE )
    SECTIONS
    {
        sec_data_a              { INPUT_SECTIONS( $OBJECTS(data_a) ) }   > seg_data_a
        sec_data_b SHT_NOBITS   { INPUT_SECTIONS( $OBJECTS(data_b) ) }   > seg_data_b
        sec_data_scr            { INPUT_SECTIONS( $OBJECTS(data_scr) ) } > seg_data_scr
        sec_prog_L1             { INPUT_SECTIONS( $OBJECTS(prog_L1) ) }  > seg_prog_L1
        sec_prog_L2             { INPUT_SECTIONS( $OBJECTS(prog_L2) ) }  > seg_prog_L2
    }
}

• sec_data_a, sec_data_b, ...: DXE section names, used in the .map file
• data_a, data_b, ...: object sections from the assembly files
• seg_data_a, seg_data_b, ...: memory segments declared in the LDF
• SHT_NOBITS: keyword; data in that section will not be initialized
Stage DSP Mars 2011, Lyon
Expert Linker Features
Expert Linker is a graphical tool that can:
• Use wizards to create LDF files
• Define a DSP's target memory map
• Drag and drop object sections into the memory map
• Present watermarks for max heap and stack usage
• Graphically manage overlay support
• Import legacy LDF files
• Graphically highlight code elimination of unused objects
Stage DSP Mars 2011, Lyon
Launch the LDF Wizard
Using Expert Linker to create an LDF
• Launch the wizard by selecting: Tools -> Expert Linker -> Create LDF
• The wizard guides through the 3-step process of generating an LDF:
  − Step 1: Name the LDF file and choose the project programming language type (i.e. C, C++, Assembly)
  − Step 2: Select processor type and other properties
  − Step 3: A confirmation summary of the choices above is shown. Click on the Finish button to generate the LDF
Stage DSP Mars 2011, Lyon
Expert Linker
Stage DSP Mars 2011, Lyon
Project Development
All development in VisualDSP++ occurs within a project.
The project file (.DPJ) stores your program’s build information: source files list and development tools option settings
Stage DSP Mars 2011, Lyon
How to start a new project
• Create a new folder
• Copy the standard .ldf from \VisualDSP\Blackfin\ldf to the new folder
• Open VisualDSP++
• Choose Project -> New
• Switch to your newly created folder
• Enter your project name at: File Name
• The Project Options window will appear
Stage DSP Mars 2011, Lyon
Project Options
• Choose your DSP
• Set Type to executable file, which is expected by the debugger
• Then press OK
• A new window appears. If you don't want to use the operating system VDK, press NO.
Stage DSP Mars 2011, Lyon
VisualDSP++ Menu
• Add source, header and .ldf files to your project
• Build the project
• File-specific options: select the file, press the right mouse button, choose File Options
• Project Options
Stage DSP Mars 2011, Lyon
Debug Features
Single Step
Run
Halt
Set Breakpoints
Profiling
Plotting
Pipeline Viewer
Cache Viewer
Memory Viewing
Register Viewing
Using Statistical Profiling
Statistical Profiling is accessed through Tools -> Statistical Profiling -> New Profile in an emulator session
Enable the Statistical Profiler through the context menu
Run and watch as the results are updated in real time; halting keeps the last snapshot on the screen
C/C++ Profiler
The profiler is very useful in C/C++ mode because it makes it easy to benchmark a system module by module (i.e. per C/C++ function)
Assembly modules can be wrapped in C/C++ functions to take advantage of this
Section 8: C compiler / Rules for optimal software
C/C++ Programming for DSPs
Why Program In C/C++?
Maintainability
Portability
Learning Curve
What Are the Tradeoffs in C/C++ Programming for DSPs?
Code Efficiency: Size and Cycle Count
Slower Interrupt Handling
Develop Programs Using a Mix of C/C++ and Assembly Languages
Blackfin C/C++ Compiler
Compiler
Invoked Via IDDE Using Settings from Compiler Property Page
Invoked from a DOS Command Line (ccblkfn.exe)
Linker Description File (LDF)
Defines Segments in Memory for Code and Data
Defines Segment in Memory for the Run-Time Stack
Defines Segment in Memory for Heap
Run Time Header
Several Separate Run-Time Headers To Choose From
Linker Options Determine Which C Run-Time Libraries To Use: Size, File I/O, C++ Are All Selectable
Provides Interrupt Handling
Initializes C/C++ Run-Time Environment
Must Be Linked With C/C++ Code
C/C++/Assembly Interface
Prologue/Epilogue
Example
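Build Process
[Figure: build flow. The Text Editor produces File.c and File.asm. File.c -> C Preprocessor -> Compiler -> File.s. File.s and File.asm -> Assembly Preprocessor -> Assembler -> File.doj. File.doj, File.ldf, lib.dlb and crtx.doj -> Linker -> File.dxe.]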
crtx.doj is short for the various user-selectable run-time headers supported by Blackfin
Compiler Property Pages
Compile/General Property Page
Enable Debug and Optimization
Stop After: Preprocessing, Compiling
Compile/Preprocessor Property Page
Provide Processor Definitions
Specifies Directory Paths for Include Files
Compile/Warning Property Page
Turn On/Off Levels of Warning Messages
Compile / General Property Page
Supported Data Formats
Compiler-Generated Memory Section Names
The compiler generates default section names to be used by the linker:
program - contains all program instructions
data1 - contains all global and "static" data
constdata - contains all data declared as "const"
stack - storage for local data and return addresses (user mode)
sysstack - storage for local data and return addresses (supervisor mode)
heap - memory space to allocate from during run-time
C Source with Alternate Sections
foo.c:
section ("extern") int array[256];
section ("foo") void bar(void)
{
    int foovar;
    foovar = 1;
    foovar++;
}
foo.c -> C-Compiler -> foo.s -> Assembler -> foo.doj
Object Section = foo, Type = RAM, Width = 8
_bar: p0 = _foovar; r0 = w[p0]; r0 = r0 + 1; w[p0] = r0;
Object Section = extern, Type = RAM, Width = 8
_array[0], _array[1], …, _array[255]
Object Section = mem_stack, Type = RAM, Width = 8
_foovar: 1
Note: The section( ) directive is used to place data or code into a section other than the default section used by the compiler.
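The placement shown on this slide can be kept in source that also builds on a host compiler. The sketch below is one possible way to do that, not the tool vendor's recommended pattern: PLACE_IN is a hypothetical macro, the __ECC__ / __ADSPBLACKFIN__ test is an assumption about the compiler's predefined macros, and bar() is changed to return its local so the result can be observed.

```c
/* PLACE_IN(name) is a hypothetical helper: it expands to the
   section("name") qualifier from the slide when the Blackfin compiler is
   detected, and to nothing on any other compiler.
   Assumption: __ECC__ and __ADSPBLACKFIN__ are predefined by that compiler. */
#if defined(__ECC__) && defined(__ADSPBLACKFIN__)
#define PLACE_IN(name) section(name)
#else
#define PLACE_IN(name)
#endif

/* Data placed in the object section "extern" instead of the default data1. */
PLACE_IN("extern") int array[256];

/* Code placed in the object section "foo" instead of the default program.
   The slide's bar() returns void; returning foovar here is only for checking. */
PLACE_IN("foo") int bar(void)
{
    int foovar;
    foovar = 1;
    foovar++;
    return foovar;
}
```

On the target, the section names used here still have to be mapped to memory segments in the LDF; on a host build the qualifier disappears and the code behaves like ordinary C.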
C Run Time Headers
basiccrt.s Sets Up the C Runtime Environment
Installs Default Event Handlers
Initializes Both the User and Supervisor Stacks
Allows User to Select Supervisor or User Mode
Configures Cache, If Necessary
Initializes Profiling Support, If Necessary
Initializes argc/argv Support, If Necessary
Initializes Global C++ Objects, Sets Up Destructor Call For Clean-Up
Calls _main To Start The Actual Program
Calls _exit When Program Terminates
basiccrt.s Combined With C Library Function EX_INTERRUPT_HANDLER Provides Interrupt Support
Enables Global Interrupts
Saves / Restores Context
Services Interrupts
Interrupt Enabling
Interrupts Enabled at Runtime Using: EX_INTERRUPT_HANDLER(sig, isr_name);
Example:
#include <signal.h>
EX_INTERRUPT_HANDLER(ik_timer, MyTimerISR);
EX_INTERRUPT_HANDLER Does the Following
Enables Global Interrupts
Sets Appropriate Bit in IMASK Register
Maps Specified Function (isr_name) as the Interrupt Service Routine (ISR)
Prior to ISR Call, Interrupt Dispatcher SAVES Current Processor State
When ISR Completes, Interrupt Dispatcher RESTORES Processor State
Context Switching Requires 32 Total Cycles To Save/Restore
All Data Registers (R0, R1, R2, R3, R4, R5, R6, R7)
All Pointer Registers (P0, P1, P2, P3, P4, P5)
Frame Pointer (FP)
Arithmetic Status Register (ASTAT)
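The registration and dispatch steps listed on this slide can be pictured with a small host-side model. This is only a sketch in portable C, not the run-time library's implementation: every name in it (cpu_context, model_register_isr, model_dispatch, NUM_EVENTS, my_timer_isr) is hypothetical, and the "registers" are plain variables.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_EVENTS 16   /* hypothetical number of event sources in this model */

typedef void (*isr_fn)(void);

/* The state the slide says is saved and restored: R0-R7, P0-P5, FP, ASTAT. */
typedef struct {
    uint32_t r[8];
    uint32_t p[6];
    uint32_t fp;
    uint32_t astat;
} cpu_context;

cpu_context cpu;             /* modelled live registers */
uint32_t imask;              /* modelled IMASK register */
int global_enable;           /* modelled global interrupt enable */
isr_fn vector[NUM_EVENTS];   /* modelled ISR mapping */

/* Model of the three registration steps. Returns 0 on success. */
int model_register_isr(unsigned event, isr_fn fn)
{
    if (event >= NUM_EVENTS || fn == NULL)
        return -1;
    global_enable = 1;               /* enable global interrupts */
    imask |= (uint32_t)1 << event;   /* set the event's bit in IMASK */
    vector[event] = fn;              /* map the function as the ISR */
    return 0;
}

/* Model of the dispatcher: save state, call the ISR, restore state. */
int model_dispatch(unsigned event)
{
    cpu_context saved;

    if (event >= NUM_EVENTS || !global_enable ||
        !(imask & ((uint32_t)1 << event)) || vector[event] == NULL)
        return -1;                   /* masked or unmapped: nothing runs */

    saved = cpu;                     /* SAVE current processor state */
    vector[event]();                 /* service the interrupt */
    cpu = saved;                     /* RESTORE processor state */
    return 0;
}

/* Example ISR for the model: clobbers R0 and counts its invocations. */
volatile int timer_ticks;
void my_timer_isr(void)
{
    cpu.r[0] = 0xDEADBEEFu;
    timer_ticks++;
}
```

Because the dispatcher restores the saved context, the interrupted code sees its registers unchanged even though the ISR overwrote R0; that save/restore work is what the context-switch cycle count on the slide pays for.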
Assembly Language Interface
C-Callable Assembly Language Functions
Assembly Language Statements Within a C Function (In-Line Assembly)
Associate C Variables with Assembly Language Symbols
C-Callable Assembly Language Functions
Several Issues Involved When Writing C-Callable Assembly Language Functions
Register Usage
“Dedicated” Registers
“Call Preserved” Registers
“Scratch” Registers
Argument Passing
First Three Arguments Passed in R0, R1 and R2, respectively
Arguments Four and Beyond Passed on Stack
4th Parameter Is Closest to SP at [FP+20], 5th at [FP+24], etc.
Return Values of 32 Bits or Less Stored in R0
Overflows To R1 for Return Values of 33 to 64 Bits
Anything Over 64 Bits Is Allocated on Stack but Passed as Pointer in a Hidden Argument in P0
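The argument and return-value rules above reduce to a little arithmetic. The sketch below encodes them as a lookup in portable C, assuming every argument is one 32-bit word; the type and function names (arg_location, locate_argument, return_location, locate_return) are made up for illustration.

```c
#include <assert.h>

/* Where the k-th argument (1-based) is found inside the callee. */
typedef struct {
    int in_register;   /* 1 if the argument arrives in a data register */
    int reg_index;     /* 0..2 for R0..R2, or -1 if on the stack */
    int fp_offset;     /* byte offset from FP, or -1 if in a register */
} arg_location;

arg_location locate_argument(int k)
{
    arg_location loc = { 0, -1, -1 };

    assert(k >= 1);                          /* arguments are numbered from 1 */
    if (k <= 3) {
        loc.in_register = 1;
        loc.reg_index = k - 1;               /* 1st..3rd in R0..R2 */
    } else {
        loc.fp_offset = 20 + 4 * (k - 4);    /* 4th at FP+20, 5th at FP+24, ... */
    }
    return loc;
}

/* Where a return value of the given width ends up. */
typedef enum { RET_R0, RET_R0_R1, RET_HIDDEN_POINTER_P0 } return_location;

return_location locate_return(unsigned bits)
{
    if (bits <= 32) return RET_R0;           /* 32 bits or less */
    if (bits <= 64) return RET_R0_R1;        /* 33 to 64 bits overflow into R1 */
    return RET_HIDDEN_POINTER_P0;            /* larger: caller passes a pointer in P0 */
}
```

For example, locate_argument(5) reports a stack argument at offset 24 from FP, which is the offset an assembly routine would use to load its fifth parameter.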
C/C++ Compiler Register Uses: Dedicated Registers
Registers that C/C++ Compiler Reserves for its Own Use
REGISTER   VALUE           MODIFICATION RULES
L0 – L3    0               Do Not Modify
SP         Stack Pointer   Stack Management Only, Restore
FP         Frame Pointer   Stack Management Only, Restore
C/C++ Compiler Register Uses: Call Preserved Registers
May be Used in an Assembly Function
Contents Should Be Saved and Restored
Values Assumed to be Preserved Across Function Calls
P3, P4, P5
R4, R5, R6, R7
C/C++ Compiler Register Uses: Scratch Registers
Contents DO NOT Need to Be Saved/Restored
Use Freely in Assembly Sub-Routines
P0 – P2, R0 – R3
LC0 – LC1, CC
A0 – A1, I0 – I3
B0 – B3, M0 – M3
C-Callable Assembly Language Functions
Macros in asm_sprt.h Provided to Make Function Calling Easier
Save/Restore Preserved Registers (puts, pops)
Restore Frame and Stack Pointers (exit)
PUTS = x; Pushes the value in register x onto the stack
PUTS = R5; -> [--SP] = R5;
POPS(x); Pops the value off the top of the stack into register x
POPS(R5); -> R5 = [SP++];
EXIT; Restores stack and frame pointers and jumps to the return address
EXIT; -> P0 = [FP + 0x4]; JUMP (P0);
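The PUTS and POPS expansions above are a pre-decrement store and a post-increment load. A host-side model in portable C makes the ordering visible; it is a sketch only, with stack_mem, sp, puts_model, pops_model and multiply_using_r4_r5 all invented for the illustration and r4/r5 standing in for call-preserved registers.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

uint32_t stack_mem[STACK_WORDS];
uint32_t *sp = stack_mem + STACK_WORDS;   /* empty stack: SP just past the last word */

/* Model of PUTS = x;  which expands to  [--SP] = x;  */
void puts_model(uint32_t x)
{
    assert(sp > stack_mem);                /* model-only overflow check */
    *--sp = x;                             /* decrement SP first, then store */
}

/* Model of POPS(x);  which expands to  x = [SP++];  */
uint32_t pops_model(void)
{
    assert(sp < stack_mem + STACK_WORDS);  /* model-only underflow check */
    return *sp++;                          /* load first, then increment SP */
}

/* Stand-ins for two call-preserved registers. */
uint32_t r4 = 0x44;
uint32_t r5 = 0x55;

/* Pattern for a C-callable routine that wants to use R4 and R5:
   push them on entry, pop them in reverse order before returning. */
uint32_t multiply_using_r4_r5(uint32_t a, uint32_t b)
{
    uint32_t result;

    puts_model(r4);
    puts_model(r5);
    r4 = a;
    r5 = b;
    result = r4 * r5;
    r5 = pops_model();   /* last pushed, first popped */
    r4 = pops_model();
    return result;
}
```

After the call the modelled SP is back at its starting value and r4/r5 hold their original contents, which is the contract the call-preserved register slide asks assembly routines to honour.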
In-Line Assembly Language
In-Line Assembly Is Accomplished Using the asm( ) Construct
Example:
asm("R0 = w[p0];");
asm("BITSET(R0,7);");
asm("ssync;");
Note: Can Produce Less Efficient Compiled Code – Optimizer Might Re-Sequence Instructions for Optimal Performance
Mixed C/Assembly Naming Conventions
To name an assembly symbol that corresponds to a C symbol, add an underscore prefix to the C symbol.
Declare as a global variable in the C program and as EXTERN in the assembly routine.
To use an assembly function or variable in your C program, declare the symbol with the .GLOBAL directive in the assembly routine and as EXTERN in the C program.
VOLATILE Extension to ANSI C
VOLATILE Keyword Prevents Compiler from Relocating, Deleting, or Combining Assembly Instructions or Variable Declarations
volatile int x;
asm volatile ("ssync;");
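A typical reason to mark a variable volatile is a flag that is changed outside the normal program flow, for example by an interrupt service routine. The sketch below imitates that on a host with the standard C signal mechanism standing in for the hardware interrupt; on_signal and wait_for_flag are hypothetical names and the use of SIGINT is an arbitrary choice.

```c
#include <signal.h>

/* Written by the handler, read by the polling loop. Without volatile the
   compiler would be free to read the flag once and keep it in a register. */
volatile sig_atomic_t data_ready = 0;

/* Plays the role of the ISR. */
void on_signal(int signo)
{
    (void)signo;
    data_ready = 1;
}

/* Polls the flag up to max_polls times. The "interrupt" is raised during
   poll number trigger_poll. Returns the poll index at which the flag was
   seen, or -1 if it never was. */
int wait_for_flag(int max_polls, int trigger_poll)
{
    int i;

    data_ready = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;

    for (i = 0; i < max_polls; i++) {
        if (i == trigger_poll)
            raise(SIGINT);     /* stands in for the hardware event */
        if (data_ready)        /* re-read from memory on every pass */
            return i;
    }
    return -1;
}
```

Calling wait_for_flag(10, 3) returns as soon as the handler has run during the fourth pass of the loop.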
Optimizing C Code
Optimization Can Decrease Code Size or Lead to Faster Execution
Can Be Controlled by Optimization Switch
no switch   optimization disabled
-O          optimization enabled
-ipa        inter-procedural optimization enabled
Can Be Further Controlled In C Source Code Using Pragmas
#pragma optimize_off         Disables Optimizer
#pragma optimize_for_space   Decreases Code Size
#pragma optimize_for_speed   Increases Performance
Other Optimization Ideas
Take Advantage of Existing Assembly Library Functions
Write Time-Critical Routines in Assembly as a C-Callable Subroutine
App Note, “Tuning C Source Code For The Blackfin DSP Compiler”: http://www.analog.com/library/applicationNotes/dsp/blackfin/EE-149.pdf
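The pragmas listed above can be applied per function, so a time-critical kernel and a rarely used helper in the same file get different treatment. The following is a minimal sketch: the pragma names come from the slide, while the __ECC__ guard (so that other compilers never see an unknown pragma), the placement directly before each function definition, and the two functions themselves are assumptions made for the illustration.

```c
/* Time-critical multiply-accumulate kernel: ask for speed. */
#if defined(__ECC__)   /* assumption: macro predefined by the VisualDSP++ compiler */
#pragma optimize_for_speed
#endif
int dot_product(const short *a, const short *x, int n)
{
    int y = 0;
    int i;

    for (i = 0; i < n; i++)
        y += a[i] * x[i];   /* y := y + a[n] * x[n] */
    return y;
}

/* Rarely called helper: ask for small code instead. */
#if defined(__ECC__)
#pragma optimize_for_space
#endif
int clamp_to_short(int v)
{
    if (v > 32767)
        return 32767;
    if (v < -32768)
        return -32768;
    return v;
}
```

The pragmas change only how the compiler generates code, not what the functions compute, so the same source gives the same results on a host build where the guard removes them.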
Example -- Add 5 Numbers in an Assembly Function
Example C Program That Calls an Assembly Function (add5)
Adds 5 Integers Passed From C Calling Routine As Arguments
C code:
#include <stdlib.h> /* Declares exit() */
extern int add5(int, int, int, int, int); /* Function is located in assembly module */
volatile int sum; /* Variable only written in assembly sub-routine */
                  /* volatile keeps sum from being optimized out */
int main(void)
{
    int a=1; int b=2; int c=3; int d=4; int e=5; /* Initialize parameters */
    int result=0; /* result and sum will have the same value */
    result = add5(a,b,c,d,e); /* Call to the ADD5 function */
    exit(0);
}
Assembly Routine
/* Assembly Routines with Parameters Example - _add5 */
/* int add5 (int a, int b, int c, int d, int e); */
/* This is an assembly language routine that will add 5 numbers */
#include <asm_sprt.h> /* Header file that defines the stack manipulation macros */
.section program;
.global _add5;
.extern _sum;
_add5:
r0 = r0 + r1; /* Add the first and second parameter */
r0 = r0 + r2; /* Add the third parameter */
r1 = [FP+20]; /* Put the fourth parameter in R1 */
r0 = r0 + r1; /* Add the fourth parameter */
r1 = [FP+24]; /* Put the fifth parameter in R1 */
r0 = r0 + r1; /* R0 is always the return value; variable "result" in C will get the R0 value */
p0.h = _sum; /* We can also write directly to a globally defined variable */
p0.l = _sum; /* Could be used if this function was implemented with no return type */
[p0] = r0; /* Place the 32-bit sum in the global variable (C is unaware of this assignment) */
exit; /* Restores frame and stack pointers */
C Header Files (MMR Access)
In previous compilers for other architectures, there was no straightforward way of accessing and changing MMR values
#include <Cdef21161.h>
#define pIM0A ((volatile unsigned int *) 0x61) /* Internal DMA0 memory access modifier */
main ( )
{
    *pIM0A = 0x010; // set up IM DMA #0
}
This method, though functional, consumes memory for each MMR accessed this way, which is a waste of space
C Header Files (cdefbf53x.h)
// include all Core registers and bit definitions
#include <defbf532.h>
// include core specific register pointer definitions
#include <cdef_LPBlackfin.h>
// Clock and System Control (0xFFC0 0400-0xFFC0 07FF)
#define pPLL_CTL ((volatile unsigned short *)PLL_CTL)
#define pPLL_STAT ((volatile unsigned short *)PLL_STAT)
#define pPLL_LOCKCNT ((volatile unsigned short *)PLL_LOCKCNT)
#include <cdefBF533.h>
main ( )
{
    *pPLL_CTL |= 0x0200; // Set BYPASS Bit
}
The cdef header files use the compiler’s pre-processor to set up MMR accesses without using memory for the pointers
If users follow the p-prefix notation, every MMR is set up for access based on its size (char = 8 bit, short = 16 bit, int = 32 bit)
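The p-prefix idea can be tried on a host by pointing the macro at an ordinary variable instead of a fixed MMR address. This is a sketch of the technique only: fake_pll_ctl, pFAKE_PLL_CTL, FAKE_BYPASS and the helper functions are hypothetical, and the bit value 0x0200 is simply copied from the slide's example rather than checked against the register definition.

```c
#include <stdint.h>

/* Host-side stand-in for a 16-bit memory-mapped register. On the target
   the macro below would cast a fixed MMR address instead. */
uint16_t fake_pll_ctl = 0x1400;   /* arbitrary starting contents */

/* Pointer macro in the p-prefix style: the cast is resolved by the
   preprocessor, so no pointer variable occupies data memory. The pointee
   type (16 bits here) matches the register width. */
#define FAKE_PLL_CTL_ADDR (&fake_pll_ctl)
#define pFAKE_PLL_CTL ((volatile uint16_t *)FAKE_PLL_CTL_ADDR)

#define FAKE_BYPASS 0x0200u   /* bit value taken from the slide's example */

/* Read-modify-write accesses through the macro. volatile makes the
   compiler perform every read and write it is asked for. */
void set_bypass(void)
{
    *pFAKE_PLL_CTL |= FAKE_BYPASS;
}

void clear_bypass(void)
{
    *pFAKE_PLL_CTL &= (uint16_t)~FAKE_BYPASS;
}

int bypass_is_set(void)
{
    return (*pFAKE_PLL_CTL & FAKE_BYPASS) != 0;
}
```

Starting from 0x1400, set_bypass() leaves 0x1600 in the modelled register and clear_bypass() returns it to 0x1400; the other bits are untouched because each access is a read-modify-write.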