Software and Hardware Circular Buffer Operations
First presented in ENCM515 2005. There are 3 earlier lectures that are useful for midterm review.
M. R. Smith, ECE, University of Calgary, Canada
03-Feb-07
Software Circular Buffer Issues, M. Smith, ECE, University of Calgary,
Canada 2
Tackled today
Circular Buffer Issues
DCRemoval( )
FIR( )
Coding a software circular buffer in C++ and TigerSHARC assembly code
Coding a hardware circular buffer
Where to next?
DCRemoval( )
Not as complex as FIR, but with many of the same requirements
Does an “implied” multiplication by a FIR coefficient of 1 and then does the sum
Easier to handle
You use the same ideas in optimizing FIR over Labs 2 and 3
Two issues: speed and accuracy. Develop suitable tests for the CPP code and check that the various assembly language versions satisfy the same tests
Memory intensive
Addition intensive
Loops for main code
FIFO implemented as circular buffer: “Memory Shuffle approach”
Set up time
In principle, 1 cycle / instruction
2 + 4 instructions
First key element: Sum Loop, Order (N)
Second key element: Shift Loop, Order (log2N)
Sum loop: 4 + N * 5 instructions
Shift loop: 1 + 2 * log2N instructions
No J parallel shifter
Do it the M68000 way
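The two loops can be sketched in C++ as follows. This is only an illustration, assuming that N is a power of two and that the shift loop exists to divide the sum by N one bit at a time; the function name `average_by_shifting` is hypothetical and does not come from the lab code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sum N FIFO entries, then divide by N (assumed to be
// a power of two) by shifting one bit per pass, as a core without a
// parallel (barrel) shifter would have to.
int32_t average_by_shifting(const int32_t* fifo, int n) {
    assert(n > 0 && (n & (n - 1)) == 0);  // assumption: N is a power of two

    // Sum loop: one read and one add per element, so Order(N).
    int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += fifo[i];
    }

    // Shift loop: one single-bit shift per halving of N, so Order(log2 N).
    // Right-shifting a negative value is arithmetic on common compilers
    // (and guaranteed from C++20), which rounds toward minus infinity.
    for (int remaining = n; remaining > 1; remaining >>= 1) {
        sum >>= 1;
    }
    return sum;
}
```

With N = 128 the shift loop runs 7 times, which is why its cost grows with log2N rather than N.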
Third key element: FIFO circular buffer, Order (N)
Update outgoing parameters: 6 instructions
Update FIFO: 3 + 6 * N instructions
Function return: 2 instructions
Next stage in improving code speed: software circular buffers

Stage                        Instructions
Set up pointers to buffers   2
Insert values into buffers   4
SUM LOOP                     4 + N * 5
SHIFT LOOP                   1          (was 1 + 2 * log2N)
Update outgoing parameters   6
Update FIFO                  3 + 6 * N
Function return              2
Total                        23 + 11 N  (was 22 + 11 N + 2 log2N)
N = 128 – instructions = 1430
1430 + 300 delay cycles = 1730 cycles
DCRemoval( )
If there are N points in the circular buffer, then this approach of moving the data from memory location to memory location requires
N memory reads / N memory writes (possible data bus conflicts)
2N memory address calculations
FIFO implemented as circular buffer
Uses memory shuffle approach from Lab. 1
NOTE: This approach can sometimes be the “fastest”(see later Labs.)
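A minimal C++ sketch of the memory shuffle approach is given below. It assumes the oldest value sits at `fifo[0]` and the newest at `fifo[n - 1]`; the function name `fifo_shuffle_insert` and that ordering are assumptions made for illustration, not the Lab 1 code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical memory-shuffle FIFO update. Assumed layout: oldest value at
// fifo[0], newest value at fifo[n - 1].
void fifo_shuffle_insert(int32_t* fifo, int n, int32_t new_value) {
    assert(n > 0);
    // Every stored value moves down one slot: N - 1 read/write pairs, each
    // with its own source and destination address calculation.
    for (int i = 0; i < n - 1; ++i) {
        fifo[i] = fifo[i + 1];
    }
    // One more write brings in the new sample, giving roughly N reads
    // (counting the incoming sample) and N writes per call.
    fifo[n - 1] = new_value;
}
```

The work per new sample grows with N even though only one value actually changes, which is the cost the pointer-based approach tries to avoid.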
Alternative approach
Move pointers rather than memory values
In principle: 1 memory read, 1 memory write, pointer addition, conditional equate
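The four operations listed above might look like this in C++. The struct `SoftCircularFifo` and the function `circular_insert` are hypothetical names, and the single memory read is assumed here to fetch the outgoing (oldest) value; the slide does not say which value is read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software circular buffer: the data stays where it is and
// only the "next-empty" pointer moves.
struct SoftCircularFifo {
    int32_t* base;  // start of the FIFO array
    int n;          // number of entries in the array
    int32_t* next;  // location the next new value will be written to
};

// Overwrites the oldest entry and returns the value that was replaced.
int32_t circular_insert(SoftCircularFifo& f, int32_t new_value) {
    assert(f.n > 0 && f.next >= f.base && f.next < f.base + f.n);
    int32_t oldest = *f.next;    // 1 memory read (assumed: the oldest value)
    *f.next = new_value;         // 1 memory write
    ++f.next;                    // pointer addition
    if (f.next == f.base + f.n) {
        f.next = f.base;         // conditional equate: wrap to the start
    }
    return oldest;
}
```

The cost per new sample no longer depends on N, but every call now pays for the pointer check.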
Note: Software circular buffer is NOT necessarily more efficient than data moves
Watch out
Circular buffers can be implemented with the newest element placed “last” in the FIFO buffer, or with the newest element placed “first” in the FIFO buffer
SHARC (2002, 2003, 2004): used “first approach”
TigerSHARC: used “first approach” and failed max. optimization
Note: Software circular buffer is NOT necessarily more efficient than data moves
Now spending more time on moving / checking the software circular buffer pointers than moving the data?
SLOWER
FASTER
On TigerSHARC
Since we can have multiple instructions on one line, then “perhaps”, if we can avoid pipeline delays, the software circular buffer is faster than memory moves
No Pipeline delay
XR4 = R4 + R5;;
XR3 = R4 + R6;;
Second instruction DOES NOT need result of first
Pipeline delay
XR4 = R4 + R5;;
XR4 = R4 + R6;;
Second instruction needs result of first
Generate the tests for the software circular buffer routine
New static pointers needed in Software circular buffer code
New sets of register defines
Now using many of the TigerSHARC registers
Code for storing a new value into the FIFO requires knowledge of the “next-empty” location
First, you must get the address of where the static variable saved_next_pointer is stored
Second, you must access that address to get the actual pointer
Third, you must use the pointer value
This will be a problem in labs and exams with static variables stored in memory
Adjustment of software circular buffer pointer must be done carefully
Get and update pointer
Check the pointer
Save corrected pointer
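The access to the static `saved_next_pointer` and the three adjustment steps can be written out explicitly in C++. This is a sketch only: the array name `fifo`, the length `kFifoLength` and the function `store_new_value` are assumptions made for illustration, and the double indirection is spelled out to mirror what the assembly code has to do by hand.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFifoLength = 4;              // hypothetical FIFO length
static int32_t fifo[kFifoLength] = {0};     // hypothetical FIFO array
static int32_t* saved_next_pointer = fifo;  // persists between calls

void store_new_value(int32_t new_value) {
    // Get the address of where the static variable itself is stored.
    int32_t** address_of_static = &saved_next_pointer;
    // Access that address to get the actual pointer.
    int32_t* next = *address_of_static;
    assert(next >= fifo && next < fifo + kFifoLength);
    // Use the pointer value.
    *next = new_value;

    // Get and update pointer.
    ++next;
    // Check the pointer: wrap when it steps past the end of the array.
    if (next == fifo + kFifoLength) {
        next = fifo;
    }
    // Save corrected pointer so the next call starts in the right place.
    *address_of_static = next;
}
```

Forgetting the final save, or saving before the check, leaves the static pointing outside the array on the next call.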
Next stage in improving code speed: software circular buffers

Stage                        Instructions
Set up pointers to buffers   2
Insert values into buffers   8          (was 4)
SUM LOOP                     4 + N * 5
SHIFT LOOP                   1          (was 1 + 2 * log2N)
Update outgoing parameters   6
Update FIFO                  14         (was 3 + 6 * N)
Function return              2
Total                        37 + 5 N   (was 23 + 11 N)

N = 128: instructions = 677 cycles
677 + 360 delay cycles = 1011 cycles
Was 1430 + 300 delay cycles = 1730 cycles
Next step – Hardware circular buffer
Do exactly the same pointer calculations as with software circular buffers, but now the calculations are done behind the scenes, at high speed, using specialized pointer features
Only available with J0, J1, J2 and J3 registers (on older ADSP-21061: all pointer registers)
Jx: the pointer register
JBx: the BASE register, set to the start of the FIFO array
JLx: the length register, set to the length of the FIFO array
VERY BIG WARNING? – Reset to zero. On older ADSP-21061 it was very important that the length register be reset to zero, otherwise all the other functions using this register would suddenly start using circular buffer by mistake.
Still advisable – but need special syntax for causing circular buffer operations to occur
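What the hardware does behind the scenes can be modelled in C++. The sketch below is an emulation under stated assumptions, not the processor's actual logic: array indices stand in for addresses, the names `CircularPointer` and `cb_read_post_modify` are hypothetical, the modify value is assumed to be smaller in magnitude than the buffer length, and a zero length is modelled as "no wrapping" to reflect the warning about resetting the length register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one circular-buffer register set. Indices into a
// memory array stand in for addresses.
struct CircularPointer {
    int j;   // Jx: the pointer register
    int jb;  // JBx: base, the start of the FIFO array
    int jl;  // JLx: length of the FIFO array (0 modelled as "no wrapping")
};

// Models a post-modify circular read: fetch at the current pointer, then
// update the pointer and wrap it back inside [jb, jb + jl).
int32_t cb_read_post_modify(const int32_t* memory, CircularPointer& p, int modify) {
    int32_t value = memory[p.j];  // the access uses the unmodified pointer
    p.j += modify;                // post-modify
    if (p.jl > 0) {
        assert(modify > -p.jl && modify < p.jl);  // assumption: |modify| < length
        if (p.j >= p.jb + p.jl) {
            p.j -= p.jl;          // stepped past the end: wrap to the front
        } else if (p.j < p.jb) {
            p.j += p.jl;          // stepped before the base: wrap to the back
        }
    }
    return value;
}
```

In this model the wrap costs no extra instructions in the calling loop, which is the saving the hardware version is after.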
Setting up the circular buffer functions
Remember all the tests to start with
Store values into hardware FIFO
CB instruction ONLY works on POST-MODIFY operations
CB [J1 += J2], not CB [J1 + J2]
Now perform Math operation using circular buffer operation
MUST NOT DO XR2 = CB [J0 + i_J8];
Save N cycles as we no longer need to increment the index
Update the static variables
Further special CB instructions
A few cycles saved here
Next stage in improving code speed: hardware circular buffers

Stage                        Instructions
Set up pointers to buffers   2
Insert values into buffers   8          (was 4)
SUM LOOP                     3 + N * 4  (was 4 + N * 5)
SHIFT LOOP                   1          (was 1 + 2 * log2N)
Update outgoing parameters   6
Update FIFO                  14         (was 3 + 6 * N)
Function return              2
Total                        37 + 4 N   (was 23 + 5 N)

N = 128: instructions = 549 cycles
549 + 300 delay cycles = 879 cycles
Delays are now >50% of useful time
Was 677 + 360 delay cycles = 1011 cycles
Tackle the summation part of FIR Exercise in using CB (Lab 2)
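As a starting point for the exercise, the summation can first be prototyped in C++ before moving to assembly. The sketch below is an assumption-laden illustration rather than the lab solution: the function name `fir_sum_circular` is hypothetical, `newest` is taken to be the index of the most recent sample, and `coeffs[0]` is taken to multiply that newest sample.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical FIR summation over a circular delay line. Assumptions:
// newest is the index of the most recent sample, and coeffs[0] multiplies
// that newest sample, coeffs[1] the one before it, and so on.
int32_t fir_sum_circular(const int32_t* delay_line, const int32_t* coeffs,
                         int n, int newest) {
    assert(n > 0 && newest >= 0 && newest < n);
    int32_t sum = 0;
    int index = newest;
    for (int k = 0; k < n; ++k) {
        sum += coeffs[k] * delay_line[index];
        // Step back to the next-older sample, wrapping at the array start.
        // This is the check a hardware circular buffer would absorb.
        index = (index == 0) ? n - 1 : index - 1;
    }
    return sum;
}
```

Setting every coefficient to 1 reduces this to the DCRemoval( ) sum, so the same tests can be reused as a first check.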
Place assembly code here
The code is too slow because we are not taking advantage of the available resources
Bring in up to 128 bits (4 instructions) per cycle
Ability to bring in 4 32-bit values along the J data bus (data1) and 4 along the K bus (data2)
Perform address calculations in the J and K ALUs: single-cycle hardware circular buffers
Perform math operations on both the X and Y compute blocks
Background DMA activity
Off-load some of the processing to the second processor
Tackled today
Have moved the DCRemoval( ) over to the X Compute block
Circular Buffer Issues
DCRemoval( )
FIR( )
Coding a software circular buffer in C++ and TigerSHARC assembly code
Coding a hardware circular buffer
Where to next?