Software and Hardware Circular Buffer Operations

First presented in ENCM515 2005. There are 3 earlier lectures that are useful for midterm review.

M. R. Smith, ECE
University of Calgary
Canada

03-Feb-07

Software Circular Buffer Issues, M. Smith, ECE, University of Calgary, Canada

Tackled today

Circular Buffer Issues
DCRemoval( )
FIR( )
Coding a software circular buffer in C++ and TigerSHARC assembly code
Coding a hardware circular buffer
Where to next?

DCRemoval( )

Not as complex as FIR, but many of the same requirements
Does an “implied” multiplication by a FIR coefficient of 1 and then does the sum
Easier to handle
You use the same ideas in optimizing FIR over Labs 2 and 3
Two issues: speed and accuracy. Develop suitable tests for the CPP code and check that the various assembly language versions satisfy the same tests

Memory intensive
Addition intensive
Loops for main code
FIFO implemented as circular buffer (“memory shuffle approach”)

Set up time
In principle 1 cycle / instruction

2 + 4 instructions

First key element: Sum Loop, Order (N)
Second key element: Shift Loop, Order (log2 N)

4 instructions

N * 5 instructions

1 + 2 * log2 N
No J parallel shifter
Do it the M68000 way
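The two loops can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the lecture's code: the function name average_by_shifts and its parameters are hypothetical, and N is assumed to be a power of two so that dividing by N becomes log2 N single-bit shifts (the "M68000 way" of shifting one bit per step).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the lecture): average of n FIFO values,
// where n is assumed to equal 1 << log2n.
std::int32_t average_by_shifts(const std::int32_t* fifo, int n, int log2n) {
    assert(n == (1 << log2n));  // N must be a power of two

    // Sum loop: one pass over the buffer, order N.
    std::int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += fifo[i];
    }

    // Shift loop: divide by N one bit at a time, order log2 N.
    // The tests only use non-negative sums; the result for negative
    // sums depends on how the platform shifts signed values.
    for (int i = 0; i < log2n; ++i) {
        sum >>= 1;
    }
    return sum;
}
```

The sum loop dominates for large N, which is why later versions concentrate on the per-element cost rather than on the shift loop.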

Third key element: FIFO circular buffer, Order (N)

6

3

6 * N

2

Next stage in improving code speed
Software circular buffers

Set up pointers to buffers: 2
Insert values into buffers: 4
SUM LOOP: 4 + N * 5
SHIFT LOOP: 1 (was 1 + 2 * log2 N)
Update outgoing parameters: 6
Update FIFO: 3 + 6 * N
Function return: 2
---------------------------
Total: 23 + 11 N (was 22 + 11 N + 2 log2 N)

N = 128 – instructions = 1430

1430 + 300 delay cycles = 1730 cycles

DCRemoval( )

If there are N points in the circular buffer, then this approach of moving the data from memory to memory location requires

N memory reads / N memory writes (possible data bus conflicts)
2N memory address calculations

FIFO implemented as circular buffer

Uses memory shuffle approach from Lab. 1

NOTE: This approach can sometimes be the “fastest”(see later Labs.)
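A minimal C++ sketch of the memory shuffle update is given here for comparison with the pointer-based versions. The function name shuffle_fifo and the choice to keep the oldest value at index 0 and the newest at index n - 1 are assumptions made for illustration, not details taken from the Lab 1 code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: oldest value lives at fifo[0], newest at fifo[n - 1].
void shuffle_fifo(std::int32_t* fifo, int n, std::int32_t newest) {
    assert(n > 0);
    // Every stored value moves down one slot: roughly N reads and N writes,
    // each needing its own address calculation.
    for (int i = 0; i + 1 < n; ++i) {
        fifo[i] = fifo[i + 1];
    }
    fifo[n - 1] = newest;  // the oldest value has been overwritten
}
```

The cost grows with N on every call, but the data always sits in a fixed order, so the code that reads the FIFO needs no wrap-around checks.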

Alternative approach
Move pointers rather than memory values
In principle: 1 memory read, 1 memory write, pointer addition, conditional equate
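The four operations listed above can be sketched in C++ like this. The struct SoftFifo, its field names and the function replace_oldest are hypothetical names chosen for this illustration; an index is used in place of a raw pointer to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software circular buffer: the data never moves,
// only the position of the next slot to overwrite changes.
struct SoftFifo {
    std::int32_t* data;  // storage holding `length` values
    int length;          // number of values in the FIFO
    int next;            // index of the oldest value (overwritten next)
};

// Replace the oldest value with `newest`; return the value that was dropped.
std::int32_t replace_oldest(SoftFifo& f, std::int32_t newest) {
    assert(f.length > 0 && f.next >= 0 && f.next < f.length);
    std::int32_t oldest = f.data[f.next];  // 1 memory read
    f.data[f.next] = newest;               // 1 memory write
    f.next += 1;                           // pointer addition
    if (f.next == f.length) {              // conditional equate: wrap around
        f.next = 0;
    }
    return oldest;
}
```

The work per call no longer depends on N, but every access now pays for the wrap-around check.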

Note: Software circular buffer is NOT necessarily more efficient than data moves

Watch out
Circular buffers can be implemented with the newest element placed “last” in the FIFO buffer, or with the newest element placed “first” in the FIFO buffer
SHARC (2002, 2003, 2004): used “first approach”
TigerSHARC: used “first approach” and failed max. optimization

Note: Software circular buffer is NOT necessarily more efficient than data moves

Now spending more time on moving / checking the software circular buffer pointers than moving the data?

SLOWER

FASTER

On TigerSHARC

Since we can have multiple instructions on one line, then “perhaps”, if we can avoid pipeline delays, the software circular buffer is faster than memory moves

No pipeline delay

XR4 = R4 + R5;;
XR3 = R4 + R6;;

Second instruction DOES NOT need result of first

Pipeline delay

XR4 = R4 + R5;;
XR4 = R4 + R6;;

Second instruction needs result of first

Generate the tests for the software circular buffer routine

New static pointers needed in Software circular buffer code

New sets of register defines
Now using many of TigerSHARC registers

Code for storing new value into FIFO requires knowledge of “next-empty” location

First, you must get the address of the static variable saved_next_pointer
Second, you must access that address to get the actual pointer
Third, you must use the pointer value
This will be a problem in labs and exams with static variables stored in memory

Adjustment of software circular buffer pointer must be done carefully

Get and update pointer

Check the pointer

Save corrected pointer
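The get / check / save sequence might look like this in C++. The variable name saved_next_pointer follows the slides; the buffer name fifo, the length kFifoLength and the function store_value are assumptions made for this sketch, and the assembly version has to perform the same steps explicitly through memory.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFifoLength = 4;                   // hypothetical FIFO length
static std::int32_t fifo[kFifoLength];           // the FIFO storage
static std::int32_t* saved_next_pointer = fifo;  // persists between calls

// Store one value at the "next-empty" location and advance the pointer.
void store_value(std::int32_t value) {
    std::int32_t* next = saved_next_pointer;  // get the pointer from memory
    *next = value;                            // use the pointer
    ++next;                                   // update the pointer
    if (next == fifo + kFifoLength) {         // check: ran past the end?
        next = fifo;                          // wrap back to the start
    }
    saved_next_pointer = next;                // save the corrected pointer
}
```

Saving the pointer before the check would leave a stale, out-of-range value in memory for the next call, which is the kind of mistake the careful ordering above avoids.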

Next stage in improving code speed
Software circular buffers

Set up pointers to buffers: 2
Insert values into buffers: 8 (was 4)
SUM LOOP: 4 + N * 5
SHIFT LOOP: 1 (was 1 + 2 * log2 N)
Update outgoing parameters: 6
Update FIFO: 14 (was 3 + 6 * N)
Function return: 2
---------------------------
Total: 37 + 5 N (was 23 + 11 N)

N = 128: instructions = 677 cycles
677 + 360 delay cycles = 1011 cycles

Was 1430 + 300 delay cycles = 1730 cycles

Next step – Hardware circular buffer

Do exactly the same pointer calculations as with software circular buffers, but now the calculations are done behind the scenes, at high speed, using specialized pointer features
Only available with J0, J1, J2 and J3 registers (on the older ADSP-21061: all pointer registers)
Jx: the pointer register
JBx: the BASE register, set to the start of the FIFO array
JLx: the length register, set to the length of the FIFO array

VERY BIG WARNING? – Reset to zero. On older ADSP-21061 it was very important that the length register be reset to zero, otherwise all the other functions using this register would suddenly start using circular buffer by mistake.

Resetting is still advisable on TigerSHARC, but here special syntax is needed to cause circular buffer operations to occur
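To make the roles of the three registers concrete, here is a small software model in C++. It is a sketch of the behaviour described above, not of the actual hardware: the struct CircularPointer, the function read_post_modify and the use of array indices instead of addresses are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Software model of one pointer register set; the fields mirror Jx, JBx, JLx.
struct CircularPointer {
    int j;   // Jx:  current position (an index into `memory` in this model)
    int jb;  // JBx: start of the FIFO array
    int jl;  // JLx: length of the FIFO array
};

// Model of a circular post-modify read: use the current Jx for the access,
// then advance it by `step`, wrapping so it stays inside [jb, jb + jl).
std::int32_t read_post_modify(const std::int32_t* memory,
                              CircularPointer& p, int step) {
    assert(p.jl > 0 && step > -p.jl && step < p.jl);
    assert(p.j >= p.jb && p.j < p.jb + p.jl);
    std::int32_t value = memory[p.j];  // access uses the unmodified pointer
    p.j += step;                       // post-modify
    if (p.j >= p.jb + p.jl) {
        p.j -= p.jl;                   // wrapped past the end of the buffer
    } else if (p.j < p.jb) {
        p.j += p.jl;                   // wrapped before the start
    }
    return value;
}
```

In this model the wrap check is ordinary code; the point of the hardware version is that the same check costs no extra instructions.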

Setting up the circular buffer functions
Remember all the tests to start with

Store values into hardware FIFO

CB instruction ONLY works on POST-MODIFY operations
CB [J1 += J2] not CB [J1 + J2]

Now perform Math operation using circular buffer operation

MUST NOT DO XR2 = CB [J0 + i_J8];
Save N cycles as no longer need to increment index

Update the static variables
Further special CB instructions

A few cycles saved here

Next stage in improving code speed
Hardware circular buffers

Set up pointers to buffers: 2
Insert values into buffers: 8 (was 4)
SUM LOOP: 3 + N * 4 (was 4 + N * 5)
SHIFT LOOP: 1 (was 1 + 2 * log2 N)
Update outgoing parameters: 6
Update FIFO: 14 (was 3 + 6 * N)
Function return: 2
---------------------------
Total: 37 + 4 N (was 23 + 5 N)

N = 128: instructions = 549 cycles

549 + 300 delay cycles = 879 cycles
Delays are now >50% of useful time

Was 677 + 360 delay cycles = 1011 cycles

Tackle the summation part of FIR
Exercise in using CB (Lab 2)

Place assembly code here

The code is too slow because we are not taking advantage of the available resources

Bring in up to 128 bits (4 instructions) per cycle
Ability to bring in 4 32-bit values along J data bus (data1) and 4 along K bus (data2)
Perform address calculations in J and K ALU: single cycle hardware circular buffers
Perform math operations on both X and Y compute blocks
Background DMA activity
Off-load some of the processing to the second processor

Tackled today

Have moved the DCRemoval( ) over to the X Compute block
Circular Buffer Issues
DCRemoval( )
FIR( )
Coding a software circular buffer in C++ and TigerSHARC assembly code
Coding a hardware circular buffer
Where to next?