Squish-DSP Application of a Project Management Tool to manage low-level DSP processor resources

M. Smith, University of Calgary, Canada (smithmr @ ucalgary.ca)


Page 1: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources

Squish-DSP: Application of a Project Management Tool to manage low-level DSP processor resources

M. Smith, University of Calgary, Canada

smithmr @ ucalgary.ca

Page 2: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources

Squish-DSP Tool smithmr @ ucalgary.ca


Series of Talks and Workshops

CACHE-DSP – Talk on a simple process tool to identify cache conflicts in DSP code.
SQUISH-DSP – Talk on using a project management tool to automate identification of parallel DSP processor instructions.
SHARC Ecology 101 – Workshop showing how to systematically write parallel 2106X code.
SHARC Ecology 201 – Workshop on SQUISH-DSP and CACHE-DSP tools.

Page 3: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Scope of Talk

Overview of hand optimization of code
Paradigm shift in microprocessor resource scheduling
Project Management Tool Application
Translating ‘microprocessor’ language into a ‘business’ format
Examples and limitations

Better optimization from VisualDSP code

Future directions

Page 4: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Standard “C” code

void Convert(float *temperature, int N) {
    int count;

    for (count = 0; count < N; count++) {
        *temperature = (*temperature) * 9 / 5 + 32;
        temperature++;
    }
}

Page 5: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


2106X-style load/store “C” code

void Convert(register float *temperature, register int N) {
    register int count;
    register float *pt = temperature;       // Ireg <- Dreg
    register float scratch;

    for (count = 0; count < N; count++) {
        scratch = *pt;
        scratch = scratch * (9.0f / 5.0f);  // float constants: integer (9 / 5) would give 1
        scratch = scratch + 32;             // Order of Ops
        *pt = scratch;
        pt++;
    }
}

Page 6: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Check on required register use

#define count scratchR1
#define pt scratchDMpt
#define scratchF2 F2

    LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
        scratchF2 = dm(pt, zeroDM);
        // Any special requirements here on F2??

// INPAR1 (R4) is dead -- can reuse
#define constantF4 F4    // Must be float
        constantF4 = 1.8;
        scratchF2 = scratchF2 * constantF4;
        // Fn = F(0, 1, 2 or 3) * F(4, 5, 6 or 7)

#define F0_32 F0         // Must be float
        F0_32 = 32.0;
        scratchF2 = scratchF2 + F0_32;
        // Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)

LOOP_END: dm(pt, plus1DM) = scratchF2;
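The two register rules quoted on this slide can be written down as a small check. This is a minimal sketch in C, not part of the talk: the function names are hypothetical, and registers are represented simply by their number (F2 is 2, F12 is 12).

```c
#include <stdbool.h>

/* Hypothetical helpers (names are not from the talk). They encode the
   operand-register ranges quoted on the slide:
     multiplier: Fn = F(0..3)  * F(4..7)
     adder:      Fm = F(8..11) + F(12..15) */
static bool in_range(int reg, int lo, int hi) {
    return reg >= lo && reg <= hi;
}

/* True when both multiplier inputs sit in the ranges the slide lists. */
bool mul_operands_ok(int x_reg, int y_reg) {
    return in_range(x_reg, 0, 3) && in_range(y_reg, 4, 7);
}

/* True when both adder inputs sit in the ranges the slide lists. */
bool add_operands_ok(int x_reg, int y_reg) {
    return in_range(x_reg, 8, 11) && in_range(y_reg, 12, 15);
}
```

With these rules, `mul_operands_ok(2, 4)` accepts F2 * F4, while `add_operands_ok(2, 0)` rejects F2 + F0. An add of the form F8 + F12 satisfies the adder rule.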

Page 7: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Resource Chart -- Basic code

ADDER    MULTIPLIER    DM ACCESS    PM ACCESS

_Convert:
    pt = INPAR1;
    F12_32 = 32.0;                 // bring constants outside the loop
    F4_1_8 = 1.8;
    LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
        F2 = dm(pt, ZERODM);
        F8 = F2 * F4_1_8;
        F2 = F8 + F12_32;
LOOP_END: dm(pt, PLUS1DM) = F2;
    5 magic lines of “C”

Time = 4 + N * 4 + 5 + 5 to do the call
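The timing line above can be turned into a one-line estimate. A minimal sketch, assuming the terms are read as 4 set-up instructions, 4 cycles per element, 5 cycles for the "magic lines" and 5 for the call; the function name is hypothetical and the breakdown is an interpretation of the slide, not a measured figure.

```c
/* Cycle estimate for the basic (non-parallel) loop, following the slide's
   formula: Time = 4 + N * 4 + 5 + 5.
   Assumed reading of the terms: set-up, per-element loop body,
   "5 magic lines of C", and the call overhead. */
unsigned long basic_convert_cycles(unsigned long n) {
    const unsigned long setup = 4, per_element = 4, magic = 5, call = 5;
    return setup + n * per_element + magic + call;
}
```

For example, `basic_convert_cycles(100)` gives 414, so for large N the cost is dominated by the 4 cycles per element that the following slides try to reduce.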

Page 8: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Unroll the loop -- 5 times here

ADDER               MULTIPLIER          DM ACCESS
                                        F2 = dm(pt, ZERODM)     R1
                    F8 = F2 * F4_1_8                            M1
F2 = F8 + F12_32                                                A1
                                        dm(pt, PLUS1DM) = F2    W1
                                        F2 = dm(pt, ZERODM)     R2
                    F8 = F2 * F4_1_8                            M2
F2 = F8 + F12_32                                                A2
                                        dm(pt, PLUS1DM) = F2    W2
                                        F2 = dm(pt, ZERODM)     R3
                    F8 = F2 * F4_1_8                            M3
F2 = F8 + F12_32                                                A3
                                        dm(pt, PLUS1DM) = F2    W3
                                        F2 = dm(pt, ZERODM)     R4
                    F8 = F2 * F4_1_8                            M4
F2 = F8 + F12_32                                                A4
                                        dm(pt, PLUS1DM) = F2    W4
                                        F2 = dm(pt, ZERODM)     R5
                    F8 = F2 * F4_1_8                            M5
F2 = F8 + F12_32                                                A5
                                        dm(pt, PLUS1DM) = F2    W5

Page 9: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Parallelism causes Register/Resource Conflicts

Instruction             Decode (SRC)    Writeback (DEST)
F2 = dm(pt, ZERODM)     Mem             F2
F8 = F2 * F4_1_8        F2, F4          F8
F2 = F8 + F12_32        F8, F12         F2
dm(pt, PLUS1DM) = F2    F2              Mem

NO: the next iteration's ‘F2 = …’ and ‘F8 = …’ cannot simply be overlapped with these instructions, because both iterations would be using F2 and F8 as SRC and DEST.

Page 10: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Unroll the loop a bit more

ADDER               MULTIPLIER          DM ACCESS
                                        F2 = dm(pt, ZERODM)     R1
                    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     M1, R2
F9 = F8 + F12_32    F8 = F2 * F4_1_8                            A1, M2
F9 = F8 + F12_32                        dm(pt, PLUS1DM) = F9    W1, A2
                                        dm(pt, PLUS1DM) = F9    W2
                                        F2 = dm(pt, ZERODM)     R3
                    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     M3, R4
F9 = F8 + F12_32    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     A3, M4, R5
F9 = F8 + F12_32    F8 = F2 * F4_1_8    dm(pt, PLUS1DM) = F9    W3, A4, M5
F9 = F8 + F12_32                        dm(pt, PLUS1DM) = F9    W4, A5
                                        dm(pt, PLUS1DM) = F9    W5
                                        F2 = dm(pt, ZERODM)     R6
                    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     M6, R7
F9 = F8 + F12_32    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     A6, M7, R8
F9 = F8 + F12_32    F8 = F2 * F4_1_8    dm(pt, PLUS1DM) = F9    W6, A7, M8
F9 = F8 + F12_32                        dm(pt, PLUS1DM) = F9    W7, A8
                                        dm(pt, PLUS1DM) = F9    W8

Page 11: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Final code version

ADDER               MULTIPLIER          DM ACCESS

_Convert:
    Modify(CTOPofSTACK, -1);
    dm(FP, -2) = R9;
    pt = INPAR1;
    F12_32 = 32.0;                      // bring constants outside the loop
    F4_1_8 = 1.8;

                                        F2 = dm(pt, ZERODM)     R1
                    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     M1, R2
F9 = F8 + F12_32    F8 = F2 * F4_1_8                            A1, M2
F9 = F8 + F12_32                        dm(pt, PLUS1DM) = F9    W1, A2
                                        dm(pt, PLUS1DM) = F9    W2

    LCNTR = (N-2)/3, DO LOOP_END UNTIL LCE;

                                        F2 = dm(pt, ZERODM)     R3
                    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     M3, R4
F9 = F8 + F12_32    F8 = F2 * F4_1_8    F2 = dm(pt, ZERODM)     A3, M4, R5
F9 = F8 + F12_32    F8 = F2 * F4_1_8    dm(pt, PLUS1DM) = F9    W3, A4, M5
F9 = F8 + F12_32                        dm(pt, PLUS1DM) = F9    W4, A5
LOOP_END:                               dm(pt, PLUS1DM) = F9    W5

    R9 = dm(FP, -2);
    5 magic lines of C
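The shape of this final version (two elements handled before the loop, then three elements per loop pass) can be mirrored at the C level. The following is only a sketch of that structure under stated assumptions: the function name is hypothetical, the operations inside each pass are grouped by stage rather than overlapped cycle by cycle, and the trailing remainder loop is an addition that the slide does not have, since `LCNTR = (N-2)/3` implicitly assumes N - 2 is a multiple of 3.

```c
#include <stddef.h>

/* C-level model of the final listing's structure: a two-element prologue
   followed by a loop body that handles three elements per pass. */
void convert_unrolled(float *temperature, size_t n) {
    const float scale = 1.8f, offset = 32.0f;
    float *pt = temperature;
    size_t i = 0;

    /* Prologue: first two elements (R1..W2 in the chart). */
    for (; i < 2 && i < n; i++, pt++)
        *pt = *pt * scale + offset;

    /* Main loop: three elements per pass (R3..W5 in the chart). */
    for (; i + 3 <= n; i += 3, pt += 3) {
        float f0 = pt[0], f1 = pt[1], f2 = pt[2];    /* reads      */
        f0 *= scale;  f1 *= scale;  f2 *= scale;     /* multiplies */
        f0 += offset; f1 += offset; f2 += offset;    /* adds       */
        pt[0] = f0;   pt[1] = f1;   pt[2] = f2;      /* writes     */
    }

    /* Remainder (not on the slide): leftovers when (n - 2) % 3 != 0. */
    for (; i < n; i++, pt++)
        *pt = *pt * scale + offset;
}
```

In the chart the loop body is six instruction lines for three elements, that is two lines per element instead of the four needed by the basic loop.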

Page 12: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Real Life is not made up of ‘short loops’

Probably using DSP-intelligent compiler as a starting point
Longer loops -- more tasks to make parallel
Many different opportunities for task ordering
Complicated resource management and register dependency issues
Need a tool to help get the product ‘out the door’

Page 13: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Business Management Tool

One evening went looking for a ‘tree’ program to manage the scheduling of microprocessor resources.

In frustration, decided to take the 2106X tasks and put them into Microsoft Project.

By mistake, found that I had developed a very useful microprocessor management tool, especially with the MS Project GUI!

Question -- how to get it to function in a systematic manner?

Page 14: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


MS Project -- 21XXX processor

Requires a paradigm shift

Business project concept -- one person can’t be doing two tasks in the same time slot.

Becomes: one data bus can’t be transferring two data items at the same time.

Handled by identifying the ‘processor resources’ needed to complete each ‘basic task’.

Page 15: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


MS Project -- 21XXX processor

Business project concept.

If you delay building a wall (Task A), then you must delay painting it (Task B).

HOWEVER

If you build the wall earlier, you could paint it earlier, but you don’t have to.

Might make more sense to delay Task B so that Task C can be done earlier, since doing Task C allows Task D to be completed in parallel with Task B, so that the whole project is finished earlier.

Page 16: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Simple Example

1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);

Might be able to move Task 1 in parallel with any instruction 2 through 15, BUT not in parallel with 16.
If Task 10 moves earlier, so can Task 16, BUT not before Task 10.
In Task 10, ‘F12=….’ can be made parallel with ‘F6=….’, BUT Task 10 ‘F8=….’ can’t!
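The ordering rules in this example come down to register dependencies between operations. A minimal sketch of such a check in C follows; the struct layout and function names are hypothetical, registers are identified by number only (F6 is 6), and address registers and memory are ignored for simplicity.

```c
#include <stdbool.h>

#define NO_REG (-1)

/* One atomic operation: a destination F register and up to two source
   F registers (NO_REG when unused). Hypothetical layout, for illustration. */
struct atomic_op { int dest, src1, src2; };

static bool reads(const struct atomic_op *op, int reg) {
    return reg != NO_REG && (op->src1 == reg || op->src2 == reg);
}

/* True when `later` must stay after `earlier` in program order. */
bool must_follow(const struct atomic_op *later, const struct atomic_op *earlier) {
    bool raw = reads(later, earlier->dest);    /* later reads earlier's result      */
    bool waw = later->dest != NO_REG &&
               later->dest == earlier->dest;   /* both write the same register      */
    bool war = reads(earlier, later->dest);    /* later overwrites earlier's input  */
    return raw || waw || war;
}
```

With Task 1 as `{6, NO_REG, NO_REG}` and the multiply of Task 16 as `{5, 3, 6}`, `must_follow` reports a dependency through F6, while the multiply of Task 10, `{1, 2, 4}`, has none on Task 1.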

Page 17: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


SquishDSP -- parser

1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);

Task 16 split into 3 atomic tasks
F12 = pm(I12, M12) -- PMBUS resource, must come after ‘F12=…’ from Task 10, and after ‘F8=…’ in current Task
F8 = F8 + F12 -- ALU resource, must come after ‘F8=…’ and ‘F12=…’ from Task 10
F5 = F3 * F6 -- MULTIPLIER resource, must come after ‘F6=…’ from Task 1
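The splitting step described above can be sketched in a few lines of C. This is an illustration only and not the SquishDSP parser itself: the function and enum names are hypothetical, only the instruction forms shown on the slide are recognised, and each atomic task is assumed to fit in 63 characters.

```c
#include <string.h>

enum unit { UNIT_ALU, UNIT_MULTIPLIER, UNIT_DMBUS, UNIT_PMBUS, UNIT_UNKNOWN };

/* Guess the resource from the text of one atomic task (a simplification). */
enum unit classify(const char *task) {
    if (strstr(task, "pm(")) return UNIT_PMBUS;
    if (strstr(task, "dm(")) return UNIT_DMBUS;
    if (strchr(task, '*'))   return UNIT_MULTIPLIER;
    if (strchr(task, '+') || strchr(task, '-')) return UNIT_ALU;
    return UNIT_UNKNOWN;
}

/* Split one multifunction instruction at commas that are outside
   parentheses. Stores up to max_tasks pieces and returns the count. */
int split_atomic(const char *instr, char tasks[][64], int max_tasks) {
    int count = 0, depth = 0;
    size_t len = 0;
    char buf[64];

    for (const char *p = instr; ; p++) {
        int at_end = (*p == '\0' || *p == ';');
        if (at_end || (*p == ',' && depth == 0)) {
            while (len > 0 && buf[len - 1] == ' ') len--;  /* trim trailing spaces */
            buf[len] = '\0';
            if (len > 0 && count < max_tasks)
                strcpy(tasks[count++], buf);
            len = 0;
            if (at_end) break;
            continue;
        }
        if (*p == '(') depth++;
        if (*p == ')') depth--;
        if (len == 0 && *p == ' ') continue;               /* skip leading spaces */
        if (len < sizeof buf - 1) buf[len++] = *p;
    }
    return count;
}
```

Applied to Task 16, `split_atomic` yields three pieces, which `classify` maps to the MULTIPLIER, ALU and PMBUS resources. The comma inside `pm(I12, M12)` is not treated as a separator because it sits inside parentheses.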

Page 18: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Preparation for Microsoft Project

.asm Code broken up into sub-tasks with intra and inter dependencies recognized
Reformatted as Microsoft Project Text file
Rescheduled within Microsoft Project, either automatically or using GUI interface
Reformatted as .asm code with increased parallelism
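The automatic rescheduling step can be pictured as a simple greedy placement. The sketch below is a stand-in written for illustration and says nothing about how Microsoft Project levels resources internally: all names are hypothetical, each task uses one resource for one cycle, and predecessors are assumed to appear earlier in the array than the tasks that depend on them.

```c
#include <stdbool.h>

#define MAX_SLOTS 64
#define N_RES 4          /* e.g. 0 = adder, 1 = multiplier, 2 = DM bus, 3 = PM bus */

struct task {
    int resource;        /* 0..N_RES-1 */
    int n_pred;          /* number of valid entries in pred[] */
    int pred[4];         /* indices of earlier tasks this one must follow */
    int slot;            /* output: assigned cycle */
};

/* Greedy placement: visit tasks in program order and put each one in the
   earliest cycle that is after all its predecessors and in which its
   resource is still free. Returns the number of cycles used, or -1 if
   MAX_SLOTS is exceeded. */
int level_tasks(struct task *tasks, int n) {
    bool busy[MAX_SLOTS][N_RES] = {{false}};
    int length = 0;

    for (int i = 0; i < n; i++) {
        int s = 0;
        for (int k = 0; k < tasks[i].n_pred; k++) {
            int after = tasks[tasks[i].pred[k]].slot + 1;
            if (after > s) s = after;
        }
        while (s < MAX_SLOTS && busy[s][tasks[i].resource]) s++;
        if (s >= MAX_SLOTS) return -1;
        busy[s][tasks[i].resource] = true;
        tasks[i].slot = s;
        if (s + 1 > length) length = s + 1;
    }
    return length;
}
```

Fed with two read, multiply, add, write chains that share one DM bus and use separate registers, this placement needs five cycles instead of eight: the second read lands in cycle 1 and the second write in cycle 4.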

Page 19: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Example GUI screen capture

INSTR. broken into ATOMIC TASKS

ATOMIC TASKS showing RESOURCE and DEPENDENCIES

ATOMIC TASKS with RESOURCE CONFLICTS

Page 20: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Task scheduling after ‘LEVELING’

Page 21: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Initial ‘C’ code

Page 22: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Code from ‘Visual-DSP’

VisualDSP unrolled loop 3 times

Page 23: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Code from SQUISH-DSP

12 VisualDSP cycles squished to 8

Page 24: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Final version of code (loop change)

Page 25: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Final SQUISH

12 VisualDSP cycles squished to 6

Page 26: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Advantages and Limitations

Current version intended to handle the inner critical loop of algorithm
Not handling ‘Cache’ conflicts
Not optimized for instructions in delay slots in jumps and conditional jumps
Not optimized for multiple DAG delays
    e.g. I4 = …. ; DM(I4, M2) = ; I5 =…
Moving to ‘task profile management’ macros with Primavera PV3 Tool

Page 27: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Conclusion

SquishDSP is a prototype scheduling tool to identify and reschedule microprocessor resource operations in parallel
Already useful in current form for ‘inner DSP loops’
Microsoft Project used for concept work but Primavera PV3 tool offers more long term promise

Page 28: Squish-DSP Application of a Project Management Tool to manage  low-level DSP processor resources


Acknowledgements

Financial support of Natural Sciences and Engineering Research Council (NSERC) of Canada and University of Calgary
Financial support from Analog Devices. Dr. Mike Smith is ADI University Professor 2001/2002
Future financial support from Alberta Provincial Government through Alberta Software Engineering Research Consortium (ASERC)