Instruction Scheduling on
VLIW Architectures
Spring 2011
4541.775 Topics on Compilers
Instruction Scheduling
● Limited ILP
● Trace Scheduling
● Superblock Scheduling
● Hyperblock Scheduling
● Modulo Scheduling
● Insufficient ILP
● “normal” code does not contain enough ILP
● ILP within basic blocks is limited for control-intensive programs
– the problem gets worse with longer latencies
    unsigned int abs_sum = 0;
    for (int i = 0; i < N; i++) {
        int abs = (A[i] >= 0 ? A[i] : -A[i]);
        abs_sum += abs;
    }

    ; b0
            mov r0 ← #0
            mov r1 ← #0
            mov r2 ← N shl #2
            mov r3 ← @A
    ; b1
    .loop:  ld  r4 ← mem[r3 + r1]
            bge r4, #0, .skip
    ; b2
            not r4 ← r4
            add r4 ← r4, #1
    ; b3
    .skip:  add r0 ← r0, r4
            add r1 ← r1, #4
            blt r1, r2, .loop
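(Figure: control-flow graph of the loop, blocks b0, b1, b2, b3. Block b1 consists of ld r4 ← … followed by bge r4, …; with a load latency of 4 cycles, the dependent bge cannot issue until 4 cycles after the ld, and b1 contains no other instructions to fill those cycles.)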
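The latency argument can be made concrete with a small calculation. The C sketch below uses a hypothetical, simplified instruction model (each instruction depends on at most one earlier instruction) and computes an as-soon-as-possible schedule with unlimited issue width. For b1, the ld with its 4-cycle latency followed by the dependent bge gives the branch an earliest issue cycle of 4, so the block needs 5 cycles for 2 instructions however wide the machine is; a longer load latency stretches it further.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction model: each instruction reads the
   result of at most one earlier instruction in the same basic block. */
struct insn {
    int latency;   /* cycles until its result can be used */
    int dep;       /* index of the earlier instruction it depends on, or -1 */
};

/* ASAP schedule with unlimited issue width: issue[i] receives the earliest
   cycle in which instruction i can issue. Instructions must be listed in
   program order (dep < i). Returns the issue cycle of the last instruction. */
int asap_last_issue(const struct insn *code, size_t n, int *issue)
{
    int last = 0;
    for (size_t i = 0; i < n; i++) {
        int t = 0;
        if (code[i].dep >= 0) {
            int d = code[i].dep;
            t = issue[d] + code[d].latency;   /* wait for the producer */
        }
        issue[i] = t;
        if (t > last)
            last = t;
    }
    return last;
}
```

For example, with struct insn b1[] = {{4, -1}, {1, 0}} (the ld, then the bge that reads its result) the function returns 4.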
● ILP within basic blocks is limited for control-intensive programs.
→ optimizations across basic blocks are needed
– trace scheduling (J. Fisher, 1981)
– superblock scheduling (P. Chang, 1991)
– hyperblock scheduling (S. Mahlke, 1992)
● Trace Scheduling
J. A. Fisher, Trace Scheduling: A Technique for Global Microcode Compaction (IEEE Transactions on Computers, vol. 30, no. 7, 1981)
● basic idea: schedule the most frequently executed trace of basic blocks as one unit
● requires compensation code if the program takes a different path than expected
(Figure: code motion moves add r4 ← r0, r1 along the frequent path (probability 0.9); a copy of the add is placed on the infrequent path (probability 0.1) as compensation code.)
● Trace Scheduling
● A trace consists of a sequence of instructions
– including branches
– but not including loops
● example: assume B1, B3, B4, B5, B7 is the most frequently executed path
(Figure: control-flow graph with basic blocks B1 to B7.)
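Trace selection from profile data can be sketched greedily: start at the most frequently executed block that is not yet in a trace and keep following the most frequently taken outgoing edge. The C code below is a simplified sketch, not Fisher's exact algorithm, and the profile counts used with it are hypothetical, chosen so that B1, B3, B4, B5, B7 comes out as the first trace. Practical compilers use more careful heuristics (for example, also growing the trace backwards from the seed); this sketch only grows forward.

```c
#define NB 8   /* blocks B1..B7 use indices 1..7; index 0 is unused */

/* Grow one trace: seed at the most frequently executed block not yet in a
   trace, then keep following the most frequent outgoing edge. The trace
   stops at a block that is already in a trace, which also keeps loop back
   edges out of it. edge[a][b] is the (hypothetical) profile count of the
   edge Ba -> Bb, count[b] the execution count of Bb.
   Returns the trace length (0 once every block belongs to a trace). */
int select_trace(int edge[NB][NB], const int count[NB], int visited[NB], int *trace)
{
    int seed = -1;
    for (int b = 1; b < NB; b++)
        if (!visited[b] && (seed < 0 || count[b] > count[seed]))
            seed = b;
    if (seed < 0)
        return 0;

    int len = 0;
    int cur = seed;
    while (cur >= 0) {
        visited[cur] = 1;
        trace[len++] = cur;
        int next = -1;                        /* most likely successor */
        for (int s = 1; s < NB; s++)
            if (edge[cur][s] > 0 && (next < 0 || edge[cur][s] > edge[cur][next]))
                next = s;
        cur = (next >= 0 && !visited[next]) ? next : -1;
    }
    return len;
}
```

With hypothetical counts such as B1→B3 taken 90 times and B1→B2 10 times, B4→B5 80 times and B4→B6 20 times, the first call returns the trace B1, B3, B4, B5, B7; later calls pick up the remaining blocks as short traces.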
● Trace Scheduling
(Figure: the control-flow graph before and after the trace B1, B3, B4, B5, B7 is scheduled as one unit; B2 and B6 stay off the trace.)
● add compensation code if necessary
● Trace Scheduling
● Compensation Code
– moving an instruction below a side exit
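(Figure: trace instr 1 to instr 6; after instr 1 is moved below the side exit the trace reads instr 2, instr 3, instr 4, instr 5, instr 1, instr 6, and a copy of instr 1 is placed on the side-exit path as compensation code.)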
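At the source level, moving an instruction below a side exit can be shown with two equivalent C functions. In this hypothetical example, instr 1 computes t = a + b and the side exit is the if (c) branch; after the move, the on-trace path computes t after the branch, and the exit path receives a copy of instr 1 as compensation code.

```c
/* Before: instr 1 (t = a + b) sits above the side exit. */
int exit_before(int a, int b, int c, int *out)
{
    int t = a + b;            /* instr 1 */
    if (c) {                  /* side exit: leave the trace */
        *out = t * 2;         /* off-trace code reads t */
        return 0;
    }
    *out = t - 1;             /* on-trace use of t */
    return 1;
}

/* After moving instr 1 below the side exit: the exit path no longer sees
   instr 1, so a copy of it is placed there as compensation code. */
int exit_after(int a, int b, int c, int *out)
{
    if (c) {
        int t = a + b;        /* compensation copy of instr 1 */
        *out = t * 2;
        return 0;
    }
    int t = a + b;            /* instr 1, now below the side exit */
    *out = t - 1;
    return 1;
}
```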
● Trace Scheduling
● Compensation Code
– moving an instruction above a side exit (speculative execution)
(Figure: instr 5 is moved above the side exit, giving instr 1, instr 5, instr 2, instr 3, instr 4, instr 6; the side-exit path needs compensation code that undoes instr 5.)
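The speculative case can be sketched the same way. In this hypothetical C example, instr 5 is x = x + 1 and the off-trace code reads the old x. Once instr 5 is hoisted above the side exit, the exit path must undo it. An add can be undone by a subtract; in general a compiler would rather rename the destination or hoist only instructions whose destination is dead on the exit path.

```c
/* Before: instr 5 (x = x + 1) lies below the side exit. */
int spec_before(int x, int c)
{
    if (c)                    /* side exit */
        return x * 10;        /* off-trace code reads the old x */
    x = x + 1;                /* instr 5 */
    return x;
}

/* After hoisting instr 5 above the side exit it also runs when the exit is
   taken, so the exit path undoes it before using x. */
int spec_after(int x, int c)
{
    x = x + 1;                /* instr 5, now executed speculatively */
    if (c) {
        x = x - 1;            /* compensation: undo instr 5 */
        return x * 10;
    }
    return x;
}
```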
● Trace Scheduling
● Compensation Code
– moving an instruction below a side entrance
– moving an instruction above a side entrance
(Figure: trace instr 1 to instr 6 reordered to instr 2, instr 3, instr 4, instr 5, instr 1, instr 6 across a side entrance; compensation copies of instr 5 and instr 4 are required.)
● Superblock Scheduling: Wen-mei Hwu et al., The Superblock: An Effective Technique for VLIW and Superscalar Compilation (The Journal of Supercomputing, vol. 7, issue 1-2, 1993)
● tries to overcome some difficulties with trace scheduling
– complicated bookkeeping when moving instructions above/below a side entrance/exit
– some compiler optimizations require additional bookkeeping when side entrances are present
example: copy propagation
● Superblock Scheduling
● a superblock is a trace with no side entrances → control may only enter from the top, but may leave at one or more exit points
● similar to extended basic blocks (Aho et al., 1986)
● superblock formation:
1. identify a trace using profile information
2. apply tail duplication until all side entrances have been eliminated
● tail duplication
1. copy the tail portion of the trace, from the first side entrance to the end
2. move all side entrances to the corresponding duplicated basic blocks
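The two steps of tail duplication can be sketched on a small adjacency-matrix CFG. The representation, the MAXB limit and the function names below are assumptions made for this illustration. The sketch copies the whole tail from the first side entrance in one pass, which removes every side entrance at once, and it assumes the trace contains no loop.

```c
#define MAXB 16

/* Small CFG as an adjacency matrix; unused rows and columns must be zero. */
struct cfg {
    int n;                   /* number of blocks in use */
    int edge[MAXB][MAXB];    /* edge[a][b] != 0: control can flow from a to b */
};

/* Index of block b within the trace, or -1 if b is not on it. */
static int trace_pos(const int *trace, int len, int b)
{
    for (int i = 0; i < len; i++)
        if (trace[i] == b)
            return i;
    return -1;
}

/* Copy the trace from its first side entrance to the end and move every
   side entrance to the matching copy, so control can enter the trace only
   at trace[0]. dup[i] receives the id of the copy of trace[i], or -1.
   Returns the number of new blocks (0 if there was no side entrance),
   or -1 if the CFG would exceed MAXB blocks. */
int tail_duplicate(struct cfg *g, const int *trace, int len, int *dup)
{
    int first = -1;
    for (int i = 1; i < len && first < 0; i++)
        for (int p = 0; p < g->n; p++)
            if (g->edge[p][trace[i]] && p != trace[i - 1]) {
                first = i;
                break;
            }
    for (int i = 0; i < len; i++)
        dup[i] = -1;
    if (first < 0)
        return 0;
    if (g->n + (len - first) > MAXB)
        return -1;

    int n_old = g->n;
    for (int i = first; i < len; i++)
        dup[i] = g->n++;

    /* 1. each copy inherits the successors of its original; successors in
          the duplicated tail are redirected to their copies */
    for (int i = first; i < len; i++)
        for (int s = 0; s < n_old; s++)
            if (g->edge[trace[i]][s]) {
                int j = trace_pos(trace, len, s);
                g->edge[dup[i]][j >= first ? dup[j] : s] = 1;
            }

    /* 2. move all side entrances to the corresponding copies */
    for (int i = first; i < len; i++)
        for (int p = 0; p < n_old; p++)
            if (g->edge[p][trace[i]] && p != trace[i - 1]) {
                g->edge[p][trace[i]] = 0;
                g->edge[p][dup[i]] = 1;
            }
    return len - first;
}
```

For a diamond B0→B1, B0→B2, B1→B3, B2→B3, B3→B4 with the trace B0, B1, B3, B4, the side entrance B2→B3 is detected, B3 and B4 are copied to new blocks 5 and 6, and B2 is redirected to block 5.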
● Superblock Scheduling
● example: superblock formation
● Superblock Scheduling
● superblock ILP optimizations
optimizations performed after superblock formation (and before scheduling) with the goal of enlarging superblocks and increasing ILP by removing dependences.
● superblock enlarging optimizations
– branch target expansion
  ● expand the target of the likely-taken control transfer that ends a superblock
  ● not applied to back edges
  ● stops when a predefined superblock size is reached or the branch does not favor one direction
● Superblock Scheduling
● superblock enlarging optimizations (cont’d)
– loop peeling
  ● applied to superblock loops (superblocks that end with a likely-taken control transfer to themselves) that tend to iterate only a few (k) times
  ● peel the first k iterations and insert control flow that branches to the original loop body if the loop does not finish within k iterations
  ● after loop peeling, the superblock may be extended at both the head and the tail of the superblock loop
– loop unrolling
  ● unroll the body of a superblock loop that tends to iterate many times
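Loop peeling and loop unrolling are easiest to see on a simple summation loop. The C sketch below assumes, hypothetically, that profiling reports about k = 2 iterations for the peeled version; the function names are illustrative. Note that the unrolled loop still chains every addition through s, which is the kind of dependence that accumulator variable expansion removes.

```c
/* Original superblock loop: sums a[0..n-1]. Assume (hypothetically) that
   profiling shows it usually runs k = 2 iterations. */
int sum_loop(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Loop peeling with k = 2: the common case becomes straight-line code that
   can be merged into the surrounding superblock; the original loop is
   entered only when more than k iterations are needed. */
int sum_peeled(const int *a, int n)
{
    int s = 0;
    if (n < 1)
        return s;
    s += a[0];                       /* peeled iteration 1 */
    if (n < 2)
        return s;
    s += a[1];                       /* peeled iteration 2 */
    for (int i = 2; i < n; i++)      /* original loop body */
        s += a[i];
    return s;
}

/* Loop unrolling by 4 for loops that iterate many times; the second loop
   handles the remaining n % 4 iterations. */
int sum_unrolled(const int *a, int n)
{
    int s = 0, i = 0;
    for (; i + 3 < n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)
        s += a[i];
    return s;
}
```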
● Superblock Scheduling
● superblock dependence-removing optimizations: remove data dependences between instructions in a superblock
– register renaming, e.g., in unrolled loop bodies
– operation migration
  ● move instructions whose results are not used within a superblock to a less frequently executed superblock
  ● decision based on a cost function
– induction variable expansion
  ● create a separate copy of the loop induction variable for each unrolled loop body
  ● requires additional patch code at the loop preheader and at the exits
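Register renaming and induction variable expansion can be sketched on a loop unrolled by two. In this hypothetical C example, each unrolled body has its own induction variable (i0, i1) and its own temporary (t0, t1); the initialization before the loop plays the role of the preheader patch code and the trailing loop plays the role of the exit patch code. A C compiler performs renaming on its own; the separate variables only make the idea visible.

```c
/* Reference behavior: b[i] = a[i] * 2 for 0 <= i < n.
   Unrolled by 2 with induction variable expansion and register renaming. */
void scale_expanded(const int *a, int *b, int n)
{
    int i0 = 0, i1 = 1;              /* preheader: one induction variable per body */
    while (i1 < n) {
        int t0 = a[i0] * 2;          /* renamed temporaries: the two bodies share */
        int t1 = a[i1] * 2;          /* no registers, so no false dependences */
        b[i0] = t0;
        b[i1] = t1;
        i0 += 2;                     /* each copy advances by the unroll factor */
        i1 += 2;
    }
    for (int i = i0; i < n; i++)     /* exit patch: finish from the i0 copy */
        b[i] = a[i] * 2;
}
```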
● Superblock Scheduling
● superblock dependence-removing optimizations (cont’d)
– accumulator variable expansion
  ● use a separate accumulator for each unrolled instance of loops that accumulate a sum or product in every iteration
  ● additional patch code needed at the loop preheader
  ● additional patch code needed at the loop exits (summing up the individual accumulators)
– operation combining
  ● for certain classes of instructions, true dependences can be eliminated by precomputing new immediate values at compile time
  ● example (the second add's immediate becomes #8 = 4 + 4, so both adds read the original x):
    before:
        add x  ← x, #4
        add x  ← x, #4
    after:
        add x  ← x, #4      add x' ← x, #8
        …
        mov x  ← x'
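Accumulator variable expansion applied to the abs_sum loop from the beginning of the lecture might look as follows, unrolled by 4. The helper name abs_val and the unroll factor are assumptions of this sketch. Because the sum uses unsigned integer arithmetic, splitting it into four partial sums gives exactly the same result; for floating-point accumulators the result can differ in rounding.

```c
/* |x| as in the original loop body (INT_MIN is not handled, as in the original). */
static int abs_val(int x) { return x >= 0 ? x : -x; }

/* Original loop: every iteration adds into the single accumulator abs_sum,
   so all additions form one long dependence chain. */
unsigned int abs_sum_plain(const int *A, int N)
{
    unsigned int abs_sum = 0;
    for (int i = 0; i < N; i++)
        abs_sum += abs_val(A[i]);
    return abs_sum;
}

/* Unrolled by 4 with accumulator variable expansion. */
unsigned int abs_sum_expanded(const int *A, int N)
{
    unsigned int s0 = 0, s1 = 0, s2 = 0, s3 = 0;   /* preheader: one accumulator per body */
    int i = 0;
    for (; i + 3 < N; i += 4) {
        s0 += abs_val(A[i]);           /* the four chains are independent */
        s1 += abs_val(A[i + 1]);
        s2 += abs_val(A[i + 2]);
        s3 += abs_val(A[i + 3]);
    }
    for (; i < N; i++)                 /* leftover iterations */
        s0 += abs_val(A[i]);
    return s0 + s1 + s2 + s3;          /* exit patch: combine the partial sums */
}
```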
● Superblock Scheduling
● example: superblock dependence removing optimizations
(Figure: examples of accumulator variable expansion and induction variable expansion.)
● Superblock Scheduling
● speculative execution
– occurs when moving an instruction up above a control transfer instruction B
– the instruction is executed in any case, even if the control transfer instruction branches out of the superblock; such instructions are called speculative
– restrictions for an instruction I to be executed speculatively
1. the destination of I is not used before it is redefined when B is taken
2. I will never cause an exception that may terminate the program when B is taken
– instructions that may cause exceptions
  ● memory load
  ● memory store
  ● integer divide
  ● floating-point operations
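The two restrictions can be written down as a conservative eligibility check. The instruction representation and the liveness array below are hypothetical. The check rejects every instruction class listed above as a possible exception source, which matches the restricted percolation model; a real compiler would need extra analysis to prove that, say, a particular load cannot fault.

```c
#include <stdbool.h>

/* Hypothetical instruction summary used only for this sketch. */
enum op_kind { OP_ALU, OP_LOAD, OP_STORE, OP_IDIV, OP_FP };

struct instr {
    enum op_kind kind;
    int dest;                        /* destination register, or -1 if none */
};

/* The instruction classes listed above as potential exception sources. */
static bool may_trap(enum op_kind k)
{
    return k == OP_LOAD || k == OP_STORE || k == OP_IDIV || k == OP_FP;
}

/* May I be moved above branch B? live_on_taken[r] is true if register r
   may be read on the taken path of B before it is redefined. */
bool can_speculate(const struct instr *I, const bool *live_on_taken)
{
    if (I->dest >= 0 && live_on_taken[I->dest])
        return false;                /* restriction 1 violated */
    if (may_trap(I->kind))
        return false;                /* restriction 2 cannot be guaranteed */
    return true;
}
```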
● Superblock Scheduling
● speculative execution (cont’d)
– exception models
  ● restricted percolation model
    no support for disregarding exceptions generated by speculatively executed instructions
    ● limits performance in superblocks that contain many long-latency, potentially trap-causing instructions (e.g., memory loads), because these cannot be moved above branches
  ● general percolation model
    the architecture provides non-trapping versions of instructions that may cause exceptions
    ● convert speculatively executed, potentially trapping instructions to their non-trapping counterparts
    ● if the exception must still be detected, additional architecture and compiler support is required
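Under the general percolation model, a load can be hoisted above the branch that guards it by switching to its non-trapping form. The C sketch below emulates that: load_nt is a hypothetical stand-in for a non-trapping load instruction (an invalid address is modeled as NULL and yields 0). It reproduces only the semantics, not the performance benefit.

```c
#include <stddef.h>

/* Hypothetical stand-in for a non-trapping load: on real hardware the
   exception would be suppressed; here an invalid address is modeled as
   NULL and the load simply returns 0. */
static int load_nt(const int *p)
{
    return p != NULL ? *p : 0;
}

/* Before: the load stays below the branch that guards it. */
int deref_before(const int *p, int fallback)
{
    if (p == NULL)                   /* branch B */
        return fallback;
    return *p + 1;
}

/* After: the load is hoisted above B in its non-trapping form, so its
   latency can overlap with the compare. v is dead on the taken path
   (restriction 1 holds); the non-trapping form covers restriction 2. */
int deref_after(const int *p, int fallback)
{
    int v = load_nt(p);              /* speculative, non-trapping load */
    if (p == NULL)
        return fallback;
    return v + 1;
}
```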
● Superblock Scheduling
● Analysis
– implementation complexity in the IMPACT-I C compiler
total size: ~92K lines
● Superblock Scheduling
● Analysis
– compilation time (IMPACT-I)
● Superblock Scheduling
● Analysis
– performance improvement due to superblock ILP optimization
● Superblock Scheduling
● Analysis
– effect of speculative execution support
● Superblock Scheduling
● Analysis
– code size increase
● Hyperblock Scheduling: Scott Mahlke et al., Effective Compiler Support for Predicated Execution Using the Hyperblock (MICRO-25, 1992)
● tries to overcome some difficulties with superblock scheduling
– superblocks end when both targets of a control flow instruction have a similar probability of being taken
● hyperblock scheduling
– combine basic blocks from multiple control paths (using if-conversion)
– for programs without heavily biased branches, hyperblocks provide a more flexible framework
● Hyperblock Scheduling
● Predicated execution
– When the predicate is TRUE the instruction is executed normally
– When the predicate is FALSE the instruction is treated as a NOP
● Conditional branches can be eliminated with predicated execution (if-conversion)
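If-conversion can be illustrated on the abs_sum loop from the beginning of the lecture. In the C sketch below, pred_assign is a hypothetical helper that emulates a predicated instruction (the value is written when p is 1, a NOP otherwise); the bge disappears, and b1, b2 and b3 merge into one block whose only branch is the loop back edge. The arithmetic uses unsigned int so that the values computed for non-negative inputs, which the predicated instructions then discard, can never overflow.

```c
/* Hypothetical emulation of a predicated move: when p is 1 the value is
   written, when p is 0 the "instruction" behaves like a NOP. */
static void pred_assign(int p, unsigned int *dst, unsigned int val)
{
    if (p)
        *dst = val;
}

/* The abs_sum loop after if-conversion: no bge, one block per iteration. */
unsigned int abs_sum_ifconv(const int *A, int N)
{
    unsigned int abs_sum = 0;
    for (int i = 0; i < N; i++) {
        unsigned int r4 = (unsigned int)A[i];   /* ld r4 <- mem[r3 + r1] */
        int p = A[i] < 0;                        /* predicate define replaces bge */
        pred_assign(p, &r4, ~r4);                /* (p) not r4 <- r4 */
        pred_assign(p, &r4, r4 + 1u);            /* (p) add r4 <- r4, #1 */
        abs_sum += r4;                           /* add r0 <- r0, r4 */
    }
    return abs_sum;
}
```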
● Hyperblock Scheduling
● The Hyperblock
– a set of predicated basic blocks in which control may only enter at the top but may leave through several exits
– very similar to superblock formation
● Building Hyperblocks
1. hyperblock block selection
  ● decide which basic blocks in a region should be included in the hyperblock
  ● three features of each block are examined
    – execution frequency
    – block size
    – instruction characteristics
  ● use heuristic functions
● Hyperblock Scheduling
● Building Hyperblocks (cont’d)
2. hyperblock formation
  ● tail duplication
  ● loop peeling
  ● node splitting
    – eliminate dependences created by control path merges
    – duplicate all blocks subsequent to the merge point for each path
  ● if-conversion
● Hyperblock Scheduling
● Building Hyperblocks (cont’d)
● Hyperblock Scheduling
● Control Flow Information
– instructions within a hyperblock are not sequential → a more complex analysis is required
● Predicate Hierarchy Graph (PHG)
– determines whether two instructions can ever be executed on a single path
– if they can, there is a control flow path between these two instructions
● Hyperblock Scheduling
● Predicate Hierarchy Graph (PHG) example
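same path: AND  $p_4 = c_1 \cdot c_2$
multiple paths meet: OR  $p_5 = \overline{c_1} + c_1 \cdot \overline{c_2}$
ANDing p4 and p5: $p_4 \cdot p_5 = (c_1 \cdot c_2) \cdot (\overline{c_1} + c_1 \cdot \overline{c_2}) = c_1 \cdot c_2 \cdot \overline{c_1} + c_1 \cdot c_2 \cdot \overline{c_2} = 0$
→ there is no viable path between p4 and p5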
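The disjointness test in this example can be checked mechanically. The sketch below does not build a PHG; it swaps in a brute-force truth table over c1 and c2, which is enough for two conditions: two predicated instructions can lie on a common path only if their predicates can be true at the same time.

```c
#include <stdbool.h>

/* Predicates from the example, as functions of the branch conditions. */
static bool p4(bool c1, bool c2) { return c1 && c2; }            /* same path: AND */
static bool p5(bool c1, bool c2) { return !c1 || (c1 && !c2); }  /* paths meet: OR */

/* True if some assignment of c1, c2 makes both predicates true, i.e. the
   two guarded instructions can execute on the same path. */
bool may_share_path(bool (*pa)(bool, bool), bool (*pb)(bool, bool))
{
    for (int m = 0; m < 4; m++) {
        bool c1 = m & 1;
        bool c2 = (m >> 1) & 1;
        if (pa(c1, c2) && pb(c1, c2))
            return true;
    }
    return false;
}
```

For the example, may_share_path(p4, p5) is false, matching $p_4 \cdot p_5 = 0$.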
● Hyperblock Scheduling
● Hyperblock-Specific Optimizations
– similar to optimizations for superblocks
– instruction promotion
  ● removes the dependence between a predicated instruction and the instruction that sets the corresponding predicate value
– instruction merging
  ● combines two instructions in a hyperblock with complementary predicates into a single instruction
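Both optimizations can be sketched as before/after pairs in C, with predicated instructions emulated by if statements. The functions are hypothetical illustrations. Promotion is valid here because t is dead when p is false and an integer multiply cannot trap; merging is valid because p and !p together cover every execution.

```c
/* Instruction promotion. Before: the multiply is predicated on p, so it
   must wait for the compare that defines p. */
unsigned int promote_before(int x, unsigned int a, unsigned int b)
{
    unsigned int t = 0, r = 0;
    int p = x > 0;                   /* predicate define */
    if (p) t = a * b;                /* (p) mul t <- a, b */
    if (p) r = t + 1;                /* (p) add r <- t, #1: the only use of t */
    return r;
}

/* After: t is dead when p is false, so the predicate on the multiply is
   dropped and it no longer depends on the compare. */
unsigned int promote_after(int x, unsigned int a, unsigned int b)
{
    unsigned int t = a * b;          /* promoted: executes unconditionally */
    unsigned int r = 0;
    int p = x > 0;
    if (p) r = t + 1;
    return r;
}

/* Instruction merging. Before: the same add appears under p and under !p. */
unsigned int merge_before(int x, unsigned int a)
{
    unsigned int r = 0;
    int p = x > 0;
    if (p)  r = a + 4;               /* (p)  add r <- a, #4 */
    if (!p) r = a + 4;               /* (!p) add r <- a, #4 */
    return r;
}

/* After: one unpredicated add replaces both. */
unsigned int merge_after(int x, unsigned int a)
{
    (void)x;                         /* the predicate is no longer needed */
    return a + 4;
}
```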
● Summary
● Trace Scheduling can increase ILP
– side entrances are too complex to handle
● Superblock Scheduling removes the side entrances from the trace
– weak point: unbiased branches
● Hyperblock Scheduling
– for programs without heavily biased branches, hyperblocks provide a more flexible framework
● Modulo Scheduling → next class!