Post on 22-May-2015
GCC porting
Using instruction patterns to describe the target ISA
Shiva Chen, shiva0217@gmail.com
May 2013
Outline
- Compiler structure
- Intermediate languages in GCC
- Optimization passes in GCC
- Defining instruction patterns
- Operand constraints
- Matching instruction patterns
- Strict RTL
- Target-defined constraints
- Emitting assembly code
- Target information usage
- Reserved words for describing instruction patterns
- Examples of instruction patterns
- Splitting instruction patterns
- Instruction attributes
- Peephole patterns
- Instruction scheduling
Three main intermediate language formats in GCC
- GENERIC: the language-independent representation generated by each front end; a common representation for all the languages supported by GCC
- GIMPLE: used to perform language-independent and target-independent optimization
- RTL: used to perform the optimizations that take target features into account through the porting code
GIMPLE optimization passes in GCC 4.6.2
004t.gimple 006t.vcg 009t.omplower 010t.lower 012t.eh 013t.cfg 017t.ssa 018t.veclower 019t.inline_param1 020t.einline 021t.early_optimizations 022t.copyrename1 023t.ccp1 024t.forwprop1 025t.ealias 026t.esra
027t.copyprop1 028t.mergephi1 029t.cddce1 030t.eipa_sra 031t.tailr1 032t.switchconv 034t.profile 035t.local-pure-const1 036t.fnsplit 037t.release_ssa 038t.inline_param2 057t.copyrename2 058t.cunrolli 059t.ccp2 060t.forwprop2 062t.alias 063t.retslot
064t.phiprop 065t.fre 066t.copyprop2 067t.mergephi2 068t.vrp1 069t.dce1 070t.cselim 071t.ifcombine 072t.phiopt1 073t.tailr2 074t.ch 076t.cplxlower 077t.sra 078t.copyrename3 079t.dom1 080t.phicprop1 081t.dse1
082t.reassoc1 083t.dce2 084t.forwprop3 085t.phiopt2 086t.objsz 087t.ccp3 088t.copyprop3 090t.bswap 091t.crited 092t.pre 093t.sink 094t.loop 095t.loopinit 096t.lim1 097t.copyprop4 … 143t.optimized
RTL optimization passes in GCC 4.6.2
004t.gimple -> other GIMPLE passes -> 144r.expand
145r.sibling 147r.initvals 148r.unshare 149r.vregs 150r.into_cfglayout 151r.jump 152r.subreg1 153r.dfinit 154r.cse1 155r.fwprop1
156r.cprop1 158r.hoist 159r.cprop2 162r.ce1 163r.reginfo 164r.loop2 165r.loop2_init 166r.loop2_invariant 170r.loop2_done 172r.cprop3 173r.cse2 174r.dse1 175r.fwprop2 176r.auto_inc_dec 177r.init-regs 178r.dce
179r.combine 180r.ce2 182r.regmove 183r.outof_cfglayout 184r.split1 185r.subreg2 188r.asmcons 190r.sched1 191r.ira 192r.postreload 194r.split2 198r.pro_and_epilogue 199r.dse2 200r.csa 201r.peephole2 202r.ce3
204r.cprop_hardreg 205r.dce 206r.bbro 208r.split4 209r.sched2 212r.alignments 215r.mach 216r.barriers 217r.dbr 218r.split5 220r.shorten 221r.nothrow 222r.final 223r.dfinish 224t.statistics
Why divide the optimization passes into GIMPLE passes and RTL passes?
- GIMPLE passes retain more high-level semantics. Ex: switch, array, structure, variable
- Some optimizations are easier to design while the high-level semantics still exist
- However, GIMPLE passes lack target information. Ex: instruction length (size), supported ISA
- Therefore, RTL optimization passes are needed as well
Define instruction pattern
- Every RTL pattern must match the target ISA
- How do we tell GCC to generate RTL that matches the ISA? With instruction patterns
- Use define_expand and define_insn to describe the instruction patterns that the target supports
    (define_insn "addsi3"
      [(set (match_operand:SI 0 "register_operand" "=r,r")
            (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                     (match_operand:SI 2 "nonmemory_operand" "r,i")))]
      ... )
Define instruction pattern
- GCC already defines several instruction pattern names and the semantics of each pattern
- addsi3: addition with three SI mode operands
- GCC does not know the operand constraints of the target
- How do we tell GCC the target's operand constraints for each instruction? With predicates and constraints
Operand Constraints
Multiple alternative constraints in the addsi3 pattern:
- Predicates: register_operand, nonmemory_operand
- Constraints: r, i
- The predicate of an operand must accept every constraint of that operand
- For operand 2 in SI mode, r (register) belongs to nonmemory_operand and i (immediate) belongs to nonmemory_operand
Operand Constraints
- GCC already has predicates to restrict operands. Why is the constraint field needed?
- It gives optimizations the opportunity to change an operand
- Ex: movi $r0, 4; add $r1, $r1, $r0 {addsi3}
  Constant propagation => addi $r1, $r1, 4 {addsi3}
Operand Constraints
- GCC uses two levels of operand restriction (predicate and constraint) to group instructions with the same semantics under a single instruction pattern (addsi3)
- Many ISAs have several assembly instructions with the same semantics but different operand constraints
- Grouping them reduces the number of instruction patterns needed when porting
Operand Constraints
GCC uses the instruction patterns to check ISA support whenever it generates a new RTL pattern:
- Does the back end define the pattern with define_insn?
- Are the operand types supported, according to the predicates?
- Which alternative do the operands belong to, according to the constraints?
Operand Constraints
Multiple alternative constraints in the addsi3 pattern:
- The first alternative (operand 2 constraint r) matches "add"
- The second alternative (operand 2 constraint i) matches "addi"
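The two-level check described in these slides (predicate first, then constraint alternative) can be sketched in C. This is a simplified model and not GCC's implementation: the operand kinds, the function names, and the fixed three-operand layout are assumptions made for illustration, and the = and % modifiers are ignored.

```c
#include <stdbool.h>

/* Hypothetical, simplified operand kinds; real GCC operands are RTL expressions. */
enum operand_kind { OP_REG, OP_IMM, OP_MEM };

/* Models the predicate "register_operand". */
static bool register_operand_p(enum operand_kind k)
{
    return k == OP_REG;
}

/* Models the predicate "nonmemory_operand": registers and constants. */
static bool nonmemory_operand_p(enum operand_kind k)
{
    return k == OP_REG || k == OP_IMM;
}

/* Models one constraint letter: 'r' accepts a register, 'i' an immediate. */
static bool constraint_ok(char letter, enum operand_kind k)
{
    switch (letter) {
    case 'r': return k == OP_REG;
    case 'i': return k == OP_IMM;
    default:  return false;
    }
}

/* Returns the first matching alternative of the addsi3 pattern
   (0 for "add", 1 for "addi"), or -1 when a predicate rejects an
   operand or no alternative fits. */
static int addsi3_alternative(enum operand_kind op0,
                              enum operand_kind op1,
                              enum operand_kind op2)
{
    /* Constraint letters per alternative, from "=r,r" / "%r,r" / "r,i". */
    static const char alt[2][3] = { {'r', 'r', 'r'}, {'r', 'r', 'i'} };

    /* Level 1: the predicates must always hold. */
    if (!register_operand_p(op0) || !register_operand_p(op1)
        || !nonmemory_operand_p(op2))
        return -1;

    /* Level 2: find the alternative whose constraints all match. */
    for (int a = 0; a < 2; a++)
        if (constraint_ok(alt[a][0], op0) && constraint_ok(alt[a][1], op1)
            && constraint_ok(alt[a][2], op2))
            return a;
    return -1;
}
```

For example, addsi3_alternative(OP_REG, OP_REG, OP_IMM) selects alternative 1, which corresponds to "addi".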
Match instruction pattern
Ex: matching (set (reg/f:SI 88) (plus:SI (reg:SI 87) (reg/v:SI 55))) against the addsi3 pattern
1. Parse the RTL pattern:
    (set (op0)
         (plus:SI (op1) (op2)))
Match instruction pattern
When does GCC generate a new RTL pattern?
- In the RTL expand phase (GIMPLE to RTL)
- During optimization
Ex: the combine phase merges
    (set (reg/f:SI 47) (lshiftrt:SI (reg:SI 60) (const_int 2)))
    (set (reg/f:SI 88) (plus:SI (reg:SI 47) (reg:SI 55)))
into
    (set (reg/f:SI 88) (plus:SI (lshiftrt:SI (reg:SI 60) (const_int 2)) (reg/v:SI 55)))
In assembly:
    srli $r47, $r60, 2
    add $r88, $r47, $r55
becomes
    add_srli $r88, $r55, $r60, 2
Strict RTL
- Does a newly generated RTL pattern always satisfy the constraints? GCC allows certain kinds of constraint mismatch that reload can fix later
- The predicate must always be satisfied
[Figure: two paths from RTL1 through reload]
- Without optimization1: RTL1 -> reload -> RTL3
- With optimization1: RTL1 -> RTL2 (does not satisfy the constraints) -> reload -> RTL4
1. RTL3 and RTL4 both satisfy the constraints
2. RTL4 is better than RTL3
Strict RTL
- Before reload, a constraint may be left unmatched in the hope that reload will fix it
- Ex: the constraint is m (memory) but the current operand is a constant; GCC allows this before reload
- The reload phase runs after register allocation
- In fact, during register allocation GCC calls reload repeatedly while an operand does not fit its constraint
- After reload, each operand must satisfy one of its constraint alternatives (strict RTL)
Strict RTL
    (define_insn "movsi"
      [(set (match_operand:SI 0 "register_operand" "=r,m")
            (match_operand:SI 1 "register_operand" "r,r"))]
      ... )
Before register allocation:
    (set (reg/f:SI 47) (reg:SI 60))
After register allocation (RA), assuming pseudo register r60 is assigned to r3 and the hardware registers are exhausted:
    (set (reg/f:SI 47) (reg:SI 3))
After reload:
    (set (mem:SI (plus (sp) (const))) (reg:SI 3))
Target defined constraints
- A target can define its own predicates and constraints
- Target-defined predicate:
    (define_predicate "index_operand"
      (ior (match_operand 0 "register_operand")
           (and (match_operand 0 "const_int_operand")
                (match_test "INTVAL (op) < 4096 && INTVAL (op) > -4096"))))
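The logic of the index_operand predicate can be restated as a plain C function. This is only a sketch: the two-argument operand representation (a register flag plus a constant value) is an assumption for illustration, whereas the real predicate inspects an RTL expression.

```c
#include <stdbool.h>

/* Mirrors the index_operand predicate: accept any register, or a
   constant strictly between -4096 and 4096.
   IS_REG and VALUE are a hypothetical stand-in for an RTL operand. */
static bool index_operand_p(bool is_reg, long value)
{
    if (is_reg)
        return true;                        /* (match_operand 0 "register_operand") */
    return value < 4096 && value > -4096;   /* the match_test on INTVAL (op) */
}
```

With this model, the constants 4095 and -4095 are accepted while 4096 and -4096 are rejected.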
Target defined constraints
Target-defined constraints:
    (define_register_constraint "l" "LO_REGS"
      "registers r0->r7.")
    (define_memory_constraint "Uv"
      "@internal
       In ARM/Thumb-2 state a valid VFP load/store address."
      (and (match_code "mem")
           (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
Emit assembly code
Multiple alternative constraints:
    (define_insn "addsi3"
      [(set (match_operand:SI 0 "register_operand" "=r,r")
            (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                     (match_operand:SI 2 "nonmemory_operand" "r,i")))]
      ""
      "@
       add %0, %1, %2
       addi %0, %1, %2")
Ex: (set (reg/f:SI 3) (plus:SI (reg:SI 4) (reg:SI 5)))
- The operands match the first alternative, so the pattern matches "add"
- Output assembly code: "add $r3, $r4, $r5"
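A rough C sketch of how an output template with several alternatives turns into assembly text: pick the template line for the matched alternative, then replace each %N with operand N. The function names and the string-based operands are assumptions for illustration; GCC's own output machinery handles many more cases (operand modifiers, C code templates, and so on).

```c
#include <stddef.h>
#include <stdio.h>

/* Expands "%N" (N = 0..9) in TMPL with OPERANDS[N] and stores the
   result in OUT, writing at most OUTSZ bytes including the NUL. */
static char *expand_template(const char *tmpl, const char *const operands[],
                             int n_operands, char *out, size_t outsz)
{
    size_t pos = 0;

    if (outsz == 0)
        return out;
    for (const char *p = tmpl; *p != '\0' && pos + 1 < outsz; p++) {
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9'
            && (p[1] - '0') < n_operands) {
            int n = snprintf(out + pos, outsz - pos, "%s",
                             operands[p[1] - '0']);
            if (n < 0)
                break;
            /* snprintf reports the untruncated length; clamp it. */
            pos += (size_t)n < outsz - pos ? (size_t)n : outsz - pos - 1;
            p++;                     /* skip the operand digit */
        } else {
            out[pos++] = *p;
        }
    }
    out[pos] = '\0';
    return out;
}

/* One template line per alternative, as in the "@" form of addsi3. */
static const char *const addsi3_templates[2] = {
    "add %0, %1, %2",
    "addi %0, %1, %2",
};

/* ALTERNATIVE is the index of the matched constraint alternative. */
static char *emit_addsi3(int alternative, const char *const operands[3],
                         char *out, size_t outsz)
{
    return expand_template(addsi3_templates[alternative], operands, 3,
                           out, outsz);
}
```

Calling emit_addsi3 with alternative 0 and the operands "$r3", "$r4", "$r5" produces the "add" form from the example above.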
Target information usage
When does GCC use the target information obtained from the instruction patterns?
- RTL instruction pattern generation: insn-emit.c is generated by parsing the instruction patterns when building GCC
- RTL instruction validation (is the pattern supported by the target?): insn-recog.c is generated by parsing the instruction patterns when building GCC
- Emitting target assembly code: insn-output.c is generated by parsing the instruction patterns when building GCC
Reserved words for describing instruction patterns
- define_insn with a named pattern: RTL generation, RTL validation, emit assembly
- define_expand with a named pattern: RTL generation
- define_insn "*..." (unnamed pattern): RTL validation, emit assembly
GCC defines several named patterns and their semantics, which are used to generate RTL during the RTL expand phase. Ex: addsi3, subsi3, movsi, movhi …
Some target instructions have semantics that no GCC named pattern defines, yet an optimization can still generate the corresponding RTL. Ex: add_slli can be generated by the combine phase. Define an unnamed pattern to make such an instruction valid:
    define_insn "*add_slli"
A define_insn whose name has a * prefix is treated as an unnamed pattern.
Example of instruction pattern
    ;; These control RTL generation for conditional jump insns
    (define_expand "cbranchsi4"
      [(set (pc)
            (if_then_else (match_operator 0 "ordered_comparison_operator"
                            [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                             (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                          (label_ref (match_operand 3 "" ""))
                          (pc)))]
      ""
    {
      sh_expand_cbranchsi4 (operands);
      DONE;
    }
    )
- Semantics of "cbranchsi4": compare operand 1 and operand 2 using operator 0, and branch to label 3 if the comparison is true
- The predicate "ordered_comparison_operator" accepts EQ, NE, LT, LTU, LE, LEU, GT, GTU, GE, GEU
- The porting function sh_expand_cbranchsi4 generates the RTL pattern
Example of instruction pattern
    (define_insn "*bcondz"
      [(set (pc)
            (if_then_else (match_operator 0 "bcondz_operator"
                            [(match_operand:SI 1 "register_operand" "r")
                             (const_int 0)])
                          (label_ref (match_operand 2 "" ""))
                          (pc)))]
      ""
    {
      switch (GET_CODE (operands[0]))
        {
        case EQ:
          return "beqz %1, %2";
        case NE:
          return "bnez %1, %2";
        case LT:
          return "bltz %1, %2";
        case LE:
          return "blez %1, %2";
        case GT:
          return "bgtz %1, %2";
        case GE:
          return "bgez %1, %2";
        default:
          gcc_unreachable ();
        }
    }
    )
- The unnamed pattern "*bcondz" is used to validate RTL and to emit assembly code for branches that compare with zero
Example of instruction pattern
    (define_insn "one_cmplsi2"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (not:SI (match_operand:SI 1 "register_operand" "r")))]
      ""
      "nor\t%0, %1, %1")
- Semantics of "one_cmplsi2": compute the bitwise NOT of operand 1 and store it in operand 0
- The named pattern "one_cmplsi2" is used to generate RTL, validate RTL, and output assembly code
- The output assembly "nor ra, rb, rb" matches these semantics
Split instruction pattern
When is splitting an instruction pattern needed?
- The const_int value is too big to encode in a single assembly instruction
- The const_int is split into a high part and a low part
- The constant could be split in define_expand, but that is not good enough. Why?
- Splitting the constant too early loses opportunities to optimize the RTL pattern
Split instruction pattern
The optimization phase "move2add" can do the following (assembly code is used to present the RTL semantics for convenience).
Before move2add:
    move $r0, 123456
    move $r1, 123457
    move $r2, 123458
After move2add:
    move $r0, 123456
    addi $r1, $r0, 1
    addi $r2, $r0, 2
If the const_int is split into high and low parts too early, move2add fails to turn the moves into adds:
    sethi $r0, hi20(123456)
    ori $r0, lo12(123456)
    sethi $r1, hi20(123457)
    ori $r1, lo12(123457)
    sethi $r2, hi20(123458)
    ori $r2, lo12(123458)
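The idea behind move2add can be sketched in C as follows. This is a simplified illustration and not GCC's implementation: the struct, the function names, and the assumed 15-bit signed addi immediate are hypothetical, and the sketch always uses the most recent full move as the base.

```c
#include <stdbool.h>
#include <stddef.h>

#define ADDI_IMM_BITS 15   /* assumed width of the signed addi immediate */

/* One rewritten constant load. */
struct load_insn {
    bool is_addi;   /* true: addi dst, base, imm;  false: move dst, imm */
    long imm;       /* delta for addi, full constant for move */
};

/* True when V is representable as a signed BITS-bit integer. */
static bool fits_signed(long v, int bits)
{
    long limit = 1L << (bits - 1);
    return v >= -limit && v < limit;
}

/* Rewrites the loads of VALUES[0..N-1]: a value is expressed as
   base + delta when the delta fits the addi immediate, where base is
   the constant of the most recent full move. */
static void move2add_sketch(const long *values, size_t n,
                            struct load_insn *out)
{
    long base = 0;
    bool have_base = false;

    for (size_t i = 0; i < n; i++) {
        if (have_base && fits_signed(values[i] - base, ADDI_IMM_BITS)) {
            out[i].is_addi = true;
            out[i].imm = values[i] - base;
        } else {
            out[i].is_addi = false;
            out[i].imm = values[i];
            base = values[i];
            have_base = true;
        }
    }
}
```

For the constants 123456, 123457, and 123458 the sketch yields one move followed by two addi instructions with immediates 1 and 2, which is the shape shown on the slide. Once each constant has been split into sethi/ori pairs, there is no single full-constant move left for this kind of rewrite to recognize.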
Split instruction pattern
How can an instruction pattern be split outside the RTL expand phase? Use define_split or define_insn_and_split
Split instruction pattern
Split passes among the RTL passes in GCC 4.6.2: 184r.split1, 194r.split2, 208r.split4, 218r.split5
Split instruction pattern
    (define_insn_and_split "*movsi_const"
      [(set (match_operand:WORD 0 "register_operand" "=r,r")
            (match_operand:WORD 1 "immediate_operand" "P,i"))]
      ""
    {
      if (GET_CODE (operands[1]) == CONST_INT
          && SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20))
        {
          return "movi\t%0, %1";
        }
      else
        return "#";
    }
      "reload_completed && GET_CODE (operands[1]) == CONST_INT
       && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
      [(set (match_dup 0) (high:SI (match_dup 1)))
       (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
      ...)
- If the const_int does not fit in a signed 20-bit immediate, the output code returns "#", which means the pattern will be split in a split phase
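The range test behind this decision can be written out in C. The function below is a hypothetical stand-in for the SIGNED_INT_FITS_N_BITS macro, whose definition is not shown in the slides; the enum and function names are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical equivalent of SIGNED_INT_FITS_N_BITS: true when VALUE is
   representable as an N-bit two's-complement integer (1 <= N <= 62). */
static bool signed_int_fits_n_bits(long long value, int n)
{
    long long limit = 1LL << (n - 1);
    return value >= -limit && value < limit;
}

enum movsi_action { EMIT_MOVI, SPLIT_LATER };

/* Mirrors the output code of "*movsi_const": emit movi when the
   constant fits in 20 signed bits, otherwise defer to the splitter
   (the "#" return value in the pattern). */
static enum movsi_action movsi_const_action(long long value)
{
    return signed_int_fits_n_bits(value, 20) ? EMIT_MOVI : SPLIT_LATER;
}
```

Under this definition the signed 20-bit range is -524288 to 524287, so 524287 still gets a single movi while 524288 is left for the split phase.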
Split instruction pattern
Split condition of "*movsi_const": reload_completed (after reload) and the const_int does not fit in a signed 20-bit immediate
Split instruction pattern
The split of "*movsi_const" produces two RTL patterns: the first sets the high part, the second adds the low part (lo_sum)
(match_dup 0) means that operand 0 is duplicated into this field
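As an arithmetic illustration of the high/low split, the following C sketch divides a 32-bit constant into a 20-bit high part and a 12-bit low part and rebuilds it. The hi20/lo12 layout follows the sethi/ori example earlier in the slides; the assumption that sethi places its operand in the upper 20 bits and clears the lower 12 bits is mine, as are the names.

```c
#include <stdint.h>

struct hi_lo {
    uint32_t hi20;   /* upper 20 bits of the constant */
    uint32_t lo12;   /* lower 12 bits of the constant */
};

/* Splits VALUE into the two parts a sethi/ori pair would encode. */
static struct hi_lo split_const(uint32_t value)
{
    struct hi_lo parts;
    parts.hi20 = value >> 12;
    parts.lo12 = value & 0xFFFu;
    return parts;
}

/* Rebuilds the constant the way the two instructions would. */
static uint32_t rebuild_const(struct hi_lo parts)
{
    uint32_t reg = parts.hi20 << 12;   /* sethi: set the high part */
    return reg | parts.lo12;           /* ori: merge the low part  */
}
```

For 123456 (0x1E240) the parts are hi20 = 30 and lo12 = 576. Because the low part is merged with a bitwise OR, no sign adjustment of the high part is needed in this sketch; a target whose low-part instruction sign-extends its immediate would need one.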
Split instruction pattern
    (define_split
      [(set (match_operand:ANY64 0 "register_operand" "")
            (match_operand:ANY64 1 "register_operand" ""))]
      "reload_completed
       && (! USE_V3_SERISE_ISA)"
      [(set (match_dup 0) (match_dup 1))
       (set (match_dup 2) (match_dup 3))]
      "…
- The split condition is reload_completed and not V3 ISA; V3 has movd44, which can do a 64-bit register move
- ANY64: DI, DF. DI: double int; DF: double float
- define_split: split RTL
- define_insn_and_split: split RTL, RTL validation, emit assembly
Instruction attribute
    (define_attr "type"
      "unknown,load,store,bequal, alu, .."
      (const_string "unknown"))
    …
    (define_insn "cmovn"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (if_then_else (ne:SI (match_operand:SI 1 "register_operand" "r")
                                 (const_int 0))
                          (match_operand:SI 2 "register_operand" "r")
                          (match_operand:SI 3 "register_operand" "0")))]
      ""
      "cmovn\t%0, %2, %1"
      [(set_attr "type" "alu")
       (set_attr "length" "4")])
General form: (define_attr "attribute_name" "value domain" (default value))
Instruction attribute
- The attribute "type" divides the instructions into several instruction groups, which helps when writing the instruction scheduling porting code
- The attribute "length" gives the length (size) of each instruction so that GCC can calculate branch distances correctly
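To illustrate why per-instruction lengths matter, here is a small C sketch that derives a branch distance from a list of instruction lengths. It is a hypothetical helper, not GCC code, and it assumes the distance is measured in bytes from the start of the branch instruction to the start of the target instruction; real targets may measure from a different point.

```c
#include <stddef.h>

/* Byte distance from instruction BRANCH to instruction TARGET, given
   the "length" of every instruction in program order.  Positive for a
   forward branch, negative for a backward branch. */
static long branch_distance(const int *lengths, size_t branch, size_t target)
{
    long dist = 0;

    if (target > branch) {
        for (size_t i = branch; i < target; i++)
            dist += lengths[i];
    } else {
        for (size_t i = target; i < branch; i++)
            dist -= lengths[i];
    }
    return dist;
}
```

With lengths 4, 4, 2, 4, a branch at index 0 that targets index 3 is 10 bytes away; if the 2-byte instruction were wrongly recorded as 4 bytes, the computed distance would be off by 2.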
Peephole pattern
    ;; Merge move 0 to bcondz
    (define_peephole2
      [(set (match_operand:SI 0 "register_operand" "") (const_int 0))
       (set (pc)
            (if_then_else (match_operator 1 "bcondz_operator"
                            [(match_dup 0)
                             (match_operand:SI 2 "register_operand" "r")])
                          (label_ref (match_operand 3 "" ""))
                          (pc)))]
      "peep2_reg_dead_p (2, operands[0])"
      [(set (pc)
            (if_then_else:SI (match_dup 1)
                             (label_ref (match_dup 3))
                             (pc)))]
      "
    {
      operands[1] = gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
                                    SImode, operands[2], GEN_INT (0));
    }")
Old RTL:
    movi $r0, 0
    bne $r0, $r1, L3
New RTL:
    bnez $r1, L3
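The peephole moves the zero from the first operand of the comparison to the second, so the comparison code has to be swapped to keep the same meaning. The C sketch below models that step with a small enum; it is a simplified illustration using hypothetical names, not GCC's swap_condition, which works on RTL codes and covers more comparisons.

```c
#include <stdbool.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Returns the code OP' such that (a OP b) == (b OP' a). */
static enum cmp_code swap_condition_sketch(enum cmp_code code)
{
    switch (code) {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    default:     return code;   /* EQ and NE are symmetric */
    }
}

/* Evaluates (a CODE b) on signed integers. */
static bool eval_cmp(enum cmp_code code, int a, int b)
{
    switch (code) {
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    default:     return a >= b;  /* CMP_GE */
    }
}
```

In the slide's example the code is NE, which is symmetric, so "bne $r0, $r1" with $r0 = 0 becomes "bnez $r1". For an asymmetric code such as LT, the test 0 < $r1 turns into $r1 > 0, which corresponds to the bgtz form.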
Instruction scheduling
- Instruction scheduling is an optimization pass in GCC that reorders instructions without changing the semantics of the code
- Its goal is to reduce pipeline stalls and thereby improve performance
- Instruction scheduling belongs to the RTL phase
RTL optimization passes in GCC 4.6.2
Scheduling passes among the RTL passes: 190r.sched1 (before register allocation, 191r.ira) and 209r.sched2 (after register allocation)
Instruction scheduling
GCC has two scheduling passes:
- sched1
  - Does interblock scheduling before register allocation
  - Tries to find the innermost loop as a region and schedules the instructions in that region
  - Improves the performance of hot spots (innermost loops)
  - Extends the scope to a region to find more scheduling opportunities
- sched2
  - Does single basic block scheduling after register allocation
  - Register allocation may produce spill code (load/store), so the code needs to be scheduled again
Instruction scheduling
Instruction scheduling resolves the following hazards to prevent pipeline stalls:
- Structural hazard: occurs when two or more instructions need the same function unit at the same time
- Data hazard
  - RAW (read after write): a true dependency
  - WAR (write after read): an anti-dependency
  - WAW (write after write): an output dependency
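The three data hazards can be told apart by comparing the registers two instructions read and write. The C sketch below does this for instructions with one destination and up to two sources; the struct layout and names are assumptions for illustration, and when several dependencies exist it reports only one (RAW first, then WAR, then WAW).

```c
enum dependency { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW };

/* Register numbers used by one instruction; -1 marks an unused slot. */
struct insn_regs {
    int dest;
    int src1;
    int src2;
};

/* Classifies the dependency of SECOND on FIRST, where FIRST comes
   earlier in program order. */
static enum dependency classify_dependency(struct insn_regs first,
                                           struct insn_regs second)
{
    /* RAW: SECOND reads what FIRST writes. */
    if (first.dest >= 0
        && (second.src1 == first.dest || second.src2 == first.dest))
        return DEP_RAW;
    /* WAR: SECOND writes what FIRST reads. */
    if (second.dest >= 0
        && (first.src1 == second.dest || first.src2 == second.dest))
        return DEP_WAR;
    /* WAW: both write the same register. */
    if (first.dest >= 0 && first.dest == second.dest)
        return DEP_WAW;
    return DEP_NONE;
}
```

For instance, "movi $r0, 0" followed by "add $r0, $r0, $r1" is classified as RAW because the add reads $r0 after the movi writes it.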
Instruction scheduling
- GCC provides several interfaces for describing the pipeline model
- After parsing the pipeline description in the porting code, GCC generates an automaton that serves as a pipeline hazard recognizer
- The automaton determines whether the processor can issue an instruction on a given simulated cycle
    (define_automaton "name")
Instruction scheduling
    (define_automaton "a1")
    (define_cpu_unit "decode1,decode2" "a1")
    (define_cpu_unit "div" "a1")
    (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
    (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")
- a1: the automaton name
- decode1, decode2, div: the CPU units (function units) in the processor
- define_insn_reservation: describes the pipeline rule for each instruction class
- alu_class, mult_class: insn-name (instruction class)
- (eq_attr "type" "alu"): the rule matches when the type attribute of the instruction pattern is alu
- "decode + alu": a regular expression describing the function unit usage
- 1: the default latency in cycles when a data dependency occurs
Instruction scheduling
Multiple alternative constraints:
    (define_insn "addsi3"
      [(set (match_operand:SI 0 "register_operand" "=r,r")
            (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                     (match_operand:SI 2 "nonmemory_operand" "r,i")))]
      ""
      "@
       add %0, %1, %2
       addi %0, %1, %2"
      [(set_attr "type" "alu")])
    (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
Instruction scheduling
    (define_automaton "a1")
    (define_cpu_unit "decode1,decode2" "a1")
    (define_cpu_unit "alu" "a1")
    (define_cpu_unit "mult" "a1")
    (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
    (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")
[Figure: automaton with the states "nothing", "decode + alu", and "decode + mult"; the edges are labeled alu_class, mult_class, and next_cycle. A state is the current CPU function unit usage, and an edge leads to the function unit usage of the next cycle.]
State transition:
1. Occupy some function units
2. Release some function units
Instruction scheduling
    (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
    (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")
    (define_bypass 2 "alu_class" "alu_class")
    (define_bypass 3 "mult_class" "mult_class")
    producer     consumer     t: 1 2 3 4 5
    alu_class    alu_class       1 0 0 0 0
    alu_class    mult_class      0 0 0 0 0
    mult_class   alu_class       0 0 0 0 0
    mult_class   mult_class      1 1 0 0 0
- 1 means the consumer will stall at cycle t
- t is the cycle count after the producer
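The stall table can be derived from the reservations and bypasses with a small lookup, sketched here in C. The enum, the tables, and the rule "stall cycles = latency - 1" are assumptions chosen to reproduce the table above; they are not how GCC represents the pipeline internally.

```c
enum insn_class { ALU_CLASS, MULT_CLASS, N_CLASSES };

/* Default latencies from the two define_insn_reservation entries. */
static const int default_latency[N_CLASSES] = { 1, 1 };

/* Latencies from the two define_bypass entries, indexed as
   [producer][consumer]; 0 means no bypass is defined for the pair. */
static const int bypass_latency[N_CLASSES][N_CLASSES] = {
    [ALU_CLASS]  = { [ALU_CLASS]  = 2 },
    [MULT_CLASS] = { [MULT_CLASS] = 3 },
};

/* Latency between a producer and a dependent consumer: the bypass
   value when one is defined, otherwise the producer's default. */
static int dependence_latency(enum insn_class producer,
                              enum insn_class consumer)
{
    int bypass = bypass_latency[producer][consumer];
    return bypass != 0 ? bypass : default_latency[producer];
}

/* Cycles a dependent consumer must wait when it would otherwise be
   issued in the cycle right after the producer. */
static int stall_cycles(enum insn_class producer, enum insn_class consumer)
{
    return dependence_latency(producer, consumer) - 1;
}
```

This gives one stall cycle for alu_class followed by alu_class, two for mult_class followed by mult_class, and none for the mixed pairs, matching the rows of the table.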
Instruction scheduling
[Figure: the producer/consumer stall table shown next to the current state of the scheduler. For each consumer class (alu_class, mult_class) the current state holds one stall bit per cycle t = 1 to 5.]
Instruction scheduling
1. movi $r0, 0 {alu}
2. movi $r1, 1 {alu}
3. add $r0, $r0, $r1 {alu}
4. lwi $r4, [$sp + 4] {load}
5. mul $r5, $r0, $r4 {mult}
    (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
    (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")
    (define_insn_reservation "load_class" 1 (eq_attr "type" "load") "decode + mem")
Dataflow graph: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5
Calculate the priority of each instruction bottom-up:
    P(i) = latency(i) + max over the successors s of i of P(s), and P(i) = latency(i) when i has no successor
Priorities: instruction 5 has priority 1; instructions 3 and 4 have priority 2; instructions 1 and 2 have priority 3
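The bottom-up priority calculation can be sketched in C for the five-instruction example. The adjacency matrix encodes the dataflow edges implied by the register usage of the instructions (1 and 2 feed 3; 3 and 4 feed 5), and every latency is 1 as in the reservations; the array and function names are assumptions for illustration.

```c
#define N_INSNS 5

/* succ[i][j] != 0 means instruction j+1 depends on instruction i+1. */
static const int succ[N_INSNS][N_INSNS] = {
    /* 1 */ {0, 0, 1, 0, 0},
    /* 2 */ {0, 0, 1, 0, 0},
    /* 3 */ {0, 0, 0, 0, 1},
    /* 4 */ {0, 0, 0, 0, 1},
    /* 5 */ {0, 0, 0, 0, 0},
};

/* Default latency of each instruction, taken from the reservations. */
static const int latency[N_INSNS] = {1, 1, 1, 1, 1};

/* Priority of instruction I (0-based): its own latency plus the highest
   priority among its successors, i.e. the longest latency path from I
   to the end of the dataflow graph. */
static int priority(int i)
{
    int best = 0;

    for (int j = 0; j < N_INSNS; j++) {
        if (succ[i][j]) {
            int p = priority(j);
            if (p > best)
                best = p;
        }
    }
    return latency[i] + best;
}
```

The recursion terminates because the dataflow graph is acyclic. For a larger block the results would normally be cached rather than recomputed for each call.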
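Instruction scheduling
Dataflow graph: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5
Ready list: 1 2 4
Pending list: 3 5
Queued list: (empty)
Scheduled list: (empty)
[Figure: instruction states Ready, Pending, Queued, and Scheduled, with transitions labeled "dependency resolved", "data hazard", and "scheduled".]
Pick the instruction with the highest priority from the Ready list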
Instruction scheduling
Scheduling steps for the dataflow graph, one row per cycle. Bypass shown with the first step: (define_bypass 2 "alu_class" "alu_class")
    cycle  issued                          Ready  Pending  Queued  Scheduled
    1      1. movi $r0, 0 {alu}            4      3 5      2       1
    2      4. lwi $r4, [$sp + 4] {load}    2      3 5      -       1 4
    3      2. movi $r1, 1 {alu}            -      5        3       1 4 2
    4      (none)                          3      5        -       1 4 2
    5      3. add $r0, $r0, $r1 {alu}      5      -        -       1 4 2 3
    6      5. mul $r5, $r0, $r4 {mult}     -      -        -       1 4 2 3 5
Thank you
Switch initialization conversion in the GIMPLE optimization passes
Tries to transform a switch statement into static array accesses.
Before:
    int a, b;

    switch (argc)
      {
      case 1:
      case 2:
        a = 8;
        b = 6;
        break;
      case 3:
        a = 9;
        b = 5;
        break;
      case 12:
        a = 10;
        b = 4;
        break;
      default:
        a = 16;
        b = 1;
      }
After:
    static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
    static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                                   16, 10};

    if (((unsigned) argc) - 1 <= 11)
      {
        a = CSWTCH02[argc - 1];
        b = CSWTCH01[argc - 1];
      }
    else
      {
        a = 16;
        b = 1;
      }
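One way to convince oneself that the conversion preserves behavior is to run both forms side by side. The C sketch below wraps the switch and the table lookup in two functions and compares them; the function names and the struct are assumptions for illustration, and the tables hold 12 entries each, one per value of argc from 1 to 12.

```c
struct ab {
    int a;
    int b;
};

/* The original switch statement. */
static struct ab with_switch(int argc)
{
    struct ab r;

    switch (argc) {
    case 1:
    case 2:
        r.a = 8;  r.b = 6;
        break;
    case 3:
        r.a = 9;  r.b = 5;
        break;
    case 12:
        r.a = 10; r.b = 4;
        break;
    default:
        r.a = 16; r.b = 1;
    }
    return r;
}

/* The converted form: one static table per assigned variable. */
static struct ab with_tables(int argc)
{
    static const int CSWTCH01[12] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
    static const int CSWTCH02[12] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                                     16, 10};
    struct ab r;

    /* The unsigned subtraction wraps for argc < 1, so a single
       comparison covers both ends of the range 1..12. */
    if ((unsigned) argc - 1u <= 11u) {
        r.a = CSWTCH02[argc - 1];
        r.b = CSWTCH01[argc - 1];
    } else {
        r.a = 16;
        r.b = 1;
    }
    return r;
}

/* Returns 1 when both forms agree for ARGC. */
static int same_result(int argc)
{
    struct ab s = with_switch(argc);
    struct ab t = with_tables(argc);
    return s.a == t.a && s.b == t.b;
}
```

Checking same_result over a range that includes values below 1 and above 12 exercises both the table path and the default path.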