Code Generation 1


Transcript of Code Generation 1

  • 7/31/2019 Code Generation 1

    1/45

    Code Generation

    Steve Johnson

  • Slide 2

    May 23, 2005. Copyright (c) Stephen C. Johnson, 2005

    The Problem

    Given an expression tree and a machine architecture, generate a sequence of instructions that evaluates the tree

    Initially, consider only trees (no common subexpressions)

    Interested in the quality of the generated program

    Interested in the running time of the algorithm

  • Slide 3

    The Solution

    Over a large class of machine architectures, we can generate optimal programs in linear time

    A very practical algorithm

    But different from the way most compilers work today

    And the technique, dynamic programming, is powerful and interesting

    Work done with Al Aho, published in JACM

  • Slide 4

    What is an Expression Tree?

    Nodes represent

    Operators (including assignment)

    Operands (memory, registers, constants)

    No flow-of-control operations

            =
           / \
          A   +
             / \
            B   C

  • Slide 5

    Representing Operands

    In fact, we want the tree to represent where the operands are found

              =
             / \
       MEM(A)   +
               / \
          MEM(B)  MEM(C)
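    Such a tree can be represented directly in code. The sketch below is a minimal illustration (the `Node` class and `mem` helper are assumptions made for this example, not the talk's implementation) that builds the MEM(A) = MEM(B) + MEM(C) tree:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                        # "=", "+", or "MEM"
    kids: list = field(default_factory=list)
    addr: str = ""                 # memory location, for MEM leaves only

def mem(name):
    """Leaf node recording where the operand is found."""
    return Node("MEM", addr=name)

# MEM(A) = MEM(B) + MEM(C)
tree = Node("=", [mem("A"), Node("+", [mem("B"), mem("C")])])
```

    Keeping the operand location on the leaf is exactly what the slide asks for: the tree says not just "B", but "B lives in memory".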

  • Slide 6

    Possible Programs

    load B,r1
    load C,r2
    add r1,r2,r1
    store r1,A

        or

    load B,r1
    add C,r1
    store r1,A

        or

    add B,C,A

  • Slide 7

    (Assembler Notation)

    Data always moves left to right

    load B,r1       r1 = MEM(B)
    add r1,r2,r3    r3 = r1 + r2
    store r1,A      MEM(A) = r1
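    To make the left-to-right convention concrete, here is a toy interpreter for exactly these three instruction forms (an illustration written for this transcript, not part of the talk):

```python
def run(program, mem):
    """Execute load/add/store instructions; data moves left to right."""
    regs = {}
    for instr in program:
        op, args = instr.split(None, 1)
        a = args.split(",")
        if op == "load":            # load B,r1    ->  r1 = MEM(B)
            regs[a[1]] = mem[a[0]]
        elif op == "store":         # store r1,A   ->  MEM(A) = r1
            mem[a[1]] = regs[a[0]]
        elif op == "add":           # add r1,r2,r3 ->  r3 = r1 + r2
            regs[a[2]] = regs[a[0]] + regs[a[1]]
    return mem

m = run(["load B,r1", "load C,r2", "add r1,r2,r1", "store r1,A"],
        {"B": 2, "C": 3})
# m["A"] is now 5
```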

  • Slide 8

    Which is Better?

    Not all sequences are legal on all machines

    Longer sequences may be faster

    The situation gets more complex when:

    Complicated expressions run out of registers

    Some operations (e.g., call) take a lot of registers

    Instructions have complicated addressing modes

  • Slide 9

    Example Code

    A = 5*B + asin(C/2 + sin(D))

    might generate (machine with 2 registers)

    load B,r1             load D,r1
    mul r1,#5,r1          call sin
    store r1,T1           load C,r2
    load C,r1       OR    div r2,#2,r2
    div r1,#2,r1          add r2,r1,r1
    store r1,T2           call asin
    load D,r1             load B,r2
    call sin              mul r2,#5,r2
    load T2,r2            add r1,r2,r1
    add r2,r1             store r1,A
    call asin
    load T1,r2
    add r2,r1,r1
    store r1,A

  • Slide 10

    What is an Instruction?

    An instruction is a tree transformation

    MEM(A)       =>  REG(r1)    load A,r1
    REG(r1)      =>  MEM(A)     store r1,A
    *(REG(r1))   =>  REG(r2)    load (r1),r2
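    One way to sketch "an instruction is a tree transformation" in code: represent trees as nested tuples, and each instruction as a function that, when its pattern matches, returns the rewritten tree together with the emitted instruction text. The representation and names here are assumptions for illustration:

```python
# Trees as nested tuples: ("MEM", "A"), ("REG", "r1"), ("*", subtree).

def load_mem(tree, free_reg):
    """load A,r1 : rewrites MEM(A) into REG(r1)."""
    if tree[0] == "MEM":
        return ("REG", free_reg), f"load {tree[1]},{free_reg}"
    return None                      # pattern does not match

def load_indirect(tree, free_reg):
    """load (r1),r2 : rewrites *(REG(r1)) into REG(r2)."""
    if tree[0] == "*" and tree[1][0] == "REG":
        return ("REG", free_reg), f"load ({tree[1][1]}),{free_reg}"
    return None

new_tree, instr = load_mem(("MEM", "A"), "r1")
indirect = load_indirect(("*", ("REG", "r1")), "r2")
```

    Code generation then becomes: repeatedly pick a matching transformation, emit its instruction, and continue on the rewritten tree.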

  • Slide 11

    These can be Quite Complicated

              *
              |
              +
            / | \
      REG(r1) REG(r2) INT(2)

  • Slide 12

    Types and Resources

    Expression trees (and instructions) typically have types associated with them

    We'll ignore this

    Doesn't introduce any real problems

    Instructions often need resources to work

    For example, a temporary register or a temporary storage location

    Will be discussed later

  • Slide 13

    Programs

    A program is a sequence of instructions

    A program computes an expression tree if it transforms the tree according to the desired goal:

    Compute the tree into a register

    Compute the tree into memory

    Compute the tree for its side effects (condition codes, assignments)

  • Slide 14

    Example

    Goal: compute for side effects

              =
             / \
       MEM(A)   +
               / \
          MEM(B)  MEM(C)

    load B,r1
    load C,r2
    add r1,r2,r1
    store r1,A

  • Slide 15

    Example (cont.)

              =                                        =
             / \                                      / \
       MEM(A)   +         -- load C,r2 -->      MEM(A)   +
               / \                                      / \
         REG(r1)  MEM(C)                          REG(r1)  REG(r2)

  • Slide 16

    Example (cont.)

              =                                        =
             / \                                      / \
       MEM(A)   +        -- add r1,r2,r1 -->    MEM(A)   REG(r1)
               / \
         REG(r1)  REG(r2)

  • Slide 17

    Example (concl.)

              =
             / \          -- store r1,A -->    REG(r1)   (side effect done)
       MEM(A)   REG(r1)

  • Slide 18

    Typical Code Generation

    Some variables are assigned to registers, leaving a certain number of scratch registers

    An expression tree is walked, producing instructions (greedy algorithm...). An infinite number of temporary registers is assumed
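    The greedy tree walk described above can be sketched in a few lines. The instruction spellings follow the earlier slides; everything else (tuple trees, the `gen` helper) is an assumption made for this illustration:

```python
from itertools import count

def gen(tree, code, temps):
    """Post-order walk: tree is ("MEM", name) or (op, left, right).
    Emits instructions into code; returns the register holding the result.
    temps supplies an unbounded stream of temporary register numbers."""
    if tree[0] == "MEM":
        r = f"r{next(temps)}"
        code.append(f"load {tree[1]},{r}")
        return r
    op, left, right = tree
    rl = gen(left, code, temps)
    rr = gen(right, code, temps)
    code.append(f"{op} {rl},{rr},{rl}")   # result reuses the left register
    return rl

code = []
gen(("add", ("MEM", "B"), ("MEM", "C")), code, count(1))
# code == ["load B,r1", "load C,r2", "add r1,r2,r1"]
```

    Note the walk never asks how many real registers exist; mapping the unbounded temporaries onto the machine's scratch registers is left to the later allocation phase.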

  • Slide 19

    Typical Code Generation (cont.)

    A register allocation phase is run

    Assigns temporary registers to scratch registers

    Often by doing graph coloring...

    If you run out of scratch registers, spill:

    Select a register

    Store it into a temporary

    When it is needed again, reload it

  • Slide 20

    Practical Observation

    Many (most?) code generation bugs happen in this spill code

    Choose a register that is really needed

    Very hard to test... Create test cases that just barely fit, or just barely don't fit, to exercise the edge cases...

    Can be quite inefficient

    Thrashing of scratch registers

    Code may not be optimal

  • Slide 21

    Complexity Results

    Simple machine with 2-address instructions:

    r1 op r2 => r1

    Cost = number of instructions

    Allow common subexpressions only of the form A op B, where A and B are leaf nodes

    Generating optimal code is NP-complete

    Even if there are an infinite number of registers!

    Implies exponential time (assuming P != NP) for an expression with n nodes

  • Slide 22

    Complexity Results (cont.)

    Simple 3-address machine

    r1 op r2 => r3

    Cost = number of instructions

    Allow arbitrary common subexpressions

    Infinite number of registers

    Can get optimal code in linear time:

    Topological sort

    Each node in a different register
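    This easy case can be sketched directly (a hypothetical illustration, not the talk's code): walk the DAG in a topological (post-) order, give every node its own register, and each shared node is computed exactly once:

```python
def topo_emit(dag, roots):
    """dag: name -> ("MEM", location) or (op, left_name, right_name).
    Emits one instruction per node in a topological (post-) order,
    with every node's value in its own register."""
    code, reg, counter = [], {}, iter(range(1, len(dag) + 1))

    def visit(name):
        if name in reg:                       # shared node: already computed
            return reg[name]
        node = dag[name]
        if node[0] == "MEM":
            r = f"r{next(counter)}"
            code.append(f"load {node[1]},{r}")
        else:
            a, b = visit(node[1]), visit(node[2])
            r = f"r{next(counter)}"
            code.append(f"{node[0]} {a},{b},{r}")
        reg[name] = r
        return r

    for root in roots:
        visit(root)
    return code

# (B + C) * (B + C): the shared "+" node is computed once
dag = {"B": ("MEM", "B"), "C": ("MEM", "C"),
       "s": ("add", "B", "C"), "t": ("mul", "s", "s")}
# topo_emit(dag, ["t"]) ==
#   ["load B,r1", "load C,r2", "add r1,r2,r3", "mul r3,r3,r4"]
```

    With unbounded registers there is nothing to optimize: one instruction per node is clearly minimal, hence linear time.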

  • Slide 23

    Complexity Results (cont.)

    In the 3-address model, finding optimal code that uses the minimal number of registers is NP-complete

    But that's not what we are faced with in practice

    We have a certain number of registers

    We need to use them intelligently

  • Slide 24

    Complexity Results (concl.)

    For many practical machine architectures (including 2-address machines), we can generate optimal code in linear time when there are no common subexpressions (a tree)

    Can be extended to an algorithm exponential in the amount of sharing

    The optimal instruction sequence is not generated by a simple tree walk

  • Slide 25

    Machine Model Restrictions

    Resources (temporary registers) must be interchangeable. We will assume that we have N of them

    Every instruction has a (positive) cost

    The cost of a program is the sum of the costs of its instructions

    No other constraints on the instruction shape or format (!)

  • Slide 26

    Study Optimal Programs

    Suppose we have an expression tree T that we wish to compute into a register

    For the moment, we assume T can be computed with no stores

    We assume we have N scratch registers

    Suppose the root node of T is +

    Then, in an optimal program, the last instruction must have a + at the root of the tree that it transforms

    We make a list of these instructions

    Each has some preconditions for it to be legal

  • Slide 27

    Preconditions: Example

    Suppose the last instruction was add r1,r2,r1

    Suppose the tree T looks like

        +
       / \
      T1  T2

    Then our optimal program must compute T1 into r1 and T2 into r2

  • Slide 28

    Precondition Resources

    If our optimal program ends in this add instruction, then we can assume that it contains two subprograms that compute T1 and T2 into r1 and r2, respectively

  • Slide 29

    Precondition Resources (cont.)

    Look at the first instruction

    If it computes part of T1, then (since there are no stores) at least one register is always in use computing T1

    So T2 must be computed using at most N-1 registers

    Alternatively, if the first instruction computes part of T2, then T1 must be computed using at most N-1 registers

  • Slide 30

    Reordering Lemma

    Let P be an optimal program without stores that computes T. Suppose it ends in an instruction X that has k preconditions. Then we can reorder the instructions in P so it looks like

    P1 P2 P3 ... Pk X

    where the Pi compute the preconditions of X in some order. Moreover, P2 uses at most N-1 registers, P3 uses at most N-2 registers, etc., and each Pi computes its precondition optimally using that number of registers

  • Slide 31

    Cost Computation

    Define C(T,n) to be the cost of the optimal program computing T using at most n registers. Suppose X is an instruction matching the root of T, with k preconditions corresponding to subtrees T1 through Tk. Then, since by the reordering lemma the i-th subprogram runs with n-i+1 registers available:

    C(T,n) = min over such X, and over permutations p of {1..k}, of

             cost(X) + sum for i = 1..k of C(T_p(i), n-i+1)

  • Slide 32

    Sketch of Proof

    By the reordering lemma, we can write any optimal program as a sequence of subprograms computing the preconditions in order, with decreasing numbers of scratch registers, followed by some instruction X. If any subprogram were not optimal, we could make the program cheaper, contradicting the optimality of the original program. Thus the optimal cost equals one of these sums (for some X and permutation)

  • Slide 33

    How About Stores (spills)?

    We will now let C(T,n) represent the cost of computing T with n registers if stores (spills) are allowed

    More notation: if T is a tree and S a subtree, T/S will represent T with S removed and replaced by a MEM node

  • Slide 34

    Another Rearrangement Lemma

    Suppose P is an optimal program computing a tree T, and suppose a subtree S is stored into a temporary location in this optimal program. Then P can be rewritten in the form

    P1 P2

    where P1 computes S into memory and P2 computes T/S

  • Slide 35

    Consequences

    P1 can use all N registers. After P1 runs, all registers are free again

    Let C(S,0) be the cost of computing S into a temporary (MEM) location. Then

    C(T,n) = min( the no-store value from the recurrence above,
                  min over subtrees S of [ C(S,0) + C(T/S, n) ] )

  • Slide 36

    Optimal Algorithm

    1. Recursively compute C(S,n) and C(S,0) for all subtrees S of T, bottom up, and for all n

  • Slide 37

    Dynamic Programming

    This bottom-up technique is called dynamic programming

    It has a fixed cost per tree node because:

    There are a finite (usually small) number of instructions that match the root of each tree

    The number of permutations for each instruction is fixed (and typically small)

    The number of scratch registers N is fixed

    So the optimal cost can be determined in time linear in the size of the tree

  • Slide 38

    Unravelling

    Going from the minimal cost back to the instructions can be done several ways:

    Remember the instruction and permutation that gives the minimal value for each node

    Or, at each node, recompute the desired minimal value until you find an instruction and permutation that attain it

  • Slide 39

    Top-Down Memo Algorithm

    Instead of computing bottom up, you can

    compute top down (in a lazy manner) and

    remember the results. This might be

    considerably faster for some architectures

  • Slide 40

    No Spills!

    Note that we do not have to write spill code in this algorithm. The subtrees that are computed and stored fall out of the algorithm

    They are computed ahead of the main computation, when all registers are available

    The resulting instruction stream is not typically a tree walk of the input

  • Slide 41

    Reality Check

    Major assumptions:

    Cost is the sum of the costs of instructions

    Assumes a single ALU, no overlapping

    Many machines now have multiple ALUs and overlapping operations

    All registers identical

    True of most RISC machines

    Not true of x86 architectures

    But memory operations are getting more expensive

    Optimality for spills is important

  • Slide 42

    Other Issues

    Register allocation across multiple statements, flow control, etc.

    Can make a big difference in performance

    Can use this algorithm to evaluate possible allocations

    Cost of losing a scratch register to hold a variable

  • Slide 43

    Common Subexpressions

    A subtree S of T is used more than once (T is now not a tree, but a DAG)

    Say there are 2 uses of S. Then there are 4 strategies:

    Compute S and store it

    Compute one use and save the result until the second use (2 ways, depending on which use is first)

    Ignore the sharing, and recompute S

  • Slide 44

    Cost Computations

    Ignoring the sharing is easy

    Computing and storing is easy

    Ordering the two uses implies an ordering of preconditions in some higher-level instruction selection

    And the number of free registers is affected, too

    Do the problem twice, once for each order

  • Slide 45

    Summary

    Register spills are evil

    Complicated, error-prone, hard to test

    If something is to be spilled, compute it

    ahead of time with all registers free

    The optimal spill points fall out of the

    dynamic programming algorithm