Code Generation CS 480 (posted 22-Dec-2015)

Code Generation

CS 480

Can be complex

• To do a good job of teaching about code generation I could easily spend ten weeks

• But I don’t have ten weeks, so I’ll do it in one lecture

• (Obviously, I’ll omit a lot of material)

Remember Reverse Polish Notation

• Remember RPN?

• To evaluate y * (x + 4):
  Push y on stack
  Push x on stack
  Push 4 on stack
  Do addition
  Do multiplication
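The stack discipline above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture; the variable bindings (x = 3, y = 5) are made up for the example.

```python
def eval_rpn(tokens, env):
    """Evaluate a list of RPN tokens against a variable environment."""
    stack = []
    for tok in tokens:
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok in env:              # variable: push its value
            stack.append(env[tok])
        else:                         # constant: push as integer
            stack.append(int(tok))
    return stack.pop()

# y * (x + 4) in RPN is: y x 4 + *
print(eval_rpn(["y", "x", "4", "+", "*"], {"x": 3, "y": 5}))  # 35
```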

Code for (x + 12) * 7

Push fp
Push -8
Add
Get value (i.e., address is on stack, get value)
Push 12
Add
Push 7
Multiply

Addressing modes

• Early discovery: some patterns, such as adding fp + constant, occur really often. Why not make them part of the instruction?

• Common addressing modes:
  Absolute address (i.e., global)
  #Constant (small integer values)
  Register – contents of register
  (reg) – value at register used as an address
  Con(reg) – value at constant plus register (e.g., 8(fp))
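As a sketch, a code generator might render these operand kinds as strings like so. The helper name and the VAX/PDP-flavored syntax are illustrative assumptions, not from the lecture.

```python
def operand(kind, value=None, reg=None):
    """Render an addressing-mode operand as assembly-style text."""
    if kind == "absolute":      return str(value)         # global address
    if kind == "immediate":     return f"#{value}"        # small constant
    if kind == "register":      return reg                # register contents
    if kind == "indirect":      return f"({reg})"         # register as address
    if kind == "displacement":  return f"{value}({reg})"  # constant + register
    raise ValueError(f"unknown addressing mode: {kind}")

print(operand("displacement", value=8, reg="fp"))  # 8(fp)
print(operand("immediate", value=12))              # #12
```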

More unusual addressing modes

• PDP had a number of unusual addressing modes

-(reg) – decrement register, then use as an address (pre-decrement)
(reg)+ – use register as an address, then increment (post-increment)

Look familiar? Could be used to build a great stack system

Code for (x + 12) * 7, again

Move -8(fp), -(sp)
Move #12, -(sp)
Add (sp)+, (sp)
Move #7, -(sp)
Mult (sp)+, (sp)

Still easy to do, but still lots of memory access

Generating stack-style code is easy

• We can use the AR stack just like the stack in a RPN calculator

• Simply do a post-order traversal of the AST

• Operands (variables, constants): push on stack

• Operators: generate code to take arguments from stack, compute the result, and push the result back on stack

• Simple. This is what you are doing in prog 6
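The post-order scheme above can be sketched in a few lines of Python. This is an illustration, not prog 6's actual code; AST nodes are tuples, and the instruction names mirror the earlier slide.

```python
def gen_stack(node, code):
    """Emit stack-machine code for an expression AST by post-order walk."""
    if isinstance(node, int):                 # constant operand
        code.append(f"Push #{node}")
    elif isinstance(node, str):               # variable operand
        code.append(f"Push {node}")
    else:                                     # operator: children first
        op, left, right = node
        gen_stack(left, code)
        gen_stack(right, code)
        code.append({"+": "Add", "*": "Mult"}[op])
    return code

# (x + 12) * 7
for line in gen_stack(("*", ("+", "x", 12), 7), []):
    print(line)
```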

Then what’s the problem?

• Remember the von Neumann machine design?

• CPU is separated from memory by a thin slow wire, which is costly to traverse

• Every time you do a memory access things slow down

Memory access in stack code generation

• One memory access to get the instruction

• One or two memory accesses to get arguments

• Then you can do the operation

• One more memory access to write the result back to memory

• That’s a lot of memory access

A few things you can do to speed up

• There are a few things that you can do with hardware to speed this up just a bit

• Caching – (speeds instruction fetch, operand fetch not so much)

• Pipelines – (but results coming from memory cause frequent stalls)

Can be done much faster

Move -8(fp), r1
Add #12, r1
Mult #7, r1

Registers live on-chip, allow much faster access, code will run much faster.

Bottom line for today’s lecture

                 Pro             Con
Stack style      Easy to do      Slow execution
Register style   Fast execution  Hard to do

Registers can be used to hold intermediate results

• Example: (a + b) * (c + d)
  Move a, r1
  Add b, r1
  Move c, r2
  Add d, r2
  Mult r2, r1

But notice we need to use two registers
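The number of registers an expression needs (ignoring spills) can be computed bottom-up before emitting any code. This sketch is essentially Sethi-Ullman numbering, which the lecture does not name; the tuple node encoding is my own. A leaf on the right counts as zero because it can be an addressing-mode operand of the instruction.

```python
def regs_needed(node, is_left=True):
    """Minimum registers to evaluate node, assuming memory operands exist."""
    if isinstance(node, (int, str)):
        # Right-hand leaf: usable directly as an operand (0 registers).
        # Left-hand leaf: must be loaded into a register (1 register).
        return 1 if is_left else 0
    _, left, right = node
    l = regs_needed(left, True)
    r = regs_needed(right, False)
    # If subtrees differ, do the bigger one first and reuse its spare
    # registers; if they tie, one extra register is needed.
    return max(l, r) if l != r else l + 1

# (a + b) * (c + d) needs two registers, matching the code above:
expr = ("*", ("+", "a", "b"), ("+", "c", "d"))
print(regs_needed(expr))  # 2
```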

No matter how many you have

• Can always come up with an expression that will be more complicated than the number of registers you have

(a + b) * (c + d) * (e + f) * ….

But are such things common? Still, we need to handle it; this is called a register spill.

Solution: Save the temp to memory, load it when you need it (basically going back to stack style).

Making effective use of addressing modes

• Addressing modes give rise to two classes of arguments

• Those that can be represented as addressing modes (local variables, constants),

• And those that cannot (expressions)

• Need to make effective use of both

Four cases for addition

x + y (that is, two addressing-mode expressions)
  move x, r1
  add y, r1
x + (…) (that is, one addressing-mode expression)
  compute (…) and place into r1
  add x, r1
(…) + x
  compute (…) and place into r1
  add x, r1
(…) + (…) (that is, neither side an addressing mode)
  compute left (…) and place into r1
  compute right (…) and place into r2
  add r2, r1 (and now r2 is free)
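The four-case scheme for a commutative operator can be sketched as below. This is illustrative, not the lecture's code: "addressing mode" is simplified to "is a leaf", and register management is a simple free list.

```python
def is_addr(node):
    """Can this node be an addressing-mode operand? (Here: leaves only.)"""
    return isinstance(node, (int, str))

def render(node):
    """Render a leaf as an operand: #constant or variable name."""
    return f"#{node}" if isinstance(node, int) else str(node)

def gen(node, code, free):
    """Emit register-style code for node; return the result register."""
    if is_addr(node):                           # bare leaf at the top level
        r = free.pop()
        code.append(f"Move {render(node)}, {r}")
        return r
    op, left, right = node
    mnem = {"+": "Add", "*": "Mult"}[op]
    if is_addr(left) and is_addr(right):        # case 1: x + y
        r = free.pop()
        code.append(f"Move {render(left)}, {r}")
        code.append(f"{mnem} {render(right)}, {r}")
        return r
    if is_addr(right):                          # case 3: (…) + x
        r = gen(left, code, free)
        code.append(f"{mnem} {render(right)}, {r}")
        return r
    if is_addr(left):                           # case 2: x + (…), commutative
        r = gen(right, code, free)
        code.append(f"{mnem} {render(left)}, {r}")
        return r
    r1 = gen(left, code, free)                  # case 4: (…) + (…)
    r2 = gen(right, code, free)
    code.append(f"{mnem} {r2}, {r1}")
    free.append(r2)                             # r2 is free again
    return r1

code = []
gen(("*", ("+", "a", "b"), ("+", "c", "d")), code, ["r2", "r1"])
for line in code:
    print(line)
```

Running this on (a + b) * (c + d) reproduces the five-instruction sequence from the earlier slide.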

What about subtraction? Non commutative

x - y (that is, two addressing-mode expressions)
  move x, r1
  sub y, r1
x - (…) (that is, one addressing-mode expression)
  compute (…) and place into r1
  sub x, r1 (NO! this computes (…) - x, the negation of the result we want)
(…) - x
  compute (…) and place into r1
  sub x, r1
(…) - (…) (that is, neither side an addressing mode)
  compute left (…) and place into r1
  compute right (…) and place into r2
  sub r2, r1 (and now r2 is free)

Choices on subtraction

• Use two registers, or

• Use one register and create a negate instruction

• Depends upon whether you have enough registers

• Lots of special cases (division, multiplication, shifts).

But registers have many other uses

• Can use registers to hold variables that are being accessed a lot (such as loop index values)

• Can use registers for passing parameters (save memory access)

• So a good machine design would have lots of registers, right? Not so fast

Using registers for variables

• Can make your code dramatically faster

• But need to discover WHICH variables are commonly used – complicated data flow analysis

• Or (as C does), allow the user to give you hints (which are frequently wrong)

Using registers for parameters

• If the callee can use the values directly in registers, then the code can be much faster

• If not, since they were going to save register values anyway, nothing is lost

• But the caller might not know whether the callee wants the parameter in memory or in a register

Cost to save and restore

• Remember the parameter calling sequence?

• The more registers you have, the more you need to save and restore on procedure entry and exit

• So there is a trade-off: faster function invocation with slower execution of the function body, or faster execution with slower invocation

Worse yet – don’t know ahead of time

• You don’t know until you have seen the entire function body how many registers you will need

• Save them all – too much execution time

• Don’t save enough – run out of registers

• (Old C compiler trick: branch to the end of the function, and generate the saves after you have seen everything else)

Interesting idea: register windows

• Interesting idea: put a huge number of registers (say, 256) on-chip, but only allow the process to see a few (say, 16) at a time

• When you execute a function, simply move the window up on the registers

• Saves and restores are automatic, making function invocation faster

Even better, overlapping windows

• What if six of those registers are shared with caller, and ten are new?

• What might you use those six for?

• Can make for really fast function invocation

Downside of register windows

• Eventually you run out, and they must be spilled as well (and restored)

• Context switch is now really slow, as you need to save and restore ALL 256 registers

• Still an interesting idea

How is it really done?

• You can spend a lot of time worrying about things that hardly ever happen (expressions with 32 temporary values in the middle, functions with 17 arguments)

• And programming styles change (OOP has smaller functions with fewer arguments)

• So there is never a clear answer, and it keeps changing