Register Allocation: Introduction · Register allocation 2. Pattern Matching ML functions can be...

Course Review

CS 502, Lecture 16, 12/09/08

Main areas

• Pattern-matching compilation
• Type inference and static analysis
• CPS (control-flow transformations)
• Dataflow analysis and optimizations
• Points-to and shape analysis
• Register allocation

Pattern Matching

ML functions can be defined by a sequence of pattern-expression pairs, called rules, that are matched against a subject value.

The first rule that matches the subject has its expression evaluated and returned as the result of the pattern match.

Pattern Matching

Naïve approach:

Match each argument with each pattern in turn (top-down), starting over after every failure.

Inefficient because information gleaned about the argument in each partially successful match is ignored on subsequent matches.

Better strategy: analyze pattern sequence at compile-time, and try to minimize number of case discrimination tests.
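The contrast between the two strategies can be sketched in Python. This is only an illustration under stated assumptions: patterns are flat tuples of constructor names, `WILD` stands for a wildcard, and `compile_rules` builds a simple left-to-right decision tree (not an optimal one). All names are hypothetical and do not come from the course material.

```python
WILD = None  # wildcard pattern (assumption: patterns are flat tuples)

def naive_match(rules, subject):
    """Try each rule top-down; return (rule index, number of tests made)."""
    tests = 0
    for i, pats in enumerate(rules):
        ok = True
        for p, v in zip(pats, subject):
            if p is WILD:
                continue
            tests += 1
            if p != v:
                ok = False
                break
        if ok:
            return i, tests
    return None, tests

def compile_rules(rules, arity, col=0, live=None):
    """Build a decision tree that inspects each column at most once."""
    if live is None:
        live = list(range(len(rules)))
    if not live:
        return None                      # match failure
    if col == arity:
        return live[0]                   # first surviving rule wins
    heads = {rules[i][col] for i in live if rules[i][col] is not WILD}
    if not heads:                        # nothing to discriminate on here
        return compile_rules(rules, arity, col + 1, live)
    branches = {
        c: compile_rules(rules, arity, col + 1,
                         [i for i in live if rules[i][col] in (c, WILD)])
        for c in heads
    }
    default = compile_rules(rules, arity, col + 1,
                            [i for i in live if rules[i][col] is WILD])
    return (col, branches, default)

def run_tree(tree, subject):
    """Walk the decision tree; return (rule index, number of tests made)."""
    tests = 0
    while isinstance(tree, tuple):
        col, branches, default = tree
        tests += 1
        tree = branches.get(subject[col], default)
    return tree, tests

RULES = [("nil", WILD), (WILD, "nil"), ("cons", "cons")]
TREE = compile_rules(RULES, 2)
print(naive_match(RULES, ("cons", "cons")))  # (2, 4)
print(run_tree(TREE, ("cons", "cons")))      # (2, 2)
```

On the subject `("cons", "cons")` the naive matcher repeats tests it has already made, while the tree inspects each component once.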

Type Inference

If we omit type parameters, we must discover whether the intended use of an expression matches its actual use.

Implications for compilation: How do we generate code for a polymorphic procedure that may be applied to objects with very different representations?

CPS

Take a mini-ML program and convert it to CPS form. Issues:
• Where do we insert continuations?
• How do we record the “rest of the computation” that a continuation is to represent?
• How do we distinguish between continuations that
  – represent the return point of an arbitrary procedure call (e.g., the outer call to fact-cps), and
  – represent iterative computation (e.g., the inner recursive calls in fact-cps)?
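As a reminder of what a CPS-converted function looks like, here is a minimal Python sketch of a factorial in continuation-passing style. It is an assumption that the course's fact-cps has this shape; the names `fact_cps` and `fact` are illustrative only.

```python
def fact_cps(n, k):
    """Factorial in CPS: k receives the result instead of it being returned."""
    if n == 0:
        return k(1)
    # The continuation passed to the inner call records the
    # "rest of the computation": multiply by n, then continue with k.
    return fact_cps(n - 1, lambda v: k(n * v))

def fact(n):
    # The outer call supplies the return point of the whole computation.
    return fact_cps(n, lambda v: v)

print(fact(5))  # 120
```

The identity continuation in `fact` plays the role of an unknown return point, while the lambdas built inside `fact_cps` correspond to the iterative part of the computation.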

CPS

Specifying a naive translation is pretty simple, but the resulting code is very complex:
• Lots of functions and function calls that need to be eliminated.
• Haven’t distinguished between different uses of continuations:
  – Loops and known jump points
  – Unknown return points

Refined CPS Translation (Core-ML)

• Tail calls: application in a “return” position.
• Non-tail calls: pushes a new frame.
• Return: pops the current frame.
• Goto: local transfer of control.
• Continuation variables serve to hold return variables and temporaries.

Defunctionalization

First-class functions are represented using first-order datatypes:
• A function is introduced with a constructor that holds the value of its free variables.
• It is accessed via a case dispatch over the appropriate constructor.

Closure conversion = defunctionalization + inlining

Example


fun aux f = f 1 + f 10

fun main (x,y,b) = aux (fn z => x + z) * aux (fn z => if b then y + z else y - z)

Procedures aux and main have no free variables: each could be represented by a function pointer.

The two abstractions are associated with a datatype that has two constructors, one for each function abstraction.

Each constructor is closed over the environment of the associated procedure:
fn z => x + z : Environment holds x
fn z => if b then y + z else y - z : Environment holds b and y

Example

After defunctionalization:

datatype lam = Lam1 of int | Lam2 of int * bool

fun apply (Lam1 x, z) = x + z
  | apply (Lam2 (y,b), z) = if b then y + z else y - z

(* aux: lam -> int *)
fun aux f = apply(f,1) + apply(f,10)

fun main(x,y,b) = aux(Lam1 x) * aux(Lam2(y,b))
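The same transformation can be checked mechanically. The following Python sketch mirrors the ML example: `main_ho` uses real first-class functions, `main` uses the defunctionalized constructors and `apply`. The Python encoding (dataclasses standing in for the `lam` datatype) is an assumption made for illustration.

```python
from dataclasses import dataclass

# Higher-order original.
def aux_ho(f):
    return f(1) + f(10)

def main_ho(x, y, b):
    return aux_ho(lambda z: x + z) * aux_ho(lambda z: y + z if b else y - z)

# Defunctionalized version: one constructor per function abstraction,
# each holding the values of that abstraction's free variables.
@dataclass(frozen=True)
class Lam1:
    x: int

@dataclass(frozen=True)
class Lam2:
    y: int
    b: bool

def apply(f, z):
    """Case dispatch over the constructor that introduced the function."""
    if isinstance(f, Lam1):
        return f.x + z
    if isinstance(f, Lam2):
        return f.y + z if f.b else f.y - z
    raise TypeError("unknown constructor")

def aux(f):
    return apply(f, 1) + apply(f, 10)

def main(x, y, b):
    return aux(Lam1(x)) * aux(Lam2(y, b))

print(main_ho(2, 3, True), main(2, 3, True))  # 255 255
```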

Closure Representation

A closure is a pair consisting of a pointer to the code representing the function and a record containing the values of the free variables used in the function.
• First-order representation of a higher-order procedure
• How should closures be represented?
  – sharing of closure records
  – “safe-for-space”

Closure Conversion

Flat closures are “safe-for-space”:
• Any local variable binding must be unreachable after its last use within its scope.
• However, bindings may be copied many times from one closure to another.

Linked closures are not safe-for-space:
• Local variable bindings will stay on the stack until they exit their scope, so they may remain live even after their last use.
• However, the same variable is not copied, and is recorded only once.
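The space behaviour of the two representations can be observed in a small Python experiment. This is a sketch under assumptions: a closure is modelled as a `(code, environment)` pair, a dictionary stands in for the enclosing frame, and `Big` is a hypothetical stand-in for a large value that the closure never uses.

```python
import gc
import weakref

class Big:
    """Stand-in for a large value bound to a local variable."""

def make_flat():
    big, n = Big(), 1
    probe = weakref.ref(big)
    # Flat closure: copy only the free variable the code needs (n).
    closure = (lambda env, z: env[0] + z, (n,))
    return closure, probe

def make_linked():
    big, n = Big(), 1
    probe = weakref.ref(big)
    # Linked closure: point at the whole enclosing frame, which also holds big.
    frame = {"big": big, "n": n}
    closure = (lambda env, z: env["n"] + z, frame)
    return closure, probe

def call(closure, z):
    code, env = closure
    return code(env, z)

flat, flat_probe = make_flat()
linked, linked_probe = make_linked()
gc.collect()
print(call(flat, 41), call(linked, 41))           # 42 42
print(flat_probe() is None, linked_probe() is None)
```

With the flat closure the unused value becomes unreachable as soon as its scope ends; with the linked closure it stays reachable for as long as the closure does.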

Dataflow Analysis

Approach
• Define a set of dataflow equations.
• Solve the equations using an iterative fixpoint algorithm.
• Solution is guaranteed to be the “smallest” solution that is safe: no unnecessary overapproximation.

Example -- Available Expressions
For each program point, which expressions have already been computed, and not later modified, on all paths to this point?

Available Expressions Analysis

The aim of the Available Expressions Analysis is to determine

For each program point, which expressions must have already been computed, and not later modified, on all paths to the program point.

Example (the test $a+b$ at label 3 is the point of interest):

$[x := a+b]^{1};\ [y := a*b]^{2};\ \mathtt{while}\ [y > a+b]^{3}\ \mathtt{do}\ ([a := a+1]^{4};\ [x := a+b]^{5})$

The analysis enables a transformation into

$[x := a+b]^{1};\ [y := a*b]^{2};\ \mathtt{while}\ [y > x]^{3}\ \mathtt{do}\ ([a := a+1]^{4};\ [x := a+b]^{5})$

PPA Section 2.1 © F. Nielson, H. Riis Nielson & C. Hankin (May 2005)



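Available Expressions

Available Expressions Analysis: the basic idea

[Figure: two incoming flows carrying $X_1$ and $X_2$ meet before the assignment $x := a$.]

$N = X_1 \cap X_2$

$X = (N \setminus \underbrace{\{\text{expressions with an } x\}}_{\text{kill}}) \cup \underbrace{\{\text{subexpressions of } a \text{ without an } x\}}_{\text{gen}}$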


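Specification

Available Expressions Analysis

kill and gen functions:

\[
\begin{aligned}
\mathit{kill}_{AE}([x := a]^{\ell}) &= \{a' \in \mathbf{AExp}_\star \mid x \in FV(a')\} \\
\mathit{kill}_{AE}([\mathtt{skip}]^{\ell}) &= \emptyset \\
\mathit{kill}_{AE}([b]^{\ell}) &= \emptyset \\
\mathit{gen}_{AE}([x := a]^{\ell}) &= \{a' \in \mathbf{AExp}(a) \mid x \notin FV(a')\} \\
\mathit{gen}_{AE}([\mathtt{skip}]^{\ell}) &= \emptyset \\
\mathit{gen}_{AE}([b]^{\ell}) &= \mathbf{AExp}(b)
\end{aligned}
\]

data flow equations ($AE^{=}$):

\[
AE_{entry}(\ell) =
\begin{cases}
\emptyset & \text{if } \ell = \mathit{init}(S_\star) \\
\bigcap \{AE_{exit}(\ell') \mid (\ell', \ell) \in \mathit{flow}(S_\star)\} & \text{otherwise}
\end{cases}
\]

\[
AE_{exit}(\ell) = (AE_{entry}(\ell) \setminus \mathit{kill}_{AE}(B^{\ell})) \cup \mathit{gen}_{AE}(B^{\ell})
\quad \text{where } B^{\ell} \in \mathit{blocks}(S_\star)
\]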

Transfer Functions

Solutions

Available expressions is an example of a forward analysis: we are interested in the largest solution that satisfies the equations.

Example (cont.):

$[x := a+b]^{1};\ [y := a*b]^{2};\ \mathtt{while}\ [y > a+b]^{3}\ \mathtt{do}\ ([a := a+1]^{4};\ [x := a+b]^{5})$

Largest solution:

ℓ    AEentry(ℓ)    AEexit(ℓ)
1    ∅             {a+b}
2    {a+b}         {a+b, a*b}
3    {a+b}         {a+b}
4    {a+b}         ∅
5    ∅             {a+b}


Components

Dataflow analysis via abstract interpretation has three main components:

A transfer function (f(n)) that approximates the execution of instruction n based on the (approximate) inputs given.

A join operation that abstracts statically uncomputable operations (e.g., conditionals)

A direction (forward or reverse) describing the order in which instructions are interpreted.

Approach

After deciding the structure of the transfer function, join operation, and analysis direction, we run the analysis.

We continue to iterate until no new information is generated.

Formally (Iterative Dataflow Analysis, from Prof. David Walker's Computer Science 320 slides):

To code up a particular analysis we need to take the following steps.

First, we decide what sort of information we are interested in processing. This is going to determine the transfer function and the joining operator, as well as any initial conditions that need to be set up.

Second, we decide on the appropriate direction for the analysis.

In the forward direction, we:

– Need to get the inputs from the previous instructions.

– Since we don’t know exactly which instruction preceded the current one, we use the join over all possible predecessors.

– Once we have the input, we apply the transfer function, which generates an output.

– Iterate the process.

In the backward direction, we:

– Need to get the outputs from the successor instructions.

– Use the join since there are many successors.

– Use the transfer function to get the inputs.

– Iterate the process.
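The formulas that accompanied these steps did not survive in this transcript. In the usual textbook notation they can be written as follows; this is a reconstruction under assumptions, where $\mathit{in}[n]$, $\mathit{out}[n]$, $\mathit{pred}$, $\mathit{succ}$ and $f_n$ are standard names rather than symbols taken from the slides.

```latex
\[
\begin{aligned}
\text{forward:}\quad
  & \mathit{in}[n] = \bigsqcup_{p \in \mathit{pred}(n)} \mathit{out}[p],
  & \mathit{out}[n] &= f_n(\mathit{in}[n]) \\
\text{backward:}\quad
  & \mathit{out}[n] = \bigsqcup_{s \in \mathit{succ}(n)} \mathit{in}[s],
  & \mathit{in}[n] &= f_n(\mathit{out}[n])
\end{aligned}
\]
```

Here $\bigsqcup$ is the join operation and $f_n$ is the transfer function of instruction $n$.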

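Specifications

Equations of the Instance:

\[
\mathit{Analysis}_\circ(\ell) = \bigsqcup \{\mathit{Analysis}_\bullet(\ell') \mid (\ell', \ell) \in F\} \sqcup \iota^{\ell}_{E}
\quad \text{where } \iota^{\ell}_{E} =
\begin{cases}
\iota & \text{if } \ell \in E \\
\bot & \text{if } \ell \notin E
\end{cases}
\]

\[
\mathit{Analysis}_\bullet(\ell) = f_\ell(\mathit{Analysis}_\circ(\ell))
\]

Constraints of the Instance:

\[
\mathit{Analysis}_\circ(\ell) \sqsupseteq \bigsqcup \{\mathit{Analysis}_\bullet(\ell') \mid (\ell', \ell) \in F\} \sqcup \iota^{\ell}_{E}
\quad \text{where } \iota^{\ell}_{E} =
\begin{cases}
\iota & \text{if } \ell \in E \\
\bot & \text{if } \ell \notin E
\end{cases}
\]

\[
\mathit{Analysis}_\bullet(\ell) \sqsupseteq f_\ell(\mathit{Analysis}_\circ(\ell))
\]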


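Basic Structure

Each of the analyses we’ve seen (the four classical analyses) takes the form:

\[
\mathit{Analysis}_\circ(\ell) =
\begin{cases}
\iota & \text{if } \ell \in E \\
\bigsqcup \{\mathit{Analysis}_\bullet(\ell') \mid (\ell', \ell) \in F\} & \text{otherwise}
\end{cases}
\]

\[
\mathit{Analysis}_\bullet(\ell) = f_\ell(\mathit{Analysis}_\circ(\ell))
\]

where

– $\bigsqcup$ is $\bigcap$ or $\bigcup$ (and $\sqcup$ is $\cap$ or $\cup$),

– $F$ is either $\mathit{flow}(S_\star)$ or $\mathit{flow}^{R}(S_\star)$,

– $E$ is $\{\mathit{init}(S_\star)\}$ or $\mathit{final}(S_\star)$,

– $\iota$ specifies the initial or final analysis information, and

– $f_\ell$ is the transfer function associated with $B^{\ell} \in \mathit{blocks}(S_\star)$.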
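One way to see that the parameters really are the only thing that varies is to write a single solver and plug an analysis into it. The sketch below is an assumption-laden illustration: `solve` follows the equational form in which the extremal value is joined in at labels in E, and it is instantiated with live variables (a backward may-analysis) on the running example program. The kill and gen tables and all names are hypothetical.

```python
def solve(labels, F, E, iota, bottom, join, transfer):
    """Iterate the generic equations until nothing changes.

    circ[l] plays the role of Analysis_o(l), bullet[l] of Analysis_.(l).
    """
    circ = {l: bottom for l in labels}
    bullet = {l: bottom for l in labels}
    changed = True
    while changed:
        changed = False
        for l in labels:
            new_circ = iota if l in E else bottom
            for src, dst in F:
                if dst == l:
                    new_circ = join(new_circ, bullet[src])
            new_bullet = transfer(l, new_circ)
            if (new_circ, new_bullet) != (circ[l], bullet[l]):
                circ[l], bullet[l] = new_circ, new_bullet
                changed = True
    return circ, bullet

# Live variables for:
# [x:=a+b]1; [y:=a*b]2; while [y>a+b]3 do ([a:=a+1]4; [x:=a+b]5)
KILL = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"a"}, 5: {"x"}}
GEN = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"a", "b", "y"}, 4: {"a"}, 5: {"a", "b"}}
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3)}
FLOW_R = {(b, a) for (a, b) in FLOW}      # backward analysis: reversed flow

LV_EXIT, LV_ENTRY = solve(
    labels=[1, 2, 3, 4, 5],
    F=FLOW_R,
    E={3},                                # final(S): the loop test
    iota=frozenset(),
    bottom=frozenset(),
    join=lambda x, y: x | y,              # may-analysis: union
    transfer=lambda l, x: frozenset((x - KILL[l]) | GEN[l]),
)
print(sorted(LV_EXIT[1]), sorted(LV_EXIT[3]))  # ['a', 'b'] ['a', 'b', 'y']
```

Switching to a must-analysis such as available expressions would mean passing intersection as `join`, the set of all expressions as `bottom`, and the forward flow as `F`.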


Combining Dataflow Facts

The Principle: union versus intersection

• When $\bigsqcup$ is $\bigcap$ we require the greatest sets that solve the equations and we are able to detect properties satisfied by all execution paths reaching (or leaving) the entry (or exit) of a label; the analysis is called a must-analysis.

• When $\bigsqcup$ is $\bigcup$ we require the smallest sets that solve the equations and we are able to detect properties satisfied by at least one execution path to (or from) the entry (or exit) of a label; the analysis is called a may-analysis.


Least Fixed-Points

Why does it work?

Let $f : \mathcal{P}(S) \to \mathcal{P}(S)$ be a monotone function. Then

$\emptyset \subseteq f(\emptyset) \subseteq f^{2}(\emptyset) \subseteq f^{3}(\emptyset) \subseteq \cdots$

Assume that $S$ is a finite set; then the Ascending Chain Condition is satisfied. This means that the chain cannot be growing infinitely, so there exists $n$ such that $f^{n}(\emptyset) = f^{n+1}(\emptyset) = \cdots$

$f^{n}(\emptyset)$ is the least fixed point of $f$.

[Figure: the ascending chain $\emptyset,\ f^{1}(\emptyset),\ f^{2}(\emptyset),\ f^{3}(\emptyset),\ \ldots$ climbing inside $\mathcal{P}(S)$ up to $\mathit{lfp}(f) = f^{n}(\emptyset) = f^{n+1}(\emptyset)$ for some $n$.]

Frameworks

A Monotone Framework consists of:

• a complete lattice, $L$, that satisfies the Ascending Chain Condition; we write $\bigsqcup$ for the least upper bound operator

• a set $F$ of monotone functions from $L$ to $L$ that contains the identity function and that is closed under function composition

A Distributive Framework is a Monotone Framework where additionally all functions $f$ in $F$ are required to be distributive:

$f(l_1 \sqcup l_2) = f(l_1) \sqcup f(l_2)$
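For transfer functions of the kill/gen shape, distributivity can be checked by brute force on a small lattice. The following Python sketch does so for one hypothetical kill/gen pair over a three-element universe, with both union and intersection as the join; the helper names are assumptions.

```python
from itertools import combinations

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def make_transfer(kill, gen):
    """Transfer function of the form f(X) = (X \\ kill) U gen."""
    return lambda x: (x - kill) | gen

def is_distributive(f, lattice, join):
    return all(f(join(x, y)) == join(f(x), f(y))
               for x in lattice for y in lattice)

LATTICE = powerset({"a+b", "a*b", "a+1"})
f = make_transfer(kill=frozenset({"a+1"}), gen=frozenset({"a+b"}))

print(is_distributive(f, LATTICE, lambda x, y: x | y))  # True
print(is_distributive(f, LATTICE, lambda x, y: x & y))  # True
```

An exhaustive check like this only covers the one function tested; it is a sanity check, not a proof for every kill/gen pair.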


Pointer Analysis

Goal: what objects can a pointer point to?
• Statically undecidable in general. What are good approximations?
• Can be used to infer aliases: if a points to b, and b points to c, then {<*a,b>, <*b,c>} ==> {<**a,c>}, and thus **a and *b are aliases.
• Fundamentally an interprocedural analysis: a pointer variable can be supplied as an argument or returned as a result from a procedure.
• Flow-insensitive vs. flow-sensitive; context-sensitivity? shape analysis? field sensitivity?

Subset-based (inclusion)

Example (Andersen [1]; example from [29]):

Consider the following program:

1. q = &x;
2. q = &y;
3. p = q;
4. q = &z;

First two statements are easy. After q = &x there is an edge q → x (labelled 1); after q = &y there is also an edge q → y (labelled 2).

Third statement. See all the things q points to, and make p point to them as well (edges p → x and p → y, labelled 3). Add in a dotted line, to remind us that pts(q) ⊆ pts(p).

Fourth statement. Add in the q → z edge (labelled 4).

But the dotted line reminds us that pts(q) ⊆ pts(p). So we need to add a p → z edge as well. This is the extra work that makes Andersen’s analysis more expensive. In a Steensgaard-style analysis we would have collapsed x and y at the second statement, and then we wouldn’t have to worry about this extra work (although we would lose precision).

Andersen is $O(n^3)$.

Steensgaard is said to be equality-based, e.g., pts(q) = pts(p).
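The propagation along subset constraints can be sketched in a few lines of Python. This is a deliberately reduced, flow-insensitive model handling only the two statement forms used in the example (`p = &x` and `p = q`); loads and stores through pointers are left out, and the statement encoding and the name `andersen` are assumptions.

```python
def andersen(statements):
    """statements: ("addr", p, x) for p = &x, or ("copy", p, q) for p = q."""
    pts = {}
    subset_edges = []                     # (src, dst) means pts(src) <= pts(dst)
    for kind, lhs, rhs in statements:
        pts.setdefault(lhs, set())
        if kind == "addr":
            pts[lhs].add(rhs)
        elif kind == "copy":
            pts.setdefault(rhs, set())
            subset_edges.append((rhs, lhs))
    # Propagate along subset edges until no points-to set grows.
    changed = True
    while changed:
        changed = False
        for src, dst in subset_edges:
            if not pts[src] <= pts[dst]:
                pts[dst] |= pts[src]
                changed = True
    return pts

PROGRAM = [
    ("addr", "q", "x"),   # 1. q = &x;
    ("addr", "q", "y"),   # 2. q = &y;
    ("copy", "p", "q"),   # 3. p = q;
    ("addr", "q", "z"),   # 4. q = &z;
]
PTS = andersen(PROGRAM)
print(sorted(PTS["q"]), sorted(PTS["p"]))  # ['x', 'y', 'z'] ['x', 'y', 'z']
```

Because the subset edge from q to p persists, z reaches pts(p) even though statement 4 comes after the copy, which is the extra work described above.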

Acknowledgements

Thanks to Greg Dennis and Rob Seater for discussions. Thanks to John Whaley for sending me his slides [33]. Thanks to Michael Ernst for sending me to Dagstuhl where I saw Barbara Ryder’s talk [29].

References

[1] Lars O. Andersen. Program Analysis and Specialization of the C Programming Language. PhD thesis, DIKU, University of Copenhagen, 1994.

[2] Marc Berndl, Ondrej Lhotak, Feng Qian, Laurie Hendren, and Navindra Umanee. Points-to analysis using BDDs. In Rajiv Gupta, editor, Proc. PLDI, pages 103–114, June 2003.

[3] Venkatesan T. Chakaravarthy. New results on the computability and complexity of points-to analysis. In Greg Morrisett, editor, 30th POPL, pages 115–125, New Orleans, Louisiana, January 2003.

[4] Craig Chambers, editor. Proc. PLDI, June 2004. ISBN 1-58113-807-5.

[5] Jong-Deok Choi, Michael G. Burke, and Paul R. Carini. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In 20th POPL, pages 232–245, Charleston, SC, January 1993.

[6] Amer Diwan. CSCI 5535: Homework 4, 1999. http://www-plan.cs.colorado.edu/diwan/5535-99/hw4-soln.pdf.

[7] Manuel Fahndrich, Jeffrey S. Foster, Zhendong Su, and Alexander Aiken. Partial online cycle elimination in inclusion constraint graphs. In Proc. PLDI, pages 85–96, Montreal, Canada, May 1998.

[8] Manuel Fahndrich, Jakob Rehof, and Manuvir Das. Scalable context-sensitive flow analysis using instantiation constraints. In Proc. PLDI, pages 253–263, Vancouver, British Columbia, Canada, June 2000.

[9] John Field and Gregor Snelting, editors. Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE), Snowbird, UT, June 2001.

[10] Axel Gross. Evaluation of dynamic points-to analysis, 2004. http://www.complang.tuwien.ac.at/franz/sem-arbeiten/04w/semWS04 gross 0026934.pdf.

[11] Nevin Heintze and Olivier Tardieu. Ultra-fast aliasing analysis using CLA: A million lines of C code in a second. In Mary Lou Soffa, editor, Proc. PLDI, Snowbird, UT, June 2001.

[12] Michael Hind. Pointer analysis: haven’t we solved this problem yet? In Field and Snelting [9], pages 54–61.

[13] Michael Hind and Anthony Pioli. Assessing the effects of flow-sensitivity on pointer alias analyses. In Proc. International Static Analysis Symposium (SAS), pages 57–81, 1998.


Dotted line indicates that the points-to set for q must be a subset of the points-to set for p.

Subset-based (inclusion)


Subset constraint (dotted line) indicates that an edge must be established between p and z as well.

What is the running-time complexity of these two analyses?

Shape Graphs

The analysis will operate on shape graphs (S, H, is) consisting of

• an abstract state, S,

• an abstract heap, H, and

• sharing information, is, for the abstract locations.

The nodes of the shape graphs are abstract locations:

$\mathbf{ALoc} = \{ n_X \mid X \subseteq \mathbf{Var}_\star \}$

Note: there will only be finitely many abstract locations.


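Transfer Functions

$f^{SA}_{\ell} : \mathcal{P}(\mathbf{SG}) \to \mathcal{P}(\mathbf{SG})$

has the form:

$f^{SA}_{\ell}(SG) = \bigcup \{ \phi^{SA}_{\ell}((S, H, is)) \mid (S, H, is) \in SG \}$

where

$\phi^{SA}_{\ell} : \mathbf{SG} \to \mathcal{P}(\mathbf{SG})$

specifies how a single shape graph (in $\mathit{Shape}_\circ(\ell)$) may be transformed into a set of shape graphs (in $\mathit{Shape}_\bullet(\ell)$) by the elementary block.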


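Assignments

Transfer function for $[x := a]^{\ell}$, where $a$ is of the form $n$, $a_1\ \mathit{op}_a\ a_2$ or nil

$\phi^{SA}_{\ell}((S, H, is)) = \{ \mathit{kill}_x((S, H, is)) \}$

where $\mathit{kill}_x((S, H, is)) = (S', H', is')$ is

\[
\begin{aligned}
S' &= \{ (z, k_x(n_Z)) \mid (z, n_Z) \in S \wedge z \neq x \} \\
H' &= \{ (k_x(n_V), \mathit{sel}, k_x(n_W)) \mid (n_V, \mathit{sel}, n_W) \in H \} \\
is' &= \{ k_x(n_X) \mid n_X \in is \}
\end{aligned}
\]

and

$k_x(n_Z) = n_{Z \setminus \{x\}}$

Idea: all abstract locations are renamed so that x no longer appears in their name set.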
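The renaming is easy to express once an abstract location is represented by its name set. The Python sketch below encodes $n_Z$ as `frozenset(Z)` and a shape graph as a triple of frozensets; this encoding and the names `k` and `kill_x` are assumptions for illustration.

```python
def k(x, loc):
    """k_x(n_Z) = n_{Z \\ {x}}: drop x from the location's name set."""
    return loc - {x}

def kill_x(x, graph):
    """Remove the binding of x from a shape graph (S, H, is)."""
    S, H, shared = graph
    S1 = frozenset((z, k(x, n)) for (z, n) in S if z != x)
    H1 = frozenset((k(x, v), sel, k(x, w)) for (v, sel, w) in H)
    is1 = frozenset(k(x, n) for n in shared)
    return S1, H1, is1

# x and y both point to the location n_{x,y}, whose "next" field
# points to the summary location n_{} (empty name set).
n_xy, n_empty = frozenset({"x", "y"}), frozenset()
GRAPH = (
    frozenset({("x", n_xy), ("y", n_xy)}),
    frozenset({(n_xy, "next", n_empty)}),
    frozenset(),
)
S1, H1, IS1 = kill_x("x", GRAPH)
print(sorted((z, sorted(n)) for z, n in S1))  # [('y', ['y'])]
```

After the kill, the location formerly called n_{x,y} is known only as n_{y}, and its heap edge is renamed along with it.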


Assignment (cont)

Transfer function for $[x := y]^{\ell}$ when $x \neq y$

$\phi^{SA}_{\ell}((S, H, is)) = \{ (S'', H'', is'') \}$

where $(S', H', is') = \mathit{kill}_x((S, H, is))$ and

\[
\begin{aligned}
S'' &= \{ (z, g^{y}_{x}(n_Z)) \mid (z, n_Z) \in S' \} \cup \{ (x, g^{y}_{x}(n_Y)) \mid (y', n_Y) \in S' \wedge y' = y \} \\
H'' &= \{ (g^{y}_{x}(n_V), \mathit{sel}, g^{y}_{x}(n_W)) \mid (n_V, \mathit{sel}, n_W) \in H' \} \\
is'' &= \{ g^{y}_{x}(n_Z) \mid n_Z \in is' \}
\end{aligned}
\]

and

\[
g^{y}_{x}(n_Z) =
\begin{cases}
n_{Z \cup \{x\}} & \text{if } y \in Z \\
n_Z & \text{otherwise}
\end{cases}
\]

Idea: all abstract locations are renamed to also have x in their name set if they already have y.

PPA Section 2.6 © F. Nielson, H. Riis Nielson & C. Hankin (May 2005)
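A Python sketch of this transfer function, using the same hypothetical encoding as before (an abstract location $n_Z$ is `frozenset(Z)`, a shape graph is a triple of frozensets), might look as follows. The function names are assumptions; `kill_x` is repeated so that the block stands on its own.

```python
def kill_x(x, graph):
    """Remove the binding of x: rename n_Z to n_{Z \\ {x}} everywhere."""
    S, H, shared = graph
    return (
        frozenset((z, n - {x}) for (z, n) in S if z != x),
        frozenset((v - {x}, sel, w - {x}) for (v, sel, w) in H),
        frozenset(n - {x} for n in shared),
    )

def g(x, y, loc):
    """g_x^y(n_Z): add x to the name set if it already contains y."""
    return loc | {x} if y in loc else loc

def assign_var(x, y, graph):
    """Transfer function for [x := y] with x != y; returns one shape graph."""
    assert x != y
    S1, H1, is1 = kill_x(x, graph)
    S2 = (frozenset((z, g(x, y, n)) for (z, n) in S1)
          | frozenset((x, g(x, y, n)) for (z, n) in S1 if z == y))
    H2 = frozenset((g(x, y, v), sel, g(x, y, w)) for (v, sel, w) in H1)
    is2 = frozenset(g(x, y, n) for n in is1)
    return S2, H2, is2

n_y, n_empty = frozenset({"y"}), frozenset()
GRAPH = (
    frozenset({("y", n_y)}),
    frozenset({(n_y, "next", n_empty)}),
    frozenset(),
)
S2, H2, IS2 = assign_var("x", "y", GRAPH)
print(sorted((z, sorted(n)) for z, n in S2))  # [('x', ['x', 'y']), ('y', ['x', 'y'])]
```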

Heap selection

Transfer function for $[x := y.\mathit{sel}]^{\ell}$ when $x \neq y$

Remove the old binding for x: strong nullification

$(S', H', is') = \mathit{kill}_x((S, H, is))$

Establish the new binding for x:

1. There is no abstract location $n_Y$ such that $(y, n_Y) \in S'$, or there is an abstract location $n_Y$ such that $(y, n_Y) \in S'$ but no $n_Z$ such that $(n_Y, \mathit{sel}, n_Z) \in H'$

2. There is an abstract location $n_Y$ such that $(y, n_Y) \in S'$ and there is an abstract location $n_U \neq n_\emptyset$ such that $(n_Y, \mathit{sel}, n_U) \in H'$

3. There is an abstract location $n_Y$ such that $(y, n_Y) \in S'$ and $(n_Y, \mathit{sel}, n_\emptyset) \in H'$



Interprocedural Analysis: Transfer Function for Procedure Calls

Procedure calls $[\mathtt{call}\ p(a, z)]^{\ell_c}_{\ell_r}$ have two transfer functions:

For the procedure call

$f^{1}_{\ell_c} : \mathcal{P}(\Delta \times D) \to \mathcal{P}(\Delta \times D)$

and it is used in the equation:

$A_\bullet(\ell_c) = f^{1}_{\ell_c}(A_\circ(\ell_c))$ for all procedure calls $[\mathtt{call}\ p(a, z)]^{\ell_c}_{\ell_r}$

For the procedure return

$f^{2}_{\ell_c,\ell_r} : \mathcal{P}(\Delta \times D) \times \mathcal{P}(\Delta \times D) \to \mathcal{P}(\Delta \times D)$

and it is used in the equation:

$A_\bullet(\ell_r) = f^{2}_{\ell_c,\ell_r}(A_\circ(\ell_c), A_\circ(\ell_r))$ for all procedure calls $[\mathtt{call}\ p(a, z)]^{\ell_c}_{\ell_r}$

(Note that $A_\circ(\ell_r)$ will equal $A_\bullet(\ell_x)$ for the relevant procedure exit.)


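Procedure calls/returns

[Figure: procedure calls and returns. At the call $[\mathtt{call}\ p(a, z)]^{\ell_c}_{\ell_r}$ the information $Z$ is passed through $f^{1}_{\ell_c}(Z)$ to the entry of $\mathtt{proc}\ p(\mathtt{val}\ x;\ \mathtt{res}\ y)\ \mathtt{is}^{\ell_n} \ldots \mathtt{end}^{\ell_x}$; the information $Z'$ at the procedure exit is combined with $Z$ by $f^{2}_{\ell_c,\ell_r}(Z, Z')$ at the return.]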


Register Allocation: Terminology

• A value corresponds to a definition (or the result of a definition).
• A variable is live if it holds a value that may be needed in the future.
• A live range is composed of one or more values, connected by common uses.
  – All values comprising a live range will be read by the same virtual register.
  – A single virtual register may comprise several live ranges.
• Interference graph:
  – Vertices represent variables (more precisely, distinct definitions).
  – Edges represent interference between variables with overlapping live ranges (i.e., both live ranges are live at some point, and cannot use the same physical register).
  – A coloring represents a register assignment.
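The link between interference and coloring can be illustrated with a small Python sketch. Two simplifications are assumed here and are not from the slides: each live range is modelled as a single closed interval of instruction indices, and colors are picked by a plain greedy heuristic in name order.

```python
def interference_graph(ranges):
    """ranges: name -> (start, end). Edge when two live ranges overlap."""
    names = sorted(ranges)
    edges = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            (s1, e1), (s2, e2) = ranges[u], ranges[v]
            if s1 <= e2 and s2 <= e1:
                edges[u].add(v)
                edges[v].add(u)
    return edges

def greedy_color(edges):
    """Give each vertex the smallest color unused by its colored neighbours."""
    color = {}
    for n in sorted(edges):
        used = {color[m] for m in edges[n] if m in color}
        color[n] = next(c for c in range(len(edges)) if c not in used)
    return color

RANGES = {"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}
GRAPH = interference_graph(RANGES)
COLORS = greedy_color(GRAPH)
print(COLORS)  # {'a': 0, 'b': 1, 'c': 2, 'd': 1}
```

Here `b` and `d` never overlap, so they can share a color (a physical register), while `a`, `b` and `c` are all live at once and need three.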

Linear Scan

Linear scan register allocation is composed of 4 simple steps:

1. Order the instructions in linear fashion.
   – Many have proposed heuristics for finding the best linear order.
2. Calculate the set of live intervals.
   – Each temporary is given a live interval.
3. Allocate a register to each interval.
   – If a register is available, then allocation is possible.
   – Otherwise, an already allocated register is chosen (register spill occurs).
4. Rewrite the code according to the allocation.
   – Actual registers replace temporary or virtual registers.
   – Spill code is generated.
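Step 3 can be sketched as follows, assuming the live intervals from step 2 are already available as `(start, end)` pairs. The spill choice used here (evict whichever live interval ends last) is one common heuristic and an assumption, as are the function and variable names; steps 1, 2 and 4 are not modelled.

```python
def linear_scan(intervals, num_regs):
    """intervals: name -> (start, end). Returns (register map, spilled names)."""
    free = list(range(num_regs))
    active = []                 # (end, name) for intervals currently in a register
    reg, spilled = {}, set()
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        for e, n in list(active):
            if e < start:
                active.remove((e, n))
                free.append(reg[n])
        if free:
            reg[name] = free.pop(0)
            active.append((end, name))
            continue
        # No register free: spill the interval that ends last.
        last_end, victim = max(active)
        if last_end > end:
            reg[name] = reg.pop(victim)     # take over the victim's register
            active.remove((last_end, victim))
            active.append((end, name))
            spilled.add(victim)
        else:
            spilled.add(name)
    return reg, spilled

INTERVALS = {"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}
REGS, SPILLED = linear_scan(INTERVALS, num_regs=2)
print(REGS, SPILLED)  # {'b': 1, 'c': 0, 'd': 1} {'a'}
```

With two registers, the long-lived `a` is spilled when `c` arrives, and `d` later reuses the register freed by `b`.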
