Coloring Heuristics for Register Allocation

ACM SIGPLAN 283 Best of PLDI 1979-1999


RETROSPECTIVE: Coloring Heuristics for Register Allocation

Preston Briggs, Cray Research, Inc., 411 First Avenue South, Suite 600, Seattle, WA 98104, [email protected]

Keith D. Cooper, Department of Computer Science, Rice University, Houston, TX 77251-1982, [email protected]

Ken Kennedy, Department of Computer Science, Rice University, Houston, TX 77251-1982, [email protected]

Linda Torczon, Department of Computer Science, Rice University, Houston, TX 77251-1982, [email protected]

1. INTRODUCTION

From the earliest compilers, register allocation was recognized as an important optimization. Indeed, the original Fortran compiler spent two of its six passes on the problem [1]. (That compiler used an approach similar to the linear-scan algorithms being proposed today for just-in-time compilers.) In the early 1960s, the Soviet mathematician Lavrov made the intellectual connection between allocation problems and graph coloring [17]. Unfortunately, Lavrov gave no algorithm; instead, he suggested that it was possible to enumerate all the colorings of the graph and take the best one. Chaitin et al. took Lavrov’s fundamental ideas, developed them, and built the first graph-coloring register allocator [9, 8]. In essence, Chaitin’s allocator builds an interference graph where each node represents a value, computes an ordering on the nodes of that graph, and assigns colors to the nodes in that order.

Our paper introduced an improved coloring strategy that produced better allocations for many graphs on which Chaitin’s method fails. The key difference between our algorithm and Chaitin’s algorithm lies in the timing of spill decisions. While computing the order for coloring, the allocator can reach a point where it cannot find a node in the interference graph that it can provably color. At this point, Chaitin’s allocator chooses a node heuristically, spills the value associated with that node, and excludes the node from the coloring order. Our method also chooses a spill candidate at this point. Instead of spilling it, however, the allocator inserts it into the coloring order. When colors are assigned, the spill candidate either receives a color or it does not, in which case the allocator spills it. Experimental evidence in the paper confirms the effectiveness of this deferred spilling approach. This technique has been adopted in many commercial and research compilers.

2. BACKGROUND

We implemented a graph-coloring register allocator in our compiler for the IBM RT-PC, an early RISC workstation with 16 general-purpose registers and 8 floating-point registers. We were, in general, pleased with the results. In detailed examination of the code for some inner loops, however, we noticed that the allocator overspilled, leaving some registers unused while spilling critical values. This shortcoming led us to re-examine the register allocator and its algorithms. In particular, we began to investigate live-range splitting [11, 16, 13].

20 Years of the ACM/SIGPLAN Conference on Programming Language Design and Implementation (1979-1999): A Selection, 2003. Copyright 2003 ACM 1-58113-623-4 ...$5.00.

During this time, the authors attended a colloquium talk that Jorge Moré gave at Rice. Moré’s talk included Matula’s smallest-last coloring algorithm [18]. Our study of Chaitin’s algorithm had produced a simple four-node counterexample, the diamond graph.

[Figure: the diamond graph, a four-node cycle on nodes a, b, c, and d with edges a-b, b-c, c-d, and d-a.]

We quickly realized that Matula’s algorithm would two-color this graph. This shifted our attention away from live-range splitting and back onto the fundamentals of Chaitin’s method.

At a high level, Chaitin’s algorithm builds an interference graph, uses the graph to order the nodes for color assignment, and then assigns colors in the specified order. The critical step, for our purposes, is when the algorithm picks the next node to add to the ordering. The algorithm selects any node n with fewer than k neighbors, where k is the number of registers available to the allocator. If n has fewer than k neighbors, any assignment of colors to n’s neighbors leaves at least one color for n. Thus, n must receive a color. Once selected, n is removed from the graph, which lowers the degree of each of its neighbors. If no node with fewer than k neighbors remains in the graph, the algorithm picks a node, using some heuristic, and spills the corresponding value.
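The ordering step described above can be sketched roughly as follows. This is an illustrative sketch only, not the code of any actual allocator: the function names, the dictionary-of-sets graph representation, and the cost-divided-by-degree spill heuristic are all assumptions introduced for the example.

```python
def chaitin_order(graph, k, spill_cost):
    """Build a coloring order, spilling whenever no node is provably colorable.

    graph: dict mapping each node to the set of its neighbors (undirected).
    k: number of registers (colors) available.
    spill_cost: dict mapping each node to an estimated spill cost
        (hypothetical; the source only says "some heuristic").
    Returns (stack, spilled): the removal order and the spilled nodes.
    """
    g = {n: set(adj) for n, adj in graph.items()}  # private working copy
    stack, spilled = [], []
    while g:
        # A node with fewer than k neighbors must receive a color.
        trivial = [n for n in g if len(g[n]) < k]
        if trivial:
            n = min(trivial)  # deterministic tie-break by name
            stack.append(n)
        else:
            # Assumed heuristic: lowest cost per unit of current degree.
            n = min(g, key=lambda m: (spill_cost[m] / max(len(g[m]), 1), m))
            spilled.append(n)  # spilled now, excluded from the ordering
        # Removing n lowers the degree of each remaining neighbor.
        for m in g.pop(n):
            g[m].discard(n)
    return stack, spilled


def assign_colors(graph, k, stack):
    """Color nodes in reverse removal order with the lowest free color."""
    colors = {}
    for n in reversed(stack):
        used = {colors[m] for m in graph[n] if m in colors}
        colors[n] = next(c for c in range(k) if c not in used)
    return colors
```

Every node on the stack had fewer than k neighbors left when it was removed, so `assign_colors` always finds a free color for it; spilled nodes never reach the coloring step at all.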

This approach fails to find a two-coloring for the diamond graph because every node has two neighbors, so with k = 2 no node has fewer than k neighbors. The first time it tries to pick a node, it must spill one, say a. This lowers the degree of its two neighbors, b and d, to one. It can now construct a coloring order for b, c, and d. (Any order beginning with b or d works.) Clearly, this is pessimistic because a can always use the same color as c. Smallest-last coloring constructs an ordering without spilling; it picks any node first since they all have the same degree. Since any order will produce a two-coloring, it succeeds where Chaitin’s algorithm did not.
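The diamond-graph argument can be checked with a small sketch. The code below is an illustration under stated assumptions: the names `DIAMOND`, `pessimistic_spills`, `smallest_last_order`, and `greedy_color` are hypothetical, ties are broken alphabetically, and the smallest-last routine follows the usual description of Matula's ordering (repeatedly remove a minimum-degree node, then color in reverse removal order) rather than any code from the paper.

```python
# The diamond graph: a four-node cycle a-b-c-d-a.
DIAMOND = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}


def pessimistic_spills(graph, k):
    """Nodes a Chaitin-style ordering spills (assumed alphabetical choice)."""
    g = {n: set(adj) for n, adj in graph.items()}
    spilled = []
    while g:
        trivial = sorted(n for n in g if len(g[n]) < k)
        if trivial:
            n = trivial[0]
        else:
            n = min(g)  # stand-in for the spill heuristic
            spilled.append(n)
        for m in g.pop(n):
            g[m].discard(n)
    return spilled


def smallest_last_order(graph):
    """Remove a minimum-degree node repeatedly; color in reverse order."""
    g = {n: set(adj) for n, adj in graph.items()}
    removed = []
    while g:
        n = min(g, key=lambda m: (len(g[m]), m))
        removed.append(n)
        for m in g.pop(n):
            g[m].discard(n)
    return removed[::-1]


def greedy_color(graph, order):
    """Give each node, in the given order, the lowest color its neighbors lack."""
    colors = {}
    for n in order:
        used = {colors[m] for m in graph[n] if m in colors}
        c = 0
        while c in used:
            c += 1
        colors[n] = c
    return colors
```

With k = 2, `pessimistic_spills(DIAMOND, 2)` returns `['a']`, whereas `greedy_color(DIAMOND, smallest_last_order(DIAMOND))` uses only two colors, giving a and c one color and b and d the other.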

Smallest-last coloring is not, in itself, the answer. For example, it provides no help in spilling when the graph cannot, in fact, be k-colored. The paper shows the insights and algorithms that we eventually derived. The resulting algorithm fits a stronger coloring heuristic into the basic structure of the original algorithm. We refer to this heuristic as optimistic coloring.
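A minimal sketch of optimistic coloring, as characterized in the Introduction, might look like the following. It is not the implementation from the paper: the function name `optimistic_allocate`, the graph representation, and the cost-over-degree choice of spill candidate are assumptions made for illustration. The one essential point it tries to capture is that a spill candidate is pushed onto the coloring order instead of being spilled immediately, and is spilled only if no color remains when its turn comes.

```python
def optimistic_allocate(graph, k, spill_cost):
    """Order nodes, deferring spill decisions until color assignment.

    graph: dict mapping each node to the set of its neighbors.
    k: number of registers (colors) available.
    spill_cost: dict of estimated spill costs (hypothetical heuristic input).
    Returns (colors, spilled).
    """
    g = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while g:
        trivial = sorted(n for n in g if len(g[n]) < k)
        if trivial:
            n = trivial[0]
        else:
            # Pick a spill candidate, but do not spill it yet.
            n = min(g, key=lambda m: (spill_cost[m] / max(len(g[m]), 1), m))
        stack.append(n)  # candidates join the ordering like any other node
        for m in g.pop(n):
            g[m].discard(n)

    colors, spilled = {}, []
    while stack:
        n = stack.pop()
        used = {colors[m] for m in graph[n] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[n] = free[0]  # the candidate was colorable after all
        else:
            spilled.append(n)    # only now is the value actually spilled
    return colors, spilled
```

On the diamond graph with k = 2 this sketch colors all four nodes and spills nothing, while on a triangle with k = 2 it still spills one node, since no two-coloring exists there.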

3. INFLUENCE

This paper appeared at PLDI ’89, in a session with two other papers on improvements to graph-coloring allocators [3, 15]. The papers were notable because they all improved on Chaitin’s algorithm and each showed apples-to-apples comparisons: that is, the same allocator running in the same compiler with a single algorithmic change. (Most prior work on register allocation came in the form of experience papers, rather than experimental comparisons.) That session marked a resurgence of research interest in Chaitin-style register allocators.

Nickerson noted that optimistic coloring improved the behavior of the allocator for values that required multiple registers [19]. Koblenz and Callahan built a hierarchical register allocator that relied, at its heart, on optimistic coloring [7]. Norris and Pollock took another approach to hierarchical allocation with an allocator that operated on program-dependence graphs [20]. We introduced a number of improvements, including rematerialization, conservative coalescing, and biased coloring [5]. Appel and George invented iterated coalescing [14]. Park and Moon struck a middle ground between Chaitin’s “aggressive” coalescing and the less aggressive conservative and iterated schemes [21]. Bergner et al. invented interference-region spilling [2].

Of equal importance, a second thread in the literature has dealt with issues that arise in the implementation of graph-coloring register allocators. Gupta et al. showed that the compiler can use clique separators to reduce the memory requirements for allocation [15]. Choi et al. introduced sparse evaluation graphs, reducing the time required to compute LIVE information [10]. Briggs et al. explained how to encode multiple-register requirements into the interference graph [4] and how to implement some of the data structures [6]. Cooper et al. described fast techniques for constructing the interference graph [12]. These papers have made it easier to implement a fast and effective allocator.

The algorithm, along with various improvements, has been implemented in many compilers, both research systems and commercial products. Indeed, Hopkins reported (in conversation) that IBM’s own implementation of our algorithm in the Tobey compiler yielded almost exactly the improvements reported in our paper.

4. ACKNOWLEDGMENTS

IBM Corporation, through Dr. Horace Flatt of the Palo Alto Scientific Center, provided the support that let us explore these ideas.

REFERENCES

[1] J. W. Backus, R. J. Beeber, S. Best, R. Goldberg, L. M. Haibt, H. L. Herrick, R. A. Nelson, D. Sayre, P. B. Sheridan, H. Stern, I. Ziller, R. A. Hughes, and R. Nutt. The FORTRAN automatic coding system. In Proceedings of the Western Joint Computer Conference, pages 188–198. Institute of Radio Engineers, NY, NY, USA, Feb. 1957.

[2] P. Bergner, P. Dahl, D. Engebretsen, and M. O’Keefe. Spill code minimization via interference region spilling. SIGPLAN Notices, 32(6):287–295, June 1997. Proceedings of the ACM SIGPLAN ’97 Conference on Programming Language Design and Implementation.

[3] D. Bernstein, D. Q. Goldin, M. C. Golumbic, H. Krawczyk, Y. Mansour, I. Nahshon, and R. Y. Pinter. Spill code minimization techniques for optimizing compilers. SIGPLAN Notices, 24(7):258–263, July 1989. Proceedings of the ACM SIGPLAN ’89 Conference on Programming Language Design and Implementation.

[4] P. Briggs, K. D. Cooper, and L. Torczon. Coloring register pairs. ACM Letters on Programming Languages and Systems, 1(1):3–13, Mar. 1992.

[5] P. Briggs, K. D. Cooper, and L. Torczon. Rematerialization. SIGPLAN Notices, 27(7):311–321, July 1992. Proceedings of the ACM SIGPLAN ’92 Conference on Programming Language Design and Implementation.

[6] P. Briggs and L. Torczon. An efficient representation for sparse sets. ACM Letters on Programming Languages and Systems, 2(1–4):45–58, March–December 1993.

[7] D. Callahan and B. Koblenz. Register allocation via hierarchical graph coloring. SIGPLAN Notices, 26(6):192–203, June 1991. Proceedings of the ACM SIGPLAN ’91 Conference on Programming Language Design and Implementation.

[8] G. J. Chaitin. Register allocation and spilling via graph coloring. SIGPLAN Notices, 17(6):98–105, June 1982. Proceedings of the ACM SIGPLAN ’82 Symposium on Compiler Construction.

[9] G. J. Chaitin, M. A. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and P. W. Markstein. Register allocation via graph coloring. Computer Languages, 6(1):47–57, Jan. 1981.

[10] J.-D. Choi, R. Cytron, and J. Ferrante. Automatic construction of sparse data flow evaluation graphs. In Conference Record of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, pages 55–66, Orlando, FL, USA, Jan. 1991.

[11] F. C. Chow and J. L. Hennessy. Register allocation by priority-based coloring. SIGPLAN Notices, 19(6):222–232, June 1984. Proceedings of the ACM SIGPLAN ’84 Symposium on Compiler Construction.

[12] K. D. Cooper, T. J. Harvey, and L. Torczon. How to build an interference graph. Software: Practice and Experience, 28(4):425–444, Apr. 1998.

[13] K. D. Cooper and L. T. Simpson. Live range splitting in a graph coloring register allocator. In Proceedings of the Seventh International Compiler Construction Conference, CC ’98, Lecture Notes in Computer Science 1383, pages 174–187, 1998.

[14] L. George and A. W. Appel. Iterated register coalescing. ACM Transactions on Programming Languages and Systems, 18(3):300–324, May 1996.

[15] R. Gupta, M. L. Soffa, and T. Steele. Register allocation via clique separators. SIGPLAN Notices, 24(7):264–274, July 1989. Proceedings of the ACM SIGPLAN ’89 Conference on Programming Language Design and Implementation.

[16] J. R. Larus and P. N. Hilfinger. Register allocation in the SPUR Lisp compiler. SIGPLAN Notices, 21(7):255–263, July 1986. Proceedings of the ACM SIGPLAN ’86 Symposium on Compiler Construction.

[17] S. S. Lavrov. Store economy in closed operator schemes. Journal of Computational Mathematics and Mathematical Physics, 1(4):687–701, 1961. English translation in U.S.S.R. Computational Mathematics and Mathematical Physics 3:810–828, 1962.

[18] D. Matula and L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. Technical Report CSE-8104, Department of Computer Science and Engineering, Southern Methodist University, July 1981.

[19] B. R. Nickerson. Graph coloring register allocation for processors with multi-register operands. SIGPLAN Notices, 25(6):40–52, June 1990. Proceedings of the ACM SIGPLAN ’90 Conference on Programming Language Design and Implementation.

[20] C. Norris and L. L. Pollock. Register allocation over the program dependence graph. SIGPLAN Notices, 29(6):266–277, June 1994. Proceedings of the ACM SIGPLAN ’94 Conference on Programming Language Design and Implementation.

[21] J. Park and S.-M. Moon. Optimistic register coalescing. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 196–204. IEEE, 1998.
