
Evolving the Next Generation of Compilers

Keith D. Cooper, Department of Computer Science

Rice University, Houston, Texas

Collaborators: Phil Schielke, Devika Subramanian, Linda Torczon, Tim Harvey, Steve Reeves, Alex Grosul, Todd Waterman, L. Almagor

Funding: DARPA (early work), DOE LACSI, NSF ITR


Roadmap

• The opportunity for change

• Building slower compilers
  - Randomized iterative-repair instruction schedulers
  - A multi-optimizer

• Choosing “good” optimization sequences
  - Understanding the search spaces
  - Design & evaluation of search algorithms
  - Speeding up evaluations

• Roadmap for Future Work

There are no conclusions; we are far from done


The Opportunity for Change

The structure of compilers has not changed much since 1957:

• Front End, Middle Section (Optimizer), Back End

• Series of filter-style passes

• Fixed order of passes

[Diagram: the 1957 pipeline: Front End → Index Optimization → Code Merge → bookkeeping → Flow Analysis → Register Allocation → Final Assembly]

Fortran Automatic Coding System, IBM, 1957



2000: The Pro64 Compiler

Open-source optimizing compiler for IA-64

• 3 front ends, 1 back end

• Five-level IR

• Gradual lowering of abstraction level

[Diagram: classic three-phase structure: Fortran, C & C++, and Java front ends feed Interprocedural Analysis & Optimization → Loop Nest Optimization → Global Optimization → Code Generation]

Each major section is a series of passes



Conventional Wisdom

Compilers should

• Use algorithms that take linear (or near-linear) time

• Produce outstanding code from arbitrary inputs

• Build object code that can be assembled like a brick wall

These goals limit the designs of our compilers


The Opportunity for Change

Over the years, computers have become markedly faster

[Chart annotation: compilers at 20% per year]

Compilers have not taken advantage of the quantum increases in compute power provided by Moore’s law


We can afford slower compilers if they do something useful with the extra time


The Need for Change

For forty-five years, we’ve been doing research on compilation

and we’ve been building compilers …

• Hundreds of papers on transformations

• Hundreds of papers on analysis

• A few useful papers on experience …

Unfortunately, the compilers that we use still don’t deliver the performance that we were promised

Research has focused on transformations & analysis

Maybe we need to look at other aspects of compiler construction, such as the structure of our compilers


Building Slower Compilers

In 1996, we began to look at what a compiler might do with 10x or 100x more compilation time

• Most compilers would finish the job early & declare victory

• We began looking at the opportunities
  - More expensive analysis (O(n^6) pointer analysis?)
  - Many more transformations (what & when)
  - Compile code 10 ways & keep the best version

This inquiry led to an in-depth study of instruction scheduling

• How good is list scheduling?

• Can we do better? (see Sebastian Winkel’s talk, next)


Iterative Repair Scheduling

Search technique from AI based on randomization & restart

• Used for (small) scheduling problems in other domains

• Assume a simple invalid schedule & fix it
  - Pick an error at random & reschedule that operation
  - Different runs find different valid schedules

• Finds better schedules for hard problems (at higher cost)
  - Schedules are often better in secondary criteria (register use)

How often are schedules hard enough to justify IR?

• Examined blocks & extended blocks from benchmarks

• Examined > 85,000 distinct synthetic blocks

(3 wall-clock months on 2 SPARCs; Phil Schielke’s thesis)
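The repair loop itself is easy to sketch. Below is a minimal Python sketch of the idea, assuming a precedence DAG (`preds`), per-operation latencies, and a simple issue-width limit; all names are hypothetical stand-ins, not Schielke's implementation.

```python
import random

def iterative_repair(ops, preds, latency, width=2, steps=10000):
    """Iterative-repair scheduling sketch: start from a naive (invalid)
    schedule, pick a violated constraint at random, and reschedule the
    offending operation.  Models two constraint kinds: dependences and a
    simple issue-width limit."""
    cycle = {o: 0 for o in ops}                      # naive: everything in cycle 0
    for _ in range(steps):
        bad = [o for o in ops                        # dependence violations
               if any(cycle[o] < cycle[p] + latency[p] for p in preds[o])]
        for c in set(cycle.values()):                # issue-width violations
            issued = [o for o in ops if cycle[o] == c]
            if len(issued) > width:
                bad.extend(issued[width:])
        if not bad:
            return cycle                             # valid schedule found
        o = random.choice(bad)                       # pick an error at random ...
        lo = max([cycle[p] + latency[p] for p in preds[o]], default=0)
        cycle[o] = lo + random.randint(0, 2)         # ... and reschedule it
    return cycle          # may still be invalid if the step budget ran out
```

Because both the violation chosen and the repair offset are randomized, different runs settle into different valid schedules, which is what makes restart useful.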


Iterative Repair Scheduling

What did we learn?

• List scheduling does well on the codes & models we tested
  - RBF does 5 forward & 5 backward passes, with randomized tie-breaking
  - RBF found optimal schedules for 92% of blocks & 73% of EBBs
  - RBF found optimal schedules for 80% of synthetic blocks

• IR scheduling also finds good schedules
  - Schedules that use fewer resources than RBF’s optimal one
  - Optimal schedules for many blocks where RBF fails

• Found a parameter that predicts when to use IR
  - Identifies schedules where RBF is likely to find a suboptimal answer and the IR scheduler is likely to do well
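For contrast, here is a sketch of the RBF baseline: repeated greedy list-scheduling passes, alternated forward and backward over the dependence DAG, with randomized tie-breaking, keeping the shortest result. It assumes a single-issue machine and a hypothetical `priority` function (e.g., critical-path length); it is not the real scheduler.

```python
import random

def list_schedule(ops, succs, preds, latency, priority):
    """One greedy list-scheduling pass over a dependence DAG, for a
    single-issue machine (for brevity).  Ties in priority are broken
    at random -- the randomization in RBF."""
    indeg = {o: len(preds[o]) for o in ops}
    ready = [o for o in ops if indeg[o] == 0]
    pending, sched, cycle = [], [], 0          # pending: (finish_cycle, op)
    while ready or pending:
        if ready:
            random.shuffle(ready)              # randomized tie-breaking
            op = max(ready, key=priority)
            ready.remove(op)
            sched.append((cycle, op))
            pending.append((cycle + latency[op], op))
        cycle += 1
        for fin, op in [p for p in pending if p[0] <= cycle]:
            pending.remove((fin, op))
            for s in succs[op]:                # release satisfied successors
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
    return sched

def rbf(ops, succs, preds, latency, priority):
    """5 forward & 5 backward passes; keep the shortest schedule.  A
    backward pass schedules the reversed DAG (a real implementation
    then reverses the resulting schedule)."""
    best = None
    for trial in range(10):
        if trial % 2:                          # backward pass
            s = list_schedule(ops, preds, succs, latency, priority)
        else:                                  # forward pass
            s = list_schedule(ops, succs, preds, latency, priority)
        length = max(c + latency[o] for c, o in s)
        if best is None or length < best[0]:
            best = (length, s)
    return best[1]
```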


Lessons from Iterative Repair Work

• Disappointment: RBF does very well; conventional wisdom is right
  - Used randomization, restart, & multiple trials (vs. tie breakers)

• Good understanding of the space of schedules
  - Can find equivalent schedules that use fewer resources
  - Can identify instances where IR is likely to beat RBF

• New model for our work
  - Extensive, expensive exploration to understand the problem space
  - Development of effective & (reasonably) efficient techniques for the hard problems, using knowledge gained in the explorations


Next Idea: Multiple Optimization Plans

Idea is simple: try several and keep the best result


[Diagram: one front end feeds several middle-end/back-end pipelines; keep the best code]

Cost is roughly 4x the “old” way, but it might produce better code (Bernstein et al.)
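A minimal sketch of the multi-optimizer driver, with `compile_with` and `measure` as hypothetical stand-ins for the compiler and the objective function:

```python
def multi_optimize(source, sequences, compile_with, measure):
    """Compile the program several ways and keep the best executable."""
    best_exe, best_cost = None, float("inf")
    for seq in sequences:                 # e.g. 4 sequences => roughly 4x cost
        exe = compile_with(source, seq)   # run the passes in this order
        cost = measure(exe)               # speed, space, ...
        if cost < best_cost:
            best_exe, best_cost = exe, cost
    return best_exe
```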


Implementation leads immediately to some hard questions

• How do we pick good sequences?

• How do we implement a compiler that can handle multiple sequences?

This investigation produced a system that used a genetic algorithm to derive good program-specific sequences

• Improvements of (roughly) 20% in speed, 12% in space

• Paper in LCTES, summer 1999

The genetic algorithm led to “Evolving” in this talk’s title; this idea occupies the remainder of the talk.


This single idea hijacked our research agenda

• Questions inherent in this simple idea are quite difficult

• We saw no easy way to answer them

• Led to Schielke’s work on the impact of optimization order on code size & code speed

• Spawned a project to develop & engineer compilers that adapt their behavior to input program, objective function, and target machine

We did not know that it would become a ten-year odyssey


Prototype Adaptive Compiler

Tries to minimize an objective function using adaptive search

• Finds the “right” configuration for the compiler & input
  - A set of optimizations & an order in which to run them

• Uses multiple trials & feedback to explore the solution space

[Diagram: the front end feeds the optimization passes, producing executable code; an objective function scores the code and a steering algorithm varies the parameters]
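In skeleton form, the feedback loop looks like this; random probing stands in for the real steering algorithms (GA, hill climber, enumeration), and `compile_with`/`objective` are hypothetical stand-ins:

```python
import random

def adaptive_compile(source, passes, compile_with, objective, budget=100):
    """Skeleton of the prototype's feedback loop: propose a pass
    sequence, compile, score the result, and use the score to steer."""
    best_seq, best_score = None, float("inf")
    for _ in range(budget):
        seq = [random.choice(passes) for _ in range(10)]   # vary parameters
        score = objective(compile_with(source, seq))       # compile & evaluate
        if score < best_score:                             # feedback
            best_seq, best_score = seq, score
    return best_seq, best_score
```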


Finding the “right” configuration is hard

• Multiple algorithms for each effect

• Different scopes, cases, domains, strengths, & weaknesses

• Overlap between effects complicates choices


Choosing Optimization Sequences

The Problem: find the best sequence of transformations for your program

What’s hard

• 16 optimizations in our compiler (ignoring option flags)

• With 10 passes, that is 16^10 possibilities

• Interactions are nonlinear, unpredictable, & overlapping

• Want to pick a minimizer for the objective function quickly


Prototype Adaptive Compiler

• Based on the MSCP compiler

• 16 transformations
  - Run in almost any order (not easy)
  - Many options & variants

• Search-based steering algorithms
  - Hill climber (valley descender?)
  - Variations on a genetic algorithm
  - Exhaustive enumeration

• Objective functions
  - Run-time speed
  - Code space
  - Dynamic bit-transitions
  - … plus combinations of them

• Experimental tool
  - Exploring applications
  - Learning about the search space
  - Designing better searches

An effective way to find some subtle optimization bugs


Early Experiments

• Genetic algorithm (12 passes drawn from a pool of 10)
  - Evaluate each sequence
  - Replace the worst + 3 at each generation
  - Generate new strings with crossover
  - Apply mutation to all, save the best

• Optimizing for space, then speed
  - 13% smaller code than the fixed sequence (0 to 41%)
  - Code was often faster (26% to -25%; 5 of 14 slower)

• Optimizing for speed, then space
  - 20% faster code than the fixed sequence (best was 26%)
  - Code was generally smaller (0 to 5%)

• Found the “best” sequence in 200 to 300 generations with populations of size 20

Register-relative procedure abstraction gets 5% space, -2% speed


This GA turns out to be fairly poor. Even so, it took many fewer probes to find “best” sequence than did random sampling.

N.B.: 6,000 compilations; no measure of solution quality


Choosing Optimization Sequences

Classic optimization problem

• Compiler looks for a minimizer in some discrete space
  - 16^10 points for 10-pass optimization in the prototype compiler
  - Can obtain function values for any point, at some cost

• Need to understand the properties of the search space
  - Depends on the base optimizations & interactions between them
  - Depends on the program being optimized
  - Depends on properties of the target machine

• Difficult and complex problem …


• The genetic algorithm operates (pretty well) in this space!


Choosing Optimization Sequences

Work has two major thrusts

• Characterizing the search spaces
  - Large-scale enumerations of small spaces to develop insights
  - Small-scale experiments in large spaces to confirm insights

• Designing effective search algorithms
  - Rapid offline experiments in enumerated spaces
  - Confirming online experiments in large spaces

• Question: can we understand the space analytically? (Is it convex or differentiable?)
  - Models of optimizations & their combinations
  - I don’t yet know enough about interactions & effects


Characterizing the Search Space

Enumeration Experiments

• Full search space is huge: 1,099,511,627,776 points in 16^10

• Work with tractable subspaces: 5^10 has 9,765,625 points
  - Work with small programs, of necessity
  - Enumerate full 5^10 subspaces & analyze the data offline
  - First enumeration, FMIN, took 14 CPU-months
  - Today, takes about 2.5 CPU-months
  - We’ve done 6 or 7 full enumerations in 5^10

• Follow the paradigm from the iterative repair scheduling work
  - Large-scale studies to gain insight; randomization & restart

(Farm of Apple XServes and a couple of Suns; ≥ 60,000,000 compilations & evaluations)
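The enumeration itself is conceptually simple; the cost is all in the evaluations. A sketch, with `evaluate` as a hypothetical compile-and-run stand-in:

```python
from itertools import product

def enumerate_subspace(evaluate, passes="plosn", length=10):
    """Enumerate a full subspace: every sequence of `length` passes
    drawn from `passes` -- 5**10 = 9,765,625 points for plosn.
    `evaluate` compiles, runs, and returns cycles; the real runs took
    CPU-months on a farm of machines."""
    results = {}
    for seq in product(passes, repeat=length):
        s = "".join(seq)
        results[s] = evaluate(s)
    return results   # analyzed offline: minima, distributions, walks, ...
```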


What Have We Learned About Search Spaces?

[Plot: adpcm-coder, 5^4 space, plosn]

(p: peeling, l: PRE, o: logical peephole, s: reg. coalescing, n: useless CF elimination)

We confirmed some obvious points

These spaces are:

• not smooth, convex, or differentiable

• littered with local minima at different fitness values

• program dependent


[Plot: fmin, 5^4 space, plosn; the same qualitative picture]


What About Presentation Order?

Clearly, order might affect the picture …

[Plots: adpcm-coder, 5^4 space, plosn, presented in two different orders, labeled “Reality” and “Fiction”]

Still, some bad local minima


What Have We Learned About Search Spaces?

Both programs and optimizations shape the space

(p: peeling, l: PRE, o: logical peephole, s: reg. coalescing, n: useless CF elimination)

Range is 0 to 70%; can approximate the distribution with 1,000 probes

With a different pool of passes (p: peeling, d: dead code elimination, n: useless CF elimination, x: dominator value numbering, t: strength reduction):

Range is compressed (0 to 40%); the best is 20% worse than the best in “plosn”


What Have We Learned About Search Spaces?

Many local minima are “good”

• Many local minima: 258 strict, 27,315 non-strict (in the fmin plosn subspace)

• Lots of chances for a search to get stuck in a local minimum


What Have We Learned About Search Spaces?

Distance to a local minimum is small


Downhill walk halts quickly

Best-of-k walks should find a good minimum, for big enough k


Search Algorithms

• Knowledge does not make the code run faster

• Need to use our knowledge to build better search techniques


Moving from curiosity-driven research to practical work


Search Algorithms: Genetic Algorithms

• Original work used a genetic algorithm (GA)

• Experimented with many variations on GA

• Current favorite is GA-50
  - Population of 50 sequences
  - 100 evolutionary steps

• At each step
  - Best 10% survive
  - Rest generated by crossover: fitness-weighted reproductive selection; single-point, random crossover
  - Mutate until unique

GA-50 finds best sequence within 30 to 50 generations

Difference between GA-50 and GA-100 is typically < 0.1%

This talk shows best sequence after 100 generations …
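A sketch of GA-50 as described above, with a hypothetical `evaluate` (compile, run, score; lower is better). In practice evaluations are memoized, since they dominate the cost:

```python
import random

def ga50(passes, seq_len, evaluate, pop_size=50, generations=100):
    """GA-50 sketch: best 10% survive each step; the rest come from
    fitness-weighted parent selection, single-point random crossover,
    and mutation until the child is unique."""
    pop = [[random.choice(passes) for _ in range(seq_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=evaluate)             # best first
        keep = scored[:pop_size // 10]                 # best 10% survive
        weights = [pop_size - i for i in range(len(scored))]
        seen = {tuple(s) for s in keep}
        children = []
        while len(keep) + len(children) < pop_size:
            a, b = random.choices(scored, weights=weights, k=2)
            cut = random.randrange(1, seq_len)         # single-point crossover
            child = a[:cut] + b[cut:]
            while tuple(child) in seen:                # mutate until unique
                child[random.randrange(seq_len)] = random.choice(passes)
            seen.add(tuple(child))
            children.append(child)
        pop = keep + children
    return min(pop, key=evaluate)
```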


Search Algorithms: Hill climbers

Many nearby local minima suggest a descent algorithm

• Neighbor: a Hamming-1 string (differs in one position)

• Evaluate neighbors and move downhill

• Repeat from multiple starting points

• Steepest descent: take the best neighbor

• Random descent: take the first downhill neighbor (random order)

• Impatient descent: random descent with limited local search (sketched below)
  - HC algorithms examine at most 10% of the neighbors
  - HC-10 uses 10 random starting points, HC-50 uses 50


Search Algorithms: Greedy Constructive

Greedy algorithms work well on many complex problems

How do we do a greedy search?

Algorithm takes k · (2n - 1) evaluations for a string of length k

Takes locally optimal steps

Early exit for strings with no improvement

1. Start with the empty string

2. Pick the best single optimization as the 1st element

3. For i = 2 to k: try each pass as a prefix and as a suffix; keep the best result

Local minimum under a different notion of neighbor
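A sketch of the greedy constructive search; tie-breaking here is arbitrary, where GC-10/GC-50 break ties at random and restart (see the next slide):

```python
def greedy_constructive(passes, k, evaluate):
    """Grow a sequence of length up to k by locally optimal steps.
    `evaluate` is a hypothetical compile-and-measure stand-in
    (lower is better)."""
    seq = [min(passes, key=lambda p: evaluate([p]))]   # best single pass
    for _ in range(k - 1):
        # try each pass as a prefix and as a suffix: 2n trials per step
        candidates = [[p] + seq for p in passes] + [seq + [p] for p in passes]
        best = min(candidates, key=evaluate)
        if evaluate(best) >= evaluate(seq):            # early exit: no improvement
            break
        seq = best
    return seq
```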


Search Algorithms: Greedy Constructive

One major complication: equal-valued extensions, or ties

• Ties can take GC to wildly different places

• Have experimented with three GC algorithms
  - GC-exh pursues all equal-valued options
  - GC-bre does a breadth-first rather than depth-first search
  - GC-10 & GC-50 break ties randomly and use restart to explore

The experiments use GC-10 & GC-50

adpcm-d              GC-exh    GC-bre     GC-50
Sequences checked    91,633    325        2,200
Code speed           —         + 0.003%   + 2%


Search Algorithm Results

• Variety of codes

• 5 searches + training/testing

• All do pretty well

[Chart: operations executed relative to the fixed sequence rvzcodtvzcod; simulated RISC machine; 16^10 space]


• Greedy has some problems (fmin, tomcatv); a price to pay?


• Training/testing data shows small variation: no systematic bias from the training data


Search Algorithm Costs

[Chart: sequences evaluated by each search algorithm; 16^10 space]

• GA-50 for 100 generations evaluates 4,550 sequences

• Surprisingly fast: the old GA took 6,000, and several searches take < 1,000

• GC can explode (zeroin, nsieve)

• 50 generations of GA-50 does almost as well as 100


Focusing on the Cheap Techniques

[Chart: sequences evaluated by the cheaper searches; 16^10 space]

• HC-10: 10x faster than the old GA, 7x faster than GA-50

• GC-10 does well on average, but ties cause problems; fmin & tomcatv got slow code, and that does not show up in the costs …


Search Algorithms

So, where are we?

• Find good sequences in 200 to 600 probes
  - “Good” means competitive with GA-50 run for 100 generations

• How close are we to the global minimum?
  - Cannot know without enumeration
  - Enumeration is hard to justify in 16^10 (& harder to perform)
  - Current experiment: HC-100000 on several codes (on a 264-processor IA-64 machine)

• Next major step is bringing code features into the picture …


Designing Practical Adaptive Compilers

User may not want to pay 300x for compilation

Moore’s law will help …

Engineering approaches

• Make the search incremental across many compilations

• Develop faster techniques to evaluate sequences on codes

• Use parallelism (see the sketch after this list)

• And, of course, make the compiler fast
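As a small illustration of the parallelism point: candidate sequences are independent, so their evaluations parallelize trivially. `compile_and_measure` is a hypothetical stand-in for the compile-evaluate cycle (it must be a picklable top-level function for a process pool):

```python
from concurrent.futures import ProcessPoolExecutor

def evaluate_batch(candidates, compile_and_measure, workers=8):
    """Evaluate independent candidate sequences in parallel; a farm of
    machines plays the same role at larger scale."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(compile_and_measure, candidates))
    return min(zip(costs, candidates))     # (best cost, best sequence)
```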


Speeding up Evaluation

Compile-evaluate cycle takes most of the time

• Faster evaluation methods
  - Low-level performance models (Mellor-Crummey et al.)
  - Analytical models (Soffa et al.)

• Machine learning to predict sequences (a preliminary conjecture)
  - Probabilistic models may reveal consistent relationships
  - Want to relate sequences to source-code features

Success in any or all of these approaches could reduce the cost of evaluating each sequence



But, All These Experiments Used Our Compiler

We have tried a number of other compilers, with no success

• Try to reorder a pass in GCC

• Hit problems in GCC, SUIF-1, ORC, …

• Look forward to using LLVM & Phoenix

Our platform is reconfigurable by accident of design

• Have run > 100,000,000 configurations in our system

• One unavoidable phase-order bug

We have used MIPSPro in another series of experiments

[Diagram: adaptive control wrapped around a subject compiler; code in, executable out, feedback to the controller]


Road Map for our Project

Our goals

• Short term (now)
  - Characterize the problems, the potential, & the search space
  - Learn to find good sequences quickly (search)

• Medium term (3 to 5 years)
  - Develop proxies and estimators for performance (speed)
  - Demonstrate practical applications for adaptive scalar optimization
  - Understand the interface between the adaptive controller & the compiler

• Long term (5 to 10 years)
  - Apply these techniques to harder problems: data distribution, parallelization schemes on real codes; compiling for complex environments, like the Grid
  - Develop a set of design & engineering principles for adaptive compilers


Where Does This Research Lead?

Practical systems within ten years

How will they work? (Frankly, we don’t yet know)

• Efficient searches that capitalize on properties of the space

• Incremental searches distributed over program development

• Predictive techniques that use program properties to choose good starting points

• Compiler structures & parameterizations that fit adaptation

In the meantime, we have a lot of work to do

And machines will keep getting faster …


And that’s the end of my story …

Extra Slides Start Here


Choosing the Right Techniques

How can a compiler writer pick the right techniques?

• Depends on source, target, & code shape

• “Right” probably changes from program to program

• Multiple algorithms for each effect
  - Different scopes, cases, domains, strengths, & weaknesses
  - Overlap between effects complicates choices (e.g., value numbering vs. constant propagation)

Choices are non-obvious

• The classic compiler is one size fits all
  - May partially explain Evelyn Duesterwald’s phase behavior
  - LCM also, provably, does LICM

Decisions made before it sees your code!


Experience from Early Experiments

Improving the Genetic Algorithm

• Experiments aimed at understanding & improving convergence

• A larger population helps
  - Tried pools from 50 to 1000; 100 to 300 is about right

• Use weighted selection in reproductive choice
  - Fitness scaling to exaggerate late, small improvements

• Crossover
  - 2-point, random crossover
  - Apply “mutate until unique” to each new string

• Variable-length chromosomes
  - With fixed-length strings, the GA discovers NOPs
  - A varying string rarely has a useless pass

GA now dominates both the hill climber and random probing of the search space, in results/work.


Why Can’t I Find a Good Compiler?

We’re doing the research & building the compilers, but we are not getting the results we need …

• Hundreds of papers on transformations

• Hundreds of papers on analysis

• Even a few papers on experience …

• Lag from new machine to good compiler is 5 years

• Market lifetime is often less than compiler development time

• Codesign isn’t enough; we need either predesign or reuse

This is an engineering problem, in the best sense of the word


Modular Components

[Diagram: front end, middle end, & back end built from interchangeable passes]

Our picture

• Set of self-contained passes with uniform interfaces

• Mix and match, pick-a-pass

• Easy to test, debug, evaluate passes competitively

• Simple interpass structure: knowledge is in IR

Our adaptive work on compilation order needs this kind of code …

Few (if any) compilers work this way!


Modular Components

Code depends heavily on IR, inhibiting reuse in other systems

Most Compilers

• Many idiosyncratic interpass dependences

• Complicated interactions and interfaces

• No real freedom to mix, match, & change


Modular Components


We need to build compilers like this

• Don’t want to implement the same algorithm many times

• Do want to mix-and-match, reorder, change parameters

Matter of engineering

• Follow reasonable design principles

• Limit global state

• Use algorithm libraries: representation-independent analysis & clean interfaces


Do we have a problem?

• In some applications, performance does matter
  - Oil industry, Earth Simulator, wildfire & tornado prediction …
  - Large computations with commensurate economic impact

• Quality compilers for new hardware are years late
  - Historical lag time is about five years (IA-64?)

• Speed is no longer the only measure of quality
  - Code space, power, and other emerging concerns
  - If 40 years hasn’t solved speed, how long for power?

• Delivered performance?
  - LANL sees 4 to 13% of peak on real codes, 70% on LINPACKD

We cannot (in general) deliver good compilers in a timely fashion


The Need for Speed

Characterize actual speed on ASCI Blue Mountain

• Separated uniprocessor & multiprocessor issues (R10000)

• Measured performance
  - PARTISN: 13% of peak
  - SAGE: 4% of peak
  - LINPACKD: 70% of peak

• Instruction mix in actual code differs from benchmarks
  - PARTISN: 3 memory ops per flop, 2.4 ops per memory op
  - SAGE: 3 memory ops per flop, 3 ops per memory op
  - LINPACK: O(n^3) flops for O(n^2) memory ops
  - SpecFP is 39% memory, 18% floating point, 26% integer

(Wasserman et al., LANL CCS-3 Group)


Is the Situation Hopeless?

Solver in SAGE accounts for 40-60% of runtime

• Original code achieves 5.5% of peak

• After appropriate unroll & jam, it achieves 10% of peak (Mellor-Crummey & Garvin)
  - Twice as fast, but still a long way to go …

Why didn’t MIPSPro do the unroll & jam? (Carr 93)

• Complex transformation

• Difficult to determine parameters

• … and MIPSPro is a good compiler
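To make the missed transformation concrete, here is unroll & jam on a small matrix-vector kernel, sketched in Python for uniformity with the other examples (the real codes are Fortran):

```python
# Before: a simple matrix-vector kernel.
def mv(a, b, c, n):
    for i in range(n):
        for j in range(n):
            c[i] += a[i][j] * b[j]

# After unroll & jam by 2: unroll the i loop, then jam (fuse) the two
# copies of the j loop.  Each load of b[j] now feeds two outer
# iterations, cutting its memory traffic in half.  (Assumes n is even;
# a real compiler also needs legality checks, parameter selection, and
# a cleanup loop -- which is part of why it is hard to get right.)
def mv_unroll_jam(a, b, c, n):
    for i in range(0, n, 2):
        for j in range(n):
            c[i]     += a[i][j]     * b[j]
            c[i + 1] += a[i + 1][j] * b[j]
```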


Support Slides Begin Here


Characterizing FMIN

• One application, FMIN
  - 150 lines of Fortran, 44 basic blocks
  - Exhibited complex behavior in other experiments

• Ran a hill climber from 100 random starting points
  - Picked the 5 most-used passes from the winners

• Evaluated all strings of length 10 over those 5 passes
  - 9,765,625 distinct combinations (for speed)
  - 34,000 to 37,000 experiments/machine/day
  - Produces the sequence & the number of cycles to execute
  - First subspace took 14 CPU-months to complete

• We’re learning from the results
  - Can run experiments against the data offline

Have now done several subspaces, at 250,000 experiments/machine/day


Results from FMIN

In the initial subspace, plosn (9,765,625 sequences):

• Best sequence is 986 cycles

• Seven at 987

• Worst sequence is 1716

• Unoptimized code takes 1765

• Range is 3% to 43% improvement

With the full set of transformations:

• GA: 822 cycles in 184 generations of 100; 825 in 152 generations of 100

• Hill climber: 833 cycles at round 2,232; 830 at round 750

Local minima

• 258 strict local minima (x < all Hamming-1 points)

• 27,315 non-strict local minima (x ≤ all Hamming-1 points)

• All local minima are improvements

• They come in many shapes
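The offline analysis behind these counts is straightforward once a subspace is enumerated; a sketch, where `space` maps each length-10 string over plosn to its measured cycle count:

```python
def count_local_minima(space, passes="plosn"):
    """Count strict and non-strict local minima under the Hamming-1
    neighborhood, over an already-enumerated subspace."""
    strict = nonstrict = 0
    for seq, val in space.items():
        nbr_vals = [space[seq[:i] + p + seq[i + 1:]]
                    for i in range(len(seq))
                    for p in passes if p != seq[i]]
        if all(val < v for v in nbr_vals):
            strict += 1                     # x <  all Hamming-1 points
        if all(val <= v for v in nbr_vals):
            nonstrict += 1                  # x <= all Hamming-1 points
    return strict, nonstrict                # FMIN/plosn: 258 and 27,315
```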


Results from FMIN

Strict local minima have many shapes

• 132 of these spikes, from loop peeling

[Figure: a strict local minimum surrounded by its Hamming-1 points; magnitude is the wall height (≥ 1)]

Landscape looks like a pock-marked asteroid, in 10 dimensions


Applying IR Scheduling

1. Measure average ready queue length in list scheduler

2. Check the resulting code
   - No holes in the schedule: take the existing schedule
   - Holes in the schedule & schedule longer than the critical path: invoke the IR scheduler to look for a better schedule

This regime invokes IR when it might pay off

• Infrequent use

• Reasonable likelihood of profit
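As pseudocode, the regime is a small filter; every helper here is a hypothetical stand-in:

```python
def schedule_block(block, list_schedule, ir_schedule,
                   critical_path, has_holes, length):
    """Keep the list schedule unless it has holes AND exceeds the
    critical-path lower bound -- only then might it be improvable,
    so only then spend time on the expensive IR scheduler."""
    sched = list_schedule(block)
    if has_holes(sched) and length(sched) > critical_path(block):
        alt = ir_schedule(block)
        if length(alt) < length(sched):
            sched = alt
    return sched
```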


Iterative Repair Scheduling (& Integer Linear Programming)

• We also tried integer linear programming

• Produced improvements that were similar to IR scheduling

• CPLEX ran for a long time on simple cases
  - Too many choices that lead to good schedules
  - May be an artifact of our formulation of the problem
  - From this perspective, Sebastian Winkel appears to have a better formulation than ours


Support: Choosing Optimization Sequences

Consider redundancy elimination

• Dominator Value Numbering (DVNT)
  - Extends local value numbering to dominator regions

• Alpern-Wegman-Zadeck’s partitioning method (AWZ)
  - Uses Hopcroft’s DFA minimizer to prove equivalence
  - A second pass performs replacement

• GCSE using available expressions
  - Global data-flow analysis finds redundant expressions
  - A second pass performs replacement

• Lazy Code Motion (LCM)
  - Extends GCSE to handle partial redundancy
  - Computes careful placement for inserted expressions


Support: Choosing Optimization Sequences

How do we choose among DVNT, AWZ, GCSE, & LCM ?

• Only DVNT finds x + x = 2 * x

• Only DVNT & AWZ can prove x * y = x * z when y = z

• DVNT cannot find redundancy across a loop-closing branch

• Only AWZ is optimistic; it finds loop-based redundancies that the others miss

• Only LCM finds partial redundancies

• Only DVNT folds constants

• Only LCM moves loop-invariant code

The BEST choice depends on the input code, the opportunities that it presents, and the tradeoffs that it embodies.
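To ground the comparison, here is a toy sketch of the local value numbering that DVNT builds on; it shows constant folding and the commutative redundancy (x + x), while real DVNT additionally works over dominator regions and applies identities such as x + x = 2 * x:

```python
def local_value_number(block):
    """Toy local value numbering over tuples (dst, op, arg1, arg2),
    where args are variable names or integer constants.  Assumes each
    dst is defined once and every op is binary; for brevity it treats
    all ops as commutative (real VN sorts only commutative operators)."""
    vn = {}        # name or constant -> value number
    table = {}     # (op, vn1, vn2)   -> name that already computed it
    out = []
    def number(x):
        return vn.setdefault(x, len(vn))
    for dst, op, a, b in block:
        if op == "+" and isinstance(a, int) and isinstance(b, int):
            out.append((dst, "const", a + b, None))    # constant folding
            vn[dst] = number(a + b)
            continue
        key = (op, *sorted((number(a), number(b))))    # commutative key
        if key in table:                               # redundant: reuse
            out.append((dst, "copy", table[key], None))
            vn[dst] = vn[table[key]]
        else:
            vn[dst] = len(vn)
            table[key] = dst
            out.append((dst, op, a, b))
    return out

# e.g. local_value_number([("t1", "+", "x", "x"), ("t2", "+", "x", "x")])
# rewrites t2 as a copy of t1.
```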
