Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Review: Phases of a Compiler
Intermediate-code optimizations are not machine-specific
Low-level optimizations can be machine-specific
Review: CISC vs RISC
CISC (x86, Intel)
  Multi-clock complex instructions
  Memory access incorporated into instructions
  Complex instruction set
RISC (e.g., the PowerPC in a Mac PowerBook)
  Single-clock instructions
  Memory accesses are separate instructions
  Simple instruction set
Review: Memory Hierarchy
Memory access becomes exponentially slower at levels farther from the CPU
Memory-access-intensive programs require special optimizations
Review: Multiple Cores
Need to create and exploit instruction-level parallelism (ILP)
Multiple cores on the same die can share cache, letting them work together faster
Compilers can generally only exploit trivial parallelism automatically (Dr. Doughty)
Must eliminate hazards
Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Optimizing for Speed*
Useful for CPU-intensive applications (graphics, video editing, sorting)
Scheduling – out-of-order execution
Removal of dependencies increases ILP (see the sketch after this list)
Account for instruction latency
Exploit multiple ALUs, cores, etc.
Mix instruction types (int, float, multiply, read, write)
Eliminate jumps
Buffer writes (writes cannot be committed out of order)
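As a rough illustration of dependence removal, the following C sketch (function names invented for this example, not from the slides) splits a summation into two independent accumulators so that more than one addition can be in flight at a time:

    #include <stddef.h>

    /* Serial version: every add depends on the previous one. */
    double sum_serial(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];          /* chain s -> s -> s ... limits ILP */
        return s;
    }

    /* Two independent accumulators halve the dependence chain,
     * so two additions can execute in parallel on separate ALUs. */
    double sum_ilp(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0;
        size_t i = 0;
        for (; i + 1 < n; i += 2) {
            s0 += a[i];         /* these two adds do not */
            s1 += a[i + 1];     /* depend on each other  */
        }
        if (i < n) s0 += a[i];  /* handle a leftover odd element */
        return s0 + s1;
    }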
Optimizing for Size
More common for embedded applications
Competes with power/speed optimizations
Limit code size to keep critical loops in memory
Choose the smaller form of an instruction (CISC)
Use short constants for jumps (a simpler form of addressing)
Increase instruction length for loop alignment
Optimizing for Memory
Useful for memory- and I/O-intensive applications
Consider proper alignment of data and instructions to reduce cache misses and improve the results of paging (see the traversal sketch after this list)
Use instructions for controlling the cache
Partially addresses the von Neumann bottleneck
Reading the lowest-level cache on a Pentium 4 takes 3 clocks; each higher level is an order of magnitude slower (roughly 10, then 100 clocks)
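A minimal illustration of cache-aware access, assuming a C-style row-major array (the function names are invented for this sketch): the inner loop should walk the rightmost index so consecutive accesses fall in the same cache line.

    #define N 1024
    static double m[N][N];

    double sum_column_major(void) {   /* cache-hostile: stride of N*8 bytes */
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    double sum_row_major(void) {      /* cache-friendly: sequential access */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }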
Alias Analysis
Determines whether there are multiple ways to access a single memory location
Knowing aliases helps identify optimizations by recognizing data dependencies and locating redundant code/data updates
Alias analysis is critical for global optimizations (reference parameters, globally defined data, pointers); a sketch follows
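A small C sketch of why aliasing matters (the function names are illustrative; restrict is the standard C99 qualifier): if the two pointers may alias, the compiler must assume every store can change every later load.

    #include <stddef.h>

    void scale(float *dst, const float *src, float k, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;      /* dst may alias src: reordering
                                         and vectorization are unsafe  */
    }

    void scale_restrict(float *restrict dst, const float *restrict src,
                        float k, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;      /* no alias: safe to vectorize
                                         and reorder freely            */
    }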
Control Flow Analysis
A precursor to critical-loop reductions and the replacement of inefficient code
Gathers information concerning the hierarchical flow of control
Identifies potential branches in program execution, useful for mitigating pipeline hazards
Data Flow Analysis
Procures information about how a procedure uses data
Builds on structures from control flow analysis
There are many ways to achieve the goal:
Reaching definitions – calculate the definitions that potentially reach a given point in the code (see the sketch after this list)
Iterative analysis – uses the control-flow graph
Structural analysis, etc.
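A minimal sketch of reaching definitions in action (illustrative code, not from the slides): because exactly one definition of x reaches each use, the compiler may propagate the constant.

    int f(int flag) {
        int x = 4;        /* definition D1 of x */
        int y;
        if (flag)
            y = x + 1;    /* only D1 reaches here -> folds to y = 5 */
        else
            y = x * 2;    /* only D1 reaches here -> folds to y = 8 */
        return y;
    }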
Dependence Analysis*
Recognizes relationships using a DAG:
True/flow dependence
Antidependence
Output dependence
Input dependence (does not affect execution order)
Used for instruction scheduling and data caching (see the sketch below)
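A hand-written sketch of the four dependence kinds on straight-line code (statement labels S1..S4 are illustrative):

    void deps(int a) {
        int x, y, z;
        x = a + 1;   /* S1 */
        y = x * 2;   /* S2: true (flow) dependence on S1 – reads the x
                            that S1 wrote                               */
        x = a - 3;   /* S3: antidependence on S2 – overwrites the x that
                            S2 read; output dependence on S1 – both
                            write x                                     */
        z = a + 1;   /* S4: input dependence on S1 – both read a; this
                            alone does not constrain execution order    */
        (void)x; (void)y; (void)z;
    }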
Interprocedural Analysis
Incorporates the analysis methods discussed earlier, but at a broader, whole-program level
OOD and high-level coding methodologies are optimal for human understanding, not computer processing
Includes analysis of relationships between function calls to mitigate the overhead of OO-style code (see the sketch below)
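As a rough sketch of interprocedural constant propagation (function names invented for illustration): analyzing caller and callee together reveals that a parameter is constant at every call site, enabling folding and inlining.

    static int scale(int x, int width) {
        return x * width;            /* width == 8 at every call site */
    }

    int layout(int x) {
        return scale(x, 8);          /* whole-program view: the call can
                                        be inlined and x * 8 strength-
                                        reduced to x << 3              */
    }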
Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Loop Optimizations*
Loop optimizations typically have the greatest impact on overall code performance
Desire to reduce dependencies to allow ILP
Desire to reduce the overhead of jumping and branching in the loop
Predictability – predicting loop behavior to mitigate pipeline hazards
Loops must be well behaved (see the unrolling sketch after this list):
Single return
No breaks, branches, etc.
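A hand-unrolled sketch of reducing loop branch overhead, assuming a well-behaved counted loop (names illustrative):

    #include <stddef.h>

    void add_k(int *a, int k, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {  /* one test-and-branch per
                                         four loop bodies          */
            a[i]     += k;
            a[i + 1] += k;
            a[i + 2] += k;
            a[i + 3] += k;
        }
        for (; i < n; i++)            /* remainder loop */
            a[i] += k;
    }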
Procedure Optimizations
Based on control flow
Desire to eliminate the overhead of context switches
Possibly turn function calls into branches (see the tail-call sketch below)
Optimizations occur at both high and low levels:
High level – procedure integration
Low level – inline expansion
Conventions:
Leaf routines (those that call no others) have reduced overhead
Shrink wrapping creates pseudo-leaves by adding data flow analysis
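A sketch of turning a call into a branch: a tail-recursive gcd (a standard textbook example, not from the slides) can be rewritten so the recursive call becomes a loop back-edge, eliminating call/return overhead entirely.

    unsigned gcd_rec(unsigned a, unsigned b) {
        if (b == 0) return a;
        return gcd_rec(b, a % b);   /* tail call: nothing runs after it */
    }

    unsigned gcd_loop(unsigned a, unsigned b) {
        while (b != 0) {            /* the call became a branch */
            unsigned t = a % b;
            a = b;
            b = t;
        }
        return a;
    }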
Code Scheduling*
Block scheduling
Blocks are optimized as independent pieces of code
Cross-block scheduling is applied to the optimized blocks
Branch scheduling
Fill stall cycles after a branch with independent code
Reduces the effect of bad branch predictions in the hardware pipeline
Software pipelining
Executes multiple iterations of a loop in overlapped fashion (see the sketch below)
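A source-level sketch of software pipelining (illustrative names): the load for iteration i+1 is issued while iteration i computes, overlapping memory latency with arithmetic.

    #include <stddef.h>

    void double_all(int *a, size_t n) {
        if (n == 0) return;
        int cur = a[0];                 /* prologue: first load */
        for (size_t i = 0; i + 1 < n; i++) {
            int next = a[i + 1];        /* load for the NEXT iteration  */
            a[i] = cur * 2;             /* compute for THIS iteration   */
            cur = next;
        }
        a[n - 1] = cur * 2;             /* epilogue: last compute */
    }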
Register Allocation
Applies to low-level assembly
Loops and nesting are used to weigh which values should be maintained in registers
Nested loops weigh more heavily
Considers variable activity before and after the block of code is accessed
Uses operation costs and the number of times the operations are performed
Register Allocation: Graph Coloring
Nodes are the subset of objects that should be allocated to registers
Arcs connect objects that are live at the same time
Arcs therefore represent conflicts where two objects cannot share a register (e.g., an int and a float)
Color the graph with a number of colors equal to the number of registers
Assign registers based on color (a greedy-coloring sketch follows)
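A toy greedy-coloring sketch in C, assuming four variables and three registers (the variables, interference graph, and spill policy are invented for illustration; real allocators use more sophisticated heuristics):

    #include <stdio.h>

    #define V 4   /* variables a, b, c, d */
    #define K 3   /* available registers r0..r2 */

    int main(void) {
        /* adjacency matrix: a-b, a-c, b-c, c-d interfere */
        int adj[V][V] = {
            {0, 1, 1, 0},
            {1, 0, 1, 0},
            {1, 1, 0, 1},
            {0, 0, 1, 0},
        };
        int color[V];

        for (int v = 0; v < V; v++) {
            int used[K] = {0};
            for (int u = 0; u < v; u++)   /* colors taken by neighbors */
                if (adj[v][u]) used[color[u]] = 1;
            color[v] = -1;
            for (int c = 0; c < K; c++)
                if (!used[c]) { color[v] = c; break; }
            if (color[v] < 0) {           /* no color left: spill */
                printf("%c: spill to memory\n", 'a' + v);
                color[v] = 0;             /* placeholder after spill */
            } else {
                printf("%c -> r%d\n", 'a' + v, color[v]);
            }
        }
        return 0;
    }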
Redundancy Elimination
Based on data flow analysis
An intermediate-level optimization
Includes:
Common subexpression elimination (see the sketch after this list)
Loop-invariant code motion
Partial redundancy elimination
Code hoisting
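A hand-applied sketch of common subexpression elimination combined with loop-invariant code motion (illustrative names):

    #include <stddef.h>

    void before(int *out, const int *a, int x, int y, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] * (x + y) + (x + y);  /* x + y computed twice,
                                                   on every iteration   */
    }

    void after(int *out, const int *a, int x, int y, size_t n) {
        int t = x + y;                  /* CSE, then hoisted out of loop */
        for (size_t i = 0; i < n; i++)
            out[i] = a[i] * t + t;
    }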
Peephole Optimizations
Focused on very small windows of code
Generally performed late in the compilation process
Arguably covers up bad and incomplete optimizations from earlier passes
Some examples include (see the sketch after this list):
Dead code elimination (of dead code created by earlier optimizations)
Strength reduction
Constant folding
Instruction combining
Copy propagation
Algebraic simplifications
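A source-level sketch of typical peephole rewrites (real peepholes operate on machine code; this C version is only illustrative):

    int before(int x) {
        int a = x * 8;        /* strength reduction: multiply -> shift    */
        int b = 4 * 16;       /* constant folding: computed at compile time */
        int c = x + 0;        /* algebraic simplification: additive identity */
        return a + b + c;
    }

    int after(int x) {
        int a = x << 3;       /* x * 8  */
        int b = 64;           /* 4 * 16 */
        int c = x;            /* x + 0  */
        return a + b + c;
    }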
Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Continuous Relevance of Compiler Development
The back ends of compilers for older languages are reworked to take advantage of advances in hardware
Pipelines are becoming longer
Multiple cores are now common, allowing more use of parallel instructions
Research Areas
Domain-specific subjects: security, reliability, parallel, distributed, embedded, mobile
Analysis, prediction, and debugging tools
Embedded JIT compilation
Development of a research compiler (GCC)
Enhancing compiler optimization times, specifically iterative and whole-program optimizations
MS F# – a functional language for .NET, in the style of ML
Compiler Job Options
Additional exploitation of parallel computing environments for desktop platforms
Multiple OS/environment support
Integration of AI techniques, such as machine learning, to know when, how, and where to apply optimizations (GCC)
Special-purpose languages for video, graphics, and audio processing (NVIDIA)
Special-purpose vendors for embedded products (Wind River, VxWorks)
Compiler Job Options (continued)
Library adaptation for reconfigurable processors (GCC)
Fault tolerance and exception handling for security