Phoenix: a framework for Code Generation, Optimization and Program Analysis


Andrew Pardoe, Phoenix team, [email protected]

What is Phoenix?
- Phoenix is Microsoft's next-generation, state-of-the-art infrastructure for program analysis and transformation
- We wanted to…
  - Develop an industry-leading compilation and tools framework
  - Foster a rich ecosystem for
    - Academic
    - Research
    - Industry
  - With an infrastructure that is robust, retargetable, extensible, configurable and scalable

Phoenix is built on C++/CLI and compiles either as managed or native code

Building a program with C++/CLI
- Microsoft C++ compiler
  - Input: program source code
  - Output: COFF object file
- COFF files are linked with system libraries into PEs
- Driver (CL) pipeline: C++ Source -> Frontend (C1) -> Backend (C2) -> Obj File

Roles of C1 (C1xx) and C2

C1 or C1xx:
- Preprocessing
- Tokenization
- Parsing
- Semantic processing
- CIL emission *
- Types and symbol debug info
- Metadata for managed code

C2:
- * CIL reading
- Program analysis
- Optimization
- Lowering to target
- COFF emission
- Source level debug info

Why we built Phoenix
- Code generation technology now appears in many different forms
  - Large-scale optimizers (PreJIT or C++'s LTCG)
  - Fast code generation (.NET's JIT, C++ debug mode, C#)
  - Custom code generators (fast conditional breakpoints, SQL expression optimizers)
- Code generators in Microsoft target many different computer architectures
  - PC platforms (x86, x64, IA64)
  - Game consoles (x86, PPC)
  - Handheld devices (ARM)

And another set of reasons…
- Microsoft builds sophisticated analysis tools
  - VS 2005's C++ compiler contains an /analyze switch to perform static analysis for code defects
  - The .NET coding guidelines are enforced by FxCop
  - We have tools for defect, security and race detection
- These tools are often developed in a manner that works for only one specific product. This limits…
  - Retargeting the tool for other applications
  - Ability to adopt the best-of-breed technology
  - Ability to move forward as technology changes

Why the rest of the world needs Phoenix
- Research
  - Research often spends too much time handling routine work instead of exploring the novel ideas that inspired the research
  - If research doesn't build on a world-class framework it often cannot handle real-world problems
- Industry
  - Much effort is spent on deciphering poorly documented formats and interfaces (Microsoft's CIL or PE file formats)
  - There is an inherent fragility in working without specifications or promises of future compatibility
  - Industry "mistakes" end up costing Microsoft as well
- Academic
  - Attempts to provide common infrastructures have had limited success in the past
  - By using Phoenix, educators can start with big problems and leave the routine work to us

Phoenix Infrastructure
- .Net CodeGen: Runtime JITs, Pre-JIT, OO and .Net optimizations
- Native CodeGen: Advanced C++/OO Optimizations, FP optimizations, OpenMP
- Retargetable: "Machine Models", ~3 months: -Od, ~3 months: -O2
- Chip Vendor CDK: ~6 month ports, Sample port + docs, Key ports (Xscale) done at msft
- Academic RDK: Full sources (future), Managed API's, IP as DLLs, Docs
- MSR & Partner Tools: Built on Phoenix API's, Both HL and LL API's, Managed API's, Program Analysis, Program Rewrite
- MSR Adv Lang: Language Research, Direct xfer to Phoenix, Research Insulated from code generation
- AST Tools: Static Analysis Tools, Next Gen Front-Ends, R/W Global Program Views

Key features of Phoenix
- Written in C++ but usable by any .NET language
  - Samples provided in C# and C++/CLI
- Phase and Plug-In model for third-party extensions to:
  - C++ compiler backend, JIT/PreJIT
  - Static analysis tools, binary analysis and manipulation
  - Plug-Ins and extensions to the Phoenix architecture
- Single, strongly-typed, explicit dataflow/control flow IR used throughout all phases of the framework
- IR and Type system are capable of processing native and managed code
- Strong inter-phase consistency checking

[Figure: Phoenix overview diagram. Recoverable labels:
- Phoenix Core: AST, IR, Syms, Types, CFG, Graph, SSA, Dataflow, Alias, EH, Readers, Writers
- Phx APIs
- Compilers: CLR JIT, CLR PreJITer, VC++ BE, each with HL Opts, LL Opts and Code Gen stages
- Inputs and front ends: C#, VB, C++, Delphi, Cobol, Eiffel, Tiger, Lex/Yacc, C++ IL, .NET assembly, Native Image, C++ AST, Phx AST
- Tools: Xlator, Formatter, Browser, Profiler, Obfuscator, Visualizer, Security Checker, Refactor, Lint, PreFast, Profile]

The Phoenix Building Blocks
- Core Structures And Utilities
- High Level Optimizations
- Low Level Optimizations
- Machine Abstractions
- Dynamic Tools
- Locality opts
- Static Tools
- Analysis

Phoenix Architecture
- Core set of extensible classes to represent
  - IR (intermediate representation of code stream)
  - Symbols, Types, Function units, Basic blocks, Graphs, Trees, Aliasing information
- Layered set of analysis and transformation components
  - Data flow analysis, Loop analysis, Alias analysis, Dead code removal, Redundant code detection
  - Global optimizations built on reusable analysis lattices
- Common input/output library for binary formats
  - PE, LIB, OBJ, CIL, MSIL, PDB
  - Phoenix both reads and writes binary formats

Simple example

    #include <stdio.h>

    void main (int argc, char** argv)
    {
        char * message;

        if (argc > 1)
            message = "Hello, world!\n";
        else
            message = "Goodbye, world!\n";

        printf (message);
    }

Resulting Phoenix IR

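View inside a Phoenix-based C2

[Figure: SOURCE -> C1 -> CIL -> C2 -> OBJECT. Inside C2 the IR moves through the states AST, HIR, MIR, LIR, EIR. Phases shown, in order:
- CIL Reader, Type Checker
- MIR Lower, SSA Const, SSA Dest, Canon, Addr Modes
- Lower, Reg Alloc, EH Lower, Stack Alloc, Frame Gen, Switch Lower, Block Layout, Flow Opts
- Encode, Lister]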

Types of IR

High-level IR: Architecture and runtime independent. Object model instructions, array indices, full aliasing

Mid-level IR: Architecture independent, runtime dependent. Lowered to calls and address arithmetic

Low-level IR: Architecture and runtime dependent. Lowered to machine instructions

Encoded IR: Binary format. Lowered to encoded data instructions

IRs contain Instructions and Operands of various types at each IR level

IR states during compilation

Phases transform IR either within a state or from one state to a contiguous state

For example, lower phase transforms MIR into LIR. Optimizations usually work within a single phase.

[Figure: IR states from abstract to concrete: AST, HIR, MIR, LIR, EIR. Lowering moves toward the concrete end; raising moves toward the abstract end.]
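The rule that a phase transforms IR either within a state or to a contiguous state can be illustrated with a small standalone sketch. This is plain C++ and not the Phoenix API: the names IrState, PhaseSketch, isContiguous, isLowering and checkPipeline are all hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// The five IR states from the slide, ordered from abstract to concrete.
enum class IrState { AST = 0, HIR, MIR, LIR, EIR };

// Hypothetical phase descriptor (not a Phoenix type): the state a phase
// consumes and the state it produces.
struct PhaseSketch {
    std::string name;
    IrState from;
    IrState to;
};

// A phase is well formed if it stays in one state or moves to an adjacent one.
bool isContiguous(const PhaseSketch& phase) {
    int step = static_cast<int>(phase.to) - static_cast<int>(phase.from);
    return std::abs(step) <= 1;
}

// Lowering moves toward the concrete end of the list (for example MIR to LIR).
bool isLowering(const PhaseSketch& phase) {
    return static_cast<int>(phase.to) > static_cast<int>(phase.from);
}

// Checks a phase list: each phase must be contiguous and must start in the
// state the previous phase left. On success, stores the final state in `end`.
bool checkPipeline(const std::vector<PhaseSketch>& phases, IrState start,
                   IrState& end) {
    IrState current = start;
    for (const PhaseSketch& phase : phases) {
        if (phase.from != current || !isContiguous(phase)) {
            return false;
        }
        current = phase.to;
    }
    end = current;
    return true;
}
```

Under these assumptions, an optimization that stays in MIR followed by a lower phase from MIR to LIR passes the check, while a phase that jumps from HIR straight to LIR is rejected.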

Extending a Phoenix-based compiler
- The VC++ optimizer is just a Phoenix client
- All Phoenix clients can host Plug-Ins
- Plug-Ins can
  - Add new components
  - Extend existing components
  - Reconfigure clients
- Extensibility relies upon
  - Reflection
  - Events and delegates

Component extensibility

Most objects in the system support observers by deriving from the Phoenix class Extensible Object

Observer classes can register delegates so that they are notified when the host object undergoes certain events. For example, if the host object is copied, it will notify registered delegates

Phoenix provides a standard plug-in discovery and registration mechanism

Plug-ins can reconfigure the client, such as replacing the register allocator

Plug-ins can also use Phoenix’s analyses to do their own analyses and transformations
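The observer idea described above (a host object notifying registered delegates when it is copied) can be sketched in standard C++ with std::function. This is only an illustration of the pattern, not the Phoenix extensible-object API: HostSketch, registerCopyDelegate, setNote, note and copy are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an extensible host object (not a Phoenix class).
class HostSketch {
public:
    using CopyDelegate =
        std::function<void(const HostSketch& original, HostSketch& copy)>;

    // Register a delegate that runs whenever this host is copied.
    void registerCopyDelegate(CopyDelegate delegate) {
        copyDelegates_.push_back(std::move(delegate));
    }

    // Attach a named note to the host; plays the role of an extension object.
    void setNote(const std::string& key, const std::string& value) {
        notes_[key] = value;
    }

    // Look up a note; returns an empty string when the note is absent.
    std::string note(const std::string& key) const {
        auto it = notes_.find(key);
        return it == notes_.end() ? std::string() : it->second;
    }

    // Copy the host. Notes are not copied automatically: each registered
    // delegate is notified and decides what to carry over to the copy.
    HostSketch copy() const {
        HostSketch result;
        result.copyDelegates_ = copyDelegates_;
        for (const CopyDelegate& delegate : copyDelegates_) {
            delegate(*this, result);
        }
        return result;
    }

private:
    std::vector<CopyDelegate> copyDelegates_;
    std::map<std::string, std::string> notes_;
};
```

In this sketch a plug-in that wants its note to survive a copy registers a delegate that reads the note from the original and writes it to the copy; hosts with no registered delegate produce copies without notes.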

Extensibility example – birth tracking

    // Called from Instruction ctor
    void
    PlugIn::NewInstructionEventHandler(
        Phx::IR::Instruction ^ instruction)
    {
        InstructionBirthExtensionObject ^ extensionObject
            = gcnew InstructionBirthExtensionObject();

        extensionObject->BirthPhase = instruction->FunctionUnit->Phase;

        instruction->AddExtensionObject(extensionObject);
    }

    // Called from Instruction dtor
    void
    PlugIn::DeleteInstructionEventHandler(
        Phx::IR::Instruction ^ instruction)
    {
        InstructionBirthExtensionObject ^ extensionObject
            = InstructionBirthExtensionObject::Get(instruction);

        instruction->RemoveExtensionObject(extensionObject);
    }

    // Attach a note to each instruction with the birth
    // phase for reference later
    public ref class InstructionBirthExtensionObject :
        public Phx::IR::InstructionExtensionObject
    {
    public:
        property Phx::Phases::Phase ^ BirthPhase;

        property System::String ^ BirthPhaseText
        {
            System::String ^ get ()
            {
                if (BirthPhase != nullptr)
                {
                    return BirthPhase->NameString;
                }
                return "";
            }
        }
    };

Plug-In VS Integration

Plug-Ins can be created via Visual Studio Wizards

RDK is downloadable and works with free VS Express Editions (though you probably want the VS Team System Edition for your work :)

Example: Uninitialized local detection

We would like to warn the user that 'x' is not initialized before use

To do this we need to perform dataflow analysis

We'll use a plug-in to add this phase to the existing Phoenix-based C2

    int foo()
    {
        int x;
        return x;
    }

May and Must examples

message may be used before it is defined:

    void main(…)
    {
        char * message;
        if (…)
            message = "Hello";
        printf(message);
    }

message must be used before it is defined:

    void main(…)
    {
        char * message;
        char * other;
        if (…)
            other = "Hello";
        printf(message);
    }

IR for detecting uninitialized locals

Detecting an uninitialized use

For each local variable v:
- Examine all paths from the entry of the method to each use of v
- If on every path v is not initialized before the use, v must be used before it is defined
- If there is some path where v is not initialized before the use, v may be used before it is defined

Classic solution is to build a control flow graph and solve the data flow problem.

State is “unknown” at the start of each block. Transfer states between blocks and combine them as you traverse the control flow graph
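The may/must rules above can be tried out on a toy control flow graph. The following is a minimal standalone sketch in standard C++ for a single variable v, not Phoenix code: BlockSketch, Verdict, analyzeVariable and the bit-mask encoding of the state are assumptions made for this illustration. The state at a point is the set of facts seen on incoming paths (undefined, defined), "unknown" is the empty set, and states are combined with a union.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block summary for one variable v (not a Phoenix type).
struct BlockSketch {
    std::vector<int> predecessors;  // indices of predecessor blocks
    bool usesVariable;              // block reads v before any write of its own
    bool definesVariable;           // block writes v
};

enum class Verdict { Safe, MayBeUninitialized, MustBeUninitialized };

// State bits: which facts reach a point along at least one path.
const unsigned kUnknown = 0;    // no path information yet
const unsigned kUndefined = 1;  // some path reaches here with v undefined
const unsigned kDefined = 2;    // some path reaches here with v defined

// Classic iterative dataflow; block 0 is the method entry, where v is undefined.
std::vector<Verdict> analyzeVariable(const std::vector<BlockSketch>& blocks) {
    const std::size_t count = blocks.size();
    std::vector<unsigned> in(count, kUnknown), out(count, kUnknown);

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 0; b < count; ++b) {
            // Combine predecessor output states (union of facts).
            unsigned state = (b == 0) ? kUndefined : kUnknown;
            for (int p : blocks[b].predecessors) {
                state |= out[p];
            }
            in[b] = state;

            // A definition replaces whatever came in.
            unsigned newOut = blocks[b].definesVariable ? kDefined : state;
            if (newOut != out[b]) {
                out[b] = newOut;
                changed = true;
            }
        }
    }

    std::vector<Verdict> verdicts(count, Verdict::Safe);
    for (std::size_t b = 0; b < count; ++b) {
        if (!blocks[b].usesVariable) continue;
        if (in[b] == kUndefined) {
            verdicts[b] = Verdict::MustBeUninitialized;  // undefined on every path
        } else if (in[b] & kUndefined) {
            verdicts[b] = Verdict::MayBeUninitialized;   // undefined on some path
        }
    }
    return verdicts;
}
```

With this encoding, a use that follows an if whose single branch assigns v is reported as "may", and a use with no assignment on any incoming path is reported as "must", matching the two examples shown earlier.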

Code sketch using classic dataflow

    // STATE, meet, destination and equals are helpers that this
    // sketch leaves undefined.
    bool changed = true;

    while (changed)
    {
        // Assume convergence until some block's output state changes
        changed = false;

        for each (Phx::Graphs::BasicBlock ^ block in function)
        {
            // Update input state
            STATE ^ inState = inStates[block];

            for each (Phx::Graphs::BasicBlock ^ predecessorBlock in block->Predecessors)
            {
                STATE ^ predecessorState = outStates[predecessorBlock];
                inState = meet(inState, predecessorState);
            }

            inStates[block] = inState;

            // Compute output state
            STATE ^ newOutState = gcnew STATE(inState);

            for each (Phx::IR::Instruction ^ instruction in block->Instructions)
            {
                for each (Phx::IR::Operand ^ operand in instruction->DestinationOperands)
                {
                    Phx::Symbols::LocalVariableSymbol ^ localSymbol =
                        operand->Symbol->AsLocalVariableSymbol;

                    newOutState[localSymbol] = destination(newOutState[localSymbol]);
                }
            }

            // Check for convergence
            STATE ^ outState = outStates[block];
            bool blockChanged = ! equals(newOutState, outState);

            if (blockChanged)
            {
                changed = true;
                outStates[block] = newOutState;
            }
        }
    }

Can we make this easier?

Dataflow solution computes the state for the entire graph, even at places where v is never referenced

An alternate model is known as “Static Single Assignment” form, or SSA. It directly connects definitions and uses.

Phoenix uses SSA and builds flow graphs when necessary

We can rewrite this code letting Phoenix do most of the routine work

Code sketch using Phoenix

    for each (Phx::IR::Operand ^ destinationOperand in
        Phx::IR::Operand::IteratorDestinations(firstInstruction))
    {
        if (destinationOperand->IsMemoryModificationReference)
        {
            // Follow the SSA def-use edges from the entry definition
            for each (Phx::IR::Operand ^ useOperand in
                Phx::IR::Operand::IteratorUse(destinationOperand))
            {
                // A direct (non-Phi) use of the entry definition is a "must" case
                if (useOperand->Instruction->Opcode != Phx::Common::Opcode::Phi
                    && useOperand->IsVariableOpnd)
                {
                    Phx::Symbols::Symbol ^ symbolUse =
                        useOperand->AsVariableOpnd->Symbol;

                    if (symbolUse != nullptr && !mustList.Contains(symbolUse))
                    {
                        mustList.Add(symbolUse);
                    }
                }
            }
        }
    }
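To show why SSA makes this check short, here is a minimal standalone sketch in standard C++ that walks def-use edges in a toy SSA graph. It is not the Phoenix API, and it simplifies the problem: SsaUse, SsaDef, Findings and findUninitializedUses are hypothetical names, direct non-phi uses of the entry definition are reported as "must", and uses reached only through phi results are reported as "may" (a phi whose inputs are all undefined would also land in "may" here).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical SSA use record (not a Phoenix type).
struct SsaUse {
    std::string symbol;  // variable read by the using instruction
    bool isPhi;          // true when the using instruction is a phi
    int phiResult;       // index of the definition the phi produces, else -1
};

// Hypothetical SSA definition: just the list of its uses.
struct SsaDef {
    std::vector<SsaUse> uses;
};

struct Findings {
    std::set<std::string> must;  // read directly from the entry definition
    std::set<std::string> may;   // read only through one or more phis
};

// Walks def-use edges starting at the entry ("undefined") definition.
Findings findUninitializedUses(const std::vector<SsaDef>& defs, int entryDef) {
    Findings findings;
    std::vector<int> worklist;
    std::set<int> visited;

    // Direct uses of the entry definition.
    for (const SsaUse& use : defs[entryDef].uses) {
        if (use.isPhi) {
            if (use.phiResult >= 0) worklist.push_back(use.phiResult);
        } else {
            findings.must.insert(use.symbol);
        }
    }

    // Uses of phi results that merge the entry definition with other values.
    while (!worklist.empty()) {
        int def = worklist.back();
        worklist.pop_back();
        if (!visited.insert(def).second) continue;  // already processed
        for (const SsaUse& use : defs[def].uses) {
            if (use.isPhi) {
                if (use.phiResult >= 0) worklist.push_back(use.phiResult);
            } else if (findings.must.count(use.symbol) == 0) {
                findings.may.insert(use.symbol);
            }
        }
    }
    return findings;
}
```

Only definitions reachable from the entry definition are visited, so places where the variable is never referenced cost nothing, which is the advantage the slides attribute to SSA over solving the dataflow problem for the whole graph.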

Uninitialized local plug-in

Plug-in is loaded at runtime by Phoenix-based C2

[Figure: UninitializedLocal.cpp -> C++/CLI -> UninitializedLocal.dll; Test.cpp -> C1 -> Phx-C2 (with the plug-in loaded) -> Test.obj]

Phoenix C2 with our plug-in added
- This complete plug-in is provided as a sample in the Research Development Kit
- It is only ~400 lines of code to add a key warning to the C2 compiler
- Other types of checking can be added just as easily
- A demonstration of the warnings being emitted:

Phoenix PE Reading
- Phoenix can read and write PE files directly
  - You can implement your own compiler or linker
  - You can create post-link tools for analysis, instrumentation or optimization
- Binaries can be read in, raised into IR, changed and rewritten as new, working binaries
- Phoenix Explorer is only ~800 lines of code on top of the Phoenix binary reading-writing library
- Phoenix Explorer is like ILDasm, but for Phoenix IR

Binary rewriting with Phoenix
- mtrace utility injects tracing code into managed applications
- You don't need the source code to do this (you do need the PDB)
- mtrace shows functions being entered and exited

How do I get Phoenix?
- Early access RDKs are available to selected universities
  - Sample projects include aspect oriented programming, code obfuscation, profiling
  - Contact [email protected] for Academic early access requests
- Early access CDK is available to selected industry partners
  - Contact [email protected] for commercial early access requests
- Phoenix RDK/CDKs release about every 6 months
- Phoenix will be the next MS compiler backend
  - We build the next-generation Windows every night

More information

http://research.microsoft.com/phoenix