Phoenix: a framework for Code Generation, Optimization and Program Analysis
Andrew Pardoe, Phoenix team
[email protected]
What is Phoenix?
  Phoenix is Microsoft’s next-generation, state-of-the-art infrastructure for program analysis and transformation
We wanted to…
  Develop an industry-leading compilation and tools framework
  Foster a rich ecosystem for
    Academic
    Research
    Industry
  With an infrastructure that is robust, retargetable, extensible, configurable and scalable
  Phoenix is built on C++/CLI and compiles either as managed or native code
Building a program with C++/CLI
  Microsoft C++ compiler
    Input: program source code
    Output: COFF object file
  COFF files are linked with system libraries into PEs
  [Diagram: Driver (CL): C++ Source -> Frontend (C1) -> Backend (C2) -> Obj File]
Roles of C1 (C1xx) and C2
  C1 or C1xx
    Preprocessing
    Tokenization
    Parsing
    Semantic processing
    CIL emission *
    Types and symbol debug info
    Metadata for managed code
  C2
    * CIL reading
    Program analysis
    Optimization
    Lowering to target
    COFF emission
    Source level debug info
Why we built Phoenix
  Code generation technology now appears in many different forms
    Large-scale optimizers (PreJIT or C++’s LTCG)
    Fast code generation (.NET’s JIT, C++ debug mode, C#)
    Custom code generators (fast conditional breakpoints, SQL expression optimizers)
  Code generators in Microsoft target many different computer architectures
    PC platforms (x86, x64, IA64)
    Game consoles (x86, PPC)
    Handheld devices (ARM)
And another set of reasons…
  Microsoft builds sophisticated analysis tools
    VS 2005’s C++ compiler contains an /analyze switch to perform static analysis for code defects
    The .NET coding guidelines are enforced by FxCop
    We have tools for defect, security and race detection
  These tools are often developed in a manner that works for one specific product. This limits…
    Retargeting the tool for other applications
    Ability to adopt the best-of-breed technology
    Ability to move forward as technology changes
Why the rest of the world needs Phoenix
  Research
    Research often spends too much time handling routine work instead of exploring the novel ideas that inspired the research
    If research doesn’t build on a world-class framework it often cannot handle real-world problems
  Industry
    Much effort is spent on deciphering poorly documented formats and interfaces (Microsoft’s CIL or PE file formats)
    There is an inherent fragility in working without specifications or promises of future compatibility
    Industry “mistakes” end up costing Microsoft as well
  Academic
    Attempts to provide common infrastructures have had limited success in the past
    By using Phoenix, educators can start with big problems and leave the routine work to us
Phoenix Infrastructure
  .NET CodeGen: Runtime JITs, Pre-JIT, OO and .NET optimizations
  Native CodeGen: Advanced C++/OO optimizations, FP optimizations, OpenMP
  Retargetable: “Machine Models”, ~3 months: -Od, ~3 months: -O2
  Chip Vendor CDK: ~6 month ports, Sample port + docs, Key ports (Xscale) done at Microsoft
  Academic RDK: Full sources (future), Managed APIs, IP as DLLs, Docs
  MSR & Partner Tools: Built on Phoenix APIs, Both HL and LL APIs, Managed APIs, Program Analysis, Program Rewrite
  MSR Adv Lang: Language Research, Direct xfer to Phoenix, Research insulated from code generation
  AST Tools: Static Analysis Tools, Next Gen Front-Ends, R/W Global Program Views
Key features of Phoenix
  Written in C++ but usable by any .NET language
    Samples provided in C# and C++/CLI
  Phase and Plug-In model for third-party extensions to:
    C++ compiler backend, JIT/PreJIT
    Static analysis tools, binary analysis and manipulation
    Plug-Ins and extensions to the Phoenix architecture
  Single, strongly-typed, explicit dataflow/control flow IR used throughout all phases of the framework
  IR and Type system are capable of processing native and managed code
  Strong inter-phase consistency checking
[Diagram: Phoenix Core and its clients]
  Phoenix Core: AST, IR, Syms, Types, CFGraph, SSA, Dataflow, Alias, EH, Readers, Writers
  Phx APIs: the interface between the core and its clients
  Compilers: CLR JIT, CLR Pre-JITer, VC++ BE
  Compiler blocks: HL Opts, LL Opts, Code Gen
  Languages and inputs: C#, VB, C++, Delphi, Cobol, Eiffel, Tiger, C++ IL, .NET assembly, NativeImage, C++ AST, Phx AST, Lex/Yacc, PreFast, Profile
  Tools: Xlator, Formatter, Browser, Profiler, Obfuscator, Visualizer, SecurityChecker, Refactor, Lint
The Phoenix Building Blocks
  [Diagram: Core Structures and Utilities, High Level Optimizations, Low Level Optimizations, Machine Abstractions, Dynamic Tools, Locality opts, Static Tools, Analysis]
Phoenix Architecture
  Core set of extensible classes to represent
    IR (intermediate representation of code stream)
    Symbols, Types, Function units, Basic blocks, Graphs, Trees, Aliasing information
  Layered set of analysis and transformation components
    Data flow analysis, Loop analysis, Alias analysis, Dead code removal, Redundant code detection
    Global optimizations built on reusable analysis lattices
  Common input/output library for binary formats
    PE, LIB, OBJ, CIL, MSIL, PDB
    Phoenix both reads and writes binary formats
Simple example
#include <stdio.h>
int main(int argc, char** argv)
{
    const char* message;
    if (argc > 1)
        message = "Hello, world!\n";
    else
        message = "Goodbye, world!\n";
    printf(message);
    return 0;
}
View inside a Phoenix-based C2
  [Diagram: SOURCE -> C1 -> CIL -> C2 -> OBJECT; inside C2 the IR passes through AST, HIR, MIR, LIR, EIR]
  Phases shown: CIL Reader, Type Checker, MIR Lower, SSA Const, SSA Dest, Canon, Addr Modes, Lower, Reg Alloc, EH Lower, Stack Alloc, Frame Gen, Switch Lower, Block Layout, Flow Opts, Encode, Lister
Types of IR
  High-level IR: Architecture and runtime independent. Object model instructions, array indices, full aliasing
  Mid-level IR: Architecture independent, runtime dependent. Lowered to calls and address arithmetic
  Low-level IR: Architecture and runtime dependent. Lowered to machine instructions
  Encoded IR: Binary format. Lowered to encoded data instructions
  IRs contain Instructions and Operands of various types at each IR level
IR states during compilation
  Phases transform IR either within a state or from one state to a contiguous state
  For example, the lower phase transforms MIR into LIR. Optimizations usually work within a single state.
  [Diagram: AST, HIR, MIR, LIR, EIR ordered from abstract to concrete; lowering moves toward concrete, raising toward abstract]
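The contiguity rule for phases can be pictured with a small model. This is only a sketch in standard C++: the names IrState, Phase, lower, is_contiguous and is_valid_pipeline are hypothetical and are not part of the Phoenix API. It assumes the five states are ordered from most abstract (AST) to most concrete (EIR).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical model of the IR states named on the slide, ordered from
// most abstract to most concrete. This is not the Phoenix API.
enum class IrState { AST = 0, HIR = 1, MIR = 2, LIR = 3, EIR = 4 };

// A phase consumes IR in one state and produces IR in another (or the same).
struct Phase {
    std::string name;
    IrState from;
    IrState to;
};

// Returns the next more concrete state; EIR is already the most concrete.
IrState lower(IrState s) {
    assert(s != IrState::EIR && "EIR cannot be lowered further");
    return static_cast<IrState>(static_cast<int>(s) + 1);
}

// A phase is well formed if it stays within a state or moves to an
// adjacent one (lowering: one step down, raising: one step up).
bool is_contiguous(const Phase& p) {
    int delta = static_cast<int>(p.to) - static_cast<int>(p.from);
    return std::abs(delta) <= 1;
}

// A pipeline is valid if every phase is contiguous and each phase starts
// in the state where the previous phase ended.
bool is_valid_pipeline(const std::vector<Phase>& phases, IrState start) {
    IrState current = start;
    for (const Phase& p : phases) {
        if (p.from != current || !is_contiguous(p)) {
            return false;
        }
        current = p.to;
    }
    return true;
}
```

Under this model a lowering phase (MIR to LIR) and an optimization that stays in MIR both pass the check, while a phase that jumps from HIR straight to LIR is rejected.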
Extending a Phoenix-based compiler
  The VC++ optimizer is just a Phoenix client
  All Phoenix clients can host Plug-Ins
  Plug-Ins can
    Add new components
    Extend existing components
    Reconfigure clients
  Extensibility relies upon
    Reflection
    Events and delegates
Component extensibility
  Most objects in the system support observers by deriving from the Phoenix class Extensible Object
  Observer classes can register delegates so that they are notified when the host object undergoes certain events. For example, if the host object is copied it will notify registered delegates
  Phoenix provides a standard plug-in discovery and registration mechanism
  Plug-ins can reconfigure the client, such as replacing the register allocator
  Plug-ins can also use Phoenix’s analyses to do their own analyses and transformations
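The observer idea can be illustrated outside of Phoenix. The following is a minimal sketch in standard C++, with std::function standing in for .NET delegates. ExtensibleHost, on_copy and copy are hypothetical names chosen for this illustration; they are not the Phoenix Extensible Object interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an extensible host object; not a Phoenix class.
// Observers register callbacks ("delegates") and are notified when the
// host object is copied.
class ExtensibleHost {
public:
    using CopyHandler =
        std::function<void(const ExtensibleHost& original, ExtensibleHost& copy)>;

    explicit ExtensibleHost(std::string name) : name_(std::move(name)) {}

    // Register a delegate to be called whenever this object is copied.
    void on_copy(CopyHandler handler) {
        assert(handler && "handler must be callable");
        copy_handlers_.push_back(std::move(handler));
    }

    // Copy the host, carry the registered delegates over to the copy,
    // then notify each delegate about the copy event.
    ExtensibleHost copy() const {
        ExtensibleHost result(name_);
        result.copy_handlers_ = copy_handlers_;
        for (const CopyHandler& handler : copy_handlers_) {
            handler(*this, result);
        }
        return result;
    }

    const std::string& name() const { return name_; }
    void rename(std::string name) { name_ = std::move(name); }

private:
    std::string name_;
    std::vector<CopyHandler> copy_handlers_;
};
```

A plug-in in this model would call on_copy once and from then on see every copy of the host, which is roughly how an extension object can follow an instruction through cloning.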
Extensibility example – birth tracking
// Called from Instruction ctor
void PlugIn::NewInstructionEventHandler(Phx::IR::Instruction ^ instruction)
{
    InstructionBirthExtensionObject ^ extensionObject =
        gcnew InstructionBirthExtensionObject();
    extensionObject->BirthPhase = instruction->FunctionUnit->Phase;
    instruction->AddExtensionObject(extensionObject);
}
// Called from Instruction dtor
void PlugIn::DeleteInstructionEventHandler(Phx::IR::Instruction ^ instruction)
{
    InstructionBirthExtensionObject ^ extensionObject =
        InstructionBirthExtensionObject::Get(instruction);
    instruction->RemoveExtensionObject(extensionObject);
}
// Attach a note to each instruction with the birth
// phase for reference later
public ref class InstructionBirthExtensionObject : public Phx::IR::InstructionExtensionObject
{
public:
    property Phx::Phases::Phase ^ BirthPhase;
    property System::String ^ BirthPhaseText
    {
        System::String ^ get ()
        {
            if (BirthPhase != nullptr) { return BirthPhase->NameString; }
            return "";
        }
    }
};
Plug-In VS Integration
  Plug-Ins can be created via Visual Studio Wizards
  RDK is downloadable and works with free VS Express Editions (though you probably want the VS Team System Edition for your work :)
Example: Uninitialized local detection
  We would like to warn the user that ‘x’ is not initialized before use
  To do this we need to perform dataflow analysis
  We’ll use a plug-in to add this phase to the existing Phoenix-based C2
int foo()
{
    int x;
    return x;
}
May and Must examples
  message may be used before it is defined:
void main(…)
{
    char * message;
    if (…)
        message = "Hello";
    printf(message);
}
  message must be used before it is defined:
void main(…)
{
    char * message;
    char * other;
    if (…)
        other = "Hello";
    printf(message);
}
Detecting an uninitialized use
  For each local variable v
    Examine all paths from the entry of the method to each use of v
    If on every path v is not initialized before the use, v must be used before it is defined
    If there is some path where v is not initialized before the use, v may be used before it is defined
  Classic solution is to build a control flow graph and solve the data flow problem.
    State is “unknown” at the start of each block. Transfer states between blocks and combine them as you traverse the control flow graph
Code sketch using classic dataflow
bool changed = true;
while (changed)
{
    // Assume convergence until some block's output state changes
    changed = false;
    for each (Phx::Graphs::BasicBlock ^ block in function)
    {
        // Update input state
        STATE ^ inState = inStates[block];
        for each (Phx::Graphs::BasicBlock ^ predecessorBlock in block->Predecessors)
        {
            STATE ^ predecessorState = outStates[predecessorBlock];
            inState = meet(inState, predecessorState);
        }
        inStates[block] = inState;
        // Compute output state
        STATE ^ newOutState = gcnew STATE(inState);
        for each (Phx::IR::Instruction ^ instruction in block->Instructions)
        {
            for each (Phx::IR::Operand ^ operand in instruction->DestinationOperands)
            {
                Phx::Symbols::LocalVariableSymbol ^ localSymbol =
                    operand->Symbol->AsLocalVariableSymbol;
                newOutState[localSymbol] = destination(newOutState[localSymbol]);
            }
        }
        // Check for convergence
        STATE ^ outState = outStates[block];
        bool blockChanged = ! equals(newOutState, outState);
        if (blockChanged)
        {
            changed = true;
            outStates[block] = newOutState;
        }
    }
}
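The sketch above depends on Phoenix types. The same fixed-point iteration can be tried on a toy control flow graph in standard C++. Everything here is an assumption made for illustration: a single tracked variable, a four-value lattice (Unknown, No, Yes, Maybe), block 0 as the entry block, and the names Init, Block, meet and solve, none of which come from Phoenix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Initialization state of one local variable at a program point.
// Unknown is the optimistic starting value; Maybe means the paths disagree.
enum class Init { Unknown, No, Yes, Maybe };

// Combine the states arriving from two predecessors.
Init meet(Init a, Init b) {
    if (a == Init::Unknown) return b;
    if (b == Init::Unknown) return a;
    return a == b ? a : Init::Maybe;
}

// Toy basic block: indices of predecessor blocks and whether the block
// assigns the tracked variable.
struct Block {
    std::vector<std::size_t> predecessors;
    bool defines = false;
};

// Returns the state of the variable at the entry of each block.
// Block 0 is taken to be the function entry, where the local is uninitialized.
std::vector<Init> solve(const std::vector<Block>& cfg) {
    assert(!cfg.empty() && cfg[0].predecessors.empty());
    std::vector<Init> in(cfg.size(), Init::Unknown);
    std::vector<Init> out(cfg.size(), Init::Unknown);
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t id = 0; id < cfg.size(); ++id) {
            // Update input state
            Init state = (id == 0) ? Init::No : Init::Unknown;
            for (std::size_t pred : cfg[id].predecessors) {
                assert(pred < cfg.size());
                state = meet(state, out[pred]);
            }
            in[id] = state;
            // Compute output state
            Init newOut = cfg[id].defines ? Init::Yes : state;
            // Check for convergence
            if (newOut != out[id]) {
                out[id] = newOut;
                changed = true;
            }
        }
    }
    return in;
}
```

For a use at the top of a block, an entry state of No corresponds to "must be used before it is defined" and Maybe corresponds to "may be used before it is defined".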
Can we make this easier?
  Dataflow solution computes the state for the entire graph, even at places where v is never referenced
  An alternate model is known as “Static Single Assignment” form, or SSA. It directly connects definitions and uses.
  Phoenix uses SSA and builds flow graphs when necessary
  We can rewrite this code letting Phoenix do most of the routine work
Code sketch using Phoenix
for each (Phx::IR::Operand ^ destinationOperand in
          Phx::IR::Operand::IteratorDestinations(firstInstruction))
{
    if (destinationOperand->IsMemoryModificationReference)
    {
        for each (Phx::IR::Operand ^ useOperand in
                  Phx::IR::Operand::IteratorUse(destinationOperand))
        {
            if (useOperand->Instruction->Opcode != Phx::Common::Opcode::Phi
                && useOperand->IsVariableOpnd)
            {
                Phx::Symbols::Symbol ^ symbolUse =
                    useOperand->AsVariableOpnd->Symbol;
                if (symbolUse != nullptr && !mustList.Contains(symbolUse))
                {
                    mustList.Add(symbolUse);
                }
            }
        }
    }
}
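To see why SSA makes this short, here is a toy model in standard C++ that follows definition-to-use links instead of iterating over the flow graph. Def, Use, Report and find_uninitialized are hypothetical names, not Phoenix types. Two assumptions go beyond the sketch above: every local has a placeholder "undefined at entry" definition, and a use that is reached only through a phi is reported as a "may" case.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One use of an SSA definition. If the using instruction is a phi,
// phi_def is the index of the definition that the phi produces.
struct Use {
    bool is_phi;
    std::size_t phi_def;
};

// One SSA definition of a local variable.
struct Def {
    std::string symbol;
    bool is_entry_undefined = false;  // placeholder "undefined at entry" def
    std::vector<Use> uses;
};

struct Report {
    std::set<std::string> must;  // used before defined on every path
    std::set<std::string> may;   // used before defined on some path
};

Report find_uninitialized(const std::vector<Def>& defs) {
    Report report;
    for (std::size_t i = 0; i < defs.size(); ++i) {
        if (!defs[i].is_entry_undefined) continue;
        std::vector<bool> visited(defs.size(), false);
        std::vector<std::size_t> worklist(1, i);
        visited[i] = true;
        while (!worklist.empty()) {
            std::size_t d = worklist.back();
            worklist.pop_back();
            for (const Use& use : defs[d].uses) {
                if (use.is_phi) {
                    // Follow the phi to the merged definition.
                    assert(use.phi_def < defs.size());
                    if (!visited[use.phi_def]) {
                        visited[use.phi_def] = true;
                        worklist.push_back(use.phi_def);
                    }
                } else if (d == i) {
                    // Direct use of the undefined-at-entry value.
                    report.must.insert(defs[i].symbol);
                } else {
                    // Undefined value reaches the use only through a phi.
                    report.may.insert(defs[i].symbol);
                }
            }
        }
    }
    return report;
}
```

Only the definitions that can carry an undefined value are visited, so no work is done for variables or blocks that never touch one.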
Uninitialized local plug-in
  Plug-in is loaded at runtime by Phoenix-based C2
  [Diagram: UninitializedLocal.cpp -> C++/CLI -> UninitializedLocal.dll; Test.cpp -> C1 -> Phx-C2 (with the plug-in loaded) -> Test.obj]
Phoenix C2 with our plug-in added
  This complete plug-in is provided as a sample in the Research Development Kit
  It is only ~400 lines of code to add a key warning to the C2 compiler
  Other types of checking can be added just as easily
  A demonstration of the warnings being emitted:
Phoenix PE Reading
  Phoenix can read and write PE files directly
    You can implement your own compiler or linker
    You can create post-link tools for analysis, instrumentation or optimization
  Binaries can be read in, raised into IR, changed and rewritten as new, working binaries
  Phoenix Explorer is only ~800 lines of code on top of the Phoenix binary reading-writing library
Binary rewriting with Phoenix
  mtrace utility injects tracing code into managed applications
  You don’t need the source code to do this (you do need the PDB)
  mtrace shows functions being entered and exited
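The observable effect of entry/exit tracing can be pictured at source level. The sketch below is standard C++ and only an analogy: mtrace rewrites managed binaries and its actual mechanism is not shown in these slides. TraceLog, ScopeTrace, helper and entry are hypothetical names, and the probe is written by hand where a rewriting tool would inject it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Collects trace lines; a real tool would write to a log or the console.
struct TraceLog {
    std::vector<std::string> lines;
};

// Hypothetical probe: records a line on construction (function entry)
// and another on destruction (function exit).
class ScopeTrace {
public:
    ScopeTrace(TraceLog& log, std::string function)
        : log_(log), function_(std::move(function)) {
        assert(!function_.empty());
        log_.lines.push_back("enter " + function_);
    }
    ~ScopeTrace() { log_.lines.push_back("exit " + function_); }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    TraceLog& log_;
    std::string function_;
};

// Two functions carrying the probe that a rewriting tool would add.
int helper(TraceLog& log) {
    ScopeTrace trace(log, "helper");
    return 42;
}

int entry(TraceLog& log) {
    ScopeTrace trace(log, "entry");
    return helper(log);
}
```

Calling entry records four lines in order: enter entry, enter helper, exit helper, exit entry.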
How do I get Phoenix?
  Early access RDKs are available to selected universities
    Sample projects include aspect oriented programming, code obfuscation, profiling
    Contact [email protected] for Academic early access requests
  Early access CDK is available to selected industry partners
    Contact [email protected] for commercial early access requests
  Phoenix RDK/CDKs release about every 6 months
  Phoenix will be the next MS compiler backend
    We build the next-generation Windows every night