Discussion
• Demographics of the developer community
• What made “the best C++ compiler” in 1980?
• What’s changed since 1980?
Common Language Runtime
[Architecture diagram; components shown:]
• Frameworks
• Base Classes
• Common Language Runtime
– Class loader and layout
– IL to native code compilers
– GC, stack walk, code manager
– Security
– Execution Support
Multiple Languages
• Common type system
– Object-oriented in flavor
– Procedural languages well-supported
– Functional languages possible
• CLS guides frameworks design
– Rules for wide reach
– All .NET Framework functionality available
• Over 15 languages investigated
– Most are CLS consumers
– Many are CLS extenders
Metadata: Creation And Use
[Diagram: Source Code feeds a Compiler, which produces Metadata (and code). Other boxes connected to the metadata: Debugger, Schema Generator, Profiler, Other Compiler, Proxy Generator, Type Browser, Designers, Reflection, Serialization (e.g. SOAP), XML encoding (SDL or SUDS)]
Execution Model
[Diagram: source languages (VB, VC, ..., Script) compile to IL. Inside the Common Language Runtime, IL becomes Native Code via the “Econo”-JIT Compiler, the Standard JIT Compiler, or Install time Code Gen]
Managed Code
• Managed code provides...
– Metadata describing data
– Location of references to objects
– Exception handling tables
• So runtime can provide…
– Exception handling
– Security
– Automatic lifetime management
– Debugging and profiling
Runtime Control Flow
[Diagram; boxes: Assembly, Class Loader, IL to native code compiler, Managed Native Code, Code Managers, Security System, Execution Support, CPU. Edge labels: “First reference to type”, “First call to method”]
Compiling IL To Native
• “Econo” JIT
– Generates unoptimized native code
– Code can be discarded and regenerated
• “Standard” JIT
– Generates optimized native code
– Includes verification of IL code
• Install time code generation
– Done at install time
– Reduces start-up time
– Native code has version checks and reverts to runtime JIT if they fail
Managed Data
• Layout Provided by Runtime
– Usually automatic
– Metadata can specify
• Order
• Packing
• Explicit layout
• Lifetime Managed by Runtime (GC)
– Working set is compacted
– Data is moved
– Object references are updated
– No more intrusive than a page fault
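As a rough illustration of "metadata can specify order, packing, explicit layout", the sketch below uses the StructLayout and FieldOffset attributes from System.Runtime.InteropServices, which is how C# exposes these layout controls. The struct names (Padded, Packed, Overlay) are hypothetical, and the sizes reported are the marshalled (unmanaged) sizes.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential order, default packing: the int is aligned, so padding follows Tag.
[StructLayout(LayoutKind.Sequential)]
struct Padded { public byte Tag; public int Value; }

// Sequential order, packing of 1: no padding between the fields.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed { public byte Tag; public int Value; }

// Explicit layout: both fields start at offset 0 and overlap.
[StructLayout(LayoutKind.Explicit)]
struct Overlay
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

static class LayoutDemo
{
    public static int PaddedSize() { return Marshal.SizeOf<Padded>(); }
    public static int PackedSize() { return Marshal.SizeOf<Packed>(); }

    // Writes a float and reads the same 4 bytes back as an int.
    public static int BitsOf(float f)
    {
        Overlay o = new Overlay();
        o.AsFloat = f;
        return o.AsInt;
    }

    static void Main()
    {
        Console.WriteLine(PaddedSize());               // 8
        Console.WriteLine(PackedSize());               // 5
        Console.WriteLine(BitsOf(1.0f).ToString("X")); // 3F800000
    }
}
```

Without any attribute the runtime chooses the layout itself, which is the "usually automatic" case on the slide.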
Calling Unmanaged Code
[Diagram: within the Common Language Runtime, the “Econo”-JIT Compiler and the Standard JIT Compiler produce Native Code on the Managed side; calls cross from there to Native Code on the Unmanaged side]
Crossing The Boundary
• Mode transition for code manager
– Calling conventions differ on x86
– Fast, rarely more than register shuffle
• Data type marshalling
– Representations may not be the same
– Pinning, copying, and/or reformatting needed
– Custom marshalling supported
• The IL to native compilers help
– In-line code transition and simple marshalling
– Per call site cost is very low
• Plus a small cost on entry to a procedure that can make calls across boundary
Some Statistics
In the 3 years since we launched:
• 50% of professional developers use .NET
• 120M copies have been downloaded
• 85% of consumer PCs sold in 2004 have .NET pre-installed
• 58% of business PCs sold in 2004 have .NET pre-installed
• HP printers/scanners/cameras install .NET (3M copies of .NET / year)
Discussion
• How does the move to managed code affect the compiler?
• How does the move to Web servers and Web services affect the compiler?
• What makes “the best Java compiler”?
• What makes “the best C++ compiler” in 2005?
Generics
One of the major features added in Version 2.0 of the CLR, to be released later this year (2005)
In English . . .
Instead of defining StackOfInt, StackOfString, etc., use
    class Stack<T> {
        void Push(T item) { … }
        T Pop() { … }
        T TopOfStack() { … }
    }
    static Stack<int> IntStack;
    static Stack<string> StringStack;
• Type safe (compile and design time support)
• Shared code (better perf, easier maintenance)
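For readers who want something they can compile, here is a minimal sketch of the Stack<T> idea. The method names come from the slide; the List<T> backing store and the exception thrown on an empty stack are assumptions made purely for illustration, since the slide elides the method bodies.

```csharp
// Hypothetical fleshed-out version of the slide's Stack<T>.
class Stack<T>
{
    // Assumed backing store; the slide does not show one.
    private readonly System.Collections.Generic.List<T> items =
        new System.Collections.Generic.List<T>();

    public void Push(T item) { items.Add(item); }

    public T Pop()
    {
        T top = TopOfStack();
        items.RemoveAt(items.Count - 1);
        return top;
    }

    public T TopOfStack()
    {
        if (items.Count == 0)
            throw new System.InvalidOperationException("stack is empty");
        return items[items.Count - 1];
    }
}

static class StackDemo
{
    static void Main()
    {
        Stack<int> ints = new Stack<int>();
        ints.Push(1);
        ints.Push(2);
        System.Console.WriteLine(ints.Pop());            // 2

        Stack<string> strings = new Stack<string>();
        strings.Push("hello");
        // strings.Push(42);   // rejected at compile time: 42 is not a string
        System.Console.WriteLine(strings.TopOfStack());  // hello
    }
}
```

One definition serves both element types, and the commented-out line shows the compile-time type safety the slide refers to.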
Polymorphic Programming Languages
Standard ML
O’Caml
Eiffel
Ada
GJ
C++
Mercury
Miranda
Pizza
Haskell
Clu
Design for multiple languages
• ML: Functors are cool!
• Visual Basic: Don’t confuse me!
• C++: Give me template specialization
• C++: And template meta-programming
• Java: Run-time types please
• Scheme: Why should I care?
• C#: Just give me decent collection classes
• Haskell: Rank-n types? Existentials? Kinds? Type classes?
• Eiffel: All generic types covariant please
• COBOL: Change my call syntax!?!?
• C++: Can I write class C<T> : T
Simplicity => no odd restrictions
    interface IComparable<T> { int CompareTo(T other); }
    class Set<T> : IEnumerable<T> where T : IComparable<T> {
        private TreeNode<T> root;
        public static Set<T> empty = new Set<T>();
        public void Add(T x) { … }
        public bool HasMember(T x) { … }
    }
    Set<Set<int>> s = new Set<Set<int>>();
• Type arguments can be value or reference types
• Even statics can use type parameter
• Constraints can reference type parameter (“F-bounded polymorphism”)
• Interfaces and superclass can be instantiated
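A small sketch of the F-bounded constraint in use may help. It relies on the standard System.IComparable<T> interface; the Money class and the Max helper are hypothetical names introduced only for this example.

```csharp
using System;

// Hypothetical element type: it is comparable to other Money values,
// so it satisfies "T : IComparable<T>" with T = Money.
sealed class Money : IComparable<Money>
{
    public readonly long Cents;
    public Money(long cents) { Cents = cents; }
    public int CompareTo(Money other) { return Cents.CompareTo(other.Cents); }
}

static class Bounded
{
    // The constraint mentions T itself: this is the F-bounded pattern.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    static void Main()
    {
        Console.WriteLine(Max(3, 7));                                   // 7 (value type)
        Console.WriteLine(Max("apple", "pear"));                        // pear (reference type)
        Console.WriteLine(Max(new Money(250), new Money(199)).Cents);   // 250
    }
}
```

The same generic method accepts value types (int) and reference types (string, Money) as type arguments, as the first bullet above states.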
Non-goals
• C++ style template meta-programming
– Leave this to source-language compilers
• Higher-order polymorphism, existentials
– Let’s get the basics right first!
Compiling polymorphism, as was
Two main techniques:
• Specialize code for each instantiation
– C++ templates, MLton & SML.NET monomorphization
– Pro: good performance
– Con: code bloat (though not a problem with modern C++ impls)
• Share code for all instantiations
– Either use a single representation for all types (ML, Haskell)
– Or restrict instantiations to “pointer” types (Java)
– Pro: no code bloat
– Con: poor performance (extra boxing operations required on primitive values)
Compiling polymorphism in the Common Language Runtime
• Polymorphism is built into the intermediate language (IL) and the execution engine
• CLR performs “just-in-time” type specialization
• Code sharing avoids bloat
• Performance is (almost) as good as hand-specialized code
Code sharing
• Rule:
– share field layout and code if type arguments have same representation
• Examples:
– Representation and code for methods in Set<string> can also be used for Set<object> (string and object are both 32-bit GC-traced pointers)
– Representation and code for Set<long> is different from Set<int> (int uses 32 bits, long uses 64 bits)
Exact run-time types
• We want to support
    if (x is Set<string>) { ... }
    else if (x is Set<Component>) { ... }
• But representation and code is shared between compatible instantiations e.g. Set<string> and Set<Component>
• So there’s a conflict to resolve…
• …and we don’t want to add lots of overhead to languages that don’t use run-time types (ML, Haskell)
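The observable effect of exact run-time types can be sketched with the standard List<T> class standing in for the slide's Set<T>. The Describe helper is a hypothetical name. Whether the runtime actually shares code between these two instantiations is not visible from this program; the only point is that the type tests still tell them apart.

```csharp
using System;
using System.Collections.Generic;

static class ExactTypes
{
    // Type tests distinguish instantiations even when their type arguments
    // are both reference types.
    public static string Describe(object x)
    {
        if (x is List<string>) return "list of string";
        else if (x is List<object>) return "list of object";
        return "something else";
    }

    static void Main()
    {
        Console.WriteLine(Describe(new List<string>()));  // list of string
        Console.WriteLine(Describe(new List<object>()));  // list of object
        Console.WriteLine(Describe(new List<int>()));     // something else
        Console.WriteLine(typeof(List<string>) == typeof(List<object>)); // False
    }
}
```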
Object representation in the CLR
• normal object representation: type = vtable pointer
– [Diagram: vtable ptr, followed by fields]
• array representation: type is inside object
– [Diagram: vtable ptr, element type, no. of elements, followed by elements]
Object representation for generics
• Array-style: store the instantiation directly in the object?
– extra word (possibly more for multi-parameter types) per object instance
– e.g. every list cell in ML or Haskell would use an extra word
• Alternative: make vtable copies, store instantiation info in the vtable
– extra space (vtable size) per type instantiation
– expect no. of instantiations << no. of objects
– so we chose this option
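Object representation for generics
[Diagram: x : Set<string> and y : Set<object> each consist of a vtable ptr and fields. Each points to its own vtable, one labelled string and one labelled object, with slots Add, HasMember, ToArray. The slots in both vtables refer to the same code for Add, code for HasMember, and code for ToArray.]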
What’s in the design?
• Type parameterization for all declarations
– classes, e.g. class Set<T>
– interfaces, e.g. interface IComparable<T>
– structs, e.g. struct HashBucket<K,D>
– methods, e.g. static void Reverse<T>(T[] arr)
– delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)
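The declaration forms above can be put together in one small sketch. The signatures for HashBucket<K,D>, Reverse<T> and Action<T> follow the slide; the field names, the method bodies, and the ForEach and Sum helpers are assumptions added for illustration.

```csharp
// Generic delegate, as on the slide.
delegate void Action<T>(T arg);

// Generic struct; the field names are hypothetical.
struct HashBucket<K, D>
{
    public K Key;
    public D Data;
}

static class GenericDecls
{
    // Generic method: reverses an array of any element type in place.
    public static void Reverse<T>(T[] arr)
    {
        for (int i = 0, j = arr.Length - 1; i < j; i++, j--)
        {
            T tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
        }
    }

    // Hypothetical helper that applies a generic delegate to each element.
    public static void ForEach<T>(T[] arr, Action<T> action)
    {
        foreach (T x in arr) action(x);
    }

    // Hypothetical helper: sums an int array by way of ForEach.
    public static int Sum(int[] arr)
    {
        int total = 0;
        ForEach(arr, delegate(int x) { total += x; });
        return total;
    }

    static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        Reverse(numbers);
        System.Console.WriteLine(numbers[0]);      // 3
        System.Console.WriteLine(Sum(numbers));    // 6

        HashBucket<string, int> bucket = new HashBucket<string, int>();
        bucket.Key = "answer";
        bucket.Data = 42;
        System.Console.WriteLine(bucket.Key + " = " + bucket.Data);
    }
}
```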
Precompilation (ngen)
• JIT compilation is flexible, but
– can lead to slow startup times
– increases working set (must load JIT compiler, code pages can’t be shared between processes)
• Instead, we can pre-compile
– .NET CLR has “ngen” tool for native generation
– IL is compiled to x86 up-front
– runtime data structures (vtables etc) are persisted in native image
– read-only pages (e.g. code) can be shared between processes
– loader now responsible only for “link” step (cross-module fix-ups)
Ngen for generics
• For non-generic code, to ngen an assembly:
– just compile every class and method in the assembly
– perhaps inline a little across assemblies
• For generic code:
– compile every generic class and method, but at what instantiations?
• just reference types? (code is shared)
• or some “commonly-used” types? (e.g. int)
– we don’t know statically what instantiations will be used
• it’s a “separate compilation” problem
Ngen all instantiations
• Our approach:
– always compile generic code for reference-type instantiations
– for value type instantiations, compute the transitive closure of instantiations used by the assembly
– compile code for those instantiations not already present in other linked ngen images
• leads to code duplication
• at load-time, just pick one
• has some interesting interactions with app-domain code-sharing policy (see SPACE’04 paper on Don Syme’s home page)
NGen: example
[Diagram: three assemblies, each passed through ngen]
• MyCollections: class List<T>, class Set<T>, …Set<int>…
– ngen image: x86 for List<object>, x86 for Set<object>, x86 for Set<int>
• Client1: struct Point, …List<Point>…Set<int>…List<int>…
– ngen image: x86 for List<int>, x86 for List<Point>
• Client2: class Window, …List<Window>… …List<int>…
– ngen image: x86 for List<int>
NGen: when we can’t
• JIT is still required for
– instantiations requested through reflection (“late-bound”), e.g. typeof(List<>).BindGenericParameters(typeof(int))
– generic virtual methods
• double dispatch, on instantiation and class of object
– polymorphic recursion (unbounded number of instantiations)
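A late-bound instantiation can be sketched as below. Note one assumption: the example uses Type.MakeGenericType, which is the method name in the released .NET class library, whereas the slide shows BindGenericParameters, which appears to be a pre-release spelling of the same operation. MakeList is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

static class LateBound
{
    // Builds List<elementType> at run time; the element type is not known
    // to the compiler, so no precompiled instantiation can be assumed.
    public static object MakeList(Type elementType)
    {
        Type open = typeof(List<>);                       // unbound generic type
        Type closed = open.MakeGenericType(elementType);  // e.g. List<int>
        return Activator.CreateInstance(closed);
    }

    static void Main()
    {
        object list = MakeList(typeof(int));
        Console.WriteLine(list.GetType() == typeof(List<int>)); // True
        Console.WriteLine(list is List<int>);                   // True
    }
}
```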
What’s in the design (2)?
• Constraints on type parameters
– class constraint (“must extend”), e.g. class Grid<T> where T : Control
– interface constraints (“must implement”), e.g. class Set<T> where T : IComparable<T>
– type parameter constraints (“must subtype”), e.g. class List<T> { void AddList<U>(List<U> items) where U : T }
– 3 special cases
• Can be instantiated (“new”)
• Can be null (“nullable”)
• Must be a value type (“struct”)
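The constraint forms can be sketched together as follows. Control, Button and all helper names are hypothetical stand-ins. The sketch also assumes that the slide's "can be null" case corresponds to the reference-type constraint spelled `where T : class` in released C#.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a UI class hierarchy.
class Control { public string Name = "control"; }
class Button : Control { public Button() { Name = "button"; } }

// Class constraint ("must extend"): members of Control are usable on T.
class Grid<T> where T : Control
{
    private readonly List<T> cells = new List<T>();
    public void Add(T cell) { cells.Add(cell); }
    public string FirstName() { return cells[0].Name; }
}

static class Constraints
{
    // "new": T must have a public parameterless constructor.
    public static T Create<T>() where T : new() { return new T(); }

    // "class": T is a reference type, so the value may be null.
    public static T OrFallback<T>(T value, T fallback) where T : class
    {
        return value ?? fallback;
    }

    // "struct": T is a value type, so T? means Nullable<T>.
    public static T? FirstOrNull<T>(T[] items) where T : struct
    {
        return items.Length > 0 ? items[0] : (T?)null;
    }

    // Type parameter constraint ("must subtype"): every U is also a T.
    public static void AddAll<T, U>(List<T> target, List<U> items) where U : T
    {
        foreach (U item in items) target.Add(item);
    }

    static void Main()
    {
        Grid<Button> grid = new Grid<Button>();
        grid.Add(Create<Button>());
        Console.WriteLine(grid.FirstName());                      // button
        Console.WriteLine(OrFallback<string>(null, "none"));      // none
        Console.WriteLine(FirstOrNull(new int[0]).HasValue);      // False

        List<Control> controls = new List<Control>();
        AddAll(controls, new List<Button> { new Button() });
        Console.WriteLine(controls.Count);                        // 1
    }
}
```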
And What About Perf?
• Do generics really provide performance?
• It depends on how you ask the question…
– And who is asking the question
• Or at least why they are really asking the question
MSR Perf Measurements
[Bar chart: Quicksort on 1,000,000 elements, times in seconds (axis 0 to 4). X axis: element type (int, double, string (length)). Series: Generic, Non-generic (object)]
My Perf Measurements
[Bar chart: QuickSort, 500 items (shorter is better). Y axis: seconds / 50,000 calls (0 to 70). X axis: data type (Integer, String, Double, then Integer, String, Double again). Series: Generic, Non-Generic, Specific]
Note:
• First three columns are based on my “natural” implementation of QuickSort(Array).
• Second three are based on Andrew Kennedy’s QuickSort(Array, ComparisonOperation)
What’s in the design (3)?
• Variance annotations on type parameters
– covariant subtyping
    interface IEnumerator<+T> { T get_Current(); bool MoveNext(); }
  so IEnumerator<string> assignable to IEnumerator<object>
– contravariant subtyping
    interface IComparer<-T> { int Compare(T x, T y); }
  so IComparer<object> assignable to IComparer<string>
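The +T / -T notation above is the slide's own. In C# source, from version 4.0 onward, the same annotations are written `out T` (covariant) and `in T` (contravariant), and the library interfaces IEnumerable<T> and IComparer<T> carry them. The sketch below relies on that; ByTextLength and the two helper methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer over any objects: orders by length of ToString().
sealed class ByTextLength : IComparer<object>
{
    public int Compare(object x, object y)
    {
        return x.ToString().Length.CompareTo(y.ToString().Length);
    }
}

static class Variance
{
    // Covariance: a sequence of strings is usable as a sequence of objects.
    public static int CountObjects(IEnumerable<object> items)
    {
        int n = 0;
        foreach (object item in items) n++;
        return n;
    }

    // Contravariance: a comparer of objects is usable as a comparer of strings.
    public static int CompareStrings(string a, string b)
    {
        IComparer<string> comparer = new ByTextLength();
        return comparer.Compare(a, b);
    }

    static void Main()
    {
        IEnumerable<string> words = new List<string> { "a", "bb", "ccc" };
        Console.WriteLine(CountObjects(words));              // 3
        Console.WriteLine(CompareStrings("aa", "b") > 0);    // True
    }
}
```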