Intel JIT Talk
-
Upload
iamdvander -
Category
Technology
-
view
1.991 -
download
0
description
Transcript of Intel JIT Talk
Friday, July 1, 2011 – Intel, Santa Clara
High Performance JavaScript: JITs in Firefox
Friday, July 1, 2011 – Intel, Santa Clara
Power of the Web
Friday, July 1, 2011 – Intel, Santa Clara
How did we get here?
•Making existing web tech faster and faster
•Adding new technologies, like video, audio, integration and 2D+3D drawing
Friday, July 1, 2011 – Intel, Santa Clara
The Browser
Friday, July 1, 2011 – Intel, Santa Clara
Getting a Webpage
www.intel.com192.198.164.158
Friday, July 1, 2011 – Intel, Santa Clara
Getting a Webpage
www.intel.com192.198.164.158
GET / HTTP/1.0(HTTP Request)
Friday, July 1, 2011 – Intel, Santa Clara
Getting a Webpage
<html><head><title>Laptop, Notebooks, ...</title></head><script language="javascript"> function doStuff() {...}</script><body>...
(HTML, CSS, JavaScript)
Friday, July 1, 2011 – Intel, Santa Clara
Core Web Components
•HTML provides content
•CSS provides styling
•JavaScript provides a programming interface
Friday, July 1, 2011 – Intel, Santa Clara
New Technologies
•2D Canvas
•Video, Audio tags
•WebGL (OpenGL + JavaScript)
Friday, July 1, 2011 – Intel, Santa Clara
DOM
(Source: http://www.w3schools.com/htmldom/default.asp)
Friday, July 1, 2011 – Intel, Santa Clara
CSS•Cascading Style Sheets
•Separates presentation from content
•<b class=”warning”>This is red</b>
.warning { color: red; text-decoration: underline;}
Friday, July 1, 2011 – Intel, Santa Clara
What is JavaScript?
•C-like syntax
•De facto language of the web
•Interacts with the DOM and the browser
Friday, July 1, 2011 – Intel, Santa Clara
C-Like Syntax
•Braces, semicolons, familiar keywords
if (x) { for (i = 0; i < 100; i++) print("Hello!");}
Friday, July 1, 2011 – Intel, Santa Clara
Displaying a Webpage
•Layout engine applies CSS to DOM
•Computes geometry of elements
•JavaScript engine executes code
•This can change DOM, triggers “reflow”
•Finished layout is sent to graphics layer
Friday, July 1, 2011 – Intel, Santa Clara
The Result
Friday, July 1, 2011 – Intel, Santa Clara
JavaScript
•Invented by Brendan Eich in 1995 for Netscape 2
•Initial version written in 10 days
•Anything from small browser interactions, complex apps (Gmail) to intense graphics (WebGL)
Friday, July 1, 2011 – Intel, Santa Clara
The Result
Friday, July 1, 2011 – Intel, Santa Clara
Think LISP or Self, not Java
•Untyped - no type declarations
•Multi-Paradigm – objects, closures, first-class functions
•Highly dynamic - objects are dictionaries
Friday, July 1, 2011 – Intel, Santa Clara
No Type Declarations
•Properties, variables, return values can be anything:
function f(a) { var x = 72.3; if (a) x = a + "string"; return x;}
Friday, July 1, 2011 – Intel, Santa Clara
No Type Declarations
•Properties, variables, return values can be anything:
function f(a) { var x = “hi”; if (a) x = a + 33.2; return x;}
if a is: number: add; string: concat;
Friday, July 1, 2011 – Intel, Santa Clara
Functional
•Functions may be returned, passed as arguments:
function f(a) { return function () { return a; }}var m = f(5);print(m());
Friday, July 1, 2011 – Intel, Santa Clara
Objects
•Objects are dictionaries mapping strings to values
•Properties may be deleted or added at any time!
var point = { x : 5, y : 10 };delete point.x;point.z = 12;
Friday, July 1, 2011 – Intel, Santa Clara
Prototypes
•Every object can have a prototype object
•If a property is not found on an object, its prototype is searched instead
•… And the prototype’s prototype, etc..
Friday, July 1, 2011 – Intel, Santa Clara
Numbers
•JavaScript specifies that numbers are IEEE-754 64-bit floating point
•Engines use 32-bit integers to optimize
•Must preserve semantics: integers overflow to doubles
Friday, July 1, 2011 – Intel, Santa Clara
Numbersvar x = 0x7FFFFFFF;x++;
int x = 0x7FFFFFFF;x++;
JavaScript C++
x 2147483648 -2147483648
Friday, July 1, 2011 – Intel, Santa Clara
JavaScript Implementations
•Values (Objects, strings, numbers)
•Garbage collector
•Runtime library
•Execution engine (VM)
Friday, July 1, 2011 – Intel, Santa Clara
Values
•The runtime must be able to query a value’s type to perform the right computation.
•When storing values to variables or object fields, the type must be stored as well.
Friday, July 1, 2011 – Intel, Santa Clara
ValuesBoxed Unboxed
Purpose Storage ComputationExamples (INT32, 9000) (int)9000
(STRING, “hi”) (String *) “hi”(DOUBLE, 3.14) (double)3.14
Definition (Type tag, C++ value) C++ value
• Boxed values required for variables, object fields• Unboxed values required for computation
Friday, July 1, 2011 – Intel, Santa Clara
Boxed Values
•Need a representation in C++
•Easy idea: 96-bit struct
•32-bits for type
•64-bits for double, pointer, or integer
•Too big! We have to pack better.
Friday, July 1, 2011 – Intel, Santa Clara
Boxed Values
•We could use pointers, and tag the low bits
•Doubles would have to be allocated on the heap
•Indirection, GC pressure is bad
•There is a middle ground…
Friday, July 1, 2011 – Intel, Santa Clara
•Values are 64-bit
•Doubles are normal IEEE-754
•How to pack non-doubles?
•51 bits of NaN space!
IA32, ARMNunboxing
Friday, July 1, 2011 – Intel, Santa Clara
NunboxingIA32, ARM
63 0
Type Payload
•Full value: 0x400c000000000000•Type is double because it’s not a NaN•Encodes: (Double, 3.5)
0x400c0000 0x00000000
Friday, July 1, 2011 – Intel, Santa Clara
NunboxingIA32, ARM
63 0
Type Payload
•Full value: 0xFFFF000100000040•Value is in NaN space•Type is 0xFFFF0001 (Int32)•Value is (Int32, 0x00000040)
0xFFFF0001 0x00000040
Friday, July 1, 2011 – Intel, Santa Clara
Punboxingx86-64
63 0Type Payload
•Full value: 0xFFFF800000000040•Value is in NaN space•Bits >> 47 == 0x1FFFF == INT32•Bits 47-63 masked off to retrieve payload•Value(Int32, 0x00000040)
1111 1111 1111 1000 0
47
000 .. 0000 0000 0000 0000 0100 0000
Friday, July 1, 2011 – Intel, Santa Clara
Nunboxing Punboxing
Fits in register NO YES
Trivial to decode YES NO
Portability 32-bit only* x64 only
* Some 64-bit OSes can restrict mmap() to 32-bits
Friday, July 1, 2011 – Intel, Santa Clara
Garbage Collection
•Need to reclaim memory without pausing user workflow, animations, etc
•Need very fast object allocation
•Consider lots of small objects like points, vectors
•Reading and writing to the heap must be fast
Friday, July 1, 2011 – Intel, Santa Clara
Mark and Sweep
•All live objects are found via a recursive traversal of the root value set
•All dead objects are added to a free list
•Very slow to traverse entire heap
•Building free lists can bring “dead” memory into cache
Friday, July 1, 2011 – Intel, Santa Clara
Improvements
•Sweeping, freelist building have been moved off-thread
•Marking can be incremental, interleaved with program execution
Friday, July 1, 2011 – Intel, Santa Clara
Generational GC•Observation: most objects are short-lived
•All new objects are bump-allocated into a newborn space
•Once newborn space is full, live objects are moved to the tenured space
•Newborn space is then reset to empty
Friday, July 1, 2011 – Intel, Santa Clara
Generational GC
8MB 8MBInactive Active
Newborn Space
Tenured Space
Friday, July 1, 2011 – Intel, Santa Clara
After Minor GC
8MB8MBActive Inactive
Newborn Space
Tenured Space
Friday, July 1, 2011 – Intel, Santa Clara
Barriers
•Incremental marking, generational GC need read or write barriers
•Read barriers are much slower
•Azul Systems has hardware read barrier
•Write barrier seems to be well-predicted?
Friday, July 1, 2011 – Intel, Santa Clara
Unknowns
•How much do cache effects matter?
•Memory GC has to touch
•Locality of objects
•What do we need to consider to make GC fast?
Friday, July 1, 2011 – Intel, Santa Clara
Running JavaScript
•Interpreter – Runs code, not fast
•Basic JIT – Simple, untyped compiler
•Advanced JIT – Typed compiler, lightspeed
Friday, July 1, 2011 – Intel, Santa Clara
Interpreter
•Good for code that runs once
•Giant switch loop
•Handles all edge cases of JS semantics
Friday, July 1, 2011 – Intel, Santa Clara
Interpreter
while (true) { switch (*pc) { case OP_ADD: ... case OP_SUB: ... case OP_RETURN: ... } pc++;}
Friday, July 1, 2011 – Intel, Santa Clara
Interpreter case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result; if (lhs.isInt32() && rhs.isInt32()) { int left = rhs.toInt32(); int right = rhs.toInt32(); if (AddOverflows(left, right, left + right)) result.setInt32(left + right); else result.setNumber(double(left) + double(right)); } else if (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right); result.setString(r); } else { double left = ValueToNumber(lhs); double right = ValueToNumber(rhs); result.setDouble(left + right); } PUSH(result); break;
Friday, July 1, 2011 – Intel, Santa Clara
Interpreter
•Very slow!
•Lots of opcode and type dispatch
•Lots of interpreter stack traffic
•Just-in-time compilation solves both
Friday, July 1, 2011 – Intel, Santa Clara
Just In Time
•Compilation must be very fast
•Can’t introduce noticeable pauses
•People care about memory use
•Can’t JIT everything
•Can’t create bloated code
•May have to discard code at any time
Friday, July 1, 2011 – Intel, Santa Clara
Basic JIT•JägerMonkey in Firefox 4
•Every opcode has a hand-coded template of assembly (registers left blank)
•Method at a time:
•Single pass through bytecode stream!
•Compiler uses assembly templates corresponding to each opcode
Friday, July 1, 2011 – Intel, Santa Clara
Bytecode
function Add(x, y) { return x + y;}
GETARG 0 ; fetch xGETARG 1 ; fetch yADDRETURN
Friday, July 1, 2011 – Intel, Santa Clara
Interpreter case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result; if (lhs.isInt32() && rhs.isInt32()) { int left = rhs.toInt32(); int right = rhs.toInt32(); if (AddOverflows(left, right, left + right)) result.setInt32(left + right); else result.setNumber(double(left) + double(right)); } else if (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right); result.setString(r); } else { double left = ValueToNumber(lhs); double right = ValueToNumber(rhs); result.setDouble(left + right); } PUSH(result); break;
Friday, July 1, 2011 – Intel, Santa Clara
Assembling ADD
•Inlining that huge chunk for every ADD would be very slow
•Observation:
•Some input types much more common than others
Friday, July 1, 2011 – Intel, Santa Clara
var j = 0;for (i = 0; i < 10000; i++) { j += i;}
var j = 12.3;for (i = 0; i < 10000; i++) { j += new Object() + i.toString();}
Integer Math – Common
Weird Stuff – Rare!
Friday, July 1, 2011 – Intel, Santa Clara
Assembling ADD
•Only generate code for the easiest and most common cases
•Large design space
•Can consider integers common, or
•Integers and doubles, or
•Anything!
Friday, July 1, 2011 – Intel, Santa Clara
Basic JIT - ADDif (arg0.type != INT32) goto slow_add;if (arg1.type != INT32) goto slow_add;
Friday, July 1, 2011 – Intel, Santa Clara
Basic JIT - ADDif (arg0.type != INT32) goto slow_add;if (arg1.type != INT32) goto slow_add;R0 = arg0.dataR1 = arg1.data
Greedy Register Allocator
Friday, July 1, 2011 – Intel, Santa Clara
Basic JIT - ADDif (arg0.type != INT32) goto slow_add;if (arg1.type != INT32) goto slow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1;if (OVERFLOWED) goto slow_add;
Friday, July 1, 2011 – Intel, Santa Clara
Slow Paths
slow_add: Value result = runtime::Add(arg0, arg1);
Friday, July 1, 2011 – Intel, Santa Clara
slow_add: Value result = Interpreter::Add(arg0, arg1); R2 = result.data; goto rejoin;
Inline Out-of-line
if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32) goto slow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED) goto slow_add;rejoin:
Friday, July 1, 2011 – Intel, Santa Clara
Rejoining
slow_add: Value result = runtime::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;
Inline Out-of-line if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32) goto slow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED) goto slow_add; R3 = TYPE_INT32;rejoin:
Friday, July 1, 2011 – Intel, Santa Clara
Final Code
if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32) goto slow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED) goto slow_add; R3 = TYPE_INT32;rejoin: return Value(R3, R2);
Inline
slow_add: Value result = Interpreter::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;
Out-of-line
Friday, July 1, 2011 – Intel, Santa Clara
Observations
•Even though we have fast paths, no type information can flow in between opcodes
•Cannot assume result of ADD is integer
•Means more branches
•Types increase register pressure
Friday, July 1, 2011 – Intel, Santa Clara
Basic JIT – Object Access
•x.y is multiple dictionary searches!
•How can we create a fast-path?
•We could try to cache the lookup, but
•We want it to work for >1 objects
function f(x) { return x.y;}
Friday, July 1, 2011 – Intel, Santa Clara
Object Access
•Observation
•Usually, objects flowing through an access site look the same
•If we knew an object’s layout, we could bypass the dictionary lookup
Friday, July 1, 2011 – Intel, Santa Clara
Object Layout
var obj = { x: 50, y: 100,
z: 20 };
Friday, July 1, 2011 – Intel, Santa Clara
Object LayoutJSObject
Shape *shape;
Value *slots;
ShapeProperty Name Slot Number
x 0
y 1
z 2
0 (Int32, 50)
1 (Int32, 100)
2 (Int32, 20)
Friday, July 1, 2011 – Intel, Santa Clara
Object Layoutvar obj = { x: 50,
y: 100, z: 20
};…
var obj2 = { x: 78, y: 93, z: 600
};
Friday, July 1, 2011 – Intel, Santa Clara
Object LayoutJSObject
Shape *shape;
Value *slots;
ShapeProperty Name Slot Number
x 0
y 1
z 2 JSObject
Shape *shape;
Value *slots;
0 (Int32, 50)
1 (Int32, 100)
2 (Int32, 20)
0 (Int32, 78)
1 (Int32, 93)
2 (Int32, 600)
Friday, July 1, 2011 – Intel, Santa Clara
Familiar Problems
•We don’t know an object’s shape during JIT compilation
•... Or even that a property access receives an object!
•Solution: leave code “blank,” lazily generate it later
Friday, July 1, 2011 – Intel, Santa Clara
Inline Cachesfunction f(x) { return x.y;}
Friday, July 1, 2011 – Intel, Santa Clara
Inline Caches
•Start as normal, guarding on type and loading data
if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;
Friday, July 1, 2011 – Intel, Santa Clara
if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;
goto property_ic;
Inline Caches
•No information yet: leave blank (nops).
Friday, July 1, 2011 – Intel, Santa Clara
Property ICsfunction f(x) { return x.y;}
f({x: 5, y: 10, z: 30});
Friday, July 1, 2011 – Intel, Santa Clara
Property ICsJSObject
Shape *shape;
Value *slots;
Shape S1
Property Name Slot Number
x 0
y 1
z 2
0 (Int32, 5)
1 (Int32, 10)
2 (Int32, 30)
Friday, July 1, 2011 – Intel, Santa Clara
Property ICsShape S1
Property Name Slot Number
x 0
y 1
z 2
Friday, July 1, 2011 – Intel, Santa Clara
Property ICsShape S1
Property Name Slot Number
x 0
y 1
z 2
Friday, July 1, 2011 – Intel, Santa Clara
Inline Caches
•Shape = S1, slot is 1
if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;
goto property_ic;
Friday, July 1, 2011 – Intel, Santa Clara
Inline Caches
•Shape = S1, slot is 1
if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;
goto property_ic;
Friday, July 1, 2011 – Intel, Santa Clara
Inline Caches
•Shape = SHAPE1, slot is 1
if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data; if (obj->shape != SHAPE1) goto property_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;
Friday, July 1, 2011 – Intel, Santa Clara
Polymorphism•What happens if two different shapes pass
through a property access?
•Add another fast-path...
function f(x) { return x.y;}f({ x: 5, y: 10, z: 30});f({ y: 40});
Friday, July 1, 2011 – Intel, Santa Clara
Polymorphism
if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data; if (obj->shape != SHAPE1) goto property_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:
Main Method
Friday, July 1, 2011 – Intel, Santa Clara
Polymorphism
if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data; if (obj->shape != SHAPE0) goto property_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:
Main Method Generated Stub
stub: if (obj->shape != SHAPE2) goto property_ic; R0 = obj->slots[0].type; R1 = obj->slots[0].data; goto rejoin;
Friday, July 1, 2011 – Intel, Santa Clara
Polymorphism
if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data; if (obj->shape != SHAPE1) goto stub; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:
Main Method Generated Stub
stub: if (obj->shape != SHAPE2) goto property_ic; R0 = obj->slots[0].type; R1 = obj->slots[0].data; goto rejoin;
Friday, July 1, 2011 – Intel, Santa Clara
Chain of Stubsif (obj->shape == S1) { result = obj->slots[0];} else { if (obj->shape == S2) { result = obj->slots[1]; } else { if (obj->shape == S3) { result = obj->slots[2]; } else { if (obj->shape == S4) { result = obj->slots[3]; } else { goto property_ic; } } }}
Friday, July 1, 2011 – Intel, Santa Clara
Code Memory•Generated code is patched from inside the
method
•Self-modifying, but single threaded
•Code memory is always rwx
•Concerned about protection-flipping expense
Friday, July 1, 2011 – Intel, Santa Clara
Basic JIT Summary
•Generates simple code to handle common cases
•Inline caches adapt code based on runtime observations
•Lots of guards, poor type information
Friday, July 1, 2011 – Intel, Santa Clara
Optimizing Harder
•Single pass too limited: we want to perform whole-method optimizations
•We could generate an IR, but without type information...
•Slow paths prevent most optimizations.
Friday, July 1, 2011 – Intel, Santa Clara
Code Motion?
function Add(x, n) { var sum = 0; for (var i = 0; i < n; i++) sum += x + n; return sum;}
The addition looks loop invariant.Can we hoist it?
Friday, July 1, 2011 – Intel, Santa Clara
Code Motion?
function Add(x, n) { var sum = 0; var temp0 = x + n; for (var i = 0; i < n; i++) sum += temp0; return sum;}
Let’s try it.
Friday, July 1, 2011 – Intel, Santa Clara
Wrong Resultvar global = 0;var obj = { valueOf: function () { return ++global; }}Add(obj, 10);
• Original code returns 155• Hoisted version returns 110!
Friday, July 1, 2011 – Intel, Santa Clara
Idempotency
•Untyped operations are usually not provably idempotent
•Slow paths are re-entrant, can have observable side effects
•This prevents code motion, redundancy elimination
Friday, July 1, 2011 – Intel, Santa Clara
Ideal Scenario
•Actual knowledge about types!
•Remove slow paths
•Perform whole-method optimizations
Friday, July 1, 2011 – Intel, Santa Clara
Advanced JITs•Make optimistic guesses about types
•Guesses must be informed
•Don’t want to waste time compiling code that can’t or won’t run
•Using type inference or runtime profiling
•Generate an IR, perform textbook compiler optimizations
Friday, July 1, 2011 – Intel, Santa Clara
Optimism Pays Off
•People naturally write code as if it were typed
•Variables and object fields that change types are rare
Friday, July 1, 2011 – Intel, Santa Clara
IonMonkey
•Work in progress
•Constructs high and low-level IRs
•Applies type information using runtime feedback
•Inlining, GVN, LICM, LSRA
Friday, July 1, 2011 – Intel, Santa Clara
function Add(x, y) { return x + y + x;}
GETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN
Friday, July 1, 2011 – Intel, Santa Clara
Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN
v0 = arg0v1 = arg1
Friday, July 1, 2011 – Intel, Santa Clara
Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN
v0 = arg0v1 = arg1
Friday, July 1, 2011 – Intel, Santa Clara
Type Oracle
•Need a mechanism to inform compiler about likely types
•Type Oracle: given program counter, returns types for inputs and outputs
•May use any data source – runtime profiling, type inference, etc
Friday, July 1, 2011 – Intel, Santa Clara
Type Oracle
•For this example, assume the type oracle returns “integer” for the output and both inputs to all ADD operations
Friday, July 1, 2011 – Intel, Santa Clara
Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN
v0 = arg0v1 = arg1v2 = iadd(v0, v1)
Friday, July 1, 2011 – Intel, Santa Clara
Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN
v0 = arg0v1 = arg1v2 = iadd(v0, v1)v3 = iadd(v2, v0)
Friday, July 1, 2011 – Intel, Santa Clara
Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN
v0 = arg0v1 = arg1v2 = iadd(v0, v1)v3 = iadd(v2, v0)v4 = return(v3)
Friday, July 1, 2011 – Intel, Santa Clara
SSA (Intermediate)
v0 = arg0v1 = arg1v2 = iadd(v0, v1)v3 = iadd(v2, v0)v4 = return(v3)
Untyped
Untyped
TypedTyped
Friday, July 1, 2011 – Intel, Santa Clara
Intermediate SSA
•SSA does not type check
•Type analysis:
•Makes sure SSA inputs have correct type
•Inserts conversions, value decoding
Friday, July 1, 2011 – Intel, Santa Clara
Unoptimized SSA
v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = unbox(v0, INT32)v6 = iadd(v4, v5)v7 = return(v6)
Friday, July 1, 2011 – Intel, Santa Clara
Optimization
v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = unbox(v0, INT32)v6 = iadd(v4, v5)v7 = return(v6)
Friday, July 1, 2011 – Intel, Santa Clara
Optimization
v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = unbox(v0, INT32)v6 = iadd(v4, v5)v7 = return(v6)
Friday, July 1, 2011 – Intel, Santa Clara
Optimized SSA
v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = iadd(v4, v2)v6 = return(v5)
Friday, July 1, 2011 – Intel, Santa Clara
Register Allocation
•Linear Scan Register Allocation
•Based on Christian Wimmer’s work• Linear Scan Register Allocation for the Java HotSpot™ Client Compiler. Master's thesis,
Institute for System Software, Johannes Kepler University Linz, 2004
• Linear Scan Register Allocation on SSA Form. In Proceedings of the International Symposium on Code Generation and Optimization, pages 170–179. ACM Press, 2010.
•Same algorithm used in Hotspot JVM
Friday, July 1, 2011 – Intel, Santa Clara
Allocate Registers
0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)
Friday, July 1, 2011 – Intel, Santa Clara
Code Generation
0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)
SSA
Friday, July 1, 2011 – Intel, Santa Clara
Code Generation
0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)
SSA if (arg0.type != INT32) goto BAILOUT; r1 = arg0.data;
Native Code
Friday, July 1, 2011 – Intel, Santa Clara
Code Generation
0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)
SSA if (arg0.type != INT32) goto BAILOUT; r1 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r0 = arg1.data;
Native Code
Friday, July 1, 2011 – Intel, Santa Clara
Code Generation
0: stack arg01: stack arg12: r0 unbox(0, INT32)3: r1 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)
SSA if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT;
Native Code
Friday, July 1, 2011 – Intel, Santa Clara
Code Generation
0: stack arg01: stack arg12: r0 unbox(0, INT32)3: r1 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)
SSA if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT;
Native Code
Friday, July 1, 2011 – Intel, Santa Clara
Code Generation
0: stack arg01: stack arg12: r0 unbox(0, INT32)3: r1 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(4)
SSA if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;
Native Code
Friday, July 1, 2011 – Intel, Santa Clara
Generated Code if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;
Friday, July 1, 2011 – Intel, Santa Clara
Generated Code•Bailout exits JIT•May recompile function
if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;
Friday, July 1, 2011 – Intel, Santa Clara
Guards Still Needed
•Object shapes for reading properties
•Types of…
•Arguments
•Values read from the heap
•Values returned from C++
Friday, July 1, 2011 – Intel, Santa Clara
Guards Still Needed
•Math
•Integer Overflow
•Divide by Zero
•Multiply by -0
•NaN is falsey in JS, truthy in ISAs
Friday, July 1, 2011 – Intel, Santa Clara
Advanced JITs
•Full type speculation - no slow paths
•Can move and eliminate code
•If speculation fails, method is recompiled or deoptimized
•Can still use techniques like inline caching!
Friday, July 1, 2011 – Intel, Santa Clara
To Infinity, and Beyond•Advanced JIT example has four guards:
•Two type checks
•Two overflow checks
•Better than Basic JIT, but we can do better
• Interval analysis can remove overflow checks
•Type Inference!
Friday, July 1, 2011 – Intel, Santa Clara
Type Inference
•Whole program analysis to determine types of variables
•Hybrid: both static and dynamic
•Replaces checks in JIT with checks in virtual machine
•If VM breaks an assumption held in any JIT code, that JIT code is discarded
Friday, July 1, 2011 – Intel, Santa Clara
Conclusions•The web needs JavaScript to be fast
•Untyped, highly dynamic nature makes this challenging
•Value boxing has different tradeoffs on different ISAs
•GC needs to be fast – what should we focus on?
Friday, July 1, 2011 – Intel, Santa Clara
Conclusions
•JITs getting more powerful
•Type specialization is key
•Still need checks and special cases on many operations
Friday, July 1, 2011 – Intel, Santa Clara
Questions?