1
Theory of Compilation 236360
Erez Petrank
Lecture 9: Runtime Part II.
2
Runtime Environment
• Mediates between the OS and the programming language
• Hides details of the machine from the programmer
  – Ranges from simple support functions all the way to a full-fledged virtual machine
• Handles common tasks
  – Runtime stack (activation records)
  – Memory management
  – JIT (dynamic optimization)
  – Debugging
  – …
3
Dynamic Memory Management: Introduction
There is a course about this topic: 236780 “Algorithms for dynamic memory management”
4
Static and Dynamic Variables
• Local (stack) variables are defined in a method and allocated on the runtime stack.
• Dynamic allocation:
  • In C, "malloc" allocates space and "free" declares that the program will not use this space anymore.

Ptr = malloc(256);
/* use Ptr */
free(Ptr);
5
Dynamic Memory Allocation
• In Java, "new" allocates an object of a given class.
  – President obama = new President();
• But there is no instruction for manually deleting the object.
• Objects are reclaimed by a garbage collector when the program "does not need them" anymore.

course c = new course(236360);
c.classroom = "TAUB 2";
Faculty.add(c);
6
Manual Vs. Automatic Memory Management
• Manual memory management lets the programmer decide when objects are deleted.
• A memory manager that lets a garbage collector delete objects is called automatic.
• Manual memory management creates severe debugging problems:
  – Memory leaks,
  – Dangling pointers.
• Considered the BIG debugging problem of the '80s.
7
Automatic Memory Reclamation
• An object is reclaimed when the program has “no way of accessing it”.
• Formally, when it is unreachable by a path of pointers from the "root" pointers, to which the program has direct access:
  – local variables, pointers on the stack, global (class) pointers, JNI pointers, etc.
What’s good about automatic “garbage collection”?
© Erez Petrank 8
• Software engineering:
  – Relieves users of the book-keeping burden.
  – Stronger reliability, fewer bugs, faster debugging.
  – Code understandable and reliable. (Less interaction between modules.)
• Security (Java):
  – Program never gets a pointer to "play with".
9
Importance
• Memory is the bottleneck in modern computation. – Time & energy (and space).
• Optimal allocation (even if all accesses are known in advance to the allocator) is NP-Complete (to even approximate).
• Must be done right for a program to run efficiently.
• Must be done right to ensure reliability.
GC and languages
10
• Sometimes it's built in:
  – LISP, Java, C#.
  – The user cannot free an object.
• Sometimes it's an added feature:
  – C, C++.
  – The user can choose to free objects or not. The collector frees all objects not freed by the user.
• Most modern languages are supported by garbage collection.
11
Most modern languages rely on GC
Source: “The Garbage Collection Handbook” by Richard Jones, Anthony Hosking, and Eliot Moss.
What’s bad about automatic “garbage collection”?
12
• It has a cost:
  – Old Lisp systems: 40%.
  – Today's Java programs (if the collection is done "right"): 5-15%.
• Considered a major factor determining program efficiency.
• Techniques have evolved since the 60’s. We will only go over basic techniques.
13
Garbage Collection Efficiency
• Overall collection overheads (program throughput).
• Pauses in program run.
• Space overhead.
• Cache locality (efficiency and energy).
14
Three classical algorithms
• Reference counting
• Mark and sweep (and mark-compact)
• Copying.
• The last two are also called tracing algorithms because they go over (trace) all reachable objects.
Reference counting
15
• Recall that we would like to know if an object is reachable from the roots.
• Associate a reference count field with each object: how many pointers reference this object.
• When nothing points to an object, it can be deleted.
• Very simple, used in many systems.
Basic Reference Counting
16
• Each object has an RC field, new objects get o.RC:=1.
• When p that points to o1 is modified to point to o2, we execute: o1.RC--, o2.RC++.
• If o1.RC == 0:
  – Delete o1.
  – Decrement o.RC for each "child" o of o1.
  – Recursively delete objects whose RC is decremented to 0.
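The counting rules above can be sketched in Java. This is a toy model, not a real collector: RCObject, decRef, and demo are illustrative names, and the "heap" is just ordinary Java objects.

```java
// Naïve reference counting sketch: RC field, decrement, recursive deletion.
import java.util.ArrayList;
import java.util.List;

public class RefCountDemo {
    static class RCObject {
        int rc = 1;                               // new objects get RC := 1
        List<RCObject> children = new ArrayList<>();
        boolean freed = false;
    }

    // a pointer to o is dropped; delete o when its count hits zero
    static void decRef(RCObject o) {
        if (o != null && --o.rc == 0) delete(o);
    }

    // delete o and decrement the RC of all its children, recursively
    static void delete(RCObject o) {
        o.freed = true;
        for (RCObject c : o.children) decRef(c);
    }

    static boolean demo() {
        RCObject a = new RCObject(), b = new RCObject();
        a.children.add(b);
        b.rc++;                 // a now also references b
        decRef(b);              // root drops b: RC 2 -> 1, still live via a
        boolean bAlive = !b.freed;
        decRef(a);              // root drops a: frees a, then b recursively
        return bAlive && a.freed && b.freed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note how dropping the last root pointer to `a` cascades: `a` is freed, its child's count drops to zero, and `b` is freed in turn.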
A Problem: Cycles
17
• The Reference counting algorithm does not reclaim cycles!
• Solution 1: ignore cycles, they do not appear frequently in modern programs.
• Solution 2: run tracing algorithms (that can reclaim cycles) infrequently.
• Solution 3: designated algorithms for cycle collection.
• Another problem for the naïve algorithm: requires a lot of synchronization in parallel programs.
• Advanced versions solve that.
The Mark-and-Sweep Algorithm
18
• Mark phase:
  – Start from the roots and traverse all objects reachable by a path of pointers.
  – Mark all traversed objects.
• Sweep phase:
  – Go over all objects in the heap.
  – Reclaim objects that are not marked.
The Mark-Sweep algorithm
19
• Traverse live objects & mark black. • White objects can be reclaimed.
(Figure: roots (stack, registers) with pointers into the heap. Note! This is not the heap data structure!)
Triggering
20
new(A) =
  if free_list is empty
    mark_sweep()
    if free_list is empty
      return "out of memory"
  pointer = allocate(A)
  return pointer

Garbage collection is triggered by allocation.
Basic Algorithm
21
mark_sweep() =
  for Ptr in Roots
    mark(Ptr)
  sweep()

mark(Obj) =
  if mark_bit(Obj) == unmarked
    mark_bit(Obj) = marked
    for C in Children(Obj)
      mark(C)

sweep() =
  p = Heap_bottom
  while (p < Heap_top)
    if (mark_bit(p) == unmarked) then free(p)
    else mark_bit(p) = unmarked
    p = p + size(p)
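The pseudocode above can be exercised on a toy heap in Java. Obj, markSweep, and demo are illustrative names; a real collector walks raw memory and mark bits, not Java lists.

```java
// Mark-and-sweep sketch over a toy object graph.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MarkSweepDemo {
    static class Obj {
        boolean marked;
        List<Obj> children = new ArrayList<>();
    }

    // mark phase: traverse everything reachable and set the mark bit
    static void mark(Obj o) {
        if (o.marked) return;
        o.marked = true;
        for (Obj c : o.children) mark(c);
    }

    // sweep phase: reclaim unmarked objects, reset mark bits; returns #freed
    static int markSweep(List<Obj> heap, List<Obj> roots) {
        for (Obj r : roots) mark(r);
        int freed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!o.marked) { it.remove(); freed++; }
            else o.marked = false;        // reset for the next collection
        }
        return freed;
    }

    static int demo() {
        Obj a = new Obj(), b = new Obj(), g1 = new Obj(), g2 = new Obj();
        a.children.add(b);
        g1.children.add(g2);              // unreachable cycle: g1 <-> g2
        g2.children.add(g1);
        List<Obj> heap = new ArrayList<>(List.of(a, b, g1, g2));
        return markSweep(heap, List.of(a));   // only a is a root
    }

    public static void main(String[] args) {
        System.out.println(demo());       // 2 objects reclaimed
    }
}
```

The example deliberately includes an unreachable cycle (g1, g2): tracing reclaims it, which naïve reference counting cannot.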
Mark&Sweep Example
(Figure: a heap traced from roots r1 and r2; marked objects survive, unmarked objects are reclaimed.)
Properties of Mark & Sweep
23
• Most popular method today (in a more advanced form).
• Simple.
• Does not move objects, so the heap may fragment.
• Complexity:
  – Mark phase: proportional to the number of live objects (dominant phase).
  – Sweep phase: proportional to the heap size.
• Termination: each pointer is traversed once.
• Various engineering tricks are used to improve performance.
• During the run, objects are allocated and reclaimed.
• Gradually, the heap gets fragmented.
• When space is too fragmented to allocate, a compaction algorithm is used:
  – Move all live objects to the beginning of the heap and update all pointers to reference the new locations.
• Compaction is considered very costly, so we usually attempt to run it infrequently, or only partially.
Mark-Compact
24
The Heap
25
An Example: The Compressor
• A simplistic presentation of the Compressor:
• Go over the heap and compute, for each live object, where it moves to:
  – to the address that is the sum of the live space before it in the heap;
  – save the new locations in a separate table.
• Go over the heap and, for each object:
  – move it to its new location;
  – update all its pointers.
• Why can't we do it all in a single heap pass?
• (In the full algorithm: a succinct table, a very fast first pass, and parallelization.)
26
Mark Compact
• Important parameters of a compaction algorithm:
  – Does it keep the order of objects?
  – Does it use extra space for compactor data structures?
  – How many heap passes does it make?
  – Can it run in parallel on a multi-processor?
• We do not elaborate in this intro.
Copying garbage collection
27
• The heap is partitioned into two parts.
• Part 1 takes all allocations.
• Part 2 is reserved.
• During GC, the collector traces all reachable objects and copies them to the reserved part.
• After copying, the parts' roles are reversed:
  • Allocation activity goes to part 2, which was previously reserved.
  • Part 1, which was active, is reserved till the next collection.
Copying garbage collection
28
(Figure: objects A, B, C, D, E in Part I; Part II is empty; only A and C are reachable from the roots.)
The collection copies…
29
(Figure: the reachable objects A and C have been copied into Part II.)
Roots are updated; Part I reclaimed.
30
(Figure: the roots now point to A and C in Part II; Part I is reclaimed.)
Properties of Copying Collection
31
• Compaction for free.
• Major disadvantage: half of the heap is not used.
• "Touches" only the live objects:
  – Good when most objects are dead.
  – Usually most new objects are dead, so there are methods that use a small space for young objects and collect this space using copying garbage collection.
A very simplistic comparison
|                  | Copying          | Mark & Sweep                | Reference Counting             |
|------------------|------------------|-----------------------------|--------------------------------|
| Complexity       | live objects     | size of heap (live objects) | pointer updates + dead objects |
| Space overhead   | half heap wasted | bit/object + stack for DFS  | count/object + stack for DFS   |
| Compaction       | for free         | additional work             | additional work                |
| Pause time       | long             | long                        | mostly short                   |
| Cycle collection | yes              | yes                         | no                             |
More issues
33
Memory Management with Parallel Processors
• Stop the world
• Parallel (stop-the-world) GC
• Concurrent GC

• Trade-offs in pauses and throughput
• Differences in complexity
• Choose between parallel (longer pauses) and concurrent (lower throughput)
34
Terminology
Stop-the-World
Parallel
Concurrent
On-the-Fly
(Figure: program threads vs. GC threads under each scheme.)
35
Concurrent GC
• Problem: long pauses disturb the user. An important measure of the collection: the length of pauses.
• Can we just run the program while the collector runs on a different thread?
• Not so simple!
• The heap changes while we collect.
• For example:
  – we look at an object B, but before we have a chance to mark its children, the program changes them.
Problem: Interference
SYSTEM = program || GC

(Figure sequence:)
1. GC traced B.
2. The program links C to B.
3. The program unlinks C from A.
4. GC traced A. C was never traced: C is LOST, although it is reachable.
40
Coordination with a write-barrier
• The program notifies the collector that changes happen so that it can mark objects conservatively.
• E.g.: the program registers objects that gain pointers.

update(object y, field f, object x) {
  notifyCollector(x);  // record new referent
  y.f := x;
}

• The collector can then assume all recorded objects are alive (and trace their descendants as live).
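The lost-object race and its fix can be simulated in Java. This is a toy model: update plays the role of the compiler-inserted barrier, and recorded stands in for the collector's buffer.

```java
// Write-barrier sketch: record objects that gain pointers so a concurrent
// mark phase cannot lose them.
import java.util.ArrayList;
import java.util.List;

public class WriteBarrierDemo {
    static class Obj { Obj f; boolean marked; }

    static final List<Obj> recorded = new ArrayList<>();   // collector's buffer

    // the barrier: record the new referent, then install the pointer
    static void update(Obj y, Obj x) {
        if (x != null) recorded.add(x);
        y.f = x;
    }

    static void mark(Obj o) {
        if (o == null || o.marked) return;
        o.marked = true;
        mark(o.f);
    }

    // replays the interference scenario from the slides
    static boolean demo() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.f = c;              // initial heap: A -> C
        mark(b);              // 1. GC traced B (before C was linked to it)
        update(b, c);         // 2. program links C to B (barrier records C)
        update(a, null);      // 3. program unlinks C from A
        mark(a);              // 4. GC traced A; C is not reached through it
        for (Obj o : recorded) mark(o);   // conservative: recorded = alive
        return c.marked;      // C survives despite the race
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the `recorded` pass at the end, `c.marked` would be false: exactly the lost-object scenario shown above.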
Three Typical Write Barriers
1. Record C when C is linked to B.
2. Record C when a link to C is removed.
3. Record B when C is linked to B.
42
Dijkstra on Concurrent GC
It has been surprisingly hard to find the published solution and justification.
It was only too easy to design what looked--sometimes even for weeks and to many people--like a perfectly valid solution, until the effort to prove it to be correct revealed a (sometimes deep) bug.
43
Generational Garbage Collection
“The weak generational hypothesis”: most objects die young.
Using the hypothesis: separate objects according to their ages and collect the area of the young objects more frequently.
44
More Specifically,
• The heap is divided into two or more areas (generations).
• Objects are allocated in the 1st (youngest) generation.
• The youngest generation is collected frequently.
• Objects that survive in the young generation "long enough" are promoted to the old generation.

(Figure: Young Generation | Old Generation)
45
• Short pauses: the young generation is kept small, so most pauses are short.
• Efficiency: collection efforts are concentrated where many dead objects exist.
• Locality:
  • Collector: works mostly on a small part of the heap.
  • Program: allocates (and mostly uses) young objects in a small part of the memory.
Advantages
46
Mark-Sweep or Copying ?
• Copying is good when the live space is small (time) and the heap is small (space).
• A popular choice:
  – Copying for the (small) young generation.
  – Mark-and-sweep for the full collection.
• A small waste of space, high efficiency.
47
Inter-Generational Pointers
• When tracing the young generation, is it enough to trace from the roots through the young generation only?
(Figure: roots pointing into the young and old generations.)
48
Inter-Generational Pointers
• When tracing the young generation, is it enough to trace from the roots through the young generation?
• No! Pointers from the old generation to the young generation may witness the liveness of an object.
(Figure: a pointer from the old generation into the young generation.)
49
Inter-Generational Pointers
• We don't trace the old generation.
  – Why?
• The solution:
  – "Maintain a list" of all inter-generational pointers.
  – Assume (conservatively) that the parent (old) object is alive.
  – Treat these pointers as additional roots.
• "Typically": most pointers are from young to old (few from old to young).
• When collecting the old generation, collect the entire heap.
50
Inter-Generational Pointers
• Inter-generational pointers are created:
  – when objects are promoted to the old generation;
  – when pointers are modified in the old generation.
• The first can be monitored by the collector during promotion.
• The second requires a write barrier.
51
Write Barrier
update(object y, field f, object x) {
  if y is in old space and x is in young
    remember y.f -> x;
  y.f := x;
}

(Figure: old object y with field f pointing to young object x.)

Reference counting also had a write barrier:

update(object y, field f, object x) {
  x.RC++;                        // increment new referent
  y.f^.RC--;                     // decrement old referent
  if (y.f^.RC == 0) collect(y.f);
  y.f := x;
}
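A sketch of the generational barrier in Java, assuming a remembered set of old objects that hold pointers into the young generation. All names (GenBarrierDemo, rememberedSet, update) are illustrative.

```java
// Generational write-barrier sketch: remember old-to-young pointers so a
// young-only collection can treat them as extra roots.
import java.util.HashSet;
import java.util.Set;

public class GenBarrierDemo {
    static class Obj { Obj f; boolean old; boolean marked; }

    // old objects that may point into the young generation
    static final Set<Obj> rememberedSet = new HashSet<>();

    // the barrier: y.f := x, remembering old-to-young stores
    static void update(Obj y, Obj x) {
        if (y.old && x != null && !x.old) rememberedSet.add(y);
        y.f = x;
    }

    static boolean demo() {
        Obj oldObj = new Obj();
        oldObj.old = true;
        Obj young = new Obj();       // reachable ONLY through oldObj
        update(oldObj, young);       // creates an old-to-young pointer

        // young-generation collection: trace remembered-set entries as roots
        for (Obj o : rememberedSet)
            if (o.f != null && !o.f.old) o.f.marked = true;

        return young.marked;         // young survives via the remembered pointer
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the remembered set, the young collection would trace only the young generation and wrongly reclaim `young`.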
52
Modern Memory Management
• Handle parallel platforms.
• Real-time.
• Cache consciousness.
• Handle new platforms (GPU, PCM, …).
• Hardware assistance.
53
Summary: Dynamic Memory Management
• The compiler generates code for allocation of objects and for garbage collection when necessary.
• Reference counting, mark-sweep, mark-compact, copying.
• Concurrent garbage collection.
• Generations:
  – inter-generational pointers, write barrier.
Terminology
54
• Heap, objects
• Allocate, free (deallocate, delete, reclaim)
• Reachable, live, dead, unreachable
• Roots
• Reference counting, mark and sweep, copying, compaction, tracing algorithms
• Fragmentation
• Concurrent GC
• Generational GC
55
Runtime Summary
• Runtime:
  – services that are always there: function calls, memory management, threads, etc.
  – We discussed function calls:
    • scoping rules
    • activation records
    • caller/callee conventions
    • nested procedures (and the display array)
  – Memory management:
    • mark-sweep, copying, reference counting, compaction
    • generations
    • concurrent garbage collection
56
OO Issues
57
Representing Data at Runtime
• Source language types
  – int, boolean, string, object types
• Target language types
  – single bytes, integers, address representation
• The compiler should map source types to some combination of target types
  – implement source types using target types
58
Basic Types
• int, boolean, string, void
• Arithmetic operations
  – addition, subtraction, multiplication, division, remainder
• Can be mapped directly to target language types and operations
59
Pointer Types
• Represent addresses of source language data structures.
• Usually implemented as an unsigned integer.
• Pointer dereferencing retrieves the pointed-to value.
• May produce an error:
  – Null pointer dereference.
  – When is this error triggered?
60
Object Types
• Basic operations
  – Field selection + read/write
    • computing the address of the field, dereferencing the address
  – Copying
    • copy block or field-by-field copying
  – Method invocation
    • identifying the method to be called, calling it
• How does it look at runtime?
61
Object Types
class Foo {
  int x;
  int y;
  void rise() {…}
  void shine() {…}
}

Runtime memory layout for an object of class Foo:
  DispatchVectorPtr
  x
  y

Compile-time information: methods rise, shine.
62
Field Selection
Foo f;
int q;
q = f.x;

MOV f, %EBX          # base pointer
MOV 4(%EBX), %EAX    # field offset (4) from base pointer
MOV %EAX, q

(Object layout as before: DispatchVectorPtr at offset 0, then x, then y.)
63
Object Types - Inheritance
class Foo {
  int x;
  int y;
  void rise() {…}
  void shine() {…}
}

class Bar extends Foo {
  int z;
  void twinkle() {…}
}

Runtime memory layout for an object of class Bar:
  DispatchVectorPtr
  x
  y
  z

Compile-time information: methods rise, shine, twinkle.
64
Object Types - Polymorphism
class Foo { … void rise() {…} void shine() {…} }

class Bar extends Foo { … }

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}

(Figure: f holds a pointer to a Bar object; the pointer to the Foo part inside Bar is the same address. Layout: DVPtr, x, y, z.)
65
Static & Dynamic Binding
• Which "rise" should main() use?
• Static binding: f is of type Foo and therefore it always refers to Foo's rise.
• Dynamic binding: f points to a Bar object now, so it refers to Bar's rise.

class Foo { … void rise() {…} void shine() {…} }

class Bar extends Foo { void rise() {…} }

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}
66
Typically, Dynamic Binding is used
• Finding the right method implementation at runtime according to the object's type
• Using the Dispatch Vector (a.k.a. Dispatch Table)

class Foo { … void rise() {…} void shine() {…} }

class Bar extends Foo { void rise() {…} }

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}
67
Dispatch Vectors in Depth
• The vector contains the addresses of the methods.
• It is indexed by method-id number.
• A method signature has the same id number in all subclasses (here rise = 0, shine = 1).

(Figure: f points to a Bar object whose DVPtr references Bar's dispatch vector; slot 0 (rise) holds Bar's code, slot 1 (shine) holds Foo's code. The call f.rise() goes through Bar's dispatch table.)
68
Dispatch Vectors in Depth

class Main {
  void main() {
    Foo f = new Foo();
    f.rise();
  }
}

(Figure: f points to a Foo object whose DVPtr references Foo's dispatch vector; slot 0 (rise) and slot 1 (shine) both hold Foo's code. The call f.rise() goes through Foo's dispatch table.)
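Dispatch vectors can be simulated in Java by representing each class as an array of method implementations indexed by method id. Method, Obj, and the _DV arrays are illustrative names, not how a real JVM stores vtables.

```java
// Dispatch-vector simulation: same method id = same slot in every subclass.
public class DispatchDemo {
    interface Method { String call(String self); }

    // method ids: rise = 0, shine = 1 in Foo and in all its subclasses
    static final int RISE = 0, SHINE = 1;

    // each "class" is a dispatch vector: an array of method implementations
    static final Method[] FOO_DV = {
        self -> "Foo.rise(" + self + ")",
        self -> "Foo.shine(" + self + ")"
    };
    static final Method[] BAR_DV = {
        self -> "Bar.rise(" + self + ")",   // Bar overrides rise (slot 0)
        FOO_DV[SHINE]                       // shine is inherited from Foo
    };

    // an "object" carries a pointer to its class's dispatch vector
    static class Obj {
        final Method[] dv;
        final String name;
        Obj(Method[] dv, String name) { this.dv = dv; this.name = name; }
        String invoke(int methodId) { return dv[methodId].call(name); }
    }

    static String demo() {
        Obj f = new Obj(BAR_DV, "f");   // Foo f = new Bar();
        return f.invoke(RISE);          // dynamic binding picks Bar's rise
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The call site only knows the slot number (RISE); which code runs depends on the vector the object carries, which is exactly dynamic binding.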
Multiple Inheritance
69
class C {
  field c1; field c2;
  void m1() {…}
  void m2() {…}
}
class D {
  field d1;
  void m3() {…}
  void m4() {…}
}
class E extends C, D {
  field e1;
  void m2() {…}
  void m4() {…}
  void m5() {…}
}

E-object layout:
  DVPtr   (pointer to E = pointer to C inside E; dispatch vector: m1_C_C, m2_C_E)
  c1
  c2
  DVPtr   (pointer to D inside E; dispatch vector: m3_D_D, m4_D_E, m5_E_E)
  d1
  e1

supertyping:
  convert_ptr_to_E_to_ptr_to_C(e) = e
  convert_ptr_to_E_to_ptr_to_D(e) = e + sizeof(class C)
subtyping:
  convert_ptr_to_C_to_ptr_to_E(e) = e
  convert_ptr_to_D_to_ptr_to_E(e) = e - sizeof(class C)
70
Runtime checks
• Generate code for checking attempted illegal operations:
  – Null pointer check
  – Array bounds check
  – Array allocation size check
  – Division by zero
  – …
• If a check fails, jump to error handler code that prints a message and gracefully exits the program.
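What the generated checks do can be mirrored in Java source. The names are illustrative, and here fail throws an exception instead of printing and exiting the way the generated handler (__println/__exit) does.

```java
// Source-level model of the runtime checks a compiler emits.
public class RuntimeChecksDemo {
    // checks performed before an array load
    static int load(int[] arr, int index) {
        if (arr == null) fail("null pointer dereference");
        if (index < 0 || index >= arr.length) fail("array index out of bounds");
        return arr[index];
    }

    // check performed before an array allocation
    static int[] allocArray(int size) {
        if (size <= 0) fail("illegal array allocation size");
        return new int[size];
    }

    // stands in for the single generated handler: report and stop
    static void fail(String msg) {
        throw new RuntimeException("runtime error: " + msg);
    }

    static boolean demo() {
        try {
            load(new int[3], 5);     // out-of-bounds access is trapped
            return false;
        } catch (RuntimeException e) {
            return e.getMessage().contains("bounds");
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the generated assembly below, the same logic appears as compare-and-branch sequences that jump to one shared handler per check kind.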
71
Null pointer check
# null pointer check
cmp $0, %eax
je labelNPE

labelNPE:
  push $strNPE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.
72
Array bounds check
# array bounds check
mov -4(%eax), %ebx   # ebx = length
                     # ecx holds the index
cmp %ecx, %ebx
jle labelABE         # ebx <= ecx ?
cmp $0, %ecx
jl labelABE          # ecx < 0 ?

labelABE:
  push $strABE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.
73
Array allocation size check
# array size check
cmp $0, %eax    # eax == array size
jle labelASE    # eax <= 0 ?

labelASE:
  push $strASE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.
74
Exceptions
(Figure: call stack main → foo → bar; an exception thrown in bar propagates down the stack until a handler environment (env) is found.)
Exception example:

org.eclipse.swt.SWTException: Graphic is disposed
  at org.eclipse.swt.SWT.error(SWT.java:3744)
  at org.eclipse.swt.SWT.error(SWT.java:3662)
  at org.eclipse.swt.SWT.error(SWT.java:3633)
  at org.eclipse.swt.graphics.GC.getClipping(GC.java:2266)
  at com.aelitis.azureus.ui.swt.views.list.ListRow.doPaint(ListRow.java:260)
  at com.aelitis.azureus.ui.swt.views.list.ListRow.doPaint(ListRow.java:237)
  at com.aelitis.azureus.ui.swt.views.list.ListView.handleResize(ListView.java:867)
  at com.aelitis.azureus.ui.swt.views.list.ListView$5$2.runSupport(ListView.java:406)
  at org.gudy.azureus2.core3.util.AERunnable.run(AERunnable.java:38)
  at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
  at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:130)
  at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3323)
  at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2985)
  at org.gudy.azureus2.ui.swt.mainwindow.SWTThread.<init>(SWTThread.java:183)
  at org.gudy.azureus2.ui.swt.mainwindow.SWTThread.createInstance(SWTThread.java:67)
  …
76
Recap
• Lexical analysis
  – regular expressions identify tokens ("words")
• Syntax analysis
  – context-free grammars identify the structure of the program ("sentences")
• Contextual (semantic) analysis
  – type checking defined via typing judgements
  – can be encoded via attribute grammars
  – syntax-directed translation
• Intermediate representation
  – many possible IRs; generation of intermediate representation; 3AC; backpatching
• Runtime:
  – services that are always there: function calls, memory management, threads, etc.
77
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
float position;
float initial;
float rate;
position = initial + rate * 60
<float> <ID,position> <;> <float> <ID,initial> <;> <float> <ID,rate> <;> <ID,1> <=> <ID,2> <+> <ID,3> <*> <60>
TokenStream
78
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
<ID,1> <=> <ID,2> <+> <ID,3> <*> <60>
AST:
  =
  ├ <id,1>
  └ +
    ├ <id,2>
    └ *
      ├ <id,3>
      └ 60
symbol table:

| id | symbol   | type  | data |
|----|----------|-------|------|
| 1  | position | float | …    |
| 2  | initial  | float | …    |
| 3  | rate     | float | …    |
S → ID = E
E → ID | E + E | E * E | NUM
79
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
AST before and after semantic analysis. In the output AST, the literal 60 under the * node is wrapped in an inttofloat node:

  =
  ├ <id,1>
  └ +
    ├ <id,2>
    └ *
      ├ <id,3>
      └ inttofloat(60)
coercion: automatic conversion from int to floatinserted by the compiler
symbol table:

| id | symbol   | type  |
|----|----------|-------|
| 1  | position | float |
| 2  | initial  | float |
| 3  | rate     | float |
80
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
3AC
| production          | semantic rule                                                                               |
|---------------------|---------------------------------------------------------------------------------------------|
| S → id = E          | S.code := E.code \|\| gen(id.var ':=' E.var)                                                |
| E → E1 op E2        | E.var := freshVar(); E.code := E1.code \|\| E2.code \|\| gen(E.var ':=' E1.var 'op' E2.var) |
| E → inttofloat(num) | E.var := freshVar(); E.code := gen(E.var ':=' inttofloat(num))                              |
| E → id              | E.var := id.var; E.code := ''                                                               |
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
(For brevity, the bubbles show only the code generated by each node, not the entire accumulated "code" attribute.)

Note the structure: translate E1, translate E2, then handle the operator.
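The translation scheme above can be sketched in Java, assuming expression trees are nested {left, "op", right} Object arrays with variable-name strings at the leaves (the inttofloat rule is omitted; gen and demo are illustrative names).

```java
// Syntax-directed 3AC generation for S -> id = E, E -> E1 op E2 | id.
import java.util.ArrayList;
import java.util.List;

public class ThreeACDemo {
    static int temp = 0;
    static final List<String> code = new ArrayList<>();

    // returns E.var; appends E.code to the output list as a side effect
    static String gen(Object node) {
        if (node instanceof String) return (String) node;   // E -> id: E.var := id.var
        Object[] n = (Object[]) node;                       // {E1, "op", E2}
        String l = gen(n[0]);                               // translate E1
        String r = gen(n[2]);                               // translate E2
        String t = "t" + (++temp);                          // E.var := freshVar()
        code.add(t + " = " + l + " " + n[1] + " " + r);     // handle the operator
        return t;
    }

    static List<String> demo() {
        // id1 = id2 + id3 * 60
        Object[] mul = { "id3", "*", "60" };
        Object[] add = { "id2", "+", mul };
        code.add("id1 = " + gen(add));                      // S -> id = E
        return code;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
        // t1 = id3 * 60
        // t2 = id2 + t1
        // id1 = t2
    }
}
```

The post-order recursion makes the E1/E2/operator structure of the rules visible: inner operators emit their temporaries before the enclosing ones.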
81
Journey inside a compiler
Inter.Rep.
Code Gen.
LexicalAnalysi
s
Syntax Analysi
s
Sem.Analysi
s
3AC:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Optimized:

t1 = id3 * 60.0
id1 = id2 + t1

• The value is known at compile time, so code can be generated with the converted value.
• The temporary t3 is eliminated.
82
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
Optimized:

t1 = id3 * 60.0
id1 = id2 + t1

Code Gen:

LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
83
Problem 3.8 from [Appel]
A simple left-recursive grammar:

E → E + id
E → id

A simple right-recursive grammar accepting the same language:

E → id + E
E → id
Which has better behavior for shift-reduce parsing?
84
Answer
Left recursive:

E → E + id
E → id

Input: id+id+id+id+id

Stack contents as parsing proceeds:

id        (reduce E → id)
E
E +
E + id    (reduce E → E + id)
E
E +
E + id    (reduce)
E
…
E

The stack never has more than three items on it. In general, with LR parsing of left-recursive grammars, an input string of length O(n) requires only O(1) space on the stack.
85
Answer
Right recursive:

E → id + E
E → id

Input: id+id+id+id+id

Stack contents as parsing proceeds:

id
id +
id + id
id + id +
…
id + id + id + id + id    (reduce E → id)
id + id + id + id + E     (reduce E → id + E)
id + id + id + E          (reduce)
id + id + E               (reduce)
id + E                    (reduce)
E

The stack grows as large as the input string. In general, with LR parsing of right-recursive grammars, an input string of length O(n) requires O(n) space on the stack.
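The two stack behaviors can be simulated by counting shifts and reduces. This is a sketch, not a parser: leftRecursiveDepth models reducing eagerly after each id, rightRecursiveDepth models shifting the entire input before any reduction is possible.

```java
// Max shift-reduce stack depth when parsing id+id+...+id (n ids).
public class StackDepthDemo {
    // E -> E + id | id : a reduce is possible after every id
    static int leftRecursiveDepth(int n) {
        int depth = 0, max = 0;
        for (int i = 0; i < n; i++) {
            if (i > 0) depth++;          // shift '+'
            depth++;                     // shift id
            max = Math.max(max, depth);
            depth = 1;                   // reduce E + id -> E (or id -> E)
        }
        return max;
    }

    // E -> id + E | id : no reduce until the last id has been shifted
    static int rightRecursiveDepth(int n) {
        return 2 * n - 1;                // all n ids and n-1 '+'s are on the stack
    }

    public static void main(String[] args) {
        System.out.println(leftRecursiveDepth(5));    // stays constant at 3
        System.out.println(rightRecursiveDepth(5));   // grows with the input: 9
    }
}
```

For any n, the left-recursive simulation tops out at 3 while the right-recursive one grows linearly, matching the O(1) vs. O(n) claim above.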