1
Theory of Compilation 236360
Erez Petrank
Lecture 9: Runtime Part II.
2
Runtime Environment
• Mediates between the OS and the programming language
• Hides details of the machine from the programmer
  – Ranges from simple support functions all the way to a full-fledged virtual machine
• Handles common tasks
  – Runtime stack (activation records)
  – Memory management
  – JIT (dynamic optimization)
  – Debugging
  – …
3
Dynamic Memory Management: Introduction
There is a course about this topic: 236780 “Algorithms for dynamic memory management”
4
Static and Dynamic Variables
• Local (stack) variables are defined in a method and allocated on the runtime stack.
• Dynamic allocation:
  • In C, "malloc" allocates space and "free" declares that the program will not use this space anymore.

Ptr = malloc(256);
/* use Ptr */
free(Ptr);
5
Dynamic Memory Allocation
• In Java, "new" allocates an object of a given class.
  – President obama = new President();
• But there is no instruction for manually deleting the object.
• Objects are reclaimed by a garbage collector when the program "does not need them" anymore.

course c = new course(236360);
c.classroom = "TAUB 2";
Faculty.add(c);
6
Manual Vs. Automatic Memory Management
• Manual memory management lets the programmer decide when objects are deleted.
• A memory manager that lets a garbage collector delete objects is called automatic.
• Manual memory management creates severe debugging problems:
  – Memory leaks,
  – Dangling pointers.
• Considered the BIG debugging problem of the '80s.
7
Automatic Memory Reclamation
• An object is reclaimed when the program has “no way of accessing it”.
• Formally, when it is unreachable by a path of pointers from the "root" pointers, to which the program has direct access:
  – local variables, pointers on the stack, global (class) pointers, JNI pointers, etc.
What’s good about automatic “garbage collection”?
© Erez Petrank 8
• Software engineering:
  – Relieves users of the book-keeping burden.
  – Stronger reliability, fewer bugs, faster debugging.
  – Code understandable and reliable. (Less interaction between modules.)
• Security (Java):
  – Program never gets a pointer to "play with".
9
Importance
• Memory is the bottleneck in modern computation. – Time & energy (and space).
• Optimal allocation (even if all accesses are known in advance to the allocator) is NP-Complete (to even approximate).
• Must be done right for a program to run efficiently.
• Must be done right to ensure reliability.
GC and languages
10
• Sometimes it's built in:
  – LISP, Java, C#.
  – The user cannot free an object.
• Sometimes it's an added feature:
  – C, C++.
  – The user can choose to free objects or not. The collector frees all objects not freed by the user.
• Most modern languages are supported by garbage collection.
11
Most modern languages rely on GC
Source: “The Garbage Collection Handbook” by Richard Jones, Anthony Hosking, and Eliot Moss.
What’s bad about automatic “garbage collection”?
12
• It has a cost:
  – Old Lisp systems: 40%.
  – Today's Java programs (if the collection is done "right"): 5-15%.
• Considered a major factor determining program efficiency.
• Techniques have evolved since the 60’s. We will only go over basic techniques.
13
Garbage Collection Efficiency
• Overall collection overheads (program throughput).
• Pauses in program run.
• Space overhead.
• Cache locality (efficiency and energy).
14
Three classical algorithms
• Reference counting
• Mark and sweep (and mark-compact)
• Copying.
• The last two are also called tracing algorithms because they go over (trace) all reachable objects.
Reference counting
15
• Recall that we would like to know if an object is reachable from the roots.
• Associate a reference count field with each object: how many pointers reference this object.
• When nothing points to an object, it can be deleted.
• Very simple, used in many systems.
Basic Reference Counting
16
• Each object has an RC field, new objects get o.RC:=1.
• When p that points to o1 is modified to point to o2, we execute: o1.RC--, o2.RC++.
• If o1.RC == 0:
  – Delete o1.
  – Decrement o.RC for each "child" o of o1.
  – Recursively delete objects whose RC is decremented to 0.
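The counting rules above can be sketched in Java. This is a toy model, not a real collector: RCObject, decRef, and demo are illustrative names, and the "heap" is just ordinary Java objects.

```java
// Naïve reference counting sketch: RC field, decrement, recursive deletion.
import java.util.ArrayList;
import java.util.List;

public class RefCountDemo {
    static class RCObject {
        int rc = 1;                               // new objects get RC := 1
        List<RCObject> children = new ArrayList<>();
        boolean freed = false;
    }

    // a pointer to o is dropped; delete o when its count hits zero
    static void decRef(RCObject o) {
        if (o != null && --o.rc == 0) delete(o);
    }

    // delete o and decrement the RC of all its children, recursively
    static void delete(RCObject o) {
        o.freed = true;
        for (RCObject c : o.children) decRef(c);
    }

    static boolean demo() {
        RCObject a = new RCObject(), b = new RCObject();
        a.children.add(b);
        b.rc++;                 // a now also references b
        decRef(b);              // root drops b: RC 2 -> 1, still live via a
        boolean bAlive = !b.freed;
        decRef(a);              // root drops a: frees a, then b recursively
        return bAlive && a.freed && b.freed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note how dropping the last root pointer to `a` cascades: `a` is freed, its child's count drops to zero, and `b` is freed in turn.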
A Problem: Cycles
17
• The Reference counting algorithm does not reclaim cycles!
• Solution 1: ignore cycles, they do not appear frequently in modern programs.
• Solution 2: run tracing algorithms (that can reclaim cycles) infrequently.
• Solution 3: designated algorithms for cycle collection.
• Another problem for the naïve algorithm: requires a lot of synchronization in parallel programs.
• Advanced versions solve that.
The Mark-and-Sweep Algorithm
18
• Mark phase:
  – Start from the roots and traverse all objects reachable by a path of pointers.
  – Mark all traversed objects.
• Sweep phase:
  – Go over all objects in the heap.
  – Reclaim objects that are not marked.
The Mark-Sweep algorithm
19
• Traverse live objects & mark black. • White objects can be reclaimed.
(Figure: roots (stack, registers) with pointers into the heap. Note! This is not the heap data structure!)
Triggering
20
new(A) =
  if free_list is empty
    mark_sweep()
    if free_list is empty
      return "out of memory"
  pointer = allocate(A)
  return pointer

Garbage collection is triggered by allocation.
Basic Algorithm
21
mark_sweep() =
  for Ptr in Roots
    mark(Ptr)
  sweep()

mark(Obj) =
  if mark_bit(Obj) == unmarked
    mark_bit(Obj) = marked
    for C in Children(Obj)
      mark(C)

sweep() =
  p = Heap_bottom
  while (p < Heap_top)
    if (mark_bit(p) == unmarked) then free(p)
    else mark_bit(p) = unmarked
    p = p + size(p)
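The pseudocode above can be exercised on a toy heap in Java. Obj, markSweep, and demo are illustrative names; a real collector walks raw memory and mark bits, not Java lists.

```java
// Mark-and-sweep sketch over a toy object graph.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MarkSweepDemo {
    static class Obj {
        boolean marked;
        List<Obj> children = new ArrayList<>();
    }

    // mark phase: traverse everything reachable and set the mark bit
    static void mark(Obj o) {
        if (o.marked) return;
        o.marked = true;
        for (Obj c : o.children) mark(c);
    }

    // sweep phase: reclaim unmarked objects, reset mark bits; returns #freed
    static int markSweep(List<Obj> heap, List<Obj> roots) {
        for (Obj r : roots) mark(r);
        int freed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!o.marked) { it.remove(); freed++; }
            else o.marked = false;        // reset for the next collection
        }
        return freed;
    }

    static int demo() {
        Obj a = new Obj(), b = new Obj(), g1 = new Obj(), g2 = new Obj();
        a.children.add(b);
        g1.children.add(g2);              // unreachable cycle: g1 <-> g2
        g2.children.add(g1);
        List<Obj> heap = new ArrayList<>(List.of(a, b, g1, g2));
        return markSweep(heap, List.of(a));   // only a is a root
    }

    public static void main(String[] args) {
        System.out.println(demo());       // 2 objects reclaimed
    }
}
```

The example deliberately includes an unreachable cycle (g1, g2): tracing reclaims it, which naïve reference counting cannot.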
Mark&Sweep Example
(Figure: a heap traced from roots r1 and r2; marked objects survive, unmarked objects are reclaimed.)
Properties of Mark & Sweep
23
• Most popular method today (in a more advanced form).
• Simple.
• Does not move objects, so the heap may fragment.
• Complexity:
  – Mark phase: proportional to the number of live objects (dominant phase).
  – Sweep phase: proportional to the heap size.
• Termination: each pointer is traversed once.
• Various engineering tricks are used to improve performance.
• During the run, objects are allocated and reclaimed.
• Gradually, the heap gets fragmented.
• When space is too fragmented to allocate, a compaction algorithm is used:
  – Move all live objects to the beginning of the heap and update all pointers to reference the new locations.
• Compaction is considered very costly, so we usually attempt to run it infrequently, or only partially.
Mark-Compact
24
The Heap
25
An Example: The Compressor
• A simplistic presentation of the Compressor:
• Go over the heap and compute, for each live object, where it moves to:
  – to the address that is the sum of the live space before it in the heap;
  – save the new locations in a separate table.
• Go over the heap and, for each object:
  – move it to its new location;
  – update all its pointers.
• Why can't we do it all in a single heap pass?
• (In the full algorithm: a succinct table, a very fast first pass, and parallelization.)
26
Mark Compact
• Important parameters of a compaction algorithm:
  – Does it keep the order of objects?
  – Does it use extra space for compactor data structures?
  – How many heap passes does it make?
  – Can it run in parallel on a multi-processor?
• We do not elaborate in this intro.
Copying garbage collection
27
• The heap is partitioned into two parts.
• Part 1 takes all allocations.
• Part 2 is reserved.
• During GC, the collector traces all reachable objects and copies them to the reserved part.
• After copying, the parts' roles are reversed:
  • Allocation activity goes to part 2, which was previously reserved.
  • Part 1, which was active, is reserved till the next collection.
Copying garbage collection
28
(Figure: objects A, B, C, D, E in Part I; Part II is empty; only A and C are reachable from the roots.)
The collection copies…
29
(Figure: the reachable objects A and C have been copied into Part II.)
Roots are updated; Part I reclaimed.
30
(Figure: the roots now point to A and C in Part II; Part I is reclaimed.)
Properties of Copying Collection
31
• Compaction for free.
• Major disadvantage: half of the heap is not used.
• "Touches" only the live objects:
  – Good when most objects are dead.
  – Usually most new objects are dead, so there are methods that use a small space for young objects and collect this space using copying garbage collection.
A very simplistic comparison
|                  | Copying          | Mark & Sweep                | Reference Counting             |
|------------------|------------------|-----------------------------|--------------------------------|
| Complexity       | live objects     | size of heap (live objects) | pointer updates + dead objects |
| Space overhead   | half heap wasted | bit/object + stack for DFS  | count/object + stack for DFS   |
| Compaction       | for free         | additional work             | additional work                |
| Pause time       | long             | long                        | mostly short                   |
| Cycle collection | yes              | yes                         | no                             |
More issues
33
Memory Management with Parallel Processors
• Stop the world
• Parallel (stop-the-world) GC
• Concurrent GC

• Trade-offs in pauses and throughput
• Differences in complexity
• Choose between parallel (longer pauses) and concurrent (lower throughput)
34
Terminology
Stop-the-World
Parallel
Concurrent
On-the-Fly
(Figure: program threads vs. GC threads under each scheme.)
35
Concurrent GC
• Problem: long pauses disturb the user. An important measure of the collection: the length of pauses.
• Can we just run the program while the collector runs on a different thread?
• Not so simple!
• The heap changes while we collect.
• For example:
  – we look at an object B, but before we have a chance to mark its children, the program changes them.
Problem: Interference
SYSTEM = program || GC

(Figure sequence:)
1. GC traced B.
2. The program links C to B.
3. The program unlinks C from A.
4. GC traced A. C was never traced: C is LOST, although it is reachable.
40
Coordination with a write-barrier
• The program notifies the collector that changes happen so that it can mark objects conservatively.
• E.g.: the program registers objects that gain pointers.

update(object y, field f, object x) {
  notifyCollector(x);  // record new referent
  y.f := x;
}

• The collector can then assume all recorded objects are alive (and trace their descendants as live).
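The lost-object race and its fix can be simulated in Java. This is a toy model: update plays the role of the compiler-inserted barrier, and recorded stands in for the collector's buffer.

```java
// Write-barrier sketch: record objects that gain pointers so a concurrent
// mark phase cannot lose them.
import java.util.ArrayList;
import java.util.List;

public class WriteBarrierDemo {
    static class Obj { Obj f; boolean marked; }

    static final List<Obj> recorded = new ArrayList<>();   // collector's buffer

    // the barrier: record the new referent, then install the pointer
    static void update(Obj y, Obj x) {
        if (x != null) recorded.add(x);
        y.f = x;
    }

    static void mark(Obj o) {
        if (o == null || o.marked) return;
        o.marked = true;
        mark(o.f);
    }

    // replays the interference scenario from the slides
    static boolean demo() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.f = c;              // initial heap: A -> C
        mark(b);              // 1. GC traced B (before C was linked to it)
        update(b, c);         // 2. program links C to B (barrier records C)
        update(a, null);      // 3. program unlinks C from A
        mark(a);              // 4. GC traced A; C is not reached through it
        for (Obj o : recorded) mark(o);   // conservative: recorded = alive
        return c.marked;      // C survives despite the race
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the `recorded` pass at the end, `c.marked` would be false: exactly the lost-object scenario shown above.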
Three Typical Write Barriers
1. Record C when C is linked to B.
2. Record C when a link to C is removed.
3. Record B when C is linked to B.
42
Dijkstra on Concurrent GC
It has been surprisingly hard to find the published solution and justification.
It was only too easy to design what looked--sometimes even for weeks and to many people--like a perfectly valid solution, until the effort to prove it to be correct revealed a (sometimes deep) bug.
43
Generational Garbage Collection
“The weak generational hypothesis”: most objects die young.
Using the hypothesis: separate objects according to their ages and collect the area of the young objects more frequently.
44
More Specifically,
• The heap is divided into two or more areas (generations).
• Objects are allocated in the 1st (youngest) generation.
• The youngest generation is collected frequently.
• Objects that survive in the young generation "long enough" are promoted to the old generation.

(Figure: Young Generation | Old Generation)
45
• Short pauses: the young generation is kept small, so most pauses are short.
• Efficiency: collection efforts are concentrated where many dead objects exist.
• Locality:
  • Collector: works mostly on a small part of the heap.
  • Program: allocates (and mostly uses) young objects in a small part of the memory.
Advantages
46
Mark-Sweep or Copying ?
• Copying is good when the live space is small (time) and the heap is small (space).
• A popular choice:
  – Copying for the (small) young generation.
  – Mark-and-sweep for the full collection.
• A small waste of space, high efficiency.
47
Inter-Generational Pointers
• When tracing the young generation, is it enough to trace from the roots through the young generation only?
(Figure: roots pointing into the young and old generations.)
48
Inter-Generational Pointers
• When tracing the young generation, is it enough to trace from the roots through the young generation?
• No! Pointers from the old generation to the young generation may witness the liveness of an object.
(Figure: a pointer from the old generation into the young generation.)
49
Inter-Generational Pointers
• We don't trace the old generation.
  – Why?
• The solution:
  – "Maintain a list" of all inter-generational pointers.
  – Assume (conservatively) that the parent (old) object is alive.
  – Treat these pointers as additional roots.
• "Typically": most pointers are from young to old (few from old to young).
• When collecting the old generation, collect the entire heap.
50
Inter-Generational Pointers
• Inter-generational pointers are created:
  – when objects are promoted to the old generation;
  – when pointers are modified in the old generation.
• The first can be monitored by the collector during promotion.
• The second requires a write barrier.
51
Write Barrier
update(object y, field f, object x) {
  if y is in old space and x is in young
    remember y.f -> x;
  y.f := x;
}

(Figure: old object y with field f pointing to young object x.)

Reference counting also had a write barrier:

update(object y, field f, object x) {
  x.RC++;                        // increment new referent
  y.f^.RC--;                     // decrement old referent
  if (y.f^.RC == 0) collect(y.f);
  y.f := x;
}
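A sketch of the generational barrier in Java, assuming a remembered set of old objects that hold pointers into the young generation. All names (GenBarrierDemo, rememberedSet, update) are illustrative.

```java
// Generational write-barrier sketch: remember old-to-young pointers so a
// young-only collection can treat them as extra roots.
import java.util.HashSet;
import java.util.Set;

public class GenBarrierDemo {
    static class Obj { Obj f; boolean old; boolean marked; }

    // old objects that may point into the young generation
    static final Set<Obj> rememberedSet = new HashSet<>();

    // the barrier: y.f := x, remembering old-to-young stores
    static void update(Obj y, Obj x) {
        if (y.old && x != null && !x.old) rememberedSet.add(y);
        y.f = x;
    }

    static boolean demo() {
        Obj oldObj = new Obj();
        oldObj.old = true;
        Obj young = new Obj();       // reachable ONLY through oldObj
        update(oldObj, young);       // creates an old-to-young pointer

        // young-generation collection: trace remembered-set entries as roots
        for (Obj o : rememberedSet)
            if (o.f != null && !o.f.old) o.f.marked = true;

        return young.marked;         // young survives via the remembered pointer
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the remembered set, the young collection would trace only the young generation and wrongly reclaim `young`.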
52
Modern Memory Management
• Handle parallel platforms.
• Real-time.
• Cache consciousness.
• Handle new platforms (GPU, PCM, …).
• Hardware assistance.
53
Summary: Dynamic Memory Management
• The compiler generates code for allocation of objects and for garbage collection when necessary.
• Reference counting, mark-sweep, mark-compact, copying.
• Concurrent garbage collection.
• Generations:
  – inter-generational pointers, write barrier.
Terminology
54
• Heap, objects
• Allocate, free (deallocate, delete, reclaim)
• Reachable, live, dead, unreachable
• Roots
• Reference counting, mark and sweep, copying, compaction, tracing algorithms
• Fragmentation
• Concurrent GC
• Generational GC
55
Runtime Summary
• Runtime:
  – services that are always there: function calls, memory management, threads, etc.
  – We discussed function calls:
    • scoping rules
    • activation records
    • caller/callee conventions
    • nested procedures (and the display array)
  – Memory management:
    • mark-sweep, copying, reference counting, compaction
    • generations
    • concurrent garbage collection
56
OO Issues
57
Representing Data at Runtime
• Source language types
  – int, boolean, string, object types
• Target language types
  – single bytes, integers, address representation
• The compiler should map source types to some combination of target types
  – implement source types using target types
58
Basic Types
• int, boolean, string, void
• Arithmetic operations
  – addition, subtraction, multiplication, division, remainder
• Can be mapped directly to target language types and operations
59
Pointer Types
• Represent addresses of source language data structures.
• Usually implemented as an unsigned integer.
• Pointer dereferencing retrieves the pointed-to value.
• May produce an error:
  – Null pointer dereference.
  – When is this error triggered?
60
Object Types
• Basic operations
  – Field selection + read/write
    • computing the address of the field, dereferencing the address
  – Copying
    • copy block or field-by-field copying
  – Method invocation
    • identifying the method to be called, calling it
• How does it look at runtime?
61
Object Types
class Foo {
  int x;
  int y;
  void rise() {…}
  void shine() {…}
}

Runtime memory layout for an object of class Foo:
  DispatchVectorPtr
  x
  y

Compile-time information: methods rise, shine.
62
Field Selection
Foo f;
int q;
q = f.x;

MOV f, %EBX          # base pointer
MOV 4(%EBX), %EAX    # field offset (4) from base pointer
MOV %EAX, q

(Object layout as before: DispatchVectorPtr at offset 0, then x, then y.)
63
Object Types - Inheritance
class Foo {
  int x;
  int y;
  void rise() {…}
  void shine() {…}
}

class Bar extends Foo {
  int z;
  void twinkle() {…}
}

Runtime memory layout for an object of class Bar:
  DispatchVectorPtr
  x
  y
  z

Compile-time information: methods rise, shine, twinkle.
64
Object Types - Polymorphism
class Foo { … void rise() {…} void shine() {…} }

class Bar extends Foo { … }

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}

(Figure: f holds a pointer to a Bar object; the pointer to the Foo part inside Bar is the same address. Layout: DVPtr, x, y, z.)
65
Static & Dynamic Binding
• Which "rise" should main() use?
• Static binding: f is of type Foo and therefore it always refers to Foo's rise.
• Dynamic binding: f points to a Bar object now, so it refers to Bar's rise.

class Foo { … void rise() {…} void shine() {…} }

class Bar extends Foo { void rise() {…} }

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}
66
Typically, Dynamic Binding is used
• Finding the right method implementation at runtime according to the object's type
• Using the Dispatch Vector (a.k.a. Dispatch Table)

class Foo { … void rise() {…} void shine() {…} }

class Bar extends Foo { void rise() {…} }

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}
67
Dispatch Vectors in Depth
• The vector contains the addresses of the methods.
• It is indexed by method-id number.
• A method signature has the same id number in all subclasses (here rise = 0, shine = 1).

(Figure: f points to a Bar object whose DVPtr references Bar's dispatch vector; slot 0 (rise) holds Bar's code, slot 1 (shine) holds Foo's code. The call f.rise() goes through Bar's dispatch table.)
68
Dispatch Vectors in Depth

class Main {
  void main() {
    Foo f = new Foo();
    f.rise();
  }
}

(Figure: f points to a Foo object whose DVPtr references Foo's dispatch vector; slot 0 (rise) and slot 1 (shine) both hold Foo's code. The call f.rise() goes through Foo's dispatch table.)
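Dispatch vectors can be simulated in Java by representing each class as an array of method implementations indexed by method id. Method, Obj, and the _DV arrays are illustrative names, not how a real JVM stores vtables.

```java
// Dispatch-vector simulation: same method id = same slot in every subclass.
public class DispatchDemo {
    interface Method { String call(String self); }

    // method ids: rise = 0, shine = 1 in Foo and in all its subclasses
    static final int RISE = 0, SHINE = 1;

    // each "class" is a dispatch vector: an array of method implementations
    static final Method[] FOO_DV = {
        self -> "Foo.rise(" + self + ")",
        self -> "Foo.shine(" + self + ")"
    };
    static final Method[] BAR_DV = {
        self -> "Bar.rise(" + self + ")",   // Bar overrides rise (slot 0)
        FOO_DV[SHINE]                       // shine is inherited from Foo
    };

    // an "object" carries a pointer to its class's dispatch vector
    static class Obj {
        final Method[] dv;
        final String name;
        Obj(Method[] dv, String name) { this.dv = dv; this.name = name; }
        String invoke(int methodId) { return dv[methodId].call(name); }
    }

    static String demo() {
        Obj f = new Obj(BAR_DV, "f");   // Foo f = new Bar();
        return f.invoke(RISE);          // dynamic binding picks Bar's rise
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The call site only knows the slot number (RISE); which code runs depends on the vector the object carries, which is exactly dynamic binding.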
Multiple Inheritance
69
class C {
  field c1; field c2;
  void m1() {…}
  void m2() {…}
}
class D {
  field d1;
  void m3() {…}
  void m4() {…}
}
class E extends C, D {
  field e1;
  void m2() {…}
  void m4() {…}
  void m5() {…}
}

E-object layout:
  DVPtr   (pointer to E = pointer to C inside E; dispatch vector: m1_C_C, m2_C_E)
  c1
  c2
  DVPtr   (pointer to D inside E; dispatch vector: m3_D_D, m4_D_E, m5_E_E)
  d1
  e1

supertyping:
  convert_ptr_to_E_to_ptr_to_C(e) = e
  convert_ptr_to_E_to_ptr_to_D(e) = e + sizeof(class C)
subtyping:
  convert_ptr_to_C_to_ptr_to_E(e) = e
  convert_ptr_to_D_to_ptr_to_E(e) = e - sizeof(class C)
70
Runtime checks
• Generate code for checking attempted illegal operations:
  – Null pointer check
  – Array bounds check
  – Array allocation size check
  – Division by zero
  – …
• If a check fails, jump to error handler code that prints a message and gracefully exits the program.
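What the generated checks do can be mirrored in Java source. The names are illustrative, and here fail throws an exception instead of printing and exiting the way the generated handler (__println/__exit) does.

```java
// Source-level model of the runtime checks a compiler emits.
public class RuntimeChecksDemo {
    // checks performed before an array load
    static int load(int[] arr, int index) {
        if (arr == null) fail("null pointer dereference");
        if (index < 0 || index >= arr.length) fail("array index out of bounds");
        return arr[index];
    }

    // check performed before an array allocation
    static int[] allocArray(int size) {
        if (size <= 0) fail("illegal array allocation size");
        return new int[size];
    }

    // stands in for the single generated handler: report and stop
    static void fail(String msg) {
        throw new RuntimeException("runtime error: " + msg);
    }

    static boolean demo() {
        try {
            load(new int[3], 5);     // out-of-bounds access is trapped
            return false;
        } catch (RuntimeException e) {
            return e.getMessage().contains("bounds");
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the generated assembly below, the same logic appears as compare-and-branch sequences that jump to one shared handler per check kind.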
71
Null pointer check
# null pointer check
cmp $0, %eax
je labelNPE

labelNPE:
  push $strNPE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.
72
Array bounds check
# array bounds check
mov -4(%eax), %ebx   # ebx = length
                     # ecx holds the index
cmp %ecx, %ebx
jle labelABE         # ebx <= ecx ?
cmp $0, %ecx
jl labelABE          # ecx < 0 ?

labelABE:
  push $strABE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.
73
Array allocation size check
# array size check
cmp $0, %eax    # eax == array size
jle labelASE    # eax <= 0 ?

labelASE:
  push $strASE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.
74
Exceptions
(Figure: call stack main → foo → bar; an exception thrown in bar propagates down the stack until a handler environment (env) is found.)
Exception example:

org.eclipse.swt.SWTException: Graphic is disposed
  at org.eclipse.swt.SWT.error(SWT.java:3744)
  at org.eclipse.swt.SWT.error(SWT.java:3662)
  at org.eclipse.swt.SWT.error(SWT.java:3633)
  at org.eclipse.swt.graphics.GC.getClipping(GC.java:2266)
  at com.aelitis.azureus.ui.swt.views.list.ListRow.doPaint(ListRow.java:260)
  at com.aelitis.azureus.ui.swt.views.list.ListRow.doPaint(ListRow.java:237)
  at com.aelitis.azureus.ui.swt.views.list.ListView.handleResize(ListView.java:867)
  at com.aelitis.azureus.ui.swt.views.list.ListView$5$2.runSupport(ListView.java:406)
  at org.gudy.azureus2.core3.util.AERunnable.run(AERunnable.java:38)
  at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
  at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:130)
  at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3323)
  at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2985)
  at org.gudy.azureus2.ui.swt.mainwindow.SWTThread.<init>(SWTThread.java:183)
  at org.gudy.azureus2.ui.swt.mainwindow.SWTThread.createInstance(SWTThread.java:67)
  …
76
Recap
• Lexical analysis
  – regular expressions identify tokens ("words")
• Syntax analysis
  – context-free grammars identify the structure of the program ("sentences")
• Contextual (semantic) analysis
  – type checking defined via typing judgements
  – can be encoded via attribute grammars
  – syntax-directed translation
• Intermediate representation
  – many possible IRs; generation of intermediate representation; 3AC; backpatching
• Runtime:
  – services that are always there: function calls, memory management, threads, etc.
77
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
float position;
float initial;
float rate;
position = initial + rate * 60
<float> <ID,position> <;> <float> <ID,initial> <;> <float> <ID,rate> <;> <ID,1> <=> <ID,2> <+> <ID,3> <*> <60>
TokenStream
78
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
<ID,1> <=> <ID,2> <+> <ID,3> <*> <60>
AST:
  =
  ├ <id,1>
  └ +
    ├ <id,2>
    └ *
      ├ <id,3>
      └ 60
symbol table:

| id | symbol   | type  | data |
|----|----------|-------|------|
| 1  | position | float | …    |
| 2  | initial  | float | …    |
| 3  | rate     | float | …    |
S → ID = E
E → ID | E + E | E * E | NUM
79
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
AST before and after semantic analysis. In the output AST, the literal 60 under the * node is wrapped in an inttofloat node:

  =
  ├ <id,1>
  └ +
    ├ <id,2>
    └ *
      ├ <id,3>
      └ inttofloat(60)
coercion: automatic conversion from int to floatinserted by the compiler
symbol table:

| id | symbol   | type  |
|----|----------|-------|
| 1  | position | float |
| 2  | initial  | float |
| 3  | rate     | float |
80
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
3AC
| production          | semantic rule                                                                               |
|---------------------|---------------------------------------------------------------------------------------------|
| S → id = E          | S.code := E.code \|\| gen(id.var ':=' E.var)                                                |
| E → E1 op E2        | E.var := freshVar(); E.code := E1.code \|\| E2.code \|\| gen(E.var ':=' E1.var 'op' E2.var) |
| E → inttofloat(num) | E.var := freshVar(); E.code := gen(E.var ':=' inttofloat(num))                              |
| E → id              | E.var := id.var; E.code := ''                                                               |
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
(For brevity, the bubbles show only the code generated by each node, not the entire accumulated "code" attribute.)

Note the structure: translate E1, translate E2, then handle the operator.
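The translation scheme above can be sketched in Java, assuming expression trees are nested {left, "op", right} Object arrays with variable-name strings at the leaves (the inttofloat rule is omitted; gen and demo are illustrative names).

```java
// Syntax-directed 3AC generation for S -> id = E, E -> E1 op E2 | id.
import java.util.ArrayList;
import java.util.List;

public class ThreeACDemo {
    static int temp = 0;
    static final List<String> code = new ArrayList<>();

    // returns E.var; appends E.code to the output list as a side effect
    static String gen(Object node) {
        if (node instanceof String) return (String) node;   // E -> id: E.var := id.var
        Object[] n = (Object[]) node;                       // {E1, "op", E2}
        String l = gen(n[0]);                               // translate E1
        String r = gen(n[2]);                               // translate E2
        String t = "t" + (++temp);                          // E.var := freshVar()
        code.add(t + " = " + l + " " + n[1] + " " + r);     // handle the operator
        return t;
    }

    static List<String> demo() {
        // id1 = id2 + id3 * 60
        Object[] mul = { "id3", "*", "60" };
        Object[] add = { "id2", "+", mul };
        code.add("id1 = " + gen(add));                      // S -> id = E
        return code;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
        // t1 = id3 * 60
        // t2 = id2 + t1
        // id1 = t2
    }
}
```

The post-order recursion makes the E1/E2/operator structure of the rules visible: inner operators emit their temporaries before the enclosing ones.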
81
Journey inside a compiler
Inter.Rep.
Code Gen.
LexicalAnalysi
s
Syntax Analysi
s
Sem.Analysi
s
3AC:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Optimized:

t1 = id3 * 60.0
id1 = id2 + t1

• The value is known at compile time, so code can be generated with the converted value.
• The temporary t3 is eliminated.
82
Journey inside a compiler
Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Rep. → Code Gen.
Optimized:

t1 = id3 * 60.0
id1 = id2 + t1

Code Gen:

LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
83
Problem 3.8 from [Appel]
A simple left-recursive grammar:

E → E + id
E → id

A simple right-recursive grammar accepting the same language:

E → id + E
E → id
Which has better behavior for shift-reduce parsing?
84
Answer
Left recursive:

E → E + id
E → id

Input: id+id+id+id+id

Stack contents as parsing proceeds:

id        (reduce E → id)
E
E +
E + id    (reduce E → E + id)
E
E +
E + id    (reduce)
E
…
E

The stack never has more than three items on it. In general, with LR parsing of left-recursive grammars, an input string of length O(n) requires only O(1) space on the stack.
85
Answer
Right recursive:

E → id + E
E → id

Input: id+id+id+id+id

Stack contents as parsing proceeds:

id
id +
id + id
id + id +
…
id + id + id + id + id    (reduce E → id)
id + id + id + id + E     (reduce E → id + E)
id + id + id + E          (reduce)
id + id + E               (reduce)
id + E                    (reduce)
E

The stack grows as large as the input string. In general, with LR parsing of right-recursive grammars, an input string of length O(n) requires O(n) space on the stack.
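The two stack behaviors can be simulated by counting shifts and reduces. This is a sketch, not a parser: leftRecursiveDepth models reducing eagerly after each id, rightRecursiveDepth models shifting the entire input before any reduction is possible.

```java
// Max shift-reduce stack depth when parsing id+id+...+id (n ids).
public class StackDepthDemo {
    // E -> E + id | id : a reduce is possible after every id
    static int leftRecursiveDepth(int n) {
        int depth = 0, max = 0;
        for (int i = 0; i < n; i++) {
            if (i > 0) depth++;          // shift '+'
            depth++;                     // shift id
            max = Math.max(max, depth);
            depth = 1;                   // reduce E + id -> E (or id -> E)
        }
        return max;
    }

    // E -> id + E | id : no reduce until the last id has been shifted
    static int rightRecursiveDepth(int n) {
        return 2 * n - 1;                // all n ids and n-1 '+'s are on the stack
    }

    public static void main(String[] args) {
        System.out.println(leftRecursiveDepth(5));    // stays constant at 3
        System.out.println(rightRecursiveDepth(5));   // grows with the input: 9
    }
}
```

For any n, the left-recursive simulation tops out at 3 while the right-recursive one grows linearly, matching the O(1) vs. O(n) claim above.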