CSE P501 – Compiler Construction
description
Transcript of CSE P501 – Compiler Construction
Spring 2014 Jim Hogg - UW - CSE P501 W-1
CSE P501 – Compiler Construction
Conventional Heap Storage
Garbage Collection
Spring 2014 Jim Hogg - UW - CSE P501 W-2
...char* s = (char*) malloc(50);...free(s);
C
C Runtime Heap Memory
• Developer must remember to free memory when no longer required
• Eventual fragmentation => slow to malloc, slow to free
In Use
Conventional Heap Storage
Spring 2014 Jim Hogg - UW - CSE P501 W-3
Heap Storage Fragmentation
C Runtime Heap Memory
In Use
• malloc: walk the freelist to find a slot big enough for current request
• free: adjust freelist; collapse contiguous freespace• fragmentation: plenty free chunks but none big enough for request• cannot compact the used space - may contain pointers; may be
pointed-at
Spring 2014 Jim Hogg - UW - CSE P501 W-4
Bugs• Forget to free => eventually run out of memory
• called a "memory leak"
• Call free, but continue to use!• called "use-after-free", or "dangling pointer"• memory corruption - wrong answers; crash if lucky!• major source of security issues• detect via "pool poisoning"
2 pointers
free via 1
free malloc; corruption!
Spring 2014 Jim Hogg - UW - CSE P501 W-5
Solving Memory Leaks
int f() { ... Car c = Car("Chevy"); ...}
C++
• object c is destructed [calls ~Car() ]automatically when c it falls out of scope
• called RAII ("Resource Acquisition Is Initialization")
• more general case still requires Developer to delete c when no longer needed
• "smart pointers" can help
W-6
Solving Leaks and Use-After-Freepublic static int Main(String[] args) { ... if (...) { ArrayList cars = new ArrayList(); for (int i = 0; i < 1000; i++) { cars.Add(new Car()); } ... } ... Boat b = new Boat(); ...}
A
• At point A I have 1000 Car objects allocated on the heap• At some later time I ask for more heap memory. May
trigger a "garbage collection" - automatic, and silent• Compiler realizes that cars is not referenced later by the
program; so it's "garbage"; GC recycles it for use by other objects
• Magically solves leaks and use-after-free!
B
Spring 2014 Jim Hogg - UW - CSE P501 W-7
next
next
Allocate an object; fast!
next
Allocate more objects; and one more, please?
Garbage Collection
Spring 2014 Jim Hogg - UW - CSE P501 W-8
Garbage Collection, 1
Allocate another object
next
next"roots"
Trace reachable objects
next
"roots"
Compact unreachables;update all pointers
GC does not find garbage: it finds live objects and ignores all other memory
Spring 2014 Jim Hogg - UW - CSE P501 W-9
Roots?
"Roots" are locations that hold a pointer to any object in the heap:
• Method-local variables• Method Arguments• Global variables• Class-static fields• Registers
Spring 2014 Jim Hogg - UW - CSE P501 W-10
GC Start
root
root
Spring 2014 Jim Hogg - UW - CSE P501 W-11
GC Mark Phase
root
root
UnreachableReachable
Spring 2014 Jim Hogg - UW - CSE P501 W-12
GC Sweep Phase
root
root
Reachable
With memory free, now allocate space for object that provoked the GC
Spring 2014 Jim Hogg - UW - CSE P501 W-13
No Bugs
• Forget to free => eventually run out of memory• called a "memory leak"
• Call free, but continue to use!• called "use-after-free"
• GC removes control over free'ing memory from the hands of the Developer
• GC will find all live objects and reclaim all other memory each time it is invoked. So no "memory leak"s
• An object is garbage-collected only if it was unreachable by the program; being unreachable, GC removes "use-after-free" bugs
• So what's a "memory leak" in C# or Java? - holding on to objects that really could be freed - eg, by setting the object-ref to null. Particularly troublesome in a Generational Garbage Collector
Spring 2014 Jim Hogg - UW - CSE P501 W-14
C# Finalizers
• Class C above defines a "Finalize" method. Syntax is ~C()• Syntactic sugar for: protected override void Finalize()
• As each object of class C is created, the CLR links it onto a Finalization Queue
• When the object is about to be reclaimed, it is moved onto the Freachable Queue; a Finalizer background thread will run that Finalize method later
class C { private StreamWriter sw; ... ~C() { sw.Close(); } // flush output buffer ...}
Spring 2014 Jim Hogg - UW - CSE P501 W-15
Finalization
root
Finalization Queue
root
Finalization Queue
FreachableQueue
Not live; not dead; zombies
Spring 2014 Jim Hogg - UW - CSE P501 W-16
Threads• Compiler needs to create GC info - at each point in program,
where are current roots? - variables, registers
• GC info is bulky - cannot store for every instruction!
• So store GC info for selected points in the program - "GC-safepoints"
• Typical GC-safepoint is a function return
• Need to stop all threads to carry out a GC• For each thread stack-frame, stomp its return address; when it
reaches there, the thread will return, not to its caller, but into GC code
Spring 2014 Jim Hogg - UW - CSE P501 W-17
Hijacking a Thread
retaddr
gc
caller
GC
gc
GC
gc
GC
Top of Stack
Save & Stomp retaddr
Thread continues to
runEventually
returns and hits the hijack
Spring 2014 Jim Hogg - UW - CSE P501 W-18
Threads Run to Their Safepoints
Start GC
After each thread has hit its safepoint, GC can start its sweep phase
Spring 2014 Jim Hogg - UW - CSE P501 W-19
Object Age Distribution• Performing GC over large heap consumes lots of cpu time (and
trashes the caches!)
• Experiment reveals:• Many objects live a very short time (high "infant mortality")• Some object live a very long time• Bi-modal
number
age
Object Ages
Spring 2014 Jim Hogg - UW - CSE P501 W-20
Generational Garbage Collection
Gen0 Gen2Gen1
'Nursery'
• Divide heap into 3 areas• Allocate new objects from Gen0• At next GC
• move all Gen0 survivors into Gen1• move all Gen1 survivors into Gen2
Spring 2014 Jim Hogg - UW - CSE P501 W-21
Migration thru Generations
Gen0 Gen2Gen1
Gen0 Gen2Gen1
Gen0 0 Gen2Gen1
Survivors
Gen0 Gen2Gen1
Spring 2014 Jim Hogg - UW - CSE P501 W-22
Generational GC• Garbage-collecting entire heap is expensive
• A process of diminishing returns
• Instead, Garbage-Collect only the nursery (Gen0)• Smaller, so faster• Yields lots of free space (objects die young)• Maximum 'bang for the buck'
• Sometime collect Gen0+Gen1
• When the going gets tough, collect Gen0+Gen1+Gen2 ("full" collection)
• Some old objects may become unreachable - just ignore them in a Gen0 collect
• Above picture assumes no writes to old objects kept Gen0 objects reachable
Spring 2014 Jim Hogg - UW - CSE P501 W-23
Card Table• In practice, older objects are written-to between GCs• Might update a pointer that results in keeping young object alive;
missing this is a disaster
Gen0 Gen2Gen1
• Card Tables: bitmap, with 1 bit per 128 Bytes of Gen2 heap• Create a "write barrier" - any write into Gen1 or Gen2 heap sets
corresponding bit in Card Table• During GC sweep, check Card Tables for any "keep alive" pointers• In practice, Card Tables contain few 1 bits
Gen2 Card Table
Gen1 Card Table
Spring 2014 Jim Hogg - UW - CSE P501 W-24
Debugging the GC• What if a GC goes wrong?
• Fails to free some memory that is unreachable => memory leak• Frees some memory that is still reachable - impending disaster• Frees some non-objects - disaster• Colloquially called a GC 'hole'
• Causes• Bug in GC - mark, sweep, compaction• Bug in GC info
• Symptoms• Wrong answers• Process Crash• If we're 'lucky' failure comes soon after error• Otherwise, failure may occur seconds, minutes or hours later
• Testing• GC-Stress• eg: maximally randomize the heap; force frequent GCs; check
results
Spring 2014 Jim Hogg - UW - CSE P501 W-25
Further Design Complexities• Objects with Finalizers:
• take 2 GCs to die• extend the lifetime of other connected objects• run at some later time => "non-deterministic" finalization• no guarantee on order that finalizers runs• uncaught exceptions escape and disappear!
• Strong refs• Weak refs - "short" and "long"• "Dispose" pattern• Resurrection• Large Object Heap (objects > 85 kB in size)• Concurrent Garbage Collection• Per-processor heaps; processor affinity• Fully-interruptible GC• Safe-point on back-edge of loop
Spring 2014 Jim Hogg - UW - CSE P501 W-26
References for CLR GC
Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework – by Jeffrey Richter
Garbage Collection—Part 2: Automatic Memory Management in the Microsoft .NET Framework - by Jeffrey Richter
Garbage Collector Basics and Performance Hints – by Rico Mariani
Using GC Efficiently – Part 1 by MaoniUsing GC Efficiently – Part 2 by MaoniUsing GC Efficiently – Part 3 by MaoniUsing GC Efficiently – Part 4 by Maoni
GC Performance Counters - by Maoni
Tools that help diagnose managed memory related issues – by Maoni
Clearing up some confusion over finalization and other areas in GC – by Maoni
And a bit of perspective…
Automatic GC has been around since LISP I in 1958
Ubiquitous in functional and object-oriented programming communities for decades
Mainstream since Java(?) (mid-90s)
Now conventional? - nope!
Specialized patterns of allocate/free are still better coded by-hand - eg "Arena" storage inside compiler optimization phases
Spring 2014 Jim Hogg - UW - CSE P501 W-27