Conservative Garbage Collection
Stephan Lesch, January 9, 2002
So Far
Type-accurate GC:
– locations of pointers are known
– no pointer arithmetic
– often tailored to one software product
– usually supported by compiler/runtime system
Ambiguous Roots Collection
• every register/word is a potential pointer
• non-supportive environment
• little/no knowledge about
– register usage
– object/stack layout
• should work with any C/C++ program
• programmers don't want to pay for GC unless needed
• must coexist with explicit memory management
The middle way:
• programmer/compiler provide information to recognize pointers
Conservative GC
Boehm/Demers/Weiser (Xerox PARC) [1988]
• non-moving mark-and-deferred-sweep collector
• fully conservative, no reliance on compiler
– no extra bits to distinguish pointer/non-pointer
– no additional object headers
• for C and C++
• for Unix, OS/2, Mac, Win95/NT
• supports incremental/generational collection
• can function as space leak detector
Heap Layout
Two logically distinct heaps:
Standard heap
• malloc / free
• compatible with existing code
• no pointers to collected heap!
Collected heap
• GC_malloc
• GC_free to free known garbage
• pointers to standard heap ignored
Layout of Collected Heap
• made up of blocks (e.g. 4 K, aligned to 4 K boundaries)
• one object size per block
• for each object size:
– bitmap to mark allocated objects
– free-list (linked list of heap block slots)
– reclaimable blocks queue (deferred sweep)
• heap-block free-list
Allocation
For small objects:
• pop free-list for this size
• if free-list is empty: resume sweep phase
• if still empty: GC
• if not enough space reclaimed: expand heap
For objects > 1/2 block:
• allocate chunk of blocks (heap-block free-list)
• if none available: GC
• if not enough space reclaimed: expand heap
Clear object after allocation!
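The small-object path above can be sketched in C as follows. This is a minimal toy under stated assumptions, not the BDW implementation: a single block, a single size class, and the names toy_alloc_small, resume_sweep, toy_collect and expand_heap are all hypothetical. The collection and heap-expansion steps are stand-ins that reclaim nothing.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* block size taken from the slide's example */
#define OBJ_SIZE   64     /* hypothetical size class served by this block */

/* A slot is either a free-list link or object payload. */
typedef union slot {
    union slot   *next;
    unsigned char payload[OBJ_SIZE];
} slot_t;

static slot_t  block[BLOCK_SIZE / OBJ_SIZE]; /* one toy heap block */
static slot_t *free_list = NULL;             /* free-list for OBJ_SIZE */
static size_t  sweep_pos = 0;                /* deferred-sweep cursor */
static int     gc_runs   = 0;                /* counts collection attempts */

/* Deferred sweep: move the next unswept slot onto the free-list.
   In this toy every unswept slot is treated as reclaimable. */
static void resume_sweep(void) {
    if (sweep_pos < BLOCK_SIZE / OBJ_SIZE) {
        slot_t *s = &block[sweep_pos++];
        s->next = free_list;
        free_list = s;
    }
}

static void toy_collect(void) { gc_runs++; } /* stand-in: reclaims nothing */
static void expand_heap(void) { }            /* stand-in: the toy heap cannot grow */

void *toy_alloc_small(void) {
    if (free_list == NULL) resume_sweep();   /* free-list empty: resume sweep */
    if (free_list == NULL) toy_collect();    /* still empty: collect */
    if (free_list == NULL) expand_heap();    /* not enough reclaimed: grow */
    if (free_list == NULL) return NULL;      /* out of memory in this toy */

    slot_t *s = free_list;                   /* pop the free-list */
    free_list = s->next;
    memset(s, 0, sizeof *s);                 /* clear object after allocation */
    return s;
}
```

Clearing the slot matters for a conservative collector: a stale free-list link or leftover data in a fresh object could otherwise be misread as a pointer.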
Finding Roots & Pointers
• possible roots: registers, stack, static areas
• no cooperation from compiler
– treat every word as potential pointer
– ignore interior pointers (standard)
– prefer marking from false pointers over ignoring valid pointers
Conservative Pointer Identification: given word p;
– does p refer to the collected heap?
– does it point into heap block allocated by collector?
– does it point to the beginning of an object in that block?
if yes,
– mark object in block header
– push object onto mark stack
finally: reset mark bits of objects on free-lists
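The three validity tests can be sketched in C as below. This is a toy under stated assumptions: a fixed array of four blocks, one mark byte per object instead of a packed bitmap, and a minimum object size of 16 bytes. The names mark_if_pointer, block_header_t and MIN_OBJ are hypothetical and not part of the BDW collector.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define NBLOCKS    4
#define MIN_OBJ    16      /* assumed smallest object size */
#define MARK_MAX   64

typedef struct {
    size_t  obj_size;                     /* 0 = block not allocated by the collector */
    uint8_t mark[BLOCK_SIZE / MIN_OBJ];   /* one mark byte per object slot (toy bitmap) */
} block_header_t;

static unsigned char  heap[NBLOCKS * BLOCK_SIZE];
static block_header_t headers[NBLOCKS];
static uintptr_t      mark_stack[MARK_MAX];
static size_t         mark_sp = 0;

/* Returns 1 if the word p passes all validity tests (and marks the object),
   0 if p is rejected as a non-pointer. */
int mark_if_pointer(uintptr_t p) {
    uintptr_t lo = (uintptr_t)heap;
    if (p < lo || p >= lo + sizeof heap) return 0;        /* 1: inside the collected heap? */

    size_t off = (size_t)(p - lo);
    block_header_t *h = &headers[off / BLOCK_SIZE];
    if (h->obj_size < MIN_OBJ) return 0;                  /* 2: block allocated by collector? */

    size_t in_block = off % BLOCK_SIZE;
    if (in_block % h->obj_size != 0) return 0;            /* 3: start of an object? */
    if (in_block + h->obj_size > BLOCK_SIZE) return 0;    /* partial trailing slot */

    size_t idx = in_block / h->obj_size;
    if (!h->mark[idx]) {
        h->mark[idx] = 1;                                 /* mark object in block header */
        if (mark_sp < MARK_MAX)                           /* a real collector must handle overflow */
            mark_stack[mark_sp++] = p;                    /* push object onto mark stack */
    }
    return 1;
}
```

Test 3 is what makes this variant ignore interior pointers: only a word equal to an object's start address is accepted.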
Misidentification
• integers accidentally fulfilling validity tests
• avoid need to trace from interior pointers...
• ... or unaligned pointers:
00000009 0000000A (two adjacent small integers; read big-endian at a 2-byte offset they yield 00090000)
– avoid addresses with lots of trailing 0's
• try to avoid generating false references:
– collector clears non-atomic objects after alloc
– GC_malloc_atomic for objects without pointers
– programmer initializes structures
– programmer destroys obsolete pointers ("dead pointers on stack are often the most significant source of leaks")
Black Listing
Idea: don't allocate in heap blocks at addresses likely to collide with invalid pointers:
– black-list references to the vicinity of the heap which fail validity tests
– extra run before first allocation finds false references in static data
• additional space overhead < 10%
• but: difficult to allocate >100K without spanning black-listed blocks
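A minimal sketch of the black-listing idea, assuming a small fixed address range for the potential heap. The base address and the names note_false_reference and alloc_block are hypothetical and not part of the BDW collector.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define NBLOCKS    8

static const uintptr_t heap_base = 0x100000;  /* hypothetical start of the potential heap */
static bool in_use[NBLOCKS];                  /* block currently holds objects */
static bool black_listed[NBLOCKS];            /* block is the target of a false reference */

/* Called by the marker for a word that failed the validity tests.
   If it points near the heap, remember the block so it is not allocated later. */
void note_false_reference(uintptr_t word) {
    if (word < heap_base || word >= heap_base + (uintptr_t)NBLOCKS * BLOCK_SIZE)
        return;                               /* not in the vicinity of the heap */
    size_t b = (size_t)((word - heap_base) / BLOCK_SIZE);
    if (!in_use[b])
        black_listed[b] = true;
}

/* Returns the index of a free block that is not black-listed, or -1. */
int alloc_block(void) {
    for (int b = 0; b < NBLOCKS; b++) {
        if (!in_use[b] && !black_listed[b]) {
            in_use[b] = true;
            return b;
        }
    }
    return -1;
}
```

Skipping black-listed blocks also illustrates the drawback named above: a large object needs a run of adjacent blocks, and every black-listed block breaks such a run.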
Influence of Data Structures
Problems with:
• large structures + interior pointers
• strongly connected structures
Lisp:
– small disjoint garbage structures
– lists constructed of cons-cells
=> Conservative GC worked well, memory leaks remain bounded (<8% leakage, constant amount)
KRC:
– large, strongly connected structures
– next pointers in objects
=> collector thrashed
[Wentworth, 1990]
Efficiency (1)
Comparative studies by Zorn, 1992; Detlefs et al. 1994
• "real-world" C programs (perl, xfig, GhostScript)
• comparing BDW with explicit managers
• replace malloc() with GC_malloc(), remove free()
• no further adaptation
• used outdated versions (4.3 vs. 1.6/2.6)
Efficiency (2)
• realistic alternative to explicit memory management (20% avg execution time overhead over best managers, up to 57% in worst case)
• marks 3 MB/s on SparcStation II
• up to 3 times heap usage for small heaps (fixed cost for collector’s internal structs)
• needs substantially more space to avoid over-frequent GC
• works best w. programs using very small objects
• might co-exist poorly with cache management (heap blocks aligned on 4K boundaries)
Incremental/Generational Mode
• marking in small steps interleaved with mutator
• need to detect later changes to connectivity in traced parts of graph:
– read dirty bits for pages
– write-protect memory and catch faults
• when mark stack is empty: trace from all marked objects on dirty heap blocks
• reduces avg. pause times, increases total exec time
• generational: GC uses knowledge of which pages were recently modified
Mostly Copying Collection
• Joel Bartlett, 1988 (Digital)
• hybrid conservative / copying collector:
– roots are treated conservatively (don't move referenced objects)
– objects only accessible from heap-allocated objects are copied (assumes pointers in heap-allocated data can be found accurately)
faster allocation
fewer problems with pointer identification
more accurate GC
Object layout
[Figure: object layout. Header: size, #pointers. User data: pointers first, then non-pointers.]
– programmer has no control over object layout
– what if object layout should match hardware registers or file structures?
Heap layout
[Figure: heap made up of blocks tagged with space identifiers, with current_space = 1 and next_space = 1; a root points into the heap; some blocks are currently unused.]
Allocation
• within a block:
– inc free-pointer
– dec free-slots-count
• if necessary: search for free block (space_id ≠ current_space/next_space), set its space_id to next_space
• current_space = next_space during allocation
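The allocation steps above can be sketched in C as follows. This is a toy under stated assumptions: four blocks of 128 words each, sizes counted in words, and hypothetical names (mc_alloc, free_ptr, free_slots). It is not Bartlett's code.

```c
#include <stddef.h>

#define BLOCK_WORDS 128
#define NBLOCKS     4

static size_t   heap[NBLOCKS][BLOCK_WORDS];
static unsigned space_id[NBLOCKS];           /* 0 = never used in this toy */
static unsigned current_space = 1;
static unsigned next_space    = 1;           /* equal to current_space outside a GC */
static size_t  *free_ptr   = NULL;           /* next free word in the current block */
static size_t   free_slots = 0;              /* free words left in the current block */

size_t *mc_alloc(size_t words) {
    if (words == 0 || words > BLOCK_WORDS) return NULL;

    if (words > free_slots) {
        /* Search for a free block: one whose space_id is neither
           current_space nor next_space. */
        int found = -1;
        for (int b = 0; b < NBLOCKS; b++) {
            if (space_id[b] != current_space && space_id[b] != next_space) {
                found = b;
                break;
            }
        }
        if (found < 0) return NULL;          /* a real collector would GC or grow here */
        space_id[found] = next_space;        /* claim the block */
        free_ptr   = heap[found];
        free_slots = BLOCK_WORDS;
    }

    size_t *obj = free_ptr;
    free_ptr   += words;                     /* inc free-pointer */
    free_slots -= words;                     /* dec free-slots-count */
    return obj;
}
```

Within a block, allocation is a pointer bump and a counter decrement, which is where the faster allocation of a copying collector comes from.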
Collection
• GC when heap is half full (half of heap blocks have space_id=current_space)
• next_space = (current_space + 1) mod n
• Fromspace = current_space blocks
• Tospace = next_space blocks
• scan roots conservatively for pointers into heap
• move potentially referenced objects to Tospace:
– changing space_id of their blocks to next_space
– add block to Tospace scan list
• copy graphs accessible from blocks on scan list
Heap after Collection
[Figure: heap blocks after collection, with current_space = 2 and next_space = 2; the root still points into the heap; some blocks are currently unused.]
Bartlett's GC algorithm (1)
gc() =
    next_space = (current_space + 1) mod 077777
    Tospace_queue = empty
    for R in Roots
        promote(block(R))
    while Tospace_queue != empty
        blk = pop(Tospace_queue)
        for obj in blk
            for S in Children(obj)
                S = copy(S)
    current_space = next_space
Bartlett's GC algorithm (2)
promote(block) =
    if Heap_bottom <= block <= Heap_top and space(block) == current_space
        space(block) = next_space
        allocatedBlocks = allocatedBlocks + 1
        push(block, Tospace_queue)
copy(p) =
    if space(p) == next_space or p == nil
        return p
    if forwarded(p)
        return forwarding_address(p)
    np = move(p, free)
    free = free + size(p)
    forwarding_address(p) = np
    return np
Generational Mode (1)
• One bit in space_id indicates young/old generation
• Other bits approximate age of objects/blocks
• Minor collection:
– when 50% of free space after last GC is full
– young objects reachable from roots/remembered set are promoted en masse (change space_id/copy)
– remembered set: maintained via memory protection
Generational Mode (2)
• Major collection (mark-compact):
– when old generation occupies >85% of heap
– mark accessible objects in old generation
– pass 1: find old generation blocks <1/3 filled, copy objects to free space leaving forwarding addresses
– pass 2: rescan old generation, correct pointers using forwarding addresses
– expand heap if >75% full
• maintaining remembered set costs time, but often saves more time during GC (20% time improvement on Scheme compiler); also reduces pause times in interactive programs
Efficiency (1)
• no thorough studies
• space overhead: space_ids, type info, block links, promotion bits: 2% for 512 byte blocks; tagging data increases overhead
• Mostly Copying vs. BDW: Mostly Copying probably better with many short-lived objects, which benefit from faster allocation
Experiences
• generational version: 20% runtime improvement for Scheme-to-C compiler
• significant performance increase in CAD program (reduced paging)
• bad results for non-generational collector for Modula-2 w. very large heaps (10s of Megabytes)
• choose GC strategy that fits behaviour of mutator
The optimising Compiler/User Devil
• conservative GC defeated by temporarily hidden pointers; parts of graph may be unreachable during a GC:
– pointer arithmetic
– adding tag bits
• e.g. optimized array traversal:
    for (i=0; i<SIZE; i++)
        ...x[i]...;
    ...x...;
is transformed into
    xend = x+SIZE;
    for (; x<xend; x++)
        ...*x...;
    x -= SIZE;
    ...x...;
inside loop x is interior pointer, afterwards x points one past the end
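Machine-specific Optimizations
    struct l_thing {
        char thing[35000];
        struct l_thing *next;
    };
    struct l_thing *
    tail(struct l_thing *x) {
        return (x->next);
    }
on IBM RISC System/6000, tail() translates to
    AIU r3=r3,1               ; r3 += 65536
    L   r3=SHADOW(r3, -30536) ; load from original r3 + 35000
    BA  lr
After the AIU, r3 points 65536 bytes past the start of the object, i.e. outside it; if r3 held the only reference, a GC at this point would miss the object.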
Boehm and Chase’s Solution (1)
• local root set of function f at any point in execution:
– register/auto variables
– previously computed values of direct sub-expressions of incompletely evaluated expressions: e.g. malloc's return value in malloc(size) + 4
• global root set:
– declared static and extern variables
– local root sets of all call sites in call chain
– any values stored in other areas scanned by collector
• valid base pointer:
– pointer to anywhere inside an object or one past its end
– BDW can handle such pointers
Boehm and Chase's Solution (2)
• every object on garbage collected heap must be accessible from global root set through chain of base pointers
conservative collection safe with strictly ANSI-compatible programs
• suggested implementation:
– preprocess source using macros that prevent code generator from discarding live base pointers prematurely
– compile normally
– post-process assembly code, removing macro artifacts
• transparent to programmer & compiler
• may interfere with instruction scheduling
• may increase register pressure
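To illustrate what such a macro could look like, here is a sketch that swaps in a different mechanism from the one on the slide: instead of post-processing assembly, it uses an empty GCC/Clang extended asm statement that claims to read the base pointer. The macro name KEEP_LIVE_BASE and the function sum are hypothetical, and this is not the authors' preprocessor output.

```c
#include <stddef.h>

/* Hypothetical macro (GCC/Clang extension): the empty asm statement emits no
   instructions but is treated as a use of `base`, so the compiler has to keep
   the base pointer available in a register or memory at this point. */
#define KEEP_LIVE_BASE(base) __asm__ volatile("" : : "g"(base) : "memory")

/* Array traversal in the optimized style: x becomes an interior pointer and
   finally points one past the end, while `base` stays a valid base pointer. */
long sum(const long *x, size_t n) {
    const long *base = x;          /* base pointer the collector can recognise */
    const long *end  = x + n;
    long total = 0;
    for (; x < end; x++) {
        total += *x;
        KEEP_LIVE_BASE(base);      /* base must stay live across the loop body */
    }
    return total;
}
```

The costs named above show up directly: the extra use extends the live range of base, which occupies a register or stack slot, and the memory clobber restricts how freely the compiler can reorder loads and stores around it.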
Ellis and Detlefs's solution
• annotate operations on pointers with names of base pointers from which they're derived
• compiler treats these operations as uses of the original base pointers, extending their live ranges
• code generation must respect live ranges
• requires changes to compiler
• does not alter sources
• does not rely on behaviour of volatile declarations
GC for C++
• object-oriented languages often use more heap-allocated data
• generate more complex data structures
• GC uncouples memory management from class interfaces instead of dispersing it through code
Conservative GC for C++
• requires no changes to language
• restriction on coding style holds: no hidden pointers (converted to int)
– existing code may violate the restriction
– aggressive optimisers may as well
– safety must be enforced in code-generator
• some support for finalization (GC_register_finalizer) - assuming few objects need finalization
Mostly Copying for C++
• storing all pointers at beginning of objects interferes with inheritance (fast field lookup)
• here: user supplies callback methods to identify pointers
    class Tree {
    public:
        Tree* left;
        Tree* right;
        int data;
        Tree (int x);
        GCCLASS(Tree);
        ...
    };
    GCPOINTERS(Tree) {
        gcpointer(left);
        gcpointer(right);
    }
GCPOINTERS macro generates callback method Tree::GCPointers
• currently no support for finalisation
Benefits of pointer locating methods
• programmer may solve unsure reference problem:
    union {
        int n;
        thing *ptr;
    } x;
• enables semantically accurate marking: e.g. stacks, queues
– automatic GC retains uncleared references to removed elements
– programmer can omit them
even better than type-accurate GC
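Semantically accurate marking for a stack can be sketched in plain C as follows. The names ptr_stack, visit_fn and ptr_stack_pointers are hypothetical stand-ins for a user-supplied pointer-locating callback. The point is that the callback reports only the slots below top, so a reference left behind by pop is never traced.

```c
#include <stddef.h>

#define CAP 8

typedef struct {
    void  *slot[CAP];
    size_t top;                 /* number of live elements */
} ptr_stack;

typedef void (*visit_fn)(void *ref, void *ctx);

void ptr_stack_push(ptr_stack *s, void *p) {
    if (s->top < CAP) s->slot[s->top++] = p;
}

/* Pop does not clear the slot: the stale reference stays in memory. */
void *ptr_stack_pop(ptr_stack *s) {
    return s->top ? s->slot[--s->top] : NULL;
}

/* Pointer-locating callback: reports only semantically live references. */
void ptr_stack_pointers(const ptr_stack *s, visit_fn visit, void *ctx) {
    for (size_t i = 0; i < s->top; i++)
        visit(s->slot[i], ctx);
}

/* Example visitor: counts the references it is shown. */
static void count_ref(void *ref, void *ctx) {
    (void)ref;
    ++*(size_t *)ctx;
}
```

A collector that scans the whole slot array, whether conservatively or from a type description, would still see the popped element; the callback is what lets the programmer omit it.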
Using Object Descriptors
• Detlefs, 1991: extension to Mostly Copying
• insert descriptor into object headers
• Bitmap format:
– 1 word with 32 bits indicating pointer/non-pointer words
– use if only first 32 words of user data contain pointers, can't handle unsure references
• Indirect format:
– pointer to byte array encoding sure/unsure references and non-pointer values
– array can be compressed using repeat counts
• Fast indirect format:
– array of ints; 1st number indicates repetitions of rest
– subsequent numbers = number of words to skip to reach next pointer, negative number indicates unsure reference
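Decoding the bitmap format can be sketched in C as below. The bit order is an assumption (bit i, counted from the least significant bit, describes word i of the user data), and the function name pointers_from_bitmap is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Collects the words of an object that the bitmap descriptor flags as pointers.
   Assumption: bit i (least significant first) describes word i of user data.
   `out` must have room for up to 32 entries. Returns the number of pointers found. */
size_t pointers_from_bitmap(uint32_t bitmap, const uintptr_t *words,
                            size_t nwords, uintptr_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < nwords && i < 32; i++) {
        if ((bitmap >> i) & 1u)
            out[n++] = words[i];   /* word i is a sure pointer */
    }
    return n;                      /* words beyond the first 32 are never pointers */
}
```

With one bit per word there is no way to encode a third state, which is why this format cannot express unsure references.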
Conclusion
• GC effective for traditional imperative languages
• realistic alternative to explicit mem management for most applications
• not yet suitable for real-time / safety-critical applications
• no big constraints on coding style, except hidden pointer problem
• gc'ing allocators competitive even with code not written for GC
• GC should have hooks for client/programmer to communicate their knowledge:
– explicit deallocation calls
– atomic objects
– hints of appropriate times to collect