Basic Garbage Collection Techniques
-
Upload
an-khuong -
Category
Technology
-
view
9.260 -
download
2
description
Transcript of Basic Garbage Collection Techniques
Contents1. What is Garbage Collection2. Reference Counting3. Mark-Sweep Collection4. Mark-Compact Collection5. Copying Garbage Collection6. Non-Copy Implicit Collection7. Choosing Among Basic Tracing Techniques8. Problems with Simple Tracing Collectors9. Conservatism in GC
2
Tracing
1. Garbage Collection
3
Garbage Collection (GC) Is the automatic storage reclamation of computer
storageThe GC function is to find data objects that are no
longer in use and make their space available by the running program
So Why Garbage Collection?A software routine operating on a data structure
should not have to depend what other routines may be operating on the same structure
If the process does not free used memory, the unused space is accumulated until the process terminates or swap space is exhausted
Explicit Storage Management Hazards
4
Programming errors may lead to errors in the storage management:May reclaim space earlier than necessaryMay not reclaim space at all, causing memory
leaksThese errors are particularly dangerous since
they tend to show up after deliveryMany programmers allocate several objects
statically, to avoid allocation on the heap and reclaiming them at a later stage
Explicit Storage Management Hazards
5
In many large systems, garbage collection is partially implemented in the system’s objects
Garbage Collection is not supported by the programming language
Leads to buggy, partial Garbage Collectors which are not useable by other applications
The purpose of GC is to address these issues
GC Complexity
6
Garbage Collection is sometimes considered cheaper than explicit deallocation
A good Garbage Collector slows a program down by a factor of 10 percent
Although it seems a lot, it is only a small price to pay for:ConvenienceDevelopment timeReliability
Garbage Collection – The Two-Phase Abstraction
7
The basic functioning of a garbage collector consists, abstractly speaking, of two parts:Distinguishing the live objects from the garbage
in some way (garbage detection)Reclaiming the garbage objects’ storage, so that
the running program can use it (garbage reclamation)
In practice these two phases may be interleaved
2. Reference Counting
8
Each object has an associated count of the references (pointers) to it
Each time a reference to the object is created, its reference count is increased by one and vice-versa
When the reference count reaches 0, the object’s space may be reclaimed
Reference Counting
9
When an object is reclaimed, its pointer fields are examined and every object it points to has its reference count decremented
Reclaiming one object may therefore lead to a series of object reclamations
There are two major problems with reference counting
The Cycles Problem
10
Reference Counting fails to reclaim circular structures
Originates from the definition of garbage
Circular structures are not rare in modern programs: Trees Cyclic data structures
The solution is up to the programmer
The Efficiency Problem
11
When a pointer is created or destroyed, its reference count must be adjusted
Short-lived stack variables can incur a great deal of overhead in a simple reference counting scheme
In these cases, reference counts are incremented and then decremented back very soon
Deferred Reference Counting
12
Much of this cost can be optimized away by special treatment of local variables
Reference from local variables are not included in this bookkeeping
However, we cannot ignore pointers from the stack completely
Therefore the stack is scanned before object reclamation and only if a pointer’s reference count is still 0, it is reclaimed
Reference Counting - Recap
13
While reference counting is out of vogue for high-performance applications
It is quite common in applications where acyclic data structures are used
Most file systems use reference counting to manage files and/or disk blocks
Very simple scheme
3. Mark-Sweep Collection - [McCarthy 1960]
Distinguishing live object from garbage(Mark phase)Done by tracing – starting at the root set and
usually traversing the graph of pointers relationships
The reached objects are markedReclaiming the garbage(Sweep phase)
Once all live objects are marked, memory is exhaustively examined to find all of the unmarked (garbage) objects and reclaim their space
14
Example
rootroot root
Free_list
Mark phase Sweep phaseBefore
Heap
15
Example
Free list
r1
Free list r1
p
p
Free list r1
16 Live object Marked objectGarbage
Basic Algorithm
New(A)= if free_list is empty mark_sweep() if free_list is empty return (“out-of-memory”) pointer = allocate(A) return (pointer)
mark_sweep()= for Ptr in Roots mark(Ptr) sweep()
mark(Obj)= if mark_bit(Obj) == unmarked mark_bit(Obj)=marked for C in Children(Obj) mark(C)
Sweep()= p = Heap_bottom while (p < Heap_top) if (mark_bit(p) == unmarked) then free(p) else mark_bit(p) = unmarked; p=p+size(p)
17
Properties of Mark & SweepFragmentation
Difficult to allocate large objectsSeveral small objects maybe add up to large
contiguous spaceCost
Proportional to the heap size, including both live and garbage objects
Locality of reference: Does not move objectsInterleave of different ages causes many page
swaps. (Because objects often created in clusters that are active at the same time)
18
Standard Engineering TechniquesMark:
Explicit mark-stack (avoid recursion)Pointer reversalUsing bitmaps
Sweep:Using bitmapsLazy sweep
19
4. Mark-Compact CollectionCompact after mark
Mark-Compact collectors remedy the fragmentation and allocation problems of mark-sweep collectors
The collector traverses the pointers graph and copy every live object after the previous one
This results in one big contiguous space which contains live objects and another which is considered free space
20
Mark-Compact Collection
21
Garbage objects are “squeezed” to the end of the memory
The process requires several passes over the memory:One to computes the new location for objectsSubsequent passes update pointers and actually move
the objects The algorithm can be significantly slower than
Mark-Sweep CollectionTwo Algorithms:
Two-finger Alg–for objects of equal size. Lisp 2 Alg
Object Ordering
22
1 2 3 4
Linearizing
4 1 2 3
Arbitrary
Worst
1 3 4 2 Good
1 2 3 4
Sliding
Best
Arbitrary –no guaranteed order. Linearizing–objects pointing to
one another are moved into adjacent positions.
Sliding–objects are slid to one end of the heap, maintaining the original order of allocation.
The Two Finger Algorithm [Edwards 1974]
Simplest algorithm: Designed for objects of equal sizeOrder of objects in output is arbitrary. Two passes: compact & update pointers
23
Pass I: CompactUse two pointers:
free: goes over heap from bottom searching for free/empty objects.
Live:goes over heap from top downwards searching for live objects.
When freefinds a free spot and livefinds an object, the object is moved to the free spot.
When an object is moved, a pointer to its new location is left at its old location
24
Two finger, pass I - Example
25
Pass II: Fix PointersGo over live objects in the heap
Go over their pointersIf pointer points to the free area: fix it according
to the forwarding pointers in target objects.
26
Compacted area free area
Simple!Relatively fast: One pass + pass on live objects
(and their previous location). No extra space required.
Objects restricted to equal size. Order of objects in output is arbitrary. This significantly deteriorates program
efficiency! Thus –not used today.
Two finger –Properties
27
The Lisp2 AlgorithmImprovements: handle variable sized objects, keep
order of objects. Requires one additional pointer field for each object. The picture:
Note: cannot simply keep forwarding pointer in original address. It may be overwritten by a moved object
28
Free
The Lisp2 AlgorithmPass 1: Address computation. Keep new
address in an additional object field. Pass 2: pointer modification. Pass 3: two pointers (free & live) run from the
bottom. Live objects are moved to free space keeping their original order.
29
Free
Lisp 2 –Properties
Simple enough.No constraints on object sizes. Order of objects preserved.
30
Slower: 3 passes. Extra space required –a pointer per object.
5. Copying Garbage Collection
31
Like Mark-Compact, the algorithm moves all of the live objects into one area, and the rest of the heap becomes available
There are several schemes of copying garbage collection, one of which is the “Stop and-Copy” garbage collection
In this scheme the heap is divided into two contiguous semispaces. During normal program execution, only one of them is in use
Stop-and-Copy Collector
32
Memory is allocated linearly upward through the “current” semispace
When the running program demands an allocation that will not fit in the unused area
The program is stopped and the copying garbage collector is called to reclaim space
Cheney breath-first copying
33
The algorithm
34
Init()= Tospace=Heap_bottom space_size=Heap_size/2 top_of_space=Tospace+space_size Fromspace=top_of_space+1 free=Tospace
New(n)= If free+n>top_of_space Collect() if free+n>top_of_space abort“Memoryexhausted” new-object=free free=free+n return(new-object)collect()=
from-space,to-space= to-space,from-space//swap scan=free=Tospace top_of_space=Tospace+space_size forRinRoots R=copy(R) whilescan<free forPinchildren(scan) *P=copy(P) scan=scan+size(scan)
copy(P)=ifforwarded(P)returnforwarding_address(P)elseaddr=freemem-copy(P,free)free=free+size(P)forwarding_address(P)=addrreturn(addr)
1 2
4
3
Efficiency of Copying Collection
35
Can be made arbitrarily efficient if sufficient memory is available
The work done in each collection is proportional to the amount of live data
To decrease the frequency of garbage collection, imply allocate larger semispaces
Impractical if there is not enough RAM and paging occurs
6. Non-Copying Implicit Collection
36
Need more two pointer fields and a color field to each object
These fields serve to link each hunk of storage into a doubly-linked list
The color field indicates which set an object belongs to
Live objects being traced do not require space for both fromspace and tospace versions
In most cases, this appears to make the space cost smaller than that of a copying collector but in some cases fragmentation costs(due to the inability to compact data) may outweigh those savings
7. Choosing Among Basic Tracing Techniques
37
A common criterion for high-performance garbage collection is that the cost of collecting objects be comparable, on average, to the cost of allocating objects
While current copying collectors appear to be more efficient than mark-sweep collectors, the difference is not high for state-of-the art implementations
When the overall memory capacity is small, reference counting collectors are more attractive
Simple Garbage Collection Techniques:Too much space, too much time
8. Problems with Simple Tracing Collectors
38
9. Conservatism in GC
39
The first conservative assumption most collectors make is that any variable in the stack, globals or registers is live. There may be interactions between the compiler's optimizations and the garbage collector's view of the reachability graph.
Tracing collectors introduce a major temporal form of conservatism, simply by allowing garbage to go un collected between collection cycles
Reference counting collectors are conservative topologically failing to distinguish between different paths that share an edge in the graph of pointer relationships