Basic Garbage Collection Techniques

39
[email protected] Basic Garbage Collection Techniques 1

description

 

Transcript of Basic Garbage Collection Techniques

Page 1: Basic  Garbage  Collection  Techniques

[email protected]

Basic Garbage Collection Techniques

1

Page 2: Basic  Garbage  Collection  Techniques

Contents1. What is Garbage Collection2. Reference Counting3. Mark-Sweep Collection4. Mark-Compact Collection5. Copying Garbage Collection6. Non-Copy Implicit Collection7. Choosing Among Basic Tracing Techniques8. Problems with Simple Tracing Collectors9. Conservatism in GC

2

Tracing

Page 3: Basic  Garbage  Collection  Techniques

1. Garbage Collection

3

Garbage Collection (GC) Is the automatic storage reclamation of computer

storageThe GC function is to find data objects that are no

longer in use and make their space available by the running program

So Why Garbage Collection?A software routine operating on a data structure

should not have to depend what other routines may be operating on the same structure

If the process does not free used memory, the unused space is accumulated until the process terminates or swap space is exhausted

Page 4: Basic  Garbage  Collection  Techniques

Explicit Storage Management Hazards

4

Programming errors may lead to errors in the storage management:May reclaim space earlier than necessaryMay not reclaim space at all, causing memory

leaksThese errors are particularly dangerous since

they tend to show up after deliveryMany programmers allocate several objects

statically, to avoid allocation on the heap and reclaiming them at a later stage

Page 5: Basic  Garbage  Collection  Techniques

Explicit Storage Management Hazards

5

In many large systems, garbage collection is partially implemented in the system’s objects

Garbage Collection is not supported by the programming language

Leads to buggy, partial Garbage Collectors which are not useable by other applications

The purpose of GC is to address these issues

Page 6: Basic  Garbage  Collection  Techniques

GC Complexity

6

Garbage Collection is sometimes considered cheaper than explicit deallocation

A good Garbage Collector slows a program down by a factor of 10 percent

Although it seems a lot, it is only a small price to pay for:ConvenienceDevelopment timeReliability

Page 7: Basic  Garbage  Collection  Techniques

Garbage Collection – The Two-Phase Abstraction

7

The basic functioning of a garbage collector consists, abstractly speaking, of two parts:Distinguishing the live objects from the garbage

in some way (garbage detection)Reclaiming the garbage objects’ storage, so that

the running program can use it (garbage reclamation)

In practice these two phases may be interleaved

Page 8: Basic  Garbage  Collection  Techniques

2. Reference Counting

8

Each object has an associated count of the references (pointers) to it

Each time a reference to the object is created, its reference count is increased by one and vice-versa

When the reference count reaches 0, the object’s space may be reclaimed

Page 9: Basic  Garbage  Collection  Techniques

Reference Counting

9

When an object is reclaimed, its pointer fields are examined and every object it points to has its reference count decremented

Reclaiming one object may therefore lead to a series of object reclamations

There are two major problems with reference counting

Page 10: Basic  Garbage  Collection  Techniques

The Cycles Problem

10

Reference Counting fails to reclaim circular structures

Originates from the definition of garbage

Circular structures are not rare in modern programs: Trees Cyclic data structures

The solution is up to the programmer

Page 11: Basic  Garbage  Collection  Techniques

The Efficiency Problem

11

When a pointer is created or destroyed, its reference count must be adjusted

Short-lived stack variables can incur a great deal of overhead in a simple reference counting scheme

In these cases, reference counts are incremented and then decremented back very soon

Page 12: Basic  Garbage  Collection  Techniques

Deferred Reference Counting

12

Much of this cost can be optimized away by special treatment of local variables

Reference from local variables are not included in this bookkeeping

However, we cannot ignore pointers from the stack completely

Therefore the stack is scanned before object reclamation and only if a pointer’s reference count is still 0, it is reclaimed

Page 13: Basic  Garbage  Collection  Techniques

Reference Counting - Recap

13

While reference counting is out of vogue for high-performance applications

It is quite common in applications where acyclic data structures are used

Most file systems use reference counting to manage files and/or disk blocks

Very simple scheme

Page 14: Basic  Garbage  Collection  Techniques

3. Mark-Sweep Collection - [McCarthy 1960]

Distinguishing live object from garbage(Mark phase)Done by tracing – starting at the root set and

usually traversing the graph of pointers relationships

The reached objects are markedReclaiming the garbage(Sweep phase)

Once all live objects are marked, memory is exhaustively examined to find all of the unmarked (garbage) objects and reclaim their space

14

Page 15: Basic  Garbage  Collection  Techniques

Example

rootroot root

Free_list

Mark phase Sweep phaseBefore

Heap

15

Page 16: Basic  Garbage  Collection  Techniques

Example

Free list

r1

Free list r1

p

p

Free list r1

16 Live object Marked objectGarbage

Page 17: Basic  Garbage  Collection  Techniques

Basic Algorithm

New(A)= if free_list is empty mark_sweep() if free_list is empty return (“out-of-memory”) pointer = allocate(A) return (pointer)

mark_sweep()= for Ptr in Roots mark(Ptr) sweep()

mark(Obj)= if mark_bit(Obj) == unmarked mark_bit(Obj)=marked for C in Children(Obj) mark(C)

Sweep()= p = Heap_bottom while (p < Heap_top) if (mark_bit(p) == unmarked) then free(p) else mark_bit(p) = unmarked; p=p+size(p)

17

Page 18: Basic  Garbage  Collection  Techniques

Properties of Mark & SweepFragmentation

Difficult to allocate large objectsSeveral small objects maybe add up to large

contiguous spaceCost

Proportional to the heap size, including both live and garbage objects

Locality of reference: Does not move objectsInterleave of different ages causes many page

swaps. (Because objects often created in clusters that are active at the same time)

18

Page 19: Basic  Garbage  Collection  Techniques

Standard Engineering TechniquesMark:

Explicit mark-stack (avoid recursion)Pointer reversalUsing bitmaps

Sweep:Using bitmapsLazy sweep

19

Page 20: Basic  Garbage  Collection  Techniques

4. Mark-Compact CollectionCompact after mark

Mark-Compact collectors remedy the fragmentation and allocation problems of mark-sweep collectors

The collector traverses the pointers graph and copy every live object after the previous one

This results in one big contiguous space which contains live objects and another which is considered free space

20

Page 21: Basic  Garbage  Collection  Techniques

Mark-Compact Collection

21

Garbage objects are “squeezed” to the end of the memory

The process requires several passes over the memory:One to computes the new location for objectsSubsequent passes update pointers and actually move

the objects The algorithm can be significantly slower than

Mark-Sweep CollectionTwo Algorithms:

Two-finger Alg–for objects of equal size. Lisp 2 Alg

Page 22: Basic  Garbage  Collection  Techniques

Object Ordering

22

1 2 3 4

Linearizing

4 1 2 3

Arbitrary

Worst

1 3 4 2 Good

1 2 3 4

Sliding

Best

Arbitrary –no guaranteed order. Linearizing–objects pointing to

one another are moved into adjacent positions.

Sliding–objects are slid to one end of the heap, maintaining the original order of allocation.

Page 23: Basic  Garbage  Collection  Techniques

The Two Finger Algorithm [Edwards 1974]

Simplest algorithm: Designed for objects of equal sizeOrder of objects in output is arbitrary. Two passes: compact & update pointers

23

Page 24: Basic  Garbage  Collection  Techniques

Pass I: CompactUse two pointers:

free: goes over heap from bottom searching for free/empty objects.

Live:goes over heap from top downwards searching for live objects.

When freefinds a free spot and livefinds an object, the object is moved to the free spot.

When an object is moved, a pointer to its new location is left at its old location

24

Page 25: Basic  Garbage  Collection  Techniques

Two finger, pass I - Example

25

Page 26: Basic  Garbage  Collection  Techniques

Pass II: Fix PointersGo over live objects in the heap

Go over their pointersIf pointer points to the free area: fix it according

to the forwarding pointers in target objects.

26

Compacted area free area

Page 27: Basic  Garbage  Collection  Techniques

Simple!Relatively fast: One pass + pass on live objects

(and their previous location). No extra space required.

Objects restricted to equal size. Order of objects in output is arbitrary. This significantly deteriorates program

efficiency! Thus –not used today.

Two finger –Properties

27

Page 28: Basic  Garbage  Collection  Techniques

The Lisp2 AlgorithmImprovements: handle variable sized objects, keep

order of objects. Requires one additional pointer field for each object. The picture:

Note: cannot simply keep forwarding pointer in original address. It may be overwritten by a moved object

28

Free

Page 29: Basic  Garbage  Collection  Techniques

The Lisp2 AlgorithmPass 1: Address computation. Keep new

address in an additional object field. Pass 2: pointer modification. Pass 3: two pointers (free & live) run from the

bottom. Live objects are moved to free space keeping their original order.

29

Free

Page 30: Basic  Garbage  Collection  Techniques

Lisp 2 –Properties

Simple enough.No constraints on object sizes. Order of objects preserved.

30

Slower: 3 passes. Extra space required –a pointer per object.

Page 31: Basic  Garbage  Collection  Techniques

5. Copying Garbage Collection

31

Like Mark-Compact, the algorithm moves all of the live objects into one area, and the rest of the heap becomes available

There are several schemes of copying garbage collection, one of which is the “Stop and-Copy” garbage collection

In this scheme the heap is divided into two contiguous semispaces. During normal program execution, only one of them is in use

Page 32: Basic  Garbage  Collection  Techniques

Stop-and-Copy Collector

32

Memory is allocated linearly upward through the “current” semispace

When the running program demands an allocation that will not fit in the unused area

The program is stopped and the copying garbage collector is called to reclaim space

Page 33: Basic  Garbage  Collection  Techniques

Cheney breath-first copying

33

Page 34: Basic  Garbage  Collection  Techniques

The algorithm

34

Init()= Tospace=Heap_bottom space_size=Heap_size/2 top_of_space=Tospace+space_size Fromspace=top_of_space+1 free=Tospace

New(n)= If free+n>top_of_space Collect() if free+n>top_of_space abort“Memoryexhausted” new-object=free free=free+n return(new-object)collect()=

from-space,to-space= to-space,from-space//swap scan=free=Tospace top_of_space=Tospace+space_size forRinRoots R=copy(R) whilescan<free forPinchildren(scan) *P=copy(P) scan=scan+size(scan)

copy(P)=ifforwarded(P)returnforwarding_address(P)elseaddr=freemem-copy(P,free)free=free+size(P)forwarding_address(P)=addrreturn(addr)

1 2

4

3

Page 35: Basic  Garbage  Collection  Techniques

Efficiency of Copying Collection

35

Can be made arbitrarily efficient if sufficient memory is available

The work done in each collection is proportional to the amount of live data

To decrease the frequency of garbage collection, imply allocate larger semispaces

Impractical if there is not enough RAM and paging occurs

Page 36: Basic  Garbage  Collection  Techniques

6. Non-Copying Implicit Collection

36

Need more two pointer fields and a color field to each object

These fields serve to link each hunk of storage into a doubly-linked list

The color field indicates which set an object belongs to

Live objects being traced do not require space for both fromspace and tospace versions

In most cases, this appears to make the space cost smaller than that of a copying collector but in some cases fragmentation costs(due to the inability to compact data) may outweigh those savings

Page 37: Basic  Garbage  Collection  Techniques

7. Choosing Among Basic Tracing Techniques

37

A common criterion for high-performance garbage collection is that the cost of collecting objects be comparable, on average, to the cost of allocating objects

While current copying collectors appear to be more efficient than mark-sweep collectors, the difference is not high for state-of-the art implementations

When the overall memory capacity is small, reference counting collectors are more attractive

Simple Garbage Collection Techniques:Too much space, too much time

Page 38: Basic  Garbage  Collection  Techniques

8. Problems with Simple Tracing Collectors

38

Page 39: Basic  Garbage  Collection  Techniques

9. Conservatism in GC

39

The first conservative assumption most collectors make is that any variable in the stack, globals or registers is live. There may be interactions between the compiler's optimizations and the garbage collector's view of the reachability graph.

Tracing collectors introduce a major temporal form of conservatism, simply by allowing garbage to go un collected between collection cycles

Reference counting collectors are conservative topologically failing to distinguish between different paths that share an edge in the graph of pointer relationships