Bounding Space Usage of Conservative Garbage Collectors

Ohad Shacham December 2002

Based on work by Hans-J. Boehm

Garbage Collector

Mechanism that allows automatic recycling of unreachable memory objects

Convenience – the programmer does not need to deallocate memory.

Safety – don’t reuse reachable memory objects

Conservative GC

Garbage collector that tolerates ambiguous pointers

Ambiguous pointer – location which may or may not be a pointer

A CGC treats every ambiguous pointer whose value is a valid object address as if it were a pointer

Why do we use CGC?

C/C++ programs

Compilers that generate C code

Compiler/Collector interface can be much simpler

Facilitate language interoperability

Etc…

CGC Problems

Can’t safely update a pointer to a moved object (unless we are sure that the pointer is unambiguous)

Bad for programs with a large number of very short-lived objects

Retains unreferenced memory as a result of misidentified pointers

An integer can be misidentified as a pointer to a large data structure

Embarrassing Failure Scenario

[Figure: animation of a singly linked FIFO queue with head and tail pointers. Elements are added at the tail and removed at the head. A false reference (dashed line) to a removed element keeps every later element alive.]

Unbounded space retention

How can it be fixed?

Assigning NULL to the next field in the list element being removed from the queue
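The failure and the fix can be sketched in a few lines of Python. This is only an illustration: the class and function names (`Node`, `Queue`, `clear_links`, `chain_length`, `retained_after_churn`) are made up here, and the variable `false_ref` merely stands in for a misidentified pointer.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class Queue:
    """Singly linked FIFO queue; `clear_links` switches the fix on or off."""

    def __init__(self, clear_links):
        self.head = None
        self.tail = None
        self.clear_links = clear_links

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        return node

    def dequeue(self):
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        if self.clear_links:
            node.next = None  # the fix: cut the removed element off from the queue
        return node.value


def chain_length(node):
    """Number of nodes a collector would retain through a reference to `node`."""
    count = 0
    while node is not None:
        count += 1
        node = node.next
    return count


def retained_after_churn(clear_links, steps=1000):
    q = Queue(clear_links)
    false_ref = q.enqueue(0)  # stands in for a misidentified pointer to this node
    for i in range(1, steps + 1):
        q.enqueue(i)
        q.dequeue()           # the queue never holds more than two elements
    return chain_length(false_ref)
```

Without the fix, the stale reference retains every element ever enqueued, although the queue itself never holds more than two. With the fix it retains only the one removed element.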

Our Purpose

We know that conservatively collected programs can retain a large amount of memory because of one particular misidentified pointer

We want to bound space usage independently of the particular misidentified pointers that arise during a particular execution

Motivation

Provide an intuitive explanation of why CGC behaves reasonably well in practice

Provide a better characterization of what can provoke failure of CGC

Make CGC safe for environments that require a hard space bound

How will we satisfy our motivation?

Prove a mathematical bound on space consumption

Suggest a testing technique for identifying potential unbounded growth without executing the program and identifying misidentified pointers

Reachability

Reachable Object – An object that can be reached by a real pointer (Y)

Backward Reachable Objects – A set of Objects that are backward reachable from a real pointer {Y,Z,L,K}

[Figure: two object graphs with a root, dashed references, and objects X, Y, Z, L, K, D, illustrating the two definitions.]

Backward Forward Reachability

Backward Forward Reachable Object – An object that is reachable from an anchor X, where X is itself backward reachable (Y)

[Figure: a root, the anchor X, and the object Y.]
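A small sketch may make the two definitions concrete. It rests on an assumption about the slide's wording: "backward reachable from a real pointer" is read as "objects from which the pointer's target can be reached by following pointers". The graph encoding (a dict from object name to the names it points to) and all function names are invented for this example.

```python
def reachable(graph, start):
    """Objects reachable from `start` by following pointers (including `start`)."""
    seen, stack = set(), [start]
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(graph.get(obj, ()))
    return seen


def reverse(graph):
    """Same objects, every pointer turned around."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def backward_reachable(graph, obj):
    """Objects that can reach `obj` (assumed reading of the definition)."""
    return reachable(reverse(graph), obj)


def bf_reachable(graph, obj):
    """Objects reachable forward from any anchor that is backward reachable from `obj`."""
    result = set()
    for anchor in backward_reachable(graph, obj):
        result |= reachable(graph, anchor)
    return result


# Hypothetical heap: a root points at Y; X points at Y and at W; W points at V.
example = {"X": ["Y", "W"], "W": ["V"], "V": [], "Y": [], "U": []}
```

In this example only Y is reachable from the root, but X is backward reachable from Y, so W and V are backward forward reachable through the anchor X. The unrelated object U is not.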

Strongly GC Robust

A data structure is strongly GC robust iff all objects that are backward forward reachable from a root through a single anchor were at some point reachable through a root

Who doesn’t satisfy Strongly GC Robust criterion?

Our FIFO queue example

[Figure: the queue at different points in time, each with its own head and tail.]

But these objects weren’t reachable at the same time

Who does satisfy Strongly GC Robust criterion?

Purely functional programs

e.g. Stack

[Figure: stack diagrams, each labeled with a head pointer.]
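The stack case can be sketched with immutable cells. This is an illustrative assumption about what "purely functional" means for the example: each cell is a `(value, rest)` tuple that is never modified after creation. The names `push`, `pop`, `size`, and `demo` are made up here.

```python
def push(stack, value):
    """Return a new stack; the old one is left untouched."""
    return (value, stack)


def pop(stack):
    """Return the top value and the rest of the stack."""
    value, rest = stack
    return value, rest


def size(stack):
    """Number of cells reachable from this head."""
    n = 0
    while stack is not None:
        n += 1
        stack = stack[1]
    return n


def demo():
    stack = None
    for v in range(5):
        stack = push(stack, v)
    false_ref = stack        # stale reference to the cell that is head right now
    live_then = size(stack)  # cells that were live at this moment
    for _ in range(1000):    # heavy churn afterwards
        _, stack = pop(stack)
        stack = push(stack, "x")
    return size(false_ref), live_then
```

Because cells are never updated, whatever is reachable from an old head was reachable at the moment that cell was the head. A stale reference therefore cannot retain more than was live at one time, in contrast to the queue.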

Bounding the Space Consumption

The set of objects S that is reachable from an unreachable object x was at some point backward forward reachable through object x

[Figure: the unreachable object X and a root.]

So what do we know?

The set of objects S that is reachable from an unreachable object x was at some point backward forward reachable through object x

Objects that are backward forward reachable in strongly GC robust data structures were at some point reachable

In programs that use only strongly GC robust data structures, a set of objects S that is reachable from an unreachable object x was at some point reachable

Bounding the Space Consumption

If a program uses only strongly GC robust data structures and the number of misidentified pointers is bounded by N, then the extra space retained by the CGC is bounded by

N * maximal amount of live memory
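Written as an inequality, the bound reads as follows. The symbols are introduced here only for illustration: $\mathrm{live}(t)$ denotes the amount of live memory at time $t$, and $N$ is the bound on the number of misidentified pointers.

```latex
\[
  \text{extra space retained} \;\le\; N \cdot \max_{t}\, \mathrm{live}(t)
\]
% Hypothetical numbers: with N = 3 and a peak of 10 MB of live memory,
% the collector retains at most 3 * 10 MB = 30 MB beyond what is live.
```

The intuition is that each misidentified pointer retains only objects that were all reachable at one time, so a single such pointer can cost at most one peak's worth of live memory.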

What did we want to Satisfy?

Prove a mathematical bound on space consumption

Suggest a testing technique for identifying potential unbounded growth without executing the program and identifying misidentified pointers

Observation

The number of objects that are reachable from an unreachable object x but not from a proper root grows only if some object becomes unreachable from the proper roots.

Observation

[Figure: proper roots, the reachable objects, the unreachable objects, and a false reference to the unreachable object X.]

Theorem

If the number of objects reachable from an unreachable object x but not reachable from proper roots grows without bound,

then the length of the longest simple path from x through unreachable objects to a reachable object also grows without bound

Proof

According to our observation, a path to a reachable object can grow only if a reachable object becomes unreachable

We have two cases:

1. The newly unreachable object is on a path to a reachable object

2. The newly unreachable object is not on a path to a reachable object

Proof

In case 1 we again have two cases:

1. The object is on a simple path

2. The object is not on a simple path

[Figure: a false reference to x, with objects y and w.]

So What do we know?

If the number of objects reachable from x but unreachable from proper roots grows without bound,

then the length of the longest simple path from x through unreachable objects to a reachable object grows without bound

Therefore

If the length of the longest simple path from x through unreachable objects to a reachable object remains bounded,

then the number of objects reachable from x but unreachable from proper roots remains bounded

Weakly GC Robust

A Data Structure is weakly GC robust if the length of simple paths through unreachable objects ending at an object in the Data Structure remains bounded for any execution

Who does satisfy Weakly GC Robust?

Our FIFO queue, with the restriction that only a bounded number of objects is ever removed

[Figure: the queue with head and tail; a false reference points into a bounded chain of removed elements (dashed).]

Who doesn’t satisfy Weakly GC Robust ?

Our FIFO queue without the previous restriction

[Figure: the queue with head and tail; a false reference points into an unbounded chain of removed elements (dashed).]

Strongly VS Weakly

Strongly: precise bound on the cost of a misidentified pointer: max(live memory)

Weakly: reason about bounded vs. unbounded space loss

Testing Algorithm

1. Build a backward reachability graph

2. Mark all the reachable objects

3. Perform DFS on the graph from step 1, computing the height of each reachable object. Report the maximum height

4. Attach to each reachable object its height

5. Discard unreachable objects and the auxiliary data structure from step 1
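The five steps can be sketched as one function that is called once per collection cycle. This is a simplified reading rather than the original implementation. It assumes that the "height" of an object is the length of the longest chain of unreachable objects leading into it, that heights attached in earlier cycles carry over, and that pointer cycles among unreachable objects are simply skipped. The names `gc_heights`, `stored`, and `queue_demo` are invented here.

```python
def gc_heights(graph, roots, stored=None):
    """One cycle of the height test (simplified sketch).

    graph:  dict mapping each object to the objects it points to
    roots:  objects referenced directly by proper roots
    stored: heights attached to objects by earlier cycles
    Returns (heights of the reachable objects, maximum height).
    """
    stored = stored or {}

    # Step 1: backward reachability graph (every pointer reversed).
    preds = {obj: [] for obj in graph}
    for src, targets in graph.items():
        for dst in targets:
            preds.setdefault(dst, []).append(src)

    # Step 2: mark everything reachable from the proper roots.
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(graph.get(obj, ()))

    # Step 3: DFS along backward edges, through unreachable objects only.
    memo, on_path = {}, set()

    def height(obj):
        if obj in memo:
            return memo[obj]
        on_path.add(obj)
        best = stored.get(obj, 0)  # height carried over from earlier cycles
        for p in preds.get(obj, ()):
            if p not in marked and p not in on_path:  # skip reachable objects and cycles
                best = max(best, 1 + height(p))
        on_path.discard(obj)
        memo[obj] = best
        return best

    # Steps 4 and 5: keep heights for reachable objects only.
    heights = {obj: height(obj) for obj in marked}
    return heights, max(heights.values(), default=0)


def queue_demo(cycles):
    """FIFO queue without the NULL fix: one enqueue and one dequeue per cycle."""
    stored, head, max_height = {}, 0, 0
    for _ in range(cycles):
        graph = {head: [head + 1], head + 1: []}  # removed element still links to new head
        head += 1
        stored, max_height = gc_heights(graph, [head], stored)
    return max_height
```

In `queue_demo` the reported maximum grows by one per cycle even though only two objects exist at any time, which is the kind of steadily increasing counter that signals potential unbounded retention. The recursive `height` is fine for small graphs; a production version would need an explicit stack.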

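Stage 1

[Figure: example object graph with nodes a to n, shown next to its backward reachability graph.]

Stage 2

[Figure: the same two graphs with the reachable objects marked.]

Stage 3

[Figure: DFS over the backward graph; nodes are annotated with heights from 0 to 4.]

Stage 4

[Figure: the heights attached to the reachable objects.]

Stage 5

[Figure: the remaining reachable objects a, f, g, j, k, m, n after the unreachable objects are discarded, with the attached heights 4 and 3.]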

Results

In most of the tests the maximum counter stabilized in the tens or hundreds

In our FIFO queue the counter passed one million

Conclusions

In strongly GC robust data structures the space retained is bounded by the maximum amount of live memory for each misidentified pointer

Conjecture – All data structures, except for the straightforward implementation of a singly linked queue and infinite data structures that rely on lazy evaluation, are weakly GC robust