0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and...

45
1 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center QuickTime™ and a decompressor are needed to see this picture.

Transcript of 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and...

Page 1: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

1

Parallel and ConcurrentReal-time Garbage Collection

Part III:Tracing, Snapshot, and Defragmentation

David F. Bacon

T.J. Watson Research Center

QuickTime™ and a decompressor

are needed to see this picture.

Page 2: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

2

Part 2: Trace (aka Mark)• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

Page 3: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

3

Let’s Assume a Stack Snapshot

Stack

rr

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

Page 4: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

4

Yuasa Algorithm Review:2(a): Copy Over-written Pointers

Stack

pp

rr

TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

Page 5: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

5

Yuasa Algorithm Review: 2(b): Trace

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

WWa

b

ZZa

b

YYa

b

XXa

b

* Color is per-object mark bit

Page 6: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

6

Yuasa Algorithm Review: 2(c): Allocate “Black”

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

ss VVa

b

WWa

b

ZZa

b

YYa

b

XXa

b

Page 7: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

7

Non-monotonicity in Tracing

1 GB

Page 8: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

8

Which Design Pattern is This?

pool of work

Shared Monotonic Work Pool

Per-Thread State Update

Page 9: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

9

Trace is Non-Monotonic…and requires thread-local data

GCWorkerThreads

GCMasterThread

ApplicationThreads

pool ofnon-monotonic

work

Page 10: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

10

Basic Solution

• Check if there are more work packets

– If some found, trace is not done yet

– If none found, “probably done”• Pause all threads• Re-scan for non-empty buffers• Resume all threads• If none, done• Otherwise, try again later

Page 11: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

11

Ragged Barriers:How to Stop without Stopping

Epoch 1

A B C- Local Epoch- Global Min- Global Max

Epoch 4

Epoch 3

Epoch 2

Page 12: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

12

“Trace” Phase

Page 13: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

13

The Thread’s Full Monty

1616 6464 256256

Thread 1 inVM

color

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

dblb

dblb

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

color

PhaseInfo

phase workers

PhaseInfo

phase workers

trace 3

color

color

sweep

sweep

Epoch

agreed newest

Epoch

agreed newest4335

Page 14: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

14

Work Packet Data Structures

wbuffillwbuffill

wbuftracewbuftracetotracetotrace freefree

wbuf.epoch < Epoch.agreed

WBufEpochWBufEpoch

39

WBufCountWBufCount

3

39

37

37

Page 15: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

15

trace() { thread->epoch = Epoch.newest; bool canTerminate = true;

if (WBufCount > 0) getWriteBuffers(); canTerminate = false;

while (b = wbuf-trace.pop()) if (! moreTime()) return; int traceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); while (b = totrace.pop()) if (! moreTime()) return; int TraceCount = traceBufferContents(b); canTerminate &= (traceCount == 0);

if (canTerminate) traceTerminate();}

Page 16: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

16

Getting Write Buffer RootsgetWriteBuffers() { thread->epoch = fetchAndAdd(Epoch.newest, 1); WBufEpoch = thread->epoch; // mutators will dump wbufs

LOCK(wbuf-fill); LOCK(wbuf-trace);

for each (wbuf in wbuf-fill) if (wbuf.epoch < Epoch.agreed) remove wbuf from wbuf-fill; add wbuf to wbuf-trace;

UNLOCK(wbuf-trace); UNLOCK(wbuf-fill);}

Page 17: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

17

Write Barrier

writeBarrier(Object object, Field field, Object new) { if (BarrierOn) Object old = object[field]; if (old != null && ! old.marked) outOfLineBarrier(old); if (thread->dblb) // double barrier outOfLineBarrier(new);}

Page 18: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

18

Write Barrier Slow PathoutOfLineBarrier(Object obj) { if (obj == null || obj.marked) return;

obj.marked = true;

bool epochOK = thread->wbuf->epoch == WBufEpoch; bool haveRoom = thread->wbuf->data < thread->wbuf->end;

if (! (epochOK && enoughSpace)) thread->wbuf = flushWBufAndAllocNew(thread->wbuf); // Updates WBufEpoch, Epoch.newest

*thread->wbuf->data++ = obj;}

Page 19: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

19

“Trace Terminate” Phase

Page 20: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

20

Trace Termination

wbuffillwbuffill

wbuftracewbuftracetotracetotrace freefree

WBufEpochWBufEpoch

39

WBufCountWBufCount

0

Page 21: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

21

Asynchronous Agreement

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

PhaseInfo

phase workers

PhaseInfo

phase workerstrace 1

Epoch

agreed newest

Epoch

agreed newest4237

PhaseInfo

phase workers

PhaseInfo

phase workersterminate 1

desiredEpoch = Epooch.newest;…WAIT FOR Epoch.agreed == desiredEpoch if (WBufCount == 0) DONE else RESUME TRACING

Page 22: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

22

Ragged Barrierbool raggedBarrier(desiredEpoch, urgent) { if (Epoch.agreed >= desiredEpoch) return true;

LOCK(threadlist); int latest = MAXINT; for each (Thread thread in threadlist) latest = min(latest, thread.epoch);

Epoch.agreed = latest; UNLOCK(threadlist);

if (epoch.agreed >= desiredEpoch) return true; else doCallbacks(RAGGED_BARRIER, true, urgent); return false;} * Non-locking implementation?

Page 23: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

23

Part 1: Scan Roots• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

Page 24: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

24

Fuzzy Snapshot

• Finally, we assume no magic

• Initiate Collection

Page 25: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

25

“Initiate Collection” Phase

Page 26: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

26

Initiate: Color Black, Double Barrier

1616 6464 256256

Thread 1 inVM

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

PhaseInfo

phase workers

PhaseInfo

phase workers

initiate 3

color

color

sweep

sweep

Epoch

agreed newest

Epoch

agreed newest4237

color

color

dblb dblb

dblb dblb

Page 27: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

27

What is a Double Barrier?Store both Old and New Pointers

Stack

pp

rr

TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

Page 28: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

28

Why Double Barrier?

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

“Snapshot” = { V, W, X, Z }

VVa

b T3Stack

kk

mm

jj

T2: m.b = n (writes X.b = W)T3: j.b = k (writes X.b = V)T1: q = p.b (reads X.b: V, W, or Z??)

qq

Page 29: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

29

Yuasa (Single) Barrier with 2 Writers

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

VVa

b T3Stack

kk

mm

jj

T2: m.b = n (X.b = W)

qq

T2 T3

T3: j.b = k (X.b = V)

Page 30: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

30

Yuasa Barrier Lost Update

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

VVa

b T3Stack

kk

mm

jj

T2: m.b = n (X.b = W)

qq

T2 T3

T3: j.b = k (X.b = V)

Page 31: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

31

Hosed!

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

VVa

b T3Stack

kk

mm

jj

qq

T2 T3T1

T2: m.b = n (X.b = W)T3: j.b = k (X.b = V)T1: q = p.b (q <- W)T2: n = null

T1: Scan Stack

T2: Scan Stack

Page 32: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

32

“Thread Stack Scan” Phase

Page 33: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

33

Scan Stacks (double barrier off)

1616 6464 256256

Thread 1 inVM

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

PhaseInfo

phase workers

PhaseInfo

phase workers

initiate 3

color

color

sweep

sweep

Epoch

agreed newest

Epoch

agreed newest4237

color

color

dblb dblb

dblb dblb

Scan Stack 1

Scan Stack 2

Page 34: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

34

All Done!

QuickTime™ and a decompressor

are needed to see this picture.

Page 35: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

35

Boosting: Ensuring Progress

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

Boost to master priority

Revert(?)

Page 36: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

36

Part 4: Defragmentation• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

• Defragmentation

Page 37: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

37

freefree?

free16

free16

free64

free64

free256free256

alloc’dalloc’d sweepsweep

2562561616 6464

Page 38: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

38

Two-way Communication

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

objects have moved

pointers have changed

Page 39: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

39

Defragmentation

T T a

b

U U a

b

Z Z a

b

X X a

b

T’ T’ a

b

C C a

b

B B a

b

D D a

b

E E a

b

Page 40: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

40

Staccato Algorithm

FREE

T T a

b

NORMAL

T T a

b

COPYING

T T a

b

MOVED

T T a

b

T’ T’ a

b

T’ T’ a

b

allocatenop

defragmentcas

access (abort)cas

commitcas

accessload

accessload

reapstore

createstore

Page 41: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

41

Scheduling

Page 42: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

42

Guaranteeing Real Time

• Guaranteeing usability without realtime:– Must know maximum live memory

• If fragmentation & metadata overhead bounded

• We also require:– Maximum allocation rate (MB/s)

• How does the user figure this out???– Very simple programming style– Empirical measurement– (Research) Static analysis

Page 43: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

43

Conclusions

• Systems are made of concurrent components

• Basic building blocks:– Locks– Try-locks– Compare-and-Swap– Non-locking stacks, lists, …– Monotonic phases– Logical clocks and asynchronous agreement

• Encapsulate so others won’t suffer!

Page 44: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

44

http://www.research.ibm.com/metronome

https://sourceforge.net/projects/tuningforkvp

Page 45: 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

45

Legend

GC Phases

APP

TRACE

SNAPSHOT

FLIP

SWEEP

APP

APP

APP

APP

APP

APP

APP

APP APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

TERMINATE

APP

APP

APP

APP

APP

APP

APP

APP

…………

…………