Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah Garbage Collector: Java without the GC Hiccups
Christine H. Flood
A Cure for hiccups
Java Hiccup cure
Stop The World Evacuation
Java Threads
GC Threads
Java Threads
Response Time Predictability Disappears
Init Mark
Final Mark
Concurrent Mark
Concurrent Evacuation
Shenandoah
SpecJBB2015
Algorithm    Max-JOPS  Critical-JOPS  Total Pause Time  Average Pause Time  Max Pause Time
Shenandoah   71652     43472          106.66s           60.06ms             558.12ms
G1           80467     6199           545.70s           597.70ms            1709.61ms
Intel Platform: Brickland-EX, CPU: Broadwell-EX, QDF: QKT3B0 Stepping (QS), 2.2GHz, 24 Core, 60MB shared cache, COD ENABLED, Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz, 524288 MB memory, 1598 GB disk space
Can we try it ourselves?
● Released in Fedora 24
● May also be downloaded and built from
– http://openjdk.java.net/projects/shenandoah/
How Does It Work?
Brief Intro to Compacting GCs
Heap after several binary tree modifications
Heap after compaction and reclamation of unreachable objects
Two phases:
1) Trace 2) Compact
Concurrent Tracing
● Solved Problem
● Snapshot At The Beginning (SATB)
– Used by several OpenJDK GC algorithms
● CMS
● G1
● Shenandoah
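The snapshot-at-the-beginning idea can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not code from any OpenJDK collector: the names `SatbSketch`, `Node`, `writeNext`, and `drain` are hypothetical. While marking is active, the write barrier records the reference that is about to be overwritten, so every object that was reachable when marking started still gets marked.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SatbSketch {
    /** Hypothetical heap object with a single reference field and a mark bit. */
    static final class Node {
        Node next;
        boolean marked;
    }

    static boolean markingActive;                       // set while concurrent marking runs
    static final Deque<Node> satbQueue = new ArrayDeque<>();

    /** Write barrier: remember the old referent before it is overwritten. */
    static void writeNext(Node holder, Node newValue) {
        Node old = holder.next;
        if (markingActive && old != null) {
            satbQueue.push(old);                        // keep the snapshot reachable
        }
        holder.next = newValue;
    }

    /** Marks everything reachable from n through next references. */
    static void mark(Node n) {
        while (n != null && !n.marked) {
            n.marked = true;
            n = n.next;
        }
    }

    /** The marker drains the queue of overwritten references before finishing. */
    static void drain() {
        while (!satbQueue.isEmpty()) {
            mark(satbQueue.pop());
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.next = b;
        b.next = c;
        markingActive = true;
        a.marked = true;          // marker has visited a but not yet followed a.next
        writeNext(a, null);       // a Java thread unlinks b (and c) in the meantime
        drain();
        System.out.println("b marked: " + b.marked + ", c marked: " + c.marked);
    }
}
```

Without the barrier, b and c would be unlinked before the marker reached them and would wrongly be treated as garbage even though they were live in the snapshot.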
Concurrent Compaction
● Only in Shenandoah
● Move the Live Objects While the Java Threads are Running.
Shenandoah Heap: Region Based
Choose Regions With The Most Garbage
Regions: 80% live, empty, 100% live, empty, empty, empty, 50% live, 100% live, 20% live, 10% live
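Choosing the regions with the most garbage can be sketched as a simple filter and sort. The class `RegionSelection`, the `Region` type, and the 50% garbage threshold below are illustrative assumptions, not Shenandoah's actual heuristics; the live percentages are the ones shown on the slide, with empty regions left out.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionSelection {
    /** Hypothetical region descriptor: heap index and percentage of live data. */
    static final class Region {
        final int index;
        final int livePercent;

        Region(int index, int livePercent) {
            this.index = index;
            this.livePercent = livePercent;
        }

        int garbagePercent() {
            return 100 - livePercent;
        }
    }

    /** Keeps regions with at least minGarbagePercent garbage, most garbage first. */
    static List<Region> chooseCollectionSet(List<Region> usedRegions, int minGarbagePercent) {
        List<Region> chosen = new ArrayList<>();
        for (Region r : usedRegions) {
            if (r.garbagePercent() >= minGarbagePercent) {
                chosen.add(r);
            }
        }
        chosen.sort((x, y) -> Integer.compare(y.garbagePercent(), x.garbagePercent()));
        return chosen;
    }

    public static void main(String[] args) {
        // Non-empty regions from the slide, by position in the heap picture.
        List<Region> used = List.of(
                new Region(0, 80), new Region(2, 100), new Region(6, 50),
                new Region(7, 100), new Region(8, 20), new Region(9, 10));
        for (Region r : chooseCollectionSet(used, 50)) {
            System.out.println("region " + r.index + ": " + r.garbagePercent() + "% garbage");
        }
    }
}
```

Evacuating a mostly dead region frees a whole region while copying little data, which is why the fully live regions are never worth choosing.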
Why is Concurrent Compaction Complicated?
● Java Thread 1
– Foo.x = 1
● Java Thread 2
– Foo.y = 2
● GC Thread
– Copies Foo to Foo'
● Java Thread 3
– Foo.z = 3
What we want to happen when the GC Thread copies Foo
● Before: T1, T2, and T3 all reference Foo
● After: T1, T2, and T3 all reference Foo'
But finding and updating all the references to Foo takes time.
What Shenandoah does
● Before: T1, T2, and T3 reference Foo; Foo's indirection pointer refers to Foo itself
● After: T1, T2, and T3 still reference Foo; Foo's indirection pointer refers to Foo', and the indirection pointer of Foo' refers to Foo' itself
Almost as good, as long as all accesses go through the indirection (forwarding) pointer.
Read Barriers
● Without Read Barrier
– Read(Foo + Field Offset)
● With ReadBarrier
– Read(Read(Foo-8) + Field Offset)
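The extra indirection can be modeled in plain Java. This is a minimal sketch with hypothetical names (`ReadBarrierSketch`, `Obj`, `forwardee`): an ordinary field stands in for the word stored 8 bytes before the object, which cannot be expressed directly in Java source.

```java
final class ReadBarrierSketch {
    /** Hypothetical heap object with an indirection pointer and one field. */
    static final class Obj {
        Obj forwardee = this;   // refers to the object itself until it is moved
        int value;

        Obj(int value) {
            this.value = value;
        }
    }

    /** Without a read barrier: read the field straight from the reference. */
    static int readDirect(Obj ref) {
        return ref.value;
    }

    /** With a read barrier: follow the indirection pointer, then read the field. */
    static int readWithBarrier(Obj ref) {
        return ref.forwardee.value;
    }

    public static void main(String[] args) {
        Obj foo = new Obj(1);
        Obj fooPrime = new Obj(foo.value);  // the GC copies Foo to Foo'
        foo.forwardee = fooPrime;           // and installs the indirection pointer
        fooPrime.value = 2;                 // later writes land in the new copy

        System.out.println("direct read:  " + readDirect(foo));
        System.out.println("barrier read: " + readWithBarrier(foo));
    }
}
```

A thread that still holds the old reference sees the current value only when it reads through the indirection pointer, which is why every access has to take that extra step.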
Read Barriers: Reading a Field
● Without Shenandoah
0x00007fffe1102cd1: mov 0x10(%rsi),%rsi ;*getfield value
; - java.lang.String::equals@20 (line 982)
● With Shenandoah
0x00007fffe1102ccd: mov -0x8(%rsi),%rsi ; read the contents of the indirection pointer for the address contained in register rsi back into rsi
0x00007fffe1102cd1: mov 0x10(%rsi),%rsi ;*getfield value
; - java.lang.String::equals@20 (line 982)
● A smart compiler will fill delay slots
But there is still a race condition
● Java Thread
– Reads ResolveLocation(Foo – 0x8)
– …
– Writes to Foo
● GC Thread
– …
– Copies Foo to Foo'
● Solution: Copying write barriers.
● Java Threads aid in evacuation: they never write to an object targeted for evacuation, but copy it first and write to the copy.
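A copying write barrier can be sketched as follows. The names (`WriteBarrierSketch`, `Obj`, `resolveForWrite`) are hypothetical, and using a compare-and-set to install the indirection pointer is an assumption of this sketch rather than something the slides state. The point it illustrates is the one above: before writing, a thread makes sure the object has been copied if it is targeted for evacuation, and then writes to the copy.

```java
import java.util.concurrent.atomic.AtomicReference;

final class WriteBarrierSketch {
    static volatile boolean evacuationInProgress = false;

    /** Hypothetical heap object with an atomically updated indirection pointer. */
    static final class Obj {
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        final boolean targetedForEvacuation;
        volatile int count;

        Obj(boolean targetedForEvacuation, int count) {
            this.targetedForEvacuation = targetedForEvacuation;
            this.count = count;
            this.forwardee.set(this);   // refers to itself until the object is copied
        }
    }

    /** Returns the copy that is safe to write to, evacuating the object if needed. */
    static Obj resolveForWrite(Obj ref) {
        Obj current = ref.forwardee.get();
        if (!evacuationInProgress || !ref.targetedForEvacuation || current != ref) {
            return current;             // nothing to do, or somebody already copied it
        }
        Obj copy = new Obj(false, ref.count);
        // Only one thread (Java or GC) wins the race to install its copy.
        return ref.forwardee.compareAndSet(ref, copy) ? copy : ref.forwardee.get();
    }

    /** putfield count, preceded by the write barrier. */
    static void putCount(Obj ref, int newCount) {
        resolveForWrite(ref).count = newCount;
    }

    public static void main(String[] args) {
        Obj foo = new Obj(true, 1);
        evacuationInProgress = true;
        putCount(foo, 2);
        System.out.println("old copy: " + foo.count + ", new copy: " + foo.forwardee.get().count);
    }
}
```

Because the write never lands in the old copy once evacuation has started, a GC thread that loses the race simply discards its own copy and no update is lost.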
Write Barrier
0x00007fffe1110318: movabs $0x7fffec0b92c0,%rax
0x00007fffe1110322: mov (%rax,%rbx,1),%al
0x00007fffe1110325: test $0x1,%al ; ← evacuation in progress?
0x00007fffe1110328: je 0x00007fffe1110339 ; ← if not, jump to putfield
0x00007fffe111032e: xchg %rdi,%rax
0x00007fffe1110331: callq 0x00007fffe10ffd20 ; {runtime_call} ← else make a call out to the runtime to copy the object to an evacuation region
0x00007fffe1110336: xchg %rax,%rdi
0x00007fffe1110339: mov %esi,0x10(%rdi) ;*putfield count
; - java.util.Hashtable::addEntry@83 (line 436)
Aren't Those Barriers Expensive?
So, what do these barriers cost?
● Not as much as you might think….
– Barrier Optimizations
● New Objects
● Immutable Fields
● Array Size
● Class Pointers
● Read after Read
● Read after Write
● Hoisting
We ran several DaCapo Benchmarks Without Any GC Activity
Benchmark  Shenandoah  G1      Percentage Overhead
Avrora     2096ms      2052ms  2.1%
FOP        1103ms      1044ms  5.6%
LUIndex    861ms       832ms   3.5%
Why not generational?
● Generational hypothesis is the observation that, in most cases, young objects are much more likely to die than old objects.
– Memory Management Glossary
Why Not Generational?
● LRU Benchmark
– Models a URL cache mapping URL to web page content.
– Generational GC pays a steep penalty for copying data.
Collector   Total Time  Total Pause Time  Average Pause Time  Max Pause Time
Shenandoah  15167ms     3.81s             23.19ms             44.85ms
G1          178244ms    11.89s            116.60ms            230.573ms
How Does Shenandoah Compare With Other OpenJDK Collectors?
Currently Available OpenJDK GCs
● Serial GC
– Small Footprint
– Minimal overhead
● Parallel GC
– High Throughput
● G1
– Managed Pause Times
– Compaction
● ParNew/CMS
– Minimal Pause Times
What's Next?
Shorter Pause Times
● We are moving more of our work into concurrent phases to meet the original 10ms goal.
Shenandoah 2.0
● Observations
– Marking the entire heap takes a long time and touches rarely used parts of memory.
– Garbage is only created by stack changes or writes to the heap.
Focus GC wherever writes are happening.
● Generational Application
– Writes happening in recently allocated regions
● LRU
– Writes happening in oldest regions
Shenandoah 2.0 Theory
● Keep track of writes to regions.
● Focus on regions which have changed
● Collect Region Sets together.
Table of Inter-Region References
(Grid of regions 0 to 7 against regions 0 to 7, with marks in rows 3, 5, and 7.)
Regions 3 & 5 collected together
Table of Inter-Region References
(Grid of regions 0 to 7 against regions 0 to 7, with marks in rows 3, 5, and 7.)
Scan Region 7 when collecting region 6.
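A table of inter-region references could be kept as a simple boolean matrix. The sketch below is an illustration under assumptions, not the Shenandoah 2.0 design: the class and method names are hypothetical, and because the column positions of the marks did not survive in the transcript, the references 3→5, 5→3, and 7→6 are inferred from the two captions only.

```java
import java.util.ArrayList;
import java.util.List;

final class InterRegionTable {
    private final boolean[][] refs;   // refs[from][to]: region "from" points into region "to"

    InterRegionTable(int regionCount) {
        refs = new boolean[regionCount][regionCount];
    }

    /** Called when a write stores a reference from one region into another. */
    void recordReference(int from, int to) {
        if (from != to) {
            refs[from][to] = true;
        }
    }

    /** Regions that must be scanned for incoming pointers when collecting target. */
    List<Integer> referencingRegions(int target) {
        List<Integer> result = new ArrayList<>();
        for (int from = 0; from < refs.length; from++) {
            if (refs[from][target]) {
                result.add(from);
            }
        }
        return result;
    }

    /** Regions that reference each other are candidates for one region group. */
    boolean referenceEachOther(int a, int b) {
        return refs[a][b] && refs[b][a];
    }

    public static void main(String[] args) {
        InterRegionTable table = new InterRegionTable(8);
        table.recordReference(3, 5);   // assumed from "Regions 3 & 5 collected together"
        table.recordReference(5, 3);
        table.recordReference(7, 6);   // assumed from "Scan Region 7 when collecting region 6"

        System.out.println("3 and 5 grouped: " + table.referenceEachOther(3, 5));
        System.out.println("scan when collecting 6: " + table.referencingRegions(6));
    }
}
```

With such a table, a partial collection only needs the thread stacks, the chosen region group, and the regions listed as referencing it, rather than a mark of the entire heap.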
Shenandoah Heap: Region Based
Choose Regions With The Most Updates
Region labels: 80% live, empty, 100% live, empty, empty, empty, 50, 0, 200, 100
Partial Collections
● Scan Thread Stacks and Other Roots
● Scan Entire Region Group and Referencing Regions.
Region Groups Help NUMA
● Regions that reference each other will be collected together.
NUMA Aware GC Threads
● NUMA node 1: Java Threads, Concurrent GC Threads
● NUMA node 2: Java Threads, Concurrent GC Threads
Heap regions shown: N1 Region (2), N2 Region (2), Shared Region (5), Empty Region (3)
Takeaway Message
Shenandoah rocks for some applications!
Who would benefit from Shenandoah?
● Stock trading applications
● E-commerce web sites
● Any applications with QOS guarantees
● Interactive Applications
● Applications with large heaps that require fast response times
More Information
http://openjdk.java.net/projects/shenandoah/
[email protected]@redhat.com