Moving to G1GC by Kirk Pepperdine

Post on 15-Jan-2017



Copyright 2016 Kirk Pepperdine

Moving to G1GC


About me

- Write and speak about performance tuning
- Offer performance tuning services and training
  - created jPDM, a performance tuning methodology
- Co-founder of jClarity
  - building the first generation of performance diagnostic engines
- Java Champion since 2006


G1GC will be the default collector in Java 9

What impact might this have on your application's performance?


Questions To Be Answered

- What does a regional heap look like?
- How does the current G1GC algorithm work?
- How does performance compare to other collectors?
- What tools can we use to help us:
  - engage in evidence-based tuning
  - develop strategies so GC doesn’t interfere with our application’s throughput
  - tune our application to work better with the collector


Generational Garbage Collection

- Mark-Sweep Copy (evacuation) for Young
  - eden and survivor spaces
  - both serial and parallel implementations
- Mark-Sweep (in-place) for Old space
  - Serial and Parallel with compaction
  - (mostly) Concurrent Mark-Sweep
  - incremental mode


Why another collector

- Scalability
  - pause time tends to be a function of heap size
- CMS is difficult to tune
  - dozens of parameters, some of which are very difficult to understand how to use
  - -XX:TLABWasteTargetPercent=????
- Completely unpredictable
  - well, maybe, but that is a different talk


G1GC

- Designed to scale
  - break the dependency of pause time on heap size
- Easier to tune (maybe)
  - fewer configuration options
- Predictable
  - offer pause time goals and have the collector tune itself


A G1GC heap is

- 1 large contiguous reserved space
  - specified with -mx
  - split into ~2048 regions
  - size is 1, 2, 4, 8, 16, 32, or 64m

e.g. -mx10G
Region size = 10240M / 2048 = 5m, reduced to 4m
Number of regions = 10G / 4m = 2560
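The arithmetic on this slide can be sketched in a few lines of Java. This is a simplified illustration, not the JVM's actual ergonomics code: the class and method names are made up for the example, the target of 2048 regions and the 64m upper bound are taken from the slide, and the real heuristic may differ in detail.

```java
// Simplified sketch of the region-size arithmetic from the slide.
// Assumption: RegionSizing and its methods are hypothetical names; the
// target region count and size bounds are the ones quoted on the slide.
final class RegionSizing {
    static final long TARGET_REGIONS = 2048;

    /** Largest power of two <= heapMb / 2048, clamped to [1, maxRegionMb]. */
    static long regionSizeMb(long heapMb, long maxRegionMb) {
        long raw = Math.max(1, heapMb / TARGET_REGIONS);
        long pow2 = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(pow2, maxRegionMb);
    }

    /** Number of regions the heap is split into for a given region size. */
    static long regionCount(long heapMb, long regionMb) {
        return heapMb / regionMb;
    }

    public static void main(String[] args) {
        long heapMb = 10 * 1024;                  // -mx10G
        long regionMb = regionSizeMb(heapMb, 64); // 64m: upper bound listed on the slide
        System.out.println("region size = " + regionMb + "m");
        System.out.println("regions     = " + regionCount(heapMb, regionMb));
    }
}
```

For a 10G heap this reproduces the slide's numbers: a 4m region size and 2560 regions.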


Regions

- Free regions are kept on a free region list
- When in use, a region is tagged as:
  - Eden, Survivor, Old, or Humongous


Allocation

- Free regions are kept on a free region list
  - mutator threads acquire a region from the free region list
  - tag region as Eden
  - allocate object into region
  - when region is full, get a new region from the free list

[Diagram: heap regions, four of them tagged Eden]


Humongous Allocation

- allocation is larger than half a region's size
  - size of a region defines what is humongous
- allocate into a humongous region
  - created from a set of contiguous regions

[Diagram: heap regions, four tagged Eden and one tagged Humongous]
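The humongous rule above can be expressed as a small check. The sketch below is illustrative only: the class and method names are hypothetical, and it applies the rule exactly as the slide words it (larger than half a region).

```java
// Sketch of the humongous-allocation rule as stated on the slide.
// Assumption: HumongousCheck and its methods are hypothetical names.
final class HumongousCheck {
    /** Per the slide: an allocation larger than half a region is humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Contiguous regions needed to hold the object (ceiling division). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long region = 4 * mb; // region size from the -mx10G example
        System.out.println(isHumongous(1 * mb, region));   // small object
        System.out.println(isHumongous(3 * mb, region));   // more than half a region
        System.out.println(regionsNeeded(9 * mb, region)); // spans several regions
    }
}
```

Because the threshold is tied to the region size, the same 3m array is humongous with 4m regions but an ordinary allocation with 8m regions.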


Garbage Collection Triggers

- Allotted number of Eden regions has been consumed
- Unable to satisfy a Humongous allocation
  - region fragmentation
  - may lead to full collection
- Heap is full
  - full collection
- Metaspace threshold is reached
  - full discussion beyond the scope of this talk


Garbage Collection

- Young Gen is Mark-Sweep
- Mostly Concurrent-Mark of Tenured
  - initial-mark included with Young-Gen collection
  - concurrent-root-region-scan
  - concurrent-mark
  - remark
  - cleanup
  - concurrent-cleanup
- Mixed is mark Young, sweep Young and some Tenured


Reclaiming Memory (detailed)

- Mark Sweep Copy (Evacuating) Garbage Collection
  - Capture all mutator threads at a safepoint
  - Complete RSet refinement
  - Scan for GC Roots
  - Trace all references from GC roots
    - mark all data reached during tracing
  - Copy all marked data into a “to space”
  - Reset supporting structures
  - Release all mutator threads


RSet

- Track all external pointers to a region
  - GC roots for the region
- Expensive to update
  - mutations recorded to a refinement queue
  - update delegated to refinement threads


RSet Refinement

- Refinement queue is divided into 4 regions
  - White: no refinement threads are working
  - Green: number of cards that can be processed without exceeding 10% of pause time
  - Yellow: all refinement threads are working to keep up
  - Red: application threads are involved in refinement
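One way to picture the four zones is as bands of queue depth separated by three thresholds. The sketch below is an assumption-laden illustration: the class name, the threshold parameters, and the idea that the zones are ordered strictly by the number of pending entries are all hypothetical, and the zone descriptions are copied from the slide rather than from the JVM.

```java
// Toy classifier for the refinement-queue zones named on the slide.
// Assumption: zones are modelled as bands of pending-entry counts split by
// three hypothetical thresholds (green, yellow, red). Not the JVM's code.
final class RefinementZones {
    enum Zone { WHITE, GREEN, YELLOW, RED }

    static Zone classify(int pending, int green, int yellow, int red) {
        if (pending < green)  return Zone.WHITE;
        if (pending < yellow) return Zone.GREEN;
        if (pending < red)    return Zone.YELLOW;
        return Zone.RED;
    }

    /** Descriptions taken from the slide. */
    static String describe(Zone zone) {
        switch (zone) {
            case WHITE:  return "no refinement threads are working";
            case GREEN:  return "cards can be processed within 10% of pause time";
            case YELLOW: return "all refinement threads are working to keep up";
            default:     return "application threads are involved in refinement";
        }
    }

    public static void main(String[] args) {
        int[] samples = { 0, 15, 25, 40 }; // hypothetical queue depths
        for (int pending : samples) {
            Zone zone = classify(pending, 10, 20, 30);
            System.out.println(pending + " -> " + zone + ": " + describe(zone));
        }
    }
}
```

The point of the model is the escalation: the deeper the backlog, the more of the refinement cost lands on the application.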


CSets

- Set of all regions to be swept
- Goal is to keep pauses under MaxGCPauseMillis
  - controls the size of the CSet
- CSet contains
  - all Young regions
  - selected Old regions during mixed collections
  - number / mixed GC ratio
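The relationship between the pause goal and the size of the CSet can be sketched as a greedy selection. This is a minimal illustration under stated assumptions, not G1's implementation: the Region type, the per-region cost estimate, and the "most garbage first" ordering of Old candidates are hypothetical stand-ins for the collector's own prediction model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Greedy CSet selection sketch. Assumptions: Region, predictedMs and the
// ordering of Old candidates are hypothetical; G1's real cost model differs.
final class CSetSketch {
    static final class Region {
        final String name;
        final boolean young;
        final long garbageBytes;  // reclaimable space, as found by marking
        final double predictedMs; // hypothetical evacuation cost estimate

        Region(String name, boolean young, long garbageBytes, double predictedMs) {
            this.name = name;
            this.young = young;
            this.garbageBytes = garbageBytes;
            this.predictedMs = predictedMs;
        }
    }

    static List<Region> chooseCSet(List<Region> regions, double pauseGoalMs) {
        List<Region> cset = new ArrayList<>();
        List<Region> oldCandidates = new ArrayList<>();
        double budgetMs = pauseGoalMs;
        for (Region r : regions) {
            if (r.young) {             // all Young regions are always included
                cset.add(r);
                budgetMs -= r.predictedMs;
            } else {
                oldCandidates.add(r);
            }
        }
        // Old regions with the most garbage first, while the budget allows.
        oldCandidates.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());
        for (Region r : oldCandidates) {
            if (r.predictedMs > budgetMs) break;
            cset.add(r);
            budgetMs -= r.predictedMs;
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> heap = new ArrayList<>();
        heap.add(new Region("eden-1", true, 0, 50));
        heap.add(new Region("eden-2", true, 0, 60));
        heap.add(new Region("old-A", false, 900, 40));
        heap.add(new Region("old-B", false, 500, 60));
        for (Region r : chooseCSet(heap, 200)) { // MaxGCPauseMillis=200
            System.out.println(r.name);
        }
    }
}
```

With a 200 ms goal, the two Young regions use 110 ms of the budget, old-A fits in the remainder, and old-B is left for a later mixed collection.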


Heap after a Mark/Sweep

- all surviving objects are copied into (to) Survivor regions
- Eden and (from) Survivor regions are returned to the free regions list

[Diagram: heap regions, one tagged Humongous and one tagged Survivor]


Promotion to Old

- Data is promoted to old
  - from survivor when it reaches tenuring threshold
  - to prevent survivor from being overrun
  - pre-emptive or reactive

[Diagram: heap regions tagged Humongous, Survivor, and Old]


Parallel Phases

- external root scanning
- updating remembered sets
- scan remembered sets
- code root scanning
- object copy
- string dedup


Serial Phases

- code root fixup
- code root migration
- clear CT
- choose CSet
- Reference processing
- redirty cards
- free CSet


Starting a (mostly) Concurrent Cycle

- Scheduled when heap occupancy reaches 45%
- initial-mark runs inside a Young collection
- mark calculates liveness
  - used for CSet inclusion decisions

[Diagram: heap regions tagged Eden, Survivor, Humongous, and Old, with Old regions making up most of the heap]


Flags (you want to use)

-XX:+UseG1GC
-mx4G
-XX:MaxGCPauseMillis=200

-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintReferenceGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime


Flags (you might want to use)

-XX:G1HeapRegionSize=1
-XX:InitiatingHeapOccupancyPercent=45
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=5

-XX:+UnlockDiagnosticVMOptions
-XX:+G1PrintRegionLivenessInfo

-XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=15


Flags (you should think twice about using)

-XX:G1MixedGCCountTarget=8

-XX:+UnlockExperimentalVMOptions
-XX:G1MixedGCLiveThresholdPercent=85/65


Flags (you should never use)

-XX:+UnlockExperimentalVMOptions
-XX:G1OldCSetRegionThresholdPercent=10
-XX:G1MaxNewSizePercent=60
-XX:G1HeapWastePercent=10
-XX:G1RSetUpdatingPauseTimePercent=10


Things that give the G1 grief

- RSet refinement
  - too much overhead to put work on mutator thread
  - affects application throughput
  - high rates of mutation place pressure on RSet refinement
  - will affect Young parallel phase and remark times
- Object copy
  - not much to say here (unfortunately)


Things that give the G1 grief

- Humongous allocations
  - definition controlled by region size
  - bigger region yields bigger RSet refinement costs
- Floating garbage
  - “dead” objects in other regions keep dead objects alive
  - negative impact on object copy costs
- more aggressive ripeness settings
  - most costly collections


Tuning Cassandra (benchmark)

- Out of the box, tuned for CMS
  - exceptionally complex set of configurations
- Reconfigured
  - to run G1
  - given a fixed unit of work which should ideally be cleared in 15 minutes

Goal: Configure G1 to maximize MMU
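MMU (minimum mutator utilization) is the smallest fraction of any time window of a given length during which application threads were running rather than paused for GC. The sketch below shows one way to compute it from a list of pauses. The class name, method names, and the pause data in main are made up for illustration and are not results from the Cassandra benchmark; the sketch also assumes pauses do not overlap and that the window is no longer than the run.

```java
// Minimal MMU calculation. Assumptions: MmuSketch is a hypothetical name,
// pauses are non-overlapping {startMs, endMs} pairs inside [0, runMs], and
// windowMs <= runMs. The sample data is invented, not measured.
final class MmuSketch {
    static double mmu(double[][] pauses, double runMs, double windowMs) {
        double worstPausedMs = 0;
        for (double[] p : pauses) {
            // A worst-case window starts at a pause start or ends at a pause end.
            double[] candidateStarts = { p[0], p[1] - windowMs };
            for (double start : candidateStarts) {
                double from = Math.max(0, Math.min(start, runMs - windowMs));
                double paused = pausedWithin(pauses, from, from + windowMs);
                worstPausedMs = Math.max(worstPausedMs, paused);
            }
        }
        return (windowMs - worstPausedMs) / windowMs;
    }

    /** Total pause time that overlaps the interval [from, to]. */
    static double pausedWithin(double[][] pauses, double from, double to) {
        double total = 0;
        for (double[] p : pauses) {
            total += Math.max(0, Math.min(p[1], to) - Math.max(p[0], from));
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] pauses = { {100, 150}, {200, 260}, {1000, 1020} }; // invented pauses, in ms
        System.out.println("MMU(200ms) = " + mmu(pauses, 2000, 200));
    }
}
```

Two pauses that fall close together hurt MMU far more than the same pause time spread out, which is why it is a stricter target than total GC overhead.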


Cassandra throughput running @ 100% CPU

[Chart: throughput (axis 0 to 70000) for runs 1 to 12, series CMS and G1GC]


Run times

[Chart: run times (axis 00:12:35 to 00:20:55) for runs 1 to 11]


Weak Generational Hypothesis

[Chart: rate plotted against time]


Performance Seminar

www.kodewerk.com

Java Performance Tuning, June 2-5, Chania, Greece