Copyright 2017 Kirk Pepperdine
BETTER PERFORMANCE, BETTER CODE
ABOUT ME
▸ Author of jPDM, a performance tuning methodology
  ▸ brings structure and predictability to performance tuning
▸ Founder of jClarity
  ▸ next generation of performance tooling based on jPDM
▸ Performance consulting and training (Kodewerk)
▸ Java Champion since 2006
www.kodewerk.com
Java Performance Tuning Workshop
WHAT IS GOOD CODE
▸ Does what it’s supposed to do
jClarity
WHAT IS GOOD CODE
▸ Does what it’s supposed to do
▸ Is easy for humans to read
"(?:(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}[\\+|\\-]\\d{4}): )?(\\d+(?:\\.|,)\\d{3}): "
WHAT IS GOOD CODE
▸ Does what it’s supposed to do
▸ Is easy for humans to read
▸ Translates well into the execution environment
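The regular expression shown under the readability bullet does its job but is hard for humans to read. A minimal sketch of one alternative follows: build the same kind of pattern from named parts. It assumes the expression is meant to match a log-line prefix made of an optional date stamp and an uptime in seconds. The class, constant and method names are hypothetical, the sample lines are made up, and the zone-offset sign is written as [+-], which is slightly stricter than the original character class (that one also accepts a literal |).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReadablePrefixPattern {
    // Date stamp such as 2017-03-01T10:15:30.123+0100 (assumed format)
    static final String DATE = "\\d{4}-\\d{2}-\\d{2}";
    static final String TIME = "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}";
    static final String ZONE_OFFSET = "[+-]\\d{4}";
    static final String DATE_STAMP = "(" + DATE + "T" + TIME + ZONE_OFFSET + ")";

    // Uptime in seconds with three decimals; '.' or ',' as the decimal separator
    static final String UPTIME = "(\\d+[.,]\\d{3})";

    // Optional date stamp, then the uptime, each followed by ": "
    static final Pattern PREFIX =
            Pattern.compile("(?:" + DATE_STAMP + ": )?" + UPTIME + ": ");

    /** Returns the uptime field if the line starts with the prefix, otherwise null. */
    static String uptimeOf(String line) {
        Matcher matcher = PREFIX.matcher(line);
        return matcher.lookingAt() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(uptimeOf("2017-03-01T10:15:30.123+0100: 12.345: rest of line"));
        System.out.println(uptimeOf("12,345: rest of line"));
    }
}
```

The pattern is no shorter, but each part now has a name that can be read, tested and changed on its own.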
CODING PRINCIPLES
▸ SOLID
  ▸ Single Responsibility
  ▸ Open Closed
  ▸ Liskov substitution
  ▸ Interface segregation
  ▸ Dependency inversion
▸ Delegation (tell don’t ask)
▸ Small methods
▸ Localized variables
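As a small illustration of delegation (tell don’t ask), the sketch below lets an object enforce its own rule instead of exposing state for the caller to check. The Account class and its methods are hypothetical and exist only for this example.

```java
final class Account {
    private long balanceInCents;

    Account(long openingBalanceInCents) {
        this.balanceInCents = openingBalanceInCents;
    }

    /**
     * Tell: the caller asks for a withdrawal and the account decides.
     * The "ask" style would instead read the balance, compare it in the
     * caller, and then set a new balance from outside.
     */
    boolean withdraw(long amountInCents) {
        if (amountInCents <= 0 || amountInCents > balanceInCents) {
            return false; // rule enforced in one place
        }
        balanceInCents -= amountInCents;
        return true;
    }

    long balanceInCents() {
        return balanceInCents;
    }

    public static void main(String[] args) {
        Account account = new Account(100);
        System.out.println(account.withdraw(30));  // accepted
        System.out.println(account.withdraw(100)); // rejected, balance too low
        System.out.println(account.balanceInCents());
    }
}
```

The withdraw method is also small and keeps its variables local, which fits the other two bullets.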
COMPLEXITY
▸ We need to be at war with complexity
  ▸ find the proper abstractions
▸ Implementations that are hard to explain
  ▸ are unlikely to be good
  ▸ often reflect the current (lack of) understanding of the problem
▸ Implementations that are easy to explain
  ▸ may be good
  ▸ may be too simple for the problem at hand
COUPLING VS COHESION
▸ Coupling is the degree of interdependence between classes
  ▸ you need some degree of coupling to get useful work done
  ▸ high degrees of coupling result in code that is harder to maintain
▸ Cohesion refers to the degree to which things belong together
  ▸ things that are related should be bound together
  ▸ low cohesion results when you bundle up things that don’t belong together
  ▸ violates SRP
COUPLING VS COHESION
Tension between coupling and cohesion
STABILITY RATIO
▸ Afferent coupling is a count of the number of classes dependent upon a target class
▸ Efferent coupling is a count of the number of classes the target class is dependent upon
▸ Instability = efferent couplings / (afferent couplings + efferent couplings)
  ▸ indicator of a class’s resilience to change
  ▸ range of 0-1 where 0 is stable and 1 is unstable
▸ Code with a large number of dependencies is highly coupled
  ▸ instability ratio will be closer to 1, implying the code is not resilient to change
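The instability ratio above is simple enough to compute directly. The sketch below is a minimal version; the class and method names are hypothetical, and returning 0 for a class with no couplings at all is an assumption made for this example, since the ratio is undefined in that case.

```java
final class InstabilityRatio {
    /**
     * Instability = efferent / (afferent + efferent).
     * afferent: number of classes that depend on the target class.
     * efferent: number of classes the target class depends on.
     */
    static double of(int afferent, int efferent) {
        if (afferent < 0 || efferent < 0) {
            throw new IllegalArgumentException("coupling counts must not be negative");
        }
        int total = afferent + efferent;
        // Assumption of this sketch: an isolated class is reported as stable.
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    public static void main(String[] args) {
        // Many dependents, few dependencies: close to 0 (stable)
        System.out.println(of(8, 2));
        // Few dependents, many dependencies: close to 1 (unstable)
        System.out.println(of(1, 9));
    }
}
```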
WHAT IS GOOD CODE
▸ Does what it’s supposed to do
▸ Is easy for humans to read
▸ Translates well into the execution environment
EXECUTION ENVIRONMENT
[Diagram: Java source code → javac → .class file → class loader → JVM (HotSpot). Components shown: method cache, runtime, profiler, JIT, code cache. Labels: ahead-of-time compilation; continuous and just-in-time compilation.]
JIT COMPILERS
▸ C1 - client
  ▸ easy to reach optimizations
  ▸ compile count threshold 1500
▸ C2 - server - optimizing compiler
  ▸ deeper, more complex optimizations
  ▸ compile count threshold 10,000
▸ Tiered
  ▸ combination of C1 and C2
  ▸ optimizations are applied as they are found
BENEFIT OF HOTSPOT
▸ Time to complete workload
  ▸ with -Xint: 766.973 seconds
  ▸ with JIT: 124.740 seconds
766.973 / 124.740 ≈ 6
STATIC AND DYNAMIC OPTIMIZATIONS
Inlining, delayed compilation, tiered compilation, on-stack replacement, dependence graph representation, static single assignment representation, exact type inference, memory value inference, constant folding, reassociation, operator strength reduction, null check elimination, type test strength reduction, type test elimination, algebraic simplification, common subexpression elimination, integer range typing, conditional constant propagation,
dominating test detection, flow-carried type narrowing, dead code elimination, dead value elimination, class hierarchy analysis, devirtualization, symbolic constant propagation, autobox elimination, escape analysis, lock elision, lock fusion, de-reflection, optimistic nullness assertions, optimistic type assertions, optimistic type strengthening, optimistic array length strengthening, untaken branch pruning, optimistic N-morphic inlining, branch frequency prediction, call frequency prediction, expression hoisting, expression sinking,
redundant store elimination, adjacent store fusion, card-mark elimination, merge-point splitting, loop unrolling, loop peeling, safepoint elimination, loop vectorization, inlining (graph integration), global code motion, heat-based code layout, switch balancing, throw inlining, local code scheduling, local code bundling, delay slot filling, graph-coloring register allocation, live range splitting, copy coalescing, constant splitting, copy removal, address mode matching, instruction peepholing, DFA-based code generator
FOO() CALLS BAR()
▸ Forms a call site
  ▸ virtual method lookup in a virtual method table
  ▸ vtable is constructed at class loading time
  ▸ jmp to code for bar() and execute it with a return jmp
  ▸ involves pushing and popping variables on the stack
▸ Inlining eliminates the call site
  ▸ replaces the call site in foo() with the body of bar()
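To make the effect of inlining concrete, the sketch below writes out by hand what replacing the call site with the body of bar() looks like. This is only an illustration at source level: HotSpot performs the transformation on its own internal representation at run time, and the class and method bodies here are hypothetical.

```java
class InliningSketch {
    // A small method that is a natural inlining candidate.
    int bar(int x) {
        return x * 2 + 1;
    }

    // As written: two call sites, each needing a method dispatch and a stack frame.
    int foo(int x) {
        return bar(x) + bar(x + 1);
    }

    // Hand-inlined equivalent: the body of bar() substituted at each call site.
    int fooInlinedByHand(int x) {
        return (x * 2 + 1) + ((x + 1) * 2 + 1);
    }

    public static void main(String[] args) {
        InliningSketch sketch = new InliningSketch();
        // Both versions compute the same value; only the call overhead differs.
        System.out.println(sketch.foo(3) == sketch.fooInlinedByHand(3));
    }
}
```

Once the body is in place the compiler can also optimize across what used to be the call boundary, which is why inlining tends to enable many of the other optimizations.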
MASTERMIND
▸ Game to discover a hidden code
  ▸ make a guess which is scored
    ▸ Red -> both color and column are correct
    ▸ White -> only color is correct
  ▸ use previous guesses to refine current guess
    ▸ can our current guess produce the scores for all the previous guesses?
▸ P(8,4) = 1680 possible combinations
  ▸ very small solution space for a computer
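The scoring rule and the filter against previous guesses can be sketched as follows. This is a minimal version that assumes codes are permutations (no symbol repeats within a code or a guess) and represents symbols as ints; the class and method names are hypothetical and are not taken from the demo code.

```java
import java.util.Arrays;

final class MastermindSketch {
    /** Returns {red, white} for a guess scored against a code; assumes no repeated symbols. */
    static int[] score(int[] code, int[] guess) {
        int red = 0;
        int white = 0;
        for (int i = 0; i < guess.length; i++) {
            for (int j = 0; j < code.length; j++) {
                if (guess[i] == code[j]) {
                    if (i == j) {
                        red++;   // right symbol, right column
                    } else {
                        white++; // right symbol, wrong column
                    }
                }
            }
        }
        return new int[] {red, white};
    }

    /** True if the candidate, were it the hidden code, would reproduce every recorded score. */
    static boolean consistent(int[] candidate, int[][] guesses, int[][] scores) {
        for (int i = 0; i < guesses.length; i++) {
            if (!Arrays.equals(score(candidate, guesses[i]), scores[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] hidden = {0, 1, 2, 3};
        int[] guess = {0, 2, 1, 7};
        int[] result = score(hidden, guess);
        System.out.println(Arrays.toString(result));
        // A candidate that cannot explain the recorded score is filtered out.
        System.out.println(consistent(new int[] {7, 2, 1, 0},
                new int[][] {guess}, new int[][] {result}));
    }
}
```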
MASTERMIND SIMULATION
▸ P(100000,3) = 999,970,000,200,000
  ▸ very large solution space
▸ Player thread makes guess
  ▸ filters guess against all previous guesses
  ▸ if it passes, submits it to be scored
▸ Board records the guess with the score
▸ Player’s guess comes from stack containing the permutation group
  ▸ generate all 999,970,000,200,000 permutations
Seriously???? Do you know how long that will take????
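A back-of-the-envelope check of that question: the sketch below recomputes the size of the solution space and estimates how long a full enumeration would take. The rate of one billion permutations per second is purely an assumption for illustration, not a measurement, and the class and method names are hypothetical.

```java
final class SolutionSpace {
    /** P(n, k) = n * (n-1) * ... * (n-k+1); throws ArithmeticException on long overflow. */
    static long permutations(long n, int k) {
        long count = 1;
        for (int i = 0; i < k; i++) {
            count = Math.multiplyExact(count, n - i);
        }
        return count;
    }

    /** Days needed to visit every item at a constant, assumed rate. */
    static double daysToEnumerate(long count, long itemsPerSecond) {
        return (double) count / itemsPerSecond / 86_400.0;
    }

    public static void main(String[] args) {
        long total = permutations(100_000, 3);
        System.out.println(total); // 999970000200000, matching the slide

        long assumedRate = 1_000_000_000L; // hypothetical: one billion per second
        System.out.println(daysToEnumerate(total, assumedRate) + " days");
    }
}
```

Even at that optimistic rate the enumeration alone runs for well over a week, before storing or filtering a single guess.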
MASTERMIND SIMULATION
▸ Player thread uses an index into the permutation group
  ▸ element is generated on the fly
  ▸ need to translate an index into an element
  ▸ but how????
TRANSLATE 555 TO HEX
▸ Pivot values
  ▸ 16 = 10Hex, 256 = 100Hex, 4096 = 1000Hex
▸ Calculation
  ▸ 555 / 256 = 2, 555 % 256 = 43, so 555 = (2*256) + 43
  ▸ 43 / 16 = 2, 43 % 16 = 11
  ▸ 0x22B
digit = number / pivot value
number = number % pivot value
pivot value = pivot value / base
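The three pivot steps above translate almost line for line into code. The sketch below is a minimal version for non-negative ints; the class and method names are hypothetical.

```java
final class PivotConversion {
    /** Converts a non-negative number to the given base (2..36) using pivot values. */
    static String toBase(int number, int base) {
        if (number < 0 || base < 2 || base > Character.MAX_RADIX) {
            throw new IllegalArgumentException("need number >= 0 and 2 <= base <= 36");
        }
        // Start with the largest power of the base that does not exceed the number.
        long pivot = 1;
        while (pivot * base <= number) {
            pivot *= base;
        }
        StringBuilder digits = new StringBuilder();
        long remaining = number;
        while (pivot > 0) {
            int digit = (int) (remaining / pivot); // digit = number / pivot value
            remaining = remaining % pivot;         // number = number % pivot value
            pivot = pivot / base;                  // pivot value = pivot value / base
            digits.append(Character.toUpperCase(Character.forDigit(digit, base)));
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println("0x" + toBase(555, 16)); // 0x22B, as on the slide
    }
}
```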
TRANSLATE 0 TO ELEMENT IN P(100000,3)
▸ Symbols -> [0,1,2,3,4,…,99999]
▸ Calculation
  ▸ 0 / P1 = 0, 0 % P1 = 0, symbol[0] = 0, symbols -> [1,2,3,4,…,99999]
  ▸ 0 / P2 = 0, 0 % P2 = 0, symbol[0] = 1, symbols -> [2,3,4,…,99999]
  ▸ 0 / P3 = 0, 0 % P3 = 0, symbol[0] = 2, symbols -> [3,4,…,99999]
▸ Element -> 0,1,2
digit = index / pivot value
index = index % pivot value
pivot value = pivot value / base
remove symbol from list of symbols
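One way to read the steps above is as a mixed-radix conversion in which the base shrinks by one each time a symbol is removed. The sketch below follows that reading; it is an interpretation rather than the code from the demo, the class and method names are hypothetical, and the pivot for each position is taken to be the number of ways to fill the positions that remain.

```java
import java.util.ArrayList;
import java.util.List;

final class PermutationUnranker {
    /** Returns the k-symbol permutation of {0..n-1} found at the given index, in lexicographic order. */
    static int[] elementAt(long index, int n, int k) {
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= n");
        }
        // First pivot: (n-1) * (n-2) * ... * (n-k+1), the ways to fill positions 2..k.
        long pivot = 1;
        for (int i = 1; i < k; i++) {
            pivot = Math.multiplyExact(pivot, (long) (n - i));
        }
        if (index < 0 || index / pivot >= n) {
            throw new IllegalArgumentException("index outside the permutation group");
        }
        List<Integer> symbols = new ArrayList<>(n);
        for (int s = 0; s < n; s++) {
            symbols.add(s);
        }
        int[] element = new int[k];
        for (int position = 0; position < k; position++) {
            int digit = (int) (index / pivot);           // digit = index / pivot value
            index = index % pivot;                       // index = index % pivot value
            element[position] = symbols.remove(digit);   // remove symbol from list of symbols
            if (position < k - 1) {
                pivot = pivot / symbols.size();          // base shrinks with the symbol list
            }
        }
        return element;
    }

    public static void main(String[] args) {
        int[] first = elementAt(0, 100_000, 3);
        System.out.println(first[0] + "," + first[1] + "," + first[2]); // 0,1,2 as on the slide
    }
}
```

With this in place a player thread only needs to hold a long index; no permutation is materialized until it is actually needed.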
Time for a Demo!
UGLY CODE CAN RUN FAST ALSO
VARIABLE ORDERING
▸ Violating the Single Responsibility Principle sets up the conditions for false sharing
▸ False sharing performance impact
  ▸ Single thread: 532ms, CPU 100%
  ▸ 8 threads with false sharing: 8310ms, CPU 800%
  ▸ 8 threads no false sharing: 1290ms, CPU 800%
• doubles (8) and longs (8)
• ints (4) and floats (4)
• shorts (2) and chars (2)
• booleans (1) and bytes (1)
• references (4/8)
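False sharing can be provoked with a few lines of code. The sketch below is not the benchmark behind the numbers above; it is a minimal experiment in which each thread increments its own counter, with the counters either adjacent in memory or spaced apart. It assumes 64-byte cache lines (8 longs), the class and method names are hypothetical, and the timings it prints will vary by machine, so no expected output is given.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class FalseSharingSketch {
    // Assumption: a cache line holds 8 longs (64 bytes).
    static final int LONGS_PER_CACHE_LINE = 8;

    /** One thread per counter, counters `stride` slots apart; returns elapsed nanoseconds. */
    static long run(int threads, int stride, long iterations, AtomicLongArray counters) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    counters.incrementAndGet(slot); // each thread touches only its own slot
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = 4;
        long iterations = 10_000_000L;
        // Adjacent slots: several counters are likely to share one cache line.
        long adjacent = run(threads, 1, iterations, new AtomicLongArray(threads));
        // Spaced slots: each counter sits on its own cache line.
        long spaced = run(threads, LONGS_PER_CACHE_LINE, iterations,
                new AtomicLongArray(threads * LONGS_PER_CACHE_LINE));
        System.out.println("adjacent: " + adjacent / 1_000_000 + " ms");
        System.out.println("spaced:   " + spaced / 1_000_000 + " ms");
    }
}
```

The threads never touch each other's data in either run; any difference in time comes from where the counters happen to sit in memory, which is the same effect that field ordering inside a class can trigger.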
Time for a Demo!
MONITORING HOTSPOT
▸ -XX:+PrintCompilation
▸ -XX:+LogCompilation
  ▸ requires -XX:+UnlockDiagnosticVMOptions
  ▸ log is best viewed using JITWatch
  ▸ requires -XX:+TraceClassLoading
INLINING STATES
▸ Inline hot
  ▸ the method was determined hot
▸ Too big cold
  ▸ the method was not inlined as the code is too big
  ▸ the method was not hot
▸ Too big hot
  ▸ the method was determined hot
  ▸ but not inlined because the code is too big
(SOME) THRESHOLDS
▸ Inlining thresholds
  ▸ MaxInlineSize=35 (bytes)
  ▸ MaxInlineLevel=9 (nested)
  ▸ MaxRecursiveInlineLevel=1
▸ Medium methods
  ▸ DesiredMethodLimit=8000 (bytecodes)
  ▸ already compiled and too big to accept more inlining
▸ MaxTrivialSize=6, MinInliningThreshold=250
  ▸ small methods get inlined very quickly
▸ HugeMethodLimit=8000
  ▸ these won’t get compiled so forget about inlining
Back to the Code
BIT OF A BOOST BUT…..
CONCLUSION
▸ Old story: code for correctness and readability
  ▸ the two are related
  ▸ software metrics can help
▸ Know your execution environment to make sure the code translates well into it
  ▸ tools are required
  ▸ helps you focus on the trees in the forest
▸ HotSpot helps
  ▸ adds an extra layer of complexity
  ▸ won't fix egregious coding mistakes
Questions?