JVM Dive for mere mortals

Post on 16-Apr-2017


@jkubrynski / kubrynski.com

JVM DIVE FOR MERE MORTALS
JAKUB KUBRYNSKI

jk@devskiller.com / @jkubrynski / http://kubrynski.com

$ WHOAMI
CO-FOUNDER OF DEVSKILLER / CODEARTE

TRAINER AT BOTTEGA

CONFITURA ORGANIZER

DEVOXX.PL PROGRAM COMMITTEE

ACKNOWLEDGEMENTS
MARTIN THOMPSON (@MJPT777)

ALEKSEY SHIPILËV (@SHIPILEV)

JAVA VIRTUAL MACHINE

LIFE CYCLE
idea -> feature in production

LIFE CYCLE
source -> javac -> bytecode

bytecode -> classloader -> interpreter

interpreter -> JIT -> optimized native code

SOURCE CODE
package com.random.company.app;

public class StringUtilsHelper {

    public boolean isEmpty(String str) {
        return str == null || str.length() == 0;
    }
}

JAVAC
converts source code into bytecode
performs checks
applies simple optimizations

CLASS FILE
ClassFile {
    u4             magic; // CAFEBABE
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
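The first three fields of the structure above can be read with a plain DataInputStream, since class files are big-endian. This is a minimal sketch, not a full parser: the class name ClassHeader and the hand-written sample bytes are assumptions made for illustration (major version 52 corresponds to Java 8).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: reads only magic, minor_version and major_version.
final class ClassHeader {
    final int magic;
    final int minor;
    final int major;

    private ClassHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    /** Parses the first eight bytes of a class file. */
    static ClassHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();           // u4, big-endian
            int minor = in.readUnsignedShort(); // u2
            int major = in.readUnsignedShort(); // u2
            return new ClassHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: header of a class compiled for Java 8.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        ClassHeader h = parse(sample);
        System.out.printf("magic=%08X minor=%d major=%d%n", h.magic, h.minor, h.major);
    }
}
```

To inspect a real class, pass the bytes of any .class file to parse instead of the sample array.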

BYTECODE
list of operation codes

$ xxd -p Test.class
...1b04a0000504ac2a1b0464b600021b68ac...

1b => iload_1
04 => iconst_1
a0 => if_icmpne 7
04 => iconst_1
ac => ireturn
2a => aload_0
1b => iload_1
04 => iconst_1
64 => isub
b6 => invokevirtual #2
1b => iload_1
68 => imul
ac => ireturn
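The mapping from hex bytes to mnemonics can be reproduced with a tiny decoder. The sketch below is an illustration only: MiniDisassembler is a hypothetical name, and it knows just the eight opcodes that occur in the dump above. Note that if_icmpne carries a signed 16-bit offset relative to its own position (2 + 5 = 7), and the two bytes after invokevirtual (00 02) are a constant-pool index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder covering only the opcodes from the hex dump.
final class MiniDisassembler {

    static List<String> decode(String hex) {
        byte[] code = new byte[hex.length() / 2];
        for (int i = 0; i < code.length; i++) {
            code[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xFF;
            switch (op) {
                case 0x04: out.add("iconst_1"); pc += 1; break;
                case 0x1b: out.add("iload_1");  pc += 1; break;
                case 0x2a: out.add("aload_0");  pc += 1; break;
                case 0x64: out.add("isub");     pc += 1; break;
                case 0x68: out.add("imul");     pc += 1; break;
                case 0xac: out.add("ireturn");  pc += 1; break;
                case 0xa0: {
                    // Branch target = address of this instruction + signed 16-bit offset.
                    int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                    out.add("if_icmpne " + (pc + offset));
                    pc += 3;
                    break;
                }
                case 0xb6: {
                    // Operand is an unsigned 16-bit constant-pool index.
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    out.add("invokevirtual #" + index);
                    pc += 3;
                    break;
                }
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        decode("1b04a0000504ac2a1b0464b600021b68ac").forEach(System.out::println);
    }
}
```

For real class files, javap -c prints the same kind of listing with the constant-pool entries resolved.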

CLASSLOADER
dynamically loads classes
organized in hierarchies

Bootstrap classloader
Extension classloader
Application classloader
Custom classloader
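The hierarchy can be observed at runtime by following getParent() until it returns null, which is how Java represents the bootstrap loader. A minimal sketch follows; LoaderChain is a hypothetical name and the printed loader class names depend on the JDK (from Java 9 on, the extension loader is replaced by the platform loader).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists a class loader and all of its ancestors.
final class LoaderChain {

    static List<String> chainOf(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap"); // a null loader means the bootstrap class loader
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("application class: " + chainOf(LoaderChain.class.getClassLoader()));
        System.out.println("java.lang.String:  " + chainOf(String.class.getClassLoader()));
    }
}
```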

CLASSLOADING PHASES
loading -> reads class file
linking
    verifying -> verifies bytecode correctness
    preparing -> allocates memory
    resolving -> links with classes, interfaces, fields, methods
initializing -> runs static initializers
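That initialization is a separate, lazy phase can be seen with a static initializer: referring to a class literal does not run it, while the first static method call does. The sketch below records the order of events; InitOrderDemo and Lazy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: records when the static initializer of Lazy actually runs.
final class InitOrderDemo {
    private static final List<String> EVENTS = new ArrayList<>();

    static final class Lazy {
        static {
            EVENTS.add("Lazy initialized");
        }

        static int answer() {
            return 42;
        }
    }

    /** Runs the scenario once and returns the recorded events. */
    static synchronized List<String> run() {
        if (EVENTS.isEmpty()) {
            EVENTS.add("start");
            String name = Lazy.class.getName(); // class literal: no initialization yet
            EVENTS.add("after class literal");
            int value = Lazy.answer();          // first static call triggers initialization
            EVENTS.add("after first static call");
        }
        return Collections.unmodifiableList(EVENTS);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```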

INTERPRETER
template interpreter
detects the critical hot spots in the program

JIT
Just-In-Time
optimizes code
compiles methods into native code
-client (C1) / -server (C2)
runs up to 20 times faster

INLINING
public String getStringFromSupplier(Supplier<String> supplier) {
    return supplier.get();
}

public String businessMethod(String param) {
    Supplier<String> stringSupplier = new StringSupplier("my" + param);
    return getStringFromSupplier(stringSupplier);
}

// turns into

public String businessMethod(String param) {
    Supplier<String> stringSupplier = new StringSupplier("my" + param);
    return stringSupplier.get();
}

UNROLLING
private static String[] options = { "yes", "no", "true", "false" };

public void someMethod() {
    for (String opt : options) {
        process(opt);
    }
}

// turns into

public void someMethod() {
    process("yes");
    process("no");
    process("true");
    process("false");
}

SCALAR REPLACEMENT
public void record(int x, int y) {
    Point point = new Point(x, y);
    storePoint(point);
}

// inlining

public void record(int x, int y) {
    Point point = new Point(x, y);
    events.store("Added point", point.x, point.y);
}

// scalar replacement

public void record(int x, int y) {
    events.store("Added point", x, y);
}

DEAD CODE ELIMINATION
public void myMethod() {
    for (int i = 0; i < THRESHOLD; i++) {
        new String("test");
    }
}

// turns into

public void myMethod() {
}

LOCK ELISION
public void process(List<User> users) {
    List<User> result = new ArrayList<>();
    synchronized (result) {
        fillResult(users);
    }
}

// turns into

public void process(List<User> users) {
    List<User> result = new ArrayList<>();
    fillResult(users);
}

TYPE SHARPENING
List<User> users = new ArrayList<>();

// turns into

ArrayList<User> users = new ArrayList<>();

ON STACK REPLACEMENT
happens when the interpreter discovers that a method is looping
converts an interpreted stack frame into a native compiled stack frame

TIERED COMPILATION
LEVELS

0: Interpreted code
1: Simple C1 compiled code
2: Limited C1 compiled code
3: Full C1 compiled code
4: C2 compiled code

WHY SHOULD I CARE?
JIT does most of the optimizations we could do manually, without "obfuscating" the source code
Performance/load tests should run only on a "hot" (warmed-up) application

HOW TO TRACK?
When is your app back at full speed after a restart?

$ jstat -compiler <PID> 1s

// or

-XX:+PrintCompilation
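JIT activity can also be sampled from inside the application through the standard CompilationMXBean. The sketch below is one possible approach, not something from the slides: JitWatch is a hypothetical name, and the idea is simply that once the reported total compilation time stops growing between samples, the hot paths have probably been compiled.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper around the standard java.lang.management API.
final class JitWatch {

    /** Total JIT compilation time in milliseconds, or -1 if the JVM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int round = 1; round <= 3; round++) {
            // Some busy work so the JIT has something to compile.
            for (int i = 0; i < 5_000_000; i++) {
                checksum += Integer.toString(i).length();
            }
            System.out.println("round " + round + ": compilation time so far = "
                    + totalCompilationMillis() + " ms");
        }
        System.out.println("checksum " + checksum);
    }
}
```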

EXECUTION COMPONENTS
program counter
frame
stack

STACK TRACE
"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at io.codearte.BlockBuilder.startBlock(BlockBuilder.groovy:21)
      at io.codearte.Generator.process(Generator.java:318)
      at io.codearte.ImportantApp.do(ImportantApp.java:64)
      at sun.reflect.NativeMethodImpl.invoke(NativeMethodImpl.java:18)
      at sun.reflect.NativeMethodImpl.invoke(NativeMethodImpl.java:62)
      at java.lang.reflect.Method.invoke(Method.java:497)
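Each "at ..." line in such a dump corresponds to one frame on the thread's stack, and the same frames are available programmatically as StackTraceElement objects. A small sketch follows; FrameDump and its methods are hypothetical names.

```java
// Hypothetical demo of reading stack frames from inside the program.
final class FrameDump {

    /** Element 0 of a fresh Throwable's trace is the frame that created it. */
    static String currentMethod() {
        return new Throwable().getStackTrace()[0].getMethodName();
    }

    /** Element 1 is the caller of the frame that created the Throwable. */
    static String callerOfHelper() {
        return helper();
    }

    private static String helper() {
        return new Throwable().getStackTrace()[1].getMethodName();
    }

    public static void main(String[] args) {
        // Prints the current thread's frames, topmost first.
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            System.out.println("at " + frame);
        }
    }
}
```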

DEBUGGING


MEMORY LAYOUT

OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                              VALUE
      0    12          (object header)                          N/A
     12     4      int Product.id                               N/A
     16     4   String Product.name                             N/A
     20     4          (loss due to the next object alignment)
Instance size: 24 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                              VALUE
      0    12          (object header)                          N/A
     12     4      int Product.id                               N/A
     16     4      int Product.price                            N/A
     20     4   String Product.name                             N/A
Instance size: 24 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                              VALUE
      0    12          (object header)                          N/A
     12     4      int Product.id                               N/A
     16     4      int Product.price                            N/A
     20     1  boolean Product.available                        N/A
     21     3          (alignment/padding gap)                  N/A
     24     4   String Product.name                             N/A
     28     4          (loss due to the next object alignment)
Instance size: 32 bytes (estimated, the sample instance is not available)
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total

OBJECT LAYOUT
com.eshop.model.Product object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                              VALUE
      0    16          (object header)                          N/A
     16     4      int Product.id                               N/A
     20     4      int Product.price                            N/A
     24     1  boolean Product.available                        N/A
     25     7          (alignment/padding gap)                  N/A
     32     8   String Product.name                             N/A
Instance size: 40 bytes (estimated, the sample instance is not available)
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total
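The instance sizes in the four layouts follow from simple arithmetic: each field starts at an offset aligned to its own size, and the whole object is rounded up to a multiple of 8 bytes. The sketch below is a deliberately simplified model under that assumption (LayoutSketch is a hypothetical name); a real JVM may reorder fields, so a tool such as Java Object Layout remains the authority.

```java
// Hypothetical, simplified model: fields are placed in the given order,
// each aligned to its own size, and the object is padded to 8 bytes.
final class LayoutSketch {

    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    static int instanceSize(int headerBytes, int... fieldSizes) {
        int offset = headerBytes;
        for (int size : fieldSizes) {
            offset = align(offset, size) + size;
        }
        return align(offset, 8);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize(12, 4, 4));       // id, name (4-byte reference)
        System.out.println(instanceSize(12, 4, 4, 4));    // id, price, name
        System.out.println(instanceSize(12, 4, 4, 1, 4)); // id, price, available, name
        System.out.println(instanceSize(16, 4, 4, 1, 8)); // 16-byte header, 8-byte reference
    }
}
```

The four calls in main correspond to the four Product variants shown above, in order.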

GARBAGE COLLECTOR
cleans memory
important performance factor
vector algorithm
stop the world in safepoints

GC ALGORITHMS
Serial
Parallel
Concurrent Mark Sweep
G1
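Which of these collectors a running JVM actually uses can be checked through the standard GarbageCollectorMXBean API. A minimal sketch follows; GcInfo is a hypothetical name, and the reported collector names depend on the JVM and its flags, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper listing the collectors registered in this JVM.
final class GcInfo {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections=" + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        collectorNames().forEach(System.out::println);
    }
}
```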

GENERICS
LIST<PILOT>

GENERICS
STARRING: TYPE ERASURE

GENERICS
GENERICS LOVE DECLARATIONS

* EXCEPT LOCAL VARIABLES

class Pilots implements List<Pilot> { ... }    // generics
class Pilots extends ArrayList<Pilot> { ... }  // generics
List<Pilot> field;                             // generics
List<Pilot> getPilots() { ... }                // generics
void getPilots(List<Pilots> pilots) { ... }    // generics

field = new ArrayList<>();                     // no generics :(

field = new ArrayList<Pilot>() {};             // generics :P

GENERICS
class Pilots extends ArrayList<Pilot> { ... }
Pilots.class.getGenericSuperclass()
// returns java.util.ArrayList<Pilot>

List<Pilot> field;
MyClass.class.getDeclaredField("field").getGenericType()
// returns java.util.List<Pilot>

List<Pilot> field = new ArrayList<>();
field.getClass().getGenericSuperclass()
// returns java.util.AbstractList<E>

List<Pilot> field = new ArrayList<Pilot>() {};
field.getClass().getGenericSuperclass()
// returns java.util.ArrayList<Pilot>
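The difference between the last two cases can be checked directly: a plain ArrayList only knows the type variable E of its superclass, while an anonymous subclass is a declaration and therefore keeps the actual type argument. The sketch below is illustrative; ErasureDemo and its nested Pilot class are hypothetical stand-ins.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of what survives type erasure.
final class ErasureDemo {
    static final class Pilot { }

    /** First type argument recorded on the generic superclass of the object's runtime class. */
    static Type superclassTypeArgument(Object o) {
        Type superType = o.getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            return ((ParameterizedType) superType).getActualTypeArguments()[0];
        }
        return null;
    }

    public static void main(String[] args) {
        List<Pilot> plain = new ArrayList<>();              // instance creation: erased
        List<Pilot> anonymous = new ArrayList<Pilot>() { }; // anonymous subclass: a declaration

        System.out.println(superclassTypeArgument(plain));     // the type variable E
        System.out.println(superclassTypeArgument(anonymous)); // the Pilot class
    }
}
```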

DISPATCH TYPES
invokevirtual
invokestatic
invokespecial
invokedynamic

LAMBDAS
generated by javac
bootstrapped by LambdaMetafactory
called with invokedynamic

LAMBDA UNDER THE HOOD
BigDecimal sumCreditEntries(Client client) {
    return sumEntries(client.getAccounts(),
                      account -> account.getCreditEntries());
}

private static java.util.List lambda$sumCreditEntries$0(com.sandbox.Account);

private Period period;

BigDecimal sumCreditEntries(Client client) {
    return sumEntries(client.getAccounts(),
                      account -> account.getCreditEntries(period));
}

private java.util.List lambda$sumCreditEntries$0(com.sandbox.Account);

BigDecimal sumCreditEntries(Client client, Period period) {
    return sumEntries(client.getAccounts(),
                      account -> account.credit(period));
}

private static java.util.List lambda$sumCreditEntries$0(java.time.Period, com.sandbox.Account);
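The generated methods can be listed with reflection. The sketch below assumes the class is compiled with javac, which names the synthetic methods lambda$...; LambdaShapes is a hypothetical class with one lambda that captures nothing (compiled to a static method) and one that reads an instance field (compiled to an instance method).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.function.Function;

// Hypothetical class used only to inspect what javac generates for lambdas.
final class LambdaShapes {
    private final String prefix;

    LambdaShapes(String prefix) {
        this.prefix = prefix;
    }

    Function<Integer, String> nonCapturing() {
        return n -> "n" + n;    // uses no state: static synthetic method
    }

    Function<Integer, String> capturingThis() {
        return n -> prefix + n; // reads this.prefix: instance synthetic method
    }

    /** Counts synthetic lambda$ methods that are static (or not), as requested. */
    static long countLambdaMethods(boolean wantStatic) {
        long count = 0;
        for (Method m : LambdaShapes.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")
                    && Modifier.isStatic(m.getModifiers()) == wantStatic) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Method m : LambdaShapes.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                System.out.println(Modifier.toString(m.getModifiers()) + " " + m.getName());
            }
        }
    }
}
```

Running javap -p on the compiled class shows the same methods with their full signatures.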

METHOD REFERENCE
SIMILAR TO LAMBDAS, BUT NO NEED TO GENERATE A METHOD

BECAUSE WE'RE CALLING A METHOD

BENCHMARKS
Benchmark                     Mode  Cnt   Score   Error  Units
CallTypes.baseline            avgt   30   4.163 ± 0.009  ns/op
CallTypes.lambda              avgt   30   4.174 ± 0.015  ns/op
CallTypes.methodRef           avgt   30   4.244 ± 0.049  ns/op

CallTypesExternal.baseline    avgt   30  50.055 ± 0.275  ns/op
CallTypesExternal.lambda      avgt   30  50.980 ± 0.650  ns/op
CallTypesExternal.methodRef   avgt   30  50.655 ± 0.376  ns/op

METHODHANDLES
Replacement for reflection
Reflection does access control during invocation, while MethodHandle checks at lookup

EXAMPLE
MethodHandle toUpperCase = MethodHandles.lookup()
        .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));

Object result = toUpperCase.invoke("test");
String exact = (String) toUpperCase.invokeExact("test");
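A self-contained version of this example might look as follows. Because the access check happens at lookup time, the handle is looked up once and kept in a static final field; HandleDemo is a hypothetical name, and wrapping the checked exceptions in IllegalStateException is just one possible choice.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical wrapper around the String.toUpperCase() handle.
final class HandleDemo {
    // Looked up once: access is checked here, not on every call.
    private static final MethodHandle TO_UPPER = lookupToUpper();

    private static MethodHandle lookupToUpper() {
        try {
            return MethodHandles.lookup()
                    .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String upper(String s) {
        try {
            // invokeExact requires the call site to match (String)String exactly,
            // hence the String argument and the cast on the result.
            return (String) TO_UPPER.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(upper("test"));
    }
}
```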

BENCHMARKS
Benchmark                  Mode  Cnt     Score     Error  Units
baseline                   avgt   30   198.993 ±   0.156  ns/op
handleExactWithoutLookup   avgt   30   208.354 ±   0.675  ns/op
handleWithoutLookup        avgt   30   209.902 ±   0.331  ns/op
reflectWithoutLookup       avgt   30   213.322 ±   0.430  ns/op

handleWithLookup           avgt   30  4306.501 ± 245.989  ns/op
reflectWithLookup          avgt   30   748.601 ±   2.566  ns/op

1 ns = 0.000 001 ms = 0.000 000 001 s

STREAMS
API for collection processing
separates implementation from business logic
doesn't store elements -> it's just a pipeline
laziness gives space for optimizations

PERFORMANCE
strings.stream()
       .map(String::toLowerCase)
       .filter(s -> s.charAt(5) > 5)
       .map(s -> s.substring(6, 12))
       .collect(toList())

EACH STRING IS AROUND 24 CHARS
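For reference, the pipeline and an equivalent manual loop can be written side by side. This is a sketch of what such a comparison could look like, not the author's benchmark code; PipelineDemo and the 24-character sample string are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical side-by-side of the stream pipeline and a plain loop.
final class PipelineDemo {

    static List<String> withStream(List<String> strings) {
        return strings.stream()
                .map(String::toLowerCase)
                .filter(s -> s.charAt(5) > 5)
                .map(s -> s.substring(6, 12))
                .collect(Collectors.toList());
    }

    static List<String> withLoop(List<String> strings) {
        List<String> result = new ArrayList<>();
        for (String s : strings) {
            String lower = s.toLowerCase();
            if (lower.charAt(5) > 5) {
                result.add(lower.substring(6, 12));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input: one 24-character string.
        List<String> input = Arrays.asList("ABCDEFGHJKLMNOPQRSTUVWXY");
        System.out.println(withStream(input));
        System.out.println(withLoop(input));
    }
}
```

Both methods return the same list for the same input; only the way the iteration is expressed differs.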

PERFORMANCE
-Xmx512m

Benchmark        (size)  Mode  Cnt       Score       Error  Units
for              100000  avgt   30    5946.020 ±    60.100  µs/op
stream           100000  avgt   30    6647.524 ±   123.752  µs/op
parallelStream   100000  avgt   30    2486.218 ±    49.030  µs/op

for             1000000  avgt   30  103638.567 ±  3367.418  µs/op
stream          1000000  avgt   30  108666.331 ±  2759.447  µs/op
parallelStream  1000000  avgt   30  139446.551 ±  5978.815  µs/op

for             1500000  avgt   30  340931.876 ± 32919.570  µs/op
stream          1500000  avgt   30  340603.189 ± 22086.747  µs/op
parallelStream  1500000  avgt   30  507793.070 ± 95685.964  µs/op

for             2000000  avgt   10  694607.055 ± 50240.340  µs/op
stream          2000000  avgt   30  686536.389 ± 20536.336  µs/op
parallelStream  2000000  OutOfMemoryError: GC overhead limit exceeded

GC OVERHEAD
-Xmx512m, gc.alloc.rate.norm

Benchmark        (size)  Mode  Cnt       Score    Error  Units
for              100000  avgt   30    6896.776 ±  0.029  KB/op
stream           100000  avgt   30    6897.174 ±  0.462  KB/op
parallelStream   100000  avgt   30   10232.720 ±  0.005  KB/op

for             1000000  avgt   30   70745.169 ±  0.321  KB/op
stream          1000000  avgt   30   70745.585 ±  0.388  KB/op
parallelStream  1000000  avgt   30   98321.253 ±  0.994  KB/op

for             1500000  avgt   30  106122.045 ±  2.760  KB/op
stream          1500000  avgt   30  106122.462 ±  2.583  KB/op
parallelStream  1500000  avgt   30  147476.576 ± 23.135  KB/op

for             2000000  avgt   10  145153.644 ±  5.284  KB/op
stream          2000000  avgt   30  145154.058 ±  2.427  KB/op
parallelStream  2000000  OutOfMemoryError: GC overhead limit exceeded

IGNORE THE MEMORY
-Xmx4g

Benchmark        (size)  Mode  Cnt    Score    Error  Units
for              100000  avgt   30   23.966 ±  1.246  ms/op
stream           100000  avgt   30   24.838 ±  1.274  ms/op
parallelStream   100000  avgt   30    7.096 ±  0.131  ms/op

for             1000000  avgt   30  250.654 ±  8.956  ms/op
stream          1000000  avgt   30  260.075 ±  7.867  ms/op
parallelStream  1000000  avgt   30   76.781 ±  2.910  ms/op

for             2000000  avgt   30  533.450 ± 28.502  ms/op
stream          2000000  avgt   30  554.711 ± 38.503  ms/op
parallelStream  2000000  avgt   30  165.757 ±  9.707  ms/op

STREAMS SUMMARY
streams are cleaner and more readable than looping
serial streams have similar performance and overhead to manual looping
parallel streams are really fast (given enough memory)
parallel streams bring bigger memory overhead due to storing partial results
parallel streams always use commonPool (we can hack to use own)
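The "hack" mentioned in the last point is usually done by starting the parallel stream from a task submitted to a dedicated ForkJoinPool. The sketch below shows the idea; it relies on behaviour that the stream API does not document as a guarantee, and OwnPoolDemo and its parameters are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo: run a parallel stream inside a dedicated ForkJoinPool.
final class OwnPoolDemo {

    static List<Integer> squaresInOwnPool(int n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The terminal operation is invoked from a worker of our pool,
            // so the forked subtasks are scheduled there too (undocumented behaviour).
            Callable<List<Integer>> task = () -> IntStream.rangeClosed(1, n)
                    .parallel()
                    .map(i -> i * i)
                    .boxed()
                    .collect(Collectors.toList());
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squaresInOwnPool(5, 2));
    }
}
```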

EXCEPTIONS
public class ClientAlreadyExistsException extends Throwable { }

EXCEPTIONS
Benchmark                      Mode  Cnt     Score   Error  Units
Exceptions.standardExcept      avgt   30  1029.919 ± 5.026  ns/op
Exceptions.standardExceptDeep  avgt   30  1121.771 ± 6.615  ns/op

DEEP MEANS THERE ARE 4 MORE FRAMES

EXCEPTIONS
public class ClientAlreadyExistsException extends Throwable {

    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}
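The effect of the override is easy to verify: the stackless variant carries no stack trace elements at all, which is where the saved time comes from. A small sketch with hypothetical class names:

```java
// Hypothetical demo comparing a standard and a stackless throwable.
final class StacklessDemo {

    static class StandardException extends Throwable { }

    static class StacklessException extends Throwable {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip walking the stack when the throwable is created
        }
    }

    public static void main(String[] args) {
        System.out.println("standard frames:  " + new StandardException().getStackTrace().length);
        System.out.println("stackless frames: " + new StacklessException().getStackTrace().length);
    }
}
```

The price is that such an exception no longer tells you where it was thrown, so it fits expected business conditions better than genuine failures. Throwable also has a protected four-argument constructor with a writableStackTrace flag that achieves a similar result.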

EXCEPTIONS
Benchmark                       Mode  Cnt     Score   Error  Units
Exceptions.standardExcept       avgt   30  1029.919 ± 5.026  ns/op
Exceptions.standardExceptDeep   avgt   30  1121.771 ± 6.615  ns/op
Exceptions.stacklessExcept      avgt   30    18.827 ± 0.066  ns/op
Exceptions.stacklessExceptDeep  avgt   30    19.835 ± 0.053  ns/op

DEEP MEANS THERE ARE 4 MORE FRAMES

FURTHER READING
Optimizing Java - Benjamin J. Evans, James Gough

The Well-Grounded Java Developer - Benjamin J. Evans, Martijn Verburg

Java Performance - Charlie Hunt, Binu John

Java Performance: The Definitive Guide - Scott Oaks

TOOLS
jdk
VisualVM
Mission Control
JProfiler
Honest Profiler
Java Object Layout

I WANT MORE!
THE JAVA® VIRTUAL MACHINE SPECIFICATION

HG CLONE HTTP://HG.OPENJDK.JAVA.NET/JDK8/JDK8/

HTTP://OPENJDK.JAVA.NET/PROJECTS/CODE-TOOLS/JMH

BENCHMARKS
HTTPS://GITHUB.COM/JKUBRYNSKI/JVM-DIVE-BENCHMARKS

QUESTIONS?

THANKS!