So You Want To Write Your Own Benchmark
So you want to write your own microbenchmark
Dror Bereznitsky
December 18th 2008
2
Agenda
• Introduction
• Java™ micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
3
Microbenchmark – simple definition
1. Start the
clock
2. Run the code 3. Stop the
clock
4. Report
4
Better microbenchmark definition
• Small program
• Goal: Measure something about a few
lines of code
• All other variables should be removed
• Returns some kind of a numeric
result
5
Why do I need microbenchmarks?
• Discover something about my code:
• How fast is it
• Calculate throughput – TPS, KB/s
• Measure the result of changing my code:
• Should I replace a HashMap with a TreeMap?
• What is the cost of synchronizing a method?
6
Why are you talking about this?
• It’s hard to write a robust microbenchmark
• It’s even harder to do it in Java™
• There are not enough Java
microbenchmarking tools
• There are too many flawed
microbenchmarks out there
7
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
8
A microbenchmark story: the problem
The boss asks you to solve a performance issue
in one of the components
Blah, blah …
9
A microbenchmark story: the cause
You find out that the cause is excessive use of Math.sqrt()
10
A microbenchmark story: a solution?
• You decide to develop a state of the art
square root approximation
• After developing the square root
approximation you want to benchmark it against the java.lang.Math
implementation
11
SQRT approximation microbenchmark
Let’s run this little piece of code in a loop and see what happens …

public static void main(String[] args) {
    long start = System.currentTimeMillis(); // start the clock
    for (double i = 0; i < 10 * 1000 * 1000; i++) {
        mySqrt(i); // little piece of code
    }
    long end = System.currentTimeMillis(); // stop the clock
    long duration = end - start;
    System.out.format("Test duration: %d (ms) %n", duration);
}
12
SQRT microbenchmark results
Wow, this is really fast !
Test duration: 0 (ms)
13
Flawed microbenchmark
14
SQRT microbenchmark: what’s wrong?
The Java™ HotSpot virtual machine
Dynamic optimizations
On Stack Replacement
Dynamic Compilation
Dead code elimination
Classloading
Garbage collection
15
The HotSpot: a mixed mode system
1. Code is interpreted
2. Profiling
3. Dynamic compilation
4. Stuff happens
5. Interpreted again or recompiled
16
Dynamic compilation
• Dynamic compilation is unpredictable
• Don’t know when the compiler will run
• Don’t know how long the compiler will run
• Same code may be compiled more than once
• The JVM can switch to compiled code at will
17
Dynamic compilation cont.
• Dynamic compilation can seriously influence microbenchmark results:

Interpreted execution + Dynamic compilation + Compiled code execution
(continuous recompilation)
≠
Compiled / interpreted code execution (steady-state)
18
Dynamic optimizations
• The HotSpot server compiler performs a
large variety of optimizations:
• loop unrolling
• range check elimination
• dead-code elimination
• code hoisting …
19
Code hoisting ?
Did he just say “code hoisting”?
20
What the heck is code hoisting ?
• Hoist = to raise or lift
• Size optimization
• Eliminate duplicated pieces
of code in method bodies
by hoisting expressions
or statements
21
Code hoisting example
a + b is a busy expression. After hoisting the expression a + b, a new local variable t has been introduced.
(Optimizing Java for Size: Compiler Techniques for Code Compaction, Samuli Heilala)
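As a rough Java sketch of the cited example (the variable and method names here are ours, not from the slides), the busy expression a + b is evaluated on both branches before hoisting, and only once afterwards:

```java
class Hoisting {
    // Before: a + b is a "busy expression", evaluated on both branches.
    static int before(boolean cond, int a, int b, int c) {
        if (cond) {
            return (a + b) * 2;
        } else {
            return (a + b) + c;
        }
    }

    // After: the duplicated expression is hoisted into a new local variable t.
    static int after(boolean cond, int a, int b, int c) {
        int t = a + b; // computed once - smaller (and often faster) code
        if (cond) {
            return t * 2;
        } else {
            return t + c;
        }
    }
}
```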
22
Dynamic optimizations cont.
• Most of the optimizations are performed
at runtime
• Profiling data is used by the compiler to
improve optimization decisions
• You don’t have access to the dynamically
compiled code
23
Example: Very fast square root?

public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}

10,000,000 calls to Math.sqrt() ~ 4 ms
24
Example: not so fast?
A single line of code added – now it takes ~ 2000 ms ?!?

public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    System.out.format("Result: %d %n", result); // the added line
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
25
DCE - Dead Code Elimination
• Dead code - code that has no effect on the outcome of the program execution

public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    // dead code: result is never used afterwards
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
26
OSR - On Stack Replacement
• Methods are HOT if they cumulatively
execute more than 10,000 loop iterations
• Older JVM versions did not switch to the
compiled version until the method exited
and was re-entered
• OSR - switch from interpretation to
compiled code in the middle of a loop
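A minimal sketch of the alternative (class and method names are hypothetical): keep the measured code in its own small method and call it from a driver loop, so HotSpot compiles the whole method normally instead of OSR-compiling a long-running loop inside main():

```java
class OsrFree {
    // The few lines under test live in their own small method ...
    static double once(double x) {
        return Math.sqrt(x);
    }

    // ... and the driver calls it many times, so HotSpot compiles once()
    // as a whole method rather than OSR-compiling a loop inside main().
    static double measure(int calls) {
        double sink = 0;
        for (int i = 0; i < calls; i++) {
            sink += once(i);
        }
        return sink; // consumed by the caller, so the work is not dead code
    }
}
```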
27
OSR and microbenchmarking
• OSR’d code may be less performant
• Some optimizations are not performed
• OSR usually happens when you put
everything into one long method
• Developers tend to write long main()
methods when benchmarking
• Real life applications are hopefully divided
into more fine grained methods
28
Classloading
• Classes are usually loaded only when
they are first used
• Class loading takes time
• I/O
• Parsing
• Verification
• May skew your benchmark results
29
Garbage Collection
• The JVM automatically reclaims resources via
• Garbage collection
• Object finalization
• Outside of developer’s control
• Unpredictable
• Should be measured if invoked as a result
of the benchmarked code
30
Time measurement
How long is one millisecond?

public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    Thread.sleep(1);
    final long end = System.currentTimeMillis();
    final long duration = (end - start);
    System.out.format("Test duration: %d (ms) %n", duration);
}

Test duration: 16 (ms)
31
System.currentTimeMillis()
• Accuracy varies with platform

Source          Platform                   Resolution
Markus Kobler   Linux (2.6 kernel)         1 ms
Java Glossary   Mac OS X                   1 ms
David Holmes    Windows NT, 2K, XP, 2003   10 – 15 ms
Java Glossary   Windows 95/98              55 ms
32
Wrong target platform
• Choosing the wrong platform for your
microbenchmark
• Benchmarking on Windows when your
target platform is Linux
• Benchmarking a highly threaded
application on a single core machine
• Benchmarking on a Sun JVM when the
target platform is Oracle (BEA) JRockit
33
Caching
• Caching happens at many levels:
• Hardware – CPU caching
• Operating System – file system caching
• Database – query caching
34
Caching: CPU L1 and L2 caches
• The farther the accessed data is from the CPU, the higher the access latency
• Size of dataset affects access cost

Array size   Time (us)   Cost (ns)
16k          413451      9.821
8192K        5743812     136.446

Jcachev2 results for Intel® Core™2 Duo T8300, L1 = 32 KB, L2 = 3 MB
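The effect can be reproduced with a toy walker (our own sketch, not the Jcachev2 harness): strided reads over an array that fits in L1 should be noticeably cheaper per access than over one that spills far past L2:

```java
class CacheWalk {
    // Strided reads: each access lands on a new 64-byte cache line.
    static long walk(int[] a, int steps) {
        long sum = 0;
        int idx = 0;
        for (int i = 0; i < steps; i++) {
            sum += a[idx];
            idx = (idx + 16) % a.length; // 16 ints = one 64-byte cache line
        }
        return sum;
    }

    public static void main(String[] args) {
        int steps = 20 * 1000 * 1000;
        // 16 KB of ints fits in a 32 KB L1; 32 MB spills far past a 3 MB L2.
        for (int size : new int[]{4 * 1024, 8 * 1024 * 1024}) {
            int[] a = new int[size];
            walk(a, steps); // warm-up pass
            long start = System.nanoTime();
            long sum = walk(a, steps);
            double nsPerAccess = (System.nanoTime() - start) / (double) steps;
            System.out.format("%d ints: %.2f ns/access (sum=%d)%n", size, nsPerAccess, sum);
        }
    }
}
```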
35
Busy environment
• Running in a busy environment – CPU,
IO, Memory
36
Agenda
• Introduction
• Java micro benchmarking pitfalls
•Writing your own benchmark
• Micro benchmarking tools
• Summary
37
Warm-up your code
38
Warm-up your code
• Let the JVM reach steady state execution
profile before you start benchmarking
• All classes should be loaded before
benchmarking
• Usually executing your code for ~10
seconds should be enough
39
Warm-up your code – cont.
• Detect JIT compilations by using
• CompilationMXBean.
getTotalCompilationTime()
• -XX:+PrintCompilation
• Measure classloading time
• Use the ClassLoadingMXBean
40
CompilationMXBean usage
import java.lang.management.ManagementFactory;
import java.lang.management.CompilationMXBean;
long compilationTimeTotal;
CompilationMXBean compBean =
ManagementFactory.getCompilationMXBean();
if (compBean.isCompilationTimeMonitoringSupported())
compilationTimeTotal = compBean.getTotalCompilationTime();
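Putting the bean to work, one possible warm-up loop (the helper below is our own sketch, not part of any framework) keeps exercising the task until total JIT compilation time stops increasing between passes:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class Warmup {
    // Keep running the task until total JIT compilation time stops increasing,
    // i.e. the JVM has (probably) reached a steady-state execution profile.
    static void warmUp(Runnable task) {
        CompilationMXBean comp = ManagementFactory.getCompilationMXBean();
        boolean supported = comp != null && comp.isCompilationTimeMonitoringSupported();
        long before;
        long after = supported ? comp.getTotalCompilationTime() : -1;
        do {
            before = after;
            for (int i = 0; i < 100000; i++) {
                task.run(); // exercise the code so HotSpot profiles and compiles it
            }
            after = supported ? comp.getTotalCompilationTime() : -1;
        } while (after > before); // another compilation happened: keep warming up
    }
}
```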
41
Dynamic optimizations
• Avoid on stack replacement
• Don’t put all your benchmark code in one big main() method
• Avoid dead code elimination
• Print the final result
• Report unreasonable speedups
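A version of the earlier square-root loop that follows both pieces of advice might look like this (a sketch, with our own names): the loop lives in its own method rather than one big main(), and its result is returned and printed, so the JIT cannot prove the work is dead:

```java
class DceSafe {
    static double run(int iterations) {
        double result = 0;
        for (int i = 0; i < iterations; i++) {
            result += Math.sqrt(i);
        }
        return result; // the caller consumes the value, so the loop is not dead
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double result = run(10 * 1000 * 1000); // loop in its own method: no OSR'd main()
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Result: %f %n", result); // printing defeats dead code elimination
        System.out.format("Test duration: %d (ms) %n", duration);
    }
}
```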
42
Garbage Collection
• Measure garbage collection time
• Force garbage collection and finalization
before benchmarking
• Perform enough iterations to reach garbage
collection steady state
• Gather GC stats: -XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
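For the “force garbage collection” bullet, a common pre-measurement cleanup sketch (the helper name is ours) looks like this. System.gc() is only a hint, hence the repeated requests:

```java
class HeapCleaner {
    // Request GC and finalization a few times before starting the clock.
    static void cleanHeap() {
        for (int i = 0; i < 3; i++) {
            System.gc();
            System.runFinalization();
            try {
                Thread.sleep(50); // give background GC threads a chance to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```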
43
Time measurement
• Use System.nanoTime()
• Microsecond accuracy on modern operating systems and hardware
• Not worse than currentTimeMillis()
• Note for Windows users: the call itself executes in microseconds – don’t overuse it!
44
JVM configuration
• Use similar JVM options to your target
environment:
• -server or -client JVM
• Enough heap space (-Xmx)
• Garbage collection options
• Thread stack size (-Xss)
• JIT compiling options
45
Other issues
• Use fixed size data sets
• Too large data sets can cause L1 cache
blowout
• Notice system load
• Don’t play GTA while benchmarking !
46
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
47
Java™ benchmarking tools
• Various specialized benchmarks
• SPECjAppServer®
• SPECjvm™
• CaffeineMark 3.0™
• SciMark 2.0
• Only a few benchmarking frameworks
48
Japex Micro-Benchmark framework
• Similar in spirit to JUnit
• Measures throughput – work over time
• Transactions Per Second (Default)
• KBs per second
• XML based configuration
• XML/HTML reports
49
Japex: Drivers
• Encapsulates knowledge about a specific algorithm implementation
• Must extend JapexDriverBase
public interface JapexDriver extends Runnable {
public void initializeDriver();
public void prepare(TestCase testCase);
public void warmup(TestCase testCase);
public void run(TestCase testCase);
public void finish(TestCase testCase);
public void terminateDriver();
}
50
Japex: Writing your own driver

public class SqrtNewtonApproxDriver extends JapexDriverBase {
    private long tmp;
    …
    @Override
    public void warmup(TestCase testCase) {
        tmp += sqrt(getNextRandomNumber());
    }
    …
}
51
Japex: Test suite

<testSuite name="SQRT Test Suite"
    xmlns="http://www.sun.com/japex/testSuite" …>
  <param name="libraryDir" value="C:/java/japex/lib"/>
  <param name="japex.classPath" value="./target/classes"/>
  <param name="japex.runIterations" value="1000000"/>
  <driver name="SqrtApproxNewtonDriver">
    <param name="Description" value="Newton Driver"/>
    <param name="japex.driverClass"
        value="com.alphacsp.javaedge.benchmark.japex.driver.SqrtNewtonApproxDriver"/>
  </driver>
  <testCase name="testcase1"/>
</testSuite>
52
Japex: HTML Reports
53
Japex: more chart types
Scatter chart
Line chart
54
Japex: pros and cons
• Pros
• Similar to JUnit
• Nice HTML reports
• Cons
• Last stable release in March 2007
• HotSpot issues are not handled
• XML configuration
55
Brent Boyer’s Benchmark framework
• Part of the “Robust Java benchmarking”
article by Brent Boyer
• Automate as many aspects as possible:
• Resource reclamation
• Class loading
• Dead code elimination
• Statistics
56
Benchmark framework example
Benchmark.Params params = new Benchmark.Params(true);
params.setExecutionTimeGoal(0.5);
params.setNumberMeasurements(50);
Runnable task = new Runnable() {
public void run() {
sqrt(getNextRandomNumber());
}
};
Benchmark benchmark = new Benchmark(task, params);
System.out.println(benchmark.toString());
57
Benchmark single line summary
Benchmark output:

first = 25.702 us,
mean = 91.070 ns (CI deltas: -115.591 ps, +171.423 ps)
sd = 1.451 us (CI deltas: -461.523 ns, +676.964 ns)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE
58
Outlier and serial correlation issues
• Records outlier and serial correlation issues
• Outliers indicate that a major measurement error happened
• Large outliers - some other activity started on the computer during measurement
• Small outliers might hint that DCE occurred
• Serial correlation indicates that the JVM has not reached its steady-state performance profile
59
Benchmark : pros and cons
• Pros
• Handles HotSpot related issues
• Detailed statistics
• Cons
• Each run takes a lot of time
• Not a formal project
• Lacks documentation
60
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
61
Summary 1
• Micro benchmarking is hard when it comes to Java™
• Define what you want to measure and how you want to do it; pick your goals
• Know what you are doing
• Always warm-up your code
• Handle DCE, OSR, GC issues
• Use fixed size data sets and fixed work
62
Summary 2
• Do not rely solely on microbenchmark
results
• Sanity check results
• Use a profiler
• Test your code in real life scenarios under
realistic load (macro-benchmark)
63
Summary: resources
• http://www.ibm.com/developerworks/java/library/j-benchmark1.html
• http://www.azulsystems.com/events/javaone_2002/microbenchmarks.pdf
• https://japex.dev.java.net/
• http://www.ibm.com/developerworks/java/library/j-jtp12214/
• http://www.dei.unipd.it/~bertasi/jcache/
64
Thank You!