Java on z/OS: A fresh look Scott Chapman American Electric Power.
Java on z/OS: A fresh look
Scott Chapman
American Electric Power
Important notes
• I don’t really like Java as a language
• I’m not a Java expert
• Results presented herein may be installation-dependent
  • There’s a lot of moving parts here
• I understand there’s zAAP on zIIP
  • “zAAP” used generically here
• All trademarks of IBM, Oracle, and everybody else hereby recognized
Why Java on z/OS?
Because programmers want to use it
http://xkcd.com/801/
Why Java on z/OS
• Because it enables open source projects that are cool/useful/interesting
• Key trick: run the JVM in ASCII
  • -Dfile.encoding=ISO8859-1
• Many things will just run with that run-time option!
What about a GUI?
• Turns out that that just works too!
• Start Xming X server on your PC
  • Check the “No Access Control” option
• Set the DISPLAY environment variable
• Run the code
S147774:/u/s147774: >export DISPLAY=10.97.131.15:0
S147774:/u/s147774: >java -Xmx320m -jar ga33.jar
Debugging JavaScript code running in Helma on the mainframe with the GUI connected to Xming on my laptop
Works better than I expected
Why Java on z/OS
• Because it enables more programming language choices
• JavaScript built in to Java 6
  • Rhino interpreter from Mozilla
• In theory, should be able to run any JVM-based language (I haven’t tested these)
  • Jython
  • Groovy
  • Clojure
  • Scala
  • Ruby (via JRuby)
Why Java on z/OS
• It may perform better
  • If you are on a sub-capacity machine
• It may save you money
  • Pretty unlikely
  • Only if you can take some work away from your peaks
Which job is better?
How cheap are zAAP/zIIPs?
• $100K/SE (z196, zEC12)
• How much is $100K?
• Consider adding 1 engine to z196-710:
  a) 710 = 10,250 MIPS, 1191 MSUs
  b) 711 = 11,073 MIPS, 1286 MSUs
  c) 710+1 zIIP = 10,302+1,000 MIPS
• z/OS (base) at this level costs $62/MSU
• Scenario B, z/OS base goes up almost $6K/month
• zIIP costs < 17 months of z/OS Base
• Not to mention features, DB2, CICS, etc.
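The arithmetic behind the "almost $6K/month" and "< 17 months" figures can be checked with a few lines of Java. This is only a sketch using the numbers on the slide (1191 and 1286 MSUs, $62/MSU, $100K per engine); the class and method names are made up for illustration, and real software pricing is tiered and contract-specific.

```java
public class ZiipPayback {
    // Monthly z/OS base increase when the machine's MSU rating grows.
    static double monthlyIncrease(int msuBefore, int msuAfter, double dollarsPerMsu) {
        return (msuAfter - msuBefore) * dollarsPerMsu;
    }

    // Months of avoided software cost needed to equal the engine price.
    static double paybackMonths(double enginePrice, double monthlyIncrease) {
        return enginePrice / monthlyIncrease;
    }

    public static void main(String[] args) {
        // Slide figures: 710 = 1191 MSUs, 711 = 1286 MSUs, $62/MSU, $100K engine.
        double increase = monthlyIncrease(1191, 1286, 62.0);
        double months = paybackMonths(100000.0, increase);
        System.out.printf("z/OS base increase: $%.0f/month%n", increase);
        System.out.printf("Specialty engine payback: %.1f months%n", months);
    }
}
```

With the slide's inputs the increase is 95 MSUs times $62, or $5,890 per month, so the engine price is covered in just under 17 months of z/OS base alone.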
What about accessing z/OS services?
• JZOS
  • Classes to easily access z/OS specific constructs
    • z/OS datasets
    • RACF
    • Respond to operator commands
    • Access JES Spool
Ways to Run Java on z/OS
• WebSphere
• CICS
• DB2 Stored Procedures
• Batch
• Started Tasks
• Unix shell
Batch / Started Task options
• BPXBATC
  • BPXBATCH (traditional alias)
  • BPXBATSL (local spawn alias)
  • Traditional approach
  • Difficulty with 100-byte JCL Parm
• JZOS
  • Ships with z/OS
  • Avoids 100-byte parm limit
  • Adds a lot of flexibility
Measuring Java
zAAP vs. GCP time
• Watch the normalization factor!
  • Most SMF values not normalized
  • Tools/reports may normalize for you
• Consider IFAHONORPRIORITY=NO
  • Avoid using GCPs to help zAAPs
  • Can result in >99% of Java CPU time executed on zAAP
SDSF zAAP vs. GCP columns
JOBNAME   CPU-Time  GCP-Time  zAAP-Time  zACP-Time  zAAP-NTime
P3SR01BS   1514.11      9.53     772.02       2.26     1501.82
P3SR01AS   1706.50     12.82     868.75       1.95     1690.00
P3SR01B     788.55    197.66     281.64       1.53      547.87
P3SR01A     763.01    192.47     272.33       1.10      529.77
P3SR02A    2953.37    422.62    1188.79       5.39     2312.56
P3SR02B    3051.88    437.74    1226.02       6.55     2385.00
P3SR01AS   7281.39     62.56    3698.72      11.47     7195.17
P3SR02BS   2805.58    123.85    1316.22      22.15     2560.45
P3SR01BS   7783.21     63.38    3955.54      14.38     7694.77
P3SR02AS   2591.27    118.60    1216.36      10.74     2366.21
RTMSERVE   2661.39      3.85    1363.45       1.03     2652.34
Column notes: CPU-Time = TCB + SRB; zAAP-Time = real; zACP-Time = zAAP on GCP; zAAP-NTime = normalized
This data comes from RMF
SMF 30 Accounting
• BPXBATCH vs. BPXBATSL vs. JZOS
  • Important due to spawned OMVS tasks
• Single step job results:
  • BPXBATSL: 1 step, 1 job record
  • BPXBATCH: 6 step, 4 job records
    • CPU time collected on type OMVS records
  • JZOS: 2 step, 2 job records
    • CPU time almost completely on JOB types
Some interesting calculations
zAAPn = SMF30_TIME_ON_IFA * SMF30ZNF / 256
percent work done on zAAP =
zAAPn / (zAAPn + SMF30CPT + SMF30CPU)
(“Generosity” or “offload” factor)
percent zAAP sent to GCP =
SMF30_TIME_IFA_ON_CP / (SMF30_TIME_ON_IFA+SMF30_TIME_IFA_ON_CP)
(“Fallback” percentage—can be <1%, although some fallback is normal and expected)
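As a rough illustration, the three calculations above can be written as small Java helpers. This is a sketch, not a reporting tool: the parameter names simply mirror the SMF 30 field names used on the slide, the inputs are assumed to have already been extracted from the records and converted to seconds, and the sample values in main are invented.

```java
public class ZaapMath {
    // zAAPn: raw zAAP time scaled by the normalization factor (SMF30ZNF / 256).
    static double normalizedZaap(double timeOnIfa, int smf30Znf) {
        return timeOnIfa * smf30Znf / 256.0;
    }

    // "Generosity" or "offload" factor: percent of work done on zAAP.
    static double offloadPercent(double zaapN, double smf30Cpt, double smf30Cpu) {
        return 100.0 * zaapN / (zaapN + smf30Cpt + smf30Cpu);
    }

    // "Fallback" percentage: zAAP-eligible time that ran on a GCP instead.
    static double fallbackPercent(double timeOnIfa, double timeIfaOnCp) {
        return 100.0 * timeIfaOnCp / (timeOnIfa + timeIfaOnCp);
    }

    public static void main(String[] args) {
        // Hypothetical values, not taken from any real SMF record.
        double zaapN = normalizedZaap(100.0, 512);
        System.out.println("zAAPn seconds: " + zaapN);
        System.out.println("Offload %: " + offloadPercent(zaapN, 40.0, 10.0));
        System.out.println("Fallback %: " + fallbackPercent(99.0, 1.0));
    }
}
```

A normalization factor of 256 means the zAAP and GCP run at the same speed; on a sub-capacity machine the factor is larger, which is why raw zAAP seconds understate the work done.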
Other SMF records
• RMF records
  • Look for breakdown of processor types for both hardware and report / service classes
• WAS 120 records
  • New subtype 9s for WAS 7+ much better!
• HIS type 113 records
  • GCP vs. zAAP vs. zIIP
Java Performance
What about performance?
• Java on the mainframe has a history of performance problems
• Java is inherently “heavy” due to the JVM
  • Scott’s Law: “The easier you make it on the programmer, the harder it is on the system”
• Today’s z hardware and software are up to the task!
  • (But you probably want zAAPs!)
Heard at WAS Week 200x…
• “Our goal is to get JVM startup time down to about 1 second.”
• Seemed like a stretch at the time!
  • WAS startup took several minutes
Today: WAS Servant Startup <1 min
15.49.15 STC14327 ---- MONDAY, 18 APR 2011 ----
15.49.15 STC14327 $HASP373 P3SR02AS STARTED
15.49.15 STC14327 IEFUSI BPXBATSL-P3ASRU ABOVE REGION SET TO 1536MB
15.49.15 STC14327 IEF403I P3SR02AS - STARTED - TIME=15.49.15
15.49.16 STC14327 +BBOO0004I WEBSPHERE FOR Z/OS SERVANT PROCESS
P3CELL/P3NODEA/P3SR02/P3SR02A IS STARTING.
15.49.16 STC14327 +BBOO0239I WEBSPHERE FOR Z/OS SERVANT PROCESS p3cell/p3nodea/p3sr02a IS
STARTING.
15.49.16 STC14327 +BBOO0308I SERVANT PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A IS EXECUTING
IN 64-BIT ADDRESSING MODE.
15.49.16 STC14327 +BBOM0007I CURRENT CB SERVICE LEVEL IS build level 7.0.0.12
(cf121027.08) release WAS70.ZNATV date 07/09/10 11:02:02.
...
15.49.56 STC14327 +BBOO0222I: WSVR0001I: Server SERVANT PROCESS p3sr02a open for
e-business
15.49.57 STC14327 +BBOO0020I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT
PROCESS P3SR02A.
15.49.57 STC14327 +BBOO0248I INITIALIZATION COMPLETE FOR WEBSPHERE FOR Z/OS SERVANT
PROCESS P3CELL/P3NODEA/P3SR02/P3SR02A.
Not much in that particular servant
Today: HelloWorld in <2 seconds
10.08.55 JOB47259 IEF403I S147774B - STARTED - TIME=10.08.55
10.08.57 JOB47259 - --TIMINGS (MINS.)-- ----PAGING COUNTS---
10.08.57 JOB47259 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO
10.08.57 JOB47259 -S147774B RUNOMVS 00 59 .00 .00 .02 2524 0 0 0 0
10.08.57 JOB47259 IEF404I S147774B - ENDED - TIME=10.08.57
10.08.57 JOB47259 -S147774B ENDED. NAME-BPXBATCH TEST TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= .02
10.08.57 JOB47259 $HASP395 S147774B ENDED
Output
Hello Scott
Java runtime: IBM Corporation 1.6.0, vm version 2.4
Running on: s390 z/OS 01.10.00
Running for: S147774
Classpath: /usr/lpp/java/J6.0/lib:/usr/lpp/java/IBM/J1.3/l
JCL
//RUNOMVS EXEC PGM=BPXBATCH,
// PARM='SH java -Xms32M -Xmx32M HelloWorldApp Scott'
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//STDENV DD *
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
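The source of HelloWorldApp is not shown on the slides. A minimal sketch that would print output of the same shape, assuming it simply echoes its argument and a handful of standard Java system properties, might look like this (the greeting helper is a made-up name for illustration):

```java
public class HelloWorldApp {
    // Builds the first output line from the name passed on the command line.
    static String greeting(String name) {
        return "Hello " + name;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "World";
        System.out.println(greeting(name));
        // Standard system properties; the values depend on the JVM and platform.
        System.out.println("Java runtime: " + System.getProperty("java.vendor") + " "
                + System.getProperty("java.version")
                + ", vm version " + System.getProperty("java.vm.version"));
        System.out.println("Running on: " + System.getProperty("os.arch") + " "
                + System.getProperty("os.name") + " " + System.getProperty("os.version"));
        System.out.println("Running for: " + System.getProperty("user.name"));
        System.out.println("Classpath: " + System.getProperty("java.class.path"));
    }
}
```

A program this small does almost no work of its own, so the elapsed time in the job log is essentially JVM startup cost.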
z10 EC 504 with zAAP
Small machine
10.51.53 JOB10901 IEF403I S147774B - STARTED - TIME=10.51.53 10.52.04 JOB10901 - --TIMINGS (MINS.)-- ----PAGING COUNTS---
10.52.04 JOB10901 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO
10.52.04 JOB10901 -S147774B RUNOMVS 00 86 .00 .00 .18 2252 0 0 0 0
10.52.04 JOB10901 IEF404I S147774B - ENDED - TIME=10.52.04
10.52.04 JOB10901 -S147774B ENDED. NAME-BPXBATCH TEST TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= .18
10.52.04 JOB10901 $HASP395 S147774B ENDED
z10 BC E02 without zAAPs
Not surprising that ~50 MIPS engines can’t keep up with 450 / 900 MIPS engines
What about doing real work?
• Days of assuming it will run faster on your PC are over
  • Have seen H2 perform better on z/OS
• Still, it is Java, it’s not CPU-free
• Performance may depend on:
  • zAAP and GCP capacity
  • System settings (USS, zFS, WLM)
  • Application code
  • Java Settings (heap size, GC policy)
  • Random luck
Application code
• Application code is always important
  • Regardless of the language!
• BufferedReader or ZFile?
  • Classic “it depends”
  • BufferedReader seems like it should be faster
  • But they provide different results: byte array vs. string
  • What you want to do with the result may impact which is best for any given situation
• Java has lots of similar but slightly different ways of doing things
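To make the "byte array vs. string" distinction concrete, here is a small sketch contrasting the two styles. ZFile is part of JZOS and only available on z/OS, so the byte-oriented half uses a plain InputStream as a stand-in; that substitution, the class name, and the sample data are all assumptions made for illustration, not the author's test code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadStyles {
    // Character-oriented: BufferedReader decodes the bytes and hands back Strings.
    static int countLines(InputStream in, String charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        int lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }

    // Byte-oriented: the caller gets raw bytes and decides if and when to decode.
    static int countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[80];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "REC1\nREC2\nREC3\n".getBytes("ISO-8859-1");
        System.out.println("lines: " + countLines(new ByteArrayInputStream(data), "ISO-8859-1"));
        System.out.println("bytes: " + countBytes(new ByteArrayInputStream(data)));
    }
}
```

If the program only needs to copy or scan records, skipping the decode step can save CPU; if it needs text anyway, the String-based path does that work once in the reader.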
Heap settings
• Heap settings always seen as an issue
  • Size is the usual suggestion
• Is bigger always better?
• Does anybody know how much heap they really need? (no)
• Min / Max sizes same or different?
• Garbage collection policy options
Memory is an issue
• Java’s memory usage can be an issue
  • “Requirements” for 100s of MBs are not unusual
  • Often “requirements” seem to be a SWAG
• Java heap size can’t be reliably predicted from the code & expected volumetrics
• Test with reasonable numbers before assuming the requirements are real
  • Be sure to get all processing scenarios!
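One low-effort way to test a heap "requirement" is to have the program report what it actually used. The sketch below reads the standard java.lang.management heap figures after a stand-in workload; the class name and the 16 MB allocation are invented for illustration, and a verbose GC log would give a fuller picture than this single snapshot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapCheck {
    // Heap currently in use, in whole megabytes.
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    // Heap committed by the JVM so far, in whole megabytes.
    static long committedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getCommitted() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Stand-in workload: retain 16 MB so there is something to measure.
        List<byte[]> retained = new ArrayList<byte[]>();
        for (int i = 0; i < 16; i++) {
            retained.add(new byte[1024 * 1024]);
        }
        System.out.println("Retained blocks: " + retained.size());
        System.out.println("Used heap MB: " + usedHeapMb());
        System.out.println("Committed heap MB: " + committedHeapMb());
    }
}
```

Running the real workload at several -Xmx values and comparing these numbers against the requested size gives a quick sanity check on whether the stated requirement is real.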
Garbage Collection Options (IBM Java 6)
• optthruput – default
  • Probably best for batch
• gencon – generational / concurrent
  • maybe good for large heap, transactional workloads (WAS)
• optavgpause – reduces long pauses
• subpool – “improved” object allocation
• For important workloads, may want to test all of them at various sizes
• Lots of other heap/gc options too
  • See IBM JDK Diagnostics Guide!
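When comparing policies, it helps to have the workload report how much collecting the JVM actually did. The following sketch uses the standard GarbageCollectorMXBean interface to total collection counts and times; the class name and the garbage-churning loop are stand-ins for a real workload, not the tests behind these slides.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    // Total collections across all collectors (a collector reports -1 if undefined).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in workload: churn through short-lived objects.
        long checksum = 0;
        for (int i = 0; i < 200000; i++) {
            byte[] junk = new byte[1024];
            junk[0] = (byte) i;
            checksum += junk[0];
        }
        System.out.println("checksum: " + checksum);
        System.out.println("collections: " + totalCollections());
        System.out.println("collection ms: " + totalCollectionMillis());
    }
}
```

On an IBM JVM the same class could be run once per policy, for example with -Xgcpolicy:optthruput and then -Xgcpolicy:gencon, at a fixed -Xms/-Xmx, and the totals compared alongside the job's CPU time.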
Heap size impact - Workload 1
[Chart: zAAPn seconds (axis 0 to 45) for Run 1 through Run 5 at heap sizes 32MB, 64MB, 128MB, 256MB, 512MB]
For some workloads, heap size may not matter
Heap size impact - Workload 2
[Chart: zAAPn seconds (axis 0 to 350) for Run 1 through Run 5 at heap sizes 32MB, 64MB, 128MB, 256MB, 512MB]
Too small of a heap can cause CPU increase
Variable vs. Fixed Heap size
[Chart: zAAPn seconds (axis 0 to 350) for Run 1 through Run 5; categories: WL1 32MB, WL1 32-128MB, WL1 128MB, WL2 32MB, WL2 32-128MB, WL2 128MB]
There might be a slight benefit to a fixed heap size
Heap size most important, but GC Policy also can be significant
GC Policy Comparison, Workload 2
[Chart: zAAPn seconds (axis 0 to 800) for Run 1 through Run 5; series: optthruput 128MB, optavgpause 128MB, subpool 128MB, gencon 128MB, optthruput 32MB, optavgpause 32MB, subpool 32MB]
Runtime options
[Chart: zAAPn seconds (axis 0 to 140) for Run 1 through Run 5; series: Baseline, jit:count=0, quickstart]
Don’t mess with the JIT!
Quickstart with trivial workload
[Chart: zAAPn seconds (axis 0 to 0.9) for Run 1 through Run 5; series: baseline, quickstart]
Could be good for certain workloads
So what’s the random thing?
• Much more variation in CPU time measurements with today’s CPUs
  • Superscalar pipeline and cache issues
• Seems to impact my Java work more than I expected
  • Consistently ran same workload
  • Extremely lightly utilized LPAR
  • Lightly utilized zAAPs
  • Same variability over time
• So I tried some more tests…
Java Workload Variability
[Chart: CPU seconds (zAAPn + GCP, left axis 0 to 200) and CPU seconds for trivial workload (right axis 0 to 2) by run time, 17MAY11:07:45 through 20MAY11:04:45; series: Workload1, 32MB; Workload1, 512MB; Workload1, REXX; Workload2, 128MB; Workload2, 512MB; Trivial, 32MB; test periods marked: One zAAP, Zero zAAPs, Two zAAPs]
Why is this?
I don’t know, but best guess is CPU cache and memory access effects
But I thought I’d look at the 113 records to see if I could find anything interesting….
Processor Speed
[Chart: processor speed (axis 0 to 5000) for processors 0 and 2]
Proc 0 = GCP, Proc 2 = zAAP
Data from Test period 1 (One zAAP)
Executed Instruction Rate
[Chart: executed instruction rate (axis 0 to 400) for processors 0 and 2]
Seems to confirm our SMF30 data
Proc 0 = GCP, Proc 2 = zAAP
Level 1 Miss Percentage
[Chart: Level 1 miss percentage (axis 0 to 5) for processors 0 and 2]
Proc 0 = GCP, Proc 2 = zAAP
Percent sourced from L1.5 Cache
[Chart: percent sourced from L1.5 cache (axis 0 to 100) for processors 0 and 2]
L1.5 improvement corresponds to dip in machine usage
Proc 0 = GCP, Proc 2 = zAAP
Percent TLB Miss of Total CPU
[Chart: percent TLB miss of total CPU (axis 0 to 50) for processors 0 and 2]
Dip in GCP TLB miss overhead due to machine less busy
Proc 0 = GCP, Proc 2 = zAAP
Estimated Cycles Per Instruction
[Chart: estimated cycles per instruction (axis 0 to 10) for processors 0 and 2; series for each processor: estimated instruction complexity CPI (ESTICCPI) and estimated CPI from finite cache/mem (ESTFINCP)]
Proc 0 = GCP, Proc 2 = zAAP
My Guesses…
• My test Java workloads were too cache and superscalar friendly
  • Perhaps makes it more susceptible to pipeline hazards
• But:
  • Wouldn’t the REXX workload be even more superscalar and cache friendly?
  • Why were the 113 measurements so consistent?
• Or Java is really doing variable amounts of work?
• Or… something isn’t right someplace?
• Take away: Java CPU measurements might be more variable than you expect
Most recent testing
• Repeated testing later in the year
  • z/OS 1.12 vs. 1.10
  • 1 year more recent Java 6 (Fall 2010 vs. Fall 2009)
• Still saw variability, but worst of it was closer to 25-30% instead of upwards of 75%
• Saw similar variability when testing on a z9 with zAAPs
• Saw at least one instance in a production LPAR with similar variability (in 3 executions of the same job, the 1st consumed just over half as much CPU as the later runs)
• Could not readily replicate on a WSC system running under z/VM
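Anyone wanting to check their own environment for this kind of variability can repeat a fixed workload and compare the CPU time of each run. The sketch below does that with the standard ThreadMXBean CPU timer. The slides do not say how the percentages were calculated, so the spread formula here, (max - min) / min, is an assumption, and the busy loop is only a stand-in for a real workload.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVariability {
    // Spread of the measurements as a percentage of the smallest one.
    static double spreadPercent(double[] cpuSeconds) {
        double min = cpuSeconds[0];
        double max = cpuSeconds[0];
        for (double v : cpuSeconds) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return (max - min) / min * 100.0;
    }

    // Stand-in workload: the same deterministic arithmetic every time.
    static long workload() {
        long sum = 0;
        for (int i = 1; i <= 20000000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        double[] runs = new double[5];
        for (int i = 0; i < runs.length; i++) {
            long start = threads.getCurrentThreadCpuTime(); // nanoseconds
            long result = workload();
            runs[i] = (threads.getCurrentThreadCpuTime() - start) / 1e9;
            System.out.println("Run " + (i + 1) + ": " + runs[i] + " CPU seconds (result " + result + ")");
        }
        System.out.println("Spread: " + spreadPercent(runs) + "%");
    }
}
```

Within a single JVM the first run or two will also include JIT warm-up, so separate job executions (as in the tests above) are the fairer comparison; SMF 30 data remains the authoritative source for job-level CPU time.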
Summary
• Java enables all sorts of cool things you might not have thought could run on the mainframe
• Mainframe’s Java performance not significantly worse than any other platform (Assuming adequate zAAP capacity)
• Lots of tuning knobs for Java
• Java CPU time measurements might be more variable