J2EE Batch Processing
Transcript of J2EE Batch Processing
1
Batch Processing With J2EE: Design, Architecture and Performance
Chris Adkin
28th December 2008
2
Introduction
For the last two years I have worked on a project testing the performance and scalability of batch processes using a J2EE application server.
This presentation summarises my findings and conclusions based upon this work.
3
Introduction

There is scarce information in the public domain:
- on batch processing using J2EE.
- on the end to end tuning of J2EE architectures which use Oracle for persistence.
There is a lack of information within the DBA community on performance tuning with respect to J2EE that goes beyond JDBC usage.
Most J2EE material only goes as far down to the database as persistence frameworks and JDBC.
The available information is not as "joined up" as it could be. Hopefully, this presentation may fill some of these gaps and bridge the divide between J2EE and database tuning.
4
Design Considerations
5
Design and Architecture Considerations

- Use third party tools and frameworks: Spring Batch, Quartz.
- Use J2EE application server extensions: IBM WebSphere Compute Grid.
- Write your own infrastructure; DevX has a good example.
6
Considerations For Available Infrastructures

Quartz
- Not a full blown batch infrastructure and execution engine, just a scheduler.

Spring Batch
- Version 2.0 was not available at the time of my project's inception.
- Spring Batch 1.0 is only designed to run in one JVM and was written for JSE 1.4.
- Earlier versions of Spring can compromise the transaction integrity of the application server; refer to this article.
7
Considerations For Available Infrastructures

WebSphere Compute Grid
- IBM has a long track record in both the J2EE and batch processing worlds.
- "A complete out-of-the-box solution for building and deploying Java-based batch applications on all platforms supported by WebSphere", according to this article.
- Integrates the most tightly with WebSphere of all the available options, but also ties you into WebSphere.
- Requires WebSphere Network Deployment as a pre-requisite.
- Not just a batch job processing infrastructure but a grid as well.
- Comes with full tooling for developing batch jobs.
8
Infrastructure Considerations

Workload partitioning and scalability
- Can the workload be subdivided for distribution amongst worker threads and nodes in a J2EE application server cluster?
- Does the infrastructure scale across JVM threads? A grid? J2EE application servers in a cluster? Multiple JVMs via JMS and associated queuing technologies?
9
Infrastructure Considerations

Job traceability
- Does the framework give visibility of the stage of processing that a job is at?
- Can the level of logging / tracing / auditing be changed for individual batch jobs, and how fine grained is this?

Exception handling
- Does the framework allow for this?
10
Infrastructure Considerations

Resource consumption management
- Control over CPU utilisation.

Extensibility
- Do you have to get your hands dirty maintaining the framework, or can you just 'drop' your business logic into it?
- Is the framework flexible in handling the delivery of jobs from different sources? JMS, web services, etc.
- Is the framework flexible in integrating with different end points?
11
Infrastructure Considerations

Scheduling and event notification
- Does the framework provide a scheduling mechanism, or can it easily hook into third party scheduler products? In particular the more popular schedulers such as BMC Control-M or Tivoli Maestro.
- Does the framework provide hooks into a pager and / or email event notification system?
12
Infrastructure Considerations

Resilience
- If a job or batch fails, will it bring the whole application server down?
- If a batch fails, does it roll back and leave the application in a consistent state?
- Can batches be re-started without any special steps having to be performed?
13
Batch Environment Components

Batch execution environment
- The actual batch run time environment.
- Batch 'container' software to provide the services for a batch to run.

Scheduling
- Does the environment provide this, or hooks into third party schedulers?

The application itself
14
What Does J2EE Provide For A Batch Environment

- Pooling for the efficient management of resources.
- Access to logging frameworks: Apache log4j, Java Util Logging (JUL).
- Rich integration infrastructure via:
  - the J2EE Connector Architecture and JDBC
  - Java Message Service (JMS)
  - Web services
  - Web service based publish / subscribe style event processing via WS-Notification
  - Session Initiation Protocol (SIP)
  - Service Component Architecture (provided in WebSphere 7 via a feature pack).
15
What Does J2EE Provide For A Batch Environment

- Asynchronous processing via message driven beans.
- Transaction support via JTS, with an API via JTA.
- Scalability across multiple Java Virtual Machines: most J2EE application server vendors offer clustered solutions.
- Scalability across multiple Java threads: threading is not supported in the EJB container by definition of the J2EE standard; however, it can be simulated using a client JVM or asynchronous beans.
- Security via JAAS.
16
ORM Considerations

- Many frameworks are available: iBATIS, TopLink, Spring, Hibernate and IBM pureQuery.
- The Java Persistence API lessens the need for such frameworks.
- Few frameworks utilise the Oracle array interface.
- Use of a framework can vastly reduce the amount of code that has to be written.
- A "half way house" is to use a JDBC wrapper.
17
ORM Considerations

Questions to ask when choosing an ORM:
- Can custom SQL be used?
- Can SQL be hinted?
- Does it have caching capabilities?
- Does it allow stored procedures, both PL/SQL and Java, to be called?
- Does it allow for any batch / bulk operations to be performed? E.g. wrappers for the JDBC batching API.

A hybrid approach can be adopted, for example:
- Read only entity beans for access to standing data; these have been highly optimised of late, as per this article.
- "Hand rolled" JDBC for bulk operations, leveraging things such as the Oracle array interface.
18
Caching Considerations

What is the percentage split between read and write activity against data stored in the database?
- Read intensive: caching needs to be seriously considered.
- Write intensive: consider stored procedures and leveraging bulk operations as much as possible. Some databases (including Oracle) support Java stored procedures; leverage the skills of J2EE developers within the database!

Whatever you do, frequently accessed standing data should always be cached.
19
Caching Considerations

- Processing that takes place within a batch job may not reuse the same data, but batch jobs that follow on from one another might.
- Java objects talking to Java objects is faster than Java objects talking to relational data. However, if there is a reporting requirement, most reporting tools run off relational data.
- Is a relational database going to be 'fronted' with some sort of cache, or is an in memory object cache to be used without any database featuring at all?
20
Caching Considerations

- Is a custom caching design going to be used? "Scale proof" this using Network Deployment friendly memory structures such as DistributedMap.
- Is an off the shelf caching (and grid) solution such as Coherence or WebSphere eXtreme Scale to be used? These are intrusive technologies that need to be factored into development.
- An in memory relational database caching solution, e.g. TimesTen, can be easily retrofitted into the technical infrastructure. Does the integration layer expect objects or relational data?
21
Design Challenges

Resource Utilisation
Using a database for persistence incurs performance penalties:
- Network round trips.
- Latency in data retrieval and modification.
- The Object Relational impedance mismatch, the "Vietnam of Computer Science".
22
Design Challenges

Resource Utilisation
Well designed and written batch processes may saturate CPU capacity on the application server:
- Good for throughput.
- Spare CPU capacity may be required to run multiple batches at once in "catch up" scenarios.
- Not so good for any other non-batch activities using the environment.
- Expect sustained spikes in J2EE application server CPU utilisation whilst batch processes are running, and low CPU activity at other times.
23
Design Challenges

ORM (Object Relational Mapping) frameworks
- There is a multitude of ORM frameworks on the market.
- ORM frameworks abstract away the underlying database.
- They offer little or no support for JDBC batching and the Oracle array interface.
- They focus on item by item processing and not on database features conducive to achieving good performance and scalability.
- J2EE persistence has come a long way with Java EE 5 in the form of the Java Persistence API, both in terms of functionality and performance.
- Good for programmer productivity, as less "hand cranked" code is required.
24
Design Challenges

Raw JDBC
- Statement batching support, available from JDBC 2.0 onwards.
- Support for batch retrieval via fetch size configuration.
- Can result in having to produce more "hand cranked" code than that required with an ORM framework.
- Provides access to vendor specific performance related features such as the Oracle array interface.
- Requires more skill on the part of the Java programmer in terms of SQL and database knowledge; the development team might require a DBA.
25
Design Challenges

SQLJ
- Essentially a JDBC wrapper; SQLJ calls are translated into JDBC calls by a pre-processor.
- Can achieve similar results to JDBC with less coding.
- Support for statement batching.
- SQLJ syntax can be checked at compile time.
- Does not support the Oracle array interface.
- See an IBM SQLJ reference and Oracle SQLJ examples.
26
Design Challenges

Can the Oracle array interface be leveraged?
- Despite all the choices available, only raw JDBC provides access to the Oracle array interface.
- There may come a point in scaling your architecture when the Oracle array interface needs to be used, in order to:
  - Minimize network round trips.
  - Minimize parsing.
  - Leverage bulk operations within the database.
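The round trip saving described above can be illustrated with plain JDBC statement batching. This is a minimal sketch rather than project code: the `job_results` table and its columns are hypothetical, and the `roundTrips` helper simply shows how batching divides the network cost of sending many rows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedInsert {

    // Hypothetical SQL; the table and columns are illustrative only.
    static final String INSERT_SQL =
        "INSERT INTO job_results (job_id, status) VALUES (?, ?)";

    // Number of executeBatch() round trips needed for rowCount rows.
    static int roundTrips(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    // Sends rows to the database in batches of batchSize, so rowCount
    // rows cost roundTrips(rowCount, batchSize) network round trips
    // instead of rowCount individual statement executions.
    static void insertResults(Connection conn, List<long[]> rows, int batchSize)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (long[] row : rows) {
                ps.setLong(1, row[0]);
                ps.setLong(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();   // one round trip for the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();       // flush the final partial batch
            }
        }
    }
}
```

With a batch size of 100, inserting 10,000 rows costs 100 round trips rather than 10,000, which is the same effect the Oracle array interface achieves at the OCI level.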
27
Design Challenges

Can Oracle 11g client side caching be used?
- An extension of the technology that allows results to be cached in the server shared pool, but on the "client side".
- Requires the use of the thick JDBC driver.
- Can vastly reduce network round trips, data access latency and CPU utilisation on the database server.
- An excerpt from the 360 degree programming blog:
  "Running the Nile benchmark [3] with Client Result Cache enabled and simulating up to 3000 users results in:
  - Up to 6.5 times less server CPU usage
  - 15-22% response time improvement
  - 7% improvement in mid-tier CPU usage"
28
To Batch Or Not To Batch

When real time asynchronous processing is applicable:
- Processing needs to take place as soon as the source data arrives, and the data does not all arrive at the same time.
- The processing window is too small to process all the jobs in one batch, and jobs arrive continuously throughout the day.
- Jobs are delivered asynchronously.
29
To Batch Or Not To Batch

When a batch environment is applicable:
- If the jobs processed are delivered in batches, this will to a degree enforce batch type processing.
- When files need to be generated for delivery to another organisation.
- If migrating from a non-J2EE legacy batch environment to J2EE, stick to batch in the first iterations of development, rather than jump to J2EE and an event processing architecture in one "quantum leap".
30
A "Third Way" Hybrid Environment

A real world example of where this is in operation:
- Most retailers aggregate sales information from their point of sale (POS) systems for processing at the head office.
- Larger retailers process so many transactions that handling them within a single batch window is not practical.
- Therefore, for some retailers, information from the POS systems is continuously trickled to the head office and then batched up for processing when a certain number of files have been received.
31
Our Batch Process Design

J2EE tier
- A WebSphere launch client to instigate batch processes.
- The client uses Java threads to fire off multiple requests at the application server and hence 'simulate' threading within the application server.
- A batch session bean processes arrays of jobs within a loop inside the WebSphere application server.
- Stateless session beans.
- Container managed transactions (JTS); each job is processed within its own transaction.
- Application configurable max threads per batch process and max jobs per thread.
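The launch client scheme above (partition the job ids into per-thread chunks, capped by "max threads" and "max jobs per thread") can be sketched in plain Java. This is illustrative only: the project used a WebSphere launch client with raw threads calling session beans, whereas this sketch uses a `java.util.concurrent` pool and a placeholder task in place of the bean call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LaunchClientSketch {

    // Splits jobIds into at most maxThreads chunks of at most
    // maxJobsPerThread ids each; anything beyond those limits is left
    // for the next batch run.
    static List<List<Long>> partition(List<Long> jobIds,
                                      int maxThreads, int maxJobsPerThread) {
        List<List<Long>> chunks = new ArrayList<>();
        for (int i = 0; i < jobIds.size() && chunks.size() < maxThreads;
                i += maxJobsPerThread) {
            chunks.add(jobIds.subList(i,
                    Math.min(i + maxJobsPerThread, jobIds.size())));
        }
        return chunks;
    }

    // Each chunk would be passed to a session bean method; here the
    // "bean call" is just a placeholder Runnable.
    static void run(List<Long> jobIds, int maxThreads, int maxJobsPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (List<Long> chunk : partition(jobIds, maxThreads, maxJobsPerThread)) {
            pool.submit(() ->
                System.out.println("processing " + chunk.size() + " jobs"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Because each chunk is processed on its own thread and each job in its own transaction, a single failed job only rolls back its own work, which matches the "pros" claimed later in the deck.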
32
Our Batch Process Design

Persistence (Oracle) tier
- Raw JDBC and the Oracle thin driver.
- Some use of JDBC statement batching.
- Oracle 10g release 2 for the database.
- Limited use of stored procedures.
- J2EE tier data caching limited to standing data: data is cached as XML within the application server. When a standing data table is accessed for the first time it is cached; all subsequent retrievals are via XPath.
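The "load on first access" behaviour of the standing data cache can be sketched as a read-through cache. This is an illustration, not the project's implementation: the project cached XML and queried it with XPath, whereas this sketch swaps in a `ConcurrentHashMap` keyed lookup so the caching behaviour itself is easy to see; in the real design the loader would be a JDBC read of a standing data table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Read-through cache: the loader runs only on the first access per
// key; all later reads for that key are served from memory.
public class StandingDataCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    final AtomicInteger loads = new AtomicInteger(); // exposed for the demo

    public StandingDataCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();   // counts trips to the backing store
            return loader.apply(k);
        });
    }
}
```

For standing data that never changes during a batch run this is safe; anything that can change mid-run would also need an invalidation strategy, which the sketch deliberately omits.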
33
Our Batch Process Design

- Not a true batch implementation as such: the web GUI, web service(s) and hand held units can be, and are, used whilst 'batch' processes run.
- 'Batch' in the sense that large numbers of jobs are processed together within specific time windows.
- All batch control is via the WebSphere launch client; there is no GUI based job control.
34
Performance Monitoring and Tuning "Tool Kit"

Application server and client JVM
- Verbose garbage collection output
- WebSphere Performance Monitoring Infrastructure (PMI)
- WebSphere performance advisor
- Java thread dumps
- JProfiler Java profiler

Oracle Database
- 10g performance infrastructure: advisors, ADDM, time model etc.

Operating system tools
- prstat, sar, vmstat, iostat etc.

Veritas volume management monitoring tools
- vxstat
35
Performance Monitoring and Tuning "Tool Kit"

Available IBM WebSphere tools not used on the project:
- IBM Support Assistant plugins, namely the thread analyzer and the verbose garbage collection output analyzer.
- ITCAM for Response Time Tracking.
- ITCAM for WebSphere.

Available Sun tools not used on the project:
- jstat
- jconsole
36
Batch Architecture Deployment Diagram

[Deployment diagram: a Launch Client connects via RMI to a WebSphere 6.1 Application Server. Inside the EJB Container sit the Domain Layer and Data Access Layer, fronted by a Domain Interface Layer and a Data Access Interface Layer, plus Utility Services (batch manager, logging, exception handling, standing data cache etc). The Data Access Layer connects to the Oracle Database Server via JDBC.]
37
Software Architecture

A classical horizontally layered architecture:
- Apache Struts as an off the shelf MVC framework.
- Business logic tier implemented using stateless session beans and the session façade, business delegate and service locator patterns.
- Data access layer written using stateless session beans, raw JDBC and the data transfer object pattern.
- Utility layer providing logging, exception handling, service locator, EJB home caching, standing data cache, and parameters and controls functionality.
38
Software Architecture

Vertical layering also:
- Functional areas divided into vertical slices that go through both the business logic / domain layers and the data access / integration layer.
- Loose coupling of vertical slices via 'manager' beans, the session façade design pattern and coarse interfaces.
39
Software Performance Features

Domain / business logic layer
- Cached standing data.
- EJB home caching (service locator design pattern).
- Use of the session façade pattern with coarse interfaces.
- All beans are stateless:
  - IBM consider this to be a best practice.
  - Unlike calls to stateful session beans, calls to stateless beans can be load balanced across all members of a cluster.
  - The J2EE community regards stateless beans as performing better than stateful beans.
40
Software Performance Features

Data access layer
- Use of data transfer objects.
- JDBC connection pooling, with the min and max settings on the JDBC pool set to the same value to prevent connection storms.
- JDBC statement batching used in places.
- JDBC prepared and callable statements used so as not to abuse the Oracle database shared pool. Soft parsing may still be an issue, but can be reduced slightly by using session_cached_cursors.

General design
- Batch process threading for scale out.
General design Batch process threading for scale out.
41
Batch Design Sequence Diagram

Participants: batch client, J2EE container, database.

1: The client starts the batch process.
2: A batch record is created with the start time.
3: The client gets the 'number of threads' and 'number of jobs per thread' parameters.
4: The parameters are retrieved from the database.
5: The parameters are returned.
6: The client gets the list of SPRs / jobs to be processed.
7: The SPR / job ids are retrieved from the database.
8: A list of SPR / job ids is returned.
9: The client creates the threads and passes the 'job list' as a parameter.
10: Each thread makes a call to a bean method, sending the 'job list' as a parameter.
11: The bean loops through each SPR / job id within the 'job list' to process them.
12: On completion, each thread ends.
13: The batch record is updated with the status and end time.
42
Where Does The Source Data For Our Batch Processes Originate?

- Flat files delivered via FTP.
- Web services.
- A third party off the shelf package, via JNI.
- Hand held units using J2ME.
43
Design Critique
44
Pros

- The design can scale out via threads.
- The design can scale out across multiple JVMs.
- The design is simple and clean.
- Because of the online usage, row by row processing simplifies the design: complex code might otherwise be required to allow for both batch array processing and online usage.
45
Pros

- If a single job fails, the whole batch does not need to be rolled back.
- The CPU usage of a batch can be controlled by changing the number of threads.
- Provides a framework for the batch infrastructure.
46
Cons

Inefficiencies by design when accessing the database:
- Limited opportunities for leveraging the JDBC batching API and the Oracle array interface.
- The design is prone to a lot of 'chatter' between the application and database servers.
- A large "soft parse overhead".
47
Cons

- HHU job retrieval may be more conducive to an event processing architecture than a batch architecture: better for more even CPU utilisation.
- We have to maintain the infrastructure code as well as the business logic / domain code.
- Is there a better way of simulating threading that could reduce the role of the launch client, message driven beans perhaps? Limiting the role of the launch client in batch processing would be better for performance and scalability.
48
Network Round Trip Overheads

Database utilisation - network round trip overhead. From "Designing Applications For Performance And Scalability":

"When more than one row is being sent between the client and the server, performance can be greatly enhanced by batching these rows together in a single network roundtrip rather than having each row sent in individual network roundtrips. This is in particular useful for INSERT and SELECT statements, which frequently process multiple rows and the feature is commonly known as the array interface."

There is minimal scope for leveraging the array interface (and also the JDBC batching API) with our design.
49
Parsing Overheads

- Best J2EE programming practice dictates that resources should be released as soon as they are no longer required.
- All cached prepared statement objects are discarded when the associated connection is released.
- This could be coded around, but would lead to code that is both convoluted and prone to statement cache leaks.
50
Parsing Overheads

- The Statement API is more efficient than the PreparedStatement JDBC API for the first execution of a statement; subsequent executions of a prepared statement are more efficient and more scalable.
- Using the Statement API would be less resource intensive on the application server but more resource intensive on the database.
51
Parsing Overheads

Should the prepared statement cache size be set to zero?
- There is no point in bearing the overheads associated with cached statement object creation.
- A cache will also create unnecessary pressure on the JVM heap.
52
Parsing Overheads

Why is parsing such a concern?
- Oracle's Tom Kyte and the Oracle Real World Performance group stress that the importance of parsing and efficient cursor use cannot be overstated when it comes to the scalability of applications that use Oracle.
- This is not a problem unique to Oracle; WebSphere and DB2 material advocates the use of static SQL for the very same reason of avoiding parsing.
53
Parsing Overheads

Database utilisation - soft parse overhead. The Oracle white paper "Designing Applications For Performance And Scalability" describes the type of SQL usage in our design as:

"Category 2 - continued soft parsing. The second category of application is coded such that the hard parse is replaced by a soft parse. The application will do this by specifying the SQL statement using a bind variable at run-time including the actual value . . ." (continued)
54
Parsing Overheads

Database utilisation - soft parse overhead. "The application code will now look somewhat similar to:

    loop
      cursor cur;
      number eno := <some value>;
      parse(cur, "select * from emp where empno=:x");
      bind(cur, ":x", eno);
      execute(cur);
      fetch(cur);
      close(cur);
    end loop;"

Refer to "Soft things can hurt"!
55
Parsing Overhead

The Oracle Automatic Database Diagnostic Monitor (ADDM) reports on the performance impact of continuous soft parsing:

    FINDING 3: 13% impact (211 seconds)
    -----------------------------------
    Soft parsing of SQL statements was consuming significant database time.

    RECOMMENDATION 1: Application Analysis, 13% benefit (211 seconds)
      ACTION: Investigate application logic to keep open the frequently
      used cursors. Note that cursors are closed by both cursor close
      calls and session disconnects.
56
Parsing Overhead

"Category 3" processing, as per the white paper, is more efficient and what we should really be striving for, as per the pseudocode below:

    cursor cur;
    number eno;
    parse(cur, "select * from emp where empno=:x");
    loop
      eno := <some value>;
      bind(cur, ":x", eno);
      execute(cur);
      fetch(cur);
    end loop;
    close(cur);
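The difference between the two categories can be restated in JDBC terms. This is an illustrative sketch, not project code: the `emp` query mirrors the white paper's example, and whether the driver's or pool's statement cache softens category 2 in practice depends on configuration, as discussed earlier in the deck.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ParseOnceBindMany {

    static final String SQL = "select ename from emp where empno = ?";

    // Category 2: one soft parse per iteration, because the statement
    // is re-prepared (and its cursor closed) inside the loop.
    static void category2(Connection conn, long[] empNos) throws SQLException {
        for (long empNo : empNos) {
            try (PreparedStatement ps = conn.prepareStatement(SQL)) {
                ps.setLong(1, empNo);
                try (ResultSet rs = ps.executeQuery()) { rs.next(); }
            } // cursor closed here; the next iteration soft parses again
        }
    }

    // Category 3: parse once, then bind and execute many times.
    static void category3(Connection conn, long[] empNos) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (long empNo : empNos) {
                ps.setLong(1, empNo);
                try (ResultSet rs = ps.executeQuery()) { rs.next(); }
            }
        } // a single parse and a single cursor close
    }

    // Parse calls avoided by moving from category 2 to category 3.
    static int parsesSaved(int iterations) {
        return Math.max(iterations - 1, 0);
    }
}
```

For a batch thread looping over 1,000 jobs, category 3 turns 1,000 soft parses into one, which is exactly the saving the ADDM finding on the previous slide is pointing at.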
57
Testing Environment
58
Monitoring And Tuning The Software

Lots of things to monitor and tune:
- Client JVM
- Server JVM
- Application server Object Request Broker
- EJB container
- JDBC connection pool usage and statement cache
- Application code
- Database usage and resource utilisation
- Application server resource utilisation, mainly CPU
- Network between the application server and database server
- Number of threads per batch job
- Number of jobs per thread
59
Testing Environment

- Performance targets were based on the actual run times of batch processes in the legacy environment.
- In testing, 200% of the equivalent legacy workload was used, and the database was artificially 'aged' to give it the appearance of containing two years' worth of data.
- Oracle 10g database flashback was used to reproduce tests.
- A large full table scan was used to clear out the Oracle buffer cache and the cache on the storage array, to prevent results from being skewed when repeating the same test after making a performance optimization.
60
Test Work Load

- Apart from the processing of flat files, most batch processes handle between 120,000 and 180,000 jobs.
- Little reference will be made to this in the presentation, in that what we refer to as a 'job' will have little meaning to other people unless they are using the same application.
- However, there is a consensus that a 'job' is something that requires a discrete set of actions to be performed against it in order to be processed.
61
Hardware and Software Platforms

- IBM WebSphere Application Server 6.1 base edition, 32 bit.
- Oracle Enterprise Edition 10.2.0.4.0 (10g release 2).
- Solaris 10.
- 1 x 4 CPU (single core) Fujitsu Siemens PRIMEPOWER 450 with 32 GB RAM to host the database.
- 1 x 4 CPU (single core) Fujitsu Siemens PRIMEPOWER 450 with 32 GB RAM to host the application server.
- 100 Mb Ethernet network.
- EMC CX3-20F storage array for the database, accessed via fibre channel.
62
Hardware and Software Platforms

The EMC CX3-20F storage array for the database, accessed via fibre channel, has:
- Two Intel Xeon based storage processors.
- Two trays of disks, with 15 disks per tray.
- 1 GB of cache.
63
EMC CX3-20F Configuration

- Despite the workload being 'batch' oriented, from a database perspective the ratio of logical reads to block changes is 92%.
- Some people dislike RAID 5; we, however, think it is perfectly suitable for read intensive work loads: i.e. spread the database files across as many disks as possible. Some disks will be lost to EMC vault disk usage.
- RAID 1 was used for the redo logs and archived redo log files.
- Cache on the array was split 50/50 between read and write usage, as per EMC recommended best practice.
- The size of the database in terms of application segments was approximately 25 GB, not that large really.
64
Database Statistics

A classical approach to ascertaining application scalability is to look at resource consumption, latching in particular; refer to Tom Kyte's runstats package. The main problems with this were:
- Flashing the database back between tests would result in the loss of any resource consumption data loaded into a table.
- runstats is designed for capturing statistics within a single Oracle session.
- This information could be written to a file, but that would mean expending effort on developing such a tool.

Fortunately, Oracle 10g provides an out of the box solution to this in the form of the db time model . . .
65
Database Statistics

What is db time?
- A statistic that comes with the 10g performance management infrastructure.
- The sum total of time spent in non-idle database calls by foreground processes across all sessions. Not to be confused with "wall clock time"!
- Provides a single high level metric for monitoring database utilisation: higher db time = higher database utilisation.
- Makes tuning 'simply' a matter of reducing db time.
- Refer to this presentation from the architect at Oracle who invented it.
66
Monitoring And Tuning The Software

So as not to be drowned in statistics, the following high level statistics were chosen for monitoring purposes:
- Oracle CPU usage
- Oracle database time
- Average database load (sessions)
- WebSphere application server CPU usage
67
Database Statistics

Database load is a 10g statistic that usually accompanies db time, but what is it?
- The number of active sessions, as reported by the 10g Automatic Database Diagnostic Monitor.
- It is calculated as db time divided by wall clock time.
- Higher average database load = greater database utilisation.
- High database utilisation = good throughput from the application server.
- Low database utilisation = some bottleneck in the application server is throttling throughput through to the database.
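Since average database load is simply db time divided by wall clock time, the arithmetic is trivial to sketch:

```java
public class DbLoad {
    // Average database load (average active sessions) as ADDM reports
    // it: db time accumulated across all sessions divided by the wall
    // clock time of the measurement window.
    static double averageLoad(double dbTimeSeconds, double wallClockSeconds) {
        return dbTimeSeconds / wallClockSeconds;
    }

    public static void main(String[] args) {
        // 7,200s of db time accumulated over a 3,600s window means an
        // average of 2 sessions were active in the database at any instant.
        System.out.println(averageLoad(7200, 3600)); // prints 2.0
    }
}
```

This is why db time can legitimately exceed elapsed time: it is summed across concurrent sessions, not measured on the wall clock.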
68
How The db time Model Should Help

- If, to begin with, the CPU usage on the application server is high and the db time expended in the database is low, this would imply some sort of bottleneck in the application server.
- Once a bottleneck is addressed in the application server and db time goes up, methods for reducing the db time should be looked at.
69
Identifying Performance Bottlenecks

How do we know where the bottleneck is?
- The Tivoli Performance Viewer EJB summary report is a good place to start.
- In the example screen shot on the next slide, the total time expended by the batch manager session bean can be compared to the sum total time expended by the dbaccess module beans.
- Having separate beans for accessing the database not only separates the integration layer access from the business logic, but helps with performance tuning.
70
Identifying Bottlenecks

[Screen shot: Tivoli Performance Viewer EJB summary report]
71
Identifying Bottlenecks

From the screen shot on the previous slide (ScheduleManager is not associated with the batch processes):
- batch manager bean time = 429,276,448
- time spent in dbaccess beans = 1,737,440
- db access time as a % of the total = 0.40%
- The bottleneck might be on the application server!

There is also an EJB method summary report for drilling down further.
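The 0.40% figure can be reproduced from the two PMI timings on the slide:

```java
public class EjbTimeShare {
    // Share of the batch manager bean's total time that was spent in
    // the data access (dbaccess) beans.
    static double dbAccessPercent(long dbAccessTime, long batchManagerTime) {
        return 100.0 * dbAccessTime / batchManagerTime;
    }

    public static void main(String[] args) {
        // Figures from the Tivoli Performance Viewer EJB summary report.
        double pct = dbAccessPercent(1_737_440L, 429_276_448L);
        System.out.printf("db access time = %.2f%% of total%n", pct); // ~0.40%
    }
}
```

With the database accounting for well under 1% of bean time, the remaining 99%+ must be spent in the application server itself, which is what justifies the "bottleneck might be on the application server" conclusion.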
72
The 'Carrot' Model

Documents the thread usage in a J2EE application server's generic components:
- HTTP server
- Web container
- EJB container (driven by the number of active ORB threads)
- JDBC connection pool
- Database
73
The 'Carrot' Model

Typically, utilisation should be high towards the 'front' of the application server (the HTTP server) and gradually dwindle off towards the end (the database), hence the 'carrot' analogy. The exception is when the application architecture is similar to the Microsoft Pet Store .NET versus J2EE benchmark, i.e. there is little business logic outside the database.
74
The 'Carrot' Model

- In summary, most of the load on the software stack will be carried by the J2EE application server.
- Measuring the CPU on both the J2EE application and Oracle database servers will show how well the 'carrot' model applies to our architecture and design.
75
The 'Carrot' Model

[Chart: "J2EE Component Utilisation" - threads used (0 to 200) per component for the HTTP server, web container, ORB threads, JDBC connection pool and database sessions, tapering off from the front of the stack to the back.]
76
Software Configuration Base Line
77
Oracle Initialisation Parameters

    commit_write                   BATCH, NOWAIT
    cursor_sharing                 SIMILAR
    cursor_space_for_time          TRUE
    db_block_size                  8192
    db_flashback_retention_target  999999
    log_archive_max_processes      4
    open_cursors                   65535
    optimizer_index_cost_adj       100
    optimizer_dynamic_sampling     1
    optimizer_index_caching        0
    pga_aggregate_target           4294967296
    processes                      500
    query_rewrite_enabled          TRUE
    session_cached_cursors         100
    sga_max_size                   5368709120
    sga_target                     4697620480
    statistics_level               TYPICAL
    undo_management                AUTO
    undo_retention                 691200
    undo_tablespace                UNDO
    workarea_size_policy           AUTO
78
WebSphere Configuration

Server JVM
- -server -Xms2000m -Xmx2500m

Client JVM
- -client -Xms200m -Xmx500m

JDBC connection pool
- Min connections 100
- Max connections 100

ORB configuration
- Min threads 100
- Max threads 100
- JNI reader thread pool set to 100
- Fragment size set to 3000
79
Application Configuration

- Threads per batch process: 100
- Jobs per thread: 100
- Log4j logging level: INFO
80
Notes On Oracle Parameter Settings

- Cursor management has a major impact on the scalability of applications that use Oracle.
- With this in mind, cursor_sharing, session_cached_cursors and cursor_space_for_time have all been explicitly set.
- "Designing Applications For Performance And Scalability" has some salient points regarding these parameters, which will be covered in the next few slides.
81
Notes On Oracle Parameter Settings

- A separate JTS transaction for each job results in heavy usage of the Oracle log buffer and its associated synchronization mechanisms.
- The redo allocation latch is a unique point of serialisation within the database engine; therefore the log buffer needs to be used with care.
- Asynchronous and batched commit writes were introduced for this purpose; they help to prevent 'log file sync' waits.
82
Tuning
83
Disclaimer

Tuning efforts on different projects will yield different results from those detailed here, due to differences in the:
- Software stack component versions, e.g. using Oracle 10.1 and not 10.2, WebSphere 6.0 or 7.0 and not 6.1, 64 bit WebSphere and not 32 bit.
- Software stack component vendors, e.g. you may be using WebLogic or JBoss, and DB2 instead of Oracle.
- J2EE application server and database server topology.
- J2EE and database initialisation parameters.
- Application architecture, design and coding.
- Server hardware.
- Data.
- Etc . . .
84
Disclaimer
Despite all the reasons as to why your results might vary from those presented, the technical precepts behind what has been done should hold true for more than just the application tested here.
85
A Note On The Results

- The tuning efforts were mainly focussed on tuning the software stack from an environment perspective.
- In practice there were a lot more 'tweaks' made than those presented here; the optimisations have been distilled down to those which made the greatest impact.
- Despite this, the biggest performance and scalability gains often come from:
  - The architecture.
  - The design.
  - The coding practices used.
86
A Note On The Results

The next set of findings relate to the most ubiquitous type of batch process in our software. This is a batch process that:
- retrieves a list of jobs from the database.
- partitions the jobs into 'chunks'.
- invokes beans in the application server via child threads, with these 'chunks' attached as objects.
87
Finding 1: pass by copy overhead

Symptom
- db time, database load and CPU utilisation on the database server were all low.
- CPU utilisation on the application server was at 100%.

Root cause
- Database access beans were invoked by remote method calls.

Action
- Set 'pass by reference' to on, on the Object Request Broker.

Result
- Elapsed time 01:19:11 -> 00:41:58
- WebSphere CPU utilisation 96% -> 66%
- db time / avg sessions 23470 / 4.1 -> 40071 / 14.5
88
Finding 2: threading
Symptom high db time and database load high CPU time attributed to
com.ibm.ws.util.ThreadPool$Worker.run method (visible via Java profiler).
Root cause batch process threading set to high, 100 threads for
4 CPU boxes !!!.
89
Finding 2: threading
Action lower number of threads, optimum between 16 and
32 depending on the individual batch process.
Result (threads 100 -> 32) Elapsed run time 00:41:58 -> 00:36:45 Db time / avg sessions 40071 / 14.5 ->
21961 / 8.9 WebSphere CPU utilisation 66 % -> 73 %
90
Finding 3: db file sequential read overhead
Symptom “db file sequential read event” = 73.6% total call
time. Root cause
job by job processing = heavy index range scanning. Action
compress most heavily used indexes. Result
Elapsed run time 00:36:45 -> 00:36:38 Db time / avg sessions 21961 / 8.9 -> 9354 /
3.6 WebSphere CPU utilisation 73 % -> 74 %
91
Finding 4: Physical read intensive objects
Symptom ADDM advised that there were physical read intensive objects
Root cause With a batch process, the same data is rarely read twice, except for
standing / lookup data. Action
‘pin’ hot objects into a ‘keep’ area configured in the db cache Result
Elapsed run time 00:36:38 -> 00:26:36 Db time / avg sessions 9354 / 3.6 -> 4105 / 2.3 WebSphere CPU utilisation 74 % -> 87 %
92
Finding 5: Server JVM heap configuration and ergonomics
Symptom major garbage collections take place once a minute.
Root cause heap incorrectly configured.
Action tune JVM parameters.
Result Elapsed run time 00:26:36 -> 00:25:01 Db time / avg sessions 4105 / 2.3 -> 3598 /
2.4 WebSphere CPU utilisation 87 % -> 86 %
93
Finding 5: Server JVM heap configuration and ergonomics
The most effective JVM parameter settings were found to be those used by IBM in a WebSphere 6.1 benchmark on Solaris submitted to SPEC.
Resulted in one major garbage collection every 10 minutes.
Minimum heap size = 2880 MB, Maximum heap size = 2880 MB
initialHeapSize="2880" maximumHeapSize="2880" verboseModeGarbageCollection="true"
-server -Xmn780m -Xss128k -XX:-ScavengeBeforeFullGC -XX:+UseParallelGC
-XX:ParallelGCThreads=24 -XX:PermSize=128m -XX:MaxTenuringThreshold=16
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseParallelOldGC
94
Finding 5: Server JVM heap configuration and ergonomics
Usage of the JVM configuration from the IBM benchmark came after a lot of testing and trial-and-error experimentation.
The Sun JVM tuning material supports this approach.
The heap is probably oversized for our requirements, but for a “first cut” at getting the configuration correct it is not a bad start.
95
Finding 6: Client JVM heap configuration and ergonomics
Symptom major garbage collections take place more than
once a minute. Root cause
heap incorrectly configured. Action
tune JVM parameters. Result
Elapsed run time 00:25:01 -> 00:24:20 Db time / avg sessions 3598 / 2.4 ->
3704 /2.5 WebSphere CPU utilisation 86 % -> 86 %
96
Finding 6: Client JVM heap configuration and ergonomics
Client JVM configuration
JVM Options: -server -Xms600m -Xmx600m -XX:+UseMPSS -XX:-UseAdaptiveSizePolicy
-XX:+UseParallelGC -XX:MaxTenuringThreshold=3 -XX:SurvivorRatio=2 -Xss128k
-Dcom.ibm.CORBA.FragmentSize=3000 -Dsun.rmi.dgc.client.gcInterval=4200000
-Dsun.rmi.dgc.server.gcInterval=4200000
Server diagnostic trace turned off
97
Finding 7: Database Block Size Symptom
Significant latching around the db cache. Root cause
Block size too small. Action
Increase block size from 8 to 16K. larger block size = fewer index leaf blocks = fewer index branch
blocks = smaller indexes = less physical and logical IO; less logical IO = less latching
Result Elapsed run time 00:24:20 ->
00:21:25 Db time / avg sessions 3704 / 2.5 -> 2623 / 2 WebSphere CPU utilisation 86 % -> 93 %
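A back-of-the-envelope model shows why doubling the block size shrinks the index. The bytes-per-entry and usable-space figures below are illustrative assumptions for the sketch, not Oracle internals:

```java
public class IndexBlockEstimate {

    // Estimate index leaf blocks for a given block size, assuming roughly
    // 100 bytes per index entry and 80% usable space per block.
    // Both figures are illustrative assumptions, not Oracle internals.
    static long leafBlocks(long entries, int blockSizeBytes) {
        long entriesPerBlock = (long) (blockSizeBytes * 0.8) / 100;
        return (entries + entriesPerBlock - 1) / entriesPerBlock; // ceiling division
    }
}
```

Under these assumptions a 1,000,000-entry index needs roughly half as many leaf blocks at 16K as at 8K, which is where the logical IO and latching savings come from.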
98
Finding 8: JVM aggressive optimizations
Symptom No symptom as such, load still on the application
server. Root cause
N/A Action
Further experimentation with the server JVM options resulted in aggressive optimizations being used.
Result Elapsed run time 00:21:25 ->
00:18:36 Db time / avg sessions 2623 / 2 -> 2516 / 2.1 WebSphere CPU utilisation 93 % -> 85 %
99
Finding 8: JVM aggressive optimizations
-XX:+AggressiveOpts had to be used with -XX:+UnlockDiagnosticVMOptions -XX:-EliminateZeroing, otherwise the application server would not start up.
The following excerpt from the Java Tuning White Paper should be heeded:-
“Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.”
100
A Note On The Results
The other type of batch process in our software involved reading and writing files after the contents of files / database tables had been validated against standing data.
This type of batch process was highly ‘Chatty’ by design.
101
Tuning Finding: ‘Chatty’ Batch Process Design
Symptom Low CPU usage on WebSphere server. Low CPU usage on the database server.
Root cause an Oracle stored procedure was called to validate each field of every
record being read and written, resulting in performance death by network round trips!
Action Modify code to perform validation using pure Java code
against standing data cached within the application server. Results
See next slide
102
Tuning Finding: ‘Chatty’ Batch Process Design
Excessive calls to Oracle stored procedures: results

Validation Method   Lines In File   Threads   Run Time (mm:ss)   % Improvement Over PL/SQL   WebSphere CPU   Oracle CPU
PL/SQL              15000           8         02:18              NA                          68              60
Java                15000           8         01:31              34%                         77              68
Java                15000           4         01:48              24%                         51              56
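The in-process validation approach can be sketched as follows. The `RecordValidator` class and its currency-code standing data are hypothetical examples for illustration, not the project's actual code:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class RecordValidator {

    // Standing / lookup data, cached once inside the application server.
    private final Set<String> validCodes;

    public RecordValidator(Collection<String> standingData) {
        this.validCodes = new HashSet<>(standingData);
    }

    // Validate every field of a record in-process: a HashSet lookup per field
    // instead of a stored procedure call (and network round trip) per field.
    public boolean isValid(String[] fields) {
        for (String field : fields) {
            if (!validCodes.contains(field)) {
                return false;
            }
        }
        return true;
    }
}
```

With 15,000 records of several fields each, replacing one round trip per field with an in-memory lookup is where the 24-34% improvement in the table comes from.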
103
Other Findings
With some batch processes “cursor: pin S” wait events were observed, this accounted for up to 7.2% of total call time.
Investigating this led me to the fact that from 10.2.0.3.0 onwards the library cache pin had been replaced by a mutex.
In 11g, more of what were library cache latches have been replaced with mutexes.
Notable, because one of the ways of comparing the scalability of different tuning efforts is to measure and compare latching activity.
104
Tuning Results Summary
105
Types Of Batch Processes
The following graphs capture the following statistics for a typical batch process that has had all the tuning recommendations applied:- the average percentage CPU usage db time elapsed time
106
Batch Elapsed Time
[Graph: batch elapsed time in seconds against thread counts of 4, 8, 16 and 32; y-axis Time (s), 0 to 1200]
107
Batch DB Time
[Graph: batch db time against thread counts of 4, 8, 16 and 32; y-axis db time, 0 to 5000]
108
Batch Average DB Load
[Graph: average db load in sessions against thread counts of 4, 8, 16 and 32; y-axis 0 to 7]
109
Server % CPU Utilisation / Thread Count
[Graph: % Database CPU Usage and % App Server CPU Usage against thread counts of 4, 8, 16 and 32; y-axis 0 to 90]
110
Critique Of Tools Used
111
Critique Of Tools Used
Oracle 10g dbtime model This worked very well for measuring database
utilisation. It does not, however, give any indication of how heavy
utilisation is compared to the total capacity that the database tier can provide.
Both the Oracle diagnostics and tuning packs need to be licensed in order to use the tools that accompany the time model, namely ADDM and the Automatic Workload Repository.
These extra options are not cheap. The “ASH Masters” provide a low cost alternative to the 10g
performance infrastructure.
112
Critique Of Tools Used
JProfiler (Java profiler) Provides detailed information on:-
Heap usage Thread lock monitor usage CPU usage, at method, class, package and bean level. JDBC usage. CPU profiling with drill downs all the way to JDBC calls. JNDI lookup activity.
Worked well for:- highlighting the RMI pass by copy overhead diagnosing an earlier issue whereby a ‘singleton’ object was
being created thousands of times, resulting in excessive CPU and heap usage.
113
Critique Of Tools Used JProfiler:-
Used on the grounds that:- It was extremely easy to configure Attached to the JVM of WebSphere 6.1 Other products were more suited to JSE program profiling Some profilers could not attach to the WebSphere JVM, or could,
but not that of version 6.1 Other profilers came with unwieldy proprietary IDEs that we did
not require
Had a 100% performance overhead on the application server and should therefore not be used on production environments.
kill -3 can be used to generate thread dumps, the “poor man’s profiler” according to some; this is much less intrusive than using a full blown Java profiler.
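For completeness, much the same information that a kill -3 dump prints can be obtained programmatically via the standard `ThreadMXBean` API. This is a minimal sketch of the idea, not a replacement for the signal-based dump:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {

    // Programmatic equivalent of a kill -3 thread dump: name, state and
    // stack trace of every live thread in this JVM.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" state=")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Sampling such dumps periodically and eyeballing where threads spend their time is exactly the "poor man's profiler" technique.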
114
Critique Of Tools Used
Tivoli Performance Monitoring Infrastructure (PMI) Comes with a number of summary reports, the EJB report of
which was particularly useful. If too many data points are graphed, the PMI viewer can
become painfully slow. Turning some data points on can have a major impact on
performance. One project member used the
WebSphere PerfServlet to query PMI statistics and graph them using big brother and round robin graphing.
115
Critique Of Tools Used
WebSphere performance advisor The only useful information it provided concerned
turning off the diagnostic trace service.
Relies on PMI data points being turned on in order to generate ‘Useful’ advice.
Turning some data points on can have a detrimental effect on performance, to reiterate what was mentioned on earlier slides.
Perhaps more useful when running WebSphere with the IBM JVM, as this is more tightly integrated into the performance monitoring infrastructure than the Sun JVM.
116
Conclusions
117
Bottlenecks In Distributed Object Architectures
The pass by copy finding alluded to Martin Fowler’s “First Law of Distributed Object Design”: don’t distribute your objects.
If remote interfaces are used and beans are deployed to a WebSphere application server in a single node configuration, the pass by copy overhead is still considerable.
118
Bottlenecks In Distributed Object Architectures
WebSphere application server provides a “quick win” for this situation in the form of the object request broker pass by reference setting. !!!! CAUTION !!!! This should not be used when the
invoking beans assume that the objects they pass remain unaltered by the invoked beans.
For scale out, prefer shared nothing architectures as per this article from Sun.
WebSphere Network Deployment uses a shared nothing architecture.
119
Tuning Multi Tiered Applications
When multiple layers and tiers are involved an all encompassing approach needs to be taken to tuning the software stack:- Tuning the database in isolation may not result in the
performance and scalability goals being met. Tuning the J2EE application in isolation may not result in
the performance and scalability goals being met. Refer to
"Why you can't see your real performance problems" by Cary Millsap.
120
Tuning Multi Tiered Applications
Bottlenecks need to be identified and targeted wherever they exist in the application stack.
A prime example of this is that the impact of database tuning would have been negligible had the pass by copy bottleneck not been addressed.
121
Threading
A given hardware platform can only support a finite number of threads.
There will be a “sweet spot” at which a given number of threads will give the best throughput for a given application on a given software stack.
Past a certain threshold, the time spent on context switching, thread synchronization and waiting on contention within the database, will result in diminishing returns from upping the thread count.
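The diminishing-returns curve described above can be modelled with the Universal Scalability Law, where alpha captures contention (serialisation) cost and beta captures coherency (crosstalk) cost. The coefficient values below are illustrative assumptions, not measurements from the project:

```java
public class ThreadSweetSpot {

    // Universal Scalability Law: throughput at n threads relative to one thread.
    // alpha = contention (serialisation) coefficient, beta = coherency
    // (crosstalk) coefficient; both are illustrative, not measured values.
    static double throughput(int n, double alpha, double beta) {
        return n / (1 + alpha * (n - 1) + beta * n * (n - 1));
    }

    // Pick the candidate thread count with the highest modelled throughput.
    static int bestThreadCount(int[] candidates, double alpha, double beta) {
        int best = candidates[0];
        for (int n : candidates) {
            if (throughput(n, alpha, beta) > throughput(best, alpha, beta)) {
                best = n;
            }
        }
        return best;
    }
}
```

With even modest contention and coherency coefficients the model peaks in the 16-32 thread region and falls away by 100 threads, which matches the threading finding earlier.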
122
Avoid ‘Chatty’ Designs
‘Chatty’ ??? Yes, designs that can result in excessive chatter
between certain components. This can be particularly bad when there is a network
involved. “Design and Coding Applications for Performance and
Scalability” by IBM recommends putting processing closest to the resource that requires it (section 2.5.9).
123
Avoid ‘Chatty’ Designs
A subtly different angle on this is that ‘Chatty’ designs should be avoided:- Specifically, avoid designs that and incur frequent
network round trips between the database and the application server.
The ‘Chatty’ batch process tuning finding supports this.
124
Avoid ‘Chatty’ Designs
Low CPU consumption on both the application server and database servers could be a sign of ‘Chatty’ software. i.e. excessive calls to the database, thus making
network round trips the bottleneck. Perform processing exclusively within the
application server where possible, but not when there are database features available specifically for carrying this work out.
125
Avoid ‘Chatty’ Designs
Operations that involve significant bulk data manipulation should be done in the database.
Always look to minimise network round trips by leveraging:- Stored procedures Array interfaces, both in Oracle and the JDBC API Tuning the JDBC fetch size Inline views Merge statements Subquery factoring SQL statement consolidation
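A simple cost model illustrates why batching round trips matters so much; the per-trip latency figure below is an illustrative assumption:

```java
public class RoundTripModel {

    // Rough cost model: network time = round trips x per-trip latency.
    // Deliberately ignores server-side work to isolate the network cost.
    static double networkSeconds(long rows, int batchSize, double latencyMs) {
        long trips = (rows + batchSize - 1) / batchSize; // ceiling division
        return trips * latencyMs / 1000.0;
    }
}
```

At an assumed 0.5 ms per round trip, 15,000 row-by-row calls spend 7.5 seconds on the network alone, while batches of 100 spend under a tenth of a second; this is the arithmetic behind the ‘Chatty’ findings.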
126
Avoid ‘Chatty’ Designs
‘Chatty-ness’ can be a problem within the application server also:- There are two vertical layers of domain (business) logic within
the application which are invariably called together. These could be consolidated into one layer with the benefit
of:- Code path length reduction Allowing for SQL statement consolidation
Not addressed to date as all of our performance goals have been achieved without having to carry this work out.
127
JVM Tuning The Java Virtual Machine is a platform in its own right,
therefore it deserves a certain amount of attention when it comes to tuning.
When using the Sun JVM, use the appropriate garbage collection ‘ergonomics’ for your application.
As per some of Sun’s tuning material, there can be an element of trial and error in JVM tuning.
Use verbose garbage collection output to tune the heap and minimise major garbage collections.
Look at what tuning experts have done on your platform in the past to get ideas. www.spec.org is not a bad place to look as per the example used in this material.
128
Row by Row Processing Scalability and Performance
There was great concern over the row by row access to the persistence layer. However, a bottleneck is only an issue if it prevents
performance goals from being achieved. It would be interesting to find the level of application
server throughput required to make the database become the bottleneck.
This would require more application server instances, i.e. WebSphere network deployment.
129
Is The Database The Bottleneck ? db time does not help when measuring resource
usage and time spent in the database relative to the total available capacity.
However, as we have gone from 14.5 to 2.5 in terms of average database load (db time / elapsed time), we can infer that:- An average load of 2.5 sessions suggests that the
database is not the bottleneck. There is ample spare resource capacity on the
database tier. This conforms with the ‘Carrot’ model.
130
Is The Database The Bottleneck ?
Parsing was raised as a concern; the % Non-Parse CPU in the “Automatic Workload Repository” excerpt on the next slide will dispel this.
This report was captured whilst running an atypical batch process with all the tuning changes applied and 32 threads.
The “Parse CPU to Parse Elapsd” ratio is not optimal; however, as parse CPU is only a small fraction of total CPU (% Non-Parse CPU is 94.13), this is not a major concern.
131
Is The Database The Bottleneck ?
Buffer Nowait %:             99.99    Redo NoWait %:       100.00
Buffer Hit %:                99.33    In-memory Sort %:    100.00
Library Hit %:               99.99    Soft Parse %:         99.99
Execute to Parse %:          91.14    Latch Hit %:          99.91
Parse CPU to Parse Elapsd %: 24.76    % Non-Parse CPU:      94.13
132
There Is Always A Bottleneck
In all applications there are always performance and scalability bottlenecks. A J2EE application server will usually be bound by CPU
capacity and memory access latency from a pure resource usage point of view.
A relational database will usually be constrained by physical and logical IO.
In the J2EE world where a database is used for persistence, tuning will involve moving the bottleneck between the application server and the database.
133
Useful Resources
IBM resources Designing and Coding Applications For Performance
and Scalability in WebSphere Application Server WebSphere Application Server V6 Performance and
Scalability Handbook IBM WebSphere Application Server V6.1 on the Solaris 10 Operating System
134
Useful Resources
IBM WebSphere Compute Grid resources WebSphere Extended Deployment Compute Grid Executing Batch Programs In Parallel With WebSphere
Extended Deployment Compute Grid Compute Grid Run Time Compute Grid Applications Swiss Re Use Of Compute Grid Compute Grid Discussion Forum
Links provided courtesy of Snehal Antani of IBM.
135
Useful Resources
Sun Resources Albert Leigh’s Blog Dileep Kumar's Blog Scaling Your J2EE Applications Part 1 Scaling Your J2EE Applications Part 2 Java Tuning White Paper J2SE and J2EE Performance Best Practices, Tips
And Techniques
136
Useful Resources
Oracle Resources Oracle Real World Performance Blog 360 Degree DB Programming Blog Oracle Technology Network JDBC Resources Designing Applications For Performance And Scalability - An
Oracle White Paper Best Practices For Developing Performant Applications
137
Useful Resources
Other resources Standard Performance Evaluation Corporation (SPEC)
jAppServer 2004 Results JProfiler