Performance Tuning Mainframe Applications
"It's Not So Hard"
Tony Shediak
Ampdev Pty Ltd
Compuware User Conference 2007
Why is Performance "Painful"?
• Significant lack of mainframe skills - and dropping. Performance and competence go together.
• Little focus (and budget) on mainframe training. "Management believes every mainframe programmer is automatically competent in COBOL" - it's only COBOL, right? What about VSAM, CICS, DB2, IMS etc.? If it is on your resume then surely you must know it... How many programmers can read a dump?
• Lack of performance disciplines, standards and tools usage, and more importantly their ENFORCEMENT. Most programmers will NOT care unless you MAKE THEM CARE.
• All the focus is on cost-cutting, functionality, change control, audit compliance, processes, processes and more processes. But we are not stopping the rubbish getting into Production.
The COBOL Experiment
• Set up an anonymous COBOL skills test
• Large organisation (still a client, can't name them - before 6 beers)
• 150+ COBOL programmers
• Out of 100%:
  • Lowest mark: 4%
  • Highest mark: 93%
  • Average mark: 28%
• Weakest areas - mainframe fundamentals, data types, and indexes vs subscripts
• Remediation - assign and train a COBOL practice lead to raise the skill level by running an internal half-day COBOL skills workshop and a half-day COBOL dumps workshop
The Usual Suspects
• Inefficient use of the programming language data types
• Data conversions caused by mixing data types unnecessarily
• Inefficient compiler options
• Inefficient initialisation of large structures/groups
• Over-Use of built-in functions or program language constructs that generate subroutine calls
• Inadequate VSAM buffering for the required function
• Long-running jobs processing several large databases randomly in a single step
• Inefficient Date/Time processing
• Using SQL to process tables like files, record by record
• Using overly complex or inefficient SQL
• Over-qualifying IMS DL/I calls
• Inefficient file block size for the device
• PL/I ONKEY condition - this is very expensive under Language Environment because of the condition-handling architecture LE employs. A COBOL "key not found" condition is about 7 times faster than the PL/I equivalent.
• Re-reading small files/databases/reference data over and over rather than loading them into program storage. Don't be afraid to use storage - Enterprise COBOL 3.4+ has extended the WORKING-STORAGE limit to 134MB.
• Not using the most optimal utility for the job, e.g. SORT COPY vs REPRO.
Programming Efficiently
• Understanding data types:
• Binary – integers, loop control, subscripts etc - fast.
• Packed Decimal – fractions, money
• Floating point – large range of numbers
• Subscripting tables/arrays - if a subscript is not binary then it will be converted to binary whether you like it or not.
• COBOL indexes - optimised sequential array processing. But how many programmers actually understand this?
• Use optimal compiler options as much as possible
• Utilise the “LIST” compile option and browse the assembler
• Avoid heavy use of INITIALIZE on large structures, as it initialises one elementary item at a time
Compile with "LIST"
• Utilise the "LIST" compile option and browse the assembler, looking for CVB, CVD, PACK, BALR etc.

E.g. find all occurrences of CVB:

COMMAND INPUT ===> F CVB WORD ALL
 .
 .
008119              MOVE
 002CB0  F272 A9B8 34F4   PACK  2488(8,10),1268(3,3)   TS2=0
 002CB6  960F A9BF        OI    2495(10),X'0F'         TS2=7
 002CBA  4F60 A9B8        CVB   6,2488(0,10)           TS2=0
 002CBE  4C60 53C4        MH    6,964(0,5)             PGMLIT AT +44
 002CC2  1A62             AR    6,2
 002CC4  D202 6CBD 4EEF   MVC   3261(3,6),3823(4)      LKG6K-CNTYCODE()

412 WORD 'CVB'
Built-in Functions
• Use them wisely. Check the "LIST" compile and see if a subroutine call is generated, e.g. IGZCSTG for STRING and IGZCIN1 for INSPECT.
• In a lot of cases the "do it yourself" code is simple and far more efficient:

* Concatenate AAA (up to but not including the first
* blank) with all of BBB into DDD
    STRING AAA DELIMITED SPACE      <-- subroutine call because
           BBB DELIMITED SIZE           of the search for space
           INTO DDD.

* Do it yourself - no subroutine call; this code is
* 65% more efficient than using STRING
    PERFORM VARYING I FROM 1 BY 1
      UNTIL I > LENGTH OF AAA
         OR AAA(I:1) = SPACE
    END-PERFORM.
    COMPUTE LEN-AAA = I - 1.
    MOVE AAA(1:LEN-AAA) TO DDD(1:LEN-AAA).
    MOVE BBB TO DDD(LEN-AAA + 1:LENGTH OF BBB).
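The same do-it-yourself scan, sketched in Python for comparison (the 65% figure is the slide's COBOL measurement, not anything this sketch claims): scan to the first blank yourself instead of paying for a general-purpose delimiter search.

```python
# Scan AAA for the first blank (the work STRING's DELIMITED SPACE
# delegates to a subroutine), then append all of BBB.
def concat_to_first_blank(aaa: str, bbb: str) -> str:
    i = 0
    while i < len(aaa) and aaa[i] != " ":
        i += 1                     # stop at the first blank or end of AAA
    return aaa[:i] + bbb
```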
Built-in Functions
• But sometimes the built-in function can be more efficient. INSPECT CONVERTING with both the 2nd and 3rd arguments as constants will generate a TR (Translate) machine instruction.

* Change ALL '*' to SPACE and leave everything else as is.
* This code is 90% more efficient than 'do it yourself':
    INSPECT AAA CONVERTING '*' TO SPACE.

* Do it yourself
    PERFORM VARYING I FROM 1 BY 1 UNTIL I > LENGTH OF AAA
       IF AAA(I : 1) = '*'
          MOVE SPACE TO AAA(I : 1)
       END-IF
    END-PERFORM.
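The reason the TR instruction wins is that it is table-driven: a 256-entry translate table is built once, and every byte is mapped through it in a single pass. The same idea is available in other languages; a Python sketch for comparison (illustrative only, not from the deck):

```python
# One-pass, table-driven character translation - the same idea as
# INSPECT ... CONVERTING compiling down to a single TR instruction:
# build the 256-entry translate table once, then map every byte
# through it in one pass.
TABLE = bytes(0x20 if b == ord("*") else b for b in range(256))

def convert(aaa: bytes) -> bytes:
    return aaa.translate(TABLE)    # change all '*' to space
```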
VSAM Buffering
• NSR - good for sequential access
  • Read ahead
  • One set of buffers per file
• LSR - good for random access
  • No read ahead
  • Buffers can be shared by several files
• SMB - System Managed Buffering
  • Enabled by SMS Dataclas
  • Allocates NSR or LSR buffers depending on how the file is opened. Watch out for DYNAMIC opens (direct vs sequential?)
  • Makes JCL simple
• Good Redbook - "VSAM Demystified"
VSAM Buffering – Use IAM
• IAM (Innovation Access Method) is a 3rd Party Product that intercepts (transparently) VSAM I/O and uses its own optimised Access Method with its own internal data structures.
• Default Buffering more than caters for most processing requirements with little (if any) tweaking required for most jobs.
• JCL is kept simple and for the most part unchanged
• Significant CPU savings (30%+) as installed with all default settings
IMS Considerations
• Are your programmers well skilled/trained in IMS application programming? This is the first hurdle.
• Understand your data before you do anything else
• Over-qualifying SSAs is expensive - a simple test program shows an 18% CPU reduction with minimal qualification
• Avoid single step processing with heavy random access to several large databases. It is best to split the processing into several steps (extract – sort – process) per large DB
• When processing HDAM/DEDB databases it is most efficient if the driving input is sorted in the same physical sequence as the database - RAPSORT.
• Load heavily hit reference data into the IMS MPRs.
CICS Considerations
• Use dynamic CALL/FETCH of subroutines rather than CICS LINK (as this creates an LE enclave). Enterprise PL/I has removed the FETCH restrictions.
• Use THREADSAFE Progs with DB2 – can save 5 to 15% CPU by minimising TCB switching. But do your homework – read redbook “Threadsafe Considerations for CICS”
• Only turn CICS trace on when you need it; otherwise turn it off - saves you about 3%
• Check your STROBE report for LSR buffer hits, LE Heap/Stack allocation. There is more storage available these days so why not use it – Bump up the default LE parms if you need to.
• Use VSAM Data Tables to reduce I/O
DB2 Considerations
• Understand your data before you do anything else
• Watch out for the "record-oriented" SQL approach:

    Open Cursor A
    For every A row
       Open Cursor B
       For every B row
          ...
       End
       Close Cursor B
    End
• Simple can be effective – A tablespace scan is very fast as long as you do one pass.
• Learn to use EXPLAIN – even at a basic level you will pick up easy fix issues.
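The cost of the record-oriented pattern shows up in any language, not just SQL. A Python sketch with hypothetical account/order data: the nested-loop version re-scans the inner "table" for every outer row, while the set-oriented version indexes it once - which is roughly what one well-joined cursor lets DB2 do for you.

```python
# Hypothetical data: accounts and their orders.
accounts = [(1, "A"), (2, "B"), (3, "C")]
orders = [(1, 100), (1, 150), (3, 75)]

# Record-oriented: for every account, scan all orders (O(n*m)) -
# the nested "Open Cursor B" inside the Cursor A loop.
slow = [(name, amt)
        for acct, name in accounts
        for o_acct, amt in orders if o_acct == acct]

# Set-oriented: index the orders once, then probe (O(n+m)) -
# the moral equivalent of a single joined cursor.
by_acct = {}
for o_acct, amt in orders:
    by_acct.setdefault(o_acct, []).append(amt)
fast = [(name, amt)
        for acct, name in accounts
        for amt in by_acct.get(acct, [])]

print(slow == fast)  # prints True - same answer, one pass over each table
```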
Date/Time Processing
• Use COBOL/PL/I functions instead of DB2
• DB2 date/time arithmetic is expensive - so use it wisely
• LE date/time arithmetic is better than DB2 but still expensive
• In-house written functions for date/time validation and arithmetic are by far the best performers
• When Developing in-house routines:
• Utilise internal tables as much as possible to store constant information rather than derive it each time. For example, Leap year indicator can be stored rather than calculating each time – storage is not so much an issue these days
• Make your routine “Reducible” – i.e. if input parameters are exactly the same as the last invocation then return the last saved output parameters
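The "reducible" idea in the last bullet is a one-slot cache: remember the previous input and its answer. A Python sketch (the date-validation routine is a hypothetical example; a real mainframe version would keep the saved parameters in COBOL/PL/I static storage):

```python
import datetime

# One-slot cache for a "reducible" routine: if the input equals the
# previous invocation's input, return the saved answer unchanged.
_last_input = None
_last_result = None

def is_valid_date(yyyymmdd: str) -> bool:
    global _last_input, _last_result
    if yyyymmdd == _last_input:          # same as last call: no recompute
        return _last_result
    try:
        datetime.datetime.strptime(yyyymmdd, "%Y%m%d")
        result = True
    except ValueError:
        result = False
    _last_input, _last_result = yyyymmdd, result
    return result
```

Callers that validate the same date many times in a row (common when input files are sorted or clustered by date) pay the full cost only once per distinct value.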
Example 1 – Compiler Options
Description: Subroutine performing name/address abbreviation
Language: COBOL
Performance problems identified: Using PIC 999 display data for arithmetic operations and array subscripts. Compiled with SSRANGE and TRUNC(BIN) - pre COBOL V2.2
How identified: Compile listing and STROBE
Tuning applied: Change all display data to BINARY, as all are integers. Compile with NOSSRANGE and TRUNC(OPT)
Effort required: 4 hours
Performance improvement: 70% CPU reduction
Example 2 – Lots of Logical I/O
Description: Program performing account range validation via a VSAM (IAM) KSDS by checking the range on each record against all subsequent records
Language: PL/I
Performance problems identified: Program issuing over 300M logical I/Os even though the file contains only 30,000 records
How identified: IAM I/O report
Tuning applied: Load the file into an array allocated above the 16M line to eliminate further I/Os; rework the checking algorithm to eliminate multiple passes through each entry
Effort required: 3 days
Performance improvement: 99.97% CPU reduction. CPU seconds dropped from 2000 to 0.5
Example 3 – High Hit Static Reference Data
Description: Expensive CICS transaction used to create a web drop-down list
Language: COBOL
Performance problems identified: DB2 table containing relatively static reference data has a very high hit rate
How identified: STROBE report
Tuning applied: Load the data into the CICS region using GETMAIN SHARED and refresh it every hour. Use ENQ/DEQ to serialise updates to the storage, allowing the program to run as THREADSAFE
Effort required: 2 days
Performance improvement: 99% CPU reduction
Example 4 – Record Oriented SQL
Description: DB2 program performing rate reporting
Language: PL/I
Performance problems identified: SQL used in a record-oriented approach, e.g.

    Open Cursor A
    For every A row
       Open Cursor B
       For every B row
          Open Cursor C
          ...etc
       End
       Close Cursor B
    End
    Close Cursor A

How identified: STROBE report and eyeballing the source
Tuning applied: Rewrite the SQL to utilise table JOINs (specifically a LEFT OUTER JOIN in this case) and create and process a single cursor, so DB2 does the work once
Effort required: 2 days
Performance improvement: Elapsed time dropped from 12 hours to 3 minutes
Example 5 – VSAM Buffering and SMB
Description: Program performing a specialised data extract across VSAM files
Language: COBOL
Performance problems identified: Using default buffering (2 data and 1 index); using NSR but most of the processing is random
How identified: STROBE report
Tuning applied: Change the file open mode from DYNAMIC to RANDOM. Enable SMB (System Managed Buffering) by adding a dataclas to the DEFINE CLUSTER and reorganising to allow the SMB dataclas to apply. Because the file mode is random, SMB will use LSR buffering. Note that alternatively we could have left the file open mode as DYNAMIC and added AMP='ACCBIAS=DO' to the JCL to force LSR
Effort required: 1 hour
Performance improvement: 86% CPU reduction; 95% elapsed time reduction
Example 6 – Compiler Generated Subroutine Calls
Description: Subroutine performing name and address compression into a cross-reference key
Language: COBOL
Performance problems identified: Using complex INSPECT and STRING extensively
How identified: STROBE report and eyeballing the source code
Tuning applied: Total rewrite of the code, eliminating ALL STRING and INSPECT functions and replacing them with simple iterative search loops and sub-string manipulations
Effort required: 6 days
Performance improvement: 75% CPU reduction
Example 7 – Data Conversions
Description: Program performing a specialised data extract
Language: PL/I
Performance problems identified: Many compiler-generated subroutine calls due to manipulation of UNALIGNED bit strings in a structure, plus various data conversions caused by mixing data types
How identified: STROBE report and compile listing
Tuning applied: Changed all unaligned bit strings within structures to aligned; eliminated all other data conversions requiring subroutine calls. 70 compiler-generated subroutine calls were eliminated from the object code
Effort required: 1 day
Performance improvement: 65% CPU reduction
Example 8 – Bad SQL - 1
Description: DB2 program performing an online query
Language: COBOL
Performance problems identified: SQL not utilising the table index in the join because of mismatched data types (DEC(15) vs DEC(4)), hence causing a tablespace scan:

    SELECT P.PROD_CD
    . .
    FROM (SELECT A.PARAMETER_NUM_WHLE AS PROD_CD     <-- DEC(15)
          . . ) AS P
    INNER JOIN P.SD700T00 AS S
       ON S.SD700_PRODUCT_CODE = P.PROD_CD           <-- DEC(4)

How identified: STROBE report and EXPLAIN, or iSTROBE
Example 8 – Bad SQL - 2
Tuning applied: Use the CAST function to convert to the correct data type:

    SELECT P.PROD_CD
    . .
    FROM (SELECT CAST(A.PARAMETER_NUM_WHLE AS DECIMAL(4)) AS PROD_CD
          . . ) AS P
    INNER JOIN P.SD700T00 AS S
       ON S.SD700_PRODUCT_CODE = P.PROD_CD

Now DB2 uses the index on the SD700T00 table; the tablespace scan is gone
Effort required: 1 hour
Performance improvement: 50% CPU reduction
Example 9 – Bad SQL - 1
Description: DB2 program performing a batch query
Language: COBOL
Performance problems identified: SQL performing an entire index scan and a minor sort unnecessarily:

    SELECT DISTINCT 1       <-- sort unique (DISTINCT) to eliminate multiple rows
    FROM LM135T00
    WHERE LM135_RACFID = :LM135-RACFID
       OR SUBSTR(LM135_RACFID,2,7) = :LM135-RACFID

Index scan with NO matching columns. But it is silly to use SUBSTR(..,2,7) because the data tells us the first character can only be a '*' anyway. E.g. we are looking for 'X123456' OR '*X123456'
How identified: STROBE report and EXPLAIN, or iSTROBE
Example 9 – Bad SQL - 2
Tuning applied: Understand the data, and hence eliminate the index scan by searching for only the valid combinations:

    ASTER-LM135-RACFID = '*' || LM135-RACFID

    SELECT 1
    FROM LM135T00
    WHERE LM135_RACFID IN (:LM135-RACFID, :ASTER-LM135-RACFID)
    FETCH FIRST 1 ROW ONLY

This removes the problem of multiple rows being returned, hence we don't need the DISTINCT. E.g. we are still looking for 'X123456' OR '*X123456'
Effort required: 2 hours
Performance improvement: 98% CPU reduction
Example 10 – Initialising a Large Table
Description: Program performing a data extract
Language: COBOL
Performance problems identified: High-volume initialisation of a large table
How identified: STROBE report and compile listing
Tuning applied: Removed the initialisation as it was not really required; the program was already keeping track of the number of elements via a counter anyway
Effort required: 2 hours
Performance improvement: 30% CPU reduction
The AAPT Story – From 650 to 400 MIPS in 15 Months
• 35 initiatives implemented
• Mostly Bad SQL or COBOL or both. Lots of code changed
• Some quick wins - package REBINDs, modifying or adding an index, using IEBGENER (SORT COPY) instead of REPRO, changing job schedules to run less often, and fixing a web front-end bug that initiated a CICS transaction too many times
• CICS Threadsafe implemented with 15% CPU savings across Region. Programs targeted by heavy SQL execution.
• Upgraded DASD sub-system significantly helped I/O and gave more room to reduce MIPS
• Test Smart vs Test Hard, e.g. if only SQL was changed then only SQL was tested
• Strong Senior management support was given
Make Performance Part of the Culture
• Training, training and more training (can be internal)
• Establish mentoring programs
• Implement practice leadership for your key technical areas, e.g. COBOL, DB2, CICS, VSAM/IAM, IMS etc. These people should:
  • Own and set standards - and have the authority to enforce them
  • Give regular technical updates/presentations
  • Consult to the application teams
  • Investigate new features/versions and their benefits
  • Be the ultimate authority in that area and have full management support - otherwise it is pointless
Make Performance Part of the Culture
• Establish an internal performance team - ideally the practice leads would be in this team. Responsibilities:
  • Regularly monitor mainframe health using STROBE, SMF reports etc., and check out new software and OS features
  • Work with the application teams to fix issues
  • Constantly look for performance opportunities
• Implement and enforce performance management as part of the software development/maintenance methodology and processes, e.g. a STROBE report MUST be provided for new/modified programs as a deliverable before approval to Production. Criteria can be set to exclude low-volume, inexpensive transactions and trivially quick, low-frequency batch jobs. BUT remember: the Performance Team is watching.