Performance Tuning Mainframe Applications
"It's Not So Hard"
Tony Shediak
Ampdev Pty Ltd
Compuware User Conference 2007
Why is Performance "Painful"?
• Significant lack of mainframe skills - and dropping. Performance and competence go together.
• Little focus (and budget) on mainframe training. "Management believes every mainframe programmer is automatically competent in COBOL" - it's only COBOL, right? What about VSAM, CICS, DB2, IMS etc.? If it is on your resume then surely you must know it... How many programmers can read a dump?
• Lack of performance disciplines, standards and tools usage, and more importantly their ENFORCEMENT. Most programmers will NOT care unless you MAKE THEM CARE.
• All the focus is on cost-cutting, functionality, change control, audit compliance, processes, processes and more processes. But we are not stopping the rubbish getting into Production.
The COBOL Experiment
• Set up an anonymous COBOL skills test
• Large organisation (still a client, can't name them - before 6 beers)
• 150+ COBOL programmers
• Out of 100%:
  • Lowest mark: 4%
  • Highest mark: 93%
  • Average mark: 28%
• Weakest areas - mainframe fundamentals, data types, and indexes vs subscripts
• Remediation - assign and train a COBOL practice lead to raise the skill level by running an internal half-day COBOL skills workshop and a half-day COBOL dumps workshop
The Usual Suspects
• Inefficient use of the programming language data types
• Data conversions caused by mixing data types unnecessarily
• Inefficient compiler options
• Inefficient initialisation of large structures/groups
• Over-Use of built-in functions or program language constructs that generate subroutine calls
• Inadequate VSAM buffering for the required function
• Long-running jobs processing several large databases randomly in a single step
• Inefficient Date/Time processing
• Using SQL to process tables like files, record by record
• Using overly complex or inefficient SQL
• Over-qualifying IMS DL/I calls
• Inefficient file block size for the device
• PL/I ONKEY condition - this is very expensive under Language Environment because of the condition-handling architecture LE employs. A COBOL "key not found" condition is about 7 times faster than the PL/I equivalent.
• Re-reading small files/databases/reference data over and over rather than loading them into program storage. Don't be afraid to use storage - Enterprise COBOL 3.4+ has extended the WORKING-STORAGE limit to 134MB.
• Not using the most optimal utility for the job, e.g. SORT COPY vs REPRO.
Programming Efficiently
• Understanding data types:
• Binary – integers, loop control, subscripts etc - fast.
• Packed Decimal – fractions, money
• Floating point – large range of numbers
• Subscripting tables/arrays - if a subscript is not binary then it will be converted to binary whether you like it or not.
• COBOL indexes - optimised sequential array processing. But how many programmers actually understand this?
• Use optimal compiler options as much as possible
• Utilise the “LIST” compile option and browse the assembler
• Avoid heavy use of INITIALIZE on large structures, as it initialises one elementary item at a time
Compile with "LIST"
• Utilise the "LIST" compile option and browse the assembler, looking for CVB, CVD, PACK, BALR etc.

E.g. find all occurrences of CVB:

COMMAND INPUT ===> F CVB WORD ALL
 .
 .
008119              MOVE
 002CB0  F272 A9B8 34F4   PACK  2488(8,10),1268(3,3)   TS2=0
 002CB6  960F A9BF        OI    2495(10),X'0F'         TS2=7
 002CBA  4F60 A9B8        CVB   6,2488(0,10)           TS2=0
 002CBE  4C60 53C4        MH    6,964(0,5)             PGMLIT AT +44
 002CC2  1A62             AR    6,2
 002CC4  D202 6CBD 4EEF   MVC   3261(3,6),3823(4)      LKG6K-CNTYCODE()

412 WORD 'CVB'
Built-in Functions
• Use them wisely. Check the "LIST" compile and see if a subroutine call is generated, e.g. IGZCSTG for STRING and IGZCIN1 for INSPECT.
• In a lot of cases the "do it yourself" code is simple and far more efficient:

* Concatenate AAA (up to but not including the first
* blank) with all of BBB into DDD
    STRING AAA DELIMITED SPACE      <-- subroutine call because
           BBB DELIMITED SIZE           of the search for space
           INTO DDD.

* Do it yourself - no subroutine call; this code is
* 65% more efficient than using STRING
    PERFORM VARYING I FROM 1 BY 1
      UNTIL I > LENGTH OF AAA
         OR AAA(I:1) = SPACE
    END-PERFORM.
    COMPUTE LEN-AAA = I - 1.
    MOVE AAA(1:LEN-AAA) TO DDD(1:LEN-AAA).
    MOVE BBB TO DDD(LEN-AAA + 1:LENGTH OF BBB).
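The same do-it-yourself scan, sketched in Python for comparison (the 65% figure is the slide's COBOL measurement, not anything this sketch claims): scan to the first blank yourself instead of paying for a general-purpose delimiter search.

```python
# Scan AAA for the first blank (the work STRING's DELIMITED SPACE
# delegates to a subroutine), then append all of BBB.
def concat_to_first_blank(aaa: str, bbb: str) -> str:
    i = 0
    while i < len(aaa) and aaa[i] != " ":
        i += 1                     # stop at the first blank or end of AAA
    return aaa[:i] + bbb
```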
Built-in Functions
• But sometimes the built-in function can be more efficient. INSPECT CONVERTING with both the 2nd and 3rd arguments as constants will generate a TR (Translate) machine instruction.

* Change ALL '*' to SPACE and leave everything else as is.
* This code is 90% more efficient than 'do it yourself':
    INSPECT AAA CONVERTING '*' TO SPACE.

* Do it yourself
    PERFORM VARYING I FROM 1 BY 1 UNTIL I > LENGTH OF AAA
       IF AAA(I : 1) = '*'
          MOVE SPACE TO AAA(I : 1)
       END-IF
    END-PERFORM.
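The reason the TR instruction wins is that it is table-driven: a 256-entry translate table is built once, and every byte is mapped through it in a single pass. The same idea is available in other languages; a Python sketch for comparison (illustrative only, not from the deck):

```python
# One-pass, table-driven character translation - the same idea as
# INSPECT ... CONVERTING compiling down to a single TR instruction:
# build the 256-entry translate table once, then map every byte
# through it in one pass.
TABLE = bytes(0x20 if b == ord("*") else b for b in range(256))

def convert(aaa: bytes) -> bytes:
    return aaa.translate(TABLE)    # change all '*' to space
```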
VSAM Buffering
• NSR - good for sequential access
  • Read ahead
  • One set of buffers per file
• LSR - good for random access
  • No read ahead
  • Buffers can be shared by several files
• SMB - System Managed Buffering
  • Enabled by SMS Dataclas
  • Allocates NSR or LSR buffers depending on how the file is opened. Watch out for DYNAMIC opens (direct vs sequential?)
  • Makes JCL simple
• Good Redbook - "VSAM Demystified"
VSAM Buffering – Use IAM
• IAM (Innovation Access Method) is a 3rd Party Product that intercepts (transparently) VSAM I/O and uses its own optimised Access Method with its own internal data structures.
• Default Buffering more than caters for most processing requirements with little (if any) tweaking required for most jobs.
• JCL is kept simple and for the most part unchanged
• Significant CPU savings (30%+) as installed with all default settings
IMS Considerations
• Are your programmers well skilled/trained in IMS application programming? This is the first hurdle.
• Understand your data before you do anything else
• Over-qualifying SSAs is expensive - a simple test program shows an 18% CPU reduction with minimal qualification
• Avoid single step processing with heavy random access to several large databases. It is best to split the processing into several steps (extract – sort – process) per large DB
• When processing HDAM/DEDB databases it is most efficient if the driving input is sorted in the same physical sequence as the database - RAPSORT.
• Load heavily hit reference data into the IMS MPRs.
CICS Considerations
• Use dynamic CALL/FETCH of subroutines rather than CICS LINK (as this creates an LE enclave). Enterprise PL/I has removed the FETCH restrictions.
• Use THREADSAFE Progs with DB2 – can save 5 to 15% CPU by minimising TCB switching. But do your homework – read redbook “Threadsafe Considerations for CICS”
• Only turn CICS trace on when you need it; otherwise turn it off - saves you about 3%
• Check your STROBE report for LSR buffer hits, LE Heap/Stack allocation. There is more storage available these days so why not use it – Bump up the default LE parms if you need to.
• Use VSAM Data Tables to reduce I/O
DB2 Considerations
• Understand your data before you do anything else
• Watch out for the "record-oriented" SQL approach:

    Open Cursor A
    For every A row
       Open Cursor B
       For every B row
          ...
       End
       Close Cursor B
    End
• Simple can be effective – A tablespace scan is very fast as long as you do one pass.
• Learn to use EXPLAIN – even at a basic level you will pick up easy fix issues.
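The cost of the record-oriented pattern shows up in any language, not just SQL. A Python sketch with hypothetical account/order data: the nested-loop version re-scans the inner "table" for every outer row, while the set-oriented version indexes it once - which is roughly what one well-joined cursor lets DB2 do for you.

```python
# Hypothetical data: accounts and their orders.
accounts = [(1, "A"), (2, "B"), (3, "C")]
orders = [(1, 100), (1, 150), (3, 75)]

# Record-oriented: for every account, scan all orders (O(n*m)) -
# the nested "Open Cursor B" inside the Cursor A loop.
slow = [(name, amt)
        for acct, name in accounts
        for o_acct, amt in orders if o_acct == acct]

# Set-oriented: index the orders once, then probe (O(n+m)) -
# the moral equivalent of a single joined cursor.
by_acct = {}
for o_acct, amt in orders:
    by_acct.setdefault(o_acct, []).append(amt)
fast = [(name, amt)
        for acct, name in accounts
        for amt in by_acct.get(acct, [])]

print(slow == fast)  # prints True - same answer, one pass over each table
```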
Date/Time Processing
• Use COBOL/PL/I functions instead of DB2
• DB2 date/time arithmetic is expensive - so use it wisely
• LE date/time arithmetic is better than DB2 but still expensive
• In-house written functions for date/time validation and arithmetic are by far the best performers
• When Developing in-house routines:
• Utilise internal tables as much as possible to store constant information rather than derive it each time. For example, Leap year indicator can be stored rather than calculating each time – storage is not so much an issue these days
• Make your routine “Reducible” – i.e. if input parameters are exactly the same as the last invocation then return the last saved output parameters
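The "reducible" idea in the last bullet is a one-slot cache: remember the previous input and its answer. A Python sketch (the date-validation routine is a hypothetical example; a real mainframe version would keep the saved parameters in COBOL/PL/I static storage):

```python
import datetime

# One-slot cache for a "reducible" routine: if the input equals the
# previous invocation's input, return the saved answer unchanged.
_last_input = None
_last_result = None

def is_valid_date(yyyymmdd: str) -> bool:
    global _last_input, _last_result
    if yyyymmdd == _last_input:          # same as last call: no recompute
        return _last_result
    try:
        datetime.datetime.strptime(yyyymmdd, "%Y%m%d")
        result = True
    except ValueError:
        result = False
    _last_input, _last_result = yyyymmdd, result
    return result
```

Callers that validate the same date many times in a row (common when input files are sorted or clustered by date) pay the full cost only once per distinct value.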
Example 1 – Compiler Options
Description: Subroutine performing name/address abbreviation
Language: COBOL
Performance problems identified: Using PIC 999 display data for arithmetic operations and array subscripts. Compiled with SSRANGE and TRUNC(BIN) - pre COBOL V2.2
How identified: Compile listing and STROBE
Tuning applied: Change all display data to BINARY, as all are integers. Compile with NOSSRANGE and TRUNC(OPT)
Effort required: 4 hours
Performance improvement: 70% CPU reduction
Example 2 – Lots of Logical I/O
Description: Program performing account range validation via a VSAM (IAM) KSDS by checking the range on each record against all subsequent records
Language: PL/I
Performance problems identified: Program issuing over 300M logical I/Os even though the file contains only 30,000 records
How identified: IAM I/O report
Tuning applied: Load the file into an array allocated above the 16M line to eliminate further I/Os; rework the checking algorithm to eliminate multiple passes through each entry
Effort required: 3 days
Performance improvement: 99.97% CPU reduction. CPU seconds dropped from 2000 to 0.5
Example 3 – High Hit Static Reference Data
Description: Expensive CICS transaction used to create a web drop-down list
Language: COBOL
Performance problems identified: DB2 table containing relatively static reference data has a very high hit rate
How identified: STROBE report
Tuning applied: Load the data into the CICS region using GETMAIN SHARED and refresh it every hour. Use ENQ/DEQ to serialise updates to the storage, allowing the program to run as THREADSAFE
Effort required: 2 days
Performance improvement: 99% CPU reduction
Example 4 – Record Oriented SQL
Description: DB2 program performing rate reporting
Language: PL/I
Performance problems identified: SQL used in a record-oriented approach, e.g.

    Open Cursor A
    For every A row
       Open Cursor B
       For every B row
          Open Cursor C
          ...etc
       End
       Close Cursor B
    End
    Close Cursor A

How identified: STROBE report and eyeballing the source
Tuning applied: Rewrite the SQL to utilise table JOINs (specifically a LEFT OUTER JOIN in this case) and create and process a single cursor, so DB2 does the work once
Effort required: 2 days
Performance improvement: Elapsed time dropped from 12 hours to 3 minutes
Example 5 – VSAM Buffering and SMB
Description: Program performing a specialised data extract across VSAM files
Language: COBOL
Performance problems identified: Using default buffering (2 data and 1 index); using NSR but most of the processing is random
How identified: STROBE report
Tuning applied: Change the file open mode from DYNAMIC to RANDOM. Enable SMB (System Managed Buffering) by adding a dataclas to the DEFINE CLUSTER and reorganising to allow the SMB dataclas to apply. Because the file mode is random, SMB will use LSR buffering. Note that alternatively we could have left the file open mode as DYNAMIC and added AMP='ACCBIAS=DO' to the JCL to force LSR
Effort required: 1 hour
Performance improvement: 86% CPU reduction; 95% elapsed time reduction
Example 6 – Compiler Generated Subroutine Calls
Description: Subroutine performing name and address compression into a cross-reference key
Language: COBOL
Performance problems identified: Using complex INSPECT and STRING extensively
How identified: STROBE report and eyeballing the source code
Tuning applied: Total rewrite of the code, eliminating ALL STRING and INSPECT functions and replacing them with simple iterative search loops and sub-string manipulations
Effort required: 6 days
Performance improvement: 75% CPU reduction
Example 7 – Data Conversions
Description: Program performing a specialised data extract
Language: PL/I
Performance problems identified: Many compiler-generated subroutine calls due to manipulation of UNALIGNED bit strings in a structure, plus various data conversions caused by mixing data types
How identified: STROBE report and compile listing
Tuning applied: Changed all unaligned bit strings within structures to aligned; eliminated all other data conversions requiring subroutine calls. 70 compiler-generated subroutine calls were eliminated from the object code
Effort required: 1 day
Performance improvement: 65% CPU reduction
Example 8 – Bad SQL - 1
Description: DB2 program performing an online query
Language: COBOL
Performance problems identified: SQL not utilising the table index in the join because of mismatched data types (DEC(15) vs DEC(4)), hence causing a tablespace scan:

    SELECT P.PROD_CD
    . .
    FROM (SELECT A.PARAMETER_NUM_WHLE AS PROD_CD     <-- DEC(15)
          . . ) AS P
    INNER JOIN P.SD700T00 AS S
       ON S.SD700_PRODUCT_CODE = P.PROD_CD           <-- DEC(4)

How identified: STROBE report and EXPLAIN, or iSTROBE
Example 8 – Bad SQL - 2
Tuning applied: Use the CAST function to convert to the correct data type:

    SELECT P.PROD_CD
    . .
    FROM (SELECT CAST(A.PARAMETER_NUM_WHLE AS DECIMAL(4)) AS PROD_CD
          . . ) AS P
    INNER JOIN P.SD700T00 AS S
       ON S.SD700_PRODUCT_CODE = P.PROD_CD

Now DB2 uses the index on the SD700T00 table; the tablespace scan is gone
Effort required: 1 hour
Performance improvement: 50% CPU reduction
Example 9 – Bad SQL - 1
Description: DB2 program performing a batch query
Language: COBOL
Performance problems identified: SQL performing an entire index scan and a minor sort unnecessarily:

    SELECT DISTINCT 1       <-- sort unique (DISTINCT) to eliminate multiple rows
    FROM LM135T00
    WHERE LM135_RACFID = :LM135-RACFID
       OR SUBSTR(LM135_RACFID,2,7) = :LM135-RACFID

Index scan with NO matching columns. But it is silly to use SUBSTR(..,2,7) because the data tells us the first character can only be a '*' anyway. E.g. we are looking for 'X123456' OR '*X123456'
How identified: STROBE report and EXPLAIN, or iSTROBE
Example 9 – Bad SQL - 2
Tuning applied: Understand the data, and hence eliminate the index scan by searching for only the valid combinations:

    ASTER-LM135-RACFID = '*' || LM135-RACFID

    SELECT 1
    FROM LM135T00
    WHERE LM135_RACFID IN (:LM135-RACFID, :ASTER-LM135-RACFID)
    FETCH FIRST 1 ROW ONLY

This removes the problem of multiple rows being returned, hence we don't need the DISTINCT. E.g. we are still looking for 'X123456' OR '*X123456'
Effort required: 2 hours
Performance improvement: 98% CPU reduction
Example 10 – Initialising a Large Table
Description: Program performing a data extract
Language: COBOL
Performance problems identified: High-volume initialisation of a large table
How identified: STROBE report and compile listing
Tuning applied: Removed the initialisation as it was not really required; the program was already keeping track of the number of elements via a counter anyway
Effort required: 2 hours
Performance improvement: 30% CPU reduction
The AAPT Story – From 650 to 400 MIPS in 15 Months
• 35 initiatives implemented
• Mostly Bad SQL or COBOL or both. Lots of code changed
• Some quick wins - package REBINDs, modifying or adding an index, using IEBGENER (SORT COPY) instead of REPRO, changing job schedules to run less often, and fixing a web front-end bug that initiated a CICS transaction too many times
• CICS Threadsafe implemented with 15% CPU savings across Region. Programs targeted by heavy SQL execution.
• Upgraded DASD sub-system significantly helped I/O and gave more room to reduce MIPS
• Test Smart vs Test Hard, e.g. if only SQL was changed then only SQL was tested
• Strong Senior management support was given
Make Performance Part of the Culture
• Training, training and more training (can be internal)
• Establish mentoring programs
• Implement practice leadership for your key technical areas, e.g. COBOL, DB2, CICS, VSAM/IAM, IMS etc. These people should:
  • Own and set standards - and have the authority to enforce them
  • Give regular technical updates/presentations
  • Consult to the application teams
  • Investigate new features/versions and their benefits
  • Be the ultimate authority in that area and have full management support - otherwise it is pointless
Make Performance Part of the Culture
• Establish an internal performance team - ideally the practice leads would be in this team. Responsibilities:
  • Regularly monitor mainframe health using STROBE, SMF reports etc., and check out new software and OS features
  • Work with the application teams to fix issues
  • Constantly look for performance opportunities
• Implement and enforce performance management as part of the software development/maintenance methodology and processes, e.g. a STROBE report MUST be provided for new/modified programs as a deliverable before approval to Production. Criteria can be set to exclude low-volume, inexpensive transactions and trivially quick, low-frequency batch jobs. BUT remember: the Performance Team is watching.