JCL Production Support_ABENDS

Production Support/Application Testing/Software Defect and IBM Mainframe COBOL ABEND Research

When an application ABEND (ABnormal END-of-job) occurs, z/OS stops executing your program, closes files and buffers, and generates a single high-level message in the form of a System Completion Code (Sxxx). The System Completion Code is usually written to an output listing file through your //SYSOUT DD SYSOUT=* JCL entry. This completion code indicates why the system has decided to stop executing your application. It is related to, but often only loosely related to, what is really wrong with your application. Because of this, the System Completion Code represents only the starting point for your analysis of the problem.

Other Debugging Assistance

Along with the System Completion Code, use IBM's Problem Determination tools (PD Tools). These generate a listing (SYSOUT) that describes:
- The System Completion Code (and often a short text description of what it designates)
- A short explanation of the cause of the ABEND
- The COBOL instruction (statement) or line number that contained the invalid operation causing z/OS to halt execution
- A "core dump" (a hexadecimal printout) of the internal machine storage and registers relevant to the areas of your program surrounding the COBOL instruction that caused z/OS to halt execution

This information is useful to begin understanding and researching the problem, but it is usually far from sufficient to solve the problem, which could be any combination of:
- Incomplete, incorrect or invalid COBOL procedural logic
- A typo, such as a misplaced period or an incorrectly specified field
- Incorrect or invalid input data
- Batch jobs run out of sequence
- Input files missing or corrupted (hardware errors)
- Errors related to JCL problems
- etc.

There are as many different ways to analyze and research COBOL ABENDs as there are individual approaches to writing procedural logic. However, if you've never done this type of "logic-detective" work on a large scale, and to help you get started with
this complex and crucial process, consider the following five-step approach:
- Preparation
- Research
- Hypothesis
- Solution
- Resolution

As a final note before beginning, understand that there are really two distinct phases of Production Support:

1. Data Center on-call ABEND resolution - wherein a technician receives notification that a job or transaction has ABENDed and must be "fixed" within an extremely short timeframe (usually minutes to hours). In this case, the technician's main concern is to "patch" the problem: get the system back online, or get the batch jobstream back into production ("Patch-It").
2. Next-day problem resolution - wherein technician(s) actually track down and solve the problem that caused the ABEND ("Fix-It").

The steps below represent a process for "Fix-It"; they go well beyond the scope of the emergency measures used to "patch" the problem during an on-call emergency.

1. Preparation - Collect all necessary background information (WHAT happened and WHERE the ABEND occurred)
- Print out the ABEND information.
- Collect all supporting ABEND output (SYSOUT) from the job (ABEND-AID, DISPLAY statements, etc.).
- Obtain copies of the run-time:
  - JCL
  - Program source, and all copybooks (or the expanded source listing)
- From the JCL, learn the dataset names of the input and output files accessed by the program (which you may need to browse as part of your research).
- Learn the nature of the batch job from system documentation, or from an application business expert (at least at the level of module flow and file access).

2. Research - Construct a mental map (understanding) of the program's execution (HOW the ABEND occurred)
Making the correct WHY determination usually requires a combination of "Static" and "Dynamic" analysis, which are complementary research and investigative approaches.

Note: These steps need not be followed in this order. Rather, in time you will develop an "intuition" as to which kind(s) of analysis will be most likely to provide the information you need to solve your problem in a production support
role.

Static Analysis:

Structural Visualization: the generation of an accurate mental map, understanding or mental image of the program's control structure, or logic architecture. Using the starting point represented by the ABEND condition (the statement that caused z/OS to halt execution) and electronic-assisted tools (such as IBM's Rational Asset Analyzer or Rational Developer for System z), build an accurate understanding of the code invocation at:
- The module/file level (System View)
- The paragraph/section level (Hierarchy chart)
- The statement level (Flow chart), if necessary (i.e. if the code is dense or complex)

Structural Visualization can be done "top-down", by asking open-ended questions, such as learning how a particular routine "hangs together logically"; or it can be used "bottom-up", by asking specific closed-ended questions about a program, such as "How does this particular paragraph get executed?" or "How did this module get invoked?"

Data Flow Analysis: a combination of control structure analysis and data item analysis, which seeks to determine the usage of particular fields throughout a program. Data flow analysis is used to determine (from a given instance of a data item) where the next occurrence(s) of that item exist in your program, and how the data item is used: as a receiving field in a MOVE or mathematical operation, as the sending field in a MOVE statement, or as part of a logic branch (IF, PERFORM UNTIL/VARYING, etc.).

Data Impact Analysis: an expansion of Data Flow Analysis that traces the movement of data from field to field throughout a program, or throughout an entire application, including I/O (screens and files). Using Data Impact Analysis, you can identify all fields that might have had an impact on the contents of a field (before the ABEND occurred). Just as importantly, you can learn the effect that changing this field will have on the behavior of the application.
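As a rough illustration of Data Flow and Data Impact Analysis, the Python sketch below treats each MOVE or COMPUTE as an edge from the sending field(s) to the receiving field(s), then walks those edges: backward to list every field that could have influenced a given field, and forward to list every field that a change could affect. It is only a sketch under stated assumptions: it recognizes just two simplified statement forms (MOVE a TO b ... and COMPUTE x = expression), it ignores control flow entirely, and the names build_flow and reachable are hypothetical. Tools such as Rational Asset Analyzer perform this analysis over full COBOL syntax, copybooks and I/O.

```python
import re
from collections import defaultdict, deque

# Assumption for this sketch: only two simplified statement forms are
# recognized, "MOVE a TO b [c ...]" and "COMPUTE x = expression".
MOVE_RE = re.compile(r"^\s*MOVE\s+([A-Z][\w-]*)\s+TO\s+(.+?)\s*\.?\s*$")
COMPUTE_RE = re.compile(r"^\s*COMPUTE\s+([A-Z][\w-]*)\s*=\s*(.+?)\s*\.?\s*$")
NAME_RE = re.compile(r"[A-Z][\w-]*")
# Figurative constants are not fields, so they never start a data flow.
FIGURATIVE = {"ZERO", "ZEROS", "ZEROES", "SPACE", "SPACES",
              "LOW-VALUE", "LOW-VALUES", "HIGH-VALUE", "HIGH-VALUES"}


def build_flow(statements):
    """Return (senders_of, receivers_of) maps describing field-to-field data flow."""
    senders_of = defaultdict(set)    # receiving field -> fields that fed it
    receivers_of = defaultdict(set)  # sending field -> fields it flowed into
    for stmt in statements:
        stmt = stmt.upper()
        if m := MOVE_RE.match(stmt):
            sources = {m.group(1)} - FIGURATIVE
            targets = NAME_RE.findall(m.group(2))
        elif m := COMPUTE_RE.match(stmt):
            sources = set(NAME_RE.findall(m.group(2))) - FIGURATIVE
            targets = [m.group(1)]
        else:
            continue  # IF, PERFORM, READ, WRITE, etc. are ignored in this sketch
        for target in targets:
            for source in sources:
                senders_of[target].add(source)
                receivers_of[source].add(target)
    return senders_of, receivers_of


def reachable(start, edges):
    """Breadth-first walk: every field reachable from start along edges."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a field is not reported as impacting itself
    return seen


program = [
    "MOVE IN-AMOUNT TO WS-AMOUNT.",
    "COMPUTE WS-TOTAL = WS-AMOUNT * WS-RATE.",
    "MOVE WS-TOTAL TO OUT-TOTAL RPT-TOTAL.",
    "MOVE ZERO TO WS-COUNT.",
]
senders_of, receivers_of = build_flow(program)
# Which fields could have influenced OUT-TOTAL?
print(sorted(reachable("OUT-TOTAL", senders_of)))
# -> ['IN-AMOUNT', 'WS-AMOUNT', 'WS-RATE', 'WS-TOTAL']
# Which fields would a change to IN-AMOUNT affect?
print(sorted(reachable("IN-AMOUNT", receivers_of)))
# -> ['OUT-TOTAL', 'RPT-TOTAL', 'WS-AMOUNT', 'WS-TOTAL']
```

Because the walk ignores control flow, it over-approximates: it reports every field that could have contributed, not the path actually taken in the failing run. Static findings like these are usually confirmed with Dynamic Analysis.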
Textual or Data Item Usage: used more for application maintenance and enhancement requests, this type of Static Analysis involves searching for "categories" of program items, such as "List all fields that contain *JUL*, *GREG*, *YR*, *YEAR*" (suspect date candidates for Year 2000 conversion), or "List all such fields with two-digit (numeric) or two-byte (alphanumeric) definitions."

Code Partitioning: again used more for application maintenance, enhancements and application reengineering, Code Partitioning involves mentally organizing and analyzing code by function or process, so that you understand and can distinguish the usage of code by business process. For example: "Find all code that relates to the calculation of premium renewal payments", or "Isolate the code that edits a particular file, with an eye toward creating a shared subroutine from the code."

Dynamic Analysis:

Tracing: source-level interactive debugging. Watch the program execute statement by statement, and line by line. This is very useful for detailed debugging, particularly of dense or complex instructions. Some software (for example, Rational Developer for System z) allows you to trace the program logic, attempting to re-create the sequence of events (COBOL statements) that transpired up to and including the ABEND condition. Tracing is an invaluable method for detailed debugging. However, given the size and scope of production applications, it is generally more practical to trace specific problem areas of a program.
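The Textual or Data Item Usage search described earlier in this section (suspect Year 2000 date fields) can be approximated with a simple pattern scan over DATA DIVISION entries. The Python sketch below assumes a simplified one-line entry form (level number, data name, PIC clause); the function names and the sample copybook are hypothetical, and real copybooks (REDEFINES, continuation lines, USAGE clauses) would need a proper parser or an analysis tool.

```python
import re

# Assumption for this sketch: each data description entry sits on one line
# as "level-number data-name PIC[TURE] [IS] picture ...".
ENTRY_RE = re.compile(
    r"^\s*(\d{1,2})\s+([A-Z][\w-]*)\s+PIC(?:TURE)?\s+(?:IS\s+)?(\S+?)\.?(?:\s|$)")
DATE_HINTS = ("JUL", "GREG", "YR", "YEAR")  # the *JUL*, *GREG*, *YR*, *YEAR* search


def picture_length(pic):
    """Count digit/character positions in a simple picture such as 9(2), XX or S99."""
    total = 0
    for symbol, count in re.findall(r"([9XAS])(?:\((\d+)\))?", pic.upper()):
        if symbol == "S":
            continue  # an operational sign is not a digit position
        total += int(count) if count else 1
    return total


def date_candidates(lines):
    """Return (name, picture, is_two_positions) for names that look like dates."""
    found = []
    for line in lines:
        m = ENTRY_RE.match(line.upper())
        if not m:
            continue  # group items, 88-levels, comments, etc.
        name, pic = m.group(2), m.group(3)
        if any(hint in name for hint in DATE_HINTS):
            found.append((name, pic, picture_length(pic) == 2))
    return found


copybook = [
    "       01  WS-DATES.",
    "           05  WS-BIRTH-YR        PIC 9(2).",
    "           05  WS-JUL-DATE        PIC 9(5).",
    "           05  WS-GREG-YY         PIC X(2).",
    "           05  WS-STATE-CODE      PIC XX.",
    "           05  WS-PAYROLL-ID      PIC X(6).",
]
for name, pic, two in date_candidates(copybook):
    flag = "two-position: review for Y2K" if two else ""
    print(name, pic, flag)
```

As with any wildcard search, expect false positives: *YR* also matches WS-PAYROLL-ID, so each hit still needs a human review.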
Interactive Execution: execute (run) a program, stopping at selective breakpoints (pause execution each time a certain field value changes, or when a value exceeds some threshold), and examine the contents (values) of specific fields. Interactive Execution must be done by (or with) an application analyst who understands how the system is supposed to operate. Interactive Execution is useful for observing control flow, and is often combined with line-by-line tracing: set selective breakpoints, monitor values, "run" the application to the breakpoints, and then trace the code line by line.

Selective Data State Collection: execute code and establish a functional summary of the specific data states it creates. Use these states in subsequent test runs to compare current values to expected values.

Coverage: analyze the number of times each COBOL statement is executed for a given run. This technique is extremely useful for analyzing the test data coverage of a given application. It can also be used effectively for debugging when it exposes problems such as infinite loops (S222, S322 and B37 ABENDs) or over-loaded tables (loading tables beyond the maximum OCCURS clause and overlaying storage, which can cause S0C1, S0C4 and S0C7 ABENDs).

Using a COBOL research and analysis tool (such as IBM's Rational Asset Analyzer or Rational Developer for System z, or some other source-level analysis software), perform Static and/or Dynamic Analysis on the specific areas of the application relating to the ABEND. Based on WHERE the problem manifested itself to the system (obtained from the ABEND-AID listing of which statement caused the ABEND), determine HOW this particular problem occurred in the application.

3. Hypothesis - Determine WHY the ABEND occurred
With the research in steps 1 and 2, you should be able to describe WHAT, WHERE and HOW the ABEND occurred (at what point in the program the logic failed, and what sequence of COBOL statements caused the failure). However, before
modifying any logic, you must determine WHY these statements (or sequence of events) caused this particular failure (e.g. "Why did this production input file contain spaces in a numeric field?", "Why did the program's logic perform the Initialization routine twice?", "Why did the Read routine execute past end-of-file?", etc.). Only by determining WHY will you be able to make a change to production business logic safely, and with confidence that:
- Your change will resolve the ABEND
- Your change will not introduce new (additional) ABENDs

Sometimes it is relatively easy to understand WHY certain ABEND conditions occurred. For example, perhaps a period was left off the appropriate termination point for an IF statement, which caused execution to perform an operation out of sequence. Or perhaps an IF ... NUMERIC test (which should have been coded for all numeric fields in a file) was forgotten. Or a paragraph was performed through the wrong paragraph exit, or a production job was released before certain files were available (causing I/O errors). These types of ABEND situations can be understood (and usually resolved) fairly quickly.

However, this is not always the case. What if, in the case of the IF statement with the incorrect termination point, the logic as coded correctly processed the first 100,000 records in the file? Making a change to a critical IF condition could very well affect other downstream processing within the program, wreaking havoc with subsequent routines. Or what if, in the case of the file containing blanks in the numeric fields, the input file was supposed to be "clean" (validated) by this point in the jobstream, having gone through allegedly "exhaustive" edits in prior modules? By simply adding an IF test you may solve your program's specific ABEND, but you will not have resolved the actual problem, which exists somewhere else in the system. In other words, provincial approaches to resolving production ABENDs are not recommended, as
they usually change the problem instead of solving it.

It should be noted that a clear understanding of the business functionality automated by this process is usually required to completely resolve WHY something has gone wrong. Calling on business experts or "application/business" experts who understand "the big picture", and the context in which the job executes, is the rule rather than the exception in this process.

Developing a clear and accurate determination of WHY a problem that led to an ABEND condition exists may take a considerable amount of time, depending on:
- The size, complexity and structure of the code
- Your familiarity with the program's business purpose, coupled with your ability to grasp the point of each statement (assuming you didn't write the code)
- The type of ABEND and the reason for the problem (some are more diabolical than others)
- The size of the input/output files, and the capabilities of your file editor

Note that, in addition to an understanding of the reason for the ABEND, the results of your investigation should produce an understanding of the solution to the problem (the fix itself).

4. Solution - Fix the problem and test your solution
Take the appropriate action to resolve any business or system-wide issues. Depending on how extensive the damage caused by the problem is, or how long problems have persisted undetected:
- Files may have to be restored from backups taken at a previous point in time
- Jobs may have to be re-run from a previous point in time (synchronized with file generations)
- Files may have to be modified with "one-shot" programs, written to resolve issues that require "surgery" on the data
Take the appropriate action to fix the technical (coding) problem:
- Edit the program source, modifying the existing production logic, and/or
- Modify the JCL (if the error included JCL issues)
Test your solution:
- Compile and link the new version of the application
- Create an "image copy" of the production file system, in order to test your fix
- Re-run the batch job and analyze the results
- Run
"Regression Tests" against the new code, and analyze for unexpected results

5. Resolution - Build and migrate back into production
- Promote your changes into production
- Schedule and re-run the cycle

Appendix - ABEND Completion Codes and some typical causes

While there is a wide variety of reasons for ABEND conditions ("WHYs") in production systems, it is possible (and useful) to categorize and organize HOW certain conditions often lead to certain types of ABEND completion codes, in order to expedite or streamline your analysis and research (an 80/20 approach to analysis). The following information on a few common z/OS ABEND completion codes, and the conditions that generate them, is included to help you make effective use of ABEND-AID listings and of the debugging, research and analysis process above.

S0C1 - Attempt to execute an invalid machine instruction
S0C1s occur due to COBOL:
- Table-handling overlay (MOVEs to table subscripts/indexes that are out of range, and that overwrite PROCEDURE DIVISION instructions)
- Statements referencing LINKAGE SECTION fields incorrectly
- CALLs to an invalid subroutine name
The COBOL compiler always generates valid machine instructions, so S0C1s usually occur when populating tables beyond the valid OCCURS range.

Typical Reasons for S0C1s (Reason / Explanation)

Reason: Moving elements to a table using a subscript or index that contains a value beyond the maximum OCCURS in the table declaration
Explanation: This usually happens because of a loop that is not terminated correctly, such as a routine that populates a table from an input file containing more records than the table's OCCURS declaration provides for. It can also happen through a MOVE or invalid math statement that computes an invalid subscript/index value.

Reason: Referencing incorrectly defined/passed LINKAGE SECTION fields
Explanation: If the definitions of your LINKAGE SECTION fields do not match, or the definitions in the called program are larger than those in the calling program, you could be attempting to reference data outside of valid storage when statements that reference those fields execute.

Reason: CALL to an
invalid or unavailable module-name
Explanation: If your program makes a dynamic CALL and the module name being called is not found, you can get S806, S0C4 or S0C1 system errors. Reasons for invalid module names include: misspelling the name; incorrectly specifying the STEPLIB/JOBLIB DSN= in the JCL (or incorrectly concatenating the STEPLIB/JOBLIB datasets); and leaving out the apostrophes (or quotes) on a CALL literal, which causes the COBOL compiler to treat the statement as a CALL identifier. If an identifier with that name exists in the DATA DIVISION, COBOL will attempt a dynamic CALL to the value of that identifier.

S0C4 - Attempt to reference an invalid storage address
S0C4s occur due to COBOL:
- Table-handling overlay errors (MOVEs to table subscripts/indexes that are out of range, and that overwrite PROCEDURE DIVISION instructions)
- Statements referencing LINKAGE SECTION fields incorrectly
- CALLs to an invalid subroutine name
- STOP RUN or GOBACK in the INPUT or OUTPUT PROCEDURE when using the COBOL SORT verb
- Attempts to access an unopened dataset

Unless your program is executing with "bounds checking" (supported by CA-Capex Optimizing, COBOL II and COBOL/370, and generally not used in production), your table routines could overlay the contents of storage beyond the boundary of the OCCURS clause. This can cause S0C7s (see above), S0C1s and S0C4s, by overwriting field values in the DATA DIVISION (S0C7s) or by actually overwriting the instructions in your PROCEDURE DIVISION, producing invalid addresses (operands) for the executable (machine) code (which in turn can cause S0C1s and S0C4s).

Typical Reasons for S0C4s (Reason / Explanation)

Reason: Table subscript or index contains a zero value
Explanation: Verify that all table-handling subscript/index references are within the allowable range of the table's OCCURS clause (>= 1,