
    International Technical Support Organization

    System/390 MVS Parallel Sysplex

    Continuous Availability SE Guide

    December 1995

    SG24-4503-00


    Take Note!

    Before using this information and the product it supports, be sure to read the general information under

Special Notices on page xvii.

    First Edition (December 1995)

    This edition applies to Version 5 Release 2 of MVS/ESA System Product (5655-068 or 5655-069).

    Order publications through your IBM representative or the IBM branch office serving your locality. Publications

    are not stocked at the address given below.

An ITSO Technical Bulletin Evaluation Form for reader's feedback appears facing Chapter 1. If the form has been

    removed, comments may be addressed to:

    IBM Corporation, International Technical Support Organization

    Dept. HYJF Mail Station P099

    522 South Road

    Poughkeepsie, New York 12601-5400

    When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any

    way it believes appropriate without incurring any obligation to you.

    Copyright International Business Machines Corporation 1995. All rights reserved.

Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication or disclosure is

subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.


    Abstract

    This document discusses how the parallel sysplex can help an installation get

    closer to a goal of continuous availability.

It is intended for customer systems and operations personnel responsible for implementing parallel sysplex, and the IBM Systems Engineers who assist them.

    It will also be useful to technical managers who want to assess the benefits they

    can expect from parallel sysplex in this area.

    The book describes how to configure both the hardware and software in order to

    eliminate planned outages and minimize the impact of unplanned outages.

    It describes how you can make hardware and software changes to the sysplex

    without disrupting the running of the applications.

    It also discusses how to handle unplanned hardware or software failures, and to

    recover from error situations with minimal impact to the applications.

    A knowledge of parallel sysplex is assumed.

    (296 pages)


    Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Special Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

How This Document Is Organized . . . . . . . . . . . . . . . . . . . . . . . . . xix

Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx

International Technical Support Organization Publications . . . . . . . . . . . xxi

ITSO Redbooks on the World Wide Web (WWW) . . . . . . . . . . . . . . . . . xxii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii

    Part 1. Configuring for Continuous Availability . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    Chapter 1. Hardware Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1 What Is Continuous Availability? . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.1 Parallel Sysplex and Continuous Availability . . . . . . . . . . . . . . . 3

1.1.2 Why N + 1? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3 Coupling Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.1 Separate Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.2 How Many? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.3 CF Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.4 Coupling Facility Structures . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.5 Coupling Facility Volatility/Nonvolatility . . . . . . . . . . . . . . . . . . 8

1.4 Sysplex Timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4.1 Duplicating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4.2 Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.3 Setting the Time in MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.4 Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.5 I/O Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    1.5.1 ESCON Logical Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    1.6 CTCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    1.6.1 3088 and ESCON CTC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    1.6.2 Alternate CTC Configuration . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.6.3 Sharing CTC Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.6.4 IOCP Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.6.5 3088 Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.7 XCF Signalling Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.8 Data Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.9 DASD Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.9.1 RAMAC and RAMAC 2 Array Subsystems . . . . . . . . . . . . . . . . 17

1.9.2 3990 Model 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.9.3 3990 Model 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.9.4 DASD Path Recommendations . . . . . . . . . . . . . . . . . . . . . . . 17

1.9.5 3990 Model 6 ESCON Logical Path Report . . . . . . . . . . . . . . . . 18

1.10 ESCON Directors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.10.1 ESCON Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.10.2 ESCON Director Switch Matrix . . . . . . . . . . . . . . . . . . . . . . . 19

1.11 Fiber . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.11.1 9729 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.12 Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21


    1.12.1 Hardware Management Console (HMC) . . . . . . . . . . . . . . . . . 21

    1.12.2 How Many HMCs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    1.12.3 Using HMC As an MVS Console . . . . . . . . . . . . . . . . . . . . . . 21

1.12.4 MVS Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.12.5 Master Console Considerations . . . . . . . . . . . . . . . . . . . . . . 22

1.12.6 Console Configuration Considerations . . . . . . . . . . . . . . . . . . 23

1.13 Tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.13.1 3490 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.14 Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.14.1 VTAM CTCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.14.2 3745s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.14.3 CF Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.15 Environmental . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    1.15.1 Uninterruptible Power Supply (UPS) . . . . . . . . . . . . . . . . . . . 26

    1.15.2 9672/9674 Protection against Power Disturbances . . . . . . . . . . . 27

    Chapter 2. System Software Configuration . . . . . . . . . . . . . . . . . . . . . 29

    2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    2.2 N, N+1 in a Software Environment . . . . . . . . . . . . . . . . . . . . . . . 29

2.3 Shared SYSRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.3.1 Shared SYSRES Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.3.2 Indirect Catalog Function . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.4 Master Catalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.5 Dynamic I/O Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.5.1 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.6 I/O Definition File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.7 Couple Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.8 JES2 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.8.1 JES2 Checkpoint Reconfiguration . . . . . . . . . . . . . . . . . . . . . 39

2.9 RACF Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.10 PARMLIB Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.10.1 Developing Naming Conventions . . . . . . . . . . . . . . . . . . . . . 40

2.10.2 MVS/ESA SP V5.2 Enhancements . . . . . . . . . . . . . . . . . . . . . 41

2.10.3 MVS Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    2.11 System Logger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    2.11.1 Logstream and Structure Allocation . . . . . . . . . . . . . . . . . . . 46

    2.11.2 DASD Log Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    2.11.3 Duplexing Coupling Facility Log Data . . . . . . . . . . . . . . . . . . 47

    2.11.4 DASD Staging Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    2.12 System Managed Storage Considerations . . . . . . . . . . . . . . . . . . 50

    2.12.1 SMSplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

2.12.2 DFSMShsm Considerations . . . . . . . . . . . . . . . . . . . . . . . . 52

2.12.3 Continuous Availability Considerations . . . . . . . . . . . . . . . . . 52

2.12.4 RESERVE Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.13 Shared Tape Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.13.1 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

    2.13.2 Implementing Automatic Tape Switching . . . . . . . . . . . . . . . . 54

    2.14 Exploiting Dynamic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    2.14.1 Dynamic Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    2.14.2 Dynamic Subsystem Interface (SSI) . . . . . . . . . . . . . . . . . . . . 56

    2.14.3 Dynamic Reconfiguration of XES . . . . . . . . . . . . . . . . . . . . . 57

    2.15 Automating Sysplex Failure Management . . . . . . . . . . . . . . . . . . 57

    2.15.1 Planning for SFM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    2.15.2 The SFM Isolate Function . . . . . . . . . . . . . . . . . . . . . . . . . . 59

    2.15.3 SFM Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63


2.15.4 SFM Activation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

2.15.5 Stopping SFM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.15.6 SFM Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    2.16 Planning the Time Detection Intervals . . . . . . . . . . . . . . . . . . . . . 73

    2.16.2 Synchronous WTO(R) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

    2.17 ARM: MVS Automatic Restart Manager . . . . . . . . . . . . . . . . . . . . 79

    2.17.1 ARM Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

    2.17.2 ARM Processing Requirements . . . . . . . . . . . . . . . . . . . . . . 80

    2.17.3 Program Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    2.17.4 ARM and Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    2.18 JES3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    2.18.1 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    2.18.2 JES3 Sysplex Considerations . . . . . . . . . . . . . . . . . . . . . . . 89

    2.18.3 JES3 Parallel Sysplex Requirements . . . . . . . . . . . . . . . . . . . 90

2.18.4 JES3 Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    2.18.5 Additional JES3 Planning Information . . . . . . . . . . . . . . . . . . 93

    Chapter 3. Subsystem Software Configuration . . . . . . . . . . . . . . . . . . . 95

    3.1 CICS V4 Transaction Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . 95

3.1.1 CICS Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

3.1.2 CICS Affinities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.1.3 File-Owning Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.1.4 Resource Definition Online (RDO) . . . . . . . . . . . . . . . . . . . . . 97

3.1.5 CSD Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.1.6 Subsystem Storage Protection . . . . . . . . . . . . . . . . . . . . . . . 98

3.1.7 Transaction Isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

    3.2 CICSPlex SM V1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

    3.2.1 CICSPlex SM Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 99

3.3 IMS Transaction Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    3.3.1 IMS Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    3.3.2 IMS RESLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

    3.3.3 IMSIDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

3.3.4 Terminal Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

3.3.5 Data Set Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3.6 IRLM Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3.7 Coupling Facility Structures . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3.8 Dynamic Update of IMS Type 2 SVC . . . . . . . . . . . . . . . . . . . 102

3.3.9 Cloning Inhibitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.4 DB2 Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.4.1 DB2 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.4.2 DB2 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

3.4.3 Changing Structure Sizes . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4.4 DB2 Data Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4.5 IEFSSNXX Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4.6 DB2 Subsystem Parameters . . . . . . . . . . . . . . . . . . . . . . . . 105

3.5 VSAM RLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

    3.5.1 Control Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

    3.5.2 Defining the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    3.5.3 Defining the SMSVSAM Structures . . . . . . . . . . . . . . . . . . . . 108

    3.5.4 CICS Use of System Logger . . . . . . . . . . . . . . . . . . . . . . . . 109

    3.6 TSO in a Parallel Sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    3.7 System Automation Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    3.7.1 NetView . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    3.7.2 AOC/MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    3.7.3 OPC/ESA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111


    3.8 VTAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

3.8.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

    Part 2. Making Planned Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

    Chapter 4. Systems Management in a Parallel Sysplex . . . . . . . . . . . . . 115

4.1 The Importance of Systems Management in Parallel Sysplex . . . . . . 115

4.1.1 Change Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

4.1.2 Problem Management . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

    4.1.3 Operations Management . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    4.1.4 The Other System Management Disciplines . . . . . . . . . . . . . . 116

    4.1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    Chapter 5. Coupling Facility Changes . . . . . . . . . . . . . . . . . . . . . . . 117

5.1 Structure Attributes and Allocation . . . . . . . . . . . . . . . . . . . . . . 117

5.2 Structure and Connection Disposition . . . . . . . . . . . . . . . . . . . . . 118

5.2.1 Structure Disposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.2.2 Connection State and Disposition . . . . . . . . . . . . . . . . . . . . 119

5.3 Structure Dependence on Dumps . . . . . . . . . . . . . . . . . . . . . . . 120

5.4 To Move a Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

    5.4.1 The Structure Rebuild Process . . . . . . . . . . . . . . . . . . . . . . 121

    5.5 Altering the Size of a Structure . . . . . . . . . . . . . . . . . . . . . . . . 123

    5.6 Changing the Active CFRM Policy . . . . . . . . . . . . . . . . . . . . . . . 125

    5.7 Reformatting the CFRM Couple Data Set . . . . . . . . . . . . . . . . . . . 126

5.8 Adding a Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.8.1 To Define the Coupling Facility LPAR and Connections . . . . . . . 127

5.8.2 To Prepare the New CFRM Policy . . . . . . . . . . . . . . . . . . . . 127

5.8.3 Setting Up the Structure Exploiters . . . . . . . . . . . . . . . . . . . . 128

5.9 Servicing the Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.9.1 Concurrent Hardware Upgrades . . . . . . . . . . . . . . . . . . . . . 132

5.9.2 Concurrent LIC Upgrades . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.10 Removing a Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.11 Coupling Facility Shutdown Procedure . . . . . . . . . . . . . . . . . . . 134

5.11.1 Coupling Facility Exploiter Considerations . . . . . . . . . . . . . . 138

    5.11.2 Shutting Down the Only Coupling Facility . . . . . . . . . . . . . . . 141

    5.12 Putting a Coupling Facility Back Online . . . . . . . . . . . . . . . . . . . 142

    Chapter 6. Hardware Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1.1 Adding a Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1.2 Removing a Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1.3 Changing a Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.2 Logical Partitions (LPARs) . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.2.1 Adding an LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.2.2 Removing an LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.2.3 Changing an LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.3 I/O Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.4 ESCON Directors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5 Changing the Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5.1 Using the Sysplex Timer . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5.2 Time Changes and IMS . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5.3 Time Changes and SMF . . . . . . . . . . . . . . . . . . . . . . . . . . 148

    6.5.4 Changing Time in the 9672 HMC and SE . . . . . . . . . . . . . . . . 148


    Chapter 7. Software Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    7.1 Adding a New MVS Image . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    7.1.1 Adding a New JES3 Main . . . . . . . . . . . . . . . . . . . . . . . . . 150

    7.2 Adding a New SYSRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

    7.2.1 Example JCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

    7.3 Implementing System Software Changes . . . . . . . . . . . . . . . . . . 154

7.4 Adding Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    7.4.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    7.4.2 IMS Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

    7.4.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

    7.4.4 TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.5 Starting the Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.5.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.5.2 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.5.3 IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.6 Changing Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.7 Moving the Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

7.7.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

7.7.2 IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.7.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.7.4 TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

    7.7.5 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

    7.7.6 DFSMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

    7.8 Closing Down the Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . 165

    7.8.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

7.8.2 IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

    7.8.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

    7.8.4 System Automation Shutdown . . . . . . . . . . . . . . . . . . . . . . 169

    7.9 Removing an MVS Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

    Chapter 8. Database Availability . . . . . . . . . . . . . . . . . . . . . . . . . . 171

8.1 VSAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

    8.1.1 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

    8.1.2 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

    8.1.3 Reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

8.2 IMS/DB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

    8.2.1 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

    8.2.2 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

    8.2.3 Reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    8.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    8.3.1 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    8.3.2 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    8.3.3 Reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    Part 3. Handling Unplanned Outages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

    Chapter 9. Parallel Sysplex Recovery . . . . . . . . . . . . . . . . . . . . . . . 179

9.1 System Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.1.1 Sysplex Failure Management (SFM) . . . . . . . . . . . . . . . . . . . 179

9.1.2 Automatic Restart Management (ARM) . . . . . . . . . . . . . . . . . 179

9.1.3 What Needs to Be Done? . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.2 Coupling Facility Failure Recovery . . . . . . . . . . . . . . . . . . . . . . 180

    9.3 Assessment of the Failure Condition . . . . . . . . . . . . . . . . . . . . . 185

    9.3.1 To Recognize a Structure Failure . . . . . . . . . . . . . . . . . . . . 185


    9.3.2 To Recognize a Connectivity Failure . . . . . . . . . . . . . . . . . . . 186

9.3.3 To Recognize When a Coupling Facility Becomes Volatile . . . . . . 186

9.3.4 Recovery from a Connectivity Failure . . . . . . . . . . . . . . . . . . 187

9.3.5 Recovery from a Structure Failure . . . . . . . . . . . . . . . . . . . . 188

9.4 DB2 V4 Recovery from a Coupling Facility Failure . . . . . . . . . . . . . 189

9.4.1 DB2 V4 Built-In Recovery from Connectivity Failure . . . . . . . . . 189

9.4.2 DB2 V4 Built-In Recovery from a Structure Failure . . . . . . . . . . 190

9.4.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 190

9.4.4 Manual Structure Rebuild . . . . . . . . . . . . . . . . . . . . . . . . . 190

    9.4.5 To Manually Deallocate and Reallocate a Group Buffer Pool . . . . 190

    9.4.6 To Manually Deallocate a DB2 Lock Structure . . . . . . . . . . . . . 191

    9.4.7 To Manually Deallocate a DB2 SCA Structure . . . . . . . . . . . . . 192

9.5 XCF Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . . 192

9.5.1 XCF Built-In Recovery from Connectivity or Structure Failure . . . . 192

9.5.2 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 193

9.5.3 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 193

9.5.4 Manual Deallocation of the XCF Signalling Structures . . . . . . . . 193

9.5.5 Partitioning the Sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . 193

9.6 RACF Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . 194

9.6.1 RACF Built-In Recovery from Connectivity or Structure Failure . . . 194

9.6.2 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 195

    9.6.3 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 195

    9.6.4 Manual Deallocation of RACF Structures . . . . . . . . . . . . . . . . 196

    9.7 VTAM Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . 196

    9.7.1 VTAM Built-In Recovery from Connectivity Failure . . . . . . . . . . 196

    9.7.2 VTAM Built-In Recovery from a Structure Failure . . . . . . . . . . . 196

9.7.3 The Coupling Facility Becomes Volatile . . . . . . . . . . . . . . . . . 196

9.7.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 196

9.7.5 Manual Deallocation of the VTAM GRN Structure . . . . . . . . . . . 197

9.8 IMS/DB Recovery from a Coupling Facility Failure . . . . . . . . . . . . . 197

9.8.1 IMS/DB Built-In Recovery from a Connectivity Failure . . . . . . . . 197

9.8.2 IMS/DB Built-In Recovery from a Structure Failure . . . . . . . . . . 198

9.8.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 198

9.8.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 198

9.8.5 Manual Deallocation of an IRLM Lock Structure . . . . . . . . . . . . 199

9.8.6 Manual Deallocation of an OSAM/VSAM Cache Structure . . . . . . 199

9.9 JES2 Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . . 199

9.9.1 Connectivity Failure to a Checkpoint Structure . . . . . . . . . . . . 199

9.9.2 Structure Failure in a Checkpoint Structure . . . . . . . . . . . . . . 202

9.9.3 The Coupling Facility Becomes Volatile . . . . . . . . . . . . . . . . . 203

9.9.4 To Manually Move a JES2 Checkpoint . . . . . . . . . . . . . . . . . . 203

9.10 System Logger Recovery from a Coupling Facility Failure . . . . . . . . 203

9.10.1 System Logger Built-In Recovery from a Connectivity Failure . . . 203

9.10.2 System Logger Built-In Recovery from a Structure Failure . . . . . 203

9.10.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . 203

9.10.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . 204

    9.10.5 Manual Deallocation of Logstreams Structure . . . . . . . . . . . . 204

    9.11 Automatic Tape Switching Recovery from a Coupling Facility Failure . 204

    9.11.1 Automatic Tape Switching Recovery from a Connectivity Failure . 204

9.11.2 Automatic Tape Switching Built-In Recovery from a Structure Failure . . . 204

9.11.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . 204

    9.11.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . 204

    9.11.5 Consequences of Failing to Rebuild the IEFAUTOS Structure . . . 205

    9.11.6 Manual Deallocation of IEFAUTOS Structure . . . . . . . . . . . . . 205


9.12 VSAM RLS Recovery from a Coupling Facility Failure . . . . . . . . . . 205

9.12.1 SMSVSAM Built-In Recovery from a Connectivity Failure . . . . . 205

9.12.2 SMSVSAM Built-In Recovery from a Structure Failure . . . . . . . 205

9.12.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . 206

    9.12.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . 206

    9.12.5 Manual Deallocation of SMSVSAM Structures . . . . . . . . . . . . 206

    9.13 Couple Data Set Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

    9.13.1 Sysplex (XCF) Couple Data Set Failure . . . . . . . . . . . . . . . . 206

9.13.2 Coupling Facility Resource Manager (CFRM) Couple Data Set Failure . . . 207

    9.13.3 Sysplex Failure Management (SFM) Couple Data Set Failure . . . 207

    9.13.4 Workload Manager (WLM) Couple Data Set Failure . . . . . . . . . 207

    9.13.5 Automatic Restart Manager (ARM) Couple Data Set Failure . . . . 207

    9.13.6 System Logger (LOGR) Couple Data Set Failure . . . . . . . . . . . 208

9.14 Sysplex Timer Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

9.15 Restarting IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9.15.1 IMS/IRLM Failures Within a System . . . . . . . . . . . . . . . . . . 210

9.15.2 CEC or MVS Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9.15.3 Automating Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

9.16 Restarting DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

9.17 Restarting CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

    9.17.1 CICS TOR Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

    9.17.2 CICS AOR Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

9.18 Recovering Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

    9.18.1 Recovering an Application Failure . . . . . . . . . . . . . . . . . . . 212

    9.18.2 Recovering an MVS Failure . . . . . . . . . . . . . . . . . . . . . . . 213

    9.18.3 Recovering from a Sysplex Failure . . . . . . . . . . . . . . . . . . . 213

    9.18.4 Recovering from System Logger Address Space Failure . . . . . . 213

    9.18.5 Recovering OPERLOG Failure . . . . . . . . . . . . . . . . . . . . . . 213

    9.19 Restarting an OPC/ESA Controller . . . . . . . . . . . . . . . . . . . . . . 213

    9.20 Recovering Batch Jobs under OPC/ESA Control . . . . . . . . . . . . . 214

    9.20.1 Status of Jobs on Failing CPU . . . . . . . . . . . . . . . . . . . . . . 214

9.20.2 Recovery of Jobs on a Failing CPU . . . . . . . . . . . . . . . . . . . 214

    Chapter 10. Disaster Recovery Considerations . . . . . . . . . . . . . . . . . 215

    10.1 Disasters and Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

    10.2 Disaster Recovery Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

    10.2.1 3990 Remote Copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

    10.2.2 IMS Remote Site Recovery . . . . . . . . . . . . . . . . . . . . . . . . 216

    10.2.3 CICS Recovery with CICSPlex SM . . . . . . . . . . . . . . . . . . . 217

    10.2.4 DB2 Disaster Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . 218

    Appendix A. Sample Parallel Sysplex MVS Image Members . . . . . . . . . 221

    A.1 Example Parallel Sysplex Configuration . . . . . . . . . . . . . . . . . . . 221

A.2 IPLPARM Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

A.2.1 LOADAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

A.3 PARMLIB Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

    A.3.1 IEASYMAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

    A.3.2 IEASYS00 and IEASYSAA . . . . . . . . . . . . . . . . . . . . . . . . . 224

    A.3.3 COUPLE00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

    A.3.4 JES2 Startup Procedure in SYS1.PROCLIB . . . . . . . . . . . . . . . 227

    A.3.5 J2G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

    A.3.6 J2L42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

A.4 VTAMLST Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

    A.4.1 ATCSTR42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233


    A.4.2 ATCCON42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

    A.4.3 APCIC42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

    A.4.4 APNJE42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

A.4.5 CDRM42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

A.4.6 MPC03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

A.4.7 TRL03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

A.4.8 APAPPCAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

A.5 Allocating Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

A.5.1 ALLOCJCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

    Appendix B. Structures, How to ... . . . . . . . . . . . . . . . . . . . . . . . . 241

B.1 To Gather Information on a Coupling Facility . . . . . . . . . . . . . . . . 241

B.2 To Gather Information on Structure and Connections . . . . . . . . . . . 243

B.3 To Deallocate a Structure with a Disposition of DELETE . . . . . . . . . 245

B.4 To Deallocate a Structure with a Disposition of KEEP . . . . . . . . . . . 245

    B.5 To Suppress a Connection in Active State . . . . . . . . . . . . . . . . . . 245

    B.6 To Suppress a Connection in Failed-persistent State . . . . . . . . . . . 246

B.7 To Monitor a Structure Rebuild . . . . . . . . . . . . . . . . . . . . . . . . 246

B.8 To Stop a Structure Rebuild . . . . . . . . . . . . . . . . . . . . . . . . . . 248

    B.9 To Recover from a Hang in Structure Rebuild . . . . . . . . . . . . . . . 248

    Appendix C. Examples of CFRM Policy Transitioning . . . . . . . . . . . . . . 249

C.1 Changing the Structure Definition . . . . . . . . . . . . . . . . . . . . . . . 249

C.2 Changing the Coupling Facility Definition . . . . . . . . . . . . . . . . . . 255

Appendix D. Examples of Sysplex Partitioning . . . . . . . . . . . . . . . . . . 259

D.1 Partitioning on Operator Request . . . . . . . . . . . . . . . . . . . . . . . 259

    D.2 System in Missing Status Update Condition . . . . . . . . . . . . . . . . . 260

    Appendix E. Spin Loop Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . 263

    Appendix F. Dynamic I/O Reconfiguration Procedures . . . . . . . . . . . . . 267

    F.1 Procedure to Make the System Dynamic I/O Capable . . . . . . . . . . . 267

    F.2 Procedure for Dynamic Changes . . . . . . . . . . . . . . . . . . . . . . . 270

F.3 Hardware System Area Considerations . . . . . . . . . . . . . . . . . . . 271

    F.4 Hardware System Area Expansion Factors . . . . . . . . . . . . . . . . . 272

    Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

    List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

    Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289


    Figures

1. Sample Parallel Sysplex Continuous Availability Configuration . . . . . 5

2. ESCON Logical Paths Configuration . . . . . . . . . . . . . . . . . . . . . . 13

3. CTC Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4. Recommended XCF Signalling Path Configuration . . . . . . . . . . . . . 16

5. Recommended DASD Path Configuration . . . . . . . . . . . . . . . . . . . 19

6. ICKDSF R16 ESCON Logical Path Report . . . . . . . . . . . . . . . . . . 20

    7. Console Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

8. Recommended Console Configuration . . . . . . . . . . . . . . . . . . . . 25

9. 9910 Local UPS and 9672 Rx2 and Rx3 . . . . . . . . . . . . . . . . . . . . 28

10. Indirect Catalog Function with SYSRESA . . . . . . . . . . . . . . . . . . . 31

11. Indirect Catalog Function with SYSRESB . . . . . . . . . . . . . . . . . . . 32

12. Alternate Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

13. Example of Failure Dependent Connection . . . . . . . . . . . . . . . . . . 48

14. Example of Failure Dependent/Independence Connections . . . . . . . . 49

15. Basic Relationship between Sysplex Name and System Group . . . . . 51

16. SMSplex Consisting of System Group and Individual System Name . . . 51

17. Isolating a Failing MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

18. INTERVAL and ISOLATETIME Relationship . . . . . . . . . . . . . . . . . . 61

19. SFM Policy with the ISOLATETIME Parameter . . . . . . . . . . . . . . . . 62

20. SFM LPARs Actions Timings . . . . . . . . . . . . . . . . . . . . . . . . . . 67

21. Sample JCL to Delete a SFM Policy . . . . . . . . . . . . . . . . . . . . . . 72

22. Figure to Show Timing Relationships . . . . . . . . . . . . . . . . . . . . . 74

    23. JES3 *I S Display Showing Non-Existent Systems . . . . . . . . . . . . . . 88

    24. JES3-Managed and Auto-Switchable Tape . . . . . . . . . . . . . . . . . . 90

25. NJE Node Definitions Portion of JES3 Init Stream . . . . . . . . . . . . . . 91

    26. Sample JES3 Proc for Use by Multiple Globals . . . . . . . . . . . . . . . 92

    27. Cloned CICSplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

    28. CICSPlex SM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

29. Sample IMS 5.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . 100

30. Sample DB2 Data Sharing Configuration . . . . . . . . . . . . . . . . . . 104

31. Sample VSAM RLS Data Sharing Configuration . . . . . . . . . . . . . . 107

    32. START Command When Adding a New JES3 Global . . . . . . . . . . . 151

33. Volume Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

34. Copy SYSRESA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

35. SMP/E ZONEEDIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

36. Add IPL Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

37. Example parallel sysplex Environment . . . . . . . . . . . . . . . . . . . 154

38. Introducing a New Software Level into the parallel sysplex . . . . . . . 155

39. Redistributing Workload on TORs . . . . . . . . . . . . . . . . . . . . . . 162

40. Redistributing Workload on AORs . . . . . . . . . . . . . . . . . . . . . . 163

41. DB2 Data Sharing Availability . . . . . . . . . . . . . . . . . . . . . . . . 168

42. Sample Checkpoint Definition . . . . . . . . . . . . . . . . . . . . . . . . . 200

43. 3990-6 Peer-to-Peer Remote Copy Configuration . . . . . . . . . . . . . 217

44. 3990-6 Extended Remote Copy Configuration . . . . . . . . . . . . . . . 218

45. IMS Remote Site Recovery Configuration . . . . . . . . . . . . . . . . . 219

46. DB2 Data Sharing Disaster Recovery Configuration . . . . . . . . . . . 220

47. Example Parallel Sysplex Configuration . . . . . . . . . . . . . . . . . . 221

    48. LOADAA Member . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

    49. IEASYMAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

    50. IEASYS00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

    51. IEASYSAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225


    52. COUPLE00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

    53. JES2 Member in SYS1.PROCLIB . . . . . . . . . . . . . . . . . . . . . . . 227

    54. J2G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

    55. J2L42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

    56. ATCSTR42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

    57. ATCCON42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

    58. APCIC42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

    59. APNJE42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

    60. CDRM42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    61. MPC03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    62. TRL03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    63. APAPPCAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

64. Allocating System Specific Data Sets . . . . . . . . . . . . . . . . . . . . 238

65. Coupling Facility Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

66. Structures and Connections Display . . . . . . . . . . . . . . . . . . . . . 243

67. Monitoring Structure Rebuild through Exploiters' Messages . . . . . . 246

68. Monitoring Structure Rebuild by Displaying Structure Status . . . . . . 247

69. CFRM Policy Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

70. JCL to Install a New CFRM Policy . . . . . . . . . . . . . . . . . . . . . . 252

71. Original CFRM Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

72. New CFRM Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

73. VARY OFF a System without SFM Policy Active . . . . . . . . . . . . . . 259

74. VARY OFF a System with an SFM Policy Active . . . . . . . . . . . . . . 260

75. System in Missing Status Update Condition and No Active SFM Policy . 260

76. System in Missing Status Update with an Active SFM Policy and CONNFAIL(YES) . . . 261

77. Resolution of a Spin Loop Condition . . . . . . . . . . . . . . . . . . . . 264

    78. HCD Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

    79. CONFIG Frame Fragment . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

    80. HCD Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

81. Dynamic I/O Customization . . . . . . . . . . . . . . . . . . . . . . . . . . 270


    Tables

1. Couple Data Set Placement Recommendations . . . . . . . . . . . . . . . 37

2. JES2 Checkpoint Placement Recommendations . . . . . . . . . . . . . . . 39

3. References Containing Information on the Use of System Symbols . . . 42

4. Summary of SFM Keywords and Parameters . . . . . . . . . . . . . . . . 63

5. IMS Data Sets in Sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6. Automation Recommendations . . . . . . . . . . . . . . . . . . . . . . . . 116

7. Support of REBUILD by IBM Exploiters . . . . . . . . . . . . . . . . . . . 123

8. Support of ALTER by IBM Exploiters . . . . . . . . . . . . . . . . . . . . . 124

    9. DB2 Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

    10. Subsystem Recovery Summary Part 1 . . . . . . . . . . . . . . . . . . 182

    11. Subsystem Recovery Summary Part 2 . . . . . . . . . . . . . . . . . . 184

    12. Summary of Couple Data Sets . . . . . . . . . . . . . . . . . . . . . . . . 209


    Special Notices

This publication is intended to help customers' systems and operations personnel

    and IBM systems engineers to plan, implement and use a parallel sysplex in

order to get closer to a goal of continuous availability. It is not intended to be a

guide to implementing or using parallel sysplex as such. It only covers topics related to continuous availability.

    The information in this publication is not intended as the specification of any

    programming interfaces that are provided by MVS Version 5 or any other product

    mentioned in this redbook. See the PUBLICATIONS section of the IBM

    Programming Announcement for MVS Version 5, or other products, for more

    information about what publications are considered to be product documentation.

    References in this publication to IBM products, programs or services do not

    imply that IBM intends to make these available in all countries in which IBM

    operates. Any reference to an IBM product, program, or service is not intended

to state or imply that only IBM's product, program, or service may be used. Any

functionally equivalent program that does not infringe any of IBM's intellectual

    property rights may be used instead of the IBM product, program or service.

    Information in this book was developed in conjunction with use of the equipment

    specified, and is limited in application to those specific hardware and software

    products and levels.

    IBM may have patents or pending patent applications covering subject matter in

    this document. The furnishing of this document does not give you any license to

these patents. You can send license inquiries, in writing, to the IBM Director of

    Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.

    The information contained in this document has not been submitted to any

    formal IBM test and is distributed AS IS. The information about non-IBM

    (VENDOR) products in this manual has been supplied by the vendor and IBM

    assumes no responsibility for its accuracy or completeness. The use of this

    information or the implementation of any of these techniques is a customer

responsibility and depends on the customer's ability to evaluate and integrate

them into the customer's operational environment. While each item may have

    been reviewed by IBM for accuracy in a specific situation, there is no guarantee

    that the same or similar results wil l be obtained elsewhere. Customers

    attempting to adapt these techniques to their own environments do so at their

    own risk.

    Reference to PTF numbers that have not been released through the normal

distribution process does not imply general availability. The purpose of including these reference numbers is to alert IBM customers to specific

    information relative to the implementation of the PTF when it becomes available

    to each customer according to the normal IBM PTF distribution process.

    The following terms are trademarks of the International Business Machines

    Corporation in the United States and/or other countries:

    ACF/VTAM Advanced Peer-to-Peer Networking

    AIX APPN

    CICS CICS/ESA

    CICS/MVS CUA


DATABASE 2 DB2

DFSMS DFSMS/MVS

DFSMSdfp DFSMSdss

DFSMShsm DFSORT

Enterprise Systems Connection Architecture ES/3090

ES/9000 ESA/370

ESA/390 ESCON XDF

ESCON GDDM

Hardware Configuration Definition IBM

IMS IMS/ESA

IPDS LPDA

Magstar MVS/DFP

MVS/ESA MVS/SP

MVS/XA NetView

PR/SM Processor Resource/Systems Manager

PS/2 RACF

RAMAC RETAIN

RMF S/370

S/390 SAA

SQL/DS Sysplex Timer

System/360 System/370

System/390 Systems Application Architecture

SystemView Virtual Machine/Enterprise Systems Architecture

Virtual Machine/Extended Architecture VM/ESA

VM/XA VSE/ESA

VTAM

The following terms are trademarks of other companies:

C-bus is a trademark of Corollary, Inc.

PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license.

UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.

Windows is a trademark of Microsoft Corporation.

Other trademarks are trademarks of their respective companies.


    Preface

    This document discusses how the parallel sysplex can help an installation get

    closer to a goal of Continuous Availability.

This document is intended for customer systems and operations personnel responsible for implementing parallel sysplex, and the IBM Systems Engineers

    who assist them. It wil l also be useful to technical managers who want to

    assess the benefits they can expect from parallel sysplex in this area.

    How This Document Is Organized

    The document is in 3 parts:

Part 1, Configuring for Continuous Availability

    This part describes how to configure both the hardware and software in

    order to eliminate planned outages and minimize the impact of unplanned

outages.

Chapter 1, Hardware Configuration

    This chapter discusses how to design a hardware configuration for

    continuous availability.

Chapter 2, System Software Configuration

    This chapter describes how to configure the system to support

    continuous availability and minimize the effort needed to maintain and

    run it.

Chapter 3, Subsystem Software Configuration

    This chapter deals with configuring the various subsystems to provide an

    environment that will support the goal of continuous availability.

    Part 2, Making Planned Changes

    This part describes how you can make changes to the sysplex without

    disrupting the running of the applications.

    Chapter 4, Systems Management in a Parallel Sysplex

    This chapter discusses the importance of maintaining good systems

    management disciplines in a parallel sysplex environment.

Chapter 5, Coupling Facility Changes

    This chapter deals with changes that can be made to the coupling

    environment, for installation, planned or unplanned maintenance.

    Chapter 6, Hardware Changes

    This chapter discusses how to add, change or remove hardware

    elements of the sysplex in a non-disruptive way.

    Chapter 7, Software Changes

    This chapter discusses how to make changes such as adding, modifying

    or removing system images and subsystems.

Chapter 8, Database Availability


    This chapter discusses subsystem (CICS, IMS, DB2) configuration options

to minimize the impact of making database changes.

Part 3, Handling Unplanned Outages

    This part describes how to handle unplanned outages and recover from error

    situations with minimal impact to the applications.

Chapter 9, Parallel Sysplex Recovery

This chapter discusses how to recover from unplanned hardware and

    software failures.

Chapter 10, Disaster Recovery Considerations

    This chapter contains a discussion of disaster recovery considerations

    specific to the parallel sysplex environment.

    Related Publications

    The publications listed in this section are considered particularly suitable for a

    more detailed discussion of the topics covered in this document.

    The publications listed are sorted in alphabetical order.

CICS/ESA Release Guide, GC33-0655

CICS VSAM Recovery Guide, SH19-6709

    CICS/ESA Dynamic Transaction Routing in a CICSPlex, SC33-1012

    CICS/ESA Version 4 Intercommunication Guide, SC33-1181

    CICS/ESA Version 4 Recovery and Restart Guide, SC33-1182

    CICS/ESA Version 4 CICS-IMS Database Control Guide, SC33-1184

Concurrent Copy Overview, GG24-3936

    DB2 Version 4 Data Sharing: Planning and Administration, SC26-3269

    DB2 Version 4 Release Guide, SC26-3394

    DCAF V1.2.1 Installation and Using Guide, SH19-6838

DFSMS/MVS V1 R3 DFSMSdfp Storage Administration Reference, SC26-4920

ES/9000 and ES/3090 PR/SM Planning Guide, GA22-7123

    ES/9000 9021 711-based Models Functional Characteristics, GA22-7144

    ES/9000 9121 511-based Models Functional Characteristics, GA24-4358

    Hardware Management Console Application Programming Interfaces,

    SC28-8141

Hardware Management Console Guide, GC38-0453

IBM CICS Transaction Affinities Utility User's Guide, SC33-1159

    IBM CICSPlex Systems Manager for MVS/ESA Concepts and Planning,

GC33-0786

    IBM Token-Ring Network Introduction and Planning Guide, GA27-3677

    IBM 3990 Storage Control Reference for Model 6, GA32-0274

IBM 9037 Sysplex Timer and System/390 Time Management, GG66-3264

Implementing Concurrent Copy, GG24-3990

    IMS/ESA Version 5 Administration Guide: Data Base, SC26-8012

    IMS/ESA Version 5 Administration Guide: System, SC26-8013

    IMS/ESA Version 5 Administration Guide: Transaction Manager, SC26-8014

    IMS/ESA V5 Operations Guide, SC26-8029

    IMS/ESA Version 5 Sample Operating Procedures, SC26-8032

    JES2 Multi-Access Spool in a Sysplex Environment, GG66-3263

    Large System Performance Reference Document, SC28-1187

    LPAR Dynamic Storage Reconfiguration, GG66-3262

MVS/ESA Hardware Configuration Definition: Planning, GC28-1445


MVS/ESA RMF User's Guide, GC33-6483

    MVS/ESA RMF V5 Getting Started on Performance Management, LY33-9176

MVS/ESA SML: Implementing System-Managed Storage, SC26-3123

MVS/ESA SP V5 Hardware Configuration Definition: User's Guide, SC33-6468

    MVS/ESA SP V5 Assembler Services Guide, GC28-1466

    MVS/ESA SP V5 Authorized Assembler Services Guide, GC28-1467

    MVS/ESA SP V5 Authorized Assembler Services Reference, Volume 2,

    GC28-1476

    MVS/ESA SP V5 Conversion Notebook, GC28-1436

    MVS/ESA SP V5 Initialization and Tuning Guide, SC28-1451

    MVS/ESA SP V5 Initialization and Tuning Reference, SC28-1452

    MVS/ESA SP V5 Installation Exits, SC28-1459

    MVS/ESA SP V5 JCL Reference, GC28-1479

    MVS/ESA SP V5 JES2 Initialization and Tuning Reference, SC28-1454

    MVS/ESA SP V5 JES2 Commands, GC28-1443

    MVS/ESA SP V5 JES3 Commands, GC28-1444

    MVS/ESA SP V5 Planning: Global Resource Serialization, GC28-1450

    MVS/ESA SP V5 Planning: Security, GC28-1439

    MVS/ESA SP V5 Planning: Operations, GC28-1441

MVS/ESA SP V5 Planning: Workload Management, GC28-1493

MVS/ESA SP V5 Programming: Assembler Services Reference, GC28-1474

    MVS/ESA SP V5 Programming: Sysplex Services Guide, GC28-1495

    MVS/ESA SP V5 Programming: Sysplex Services Reference, GC28-1496

    MVS/ESA SP V5 Setting Up a Sysplex, GC28-1449

    MVS/ESA SP V5 System Commands, GC28-1442

    MVS/ESA SP V5 Sysplex Migration Guide, SG24-4581

MVS/ESA SP V5 System Management Facilities (SMF), GC28-1457

    S/390 MVS Sysplex Application Migration, GC28-1211

S/390 MVS Sysplex Hardware and Software Migration, GC28-1210

    S/390 MVS Sysplex Overview: An Introduction to Data Sharing and

    Parallelism, GC28-1208

    S/390 MVS Sysplex Systems Management, GC28-1209

    S/390 9672/9674 Managing Your Processors, GC38-0452

    S/390 9672/9674 System Overview, GA22-7148

    SMP/E R8 Reference, SC28-1107

    Sysplex Timer Planning, GA23-0365

TSO/E V2 User's Guide, SC28-1880

    TSO/E V2 CLISTs, SC28-1876

    TSO/E V2 Customization, SC28-1872

    VTAM for MVS/ESA Version 4 Release 3 Migration Guide, GC31-6547

    International Technical Support Organization Publications

Automating CICS/ESA Operations with CICSPlex SM and NetView, GG24-4424

Batch Performance, SG24-2557

    CICS Workload Management Using CICSPlex SM And the MVS/ESA Workload

    Manager, GG24-4286

    CICS/ESA and IMS/ESA: DBCTL Migration For CICS Users, GG24-3484

    DFSMS/MVS Version 1 Release 3.0 Presentation Guide, GG24-4391

    DFSORT Release 13 Benchmark Guide, GG24-4476

    Disaster Recovery Library: Planning Guide, GG24-4210

    MVS/ESA Software Management Cookbook, GG24-3481

    MVS/ESA SP-JES2 Version 5 Implementation Guide, SG24-4583

    MVS/ESA SP-JES3 Version 5 Implementation Guide, SG24-4582


    MVS/ESA Version 5 Sysplex Migration Guide, SG24-4581

    MVS/ESA Sysplex Migration Guide, GG24-3925

    Planning for CICS Continuous Availability in an MVS/ESA Environment,

    SG24-4593

    RACF Version 2 Release 1 Installation and Implementation Guide, GG2

    RACF Version 2 Release 2 Technical Presentation Guide, GG24-2539

    Sysplex Automation and Consoles, GG24-3854

    S/390 Microprocessor Models R2 and R3 Overview, SG24-4575

    S/390 MVS Parallel Sysplex Continuous Availability Presentation Guide,

    SG24-4502

    S/390 MVS Parallel Sysplex Performance, GG24-4356

    S/390 MVS/ESA Version 5 WLM Performance Studies, SG24-4352

    Storage Performance Tools and Techniques for MVS/ESA, GG24-4045

    A complete list of International Technical Support Organization publications,

    known as redbooks, with a brief description of each, may be found in:

    International Technical Support Organization Bibliography of Redbooks,

    GG24-3070.

    To get a catalog of ITSO redbooks, VNET users may type:

    TOOLS SENDTO WTSCPOK TOOLS REDBOOKS GET REDBOOKS CATALOG

    A listing of all redbooks, sorted by category, may also be found on MKTTOOLS

    as ITSOCAT TXT. This package is updated monthly.

    How to Order ITSO Redbooks

    IBM employees in the USA may order ITSO books and CD-ROMs using

    PUBORDER. Customers in the USA may order by calling 1-800-879-2755 or by

    faxing 1-800-445-9269. Most major credit cards are accepted. Outside the

    USA, customers should contact their local IBM office. For guidance on

    ordering, send a PROFS note to BOOKSHOP at DKIBMVM1 or E-mail [email protected].

    Customers may order hardcopy ITSO books individually or in customized

sets, called BOFs, which relate to specific functions of interest. IBM

    employees and customers may also order ITSO books in online format on

    CD-ROM collections, which contain redbooks on a variety of products.

    ITSO Redbooks on the World Wide Web (WWW)

    Internet users may find information about redbooks on the ITSO World Wide Web

home page. To access the ITSO Web pages, point your Web browser to the

following URL:

    http://www.redbooks.ibm.com/redbooks

    IBM employees may access LIST3820s of redbooks as well. The internal

    Redbooks home page may be found at the following URL:

    http://w3.itsc.pok.ibm.com/redbooks/redbooks.html


    Acknowledgments

    This publication is the result of a residency conducted at the International

    Technical Support Organization, Poughkeepsie Center.

The advisor for this project was:

G. Tom Russell

International Technical Support Organization, Poughkeepsie

The authors of this document are:

Paola Bari

IBM Italy

Margaret Beal

IBM Australia

Horace Dyke

IBM Canada

Patrick Kappeler

IBM France

Paul O'Neill

IBM Nordic

Ian Waite

IBM UK


    Part 1. Configuring for Continuous Availability

    This part describes how to configure both the hardware and software in order to:

    Eliminate planned outages

    Minimize the impact of unplanned outages


    Chapter 1. Hardware Configuration

This chapter discusses how to design a hardware configuration for continuous

availability. This means eliminating all single points of failure, and making it

possible to change hardware and software without disrupting the running of the

applications.

    1.1 What Is Continuous Availability?

    When we speak about continuous availability we are really dealing with two

different but interrelated topics: high availability and continuous operations.

    High availability has to do with keeping the applications running without any

    breakdown during the planned opening hours. The way we achieve this is by a

combination of high reliability for the individual components of the system and

redundancy of components, so that even if a component fails there is another

one that can replace it.

Continuous operations, on the other hand, is about keeping the applications and

    systems running without any planned stops. This in itself would not be too big a

    problem if it were not for the opposing but equally urgent need for

responsiveness to changing business requirements. So the simplistic solution of

freezing all changes just will not do.

    What the end users increasingly require is that the applications are kept running

    without any planned or unplanned stops, and this is what we mean by continuous

availability.

    Up to now the only real solution to these requirements has been redundancy at

the system level. This is a costly solution, but organizations such as airlines that

have these requirements often have two complete systems, where one runs the

production and the other is a hot standby, and they can switch the production

    from one system to the other quickly. Then if they have an unplanned

    breakdown on the production system the standby one takes over with a

    minimum delay. Having a second system also allows them to make planned

    changes to the standby system, and then switch the production over to it when

    they are ready to bring the change into operation.

1.1.1 Parallel Sysplex and Continuous Availability

The parallel sysplex was designed to:

    Provide a single system image to the end-user of the application

Support multiple copies of the applications, and provide services for dynamic

balancing of the workload over the multiple copies

    Provide locking facilities to allow data to be shared among the multiple

    copies of the applications with integrity

    Provide services to facilitate communication between the multiple copies

    From the perspective of continuous availability, the two most important functions

    provided by a parallel sysplex are:

    Data Sharing


    Which allows multiple instances of an application running on multiple

    systems to work on the same databases simultaneously.

    Workload Balancing

    Which means that the workload can be distributed evenly across these

    multiple application instances. This is made possible by the fact that they

    can share data.

    These radically new possibilities provided by parallel sysplex change the way we

approach continuous availability.

    Today, a specific system provides the infrastructure for a major customer

    application. The loss or degradation of that system can severely impact the

customer's business.

    In the parallel sysplex environment, where multiple cooperating systems provide

    the infrastructure, the loss or degradation of one of the many identical systems

    has little impact.

    This means that we can now design a system that is fault-tolerant from both a

    hardware and software perspective, giving us the possibility of the following:

    Very High Availability

    With redundancy in both hardware and software we can eliminate

    points-of-failure, and workload balancing can ensure that the work being

    done on a lost component will be distributed across the remaining ones.

    Nondisruptive Change

    Hardware changes can be made by removing the system that needs to be

    changed from the sysplex while the applications continue to run on the

    remaining systems, making the change, and then returning the system to the

    sysplex.

Software changes can be achieved in a similar way, provided that the

changed version of the software in question can co-exist with the current

    ones in the sysplex. This coexistence (at level N and N+1) is a design

    objective of the IBM systems and subsystems that support parallel sysplex.

    This shift in philosophy changes the way we think about designing the

configuration in a parallel sysplex. In order to take advantage of (or exploit) the

    parallel sysplex there must be more than one of each hardware component, and

    the software must be designed for cloning.

    If the application requires N images in order to provide the processing capacity,

    then the system designer should provide N+1 images in the sysplex.

1.1.2 Why N+1?

When designing systems for high availability we must always consider the

possibility that a component can fail. If we build the system with redundant

    components such that, even if any component does fail, the system will continue

    to function, then we have a fault-tolerant system. We can also say that we have

    no single point of failure.

Obviously this component redundancy has a cost. The simplest, but most

expensive, solution is to duplicate everything. This is often not an economically

    viable alternative. Fortunately there are others.


    If we assume that the individual components of the system are inherently

    reliable, that is that the probability of failure is very low for each component,

    then the probability of more than one failing at any one time is extremely low,

    and can be ignored. So, if we need a number of components (N) to do a

    particular job, all we need to do is allocate one extra to allow for the possibility

    of failure, and these N+1 components give us the redundancy we need. The

    larger the number of components (N) sharing the work, the less the relative cost

    of this redundancy.
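As a worked illustration (the failure probability here is purely hypothetical,

chosen only to make the arithmetic concrete), suppose each of the N+1

components is independently unavailable with probability p. The configuration

drops below the required N components only when two or more components are

down at the same time:

\[ P(\mbox{two or more down}) = 1 - (1-p)^{N+1} - (N+1)\,p\,(1-p)^{N} \approx \binom{N+1}{2}p^{2} \]

With N+1 = 5 and p = 0.01, this is about 10 x 0.0001 = 0.001, a tenth of the 0.01

exposure of a single unduplicated component, while the redundant capacity

costs only 25% extra.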

    In other words, if we are flying in a two-engined plane and want to be safe in the

case of an engine failure, then one engine must be able to fly the plane. This

    means one of the two engines (50%) is redundant. If it is a four-engined plane

    then we want to be able to continue with three engines, so the fourth one (25%)

    is redundant.

In the same way, we have been building hardware redundancy into computer

systems for some time: multiple channels to I/O units, multiple power supplies in

the processor, and so on.

    Now with parallel sysplex we can take this concept one step further, and

    introduce N+1 redundancy in the number of machines or system images in the

    system. This allows us to configure for the failure of entire machines or system

images and still keep the system on the air.

Figure 1. Sample Parallel Sysplex Continuous Availability Configuration. The coupling facilities, sysplex timers

and all the links are duplicated to eliminate single points of failure.


    1.2 Processors

    The first prerequisite is that we have multiple processors following the N+1

    philosophy outlined above.

    1.2.1.1 CMOS-Only Sysplexes

If we are designing a configuration from scratch, using CMOS processors, then

this is just a matter of deciding what the optimal processor size is and then

    configuring N+1 identical machines, where N of these are sufficient to run the

    workload.

In theory, the larger N is (that is, the smaller the individual machines), the lower

the cost of the redundant N+1 machine.

    In practice there are counterbalancing reasons, such as the following:

    The performance overhead on the sysplex (between 0.5% and 1% for each

    extra machine).

    The extra human effort in managing more machines (which will depend on

how well the systems management procedures and tools can handle multiple

machines).

    The extra work involved in maintaining more system images (which will

    depend on how well the clones are replicated and on how well the naming

    and other installation standards support this).

    How useful small machines are in handling the workload. If there are

    components in the workload that require larger machines to perform

    satisfactorily then this will tend to reduce the number of ways we can split

    the sysplex.

1.2.1.2 Mixed Sysplexes

Very often a sysplex will be a mixture of large bipolar and smaller CMOS

    machines. This is for many installations a natural evolution from their current

    bipolar configurations and allows these machines to continue their useful life into

    the parallel sysplex world. It may also be necessary to keep these larger

    machines because parts of the workload need either the larger system image or

    the more powerful engines that these provide.

    In many cases it is not realistic to adopt a simplistic N+1 approach to these

    configurations with large machines due to the high cost of having a redundant

large processor. In any event we are often dealing here with a transition state,

where not all of the work can be partitioned on a sysplex. What we need to

consider from an availability viewpoint is the effect of the failure of each machine

in the configuration, and particularly the larger ones. We must ensure that there

is reserve capacity available to take over the essential work from that machine.

This may involve removing or reducing the priority of some other nonessential

    work.


    1.3 Coupling Facilities

    The recommended configuration of coupling facilities for availability is to have at

    least two of them, and as separate 9674s, not partitions in processors doing

    other work.

1.3.1 Separate Machines

The reason for having them as separate machines is that if a coupling facility

fails, then the structures it contains will have to be rebuilt in another coupling

facility, and this rebuild will be done using data from the coupled MVS systems.

    If you run the coupling facility in a partition in a machine which is also running

    one of the systems in the sysplex, then a hardware failure on this machine will

    not only bring down the coupling facility but also one of the sources needed to

    rebuild it. The only way to recover from this situation is to restart the whole

    sysplex.

1.3.2 How Many?

In deciding how many coupling facilities you need, the same N+1 considerations

apply as we have seen for processors. If one fails, we need to have sufficient

processor capacity and memory available in the remaining ones to rebuild the

    structures and handle the load.

    The simplest design is where we have two coupling facilities, each of which has

    enough processor power and memory to handle the entire sysplex. In normal

    production we can then distribute the structures over these, and for each

    structure specify the other CF as the alternate for rebuild in case of a failure.
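One way to express this design is in the CFRM policy, defined with the

IXCMIAPU administrative data utility. The following is a minimal sketch only; the

coupling facility names, node descriptor values (TYPE, MFG, PLANT, SEQUENCE,

and so on), structure names, and sizes are hypothetical and must be replaced

with values that match your own configuration:

   DATA TYPE(CFRM)

   DEFINE POLICY NAME(CAPOL1)

    CF NAME(CF01) TYPE(009674) MFG(IBM) PLANT(02)
       SEQUENCE(000000040104) PARTITION(1) CPCID(00)

    CF NAME(CF02) TYPE(009674) MFG(IBM) PLANT(02)
       SEQUENCE(000000040105) PARTITION(1) CPCID(00)

    STRUCTURE NAME(CACHE01)
              SIZE(16000)
              REBUILDPERCENT(20)
              PREFLIST(CF01,CF02)

    STRUCTURE NAME(CACHE02)
              SIZE(16000)
              REBUILDPERCENT(20)
              PREFLIST(CF02,CF01)

Because each structure lists both coupling facilities in its PREFLIST, in opposite

orders, the structures are spread over the two coupling facilities in normal

operation, and each can be rebuilt in the other coupling facility if its preferred

one fails.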

1.3.3 CF Links

The recommended number of CF links to each machine in the sysplex is at least

two, for availability reasons. You may need more for performance; see S/390 MVS

Parallel Sysplex Performance, GG24-4356. Note that each of these receiver links (at the

CF end) is separate. Sender links (at the MVS end) can be shared between

    partitions in a fashion similar to EMIF, so even if you have several partitions you

    will only need two links per machine for each CF you need to connect to. If you

    have an MP-machine which you plan to partition for any reason, then this means

    two links per CF on each side of the machine.

    In the coupling facility, one Intersystem Channel Adapter (fc #0014) is required

    for every two coupling links (#0007 or #0008). The Intersystem Channel Adapter

    is not hot pluggable, but the coupling links are. If you do not have a redundant

    9674 to switch the coupling load to, you may want to consider installing

    additional Intersystem Channel Adapters to allow for additional coupling links to

be installed without an outage in the future. For details on hot plugging, refer to

the S/390 9672/9674 System Overview, GA22-7148.

1.3.4 Coupling Facility Structures

There could be some planned activities that require a coupling facility shutdown.

A coupling facility cannot be treated as a normal device. It requires a particular

procedure for its structures to be deallocated by the subsystems, and the shutdown can be

disruptive or not, depending on the initial coupling facility setting and the usage

made by each different user. Here we will go through some considerations that

    can be useful in designing the coupling facility environment and making it

    possible to remove structures.


    While designing the coupling facility environment, you should consider which

structures must be relocated to an alternate coupling facility. Some subsystems

    can continue to operate without their coupling facility structure, although there

    may be a loss of performance. For example, the JES2 checkpoint can be

    relocated to DASD and the RACF structure can simply be deallocated while

coupling facility maintenance is being performed. For the remaining structures,

you must ensure that enough capacity (storage, CPU cycles, link connections,

structure IDs, and so on) exists on an alternate coupling facility to allow structures to

    be rebuilt there.

    When you set up your coupling facility configuration you should provide

    definitions that enable the structures to be moved or rebuilt; structures being

    moved to the alternate coupling facility must have the alternate coupling facility

name in the PREFLIST statement. The following is an example of how to define

    a structure that can be rebuilt:

STRUCTURE NAME(IEFAUTOS)
          SIZE(640)
          REBUILDPERCENT(20)
          PREFLIST(CF01,CF02)

For structures that will be moved (REBUILT) from the outgoing coupling facility to

an alternate coupling facility, ensure that all systems using the structures have

    connectivity to the alternate coupling facility.
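When the time comes to empty a coupling facility for planned maintenance, the

rebuild can be driven from the operator console. The following is a minimal

sketch, assuming the coupling facility to be taken out of service is named CF01:

   SETXCF START,REBUILD,CFNAME=CF01,LOCATION=OTHER

This asks XCF to rebuild all rebuild-capable structures currently in CF01 into

another coupling facility from each structure's PREFLIST. Structures whose users

do not support rebuild have to be handled through the owning subsystem's own

procedures, as discussed above.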

1.3.5 Coupling Facility Volatility/Nonvolatility

Planning a coupling facility configuration for continuous availability requires

    particular attention to the storage volatility of the coupling facility where shared

data resides. The advantages of a nonvolatile coupling facility are that if you

    lose power to a coupling facility that is configured to be nonvolatile, the coupling

    facility enters power save mode, saving the data contained in the structures.

    Continuous availability of structures can be provided by making the coupling

    facility storage contents nonvolatile.

    This can be done in different ways depending on how long a power loss we want

    to allow for:

    With a UPS

    With an optional battery backup feature

    With a UPS plus a battery backup feature

    For more details on this see 1.15.2, 9672/9674 Protection against Power

    Disturbances on page 27.

    The volatility or nonvolatility of the coupling facility is reflected by the volatility

attribute, and can be monitored by the system and subsystems to decide on

recovery actions in the case of power failure.

    There are some subsystems that are very sensitive to the status of this coupling

    facility attribute, like the system logger, and they can behave in different ways

depending on the volatility status. To set the volatility attribute, you should use

the coupling facility control code commands:

    Mode Powersave

    This is the default setup and automatically determines the volatility status of

    the coupling facility based on the presence of the battery backup feature. If


    the battery backup is installed and working, the CFCC sets its status to

nonvolatile. The battery backup feature will preserve coupling facility

    storage contents across a certain time interval (default is 10 seconds).

    Mode Non-Volatile

    This command should be used to inform the CFCC to set non-volatile status

    for its storage because a UPS is installed.

    Mode Volatile

    This command informs the CFCC to put its storage in volatile status

    irrespective of whether there is a battery or not.
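The resulting status, as MVS sees it, can be verified from a system console with

the DISPLAY CF command. A minimal sketch, using a hypothetical coupling

facility name:

   D CF,CFNAME=CF01

Among other configuration details, the output shows whether the coupling

facility storage is currently volatile or nonvolatile, which is the attribute the

subsystems described below react to.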

    There are considerations in coupling facility planning depending on the

    sensitivity of subsystem users to coupling facility volatile/nonvolatile status:

    JES2

JES2 can use a coupling facility structure for its primary checkpoint data set,

and its alternate checkpoint data set can either be in a coupling facility or on

DASD. Depending on the volatility of the coupling facility, JES2 will or will

not allow you to have both primary and secondary checkpoint data sets on

the coupling facility. (A sample checkpoint definition appears after this list.)

    Logger

    The system logger can be sensitive to the volatile/nonvolatile status of the

    coupling facility where the LOGSTREAM structures are allocated.

In particular, depending on the coupling facility status, the system logger is

able to protect its data against a double failure (MVS failure together with

the coupling facility). When you define a LOGSTREAM you can specify the

    following parameters:

    STG_DUPLEX(NO/YES)

    Specifies whether the coupling facility logstream data should be

duplexed on DASD staging data sets. You can use this specification

together with the DUPLEXMODE parameter to be configuration

    independent.

    DUPLEXMODE(COND/UNCOND)

    Specifies the conditions under which the coupling facility log data will be

    duplexed in DASD staging data sets. COND means that duplexing will be

    done only if the logstream contains a single point of failure and is

    therefore vulnerable to permanent log data loss:

- Logstream is allocated to a volatile coupling facility residing on the

    same machine as the MVS system.

- Duplexing will not be done if the coupling facility for the logstream is

nonvolatile and resides on a different machine than the MVS system.

(A logstream definition sketch appears after this list.)

    DB2

    DB2 requests of MVS that structures be allocated in a nonvolatile coupling

    facility; however, it does not prevent allocation in a volatile coupling facility.

    DB2 does issue a warning message if allocation occurs into a volatile

coupling facility. A change in volatility after allocation does not have an

    effect on your existing structures.

    The advantages of a nonvolatile coupling facility are that if you lose power to

    a coupling facility that is configured to be nonvolatile, the coupling facility


    enters power save mode, saving the data contained in the structures. When

    power is returned, there is no need to do a group restart, and there is no

    need to recover the data from the group buffer pools. For DB2 systems

    requiring high availability, nonvolatile coupling facilities are recommended.

    SMSVSAM Lock

    The coupling facility IGWLOCK00 lock structure is recommended to be

allocated in a nonvolatile coupling facility. This lock structure is used to

    enforce the protocol restrictions for VSAM RLS data sets and maintain the

    record level locks. The support requires a single CF lock structure.

    IRLM Lock

    The lock structures for IMS or DB2 locks are recommended to be allocated in

a nonvolatile coupling facility. Recovery after a power failure is faster if the

    locks are still available.

    IMS Cache Directory

    The cache directory structure for VSAM or OSAM databases can be

    allocated in a nonvolatile or volatile coupling facility.

    VTAM

    The VTAM Generic Resources structure ISTGENERIC can be allocated in

either a nonvolatile or a volatile coupling facility. VTAM has no special

    processing for handling a coupling facility volatility change.
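To illustrate the JES2 case mentioned above, checkpoint placement is controlled

by the CKPTDEF initialization statement. The following is a sketch only; the

structure name, data set name, and volume serial are hypothetical:

   CKPTDEF CKPT1=(STRNAME=JES2CKPT1,INUSE=YES),
           CKPT2=(DSNAME=SYS1.JES2.CKPT2,VOLSER=SYSPK1,INUSE=YES),
           MODE=DUPLEX,DUPLEX=ON

Here the primary checkpoint is a coupling facility structure while the alternate is

on DASD, a combination JES2 accepts regardless of the volatility of the coupling

facility.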
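The logger parameters described above are specified in the LOGR policy, again

through the IXCMIAPU administrative data utility. The following is a minimal

sketch, with hypothetical logstream and structure names:

   DATA TYPE(LOGR)

   DEFINE STRUCTURE NAME(LOGST01) LOGSNUM(1)
          MAXBUFSIZE(65276) AVGBUFSIZE(4096)

   DEFINE LOGSTREAM NAME(EXAMPLE.LOGSTREAM)
          STRUCTNAME(LOGST01)
          STG_DUPLEX(YES)
          DUPLEXMODE(COND)

With DUPLEXMODE(COND), the system logger duplexes the log data to DASD

staging data sets only while the coupling facility holding the structure constitutes

a single point of failure, so the same definition remains correct whether the

structure ends up in a volatile or a nonvolatile coupling facility.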

    1.4 Sysplex Timers

    In a multi-system sysplex it is necessary to synchronize the Time-of-Day (TOD)

    clocks in all the systems very accurately in order to maintain data integrity. If all

    the systems are in the same CPC, under PR/SM, then this is no problem as they

    are all using the same TOD clock. If the systems are spread across more than

    one CPC then the TOD clocks in all these CPCs must be synchronized using a

    single external time source, the sysplex timer.

    The IBM Sysplex Timer (9037) is a table-top unit that can synchronize the TOD

    clocks in up to 16 processors or processor sides, which are connected to it by

fiber-optic links. For full details see IBM 9037 Sysplex Timer and System/390

    Time Management, GG66-3264-00.

The sysplex cannot continue to function without the sysplex timer. If any system

    loses the timer signal, it will be fenced from the sysplex and put in an

    unrestartable wait state.

1.4.1 Duplicating

When the Expanded Availability Feature is installed, two 9037 devices, linked to

one another, provide a synchronized, redundant configuration. This ensures that

    the failure of one 9037, or a fiber optic cable, will not cause loss of time

    synchronization. It is recommended that each 9037 have its own AC power

    source, so that if one source fails, both devices are not affected.

    Note that these two timers must be within 2.2 meters of one another.

The sysplex timer attaches to the processor via the processor's Sysplex Timer

    Attachment Feature. Dual ports on the attachment feature permit redundant

    connections, so that there is no single point of failure.


1.4.2 Distance

The processors are connected to the timer by multi-mode fiber, and can be up

to three kilometers from the timer, depending on the fiber. Distances between the

    sysplex timer and CECs beyond 3,000 meters are supported by RPQ 8K1919.

    RPQ 8K1919 allows the use of single mode fiber optic (laser) links between the

processor and the 9037. To support single mode fiber on the 9037, a special

LED/laser converter has been designed, called the 9036 Model 003. The 9036-003

    is designed for use only with a 9037, and is available only as RPQ 8K1919. Two

    9036-003 extenders (two RPQs) are required between the 9037 and each sysplex

    timer attachment port on the processor.

    The si