
    International Technical Support Organization

    System/390 MVS Parallel Sysplex

    Continuous Availability SE Guide

    December 1995

    SG24-4503-00


    Take Note!

    Before using this information and the product it supports, be sure to read the general information under

Special Notices on page xvii.

    First Edition (December 1995)

    This edition applies to Version 5 Release 2 of MVS/ESA System Product (5655-068 or 5655-069).

    Order publications through your IBM representative or the IBM branch office serving your locality. Publications

    are not stocked at the address given below.

An ITSO Technical Bulletin Evaluation Form for reader's feedback appears facing Chapter 1. If the form has been

    removed, comments may be addressed to:

    IBM Corporation, International Technical Support Organization

    Dept. HYJF Mail Station P099

    522 South Road

    Poughkeepsie, New York 12601-5400

    When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any

    way it believes appropriate without incurring any obligation to you.

    Copyright International Business Machines Corporation 1995. All rights reserved.

Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication or disclosure is

subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.


    Abstract

    This document discusses how the parallel sysplex can help an installation get

    closer to a goal of continuous availability.

It is intended for customer systems and operations personnel responsible for implementing parallel sysplex, and the IBM Systems Engineers who assist them.

    It will also be useful to technical managers who want to assess the benefits they

    can expect from parallel sysplex in this area.

    The book describes how to configure both the hardware and software in order to

    eliminate planned outages and minimize the impact of unplanned outages.

    It describes how you can make hardware and software changes to the sysplex

    without disrupting the running of the applications.

    It also discusses how to handle unplanned hardware or software failures, and to

    recover from error situations with minimal impact to the applications.

    A knowledge of parallel sysplex is assumed.

    (296 pages)


    Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Special Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

How This Document Is Organized . . . . . . . . . . . . . . . . . . . . . . . . . xix

Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx

International Technical Support Organization Publications . . . . . . . . . . . xxi

ITSO Redbooks on the World Wide Web (WWW) . . . . . . . . . . . . . . . . . xxii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii

    Part 1. Configuring for Continuous Availability . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    Chapter 1. Hardware Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1 What Is Continuous Availability? . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.1 Parallel Sysplex and Continuous Availability . . . . . . . . . . . . . . . 3

1.1.2 Why N + 1? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3 Coupling Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.1 Separate Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.2 How Many? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.3 CF Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.4 Coupling Facility Structures . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3.5 Coupling Facility Volatility/Nonvolatility . . . . . . . . . . . . . . . . . . 8

1.4 Sysplex Timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4.1 Duplicating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4.2 Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.3 Setting the Time in MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.4 Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.5 I/O Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    1.5.1 ESCON Logical Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    1.6 CTCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    1.6.1 3088 and ESCON CTC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    1.6.2 Alternate CTC Configuration . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.6.3 Sharing CTC Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.6.4 IOCP Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.6.5 3088 Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.7 XCF Signalling Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.8 Data Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.9 DASD Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.9.1 RAMAC and RAMAC 2 Array Subsystems . . . . . . . . . . . . . . . . 17

1.9.2 3990 Model 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.9.3 3990 Model 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.9.4 DASD Path Recommendations . . . . . . . . . . . . . . . . . . . . . . . 17

1.9.5 3990 Model 6 ESCON Logical Path Report . . . . . . . . . . . . . . . . 18

1.10 ESCON Directors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1.10.1 ESCON Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.10.2 ESCON Director Switch Matrix . . . . . . . . . . . . . . . . . . . . . . . 19

1.11 Fiber . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.11.1 9729 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.12 Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21


    1.12.1 Hardware Management Console (HMC) . . . . . . . . . . . . . . . . . 21

    1.12.2 How Many HMCs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    1.12.3 Using HMC As an MVS Console . . . . . . . . . . . . . . . . . . . . . . 21

1.12.4 MVS Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.12.5 Master Console Considerations . . . . . . . . . . . . . . . . . . . . . . 22

1.12.6 Console Configuration Considerations . . . . . . . . . . . . . . . . . . 23

1.13 Tape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.13.1 3490 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.14 Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.14.1 VTAM CTCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.14.2 3745s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.14.3 CF Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.15 Environmental . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    1.15.1 Uninterruptible Power Supply (UPS) . . . . . . . . . . . . . . . . . . . 26

    1.15.2 9672/9674 Protection against Power Disturbances . . . . . . . . . . . 27

    Chapter 2. System Software Configuration . . . . . . . . . . . . . . . . . . . . . 29

    2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    2.2 N, N+1 in a Software Environment . . . . . . . . . . . . . . . . . . . . . . . 29

2.3 Shared SYSRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.3.1 Shared SYSRES Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.3.2 Indirect Catalog Function . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.4 Master Catalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.5 Dynamic I/O Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.5.1 Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.6 I/O Definition File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.7 Couple Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.8 JES2 Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2.8.1 JES2 Checkpoint Reconfiguration . . . . . . . . . . . . . . . . . . . . . 39

2.9 RACF Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.10 PARMLIB Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.10.1 Developing Naming Conventions . . . . . . . . . . . . . . . . . . . . . 40

2.10.2 MVS/ESA SP V5.2 Enhancements . . . . . . . . . . . . . . . . . . . . . 41

2.10.3 MVS Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    2.11 System Logger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    2.11.1 Logstream and Structure Allocation . . . . . . . . . . . . . . . . . . . 46

    2.11.2 DASD Log Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    2.11.3 Duplexing Coupling Facility Log Data . . . . . . . . . . . . . . . . . . 47

    2.11.4 DASD Staging Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 49

    2.12 System Managed Storage Considerations . . . . . . . . . . . . . . . . . . 50

    2.12.1 SMSplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

2.12.2 DFSMShsm Considerations . . . . . . . . . . . . . . . . . . . . . . . . 52

2.12.3 Continuous Availability Considerations . . . . . . . . . . . . . . . . . 52

2.12.4 RESERVE Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.13 Shared Tape Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.13.1 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

    2.13.2 Implementing Automatic Tape Switching . . . . . . . . . . . . . . . . 54

    2.14 Exploiting Dynamic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    2.14.1 Dynamic Exits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    2.14.2 Dynamic Subsystem Interface (SSI) . . . . . . . . . . . . . . . . . . . . 56

    2.14.3 Dynamic Reconfiguration of XES . . . . . . . . . . . . . . . . . . . . . 57

    2.15 Automating Sysplex Failure Management . . . . . . . . . . . . . . . . . . 57

    2.15.1 Planning for SFM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    2.15.2 The SFM Isolate Function . . . . . . . . . . . . . . . . . . . . . . . . . . 59

    2.15.3 SFM Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63


2.15.4 SFM Activation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

2.15.5 Stopping SFM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.15.6 SFM Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    2.16 Planning the Time Detection Intervals . . . . . . . . . . . . . . . . . . . . . 73

    2.16.2 Synchronous WTO(R) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

    2.17 ARM: MVS Automatic Restart Manager . . . . . . . . . . . . . . . . . . . . 79

    2.17.1 ARM Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

    2.17.2 ARM Processing Requirements . . . . . . . . . . . . . . . . . . . . . . 80

    2.17.3 Program Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    2.17.4 ARM and Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    2.18 JES3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    2.18.1 Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    2.18.2 JES3 Sysplex Considerations . . . . . . . . . . . . . . . . . . . . . . . 89

    2.18.3 JES3 Parallel Sysplex Requirements . . . . . . . . . . . . . . . . . . . 90

2.18.4 JES3 Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    2.18.5 Additional JES3 Planning Information . . . . . . . . . . . . . . . . . . 93

    Chapter 3. Subsystem Software Configuration . . . . . . . . . . . . . . . . . . . 95

    3.1 CICS V4 Transaction Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . 95

3.1.1 CICS Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

3.1.2 CICS Affinities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.1.3 File-Owning Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.1.4 Resource Definition Online (RDO) . . . . . . . . . . . . . . . . . . . . . 97

3.1.5 CSD Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.1.6 Subsystem Storage Protection . . . . . . . . . . . . . . . . . . . . . . . 98

3.1.7 Transaction Isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

    3.2 CICSPlex SM V1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

    3.2.1 CICSPlex SM Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 99

3.3 IMS Transaction Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    3.3.1 IMS Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    3.3.2 IMS RESLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

    3.3.3 IMSIDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

3.3.4 Terminal Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

3.3.5 Data Set Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3.6 IRLM Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3.7 Coupling Facility Structures . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3.8 Dynamic Update of IMS Type 2 SVC . . . . . . . . . . . . . . . . . . . 102

3.3.9 Cloning Inhibitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.4 DB2 Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.4.1 DB2 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3.4.2 DB2 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

3.4.3 Changing Structure Sizes . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4.4 DB2 Data Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4.5 IEFSSNXX Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 105

3.4.6 DB2 Subsystem Parameters . . . . . . . . . . . . . . . . . . . . . . . . 105

3.5 VSAM RLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

    3.5.1 Control Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

    3.5.2 Defining the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    3.5.3 Defining the SMSVSAM Structures . . . . . . . . . . . . . . . . . . . . 108

    3.5.4 CICS Use of System Logger . . . . . . . . . . . . . . . . . . . . . . . . 109

    3.6 TSO in a Parallel Sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    3.7 System Automation Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    3.7.1 NetView . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    3.7.2 AOC/MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

    3.7.3 OPC/ESA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111


    3.8 VTAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

3.8.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

    Part 2. Making Planned Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

    Chapter 4. Systems Management in a Parallel Sysplex . . . . . . . . . . . . . 115

4.1 The Importance of Systems Management in Parallel Sysplex . . . . . . 115

4.1.1 Change Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

4.1.2 Problem Management . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

    4.1.3 Operations Management . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    4.1.4 The Other System Management Disciplines . . . . . . . . . . . . . . 116

    4.1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    Chapter 5. Coupling Facility Changes . . . . . . . . . . . . . . . . . . . . . . . 117

5.1 Structure Attributes and Allocation . . . . . . . . . . . . . . . . . . . . . . 117

5.2 Structure and Connection Disposition . . . . . . . . . . . . . . . . . . . . . 118

5.2.1 Structure Disposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.2.2 Connection State and Disposition . . . . . . . . . . . . . . . . . . . . 119

5.3 Structure Dependence on Dumps . . . . . . . . . . . . . . . . . . . . . . . 120

5.4 To Move a Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

    5.4.1 The Structure Rebuild Process . . . . . . . . . . . . . . . . . . . . . . 121

    5.5 Altering the Size of a Structure . . . . . . . . . . . . . . . . . . . . . . . . 123

    5.6 Changing the Active CFRM Policy . . . . . . . . . . . . . . . . . . . . . . . 125

    5.7 Reformatting the CFRM Couple Data Set . . . . . . . . . . . . . . . . . . . 126

5.8 Adding a Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.8.1 To Define the Coupling Facility LPAR and Connections . . . . . . . 127

5.8.2 To Prepare the New CFRM Policy . . . . . . . . . . . . . . . . . . . . 127

5.8.3 Setting Up the Structure Exploiters . . . . . . . . . . . . . . . . . . . . 128

5.9 Servicing the Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.9.1 Concurrent Hardware Upgrades . . . . . . . . . . . . . . . . . . . . . 132

5.9.2 Concurrent LIC Upgrades . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.10 Removing a Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.11 Coupling Facility Shutdown Procedure . . . . . . . . . . . . . . . . . . . 134

5.11.1 Coupling Facility Exploiter Considerations . . . . . . . . . . . . . . 138

    5.11.2 Shutting Down the Only Coupling Facility . . . . . . . . . . . . . . . 141

    5.12 Putting a Coupling Facility Back Online . . . . . . . . . . . . . . . . . . . 142

    Chapter 6. Hardware Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1.1 Adding a Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1.2 Removing a Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

    6.1.3 Changing a Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.2 Logical Partitions (LPARs) . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.2.1 Adding an LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.2.2 Removing an LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.2.3 Changing an LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.3 I/O Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.4 ESCON Directors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5 Changing the Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5.1 Using the Sysplex Timer . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5.2 Time Changes and IMS . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    6.5.3 Time Changes and SMF . . . . . . . . . . . . . . . . . . . . . . . . . . 148

    6.5.4 Changing Time in the 9672 HMC and SE . . . . . . . . . . . . . . . . 148


    Chapter 7. Software Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    7.1 Adding a New MVS Image . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    7.1.1 Adding a New JES3 Main . . . . . . . . . . . . . . . . . . . . . . . . . 150

    7.2 Adding a New SYSRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

    7.2.1 Example JCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

    7.3 Implementing System Software Changes . . . . . . . . . . . . . . . . . . 154

7.4 Adding Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    7.4.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    7.4.2 IMS Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

    7.4.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

    7.4.4 TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.5 Starting the Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.5.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.5.2 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.5.3 IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.6 Changing Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

7.7 Moving the Workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

7.7.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

7.7.2 IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.7.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.7.4 TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

    7.7.5 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

    7.7.6 DFSMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

    7.8 Closing Down the Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . 165

    7.8.1 CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

7.8.2 IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

    7.8.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

    7.8.4 System Automation Shutdown . . . . . . . . . . . . . . . . . . . . . . 169

    7.9 Removing an MVS Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

    Chapter 8. Database Availability . . . . . . . . . . . . . . . . . . . . . . . . . . 171

8.1 VSAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

    8.1.1 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

    8.1.2 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

    8.1.3 Reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

8.2 IMS/DB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

    8.2.1 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

    8.2.2 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

    8.2.3 Reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    8.3 DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    8.3.1 Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

    8.3.2 Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    8.3.3 Reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

    Part 3. Handling Unplanned Outages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

    Chapter 9. Parallel Sysplex Recovery . . . . . . . . . . . . . . . . . . . . . . . 179

9.1 System Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.1.1 Sysplex Failure Management (SFM) . . . . . . . . . . . . . . . . . . . 179

9.1.2 Automatic Restart Management (ARM) . . . . . . . . . . . . . . . . . 179

9.1.3 What Needs to Be Done? . . . . . . . . . . . . . . . . . . . . . . . . . 179

9.2 Coupling Facility Failure Recovery . . . . . . . . . . . . . . . . . . . . . . 180

    9.3 Assessment of the Failure Condition . . . . . . . . . . . . . . . . . . . . . 185

    9.3.1 To Recognize a Structure Failure . . . . . . . . . . . . . . . . . . . . 185


    9.3.2 To Recognize a Connectivity Failure . . . . . . . . . . . . . . . . . . . 186

9.3.3 To Recognize When a Coupling Facility Becomes Volatile . . . . . . 186

9.3.4 Recovery from a Connectivity Failure . . . . . . . . . . . . . . . . . . 187

9.3.5 Recovery from a Structure Failure . . . . . . . . . . . . . . . . . . . . 188

9.4 DB2 V4 Recovery from a Coupling Facility Failure . . . . . . . . . . . . . 189

9.4.1 DB2 V4 Built-In Recovery from Connectivity Failure . . . . . . . . . 189

9.4.2 DB2 V4 Built-In Recovery from a Structure Failure . . . . . . . . . . 190

9.4.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 190

9.4.4 Manual Structure Rebuild . . . . . . . . . . . . . . . . . . . . . . . . . 190

    9.4.5 To Manually Deallocate and Reallocate a Group Buffer Pool . . . . 190

    9.4.6 To Manually Deallocate a DB2 Lock Structure . . . . . . . . . . . . . 191

    9.4.7 To Manually Deallocate a DB2 SCA Structure . . . . . . . . . . . . . 192

9.5 XCF Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . . 192

9.5.1 XCF Built-In Recovery from Connectivity or Structure Failure . . . . 192

9.5.2 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 193

9.5.3 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 193

9.5.4 Manual Deallocation of the XCF Signalling Structures . . . . . . . . 193

9.5.5 Partitioning the Sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . 193

9.6 RACF Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . 194

9.6.1 RACF Built-In Recovery from Connectivity or Structure Failure . . . 194

9.6.2 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 195

    9.6.3 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 195

    9.6.4 Manual Deallocation of RACF Structures . . . . . . . . . . . . . . . . 196

    9.7 VTAM Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . 196

    9.7.1 VTAM Built-In Recovery from Connectivity Failure . . . . . . . . . . 196

    9.7.2 VTAM Built-In Recovery from a Structure Failure . . . . . . . . . . . 196

9.7.3 The Coupling Facility Becomes Volatile . . . . . . . . . . . . . . . . . 196

9.7.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 196

9.7.5 Manual Deallocation of the VTAM GRN Structure . . . . . . . . . . . 197

9.8 IMS/DB Recovery from a Coupling Facility Failure . . . . . . . . . . . . . 197

9.8.1 IMS/DB Built-In Recovery from a Connectivity Failure . . . . . . . . 197

9.8.2 IMS/DB Built-In Recovery from a Structure Failure . . . . . . . . . . 198

9.8.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . . 198

9.8.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . . 198

9.8.5 Manual Deallocation of an IRLM Lock Structure . . . . . . . . . . . . 199

9.8.6 Manual Deallocation of an OSAM/VSAM Cache Structure . . . . . . 199

9.9 JES2 Recovery from a Coupling Facility Failure . . . . . . . . . . . . . . . 199

9.9.1 Connectivity Failure to a Checkpoint Structure . . . . . . . . . . . . 199

9.9.2 Structure Failure in a Checkpoint Structure . . . . . . . . . . . . . . 202

9.9.3 The Coupling Facility Becomes Volatile . . . . . . . . . . . . . . . . . 203

9.9.4 To Manually Move a JES2 Checkpoint . . . . . . . . . . . . . . . . . . 203

9.10 System Logger Recovery from a Coupling Facility Failure . . . . . . . . 203

9.10.1 System Logger Built-In Recovery from a Connectivity Failure . . . 203

9.10.2 System Logger Built-In Recovery from a Structure Failure . . . . . 203

9.10.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . 203

9.10.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . 204

    9.10.5 Manual Deallocation of Logstreams Structure . . . . . . . . . . . . 204

    9.11 Automatic Tape Switching Recovery from a Coupling Facility Failure . 204

    9.11.1 Automatic Tape Switching Recovery from a Connectivity Failure . 204

9.11.2 Automatic Tape Switching Built-In Recovery from a Structure Failure . . . 204

9.11.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . 204

    9.11.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . 204

    9.11.5 Consequences of Failing to Rebuild the IEFAUTOS Structure . . . 205

    9.11.6 Manual Deallocation of IEFAUTOS Structure . . . . . . . . . . . . . 205


9.12 VSAM RLS Recovery from a Coupling Facility Failure . . . . . . . . . . 205

9.12.1 SMSVSAM Built-In Recovery from a Connectivity Failure . . . . . 205

9.12.2 SMSVSAM Built-In Recovery from a Structure Failure . . . . . . . 205

9.12.3 Coupling Facility Becoming Volatile . . . . . . . . . . . . . . . . . . 206

    9.12.4 Manual Invocation of Structure Rebuild . . . . . . . . . . . . . . . . 206

    9.12.5 Manual Deallocation of SMSVSAM Structures . . . . . . . . . . . . 206

    9.13 Couple Data Set Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

    9.13.1 Sysplex (XCF) Couple Data Set Failure . . . . . . . . . . . . . . . . 206

9.13.2 Coupling Facility Resource Manager (CFRM) Couple Data Set Failure . . . 207

    9.13.3 Sysplex Failure Management (SFM) Couple Data Set Failure . . . 207

    9.13.4 Workload Manager (WLM) Couple Data Set Failure . . . . . . . . . 207

    9.13.5 Automatic Restart Manager (ARM) Couple Data Set Failure . . . . 207

    9.13.6 System Logger (LOGR) Couple Data Set Failure . . . . . . . . . . . 208

9.14 Sysplex Timer Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

9.15 Restarting IMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9.15.1 IMS/IRLM Failures Within a System . . . . . . . . . . . . . . . . . . 210

9.15.2 CEC or MVS Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9.15.3 Automating Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

9.16 Restarting DB2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

9.17 Restarting CICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

    9.17.1 CICS TOR Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

    9.17.2 CICS AOR Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

9.18 Recovering Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

    9.18.1 Recovering an Application Failure . . . . . . . . . . . . . . . . . . . 212

    9.18.2 Recovering an MVS Failure . . . . . . . . . . . . . . . . . . . . . . . 213

    9.18.3 Recovering from a Sysplex Failure . . . . . . . . . . . . . . . . . . . 213

    9.18.4 Recovering from System Logger Address Space Failure . . . . . . 213

    9.18.5 Recovering OPERLOG Failure . . . . . . . . . . . . . . . . . . . . . . 213

    9.19 Restarting an OPC/ESA Controller . . . . . . . . . . . . . . . . . . . . . . 213

    9.20 Recovering Batch Jobs under OPC/ESA Control . . . . . . . . . . . . . 214

    9.20.1 Status of Jobs on Failing CPU . . . . . . . . . . . . . . . . . . . . . . 214

9.20.2 Recovery of Jobs on a Failing CPU . . . . . . . . . . . . . . . . . . . 214

    Chapter 10. Disaster Recovery Considerations . . . . . . . . . . . . . . . . . 215

    10.1 Disasters and Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

    10.2 Disaster Recovery Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

    10.2.1 3990 Remote Copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

    10.2.2 IMS Remote Site Recovery . . . . . . . . . . . . . . . . . . . . . . . . 216

    10.2.3 CICS Recovery with CICSPlex SM . . . . . . . . . . . . . . . . . . . 217

    10.2.4 DB2 Disaster Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . 218

    Appendix A. Sample Parallel Sysplex MVS Image Members . . . . . . . . . 221

    A.1 Example Parallel Sysplex Configuration . . . . . . . . . . . . . . . . . . . 221

A.2 IPLPARM Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

A.2.1 LOADAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

A.3 PARMLIB Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

    A.3.1 IEASYMAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

    A.3.2 IEASYS00 and IEASYSAA . . . . . . . . . . . . . . . . . . . . . . . . . 224

    A.3.3 COUPLE00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

    A.3.4 JES2 Startup Procedure in SYS1.PROCLIB . . . . . . . . . . . . . . . 227

    A.3.5 J2G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

    A.3.6 J2L42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

A.4 VTAMLST Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

    A.4.1 ATCSTR42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233


    A.4.2 ATCCON42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

    A.4.3 APCIC42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

    A.4.4 APNJE42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

A.4.5 CDRM42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

A.4.6 MPC03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

A.4.7 TRL03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

A.4.8 APAPPCAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

A.5 Allocating Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

A.5.1 ALLOCJCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

    Appendix B. Structures, How to ... . . . . . . . . . . . . . . . . . . . . . . . . 241

B.1 To Gather Information on a Coupling Facility . . . . . . . . . . . . . . . . 241

B.2 To Gather Information on Structure and Connections . . . . . . . . . . . 243

B.3 To Deallocate a Structure with a Disposition of DELETE . . . . . . . . . 245

B.4 To Deallocate a Structure with a Disposition of KEEP . . . . . . . . . . . 245

    B.5 To Suppress a Connection in Active State . . . . . . . . . . . . . . . . . . 245

    B.6 To Suppress a Connection in Failed-persistent State . . . . . . . . . . . 246

B.7 To Monitor a Structure Rebuild . . . . . . . . . . . . . . . . . . . . . . . . 246

B.8 To Stop a Structure Rebuild . . . . . . . . . . . . . . . . . . . . . . . . . . 248

    B.9 To Recover from a Hang in Structure Rebuild . . . . . . . . . . . . . . . 248

    Appendix C. Examples of CFRM Policy Transitioning . . . . . . . . . . . . . . 249

C.1 Changing the Structure Definition . . . . . . . . . . . . . . . . . . . . . . . 249

C.2 Changing the Coupling Facility Definition . . . . . . . . . . . . . . . . . . 255

Appendix D. Examples of Sysplex Partitioning . . . . . . . . . . . . . . . . . . 259

D.1 Partitioning on Operator Request . . . . . . . . . . . . . . . . . . . . . . . 259

    D.2 System in Missing Status Update Condition . . . . . . . . . . . . . . . . . 260

    Appendix E. Spin Loop Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . 263

    Appendix F. Dynamic I/O Reconfiguration Procedures . . . . . . . . . . . . . 267

    F.1 Procedure to Make the System Dynamic I/O Capable . . . . . . . . . . . 267

    F.2 Procedure for Dynamic Changes . . . . . . . . . . . . . . . . . . . . . . . 270

F.3 Hardware System Area Considerations . . . . . . . . . . . . . . . . . . . 271

    F.4 Hardware System Area Expansion Factors . . . . . . . . . . . . . . . . . 272

    Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

    List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

    Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289


    Figures

1. Sample Parallel Sysplex Continuous Availability Configuration . . . . . 5

2. ESCON Logical Paths Configuration . . . . . . . . . . . . . . . . . . . . . . 13

3. CTC Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4. Recommended XCF Signalling Path Configuration . . . . . . . . . . . . . 16

5. Recommended DASD Path Configuration . . . . . . . . . . . . . . . . . . . 19

6. ICKDSF R16 ESCON Logical Path Report . . . . . . . . . . . . . . . . . . 20

    7. Console Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

8. Recommended Console Configuration . . . . . . . . . . . . . . . . . . . . 25

9. 9910 Local UPS and 9672 Rx2 and Rx3 . . . . . . . . . . . . . . . . . . . . 28

10. Indirect Catalog Function with SYSRESA . . . . . . . . . . . . . . . . . . . 31

11. Indirect Catalog Function with SYSRESB . . . . . . . . . . . . . . . . . . . 32

12. Alternate Consoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

13. Example of Failure Dependent Connection . . . . . . . . . . . . . . . . . . 48

14. Example of Failure Dependent/Independence Connections . . . . . . . . 49

15. Basic Relationship between Sysplex Name and System Group . . . . . 51

16. SMSplex Consisting of System Group and Individual System Name . . . 51

17. Isolating a Failing MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

18. INTERVAL and ISOLATETIME Relationship . . . . . . . . . . . . . . . . . . 61

19. SFM Policy with the ISOLATETIME Parameter . . . . . . . . . . . . . . . . 62

20. SFM LPARs Actions Timings . . . . . . . . . . . . . . . . . . . . . . . . . . 67

21. Sample JCL to Delete a SFM Policy . . . . . . . . . . . . . . . . . . . . . . 72

22. Figure to Show Timing Relationships . . . . . . . . . . . . . . . . . . . . . 74

    23. JES3 *I S Display Showing Non-Existent Systems . . . . . . . . . . . . . . 88

    24. JES3-Managed and Auto-Switchable Tape . . . . . . . . . . . . . . . . . . 90

25. NJE Node Definitions Portion of JES3 Init Stream . . . . . . . . . . . . . . 91

    26. Sample JES3 Proc for Use by Multiple Globals . . . . . . . . . . . . . . . 92

    27. Cloned CICSplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

    28. CICSPlex SM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

29. Sample IMS 5.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . 100

30. Sample DB2 Data Sharing Configuration . . . . . . . . . . . . . . . . . . 104

31. Sample VSAM RLS Data Sharing Configuration . . . . . . . . . . . . . . 107

    32. START Command When Adding a New JES3 Global . . . . . . . . . . . 151

33. Volume Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

34. Copy SYSRESA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

35. SMP/E ZONEEDIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

36. Add IPL Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

37. Example parallel sysplex Environment . . . . . . . . . . . . . . . . . . . 154

38. Introducing a New Software Level into the parallel sysplex . . . . . . . 155

39. Redistributing Workload on TORs . . . . . . . . . . . . . . . . . . . . . . 162

40. Redistributing Workload on AORs . . . . . . . . . . . . . . . . . . . . . . 163

41. DB2 Data Sharing Availability . . . . . . . . . . . . . . . . . . . . . . . . 168

42. Sample Checkpoint Definition . . . . . . . . . . . . . . . . . . . . . . . . . 200

43. 3990-6 Peer-to-Peer Remote Copy Configuration . . . . . . . . . . . . . 217

44. 3990-6 Extended Remote Copy Configuration . . . . . . . . . . . . . . . 218

45. IMS Remote Site Recovery Configuration . . . . . . . . . . . . . . . . . 219

46. DB2 Data Sharing Disaster Recovery Configuration . . . . . . . . . . . 220

47. Example Parallel Sysplex Configuration . . . . . . . . . . . . . . . . . . 221

    48. LOADAA Member . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

    49. IEASYMAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

    50. IEASYS00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

    51. IEASYSAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225


    52. COUPLE00 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

    53. JES2 Member in SYS1.PROCLIB . . . . . . . . . . . . . . . . . . . . . . . 227

    54. J2G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

    55. J2L42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

    56. ATCSTR42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

    57. ATCCON42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

    58. APCIC42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

    59. APNJE42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

    60. CDRM42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    61. MPC03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    62. TRL03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    63. APAPPCAA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

64. Allocating System Specific Data Sets . . . . . . . . . . . . . . . . . . . . 238

65. Coupling Facility Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

66. Structures and Connections Display . . . . . . . . . . . . . . . . . . . . . 243

67. Monitoring Structure Rebuild through Exploiters' Messages . . . . . . 246

68. Monitoring Structure Rebuild by Displaying Structure Status . . . . . . 247

69. CFRM Policy Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

70. JCL to Install a New CFRM Policy . . . . . . . . . . . . . . . . . . . . . . 252

71. Original CFRM Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

72. New CFRM Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

73. VARY OFF a System without SFM Policy Active . . . . . . . . . . . . . . 259

74. VARY OFF a System with an SFM Policy Active . . . . . . . . . . . . . . 260

75. System in Missing Status Update Condition and No Active SFM Policy . 260

76. System in Missing Status Update with an Active SFM Policy and CONNFAIL(YES) . . . 261

77. Resolution of a Spin Loop Condition . . . . . . . . . . . . . . . . . . . . 264

    78. HCD Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

    79. CONFIG Frame Fragment . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

    80. HCD Panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

81. Dynamic I/O Customization . . . . . . . . . . . . . . . . . . . . . . . . . . 270


    Tables

1. Couple Data Set Placement Recommendations . . . . . . . . . . . . . . . 37

2. JES2 Checkpoint Placement Recommendations . . . . . . . . . . . . . . . 39

3. References Containing Information on the Use of System Symbols . . . 42

4. Summary of SFM Keywords and Parameters . . . . . . . . . . . . . . . . 63

5. IMS Data Sets in Sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6. Automation Recommendations . . . . . . . . . . . . . . . . . . . . . . . . 116

7. Support of REBUILD by IBM Exploiters . . . . . . . . . . . . . . . . . . . 123

8. Support of ALTER by IBM Exploiters . . . . . . . . . . . . . . . . . . . . . 124

    9. DB2 Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

    10. Subsystem Recovery Summary Part 1 . . . . . . . . . . . . . . . . . . 182

    11. Subsystem Recovery Summary Part 2 . . . . . . . . . . . . . . . . . . 184

    12. Summary of Couple Data Sets . . . . . . . . . . . . . . . . . . . . . . . . 209


    Special Notices

This publication is intended to help customers' systems and operations personnel

    and IBM systems engineers to plan, implement and use a parallel sysplex in

order to get closer to a goal of continuous availability. It is not intended to be a

guide to implementing or using parallel sysplex as such. It only covers topics related to continuous availability.

    The information in this publication is not intended as the specification of any

    programming interfaces that are provided by MVS Version 5 or any other product

    mentioned in this redbook. See the PUBLICATIONS section of the IBM

    Programming Announcement for MVS Version 5, or other products, for more

    information about what publications are considered to be product documentation.

    References in this publication to IBM products, programs or services do not

    imply that IBM intends to make these available in all countries in which IBM

    operates. Any reference to an IBM product, program, or service is not intended

to state or imply that only IBM's product, program, or service may be used. Any

functionally equivalent program that does not infringe any of IBM's intellectual

    property rights may be used instead of the IBM product, program or service.

    Information in this book was developed in conjunction with use of the equipment

    specified, and is limited in application to those specific hardware and software

    products and levels.

    IBM may have patents or pending patent applications covering subject matter in

    this document. The furnishing of this document does not give you any license to

these patents. You can send license inquiries, in writing, to the IBM Director of

    Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.

    The information contained in this document has not been submitted to any

    formal IBM test and is distributed AS IS. The information about non-IBM

    (VENDOR) products in this manual has been supplied by the vendor and IBM

    assumes no responsibility for its accuracy or completeness. The use of this

    information or the implementation of any of these techniques is a customer

responsibility and depends on the customer's ability to evaluate and integrate

them into the customer's operational environment. While each item may have

    been reviewed by IBM for accuracy in a specific situation, there is no guarantee

    that the same or similar results wil l be obtained elsewhere. Customers

    attempting to adapt these techniques to their own environments do so at their

    own risk.

    Reference to PTF numbers that have not been released through the normal

distribution process does not imply general availability. The purpose of including these reference numbers is to alert IBM customers to specific

    information relative to the implementation of the PTF when it becomes available

    to each customer according to the normal IBM PTF distribution process.

    The following terms are trademarks of the International Business Machines

    Corporation in the United States and/or other countries:

    ACF/VTAM Advanced Peer-to-Peer Networking

    AIX APPN

    CICS CICS/ESA

    CICS/MVS CUA


DATABASE 2 DB2

DFSMS DFSMS/MVS

DFSMSdfp DFSMSdss

DFSMShsm DFSORT

Enterprise Systems Connection Architecture ES/3090

ES/9000 ESA/370

ESA/390 ESCON XDF

ESCON GDDM

Hardware Configuration Definition IBM

IMS IMS/ESA

IPDS LPDA

Magstar MVS/DFP

MVS/ESA MVS/SP

MVS/XA NetView

PR/SM Processor Resource/Systems Manager

PS/2 RACF

RAMAC RETAIN

RMF S/370

S/390 SAA

SQL/DS Sysplex Timer

System/360 System/370

System/390 Systems Application Architecture

SystemView Virtual Machine/Enterprise Systems Architecture

Virtual Machine/Extended Architecture VM/ESA

VM/XA VSE/ESA

VTAM

The following terms are trademarks of other companies:

C-bus is a trademark of Corollary, Inc.

PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license.

UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.

Windows is a trademark of Microsoft Corporation.

Other trademarks are trademarks of their respective companies.


    Preface

    This document discusses how the parallel sysplex can help an installation get

    closer to a goal of Continuous Availability.

This document is intended for customer systems and operations personnel responsible for implementing parallel sysplex, and the IBM Systems Engineers

    who assist them. It wil l also be useful to technical managers who want to

    assess the benefits they can expect from parallel sysplex in this area.

    How This Document Is Organized

    The document is in 3 parts:

Part 1, Configuring for Continuous Availability

    This part describes how to configure both the hardware and software in

    order to eliminate planned outages and minimize the impact of unplanned

outages.

Chapter 1, Hardware Configuration

    This chapter discusses how to design a hardware configuration for

    continuous availability.

Chapter 2, System Software Configuration

    This chapter describes how to configure the system to support

    continuous availability and minimize the effort needed to maintain and

    run it.

Chapter 3, Subsystem Software Configuration

    This chapter deals with configuring the various subsystems to provide an

    environment that will support the goal of continuous availability.

    Part 2, Making Planned Changes

    This part describes how you can make changes to the sysplex without

    disrupting the running of the applications.

    Chapter 4, Systems Management in a Parallel Sysplex

    This chapter discusses the importance of maintaining good systems

    management disciplines in a parallel sysplex environment.

Chapter 5, Coupling Facility Changes

    This chapter deals with changes that can be made to the coupling

    environment, for installation, planned or unplanned maintenance.

    Chapter 6, Hardware Changes

    This chapter discusses how to add, change or remove hardware

    elements of the sysplex in a non-disruptive way.

    Chapter 7, Software Changes

    This chapter discusses how to make changes such as adding, modifying

    or removing system images and subsystems.

Chapter 8, Database Availability


    This chapter discusses subsystem (CICS, IMS, DB2) configuration options

to minimize the impact of making database changes.

Part 3, Handling Unplanned Outages

    This part describes how to handle unplanned outages and recover from error

    situations with minimal impact to the applications.

Chapter 9, Parallel Sysplex Recovery

This chapter discusses how to recover from unplanned hardware and

    software failures.

Chapter 10, Disaster Recovery Considerations

    This chapter contains a discussion of disaster recovery considerations

    specific to the parallel sysplex environment.

    Related Publications

    The publications listed in this section are considered particularly suitable for a

    more detailed discussion of the topics covered in this document.

    The publications listed are sorted in alphabetical order.

CICS/ESA Release Guide, GC33-0655

CICS VSAM Recovery Guide, SH19-6709

    CICS/ESA Dynamic Transaction Routing in a CICSPlex, SC33-1012

    CICS/ESA Version 4 Intercommunication Guide, SC33-1181

    CICS/ESA Version 4 Recovery and Restart Guide, SC33-1182

    CICS/ESA Version 4 CICS-IMS Database Control Guide, SC33-1184

Concurrent Copy Overview, GG24-3936

    DB2 Version 4 Data Sharing: Planning and Administration, SC26-3269

    DB2 Version 4 Release Guide, SC26-3394

    DCAF V1.2.1 Installation and Using Guide, SH19-6838

DFSMS/MVS V1 R3 DFSMSdfp Storage Administration Reference, SC26-4920

ES/9000 and ES/3090 PR/SM Planning Guide, GA22-7123

    ES/9000 9021 711-based Models Functional Characteristics, GA22-7144

    ES/9000 9121 511-based Models Functional Characteristics, GA24-4358

    Hardware Management Console Application Programming Interfaces,

    SC28-8141

Hardware Management Console Guide, GC38-0453

IBM CICS Transaction Affinities Utility User's Guide, SC33-1159

    IBM CICSPlex Systems Manager for MVS/ESA Concepts and Planning,

GC33-0786

    IBM Token-Ring Network Introduction and Planning Guide, GA27-3677

    IBM 3990 Storage Control Reference for Model 6, GA32-0274

IBM 9037 Sysplex Timer and System/390 Time Management, GG66-3264

Implementing Concurrent Copy, GG24-3990

    IMS/ESA Version 5 Administration Guide: Data Base, SC26-8012

    IMS/ESA Version 5 Administration Guide: System, SC26-8013

    IMS/ESA Version 5 Administration Guide: Transaction Manager, SC26-8014

    IMS/ESA V5 Operations Guide, SC26-8029

    IMS/ESA Version 5 Sample Operating Procedures, SC26-8032

    JES2 Multi-Access Spool in a Sysplex Environment, GG66-3263

    Large System Performance Reference Document, SC28-1187

    LPAR Dynamic Storage Reconfiguration, GG66-3262

MVS/ESA Hardware Configuration Definition: Planning, GC28-1445


MVS/ESA RMF User's Guide, GC33-6483

    MVS/ESA RMF V5 Getting Started on Performance Management, LY33-9176

MVS/ESA SML: Implementing System-Managed Storage, SC26-3123

MVS/ESA SP V5 Hardware Configuration Definition: User's Guide, SC33-6468

    MVS/ESA SP V5 Assembler Services Guide, GC28-1466

    MVS/ESA SP V5 Authorized Assembler Services Guide, GC28-1467

    MVS/ESA SP V5 Authorized Assembler Services Reference, Volume 2,

    GC28-1476

    MVS/ESA SP V5 Conversion Notebook, GC28-1436

    MVS/ESA SP V5 Initialization and Tuning Guide, SC28-1451

    MVS/ESA SP V5 Initialization and Tuning Reference, SC28-1452

    MVS/ESA SP V5 Installation Exits, SC28-1459

    MVS/ESA SP V5 JCL Reference, GC28-1479

    MVS/ESA SP V5 JES2 Initialization and Tuning Reference, SC28-1454

    MVS/ESA SP V5 JES2 Commands, GC28-1443

    MVS/ESA SP V5 JES3 Commands, GC28-1444

    MVS/ESA SP V5 Planning: Global Resource Serialization, GC28-1450

    MVS/ESA SP V5 Planning: Security, GC28-1439

    MVS/ESA SP V5 Planning: Operations, GC28-1441

MVS/ESA SP V5 Planning: Workload Management, GC28-1493

MVS/ESA SP V5 Programming: Assembler Services Reference, GC28-1474

    MVS/ESA SP V5 Programming: Sysplex Services Guide, GC28-1495

    MVS/ESA SP V5 Programming: Sysplex Services Reference, GC28-1496

    MVS/ESA SP V5 Setting Up a Sysplex, GC28-1449

    MVS/ESA SP V5 System Commands, GC28-1442

    MVS/ESA SP V5 Sysplex Migration Guide, SG24-4581

MVS/ESA SP V5 System Management Facilities (SMF), GC28-1457

    S/390 MVS Sysplex Application Migration, GC28-1211

S/390 MVS Sysplex Hardware and Software Migration, GC28-1210

    S/390 MVS Sysplex Overview: An Introduction to Data Sharing and

    Parallelism, GC28-1208

    S/390 MVS Sysplex Systems Management, GC28-1209

    S/390 9672/9674 Managing Your Processors, GC38-0452

    S/390 9672/9674 System Overview, GA22-7148

    SMP/E R8 Reference, SC28-1107

    Sysplex Timer Planning, GA23-0365

TSO/E V2 User's Guide, SC28-1880

    TSO/E V2 CLISTs, SC28-1876

    TSO/E V2 Customization, SC28-1872

    VTAM for MVS/ESA Version 4 Release 3 Migration Guide, GC31-6547

    International Technical Support Organization Publications

Automating CICS/ESA Operations with CICSPlex SM and NetView, GG24-4424

Batch Performance, SG24-2557

    CICS Workload Management Using CICSPlex SM And the MVS/ESA Workload

    Manager, GG24-4286

    CICS/ESA and IMS/ESA: DBCTL Migration For CICS Users, GG24-3484

    DFSMS/MVS Version 1 Release 3.0 Presentation Guide, GG24-4391

    DFSORT Release 13 Benchmark Guide, GG24-4476

    Disaster Recovery Library: Planning Guide, GG24-4210

    MVS/ESA Software Management Cookbook, GG24-3481

    MVS/ESA SP-JES2 Version 5 Implementation Guide, SG24-4583

    MVS/ESA SP-JES3 Version 5 Implementation Guide, SG24-4582


    MVS/ESA Version 5 Sysplex Migration Guide, SG24-4581

    MVS/ESA Sysplex Migration Guide, GG24-3925

    Planning for CICS Continuous Availability in an MVS/ESA Environment,

    SG24-4593

    RACF Version 2 Release 1 Installation and Implementation Guide, GG2

    RACF Version 2 Release 2 Technical Presentation Guide, GG24-2539

    Sysplex Automation and Consoles, GG24-3854

    S/390 Microprocessor Models R2 and R3 Overview, SG24-4575

    S/390 MVS Parallel Sysplex Continuous Availability Presentation Guide,

    SG24-4502

    S/390 MVS Parallel Sysplex Performance, GG24-4356

    S/390 MVS/ESA Version 5 WLM Performance Studies, SG24-4352

    Storage Performance Tools and Techniques for MVS/ESA, GG24-4045

    A complete list of International Technical Support Organization publications,

    known as redbooks, with a brief description of each, may be found in:

    International Technical Support Organization Bibliography of Redbooks,

    GG24-3070.

    To get a catalog of ITSO redbooks, VNET users may type:

    TOOLS SENDTO WTSCPOK TOOLS REDBOOKS GET REDBOOKS CATALOG

    A listing of all redbooks, sorted by category, may also be found on MKTTOOLS

    as ITSOCAT TXT. This package is updated monthly.

    How to Order ITSO Redbooks

    IBM employees in the USA may order ITSO books and CD-ROMs using

    PUBORDER. Customers in the USA may order by calling 1-800-879-2755 or by

    faxing 1-800-445-9269. Most major credit cards are accepted. Outside the

    USA, customers should contact their local IBM office. For guidance on

    ordering, send a PROFS note to BOOKSHOP at DKIBMVM1 or E-mail [email protected].

    Customers may order hardcopy ITSO books individually or in customized

sets, called BOFs, which relate to specific functions of interest. IBM

    employees and customers may also order ITSO books in online format on

    CD-ROM collections, which contain redbooks on a variety of products.

    ITSO Redbooks on the World Wide Web (WWW)

    Internet users may find information about redbooks on the ITSO World Wide Web

home page. To access the ITSO Web pages, point your Web browser to the

following URL:

    http://www.redbooks.ibm.com/redbooks

    IBM employees may access LIST3820s of redbooks as well. The internal

    Redbooks home page may be found at the following URL:

    http://w3.itsc.pok.ibm.com/redbooks/redbooks.html


    Acknowledgments

    This publication is the result of a residency conducted at the International

    Technical Support Organization, Poughkeepsie Center.

The advisor for this project was:

G. Tom Russell

International Technical Support Organization, Poughkeepsie

The authors of this document are:

Paola Bari

IBM Italy

Margaret Beal

IBM Australia

Horace Dyke

IBM Canada

Patrick Kappeler

IBM France

Paul O'Neill

IBM Nordic

Ian Waite

IBM UK


    Part 1. Configuring for Continuous Availability

    This part describes how to configure both the hardware and software in order to:

    Eliminate planned outages

    Minimize the impact of unplanned outages


    Chapter 1. Hardware Configuration

This chapter discusses how to design a hardware configuration for continuous

availability. This means eliminating all single points of failure, and making it

possible to change hardware and software without disrupting the running of the

applications.

    1.1 What Is Continuous Availability?

    When we speak about continuous availability we are really dealing with two

different but interrelated topics: high availability and continuous operations.

    High availability has to do with keeping the applications running without any

    breakdown during the planned opening hours. The way we achieve this is by a

combination of high reliability for the individual components of the system and

redundancy of components, so that even if a component fails there is another

one that can replace it.

Continuous operations, on the other hand, is about keeping the applications and

    systems running without any planned stops. This in itself would not be too big a

    problem if it were not for the opposing but equally urgent need for

responsiveness to changing business requirements. So the simplistic solution of

freezing all changes just will not do.

    What the end users increasingly require is that the applications are kept running

    without any planned or unplanned stops, and this is what we mean by continuous

availability.

    Up to now the only real solution to these requirements has been redundancy at

the system level. This is a costly solution, but organizations such as airlines that

have these requirements often have two complete systems, where one runs the

production and the other is a hot standby, and they can switch the production

    from one system to the other quickly. Then if they have an unplanned

    breakdown on the production system the standby one takes over with a

    minimum delay. Having a second system also allows them to make planned

    changes to the standby system, and then switch the production over to it when

    they are ready to bring the change into operation.

1.1.1 Parallel Sysplex and Continuous Availability

The parallel sysplex was designed to:

    Provide a single system image to the end-user of the application

Support multiple copies of the applications, and provide services for dynamic

balancing of the workload over the multiple copies

    Provide locking facilities to allow data to be shared among the multiple

    copies of the applications with integrity

    Provide services to facilitate communication between the multiple copies

    From the perspective of continuous availability, the two most important functions

    provided by a parallel sysplex are:

    Data Sharing


    Which allows multiple instances of an application running on multiple

    systems to work on the same databases simultaneously.

    Workload Balancing

    Which means that the workload can be distributed evenly across these

    multiple application instances. This is made possible by the fact that they

    can share data.

    These radically new possibilities provided by parallel sysplex change the way we

approach continuous availability.

    Today, a specific system provides the infrastructure for a major customer

    application. The loss or degradation of that system can severely impact the

customer's business.

    In the parallel sysplex environment, where multiple cooperating systems provide

    the infrastructure, the loss or degradation of one of the many identical systems

    has little impact.

    This means that we can now design a system that is fault-tolerant from both a

    hardware and software perspective, giving us the possibility of the following:

    Very High Availability

    With redundancy in both hardware and software we can eliminate

    points-of-failure, and workload balancing can ensure that the work being

    done on a lost component will be distributed across the remaining ones.

    Nondisruptive Change

    Hardware changes can be made by removing the system that needs to be

    changed from the sysplex while the applications continue to run on the

    remaining systems, making the change, and then returning the system to the

    sysplex.

Software changes can be achieved in a similar way, provided that the

changed version of the software in question can co-exist with the current

    ones in the sysplex. This coexistence (at level N and N+1) is a design

    objective of the IBM systems and subsystems that support parallel sysplex.

    This shift in philosophy changes the way we think about designing the

configuration in a parallel sysplex. In order to take advantage of (or exploit) the

    parallel sysplex there must be more than one of each hardware component, and

    the software must be designed for cloning.

    If the application requires N images in order to provide the processing capacity,

    then the system designer should provide N+1 images in the sysplex.

1.1.2 Why N+1?

When designing systems for high availability we must always consider the

possibility that a component can fail. If we build the system with redundant

    components such that, even if any component does fail, the system will continue

    to function, then we have a fault-tolerant system. We can also say that we have

    no single point of failure.

Obviously this component redundancy has a cost. The simplest, but most

expensive, solution is to duplicate everything. This is often not an economically

    viable alternative. Fortunately there are others.


    If we assume that the individual components of the system are inherently

    reliable, that is that the probability of failure is very low for each component,

    then the probability of more than one failing at any one time is extremely low,

    and can be ignored. So, if we need a number of components (N) to do a

    particular job, all we need to do is allocate one extra to allow for the possibility

    of failure, and these N+1 components give us the redundancy we need. The

    larger the number of components (N) sharing the work, the less the relative cost

    of this redundancy.
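As a worked illustration (the failure probability here is purely hypothetical,

chosen only to make the arithmetic concrete), suppose each of the N+1

components is independently unavailable with probability p. The configuration

drops below the required N components only when two or more components are

down at the same time:

\[ P(\mbox{two or more down}) = 1 - (1-p)^{N+1} - (N+1)\,p\,(1-p)^{N} \approx \binom{N+1}{2}p^{2} \]

With N+1 = 5 and p = 0.01, this is about 10 x 0.0001 = 0.001, a tenth of the 0.01

exposure of a single unduplicated component, while the redundant capacity

costs only 25% extra.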

    In other words, if we are flying in a two-engined plane and want to be safe in the

case of an engine failure, then one engine must be able to fly the plane. This

    means one of the two engines (50%) is redundant. If it is a four-engined plane

    then we want to be able to continue with three engines, so the fourth one (25%)

    is redundant.

In the same way, we have been building hardware redundancy into computer

systems for some time: multiple channels to I/O units, multiple power supplies in

the processor, and so on.

    Now with parallel sysplex we can take this concept one step further, and

    introduce N+1 redundancy in the number of machines or system images in the

    system. This allows us to configure for the failure of entire machines or system

images and still keep the system on the air.

Figure 1. Sample Parallel Sysplex Continuous Availability Configuration. The coupling facilities, sysplex timers

and all the links are duplicated to eliminate single points of failure.


    1.2 Processors

    The first prerequisite is that we have multiple processors following the N+1

    philosophy outlined above.

    1.2.1.1 CMOS-Only Sysplexes

If we are designing a configuration from scratch, using CMOS processors, then

this is just a matter of deciding what the optimal processor size is and then

    configuring N+1 identical machines, where N of these are sufficient to run the

    workload.

In theory, the larger N is (that is, the smaller the individual machines), the lower

the cost of the redundant N+1 machine.

    In practice there are counterbalancing reasons, such as the following:

    The performance overhead on the sysplex (between 0.5% and 1% for each

    extra machine).

    The extra human effort in managing more machines (which will depend on

how well the systems management procedures and tools can handle multiple

machines).

    The extra work involved in maintaining more system images (which will

    depend on how well the clones are replicated and on how well the naming

    and other installation standards support this).

    How useful small machines are in handling the workload. If there are

    components in the workload that require larger machines to perform

    satisfactorily then this will tend to reduce the number of ways we can split

    the sysplex.

1.2.1.2 Mixed Sysplexes

Very often a sysplex will be a mixture of large bipolar and smaller CMOS

    machines. This is for many installations a natural evolution from their current

    bipolar configurations and allows these machines to continue their useful life into

    the parallel sysplex world. It may also be necessary to keep these larger

    machines because parts of the workload need either the larger system image or

    the more powerful engines that these provide.

    In many cases it is not realistic to adopt a simplistic N+1 approach to these

    configurations with large machines due to the high cost of having a redundant

large processor. In any event we are often dealing here with a transition state,

where not all of the work can be partitioned on a sysplex. What we need to

consider from an availability viewpoint is the effect of the failure of each machine

in the configuration, and particularly the larger ones. We must ensure that there

is reserve capacity available to take over the essential work from that machine.

This may involve removing or reducing the priority of some other nonessential

    work.


    1.3 Coupling Facilities

    The recommended configuration of coupling facilities for availability is to have at

    least two of them, and as separate 9674s, not partitions in processors doing

    other work.

1.3.1 Separate Machines

The reason for having them as separate machines is that if a coupling facility

fails, then the structures it contains will have to be rebuilt in another coupling

facility, and this rebuild will be done using data from the coupled MVS systems.

    If you run the coupling facility in a partition in a machine which is also running

    one of the systems in the sysplex, then a hardware failure on this machine will

    not only bring down the coupling facility but also one of the sources needed to

    rebuild it. The only way to recover from this situation is to restart the whole

    sysplex.

1.3.2 How Many?

In deciding how many coupling facilities you need, the same N+1 considerations

apply as we have seen for processors. If one fails, we need to have sufficient

processor capacity and memory available in the remaining ones to rebuild the

    structures and handle the load.

    The simplest design is where we have two coupling facilities, each of which has

    enough processor power and memory to handle the entire sysplex. In normal

    production we can then distribute the structures over these, and for each

    structure specify the other CF as the alternate for rebuild in case of a failure.
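One way to express this design is in the CFRM policy, defined with the

IXCMIAPU administrative data utility. The following is a minimal sketch only; the

coupling facility names, node descriptor values (TYPE, MFG, PLANT, SEQUENCE,

and so on), structure names, and sizes are hypothetical and must be replaced

with values that match your own configuration:

   DATA TYPE(CFRM)

   DEFINE POLICY NAME(CAPOL1)

    CF NAME(CF01) TYPE(009674) MFG(IBM) PLANT(02)
       SEQUENCE(000000040104) PARTITION(1) CPCID(00)

    CF NAME(CF02) TYPE(009674) MFG(IBM) PLANT(02)
       SEQUENCE(000000040105) PARTITION(1) CPCID(00)

    STRUCTURE NAME(CACHE01)
              SIZE(16000)
              REBUILDPERCENT(20)
              PREFLIST(CF01,CF02)

    STRUCTURE NAME(CACHE02)
              SIZE(16000)
              REBUILDPERCENT(20)
              PREFLIST(CF02,CF01)

Because each structure lists both coupling facilities in its PREFLIST, in opposite

orders, the structures are spread over the two coupling facilities in normal

operation, and each can be rebuilt in the other coupling facility if its preferred

one fails.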

1.3.3 CF Links

The recommended number of CF links to each machine in the sysplex is at least

two, for availability reasons. You may need more for performance; see S/390 MVS

Parallel Sysplex Performance, GG24-4356. Note that each of these receiver links (at the

CF end) is separate. Sender links (at the MVS end) can be shared between

    partitions in a fashion similar to EMIF, so even if you have several partitions you

    will only need two links per machine for each CF you need to connect to. If you

    have an MP-machine which you plan to partition for any reason, then this means

    two links per CF on each side of the machine.

    In the coupling facility, one Intersystem Channel Adapter (fc #0014) is required

    for every two coupling links (#0007 or #0008). The Intersystem Channel Adapter

    is not hot pluggable, but the coupling links are. If you do not have a redundant

    9674 to switch the coupling load to, you may want to consider installing

    additional Intersystem Channel Adapters to allow for additional coupling links to

be installed without an outage in the future. For details on hot plugging, refer to

the S/390 9672/9674 System Overview, GA22-7148.

1.3.4 Coupling Facility Structures

There could be some planned activities that require a coupling facility shutdown.

A coupling facility cannot be treated as a normal device. It requires a particular

procedure for its structures to be deallocated by the subsystems, and the shutdown can be

disruptive or not, depending on the initial coupling facility setting and the usage

made by each different user. Here we will go through some considerations that

    can be useful in designing the coupling facility environment and making it

    possible to remove structures.


    While designing the coupling facility environment, you should consider which

structures must be relocated to an alternate coupling facility. Some subsystems

    can continue to operate without their coupling facility structure, although there

    may be a loss of performance. For example, the JES2 checkpoint can be

    relocated to DASD and the RACF structure can simply be deallocated while

coupling facility maintenance is being performed. For the remaining structures,

you must ensure that enough capacity (storage, CPU cycles, link connections,

structure IDs, and so on) exists on an alternate coupling facility to allow structures to

    be rebuilt there.

    When you set up your coupling facility configuration you should provide

    definitions that enable the structures to be moved or rebuilt; structures being

    moved to the alternate coupling facility must have the alternate coupling facility

name in the PREFLIST statement. The following is an example of how to define

    a structure that can be rebuilt:

STRUCTURE NAME(IEFAUTOS)
          SIZE(640)
          REBUILDPERCENT(20)
          PREFLIST(CF01,CF02)

For structures that will be moved (REBUILT) from the outgoing coupling facility to

an alternate coupling facility, ensure that all systems using the structures have

    connectivity to the alternate coupling facility.
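When the time comes to empty a coupling facility for planned maintenance, the

rebuild can be driven from the operator console. The following is a minimal

sketch, assuming the coupling facility to be taken out of service is named CF01:

   SETXCF START,REBUILD,CFNAME=CF01,LOCATION=OTHER

This asks XCF to rebuild all rebuild-capable structures currently in CF01 into

another coupling facility from each structure's PREFLIST. Structures whose users

do not support rebuild have to be handled through the owning subsystem's own

procedures, as discussed above.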

1.3.5 Coupling Facility Volatility/Nonvolatility

Planning a coupling facility configuration for continuous availability requires

    particular attention to the storage volatility of the coupling facility where shared

data resides. The advantages of a nonvolatile coupling facility are that if you

    lose power to a coupling facility that is configured to be nonvolatile, the coupling

    facility enters power save mode, saving the data contained in the structures.

    Continuous availability of structures can be provided by making the coupling

    facility storage contents nonvolatile.

    This can be done in different ways depending on how long a power loss we want

    to allow for:

    With a UPS

    With an optional battery backup feature

    With a UPS plus a battery backup feature

    For more details on this see 1.15.2, 9672/9674 Protection against Power

    Disturbances on page 27.

    The volatility or nonvolatility of the coupling facility is reflected by the volatility

attribute, and can be monitored by the system and subsystems to decide on

recovery actions in the case of power failure.

    There are some subsystems that are very sensitive to the status of this coupling

    facility attribute, like the system logger, and they can behave in different ways

depending on the volatility status. To set the volatility attribute, you should use

the coupling facility control code commands:

    Mode Powersave

    This is the default setup and automatically determines the volatility status of

    the coupling facility based on the presence of the battery backup feature. If


    the battery backup is installed and working, the CFCC sets its status to

nonvolatile. The battery backup feature will preserve coupling facility

    storage contents across a certain time interval (default is 10 seconds).

    Mode Non-Volatile

    This command should be used to inform the CFCC to set non-volatile status

    for its storage because a UPS is installed.

    Mode Volatile

    This command informs the CFCC to put its storage in volatile status

    irrespective of whether there is a battery or not.
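The resulting status, as MVS sees it, can be verified from a system console with

the DISPLAY CF command. A minimal sketch, using a hypothetical coupling

facility name:

   D CF,CFNAME=CF01

Among other configuration details, the output shows whether the coupling

facility storage is currently volatile or nonvolatile, which is the attribute the

subsystems described below react to.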

    There are considerations in coupling facility planning depending on the

    sensitivity of subsystem users to coupling facility volatile/nonvolatile status:

    JES2

JES2 can use a coupling facility structure for its primary checkpoint data set,

and its alternate checkpoint data set can either be in a coupling facility or on

DASD. Depending on the volatility of the coupling facility, JES2 will or will

not allow you to have both primary and secondary checkpoint data sets on

the coupling facility. (A sample checkpoint definition appears after this list.)

    Logger

    The system logger can be sensitive to the volatile/nonvolatile status of the

    coupling facility where the LOGSTREAM structures are allocated.

In particular, depending on the coupling facility status, the system logger is

able to protect its data against a double failure (MVS failure together with

the coupling facility). When you define a LOGSTREAM you can specify the

    following parameters:

    STG_DUPLEX(NO/YES)

    Specifies whether the coupling facility logstream data should be

duplexed on DASD staging data sets. You can use this specification

together with the DUPLEXMODE parameter to be configuration

    independent.

    DUPLEXMODE(COND/UNCOND)

    Specifies the conditions under which the coupling facility log data will be

    duplexed in DASD staging data sets. COND means that duplexing will be

    done only if the logstream contains a single point of failure and is

    therefore vulnerable to permanent log data loss:

- Logstream is allocated to a volatile coupling facility residing on the

    same machine as the MVS system.

- Duplexing will not be done if the coupling facility for the logstream is

nonvolatile and resides on a different machine than the MVS system.

(A logstream definition sketch appears after this list.)

    DB2

    DB2 requests of MVS that structures be allocated in a nonvolatile coupling

    facility; however, it does not prevent allocation in a volatile coupling facility.

    DB2 does issue a warning message if allocation occurs into a volatile

coupling facility. A change in volatility after allocation does not have an

    effect on your existing structures.

    The advantages of a nonvolatile coupling facility are that if you lose power to

    a coupling facility that is configured to be nonvolatile, the coupling facility


    enters power save mode, saving the data contained in the structures. When

    power is returned, there is no need to do a group restart, and there is no

    need to recover the data from the group buffer pools. For DB2 systems

    requiring high availability, nonvolatile coupling facilities are recommended.

    SMSVSAM Lock

    The coupling facility IGWLOCK00 lock structure is recommended to be

allocated in a nonvolatile coupling facility. This lock structure is used to

    enforce the protocol restrictions for VSAM RLS data sets and maintain the

    record level locks. The support requires a single CF lock structure.

    IRLM Lock

    The lock structures for IMS or DB2 locks are recommended to be allocated in

a nonvolatile coupling facility. Recovery after a power failure is faster if the

    locks are still available.

    IMS Cache Directory

    The cache directory structure for VSAM or OSAM databases can be

    allocated in a nonvolatile or volatile coupling facility.

    VTAM

    The VTAM Generic Resources structure ISTGENERIC can be allocated in

either a nonvolatile or a volatile coupling facility. VTAM has no special

    processing for handling a coupling facility volatility change.
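To illustrate the JES2 case mentioned above, checkpoint placement is controlled

by the CKPTDEF initialization statement. The following is a sketch only; the

structure name, data set name, and volume serial are hypothetical:

   CKPTDEF CKPT1=(STRNAME=JES2CKPT1,INUSE=YES),
           CKPT2=(DSNAME=SYS1.JES2.CKPT2,VOLSER=SYSPK1,INUSE=YES),
           MODE=DUPLEX,DUPLEX=ON

Here the primary checkpoint is a coupling facility structure while the alternate is

on DASD, a combination JES2 accepts regardless of the volatility of the coupling

facility.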
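The logger parameters described above are specified in the LOGR policy, again

through the IXCMIAPU administrative data utility. The following is a minimal

sketch, with hypothetical logstream and structure names:

   DATA TYPE(LOGR)

   DEFINE STRUCTURE NAME(LOGST01) LOGSNUM(1)
          MAXBUFSIZE(65276) AVGBUFSIZE(4096)

   DEFINE LOGSTREAM NAME(EXAMPLE.LOGSTREAM)
          STRUCTNAME(LOGST01)
          STG_DUPLEX(YES)
          DUPLEXMODE(COND)

With DUPLEXMODE(COND), the system logger duplexes the log data to DASD

staging data sets only while the coupling facility holding the structure constitutes

a single point of failure, so the same definition remains correct whether the

structure ends up in a volatile or a nonvolatile coupling facility.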

    1.4 Sysplex Timers

    In a multi-system sysplex it is necessary to synchronize the Time-of-Day (TOD)

    clocks in all the systems very accurately in order to maintain data integrity. If all

    the systems are in the same CPC, under PR/SM, then this is no problem as they

    are all using the same TOD clock. If the systems are spread across more than

    one CPC then the TOD clocks in all these CPCs must be synchronized using a

    single external time source, the sysplex timer.

    The IBM Sysplex Timer (9037) is a table-top unit that can synchronize the TOD

    clocks in up to 16 processors or processor sides, which are connected to it by

fiber-optic links. For full details see IBM 9037 Sysplex Timer and System/390

    Time Management, GG66-3264-00.

The sysplex cannot continue to function without the sysplex timer. If any system

    loses the timer signal, it will be fenced from the sysplex and put in an

    unrestartable wait state.

1.4.1 Duplicating

When the Expanded Availability Feature is installed, two 9037 devices, linked to

one another, provide a synchronized, redundant configuration. This ensures that

    the failure of one 9037, or a fiber optic cable, will not cause loss of time

    synchronization. It is recommended that each 9037 have its own AC power

    source, so that if one source fails, both devices are not affected.

    Note that these two timers must be within 2.2 meters of one another.

The sysplex timer attaches to the processor via the processor's Sysplex Timer

    Attachment Feature. Dual ports on the attachment feature permit redundant

    connections, so that there is no single point of failure.


1.4.2 Distance

The processors are connected to the timer by multi-mode fiber, and can be up

to three kilometers from the timer, depending on the fiber. Distances between the

    sysplex timer and CECs beyond 3,000 meters are supported by RPQ 8K1919.

    RPQ 8K1919 allows the use of single mode fiber optic (laser) links between the

processor and the 9037. To support single mode fiber on the 9037, a special

LED/laser converter has been designed, called the 9036 Model 003. The 9036-003

    is designed for use only with a 9037, and is available only as RPQ 8K1919. Two

    9036-003 extenders (two RPQs) are required between the 9037 and each sysplex

    timer attachment port on the processor.

    The si