New Measurement Data Sources for MVS/XA and Its Subsystems

Abstract

H. W. Barry Merrill, PhD, President-Programmer, Merrill Consultants, Dallas, Texas 75229

Since MVS/XA 2.1.0 became available in 1984, IBM has made significant changes in MVS/XA software and hardware architecture. New data exist to measure and manage these enhancements. SMF/RMF measurement of the vector processor, extended storage, and the 3090 I/O processing enhancements will be discussed. Additional enhancements and the incompatible changes that have been announced in MVS/XA 2.2.1 will be described.

This paper is extracted from Merrill's Expanded Guide Supplement, published by SAS Institute in 1987, and examples use the names of SAS® System variables and data sets as built by the MXG® Software.

OVERVIEW

TYPE 70 CPU DATA

VECTOR FACILITY ARCHITECTURE

TYPE 71 EXTENDED STORAGE ARCHITECTURE

TYPE 72 WORKLOAD MEASUREMENT

TYPE 74 DEVICE MEASUREMENT

TYPE 78 I/O FOR 3090

TYPE 78 VIRTUAL STORAGE

MVS 2.2 ANNOUNCEMENT

1 MIP 2 MEG

TYPE70 RMF CPU Activity

There have been several significant changes to TYPE70 since 1984.

What is a CPU? Originally, a CPU was the Central Processing Unit; for some time, we have had multiple CPUs executing under the control of one operating system, and they have been called MPs, Dual Processors, APs, Dyadics, Triadics, Quads, IPL and non-IPL, and so forth. In some papers, this author has used the term multi-engine system to describe these complexes in which more than one CPU is under the control of a single copy of the operating system. Recently, IBM has introduced another term, the Central Electronic Complex (CEC), to describe the physical hardware that contains one or more CPUs (processors, engines) under the control of a single copy of the operating system. Take your pick, and take your chances!

The 3084 CPUs were numbered zero through three, but the 3090-400 CPUs are numbered one through four, and the 3090-800 CPUs probably will be numbered one through eight. MXG now creates nine variables, suffixed 0 through 8, for the CPU-specific values.

MVS 2.1.7 reversed the order in which TPI I/O is serviced. Formerly, TPI interrupts were serviced from low to high CPU number; now, the selection for handling TPI is from high to low CPU number. The TYPE70 I/O interrupt rate on 3090s is somewhat higher (5%?) than the TYPE78IO IOP activity rate. The difference may measure I/O redrive requests.

The recursive scan of the dispatcher queue before entry into the wait state, the primary cause of the low utilization effect discussed on page 98 in [1], has been significantly reduced. Instead of scanning all address spaces (including logically swapped ASIDs), the MVS 2.1.3 dispatcher scans only address spaces in memory. MVS 2.1.7 eliminates the recursive scan entirely, which has produced a measurable reduction in uncaptured CPU time. See CPUOVHTM in the RMFINTRV section of Chapter 40 in [2].

SERVICE UNITS PER SECOND

The Service Units per second value (SU_SEC, also called the Machine Dependent Constant, MDC) is now contained in the type 72 record. The table look-up in VMAC7072, which set SU_SEC from CPUTYPE and CPUVERSN, is not used after RMF 3.3, and updates to VMAC7072 with each new processor are no longer necessary. The importance of SU_SEC is unchanged, and its value can still be determined; see [1], pp 686-687.

Because the SU_SEC is derived by IBM from benchmarks (analyzed with you know who's software), the value has proven to be an excellent estimate of the processor capacity prior to the availability of actual hardware benchmarks for your own workload. A major source of confusion, however, is the apparent inconsistency in SU_SEC values between processors. For example, the SU_SEC for a 3081KX is 399, but a 3084QX value is 373.8. Since a 3084QX is simply two 3081KX's, why are they not equal? The reason is that the purpose of the MDC is to make CPU service units consistent.

When benchmarks show that a four-engine CEC records more seconds of task CPU time (type 30) than the same task recorded for a two-engine CEC, the MDC of the four-engine CEC must be smaller than the value for the two-engine CEC. The CPU time (TCB, SRB, and even uncaptured) is larger for a task in a dyadic than for one in a uni-processor. CPU time is accumulated while a task is executing, and if the task needs data that are not in cache storage, the task's CPU time continues to accumulate while the CPU competes with other CPUs for storage control. The competition (and therefore the measurable increase in CPU) grows with the number of engines. In actuality, the SU_SEC value is a single constant for an engine, and RMF derates the constant based on the number of engines currently on-line:

Processors on-line:    1      2      3      4      5      6

RMF 2.1.3 derate     100%    93%    88%    85%    80%    76%
Prior derate         100%    95%    92%    89%

An additional complication is the confusion of CPU speed with CPU capacity. In the preceding example, the 3081KX with a speed of 399 SU_SEC has two processors for a capacity of 798 SU_SEC, while the 3084QX speed of 373.8 provides a service unit capacity of 1495.2 per second. The following table provides both the speed and the capacity in raw service units for several past and present IBM processors.

CEC        MDC     SU_SEC      CEC        MDC     SU_SEC      CEC         MDC     SU_SEC
Model      Speed   Capacity    Model      Speed   Capacity    Model       Speed   Capacity

155         42       42        3090-150   451.6    451.6      3090-150E   471.9    471.9
168        151      151        3090-180   750      750        3090-180E   823.5    823.5
3033U      261.3    261.3      3090-200   712.5   1425        3090-200E   765.9   1531.8
4381-R14   200.6    401.2                                     3090-300E   724.7   2174.1
3083JX     420      420        3090-400   667.5   2670        3090-400E   700     2800
3081KX     399      798                                       3090-600E   625.9   3755.4
3084QX     373.8   1495.2

As with any set of numbers, use these with caution. Recognize that they are aimed at maintaining consistency of service units for typical work and will not always apply to all processors or all workloads. Only an actual benchmark of your workload on your processor will provide exact measures.
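The relation between speed and capacity can be sketched in a few lines. This is an illustration only, written in Python rather than SAS; the function names are hypothetical, and the engine count is inferred here from the speed and capacity pairs rather than read from any record.

```python
# Illustrative sketch: SU_SEC capacity is the per-engine speed (MDC)
# multiplied by the number of engines in the CEC.
# Function names are hypothetical; values come from the table in the text.

def su_capacity(speed_su_sec: float, engines: int) -> float:
    """Raw service-unit capacity per second for a CEC."""
    return speed_su_sec * engines


def engines_implied(speed_su_sec: float, capacity_su_sec: float) -> int:
    """Engine count implied by a speed/capacity pair (rounded to an integer)."""
    return round(capacity_su_sec / speed_su_sec)


# (speed, capacity) pairs taken from the table
cecs = {
    "3081KX": (399.0, 798.0),
    "3084QX": (373.8, 1495.2),
    "3090-600E": (625.9, 3755.4),
}

for model, (speed, capacity) in cecs.items():
    n = engines_implied(speed, capacity)
    print(model, "engines:", n, "capacity:", round(su_capacity(speed, n), 1))
```

The 3084QX has the lower per-engine speed but, with twice the engines, nearly twice the capacity of the 3081KX.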

ASID Count by State

These measures [1], pp 687-688, are good indicators of the overall system health, and they can specifically indicate constraints on the multiprogramming level (MPL) that can result from incorrect specification of MIN/MAX MPL values for each domain. Thanks to Bill Mullen's 1986 CMG presentation for the following guidelines:

READYAVG (the average number of IN and READY address spaces) should be less than 10% of INAVG. A too-high percentage of IN memory ASIDs is one result of specifying an MPL that is too high, which can cause significant queueing delays waiting for CPU dispatch. The suggested 10% value will maintain a queue (so the processor is not idle), but it will keep the delay small.

OUTAVG (the average number of OUT and READY address spaces) should not be greater than one for each active CPU engine. A higher value indicates a constraint on domain MIN/MAX values that could be examined by tracing (TYPE76) domain utilizations or by the display domain operator command.

LRDYAVG (the average number of LOGICALLY OUT and READY address spaces) should be zero. Like OUTAVG, it indicates possible domain constraints.
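The three guidelines above can be expressed as a simple check. The sketch below is in Python for illustration only; the function name and the warning wording are hypothetical, and the thresholds are the ones quoted in the guidelines.

```python
# Illustrative sketch of the three ASID-count guidelines.
# Inputs are the TYPE70 averages named in the text plus the number of
# active CPU engines; the function name and messages are hypothetical.

def asid_warnings(readyavg, inavg, outavg, lrdyavg, engines):
    """Return a list of guideline violations (empty if none)."""
    warnings = []
    # READYAVG should be less than 10% of INAVG
    if inavg > 0 and readyavg >= 0.10 * inavg:
        warnings.append("READYAVG is 10% or more of INAVG: MPL may be too high")
    # OUTAVG should not exceed one per active engine
    if outavg > engines:
        warnings.append("OUTAVG exceeds one per engine: check domain MIN/MAX")
    # LRDYAVG should be zero
    if lrdyavg > 0:
        warnings.append("LRDYAVG is nonzero: possible domain constraint")
    return warnings


print(asid_warnings(readyavg=6.0, inavg=40.0, outavg=5.0, lrdyavg=0.5, engines=4))
```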

Vector Facility Measurement

The optional vector facility is described by its on-line status (VFONn) and the affinity duration (VFAFFTMn) for each CPU engine.

Vector facility affinity time is the duration of CPUTCBTM accumulated for all tasks having affinity to the vector processor. It is captured separately for each CPU engine. When a task first uses a vector instruction, it is given affinity to only the processors that have the vector facility. It keeps this affinity until the dispatcher detects that vector instructions have not been used for a long time. (A long time is 50 milliseconds of TCB with no vector instruction.) A task that has lost its vector affinity because of non-use can regain affinity when it again issues a vector instruction. Vector affinity time, VFAFFTMn, is accumulated for each engine that has a vector processor only when a task with vector affinity is dispatched and is executing under its TCB in that engine. Only one task can accumulate VFAFFTMn at one time in one engine, although several vector tasks can be waiting to be dispatched on that same engine.

Affinity time should be used for capacity planning and cost recovery of the vector processor. There is no more vector facility capacity when those engines with a vector processor record 100% affinity time. Because affinity time is a part of CPU TCB time, 100% affinity time occurs when VFAFFTMn equals CPUTCBn and the processor is 100% active. CPUTCBn is not directly measurable (except on a UP), but MVS uncaptured CPU time and SRB CPU time cause the measured VFAFFTMn at 100% capacity to be significantly less than 100% of DURATM.

Vector affinity time is also accumulated by step in the TYPE30_4 and TYPE30_5 data set variables VPIAFFTM and VPUAFFTM. Two additional step-level variables, VPIUSETM and VPUUSETM, describe the duration of actual vector processor execution within affinity time. See the TYPE30 section in Chapter Forty in this supplement.

For additional reference material on the vector facility, see Paul Dorn's excellent Washington Systems Center technical bulletin, "MVS/SP 2.1.3 Vector Facility Enhancement Overview," GG66-0227-00, and the "3090 Vector Facility Technical Reference," GG24-3058-0.

TYPE71

RMF Paging and Swapping Activity

The existence of extended memory (originally named expanded memory) has made significant changes in the TYPE71 data set, especially with swap counting and swap possibilities. With MVS 2.1.3 or later and on a 3090 processor with extended memory, the following figure describes the possible swap events and their transitions:

[Figure: swap states and transitions. Labels recoverable from the original:]

    PHYEXT..   (G) "ext stor direct"
    logical    (E) "log swap"          logical swap in real memory
    EXTAUX..   (I) "migrated"
    PHYAUX..   (C) "aux stor direct"

The four boxes in this figure are the four states in which a swappable task can be found. The six vectors indicate the possible transitions that a task can follow among these four states. The letters in parentheses and the phrases in quotes describe in which report column (see below) RMF prints these swap rates. The names in uppercase are the prefixes of the MXG variables that measure the transitions.

SWAP.. -- swap candidates count (rate per second)

Swap candidates will initially change state by one of the following three possible transitions:

PHYEXT.. -- physically swapped directly into extended storage

PHYAUX.. -- physically swapped directly to auxiliary (DASD) storage

logical -- logically swapped in real memory.

Tasks that are logically swapped in real memory (that do not become ready before the think time limit) could then undergo one of the following two possible transitions:

LOGEXT.. -- physically swapped from logical to extended storage

LOGAUX.. -- physically swapped from logical to auxiliary storage.

Tasks that have been moved to extended storage (that do not become ready before their extended storage space is needed) will go to only one state:

EXTAUX.. -- migrated from extended storage to auxiliary storage. (Migrated pages go to real storage and then to DASD.)

There are eleven reasons why a task may be swapped. These reasons are described on pages 715-717 in [1], by the SWAP.. variable names that end with the following swap reason code suffixes:

    blank - total for all eleven swap reasons
    AS    - auxiliary storage shortage (very unlikely)
    DW    - detected wait
    EX    - exchange on recommendation value
    NQ    - enqueue exchange
    NS    - transition to non-swappable (not many)
    RS    - real pageable storage shortage (very unlikely)
    TI    - terminal input wait
    TO    - terminal output wait
    US    - unilateral - SRM overcommitted or domain MPL exceeded
    VR    - request swap (not many)
    WT    - long wait.

These eleven reason codes (plus the total for all reasons) for each of the six possible transitions are stored in seventy-two MXG swap rate (per second) variables. Each of the following swap rate variables is set to a missing value if that transition and reason code did not occur:

SWAP SWAPAS SWAPDW SWAPEX SWAPNQ SWAPNS SWAPRS SWAPTI SWAPTO SWAPUS SWAPVR SWAPWT

PHYEXT PHYEXTAS PHYEXTDW PHYEXTEX PHYEXTNQ PHYEXTNS PHYEXTRS PHYEXTTI PHYEXTTO PHYEXTUS PHYEXTVR PHYEXTWT

PHYAUX PHYAUXAS PHYAUXDW PHYAUXEX PHYAUXNQ PHYAUXNS PHYAUXRS PHYAUXTI PHYAUXTO PHYAUXUS PHYAUXVR PHYAUXWT

LOGEXT LOGEXTAS LOGEXTDW LOGEXTEX LOGEXTNQ LOGEXTNS LOGEXTRS LOGEXTTI LOGEXTTO LOGEXTUS LOGEXTVR LOGEXTWT

LOGAUX LOGAUXAS LOGAUXDW LOGAUXEX LOGAUXNQ LOGAUXNS LOGAUXRS LOGAUXTI LOGAUXTO LOGAUXUS LOGAUXVR LOGAUXWT

EXTAUX EXTAUXAS EXTAUXDW EXTAUXEX EXTAUXNQ EXTAUXNS EXTAUXRS EXTAUXTI EXTAUXTO EXTAUXUS EXTAUXVR EXTAUXWT
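The naming pattern in the list above (six prefixes, each with a blank suffix for the total plus eleven reason codes) can be generated mechanically. The following Python sketch is only an illustration of that pattern; the names `PREFIXES`, `SUFFIXES`, and `swap_variable_names` are hypothetical.

```python
# Illustrative sketch: build the seventy-two swap rate variable names
# from the six prefixes and twelve suffixes (blank = total).

PREFIXES = ["SWAP", "PHYEXT", "PHYAUX", "LOGEXT", "LOGAUX", "EXTAUX"]
SUFFIXES = ["", "AS", "DW", "EX", "NQ", "NS", "RS", "TI", "TO", "US", "VR", "WT"]


def swap_variable_names():
    """Return all prefix/suffix combinations, in the order listed in the text."""
    return [prefix + suffix for prefix in PREFIXES for suffix in SUFFIXES]


names = swap_variable_names()
print(len(names))
print(names[:12])
```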

The RMF swap placement activity report describes how SRM handles these swap requests and how effective SRM is in the way it satisfies these requests. Each row of the RMF report describes one swap reason. The following columns on the RMF report can be calculated directly from MXG variables:

Col  RMF Report Column Heading      Calculate from MXG Variables

A    total                          SWAP
B    aux stor total                 PHYAUX + LOGAUX + EXTAUX
C    aux stor direct                PHYAUX
D    aux stor via transition        LOGAUX + EXTAUX
E    log swap                       SWAP - (PHYEXT+PHYAUX)
F    log swap effective             SWAP - (PHYEXT+PHYAUX+LOGEXT+LOGAUX)
G    ext stor direct                PHYEXT
H    ext stor total                 LOGEXT + PHYEXT
I    migrated                       EXTAUX
J    ext stor effective             LOGEXT + PHYEXT - EXTAUX
K    log swap/ext stor effective    SWAP - (PHYAUX+LOGAUX+EXTAUX)
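As a worked illustration of these column formulas, the Python sketch below computes columns A through K from the six total swap rates. It is not MXG code: the function name is hypothetical, and treating a missing rate as zero is an assumption made here so that the arithmetic can proceed.

```python
# Illustrative sketch: RMF swap placement report columns A-K computed
# from the six total swap rates. Missing (None) rates are treated as
# zero, which is an assumption of this sketch.

def rmf_swap_columns(rates):
    def v(name):
        value = rates.get(name)
        return 0.0 if value is None else value

    swap, phyext, phyaux = v("SWAP"), v("PHYEXT"), v("PHYAUX")
    logext, logaux, extaux = v("LOGEXT"), v("LOGAUX"), v("EXTAUX")
    return {
        "A": swap,                                         # total
        "B": phyaux + logaux + extaux,                     # aux stor total
        "C": phyaux,                                       # aux stor direct
        "D": logaux + extaux,                              # aux stor via transition
        "E": swap - (phyext + phyaux),                     # log swap
        "F": swap - (phyext + phyaux + logext + logaux),   # log swap effective
        "G": phyext,                                       # ext stor direct
        "H": logext + phyext,                              # ext stor total
        "I": extaux,                                       # migrated
        "J": logext + phyext - extaux,                     # ext stor effective
        "K": swap - (phyaux + logaux + extaux),            # log swap/ext stor effective
    }


# Made-up rates for illustration only
cols = rmf_swap_columns({"SWAP": 10.0, "PHYEXT": 2.0, "PHYAUX": 1.0,
                         "LOGEXT": 3.0, "LOGAUX": 1.0, "EXTAUX": 0.5})
print(cols)
```

With these formulas, column B equals C + D and column K equals F + J, which gives a quick consistency check on the computed values.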

Although all of these swap rate fields measure some facet of extended storage, the most important swap rate is EXTAUX, the rate of swapping migration. A low migration swap rate indicates that extended storage pages are not being moved to DASD, and the full benefits of extended storage are being achieved. (Although the EXTAUX transition is from extended storage to auxiliary storage, the actual physical pages move first from extended storage to real storage, and then they are written out to auxiliary DASD.)

The following flow chart was originally prepared by Dr. Jeff Buzen for the BGS User's Group meeting preceding the 1986 CMG meeting; it is presented here with appreciation and minor revisions. The chart follows the decision logic within MVS that determines which transits are actually taken. The letters identify the RMF report column.

[Flow chart: swap placement decision logic. Labels recoverable from the original: Not a Logical Swap Candidate; Auxiliary Direct (C); Extended Direct (G).]

    (B) Aux Stor Total = C + D
    (K) Log Swap/Ext Stor Eff = F + J

The RMF paging activity report has changed somewhat as a result of the extended storage. The following figures identify which MXG variables are printed in which location on the RMF report.

main storage paging rates per second

                                    page       -----page in-----    -----page out----
category                            reclaims   swap      non-swap   swap      non-swap
                                    non-swap

pageable system areas (non vio)
  lpa                               LPARECLM             LPAGINS
  csa                               PVTCAREC             PVTCAIN              PVTCAOUT

address spaces
  vio                               PVTVAMR              PVTVAMI              PVTVAMO
  non vio                           PVTNPREC   PVTSPIN   PVTNPIN    PVTSPOUT  PVTNPOUT

Other columns are column or row sums or percents of these paging rates.

main storage frame counts

                        min               max               avg
total frames
  lpa                   LPAPGMN           LPAPGMX           LPAPGAV
  csa                   CSAPGMN           CSAPGMX           CSAPGAV
  private area          PRVPGMN           PRVPGMX           PRVPGAV
  unused                PVTAFCMN          PVTAFCMX          PVTAFCAV
                        --------          --------          --------
  *total                FIXEDMN+PAGBLMN   FIXEDMX+PAGBLMX   FIXEDAV+PAGBLAV

fixed frames
  sqa                   SQAFXMN           SQAFXMX           SQAFXAV
  lpa                   LPAFXMN           LPAFXMX           LPAFXAV
  **csa                 CSLPFXMN          CSLPFXMX          CSLPFXAV
  lsqa                  LSQAFXMN          LSQAFXMX          LSQAFXAV
  private area          PRVFXMN           PRVFXMX           PRVFXAV
  ***below 16 meg       FIXLOMN           FIXLOMX           FIXLOAV
                        --------          --------          --------
  total fixed           FIXEDMN           FIXEDMX           FIXEDAV

nucleus frames at end             PVTFPFN
total online system frames        PVTPOOL+PVTFPFN
number of samples                 NRSAMPLE

Notes:
  *   - This total is also equal to PVTPOOL.
  **  - CSLPFX.. (not CSAFX.. as given on page 706) is correct.
  *** - These frames below 16 meg are included in total fixed (which itself is the sum of SQA, LPA, CSA, LSQA and the private area.)

RMF Paging Activity Report - continued

local page data set slot counts

                        min         max         avg
  available slots       SLOTUNMN    SLOTUNMX    SLOTUNAV
  vio slots             SLOTVIMN    SLOTVIMX    SLOTVIAV
  non-vio slots         SLOTNVMN    SLOTNVMX    SLOTNVAV
  bad slots             SLOTNGMN    SLOTNGMX    SLOTNGAV

  total slots           SLOTLOMN    SLOTLOMX    ASMSLOTS

real storage movement total rate      PVTMVTOT
rate of page movement to es           PGMVTOEX
migration rate                        PGMIEXAU

                        min         max         avg
  available esf         AVLEXTMN    AVLEXTMX    AVLEXTAV
  high uic              HIUICMN     HIUICMX     HIUICAV
  migr age              MIGAGEMN    MIGAGEMX    MIGAGEAV

esf configuration
  installed             EXTFRMIN
  online                EXTFRMON

Extended Storage Paging Versus Swapping

The preceding discussions might lead you to believe that extended memory is only used by swapping, but that is not true. RSM, Real Storage Management, treats extended storage as an extension of real memory, and both swap and nonswap pages are moved from real storage to extended storage. By moving pages to extended storage instead of to auxiliary storage, a page-in can usually be avoided.

Control Mechanisms for Extended Storage

The Real Storage Manager (RSM) controls the utilization of extended storage by using a value called the criteria age to determine when a page should be sent to extended storage. The criteria age is specified by keywords in OPT for various types of pages and for three types of users. RSM then compares the selected criteria age value to the following two measures of real storage contention:

HIUIC -- System-high unreferenced interval count (UIC). This is the age in seconds that a page has remained unreferenced in real storage. A low value indicates contention for storage.

MIGAGE -- Migration age. This is the average age in seconds that pages have remained unreferenced in extended storage.

RSM wants to send to extended storage only those pages that are likely to be referenced again before being migrated to auxiliary storage. The decision logic depends on the type of page involved, and on the measures of HIUIC, MIGAGE, and think time:

Type of Page                               Sent to Extended Storage if

Swap                                       HIUIC + MIGAGE > Criteria Age
Terminal Wait                              HIUIC + MIGAGE - Think Time > Criteria Age
Stolen, Virtual Fetch, Page-Out Request    MIGAGE > Criteria Age
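The three tests in this table can be written as one small function. The sketch below is a Python illustration of the table only, not of RSM's internal code; the function name and the page-type labels are hypothetical, and all ages are assumed to be in seconds.

```python
# Illustrative sketch of the criteria-age test from the table.
# Page-type labels and the function name are hypothetical.

def sent_to_extended(page_type, criteria_age, migage, hiuic=0.0, think_time=0.0):
    """Return True if the table's test says the page goes to extended storage."""
    if page_type == "swap":
        return hiuic + migage > criteria_age
    if page_type == "terminal_wait":
        return hiuic + migage - think_time > criteria_age
    if page_type in ("stolen", "virtual_fetch", "page_out"):
        return migage > criteria_age
    raise ValueError("unknown page type: " + page_type)


# Made-up values for illustration
print(sent_to_extended("swap", criteria_age=100, migage=60, hiuic=50))
print(sent_to_extended("terminal_wait", criteria_age=100, migage=60,
                       hiuic=50, think_time=20))
print(sent_to_extended("stolen", criteria_age=15, migage=60))
```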

The type of user is specified by the following keywords:

    0 = nonswappable, common, or privileged
    1 = all others not in type 0 or type 2
    2 = terminal wait swap, or TSO stolen or paged-out.

Based on both page type and user type, the criteria age is selected from the following table of defaults, which also summarizes the criteria test:

OPT Keyword   Page Type                         Type of User Owning Page   Criteria
                                                   0      1      2

ESCTPOC       Changed page-out pages              100    100    100        MA > CA
ESCTPOU       Unchanged page-out pages            100    100    100        MA > CA
ESCTSTC       Changed stolen pages                  0     20     15        MA > CA
ESCTSTU       Unchanged stolen pages                0     20     15        MA > CA
ESCTSWTC      Changed swap-out pages              100    100     60        UIC+MA > CA
ESCTSWTU      Unchanged swap-out pages            100    100     60        UIC+MA > CA
ESCTSWWS      WKSET pages ready for swap-out      100    100     50        UIC+MA > CA
•             Terminal swap pages                        100               UIC+MA-TT > CA
ESCTVF        Virtual fetch pages                         15               MA > CA

Sequence of Swap Actions

When an address space that was swapped directly from real storage to auxiliary storage (PHYAUX and LOGAUX swaps) is to be swapped back in, ASM reads and writes these working set pages from DASD to real memory. This is called a single-stage swap-in.

If an address space is swapped from real storage to extended storage, RSM and SRM split the address space into primary and secondary working sets. The primary working set is the LSQA pages, fixed pages, and the first page from each virtual storage segment. (The first pages are necessary for reconstruction of the virtual segment and page tables at swap-in.) RSM manages the primary working set as one entity. The secondary working set is all other working set pages.

If the address space is later selected for migration to auxiliary storage, RSM migrates the secondary working set in groups, based on the need to replenish the extended-storage-available frame queue. ASM then writes these secondary working set pages as group requests to the page data sets. After the entire secondary working set has been selected for migration, the RSM swaps the primary working set from extended storage into real storage and then writes it out to DASD auxiliary storage.

Once an address space has been split, the RSM must return the entire primary working set to real memory before the address space can become dispatchable. If the address space has been migrated, the primary working set is swapped first, and then the secondary working set is swapped. This is called a two-stage swap-in.

Extended Storage Measurements

Given the preceding discussion, the actual measurement of extended storage is simple. The most important measure of utilization is PGMIEXAU, the actual rate of page migration from extended storage to auxiliary storage. If there is no movement of pages from extended to auxiliary, then extended storage is clearly not overcommitted. To place page movement in perspective, however, PGMIEXAU must be compared (perhaps as a percentage) to PGMVTOEX, the total page movement rate into extended storage. Note, however, that it is not possible to separate page movement or migration due to swap pages from page movement or migration due to nonswap pages. PGMVTOEX and PGMIEXAU disregard the type of page being moved when they count page movement.
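The comparison suggested above can be sketched as a percentage. The Python below is an illustration only; the function name is hypothetical, and returning no value when nothing moved into extended storage is a choice made for this sketch.

```python
# Illustrative sketch: migration rate (PGMIEXAU) expressed as a
# percentage of the page movement rate into extended storage (PGMVTOEX).

def migration_percent(pgmiexau, pgmvtoex):
    """Percent of pages moved to extended storage that later migrate to auxiliary."""
    if not pgmvtoex:          # zero or missing movement: the ratio is undefined
        return None
    return 100.0 * pgmiexau / pgmvtoex


# Made-up rates (pages per second) for illustration
print(migration_percent(pgmiexau=5.0, pgmvtoex=50.0))
```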

In current 3090 architecture, pages to be migrated are actually moved first from extended storage to real storage, and then they are physically moved from real to auxiliary storage; but that design is likely to change.

If appreciable migration from extended to auxiliary storage occurs, only then is it worthwhile to determine which swap reasons are involved by examining the eleven EXTAUX.. variables. Note that these variables count the rate that address spaces are moved or migrated, rather than counting the rate that pages are being moved.

TYPE72

RMF Workload Activity

The SU_SEC value, page 686 in [1], is no longer acquired by table look-up from CPUTYPE and CPUVERSN in TYPE70. RMF 3.3 added the value of SU_SEC in the type 72 record. SU_SEC is of critical importance because it is used to convert CPU service units into CPU seconds in TYPE72. By then subtracting the TYPE72 control performance group CPU times from the TYPE70 CPU active time, the MVS uncaptured CPU time can be measured. This is done in the RMFINTRV data set, which is described in Chapter Forty in [2]. SU_SEC also identifies the speed of the processor engine, which is important for capacity planning, and permits normalization of processor measurement for machines of different speed. See the new section on service unit measurement in Chapter Twenty-Six in [2].
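The two calculations described in this paragraph can be sketched as follows. This Python is an illustration, not the MXG implementation: the function names are hypothetical, and the `cpu_coefficient` parameter reflects the assumption that recorded CPU service units are CPU seconds multiplied by SU_SEC and by the installation's CPU service definition coefficient.

```python
# Illustrative sketch: convert CPU service units to CPU seconds with
# SU_SEC, then estimate uncaptured CPU time. Names are hypothetical;
# cpu_coefficient is an assumed installation-defined weighting.

def cpu_seconds(cpu_service_units, su_sec, cpu_coefficient=1.0):
    """CPU seconds implied by a CPU service unit count."""
    return cpu_service_units / (su_sec * cpu_coefficient)


def uncaptured_cpu_seconds(type70_active_seconds, type72_cpu_seconds):
    """TYPE70 CPU active time minus the sum of TYPE72 control PG CPU times."""
    return type70_active_seconds - sum(type72_cpu_seconds)


# Made-up values: 399,000 raw service units on a 399 SU_SEC engine
print(cpu_seconds(399000, 399.0))
print(uncaptured_cpu_seconds(900.0, [500.0, 250.0]))
```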

Domains, Performance Groups, and Periods

TYPE72 observations are created for each period of each performance group. A performance group is simply a number that is assigned to a task by the installation for resource reporting and labeling. A task keeps the same performance group for its entire execution duration (except that batch steps can change performance groups by specifying a different value of PERFORM= on an EXEC statement). The SYS1.PARMLIB IEAICSnn member (which ensures that work is controlled) defines the performance group for each task in a hierarchy that you create.

Performance groups are assigned to tasks in ICS first by subsystem:

     STC
     JES2/JES3         PGN=nn
     TSO

unless you specify performance group by job name or job class:

     TRXNAME=MYJOB     PGN=mm
     TRXCLASS=A        PGN=oo

or unless you specify performance group by userid:

     USERID=BERNIE     PGN=pp

Performance groups can be subdivided into periods so that the resources consumed by transactions or steps can be controlled. You may wish to reward the interactive users by giving them high levels of service, but you may also wish to punish the batch user who executes his stage-one SYSGEN under TSO. The periods are defined by the DUR parameter, which sets the number of service units to be permitted in each period. When a task exceeds the DUR value, the SRM detects that event and transfers the task to the next period:

   Period One          Period Two            Period Three
|--------------|-----------------------|----------------|
   DUR=300             DUR=1000
   PRTY=HI             PRTY=MED              PRTY=LOW
   DMN=5               DMN=4                 DMN=6

The performance group period defines the domain in which the task is to execute, and the domain definitions determine the priority and service level that the task receives.
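The period transitions can be illustrated with a short Python sketch. It is not SRM logic: the function name is hypothetical, and it assumes that each DUR value is the service allowed within its own period (not a cumulative total) and that the last period has no limit.

```python
# Illustrative sketch: which period a transaction is in after consuming
# a given number of service units. Assumes each DUR applies to its own
# period and that the final period is unlimited.

def period_for(service_units, durations):
    """Return the 1-based period number for the accumulated service units."""
    remaining = service_units
    for period, dur in enumerate(durations, start=1):
        if remaining <= dur:      # has not exceeded this period's DUR
            return period
        remaining -= dur
    return len(durations) + 1     # last period has no DUR


# DUR values from the figure: 300 for period one, 1000 for period two
for su in (100, 301, 1300, 1301):
    print(su, "->", period_for(su, [300, 1000]))
```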

Domains are numbered collections of tasks that are to be treated equally with regard to service-unit consumption. Note that tasks commonly change their domain during a transaction's execution.

Because the SRM chooses domains for service delivery, a large number of domains is unnecessary, unwise, and can increase the MVS overhead. Performance groups have no overhead; they are simply buckets in which the service units and transaction counts are stored when a transaction ends. Therefore, it is common, and in fact wise, to have a large number of performance groups. It is strongly suggested that you place each long-running address space (that is, each CICS, each IMS control region, each IMS message region, the RMF, and so forth) in a separate control performance group. This permits TYPE72 data to be invaluable in determining workload growth over time as well as measuring the percentage of the peak-hour resources that each workload consumes. Although separate performance groups for each long-running task are desired, all tasks that are alike (for example, IMS message regions) can be mapped to the same domain so that the SRM will treat them equally.

The ratio of RESIDTM to DURATM gives the average number of address spaces in this performance-group period (that is, in this domain) that were resident in real memory. This is commonly referred to as the multiprogramming level (MPL) for the performance-group period. For example, the ratio can be calculated for all periods of TSO performance groups to determine the average number of TSO users in real memory.

The difference between ACTIVETM and RESIDTM is the swap-in delay time. If this difference is divided by TRANS, the average swap delay (in seconds) for each transaction is known.
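Both ratios are simple enough to show directly. The Python below is an illustrative sketch with hypothetical function names; all durations are assumed to be in seconds, as in the text.

```python
# Illustrative sketch: multiprogramming level and average swap-in delay
# for one performance-group period, from the TYPE72 variables named in
# the text. Durations are assumed to be in seconds.

def average_mpl(residtm, duratm):
    """Average number of address spaces resident in real memory."""
    return residtm / duratm


def avg_swap_delay_seconds(activetm, residtm, trans):
    """Average swap-in delay per ended transaction, or None if no transactions."""
    if not trans:
        return None
    return (activetm - residtm) / trans


# Made-up values for a 900-second interval
print(average_mpl(residtm=2700.0, duratm=900.0))
print(avg_swap_delay_seconds(activetm=3000.0, residtm=2700.0, trans=600))
```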

TYPE74

RMF Device Activity

In addition to the following discussion of device measurement at the device level, see the TYPE78IO, TYPE78CF, and TYPE78CU sections in [2].

TYPE74 is the primary source of information for managing and measuring the DASD I/O environment in MVS/XA. Although TYPE74 data can also be used for some measurements of tape I/O activity, the techniques described in Chapter Fourteen are generally more appropriate for tape analysis.

MVS/XA captures data that are recorded in hardware monitors that are built into the channels. The data measured at the device level provide complete and detailed data on the I/O activity. The actual data that are recorded in type 74 records consist of a number of duration measurements. For example, the duration during which data were actually being transferred between the CPU and the device is measured by DEVCONTM (the number of seconds of device connect time during the RMF interval).

In addition to this absolute duration measure, you can examine each measure from two other useful perspectives. If you divide the measured duration by the RMF interval duration, you get the percentage of time when data transfer was occurring (PCTDVCON). If you divide the measured duration by the number of I/O operations, you get the average milliseconds of connect time (AVGCONMS) for each SIO (actually, for each start subchannel (SSCH) instruction). Because all three perspectives are useful, MXG creates a variable for each perspective in TYPE74.
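The two derived perspectives can be sketched for connect time as follows. This Python is illustrative only; the function name and the `ssch_count` parameter name are hypothetical, and DEVCONTM and the interval duration are assumed to be in seconds.

```python
# Illustrative sketch: the percentage and per-I/O perspectives derived
# from a measured duration, using connect time as the example.

def connect_perspectives(devcontm, duratm, ssch_count):
    """Return (PCTDVCON, AVGCONMS) from connect seconds, interval seconds, I/O count."""
    pctdvcon = 100.0 * devcontm / duratm
    # Average milliseconds per SSCH; undefined when no I/O occurred
    avgconms = 1000.0 * devcontm / ssch_count if ssch_count else None
    return pctdvcon, avgconms


# Made-up values: 90 connect seconds in a 900-second interval, 18,000 SSCHs
print(connect_perspectives(devcontm=90.0, duratm=900.0, ssch_count=18000))
```

The same two divisions apply to the pending, disconnected, and active durations.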

The power of TYPE74 analysis lies in the completeness of the measures of both the utilization and the delays at the device level. The next figure identifies the major durations (and their alternate perspective variables) that are created in TYPE74.

Primary MVS/XA Device Measurements

                     AVGACTMS   DEVACTTM   PCTDVACT
|------------------------------------------------------------------|
                              Active

            AVGPNDMS                   AVGDISMS        AVGCONMS
            DEVPNDTM                   DEVDISTM        DEVCONTM
            PCTDVPND                   PCTDVDIS        PCTDVCON
|-----------------------------------|---------------|--------------|
              Pending                 Disconnected     Connected

  AVGPNCHA    AVGPNCUB    AVGPNDEV
  PCTPNOTH    PCTPNCUB    PCTPNDEV
|-----------|-----------|-----------|---------------|--------------|
  Pend for    Pend for    Pend for
  Channel     Control     Device
  Path Busy   Unit Busy   Busy

Pending         Waiting for available channel path, control unit, or
                device busy. The three components of pending time exist
                only for 309x processors. Typically 1-5 ms.

Disconnected    Device in use but not transferring data:
                SEEK (arm motion), SET SECTOR (RPS), RECONNECT (RPS).
                Typically 9-27 ms.

Connected       Actual data transfer. Minimum value 1.3 ms at 4K,
                14.9 ms at 47K.

In addition to the actual hardware measurements of disconnected and connected durations, RMF samples UCB bits to calculate the additional percentage of time when the device was reserved but was not involved in an I/O operation (PCTDELAY). The sum of PCTDVDIS, PCTDVCON, and PCTDELAY is PCTDVUSE (the percentage of time that the device was in use). PCTDVUSE describes how long the device was "tied up" and could not be used to service a request from any other system. The MVS/370 measure of device utilization, DEVBUSY, was replaced by PCTDVUSE:

                    PCTDVUSE
|-----------------------------------------------|
                 Device Utilized

   AVGDISMS        AVGCONMS
   DEVDISTM        DEVCONTM
   PCTDVDIS        PCTDVCON        PCTDELAY
|---------------|--------------|----------------|
  Disconnected     Connected     Reserved with
                                 no I/O

Yet another measure is sampled by RMF: the average number of I/Os that were enqueued in IOS (that is, delayed in the input/output supervisor because an I/O was already outstanding to this device from this system). Knowing the average number of I/Os that were queued permits the calculation of the average milliseconds of IOS queue time for each SIO (AVGIOQMS). With this additional value, the average response time in milliseconds for each SIO can be determined:

                         AVGRSPMS
|----------------------------------------------------------|
            Average Response Milliseconds per SIO

  AVGIOQMS    AVGPNDMS        AVGDISMS        AVGCONMS
              DEVPNDTM        DEVDISTM        DEVCONTM
              PCTDVPND        PCTDVDIS        PCTDVCON
|-----------|---------------|---------------|--------------|
  IOS Queue   Pending         Disconnected    Connected
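The device utilization and response time sums described above can be sketched as follows. This Python is an illustration with hypothetical function names. The IOS queue time calculation assumes Little's law (average number queued divided by the I/O rate), which is an assumption of this sketch and not a statement of how RMF or MXG computes the value.

```python
# Illustrative sketch of the device-level sums described in the text.
# avg_ios_queue_ms applies Little's law as an assumption.

def pct_device_use(pctdvdis, pctdvcon, pctdelay):
    """PCTDVUSE: disconnected + connected + reserved-with-no-I/O percentages."""
    return pctdvdis + pctdvcon + pctdelay


def avg_ios_queue_ms(avg_queued, io_rate_per_sec):
    """Average IOS queue milliseconds per I/O, or None if there was no I/O."""
    if not io_rate_per_sec:
        return None
    return 1000.0 * avg_queued / io_rate_per_sec


def avg_response_ms(avgioqms, avgpndms, avgdisms, avgconms):
    """AVGRSPMS: IOS queue + pending + disconnected + connected milliseconds."""
    return avgioqms + avgpndms + avgdisms + avgconms


# Made-up values for illustration
queue_ms = avg_ios_queue_ms(avg_queued=0.1, io_rate_per_sec=20.0)
print(queue_ms)
print(avg_response_ms(queue_ms, 2.0, 15.0, 4.0))
print(pct_device_use(30.0, 8.0, 2.0))
```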

RMF Device Report

The RMF direct access device activity report column headings are shown below in lowercase; the MXG variable directly corresponding to each column is indicated in uppercase. Using

PROC PRINT SPLIT='*';

with a VARIABLES statement naming these variables would create the report.

RMF Report Column Heading      MXG Variable

dev num                        DEVNR
volume serial                  VOLSER
lcu                            LCU
device activity rate           IORATE
avg resp time                  AVGRSPMS
avg iosq time                  AVGIOQMS
avg cub delay                  AVGPNCUB
avg db delay                   AVGPNDEV
avg pend time                  AVGPNDMS
avg disc time                  AVGDISMS
avg conn time                  AVGCONMS
% dev conn                     PCTDVCON
% dev util                     PCTDVUSE
% dev resv                     PCTRESVD
avg ds open                    AVDSOPEN
% alloc                        PCTALOC
% mt pend                      MOUNT

TYPE78..

Seven different TYPE78.. data sets are created from the three subtypes of the type 78 RMF records. Subtype 2 contains virtual storage measures that create TYPE78PA, TYPE78SP, and TYPE78VS data sets. The subtypes 1 and 3 contain I/O measures that are structurally different depending on the processor on which the data are created:

      308x and 4381                      309x

MXG Data Set    Subtype         MXG Data Set    Subtype

TYPE78             1            TYPE78CF           3
TYPE78CF           1            TYPE78CU           3
                                TYPE78IO           3

Primary I/O Measurement Sources -- 308x Processors

TYPE73      CHAN PCHANBY

TYPE78CF    LCUID CHPID CU1-CU4 (TYPE73 - PCHANBY)

TYPE78      LCUID IORATE AVGENQUE PCTALLBY PCTDEFER PCTDEFDV PCTDEFCU

TYPE74      LCU DEVNR VOLSER IORATE PCTDVPND PCTDVDIS PCTDVCON AVGRSPMS

Primary I/O Measurement Sources -- 309x Processors

TYPE7810 10PIQID IOPACTRT AVGIOPQl

TYPE78CF lCUID CHPID CUI-CU4 CHPIDTKN PCTPTHBY PCTCUBSY

TYPE78CU lCUID CUHACTRT AVGCUHQl

TYPE74 lCU DEVNR VOlSER 10RATE PCTDVPND PCTDVDIS PCTDVCON AVGRSPMS
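The way a type 78 record's subtype and processor family determine the MXG data sets can be written as a small lookup. This is a hedged sketch of the mapping as described in this section only; the function name and the processor-family strings are hypothetical and are not part of MXG.

```python
from typing import List


def type78_data_sets(subtype: int, processor: str) -> List[str]:
    """Return the TYPE78.. data sets built from one type 78 subtype.

    processor is a hypothetical family label: "308x", "4381", or "309x".
    """
    if subtype == 2:
        # Virtual storage measures, independent of the processor family.
        return ["TYPE78PA", "TYPE78SP", "TYPE78VS"]
    if subtype == 1 and processor in ("308x", "4381"):
        return ["TYPE78", "TYPE78CF"]
    if subtype == 3 and processor == "309x":
        return ["TYPE78CF", "TYPE78CU", "TYPE78IO"]
    return []  # combination not described in this section


print(type78_data_sets(3, "309x"))
```

Across the three subtypes, the distinct data set names add up to the seven mentioned at the start of this section.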


RMF I/O Queueing Activity

The RMF device activity report has already been described in the preceding TYPE74 section in this supplement. The RMF I/O queueing activity report described below combines information from the TYPE78IO, TYPE78CU, and TYPE78CF data sets, which are described in more detail in their own sections in [2]. The following report is described as it exists for RMF 3.4.1 on a 309x processor:

rmf i/o queueing activity

-----from TYPE78IO-----
iop                      IOPIQID
activity rate            IOPACTRT
avg q lngth              AVGIOPQL

-----from TYPE78CU-----
lcu                      LCUID
contention rate          CUHACTRT
delay q lngth            AVGCUHQL

-----from TYPE78CF-----
% all ch path busy       note 1
control units 1 2 3 4    CU1 CU2 CU3 CU4
chan path                CHPID
chpid taken              CHPIDTKN
% cu busy                PCTCUBSY

note 1: For each TYPE78CF observation, the PCTPTHBY variable exists, and it is identical with the TYPE73 PCHANBY variable for that CHPID. The RMF queueing activity report takes the product of PCTPTHBY for each CHPID within each LCUID as the estimated probability that all channel paths are busy within this LCUID.

There is one observation for each LCUID in TYPE78CU. In TYPE78CF, there is one observation for each CHPID within each LCUID, with up to four CUs connected to that CHPID. Thus, there may be several lines from TYPE78CF for each LCUID from TYPE78CU.
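The "all channel paths busy" estimate in note 1 can be reproduced from TYPE78CF observations with a short calculation. The sketch below is an illustration under stated assumptions: the tuple layout and the sample values are hypothetical, and each PCTPTHBY percentage is converted to a fraction before the product is taken.

```python
from collections import defaultdict
from math import prod


def pct_all_paths_busy(observations):
    """Estimate the percent of time all channel paths to each LCUID are busy.

    observations: iterable of (lcuid, chpid, pctpthby) tuples, one per
    CHPID within each LCUID, with pctpthby expressed as a percentage.
    """
    fractions = defaultdict(list)
    for lcuid, _chpid, pctpthby in observations:
        fractions[lcuid].append(pctpthby / 100.0)
    # Product of the per-path busy fractions, expressed again as a percent.
    return {lcuid: 100.0 * prod(f) for lcuid, f in fractions.items()}


# Hypothetical LCU with two paths that are 50% and 20% busy.
sample = [("01A", "10", 50.0), ("01A", "24", 20.0), ("01B", "11", 30.0)]
for lcuid, pct in pct_all_paths_busy(sample).items():
    print(lcuid, round(pct, 2))
```

The product treats the paths as statistically independent, which is why the report describes the result as an estimated probability rather than a measured one.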


TYPE78CF

MVS/XA Device Configuration

There is one observation in TYPE78CF for each combination of a CHPID (Channel Path ID) within each LCUID (Logical Control Unit ID).

TYPE78CF is built from type 78 subtype 1 records if the processor is a 308x or 4381, or from the type 78 subtype 3 record for 309x processors.

Prior to 309x processors, the TYPE78CF data set was only a static description of the physical connectivity of physical control units, their channel paths, and their logical control units. The 309x machines have added four important utilization statistics: the percentage that all channel paths were busy, the rate at which I/Os were taken on this CHPID for this LCUID, the percentage that all paths were busy to this LCUID, and the percentage that physical control units were busy on this CHPID LCUID combination. While TYPE74 still remains the primary source of I/O delay measurement, once a candidate problem device has been identified, TYPE78CF can provide amplification of potential path problems.

TYPE78CU

MVS/XA 309x Control Unit Header Queue Statistics

There is one observation in TYPE78CU for each LCUID (Logical Control Unit ID).

TYPE78CU is built from the type 78 subtype 3 record for 309x processors only. It describes the activity and queueing for the control unit header queue.

TYPE78IO

MVS/XA 309x Input-Output Processor Initiative Queue

There is one observation in TYPE78IO for each IOP.

TYPE78IO is built from the type 78 subtype 3 record for 309x processors only. It describes the activity and queueing for the IOP initiative queue.


TYPE78VS

MVS/XA Virtual Storage Statistics

Some virtual storage measures in TYPE78VS are contained in sets of ten variables whose suffix (0-9) identifies the content of the variable. Some virtual storage measures have a suffix of 0-4 because that area exists only below the 16MB line. It is the area below 16MB that is most critical because many system functions can exist only below the 16MB line.

------------------ Below 16MB ------------------   ------------------ Above 16MB ------------------
min      time of   max      time of   avg          min      time of   max      time of   avg
value    min       value    max       value        value    min       value    max       value

0        1         2        3         4            5        6         7        8         9

The time stamps (MXG suffixes 1, 3, 6, and 8) are correct in the type 78 record (and, thus, in MXG), but a close examination of the RMF report will show time stamps that are earlier than the beginning of the RMF interval (even with RMF 3.4.1). The error results from truncation in an internal calculation in the IBM RMF report program. The RMF printed value is typically 1 to 4 seconds earlier than the correct value. This error has been reported to IBM.
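The suffix convention can be captured in a small decoder. This is a minimal sketch, assuming the 0-9 layout described in this section; the function and table names are hypothetical and are not part of MXG.

```python
# Meaning of each position within a group of five suffixes.
SUFFIX_MEANING = {
    0: "minimum value",
    1: "time of minimum",
    2: "maximum value",
    3: "time of maximum",
    4: "average value",
}


def decode_suffix(suffix):
    """Return (area, meaning) for a TYPE78VS variable suffix 0-9."""
    if not 0 <= suffix <= 9:
        raise ValueError("suffix must be between 0 and 9")
    area = "below 16MB" if suffix < 5 else "above 16MB"
    return area, SUFFIX_MEANING[suffix % 5]


# Suffixes that hold time stamps rather than storage values.
time_stamp_suffixes = [s for s in range(10) if decode_suffix(s)[1].startswith("time")]
print(decode_suffix(7), time_stamp_suffixes)
```

A decoder like this is convenient when labeling the ten variables of a set in a report, since only the prefix differs between measures.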


The RMF virtual storage activity report from TYPE78VS data is shown below. Report columns are in lowercase, and MXG names are in uppercase.

static storage map

area       address     size

epvt       PRVTADHI    PRVTSZHI
ecsa       CSAADHI     CSASZHI
emlpa      MLPAADHI    MLPASZHI
eflpa      FLPAADHI    FLPASZHI
eplpa      PLPAADHI    PLPASZHI
esqa       SQAADHI     SQASZHI
enuc       NUCLADHI    NUCLSZHI
-----------16 meg boundary-----------
nucleus
sqa        SQAADLO     SQASZLO
plpa       PLPAADLO    PLPASZLO
flpa       FLPAADLO    FLPASZLO
mlpa       MLPAADLO    MLPASZLO
csa        CSAADLO     CSASZLO
private    PRVTADLO    PRVTSZLO
psa        zero        PRVTADLO

allocated csa/sqa

                          ---below 16M---   ---above 16M---
                          min max avg       min max avg

sqa                       SQAUSED,0-4       SQAUSED,5-9
csa                       CSAUSED,0-4       CSAUSED,5-9

allocated csa by key
  0                       CSAUSE0,0-4       CSAUSE0,5-9
  1                       CSAUSE1,0-4       CSAUSE1,5-9
  2                       CSAUSE2,0-4       CSAUSE2,5-9
  3                       CSAUSE3,0-4       CSAUSE3,5-9
  4                       CSAUSE4,0-4       CSAUSE4,5-9
  5                       CSAUSE5,0-4       CSAUSE5,5-9
  6                       CSAUSE6,0-4       CSAUSE6,5-9
  7                       CSAUSE7,0-4       CSAUSE7,5-9
  8-F                     CSAUSE8,0-4       CSAUSE8,5-9

sqa expansion into csa    SQAEXPN,0-4       SQAEXPN,5-9

plpa intermodule space                 IMPLSPLO in plpa and IMPLSPHI in eplpa
plpa space redundant with mlpa/flpa    PSRMSPLO in plpa and PSRMSPHI in eplpa

                          ---below 16M---   ---above 16M---
                          min max avg       min max avg

csa
  free pages (bytes)      CSAFREE,0-4       CSAFREE,5-9
  largest free block      CSALARG,0-4       CSALARG,5-9
  allocated area size     CSAALOC,0-4       CSAALOC,5-9

sqa
  free pages (bytes)      SQAFREE,0-4       SQAFREE,5-9
  largest free block      SQALARG,0-4       SQALARG,5-9
  allocated area size     SQAALOC,0-4       SQAALOC,5-9

maximum possible user region -- USERSPLO below and USERSPHI above


MVS 2.2 ANNOUNCEMENT

MVS/XA 2.2.0 (JES2) and 2.2.1 (JES3) and associated RMF 3.5 have been announced for 3rd quarter 1987 availability. This discussion is based only on the IBM Announcement of those products.

NEW SMF RECORDS

36    - ICF (Integrated Catalog Facility) Export or Import
41    - DIV (Data In Virtual) Access
90.17 - Subtype for Set PFK

NEW SMF DATA

The type 30 record will contain the number of Data In Virtual (DIV) rereads in the step and job termination records and in the interval records.

The type 72 performance group period data will contain (finally!) paging statistics for each period of each performance group.

If the type 4 and 34 SMF records are not already dead, MVS 2.2 is their final poison. The Task I/O Table (TIOT) is expanded from 32KB to 64KB in MVS 2.2, which allows for 3273 DDs (compared with only 1635 DDs). The type 4 and 34 SMF records do not support more than 1635 DDs, but instead contain a "not here" bit, indicating that the step had more than 1635 DDs and that I/O counts for the excess will only be found in the type 30 records.

The Selectable Unit Installed bit string in the type 70 record does not exist in MVS 2.2; it had not been updated since MVS/370.

NEW SMF RECORD SUBTYPE SELECTIVITY

A bit in record types 30, 32, and 70-79 allows selectivity of record subtypes. The selectivity can be specified in SYS1.PARMLIB (TYPE,SUBTYPE) to selectively create records, or it can be specified in parameters to the IFASMFDP program to select by subtype when dumping.

INCOMPATIBLE CHANGES

IBM has stated that programs that do not use the relocatable format information in the type 30, 72, and 79 records will be affected by incompatible changes to those records.

The documentation of these new changes in SMF data was released in April, 1987, in GC28-1411 MVS 2.2 Conversion Notebook Volume 2. IBM should be congratulated for the early release of the documentation of these changes, which affect the format of existing data records.


MVS 2.2 PERFORMANCE ITEMS

The following items in the announcement should be noted.

Many control blocks were moved above 16MB to ease virtual storage constraints. This will likely affect all system monitors; check with your monitor vendor for a new version/release of their product.

Global Resource Serialization (GRS) will have a larger RSA (up to 32K, or 8 pages), the QWB will be variable length, and the SMPL is compressed from 96 to 28 bytes. This leads to more QWBs per page and more QWBs per RSA, which should reduce delays in GRS processing.

The DUAL option for the JES2 checkpoint data set should reduce I/O on the SPOOL.

Multiple JES2 output PCEs can be defined, leading to improved print throughput.

All SVCs are now defined in IEASVCxx and the SVC table is now page protected. This requires all SVC users to use SVCUPDTE, which will also affect system monitor vendors.

Expiration dates through year 2155 are now supported, with 99365 and 99366 protected from ever expiring. Why through 2155, you ask? Because there are 255 years from 1900 to 2155, and a one-byte counter for the years since 1900 is used! SAS-provided formats and functions like SMFSTAMP8. and DATEJUL() will undoubtedly be revised to handle the new format. At present, however, only the EXPDT in the type 14 and 15 records, a packed decimal date without associated time, can be in the next millennium.
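The 2155 limit follows directly from the size of the counter. A minimal sketch of the arithmetic, with hypothetical names, assuming the year is stored as an unsigned one-byte count of years since 1900:

```python
BASE_YEAR = 1900
MAX_ONE_BYTE = 2 ** 8 - 1  # largest unsigned value that fits in one byte: 255


def expiration_year(years_since_1900):
    """Convert a one-byte years-since-1900 counter to a calendar year."""
    if not 0 <= years_since_1900 <= MAX_ONE_BYTE:
        raise ValueError("counter does not fit in one unsigned byte")
    return BASE_YEAR + years_since_1900


print(expiration_year(87), expiration_year(MAX_ONE_BYTE))
```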

Now that extended storage has virtually eliminated the need for page or swap data sets, MVS 2.2 will support up to 256 separate swap or page data sets.

Up to 9999 report performance groups can be defined, with the limit of 999 control performance groups unchanged.

The RMF Monitor II value of CPU=101, which indicates that the CPU is actually 100% busy and in addition there was at least one ready task which was not dispatched during the last five elapsed seconds, will now count the actual number of undispatched ready tasks, up to a maximum of 28. Thus you could see CPU=128 in the future - impress your manager with that one!
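The old and new meanings of the Monitor II CPU= value can be compared with a small function. This is a sketch of the rule as described in the announcement, not RMF code; the function name, its parameters, and the `mvs22` switch are hypothetical.

```python
MAX_COUNTED_TASKS = 28  # ceiling on undispatched ready tasks that are counted


def monitor2_cpu_value(busy_pct, undispatched_ready_tasks=0, mvs22=True):
    """Return the CPU= value for a given busy percent and ready-task count."""
    if busy_pct < 100 or undispatched_ready_tasks == 0:
        return busy_pct  # ordinary case: the value is the busy percentage
    if not mvs22:
        return 101  # prior behavior: a flag that at least one task waited
    return 100 + min(undispatched_ready_tasks, MAX_COUNTED_TASKS)


print(monitor2_cpu_value(100, 3), monitor2_cpu_value(100, 40))
```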

The Logrec buffer has been increased from 4KB to 16KB.

Partitioned Data Sets and QSAM disk files can be concatenated in any order. It is no longer necessary to specify the largest blocksize first. Apparently, IBM must have finally found its own documentation of the SILI bit fondly remembered by OS/360 system programmers.


ONE MIPS AND TWO MEGS: SOME FOOD FOR THOUGHT

In the late 1970's, Walt Doherty, of IBM's T.J. Watson Research Center, concluded from studies of highly interactive development CMS users that each such user truly required only 1-2 MIPS and 2-4 MEGs, "on demand".

Studies of TSO user data suggest that no more than 10% of the logged on TSO users are actually active, that is, in and ready, or "on demand."

With 500 logged on TSO users on a 3090-400 with 128MB memory, only 50 users would be expected to be in and ready on this 50 MIPS machine, and each "on demand" user would have 2 MB of memory even if we allowed the operating system to own 28MB. Thus this machine would provide one MIPS and two MEGS to each "on demand" user.
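The arithmetic behind "one MIPS and two MEGS" is easy to rerun for other configurations. A minimal sketch, with hypothetical function and parameter names, using the figures quoted above:

```python
def per_on_demand_user(logged_on, active_percent, mips, memory_mb, system_mb):
    """Return (on-demand users, MIPS per user, MB per user)."""
    on_demand = logged_on * active_percent / 100
    mips_each = mips / on_demand
    mb_each = (memory_mb - system_mb) / on_demand  # memory left after the system's share
    return on_demand, mips_each, mb_each


users, mips_each, mb_each = per_on_demand_user(
    logged_on=500, active_percent=10, mips=50, memory_mb=128, system_mb=28
)
print(users, mips_each, mb_each)
```

Changing the active percentage or the operating system's share shows how sensitive the per-user figures are to those two assumptions.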

Conversations with many sites with this hardware and workload seem to subjectively confirm that 1 MIPS and 2 MEGS provides unconstrained TSO response.

Another 3090-400 with 50 MIPS and 128MB supported eight CICS applications which showed a total peak maximum active tasks (equivalent to the TSO in and ready, or "on demand") of 48. No response delays were noted, and the eight CICS applications recorded not one single page fault for the entire week. One MIPS and two MEGS seems also to meet CICS needs.

The advent of extended storage has finally provided our programs with enough real memory to execute, unconstrained by storage limitations. This in turn has reduced some of the operating system overhead which was consumed in serving the paging, but more importantly, the massive reduction in page service time (70 microseconds for extended memory, 30,000 microseconds for a page fault) has eliminated significant delay to the user response. We see reduced resources for better response.
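The size of that reduction can be made concrete with the two service times quoted above. The sketch below is illustrative only: the function name is hypothetical, and the count of ten page-ins per transaction is an assumed figure, not a measurement from this paper.

```python
EXTENDED_STORAGE_US = 70   # page service time from extended storage, microseconds
PAGE_FAULT_US = 30_000     # page service time for a page fault, microseconds


def paging_delay_ms(page_ins, service_us):
    """Total paging delay in milliseconds for a number of page-ins."""
    return page_ins * service_us / 1000.0


ratio = PAGE_FAULT_US / EXTENDED_STORAGE_US
slow = paging_delay_ms(10, PAGE_FAULT_US)        # hypothetical 10 page-ins
fast = paging_delay_ms(10, EXTENDED_STORAGE_US)
print(f"ratio {ratio:.0f}x, {slow:.1f} ms versus {fast:.1f} ms")
```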

I believe that when the user notices the response time, it is too long.

What will happen when you have the hardware to provide one MIPS and two MEGS for each "on demand" user? Your capacity planning headaches will go away, and computer growth will be tied directly to user population. The user population is constrained by new employment, which itself is constrained by profit. Rare is the company whose profit increases more than 10% per year.

Once you have provided unconstrained response to your users, only adding more users will require more hardware. Since you have already computerized almost all of the repetitive business functions, new development may add function and ease of use, but will not likely increase the number of concurrent business users of those repetitive business functions.


Ah, but what about those fourth generation languages that eat resources? Won't they fill my CPU and memory? There are certainly some industries which can take advantage of computer power in relational data bases to further the business purpose. I suspect, however, these companies are few and far between. Furthermore, how many employees will need to execute these long running complex transactions, and how many times a month will these "analyses" be required? If they produce complex results, is it not likely that many hours will be required by humans to analyze the results, to establish marketing plans based on the results, etc.?

I am not suggesting that all computer growth is coming to an end. There are many well known problems which are just being feasibly solved which need much more power to be accurately solved. (Predicting weather has been described as needing 9 dimensions, while current computers can only support 3 dimensions, and then only for a 5-day prediction). But for the run-of-the-mill business data processing (invoicing, accounts receivable, returns, credits, payroll, etc., etc.), which has been the main driving factor of computer growth since the first 650 was shipped, I believe the compound growth rates we have seen are a thing of the past. If you install the right amount of hardware and software to do the job, then you won't need a job position to figure out how much hardware and software you need.

References

[1] Merrill, H. W., Merrill's Expanded Guide To Computer Performance Evaluation Using the SAS System, SAS Institute, Cary, N.C., 1984, 866 pages.

[2] Merrill, H. W., Merrill's Expanded Guide Supplement, SAS Institute, Cary, N.C., 1987, 624 pages.

MXG is a Registered Trademark of Merrill Consultants, Dallas, Texas.

SAS is a Registered Trademark of SAS Institute, Cary, N.C.
