White Paper, System Z Dataset Naming Standards

System-Z Dataset Naming Standards

Business data comes in many forms. It is found in program libraries, databases, extracts from databases, files created from manipulating database data, unloads, backups, access logs, stored procedures, and so on. Controlling access to business data seems daunting, if not impossible. If we remember all business data (except printed reports) exists as datasets on some form of electronic media, the task becomes manageable, even relatively easy. The key is a good dataset naming standard.

A proper dataset naming standard has these features.

o The high-level qualifier (HLQ) is unique, not only to the application but also to the data’s purpose within that application. It describes the owning application and indicates the usage of the data. It indicates whether the data is production or test.

o The second qualifier describes the dataset uniquely.

o The third qualifier describes the type of data on the dataset. In other words, is it a database, an unload, a log, or some other dataset?

There are two exceptions to these rules. The first exception is DB2, which has its own naming standard. However, DB2’s standard is easily amenable to the general rules. The second exception is the existence of “working copies” of databases, i.e. a copy of the database for problem investigation, utility testing, and other purposes.

Database versus Non-database Business Data

In almost all cases, database datasets already have a form of naming standard. Whether this is a DD-name based standard (IMS) or a table space standard (DB2), there are certain rules inherent in the DBMS. Non-database datasets, however, have no such limits and may carry any name the IT person can think of. Non-database datasets come in two forms: “control” datasets (procedures, parameters, and programs) and “data” datasets (everything else). Because batch jobs create “data” datasets specific to the job, it makes sense to use the creating batch job name as the dataset’s unique descriptor (i.e. the second qualifier). For “control” datasets, the second qualifier may be anything describing the library’s purpose as long as it is unique within the high-level qualifier.

A Note on Other Mainframe Naming Standards

IBM mainframes run a variety of operating systems. One example is Linux running in VM virtual machines. Naming standards, and even file storage, differ for each of these operating systems. Because Linux and other variations of UNIX work on a file system with both directories and files, it is not critical to name the owner in the file name. The owner can be determined from the directory and/or disk where the file is located, so the file name can be more descriptive of its contents.

An “Ideal” Dataset Naming Standard for z/OS

HLQ:

o Code a “P” for production, anything else for test.

o Code the chargeback code for the owning application. This typically ranges from two to four characters.

o Code IMS for IMS databases and related datasets, VSM for VSAM databases and related datasets, and DB2 for DB2 databases. Related datasets include backups, unloads, and copies. Applications may use any other character set for non-database datasets.

The second qualifier is the DD name of the database dataset.

The third qualifier is a data type indicator, UNLOAD, or COPYx (where “x” describes the use of the copy, e.g. library or vault).

Working copies of databases must have a “fourth” qualifier to show the date of the copy. This qualifier is a “J” followed by the Julian date of the backup.
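The rules above can be sketched as a small name builder. This is a minimal illustration, not part of the standard itself: the function names are hypothetical, "T" is assumed as the test flag (the standard allows anything other than "P"), and the date qualifier is placed before the data type indicator to match the working-copy example in the Examples section. The syntax checks reflect the general z/OS limits of 1 to 8 characters per qualifier and 44 characters per dataset name.

```python
import re
from datetime import date

# z/OS syntax: a qualifier is 1-8 characters, begins with a letter or a
# national character (@, #, $), and may continue with digits or hyphens.
QUALIFIER = re.compile(r"[A-Z@#$][A-Z0-9@#$-]{0,7}")


def build_hlq(production: bool, chargeback: str, data_class: str) -> str:
    """Assemble the HLQ: environment flag + chargeback code + data class."""
    env = "P" if production else "T"  # "T" for test is an assumption
    return f"{env}{chargeback}{data_class}".upper()


def julian_qualifier(copy_date: date) -> str:
    """Return "J" followed by the year and day of year, e.g. J2007153."""
    return "J" + copy_date.strftime("%Y%j")


def build_dsn(*qualifiers: str) -> str:
    """Join qualifiers with periods after checking the z/OS syntax limits."""
    for qualifier in qualifiers:
        if not QUALIFIER.fullmatch(qualifier):
            raise ValueError(f"invalid qualifier: {qualifier!r}")
    dsn = ".".join(qualifiers)
    if len(dsn) > 44:
        raise ValueError(f"dataset name longer than 44 characters: {dsn}")
    return dsn


hlq = build_hlq(True, "AA", "IMS")
print(build_dsn(hlq, "DDCLM01D", "OSAM"))
# PAAIMS.DDCLM01D.OSAM
print(build_dsn(hlq, "DDCLM01D", julian_qualifier(date(2007, 6, 2)), "OSAM"))
# PAAIMS.DDCLM01D.J2007153.OSAM
```

Note that a four-character chargeback code plus the environment flag and a three-character data class fills the HLQ to exactly eight characters, so the layout fits the qualifier limit with no room to spare.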

Examples

The examples here assume a two-character chargeback code.

o Production Amalgamated Assurance claims database (HDAM):

  PAAIMS.DDCLM01D.OSAM            OSAM dataset, first partition
  PAAIMS.DDCLM01D.J2007153.OSAM   working copy of the database
  PAAIMS.DDCLM01D.UNLOAD          unload dataset
  PAAIMS.DDCLM01D.COPYL           backup of database, library copy
  PAAIMS.DDCLM01D.COPYV           backup of database, vaulted tape

o Production Amalgamated Assurance billing code database (HIDAM):

  PAAIMS.DDBILCDD.OSAM            database DSN, data dataset
  PAAIMS.DDBILCDD.KSDS            database DSN, index dataset
  PAAIMS.DDBILCDD.UNLOAD          unload dataset for CLAIM first partition
  PAAIMS.DDBILCDD.COPYL           backup of database, library copy
  PAAIMS.DDBILCDD.COPYV           backup of database, vaulted tape

o Test Amalgamated Assurance claims database (HDAM):

  TAAIMS.DDCLM01D.OSAM            database, OSAM dataset, first partition
  TAAIMS.DDCLM01D.UNLOAD          unload dataset for CLAIM first partition
  TAAIMS.DDCLM01D.COPYL           backup of database, library copy
  TAAIMS.DDCLM01D.COPYV           backup of database, vaulted tape

o Test Amalgamated Assurance futures database (DB2):

  TAADB2.DSNDBC.TS00001A.I0001.A001

o Other Amalgamated Assurance production datasets:

  PAAPARM.BATCH.PARMLIB
  PAAPGM.GENERAL.LOADLIB
  PAACLM.<jobname>.CLMRPT.G0012V00
  PAACLM.<jobname>.CLMBAD.G0001V00

You may note the “other” production dataset names are self-explanatory, or nearly so.
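Because every field sits in a fixed position, a name can also be taken apart mechanically. The sketch below is illustrative only: `parse_dsn` and `DsnParts` are hypothetical names, the two-character chargeback code is the same assumption the examples make, and the job name `CLMJOB01` is invented to stand in for `<jobname>`.

```python
from typing import NamedTuple, Tuple


class DsnParts(NamedTuple):
    production: bool            # True when the HLQ starts with "P"
    chargeback: str             # owning application's chargeback code
    data_class: str             # IMS, VSM, DB2, or an application-chosen string
    descriptor: str             # second qualifier: DD name, job name, or purpose
    remainder: Tuple[str, ...]  # third and later qualifiers


def parse_dsn(dsn: str, chargeback_len: int = 2) -> DsnParts:
    """Split a dataset name into the fields the naming standard defines."""
    qualifiers = dsn.upper().split(".")
    hlq = qualifiers[0]
    if len(qualifiers) < 2 or len(hlq) < 1 + chargeback_len:
        raise ValueError(f"name does not follow the standard: {dsn}")
    return DsnParts(
        production=hlq[0] == "P",
        chargeback=hlq[1:1 + chargeback_len],
        data_class=hlq[1 + chargeback_len:],
        descriptor=qualifiers[1],
        remainder=tuple(qualifiers[2:]),
    )


parts = parse_dsn("PAACLM.CLMJOB01.CLMRPT.G0012V00")
print(parts.production, parts.chargeback, parts.data_class, parts.descriptor)
# True AA CLM CLMJOB01
```

The chargeback length has to be supplied from outside because the HLQ carries no delimiter between its three fields; a site with mixed code lengths would need a lookup table instead of a fixed length.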

Advantages and Disadvantages

There are enormous advantages to the naming standard spelled out above.

A. Clarity. The dataset name instantly identifies who owns it, what is in the dataset, and what type of data it is.

B. Chargeback. Chargeback is easy because the chargeback control information is always in the same place in the dataset name.

C. Security. Whether you have RACF, Top Secret, ACF2, or another security package, the HLQ of the dataset is the owner of the protection rules. Placing the application’s unique chargeback information in the HLQ leaves no doubt who owns the security responsibility for the data. It also drastically reduces the overhead and number of RACF objects needed to protect business data.

D. Automation. Automation tools can construct dataset names using a few simple rules. The tools do not need to keep or search for extra data. That reduces CPU cycles and reduces the storage required to maintain a list of backups.

E. Storage. This standard makes it very easy to code ACS routines, both for sending datasets to various pools and for excluding datasets from pools. It reduces CPU usage and simplifies ACS routines. Further, dataset allocation is faster with fewer dataset screening criteria.
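As an illustration of the automation point, the names of a database's related datasets can be derived from the database dataset name alone, with no lookup table. This is a sketch under the assumption that the unload and backup datasets use the UNLOAD, COPYL, and COPYV indicators shown in the examples; the function name is hypothetical.

```python
def related_datasets(database_dsn: str) -> dict:
    """Derive unload and backup dataset names from a database dataset name."""
    qualifiers = database_dsn.upper().split(".")
    if len(qualifiers) < 2:
        raise ValueError(f"expected at least two qualifiers: {database_dsn}")
    # The HLQ and the DD name are shared by every related dataset.
    base = ".".join(qualifiers[:2])
    return {
        "unload": f"{base}.UNLOAD",
        "library_copy": f"{base}.COPYL",
        "vault_copy": f"{base}.COPYV",
    }


for use, name in related_datasets("PAAIMS.DDCLM01D.OSAM").items():
    print(f"{use:13} {name}")
# unload        PAAIMS.DDCLM01D.UNLOAD
# library_copy  PAAIMS.DDCLM01D.COPYL
# vault_copy    PAAIMS.DDCLM01D.COPYV
```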

There is really only one disadvantage to this naming standard: it is an ideal. Starting out with this ideal, or a similar naming standard, is something to aim for. However, while it is possible to change a legacy system’s dataset names, going from a very simple naming standard to this standard can be a complicated process, for several reasons.

A. Many job changes. DSNs must change wherever they occur. Using a global find/change utility simplifies the task, but it is still no small undertaking.

B. In IMS regions, we must change dynamic allocation members. There are automated tools to do this.

C. Many new GDG bases. However, we can easily automate creating the new GDG bases. A relatively simple REXX exec would generate the GDGs, RACF profiles, and new vaulting lists very rapidly given a list of chargeback codes plus the member list of the DBDLIB and a list of the DD names of VSAM databases.

D. Coordination. It is important to verify, before changing names, what applications (OS as well as business applications) are affected. Converting one application at a time lessens this disadvantage.
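The GDG generation mentioned in item C could look roughly like the following. The paper proposes a REXX exec; Python is used here only to illustrate the logic. Several details are assumptions rather than part of the standard: that the unload and copy datasets are the ones defined as GDGs, that only production names are generated, and the `LIMIT(5) NOEMPTY SCRATCH` attributes. The output is IDCAMS `DEFINE GDG` control statements; RACF profiles and vaulting lists are left out.

```python
# Assumed set of dataset types that are kept as generation data groups.
GDG_SUFFIXES = ("UNLOAD", "COPYL", "COPYV")


def gdg_definitions(chargeback_codes, dd_names, data_class="IMS", limit=5):
    """Return IDCAMS DEFINE GDG statements for every code/DD name pair."""
    statements = []
    for code in chargeback_codes:
        for dd_name in dd_names:
            for suffix in GDG_SUFFIXES:
                base = f"P{code}{data_class}.{dd_name}.{suffix}".upper()
                # A GDG base is limited to 35 characters so that the
                # generation suffix (.GnnnnVnn) still fits within 44.
                if len(base) > 35:
                    raise ValueError(f"GDG base name too long: {base}")
                statements.append(
                    f" DEFINE GDG (NAME({base}) -\n"
                    f"   LIMIT({limit}) NOEMPTY SCRATCH)"
                )
    return statements


print("\n".join(gdg_definitions(["AA"], ["DDCLM01D"])))
#  DEFINE GDG (NAME(PAAIMS.DDCLM01D.UNLOAD) -
#    LIMIT(5) NOEMPTY SCRATCH)
#  DEFINE GDG (NAME(PAAIMS.DDCLM01D.COPYL) -
#    LIMIT(5) NOEMPTY SCRATCH)
#  DEFINE GDG (NAME(PAAIMS.DDCLM01D.COPYV) -
#    LIMIT(5) NOEMPTY SCRATCH)
```

In practice the DD name list would come from the DBDLIB member list and the list of VSAM database DD names, as described in item C.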

While there are technical difficulties, none of them is impossible to surmount. By far the hardest task is convincing the application’s owners to make the change. It is easier to do so if the application already has a good standard; in fact, if their standard follows the general guidelines set out at the start of this paper, there may be no need to change at all.
