IBM Spectrum Scale Container Native Storage Access Guide 5.1.0.1

IBM

SC28-3168-03

Table of Contents

Overview
    Introduction
    Limitations
    Supported features

Planning
    Before starting
    Hardware and software requirements
    Deployment considerations
    Container image listing for IBM Spectrum Scale Container Native Storage Access

Installation
    Preparation
    Red Hat® OpenShift® Container Platform configuration
    Container Images

Configuring IBM Spectrum® Scale Operator
    Selectors and Labels
    Operator deployment preparations
    IBM Spectrum® Scale configuration parameters
    Create GUI user credentials for storage cluster
    Custom Resource
    Create the IBM Spectrum® Scale Container Native Storage Access (CNSA) Cluster

Deploying IBM Spectrum® Scale Container Storage Interface (CSI)
    Performing pre-installation tasks for CSI Operator deployment
    Configuring Custom Resource (CR) for IBM Spectrum® Scale Container Storage Interface (CSI) driver
    Install IBM Spectrum® Scale Container Storage Interface (CSI) driver

Maintenance for a deployed cluster
    Shutting down a cluster
    Red Hat® OpenShift® Container Platform Upgrade
    Adding a new node to an existing cluster

Cleanup
    Delete a cluster
    Remove applications
    Clean up IBM Spectrum® Scale Container Storage Interface
    Clean up IBM Spectrum® Scale Operator
    Clean up the worker nodes
    Clean up on the storage cluster

Troubleshooting
    Debugging the IBM Spectrum® Scale operator
    Debugging IBM Spectrum® Scale Deployment
    Common Issues
    Known Issues
    Collecting data for support

Monitoring
    System monitor and Kubernetes readiness probe
    Viewing and analyzing the performance data with the IBM Spectrum Scale bridge for Grafana


References
    IBM Spectrum® Scale
    Red Hat OpenShift / Kubernetes


Overview

This topic covers the following:

Introduction
Limitations
Supported features

Introduction

IBM Spectrum® Scale is a clustered file system that provides concurrent access to a single file system or set of file systems from multiple nodes. The nodes can be SAN attached, network attached, a mixture of SAN attached and network attached, or in a shared-nothing cluster configuration. This enables high performance access to this common set of data to support a scale-out solution or to provide a high availability platform. For more information on IBM Spectrum® Scale features, see the Product Overview.

IBM Spectrum® Scale in containers allows the deployment of the cluster file system in a Red Hat® OpenShift® cluster. Using a remote mount attached file system, the IBM Spectrum® Scale solution provides a persistent data store to be accessed by the applications via the IBM Spectrum® Scale Container Storage Interface (CSI) driver using Persistent Volumes (PVs).


Limitations

Encrypted file systems are not supported with IBM Spectrum Scale Container Native Storage.
IBM Spectrum Scale Container Native Storage currently only supports a remote mount of the file system. It does not support local disks and NSD nodes.

Description                      Max supported
Number of worker nodes           128
Number of remote clusters        1
Number of remote file systems    1

Supported features

Features covered

IBM Spectrum® Scale version 5.1.0.1, Container Native Storage Access with Red Hat® OpenShift® Container Platform.
IBM Spectrum® Scale node labels to establish node affinity.
Automated client-only cluster creation.
Automated remote file system mount for IBM Spectrum® Scale Storage cluster.


Integrated IBM Spectrum® Scale Container Storage Interface (CSI) driver for application persistent storage.
IBM Spectrum® Scale Container Native Storage Access client cluster node expansion on Red Hat® OpenShift® Container Platform.
Cluster monitoring using Red Hat® OpenShift® Container Platform Liveness and Readiness probe.
Call home.
Performance data collection.

Functional support

File audit logging (FAL)
Compression
Quotas on the storage cluster
ACLs on the storage cluster
ILM support on the storage cluster
File clones on the storage cluster
Snapshots on the storage cluster
TCP/IP network connectivity among cluster nodes

Planning

The planning for IBM Spectrum® Scale Container Native Storage Access (CNSA) includes the following topics:

Before starting
Hardware and software requirements
Deployment considerations
Container image listing for IBM Spectrum Scale Container Native Storage Access

Before starting

The installation process of IBM Spectrum® Scale on Red Hat® OpenShift® consists of many steps.

Note: These steps build on top of each other and therefore it is critical to follow this sequence.

Planning

What version of Red Hat® OpenShift® Container Platform do I need?
What are the hardware requirements?
What should the network look like?
What is the minimum level of IBM Spectrum® Scale that needs to be there on the storage cluster?

Preparation

1. Obtain the IBM Spectrum® Scale Container Native Storage Access tar archive file from Fix Central or Passport Advantage®.
2. Unpackage the tar archive.
3. Validate and apply the configuration to the Red Hat OpenShift installation settings.
4. Load the IBM Spectrum® Scale container images into a site-managed private container registry.


Deploy IBM Spectrum® Scale Container Native Storage Access (CNSA) cluster

1. Prepare the yaml files necessary for Operator.
2. Create a GUI user on the storage cluster with the ContainerOperator role.
3. Create a secret by using the storage cluster GUI user credentials in the ibm-spectrum-scale-ns namespace.
4. Configure the ScaleCluster Custom Resource that is used for deployment of the Operator.
5. Deploy the Operator.

Preparation of Container Storage Interface (CSI)

1. Create a GUI user for CSI with CsiAdmin role on the storage cluster and on the CNSA cluster.
2. Configure CSI prerequisites on the storage cluster.
3. Create a secret for the storage cluster in the CSI namespace.
4. Create the secret for the CNSA cluster in the CSI namespace.
5. Deploy the CSI Operator.
6. Download the CSI Custom Resource file.
7. Input the Container Native Storage Access information into the CSI Custom Resource file for configuring the storage/remote cluster.

Deploy IBM Spectrum® Scale Container Storage Interface (CSI) driver

Apply the CSI custom resource. This will trigger the deployment of the CSI pods and ultimately provide the CSI interface that application pods need to claim their persistent volumes (PVs).

Hardware and software requirements

The requirements are as follows:

Red Hat OpenShift Container Platform:
    4.5
    4.6.x, where 'x' is greater than or equal to 6

Podman
Spectrum_Scale_Container_Native_Storage_Access-<version>-<arch>.tgz (provided)
External container images required for IBM Spectrum® Scale. For more information, see Container Image Listing for Spectrum Scale.
Network:
    All nodes in the compute cluster should be able to communicate with all nodes in the storage cluster.
    A minimum of 10 Gb network is needed but 40 to 100 Gb is recommended.
    RDMA for Infiniband or RoCE for Ethernet is not supported.

Worker node requirements:

For Power® and x86:

    Minimum:
        CPU – 8 Cores per worker node
        Memory – 16 GB RAM per worker node

    Recommended:
        CPU – 16 Cores per worker node
        Memory – 64 GB per worker node

For Linux® on IBM Z®:

    Minimum:
        CPU – 4 Cores per worker node
        Memory – 8 GB RAM per worker node

    Recommended:
        CPU – 8 Cores per worker node
        Memory – 16 GB per worker node

Deployment considerations

Red Hat® OpenShift® cluster considerations:

The Spectrum Scale pods will use the host network and not the Container Network Interface (CNI) overlay network.
Ensure that your DNS is configured properly so that the worker nodes can resolve the storage cluster nodes. If this is not possible, see the Host Aliases section in the custom resource. (A quick DNS spot-check example follows this list.)
All worker nodes must be able to communicate with each other via the host network.
Red Hat Enterprise Linux® CoreOS (RHCOS) restricts new file system mounts to the /mnt subtree. IBM Spectrum® Scale can mount any file system below /mnt on the Red Hat OpenShift cluster regardless of the default mount point defined on the storage cluster.
Requires a minimum configuration of 3 master nodes and 3 worker nodes, with a maximum of 128 worker nodes.
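A minimal DNS spot-check can be run from a worker node through a debug pod; the worker and storage node names below are placeholders and should be replaced with hosts from your environment.

# Hypothetical host names - replace with a worker node and a storage cluster node from your environment.
oc debug node/worker0.example.com -- chroot /host getent hosts storage-node1.example.com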

Storage cluster considerations:

The storage cluster must be at IBM Spectrum® Scale version 5.1.0.1 or higher to support remote mount via the Operator.

For more information on remote cluster, see Accessing a remote GPFS file system.

Encrypted file systems are not supported. Do not use an encrypted file system with the IBM Spectrum® Scale Container Native Storage Access.

Red Hat OpenShift Built-in Container Registry:

The images are delivered in a tar archive that needs to be loaded into a container registry.
The instructions provided use podman to push these artifacts into the Red Hat OpenShift integrated container registry. If another registry is used in your environment, follow the instructions of that registry on how to push the artifacts into that registry.
Any registry used for IBM Spectrum® Scale Container Native Storage Access container images must not be accessible to external users and must be restricted to the service account used for IBM Spectrum® Scale Container Native Storage Access management. All users and machines accessing these container images must be authorized per the IBM Spectrum® Scale license agreement.
A user configured in the Red Hat OpenShift cluster that is not kubeadmin.

Container image listing for IBM Spectrum Scale Container Native Storage Access


The following images are required for successful deployment of IBM Spectrum® Scale Container Native Storage Access. Ensure the external images can be pulled from their respective registries.

Provided container images from Docker archive tar file

The images listed in the following table are the container images that are provided by the Docker archive tar file.

Pod                                 Container                     Repository                      Image or SHA Digest
ibm-spectrum-scale-operator-XXXXX   ibm-spectrum-scale-operator   Provided in Docker archive tar  scale-core-operator
ibm-spectrum-scale-XXXXX            mmbuildgpl                    Provided in Docker archive tar  scale-core
ibm-spectrum-scale-XXXXX            config                        Provided in Docker archive tar  scale-core
ibm-spectrum-scale-XXXXX            gpfs                          Provided in Docker archive tar  scale-core
ibm-spectrum-scale-gui              liberty                       Provided in Docker archive tar  scale-gui
ibm-spectrum-scale-gui              sysmon                        Provided in Docker archive tar  scale-monitor
ibm-spectrum-scale-pmcollector      pmcollector                   Provided in Docker archive tar  scale-pmcollector
ibm-spectrum-scale-pmcollector      sysmon                        Provided in Docker archive tar  scale-monitor

External container images

The images listed in the following table are external container images. They will need to be available via registry (external or imported into local registry) for deployment of IBM Spectrum® Scale.

Note: Check the notes column to match the target architecture. If there is no architecture listed, the image is assumed to have multi-architecture support.

Pod                                   Container                                                                                          Repository                                 Image or SHA Digest                                                        Notes
ibm-spectrum-scale-gui                postgres                                                                                           docker.io/library/postgres                 sha256:9f325740426d14a92f71013796d98a50fe385da64a7c5b6b753d0705add05a21   Postgres v12.4
ibm-spectrum-scale-gui                log-gpfsgui-trc, log-mgtsrv-system, log-mgtsrv-trace, log-wlp-messages, log-wlp-trace, log-audit   registry.access.redhat.com/ubi8            ubi-minimal:8.2
ibm-spectrum-scale-csi-operator       operator                                                                                           quay.io/ibm-spectrum-scale                 ibm-spectrum-scale-csi-operator:v2.1.0
ibm-spectrum-scale-csi-attacher       ibm-spectrum-scale-csi-attacher                                                                    us.gcr.io/k8s-artifacts-prod/sig-storage   csi-attacher:v3.0.0
ibm-spectrum-scale-csi-provisioner    csi-provisioner                                                                                    us.gcr.io/k8s-artifacts-prod/sig-storage   csi-provisioner:v2.0.2
ibm-spectrum-scale-csi-driver-XXXXX   driver-registrar                                                                                   us.gcr.io/k8s-artifacts-prod/sig-storage   csi-node-driver-registrar:v2.0.1
ibm-spectrum-scale-csi-driver-XXXXX   ibm-spectrum-scale-csi                                                                             quay.io/ibm-spectrum-scale                 ibm-spectrum-scale-csi-driver:v2.1.0

Installation

The installation of IBM Spectrum® Scale Container Native Storage Access (CNSA) includes the following topics:

Preparation
Red Hat OpenShift Container Platform configuration
Container Images

Preparation

Creating a new project

Create a new project using the oc new-project command.

oc new-project ibm-spectrum-scale-ns

Note: Throughout this documentation, ibm-spectrum-scale-ns is used as the project/namespace name.

Extracting the tar archive

Obtain the IBM Spectrum® Scale Container Native Storage Access tar archive file from Fix Central or Passport Advantage®.

This archive file contains:

yaml files to assist with configuration of the Red Hat® OpenShift® Container Platform cluster
container images
yaml files required to deploy the operator


The .tgz filename is in the format Spectrum_Scale_Container_Native_Storage_Access-<version>-<arch>.tgz.

Obtain the correct one based on your architecture and extract it using the following command:

tar xvfz Spectrum_Scale_Container_Native_Storage_Access-*.tgz

Red Hat® OpenShift® Container Platform configuration

Some changes must be made to the Red Hat® OpenShift® Container Platform installation for IBM Spectrum® Scale Container Native Storage Access to operate correctly.

For detailed instructions on Red Hat® OpenShift® Container Platform installation, see Red Hat® OpenShift® Container Platform Install Documentation.

The following tasks can also be handled during the Red Hat® OpenShift® Container Platform installation process by adding day-1 kernel arguments. However, the instructions that follow assume Red Hat® OpenShift® Container Platform has already been installed.

Increase PIDS_LIMIT
Increase vmalloc kernel parameter
Apply Machine Config Operator - Red Hat OpenShift Container Platform 4.6.x, where 'x' is greater than or equal to 6

Increase PIDS_LIMIT

Increase the pids_limit to 4096 on each of the worker nodes planned to run IBM Spectrum® Scale. Without this change, the GPFS daemon will crash during I/O by running out of PID resources. These changes are managed by the Machine Config Operator (MCO).

Do the following steps:

1. Create the configuration yaml file to be applied. A sample is provided at spectrumscale/preinstall/increase_pid_mco.yaml from the .tgz file.

---
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: increase-pid-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      pid-crio: config-pid
  containerRuntimeConfig:
    pidsLimit: 4096

2. Apply the YAML using the following command:

   oc create -f spectrumscale/preinstall/increase_pid_mco.yaml

3. Verify the increase-pid-limit configuration is present using this command:

   oc get ContainerRuntimeConfig

Note: Issuing this command confirms that the ContainerRuntimeConfig object was created successfully.


4. Modify the worker MachineConfigPool and apply the machineConfigPoolSelector created in the previous steps by setting the label: pid-crio: config-pid.

oc label machineconfigpool worker pid-crio=config-pid

Note: Executing this command will drive a rolling update across your Red Hat® OpenShift® Container Platform worker nodes.

5. Check the status of the update using:

oc get MachineConfigPool

Note: This process could take over 30 minutes depending on the size of the worker node pool, as the worker nodes will be rebooted.

6. Verify that the pids_limit is increased on the worker nodes by issuing the following command:

# Replace <worker node> with the hostname of the worker.
oc debug node/<worker node> -- chroot /host crio-status config | grep pids_limit

Note: This command needs to be run once for each worker node.
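Rather than repeating the command per node, the same check can be looped over all worker nodes. This is only a convenience sketch built from the commands shown above.

for node in $(oc get nodes -l node-role.kubernetes.io/worker= -o name); do
  echo "== $node"
  # Same pids_limit check as above, run against each worker in turn
  oc debug $node -T -- chroot /host crio-status config | grep pids_limit
done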

Increase vmalloc kernel parameter

This kernel parameter change applies only to IBM Spectrum® Scale running on Linux® on System Z so that it operates properly with RHCOS.

1. Create the configuration to be applied. See the spectrumscale/preinstall/99-openshift-machineconfig-worker-kargs.yaml file from the .tgz file.

---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-machineconfig-worker-kargs
spec:
  kernelArguments:
    - 'vmalloc=4096G'

2. Apply the YAML using this command:

oc apply -f spectrumscale/preinstall/99-openshift-machineconfig-worker-kargs.yaml

3. Check the status of the update using:

oc get MachineConfigPool

Note: This process could take over 30 minutes depending on the size of the worker node pool, as the worker nodes will be rebooted.

4. To verify the vmalloc kernel parameter has been set, validate that the machineconfig has the vmalloc setting:

oc describe machineconfig | grep vmalloc


5. Validate that the vmalloc kernel parameter has been applied onto the Red Hat® OpenShift® Container Platform worker nodes with the following command:

oc debug node/<worker node> -- cat /proc/cmdline

You will see vmalloc=4096G in the following output:

# oc debug node/worker1.example.com -- cat /proc/cmdline
Starting pod/worker1examplecom-debug ...
To use host binaries, run `chroot /host`
rhcos.root=crypt_rootfs random.trust_cpu=on ignition.platform.id=metal rd.luks.options=discard $ignition_firstboot ostree=/ostree/boot.1/rhcos/51e4c768b7c3dcec3bb63b01b9de9e8741486bf00dd4ae4df2d1ff1f872efe2e/0 vmalloc=4096G

Apply Machine Config Operator

Application of this Machine Config Operator is only applicable for Red Hat OpenShift Container Platform 4.6.x, where 'x' is greater than or equal to 6, to allow for the kernel module support necessary for a successful IBM Spectrum Scale Container Native Storage Access deployment.

1. Create a configuration file named machineconfigoperator.yaml with the following contents.

---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 02-worker-kernel-devel
spec:
  config:
    ignition:
      version: 3.1.0
  extensions:
    - kernel-devel

2. Apply the YAML using the following command:

oc create -f machineconfigoperator.yaml

3. Check the status of the update using:

oc get MachineConfigPool

Note: This process could take over 30 minutes depending on the size of the worker node pool, as the worker nodes will be rebooted.

4. Validate that the kernel-devel package is successfully applied onto the Red Hat® OpenShift® Container Platform worker nodes by using the following command:

oc get nodes -lnode-role.kubernetes.io/worker= \
  -ojsonpath="{range .items[*]}{.metadata.name}{'\n'}" |\
  xargs -I{} oc debug node/{} -T -- chroot /host sh -c "rpm -q kernel-devel"

Container Images

During the deployment process, Red Hat® OpenShift® Container Platform needs to access the IBM Spectrum® Scale container images. Since the images are not found in a public registry such as quay.io or registry.redhat.com, you will need to load the container images into a site-managed private container registry that the Red Hat® OpenShift® Container Platform cluster can access.

If you already have a private registry set up, you can load the images there, and skip to the Loading images topic and follow the procedure for your registry.

Integrated Red Hat® OpenShift® Container Platform registry

If access to a site-managed container registry is not available, the integrated Red Hat® OpenShift® Container Platform registry can be used. For more information, see Image Registry.

Pre-requisites

1. Ensure podman is installed.

Note: docker could also be used here, but the commands may need to be adjusted accordingly.

2. Ensure the Red Hat® OpenShift® Container Platform Image Registry is available.

oc get deployment image-registry -n openshift-image-registry

If you get an error: Error from server (NotFound): deployments.apps "image-registry" not found, validate that your registry has been configured to manage images. For more information, see Image Registry Storage Configuration.

Creating a user via HTPasswd

The following procedure creates a user and grants them permission to write to the integrated container image repository. For more information, see Configuring an HTPasswd identity provider.

Note: Install httpd-tools on the host providing the authentication. The Red Hat® OpenShift® Container Platform cluster must be able to access this host.

1. Create the user/password entry into a file using the htpasswd command.

USERNAME="user1" PASSWORD="P@ssword" htpasswd -c -B -b /etc/users.htpasswd $USERNAME $PASSWORD

2. Generate the HTPasswd Secret.

Create a secret called htpass-secret in the openshift-config namespace using the following command:

oc create secret generic htpass-secret \
  --from-file=htpasswd=/etc/users.htpasswd -n openshift-config

3. Create a Custom Resource (CR) for the HTPasswd identity provider. A sample CR file is provided in the .tgz file at: spectrumscale/preinstall/optional_oauth_config.yaml.

Use the following command to set the secret name htpass-secret in the CR file provided in the tar. The secret was created in the previous step.

# modify the secret name to: htpass-secret
sed -i 's/<secret-name>/htpass-secret/g' spectrumscale/preinstall/optional_oauth_config.yaml

Create the Custom Resource file.


# Apply the custom resource
oc apply -f spectrumscale/preinstall/optional_oauth_config.yaml

Set permissions for the user

The user created above must be granted permission to modify the image registry and to store the images.

USERNAME="user1" oc policy add-role-to-user registry-editor $USERNAME oc policy add-role-to-user edit $USERNAME

Expose the image registry

To push the artifact into the internal registry, the registry must be exposed. For more information, see Exposing the Registry.

oc patch configs.imageregistry.operator.openshift.io/cluster \
  --patch '{"spec":{"defaultRoute":true}}' --type=merge

Loading Images

Obtain the route to the container registry

The following command is used to obtain the route to the integrated Red Hat® OpenShift® Container Platform image registry. If you are using a site-managed container registry, obtain and note the route to your container registry.

The HOST variable holds the route and is referenced in the later steps to push the container images to the destination container registry.

1. Switch to a Red Hat® OpenShift® Container Platform admin account that has permission to execute oc get routes:

oc login -u kubeadmin

2. Set the HOST variable to the image registry route:

HOST=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')

3. Validate that HOST is not empty:

echo $HOST

Note: This HOST variable will be referenced in the following steps; run the commands in the same shell.

Load container images and push to container registry

Note: Ensure that you are in the ibm-spectrum-scale-ns namespace.

1. Log in to the user account created above that has access to the Red Hat® OpenShift® Container Platform Image Registry:

USERNAME="user1" oc login -u $USERNAME

For example:

# oc login -u $USERNAME

You get an output similar to this:


Logged into "https://api.example.com:6443" as "user3" using existing credentials. You have access to the following projects and can switch between them with 'oc project <projectname>': default * ibm-spectrum-scale-ns

2. Import the IBM Spectrum® Scale container images provided in the .tgz file using podman.

For tag 5.1.0.1, run:

TAG="5.1.0.1" for file in `ls spectrumscale/*.tar`; do tarname=${file##*/} tagname=`echo $tarname | sed 's/.tar//g' | sed "s/-$TAG/:$TAG/g"` echo "-- Loading $file as $tagname" # If using docker, the load and tagging cannot be done in a single step podman load -i $file localhost/$tagname done

3. Log in to the Red Hat® OpenShift® Container Platform integrated container registry via podman:

oc whoami -t | podman login -u $(oc whoami) --password-stdin --tls-verify=false $HOST

4. Tag and push the images into the Red Hat® OpenShift® Container Platform Image Registry:

TAG="5.1.0.1" NAMESPACE="ibm-spectrum-scale-ns" for file in `ls spectrumscale/*.tar`; do tarname=${file##*/} tagname=`echo $tarname | sed 's/.tar//g' | sed "s/-$TAG/:$TAG/g"`

podman tag localhost/$tagname $HOST/$NAMESPACE/$tagname podman push $HOST/$NAMESPACE/$tagname --tls-verify=false done

Validate the container images

The following steps are only applicable when using the Red Hat® OpenShift® Container Platform integrated container registry.

1. Log in to an administrator account:

oc login -u kubeadmin

2. Ensure that the ImageStream contains the images:

for image in `oc get is -o custom-columns=NAME:.metadata.name --no-headers`; do
  echo "---"
  oc get is $image -o yaml | egrep "name:|dockerImageRepository"
done

Example output:

Note: The dockerImageRepository values are the values that should populate the image fields in the operator.yaml and CR if using a local image registry.

---
name: scale-core
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core
---
name: scale-core-operator
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core-operator
---
name: scale-gui
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-gui
---
name: scale-monitor
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-monitor
---
name: scale-pmcollector
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-pmcollector

Configuring IBM Spectrum Scale Operator

The configuration of the IBM Spectrum® Scale Container Native Storage Access (CNSA) Operator includes the following procedures:

Selectors and Labels
Operator deployment preparations
IBM Spectrum Scale configuration parameters
Create GUI user credentials for storage cluster
Custom Resource
Create the IBM Spectrum Scale Container Native Storage Access (CNSA) Cluster

Selectors and Labels

Node selectors

By default, IBM Spectrum Scale Container Native Storage Access is deployed to all worker nodes. Optionally, you could select a subset of worker nodes to deploy IBM Spectrum Scale Container Native Storage Access onto. You cannot select any master nodes. Node selectors can be added to the ScaleCluster custom resource file (spectrumscale/deploy/crds/ibm_v1_scalecluster_cr.yaml) to specify which nodes to deploy IBM Spectrum Scale Container Native Storage Access on. In the ScaleCluster CR, there is a node selector defined for worker nodes. In this case, the Operator deploys an IBM Spectrum® Scale pod to nodes labeled as worker nodes.

Note: If you want to deploy IBM Spectrum Scale Container Native Storage Access to all worker nodes, you can skip this section.

spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""

A user may configure multiple node selector values by adding labels to the nodeSelector list. The Operator will check that a node has all defined labels present in order to deem a node eligible to deploy IBM Spectrum® Scale pods. In the following example, the Operator will deploy IBM Spectrum Scale pods on nodes with both the worker label and "scale" component label (an example of applying the label follows the snippet).

spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
    app.kubernetes.io/component: "scale"
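With a selector like this, the additional component label has to be applied to the chosen workers yourself; worker0.example.com below is a placeholder node name.

# Label one worker with the component label used in the example selector above
oc label node worker0.example.com app.kubernetes.io/component=scale

# List the nodes that now match both selector labels
oc get nodes -l node-role.kubernetes.io/worker=,app.kubernetes.io/component=scale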

Node labels

IBM Spectrum Scale Container Native Storage Access automatically assigns certain roles to some worker nodes. Normally you do not need to explicitly designate the worker nodes. However, it is possible to do so using node labels.

The following mechanisms are supported to designate IBM Spectrum Scale Container Native Storage Access nodes:

Automatic (Recommended) - allows the Operator to designate the nodes automatically
Manual (Optional) - allows the administrators to have more control on the placement of Spectrum Scale node roles (like the quorum role) to the pods on specific worker nodes.

Automatic

If the user does not annotate any nodes as quorum nodes, the Operator will automatically apply quorum annotations to a subset of the nodes in the cluster. The number of nodes to be annotated depends on the number of cluster nodes:

If the number of nodes in the cluster definition is less than 4, all nodes are designated as quorum nodes.
If the number of nodes in the cluster definition is between 4 and 9 inclusive, 3 nodes are designated as quorum nodes.
If the number of nodes in the cluster definition is between 10 and 18 inclusive, 5 nodes are designated as quorum nodes.
If the number of nodes in the cluster definition is greater than 18, 7 nodes are designated as quorum nodes.

Manual node designation with annotations

The following annotations can be applied to nodes in the Red Hat® OpenShift® cluster to dictate how pods deployed on those nodes will be designated:

scale.ibm.com/nodedesc=quorum::
scale.ibm.com/nodedesc=manager::
scale.ibm.com/nodedesc=collector::

Supported Node Designations are manager, quorum, and collector. To designate a node with more than one value, add a dash in between the designations.

For example, the following annotation will apply node designations of quorum, manager, and collector:

scale.ibm.com/nodedesc=quorum-manager-collector::

To see the list of nodes in your cluster, use the oc get nodes command:

[root@example-inf ~]# oc get nodes
NAME                  STATUS   ROLES    AGE   VERSION
master0.example.com   Ready    master   50d   v1.16.2
worker0.example.com   Ready    worker   50d   v1.16.2
worker1.example.com   Ready    worker   50d   v1.16.2
worker2.example.com   Ready    worker   50d   v1.16.2

To apply an annotation to a node, use the oc annotate node <node name> <annotation> command as follows:

IBM Spectrum Scale Container Native Storage Access Guide 5.1.0.1 15

oc annotate node worker0.example.com scale.ibm.com/nodedesc=quorum-manager::

To verify the annotation was applied to the node, use the oc describe node <node name> command as follows:

[root@example-inf ~]# oc describe node worker0.example.com
Name:               worker0.example.com
Roles:              worker
Labels:             beta.kubernetes.io/arch=ppc64le
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=ppc64le
                    kubernetes.io/hostname=worker0.example.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        csi.volume.kubernetes.io/nodeid: {"spectrumscale.csi.ibm.com":"worker0.example.com"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-3c99f4a93a0711da48c7f568c97567fc
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3c99f4a93a0711da48c7f568c97567fc
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    scale.ibm.com/nodedesc: quorum-manager::
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 29 Oct 2020 17:20:42 -0700
Taints:             <none>
...
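To list the nodedesc annotation for all worker nodes at once, a query such as the following can be used (a sketch; adjust the label selector to your environment):

# Print each worker node name together with its scale.ibm.com/nodedesc annotation (blank if not set)
oc get nodes -l node-role.kubernetes.io/worker= \
  -o jsonpath="{range .items[*]}{.metadata.name}{': '}{.metadata.annotations.scale\.ibm\.com/nodedesc}{'\n'}{end}"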

To remove an annotation from a node, issue the following command:

oc annotate node <node name> scale.ibm.com/nodedesc-

Quorum nodes

For more information on how to manually label quorum nodes, see the Quorum topic in the IBM Spectrum® Scale Knowledge Center. It is recommended to configure an odd number of nodes, with 3, 5, or 7 nodes being the typical numbers used.

For automatic labeling, see the "Automatic" sub-topic on this page.

Operator deployment preparations

With the tar archive, we provide a set of .yaml files that are required for deployment of IBM Spectrum® Scale.

Since both the namespace and the container image registry are defined by the users, the .yaml files are shipped with placeholders that need to be replaced with the actual values matching your environment before applying the .yaml files.

Note: Before applying the yaml files, ensure that you are in the correct namespace for IBM Spectrum® Scale. It is preferred to explicitly specify the namespace -n ibm-spectrum-scale-ns in the commands.

Change to the directory where the tar archive was extracted and run the following steps:

1. Edit the spectrumscale/deploy/cluster_role_binding.yaml file to set the namespace: field to your namespace.

vi spectrumscale/deploy/cluster_role_binding.yaml


Replace <namespace> with namespace ibm-spectrum-scale-ns:

  name: ibm-spectrum-scale-operator
  namespace: ibm-spectrum-scale-ns

Save and exit the file.

2. Edit the spectrumscale/deploy/operator.yaml and modify the image: field to point to the container registry serving the scale-core-operator image.

For example, if your images are tagged 5.1.0.1 and the container registry route is image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns, the image values would look like:

image: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core-operator:5.1.0.1

Note: To validate the container images, see the Validate the container images topic.

3. Apply the following .yaml files to prepare deployment of the Operator:

oc create -f spectrumscale/deploy/service_account.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/service_account_core.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/role.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/role_binding.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/role_scale_core.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/role_binding_scale_core.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/scc.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/cluster_role.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/cluster_role_binding.yaml -n ibm-spectrum-scale-ns
oc create -f spectrumscale/deploy/crds/ibm_v1_scalecluster_crd.yaml -n ibm-spectrum-scale-ns

IBM Spectrum® Scale configuration parameters

Default configuration parameters

The following parameters are set by default via the Operator.

Parameter                        Setting
pagepool                         1 GB
numaMemoryInterleave             --
ignorePrefetchLUNCount           Yes
workerThreads                    1024
maxFilesToCache                  128 K
maxStatCache                     128 K
maxblocksize                     16 M
maxMBpS                          --
initPrefetchBuffers              128
prefetchTimeout                  30
prefetchPct                      25
prefetchMaxBuffersPerInstance    1024
prefetchthreads                  --
controlSetxattrImmutableSELinux  Yes
enforceFilesetQuotaOnRoot        Yes

Note: The listed parameters are the only supported tunables, and the tunables are only applied upon cluster creation.

For more information, see the mmchconfig command.

Override default configuration parameters (optional step)

To allow changes to the default parameters specified above, a configMap file is provided in spectrumscale/deploy/scale-profile.yaml. Use this file to modify the values before creating the cluster for the first time. If this step is skipped, the Operator will create the configmap ibm-spectrum-scale-core-profile using the defaults indicated above.

1. Edit the configMap and modify the properties:

   vi spectrumscale/deploy/scale-profile.yaml

2. Create the configMap:

oc create -f spectrumscale/deploy/scale-profile.yaml -n ibm-spectrum-scale-ns
oc get configmap -n ibm-spectrum-scale-ns | grep profile

Create GUI user on storage cluster with ContainerOperator role

The steps described in this section must be performed on the storage cluster.

Create CNSA Operator User and Group

Verify and create the GUI user group ContainerOperator and the user on the IBM Spectrum® Scale storage cluster. This user will be used by the operator to configure the remote mount of the storage cluster file systems.

Issue the following commands in the shell of the GUI node of the storage cluster:

1. Check if the IBM Spectrum® Scale GUI user group ContainerOperator exists by issuing the following command:

/usr/lpp/mmfs/gui/cli/lsusergrp ContainerOperator

2. If the GUI user group ContainerOperator does not exist, create it using the following command:

/usr/lpp/mmfs/gui/cli/mkusergrp ContainerOperator --role containeroperator

3. Check if the IBM Spectrum® Scale GUI user exists within group ContainerOperator by issuing the following command:

/usr/lpp/mmfs/gui/cli/lsuser | grep ContainerOperator


4. If no user exists in group ContainerOperator, create an IBM Spectrum® Scale GUI user in the ContainerOperator group by issuing the following command:

/usr/lpp/mmfs/gui/cli/mkuser cnss_storage_gui_user -p cnss_storage_gui_password -g ContainerOperator

Create secrets

Create a secret on the Red Hat® OpenShift® cluster to hold the username and password of the IBM Spectrum® Scale storage cluster GUI user.

This secret will be used by the Operator to communicate with the storage cluster when performing configuration for remote mount.

A new secret should be added for each storage cluster being configured for remote mount with IBM Spectrum® Scale Container Native Storage Access (CNSA).

Storage cluster GUI user secret

The user name and password specified in this topic must match the GUI user that was created on the storage cluster in the steps above.

To create the storage cluster GUI user secret named cnsa-remote-mount-storage-cluster-1, issue the following command:

oc create secret generic cnsa-remote-mount-storage-cluster-1 --from-literal=username='cnss_storage_gui_user' \
  --from-literal=password='cnss_storage_gui_password' -n ibm-spectrum-scale-ns
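Optionally, confirm that the secret now exists in the namespace before moving on:

oc get secret cnsa-remote-mount-storage-cluster-1 -n ibm-spectrum-scale-ns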

Custom Resource

Edit ScaleCluster Custom Resource (CR)

Before deploying the ScaleCluster Custom Resource (CR), you must make changes to the custom resource .yaml file.

The values provided in the CR are used for IBM Spectrum® Scale cluster creation, and must be valid inputs. Subsequent changes made to the CR after the creation of the IBM Spectrum® Scale cluster are not supported.

Edit the spectrumscale/deploy/crds/ibm_v1_scalecluster_cr.yaml file and modify the following sections to match your environment.

Note: The update strategy for the DaemonSet is changed to "onDelete"; therefore, if a value is given in error, the cleanup steps must be performed and the CR must be corrected before attempting to re-deploy.

Parameter listing

Parameter        Usage      Description
nodeSelector     Required   In conjunction with the nodeSelector configuration, the operator also applies node affinity according to supported architectures and OS. The Custom Resource provides a nodeSelector that works for most use-cases by default.
images           Required   The list of Docker container images required to deploy and run IBM Spectrum® Scale
callhome         Optional   A block that establishes and activates call home
filesystems      Required   The file system that will be remotely mounted from the storage cluster
remoteClusters   Required   A remoteCluster definition provides the name, storage cluster GUI's hostname, its GUI secret, and contact nodes used for remote mount
hostAliases      Optional   hostAliases is used in an environment where DNS cannot resolve the storage cluster nodes

Required Parameters

Node Selectors

The nodeSelector: section allows the user to configure a nodeSelector to determine where IBM Spectrum® Scale nodes can be created. In the following example, a node label that selects Kubernetes worker nodes is defined.

nodeSelector:
  node-role.kubernetes.io/worker: ""

Container Images

The images section allows you to specify the container image to pull when the pods start.

1. Replace the tag REPLACE_CONTAINER_REGISTRY with the route to the container registry hosting the images.

For example, if your registry route is image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns (refer to the Loading Images page) and you want to pull image tag 5.1.0.1, the section will look like:

images:
  core: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core:5.1.0.1
  coreInit: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core:5.1.0.1
  gui: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-gui:5.1.0.1
  ...
  pmcollector: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-pmcollector:5.1.0.1
  sysmon: image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-monitor:5.1.0.1
  ...

2. For the logs: image, modify the image and point it to the 8.2 tag of ubi-minimal.

20 IBM Spectrum Scale Container Native Storage Access Guide 5.1.0.1

images:
  ...
  logs: "registry.access.redhat.com/ubi8/ubi-minimal:8.2"

Remote Mount

To configure a remote storage cluster for remote mount, fill out the filesystem and remoteCluster sections in the custom resource.

filesystem

filesystems: - name: "fs1" remoteMount: storageCluster: "storageCluster1" storageFs: "fs1" mountPoint: "/mnt/fs1"

name - Provide a name to the file system that will be created upon CNSA deployment. This name must follow the guidelines defined by DNS Label Names.
remoteMount:
    storageCluster - Name to reference the remoteCluster definition in the CR file; it should match the remoteCluster.name field below.
    storageFs - Name of the file system on the storage cluster. You can get this information by issuing the mmlsfs all command on the storage cluster.
mountPoint - This is the mount point where the remote file system is mounted upon CNSA cluster creation. This mount point must be under /mnt.

remoteClusters

remoteClusters:
  - name: storageCluster1
    gui:
      host: ""
      secretName: "cnsa-remote-mount-storage-cluster-1"
      insecureSkipVerify: true
    # contactNodes:
    #   - storagecluster1node1
    #   - storagecluster1node2

name - This is the name used to identify the remote storage cluster, and is also referenced in filesystems.
gui:
    host - Hostname for the GUI endpoint on the storage cluster.
    secretName - This is the name of the Kubernetes secret created during the storage cluster configuration.
    insecureSkipVerify - true is the only option that is supported at this time.
contactNodes (optional) - Provide a list of storage nodes to be used as the contact nodes list. If not specified, the operator will use 3 nodes from the storage cluster.

Optional Parameters

Call Home

To configure the IBM Spectrum® Scale call home functionality, uncomment and fill out the callhome section in the custom resource. For more information about call home, see Understanding call home.

1. License agreement


If you agree to the license agreement, enter true for acceptLicense:

acceptLicense: true

Note: If you do not accept the license agreement, call home will not be enabled.

2. Personal Information

Enter your companyName, the customerID that IBM® provided to you, the companyEmail and the countryCode.

The countryCode is a two-letter upper-case country code as defined in ISO 3166-1 alpha-2. For example, US for the United States or DE for Germany.

3. Proxy (optional)

If you are using a proxy for communication, enter the proxy host and port in their respective fields.

If your proxy requires authentication, you have to create a secret containing the credentials. You can do so with the following command:

oc create secret generic <proxy_secret_name> --from-literal=username='<proxy_username>' \
  --from-literal=password='<proxy_password>' -n ibm-spectrum-scale-ns

Enter the name of the created secret as secretName: <proxy_secret_name>. If you do not use proxy authentication, comment out secretName.

Note: If you are not using a proxy, ensure that the proxy section under callhome is commented out with the # character.

4. Call home groups

The cluster nodes are organized in call home groups. Call home groups help to distribute the data-gather and data-upload workload to prevent bottlenecks.

The call home groups are automatically created and maintained by the IBM Spectrum® Scale Operator. There is no need to perform any manual action on the call home groups at any time.
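If you want to inspect the groups after the cluster is deployed, the call home configuration can be listed from a core pod; the pod name below is a placeholder and the check is optional.

# Replace <ibm-spectrum-scale-core-pod> with an actual core pod name
oc exec <ibm-spectrum-scale-core-pod> -n ibm-spectrum-scale-ns -- mmcallhome group list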

Note: You can enable, modify, or disable call home at any time. After running the oc apply command on the ScaleCluster custom resource, the call home configuration change will be applied immediately.

Host Aliases

The hostnames of the contact nodes of the storage cluster must be resolvable via DNS by the client cluster nodes. If the IP addresses of the storage cluster nodes cannot be resolved via DNS, the hostnames and their IP addresses can be specified in the hostAliases section of the custom resource file. This allows the operator to automatically add entries into the /etc/hosts file on the Pods.

hostAliases:
  - hostname: node1.example.com
    ip: 10.0.0.1
  - hostname: node2.example.com
    ip: 10.0.0.2

Create the IBM Spectrum® Scale Container Native Storage Access (CNSA) Cluster

Once the custom resource is applied, the IBM Spectrum® Scale Operator takes over and creates all the pods that make up an IBM Spectrum® Scale Container Native Storage Access cluster. The Operator also sets up a remote cluster relationship with the storage cluster and mounts the file system on all worker nodes that run IBM Spectrum® Scale. This operation can take some time.

1. Create the Operator

oc create -f spectrumscale/deploy/operator.yaml -n ibm-spectrum-scale-ns

2. Verify the operator pod is in the "Running" state

# oc get pods -n ibm-spectrum-scale-ns
NAME                                         READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-operator-d7ffcbc7-7znj7   1/1     Running   0          15s

3. Create the Custom Resource

oc create -f spectrumscale/deploy/crds/ibm_v1_scalecluster_cr.yaml -n ibm-spectrum-scale-ns

Verify the IBM Spectrum® Scale Container Native Storage Access (CNSA) Cluster

In the following steps, you must provide the actual core pod names that you obtain by submitting the oc get pods -n ibm-spectrum-scale-ns command.

1. Verify that the Operator creates the ScaleCluster Custom Resource by checking pods and the Operator logs:

oc get pods -n ibm-spectrum-scale-ns
oc logs <scale-operator-pod> -n ibm-spectrum-scale-ns -f

A sample output of the oc get pods -n ibm-spectrum-scale-ns command is as follows:

# oc get pods -n ibm-spectrum-scale-ns
NAME                                         READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-core-c9r57                1/1     Running   0          2m52s
ibm-spectrum-scale-core-jm76n                1/1     Running   0          2m52s
ibm-spectrum-scale-core-kfqmg                1/1     Running   0          2m52s
ibm-spectrum-scale-gui-0                     9/9     Running   0          2m52s
ibm-spectrum-scale-operator-d7ffcbc7-7znj7   1/1     Running   0          2m52s
ibm-spectrum-scale-pmcollector-0             2/2     Running   0          2m52s
ibm-spectrum-scale-pmcollector-1             2/2     Running   0          2m52s

Note: The resulting cluster will contain one operator pod, one GUI pod, two pmcollector pods, and one core pod per node that has the specified nodeSelector label.

2. Verify that the IBM Spectrum Scale cluster has been correctly created:

# oc exec ibm-spectrum-scale-core-c9r57 -n ibm-spectrum-scale-ns -- mmlscluster


GPFS cluster information
========================
  GPFS cluster name:         ibm-spectrum-scale.ibm-spectrum-scale-ns.example.com
  GPFS cluster id:           8474207021938860514
  GPFS UID domain:           ibm-spectrum-scale.ibm-spectrum-scale-ns.example.com
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name     IP address   Admin node name      Designation
---------------------------------------------------------------------------------
   1   worker0.example.com  10.17.57.33  worker0.example.com  quorum-manager-perfmon
   2   worker1.example.com  10.17.58.33  worker1.example.com  quorum-manager-perfmon
   3   worker2.example.com  10.17.59.33  worker2.example.com  quorum-manager-perfmon

# oc exec ibm-spectrum-scale-core-c9r57 -n ibm-spectrum-scale-ns -- mmgetstate -a

 Node number  Node name  GPFS state
-------------------------------------------
       1      worker0    active
       2      worker1    active
       3      worker2    active

3. Verify that the storage cluster has been configured:

# oc exec ibm-spectrum-scale-core-c9r57 -n ibm-spectrum-scale-ns -- mmremotecluster show all
Cluster name:    storage-node1.example.com
Contact nodes:   storage-node1.example.com,storage-node2.example.com,storage-node3.example.com
...
File systems:    fs1 (fs1)

4. Verify the storage cluster file system has been configured:

# oc exec ibm-spectrum-scale-core-c9r57 -n ibm-spectrum-scale-ns -- mmremotefs show
Local Name  Remote Name  Cluster name               Mount Point  Mount Options  Automount  Drive  Priority
fs1         fs1          storage-node1.example.com  /mnt/fs1     rw             yes        -      0

5. Verify the storage cluster file system has been remotely mounted:

# oc exec ibm-spectrum-scale-core-c9r57 -n ibm-spectrum-scale-ns -- mmlsmount fs1 -L

File system fs1 (storage-node1.example.com:fs1) is mounted on 6 nodes:
  10.11.29.64     storage-node1        storage-node1.example.com
  10.11.29.65     storage-node2        storage-node1.example.com
  10.11.7.28      storage-node3        storage-node1.example.com
  10.16.101.172   worker2.example.com  ibm-spectrum-scale.ibm-spectrum-scale-ns.example.com
  10.16.101.155   worker0.example.com  ibm-spectrum-scale.ibm-spectrum-scale-ns.example.com
  10.16.101.156   worker1.example.com  ibm-spectrum-scale.ibm-spectrum-scale-ns.example.com


Deploying IBM Spectrum Scale Container Storage Interface (CSI)

Use the following sections to help with deploying IBM Spectrum® Scale CSI with IBM Spectrum® Scale Container Native Storage Access (CNSA):

Performing pre-installation tasks for CSI Operator deployment
Configuring Custom Resource (CR) for CSI driver
Install CSI driver

Performing pre-installation tasks for CSI Operator deployment

Complete the following steps for storage cluster, CNSA cluster, and custom resource file for CSI:

1. Ensure the following pre-installation tasks on the storage cluster are completed.

Create an IBM Spectrum® Scale user group "CsiAdmin" using the following command:

/usr/lpp/mmfs/gui/cli/mkusergrp CsiAdmin --role csiadmin

Create an IBM Spectrum® Scale user in the "CsiAdmin" group. This user must be used in the IBM Spectrum® Scale Container Storage Interface driver configuration. Issue this command on the GUI node to create the user.

/usr/lpp/mmfs/gui/cli/mkuser csi-storage-gui-user -p csi-storage-gui-password -g CsiAdmin

Perfileset quota on the file systems to be used by IBM Spectrum® Scale Container Storage Interface driver is set to No

mmlsfs fs1 --perfileset-quota

flag                 value                    description
------------------- ------------------------ -------------------------------
 --perfileset-quota  No                       Per-fileset quota enforcement

Quota is enabled in the file systems to be used by IBM Spectrum® Scale Container Storage Interface driver

mmchfs fs1 -Q yes

Verify that quota is enabled.

mmlsfs fs1 -Q

flag                 value                    description
------------------- ------------------------ -----------------------------------
 -Q                  user;group;fileset       Quotas accounting enabled
                     user;group;fileset       Quotas enforced
                     none                     Default quotas enabled

Enable quota for root user by issuing the following command:

mmchconfig enforceFilesetQuotaOnRoot=yes -i


Ensure that the controlSetxattrImmutableSELinux parameter is set to "yes" by issuing the following command:

mmchconfig controlSetxattrImmutableSELinux=yes -i

Enable filesetdf on the file system by using the following command:

mmchfs fs1 --filesetdf

2. Ensure the following pre-installation tasks on the CNSA cluster are completed.

Apply the IBM Spectrum® Scale Container Storage Interface driver node label to the worker nodes where CNSA pods are running. By default, these will be the Red Hat® OpenShift® Container Platform worker nodes.

For example, to label all Red Hat® OpenShift® Container Platform worker nodes to use node label "scale=true", issue the following command:

oc label nodes -l node-role.kubernetes.io/worker= scale=true
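You can confirm that the label was applied with a selector query:

oc get nodes -l scale=true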

Create a CNSA GUI user for CSI:

oc exec -c liberty ibm-spectrum-scale-gui-0 -- /usr/lpp/mmfs/gui/cli/mkuser csi-cnsa-gui-user -p csi-cnsa-gui-password -g CsiAdmin

3. Gather information necessary for Configuring Custom Resource (CR) for IBM Spectrum® Scale Container Storage Interface (CSI) driver from the CNSA cluster.

The GUI host for CNSA follows the format of ibm-spectrum-scale-gui.<namespace>. In this case, our GUI host would be named ibm-spectrum-scale-gui.ibm-spectrum-scale-ns.
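Assuming the GUI Service carries the same name as the host prefix, it can be listed directly to confirm it exists in the CNSA namespace:

oc get service ibm-spectrum-scale-gui -n ibm-spectrum-scale-ns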

Identify node mapping:

You can use the following REST API call to validate the IBM Spectrum® Scale node names under the mount:nodesMountedReadWrite section.

From the infrastructure node, issue the following command against an existing IBM Spectrum® Scale pod:

oc exec <ibm-spectrum-scale-core-pod> -- curl -k https://ibm-spectrum-scale-gui.ibm-spectrum-scale-ns/scalemgmt/v2/filesystems/fs1?fields=mount -u "csi-cnsa-gui-user:csi-cnsa-gui-password"

Retrieve OpenShift worker nodes by issuing the oc get nodes command

Note: For more information on node mapping, see Kubernetes to IBM Spectrum® Scale node mapping.

Retrieve and note down the CNSA cluster ID:

From the infrastructure node, issue the following command against an existing IBM Spectrum® Scale pod:

oc exec <ibm-spectrum-scale-core-pod> -- curl -s -k https://ibm-spectrum-scale-gui.ibm-spectrum-scale-ns/scalemgmt/v2/cluster -u "csi-cnsa-gui-user:csi-cnsa-gui-password" | grep clusterId

Retrieve and note down the storage cluster ID:


Note: The storage GUI host is the same host used in the Remote Mount section of the CNSA Custom Resource.

remoteClusters:
  - name: storageCluster1
    gui:
      host: "example-gui.com"   <----

Issue the following curl command against the storage cluster to return the storage cluster ID:

curl -s -k https://example-gui.com/scalemgmt/v2/cluster -u "csi-storage-gui-user:csi-storage-gui-password" | grep clusterId

Configuring Custom Resource (CR) for CSI driver

Follow the steps starting with Step 2 under the section Phase 2: Deploying IBM Spectrum® Scale Container Storage Interface driver from Installing IBM Spectrum® Scale Container Storage Interface driver using CLIs to download the sample custom resource file, modify the CR file parameters, and deploy the CSI operator.

The following table lists the keys and example values necessary to update in the IBM Spectrum® Scale Container Storage Interface (CSI) CR file.

Note: For more information, see Remote cluster support.

The first clusters element (clusters[0]) will represent the CNSA cluster, and the second clusters element (clusters[1]) will represent the storage cluster. A sketch of the resulting clusters section follows the table.

Key                                 Value                                           Description                                             Notes
scaleHostpath                       "/mnt/fs1"                                      Mount point on CNSA cluster
clusters[0].id                      "7061811038141825260"                           IBM Spectrum® Scale cluster ID of the CNSA cluster      oc exec <ibm-spectrum-scale-core-pod> -- mmlscluster | grep 'GPFS cluster id:'
clusters[0].secrets                 "secret-cnsa"                                   GUI secret with CNSA GUI credentials
clusters[0].primary.primaryFs       "fs1"                                           File system on CNSA cluster
clusters[0].primary.primaryFset     "scale-fset1"                                   Optional
clusters[0].primary.remoteCluster   "4251191354533383468"                           IBM Spectrum® Scale cluster ID of the storage cluster
clusters[0].restApi.guiHost         ibm-spectrum-scale-gui.ibm-spectrum-scale-ns    CNSA GUI Host
clusters[1].id                      "4251191354533383468"                           IBM Spectrum® Scale cluster ID of the storage cluster
clusters[1].secrets                 "secret-storage"                                GUI secret with storage cluster GUI credentials
clusters[1].secureSslMode           false
clusters[1].restApi.guiHost         "storagegui.example.com"                        Storage cluster GUI Host

Install CSI driver

Configuring CSI to work with CNSA includes the following two phases. For more information on CSI installation, see Installing CSI.

Note: If you have access to Operator Lifecycle Manager (OLM), you may opt to follow Installing IBM Spectrum Scale Container Storage Interface driver using Operator Lifecycle Manager.

Phase 1: Deploying the Operator

1. Create the namespace for the IBM Spectrum® Scale CSI driver.

oc new-project ibm-spectrum-scale-csi-driver

2. Deploy the Operator.

oc create -f https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-csi/v2.1.0/generated/installer/ibm-spectrum-scale-csi-operator.yaml -n ibm-spectrum-scale-csi-driver

3. Verify that the Operator is deployed, and the Operator pod is in running state.

oc get pod,deployment -n ibm-spectrum-scale-csi-driver

NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ibm-spectrum-scale-csi-operator-56955949c4-tvv6q   1/1     Running   0          28s

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ibm-spectrum-scale-csi-operator   1/1     1            1           29s

Phase 2: Deploying IBM Spectrum® Scale Container Storage Interface driver

1. Create a secret for the CNSA GUI user created in step 2 of Performing pre-installation tasks for CSI Operator deployment by issuing the following command:

oc create secret generic secret-cnsa --from-literal=username=csi-cnsa-gui-user --from-literal=password=csi-cnsa-gui-password -n ibm-spectrum-scale-csi-driver

oc label secret secret-cnsa product=ibm-spectrum-scale-csi

2. Create a secret for the GUI user on the storage cluster created in Storage cluster configuration.

oc create secret generic secret-storage --from-literal=username=csi-storage-gui-user --from-literal=password=csi-storage-gui-password -n ibm-spectrum-scale-csi-driver

oc label secret secret-storage product=ibm-spectrum-scale-csi
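To confirm that both secrets exist and carry the product label applied above, a quick check along these lines can be used (the names and label are the examples from steps 1 and 2):

oc get secrets -n ibm-spectrum-scale-csi-driver -l product=ibm-spectrum-scale-csi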


3. Download the sample custom resource file on your cluster.

curl -O https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-csi/v2.1.0/operator/deploy/crds/csiscaleoperators.csi.ibm.com_cr.yaml

4. Modify the parameters in the file to suit your environment. For more information, see Configuring Custom Resource (CR) for CSI driver. Once finished modifying the CR, return to this page and continue to step 5.

5. Create the CR file.

oc create -f csiscaleoperators.csi.ibm.com_cr.yaml -n ibm-spectrum-scale-csi-driver

6. Verify that IBM Spectrum® Scale Container Storage Interface driver is installed, Operator and driver resources are ready, and pods are in running state.

oc get pod,daemonset,statefulset -n ibm-spectrum-scale-csi-driver

NAME                                                   READY   STATUS    RESTARTS   AGE
pod/ibm-spectrum-scale-csi-attacher-0                  1/1     Running   0          4d2h
pod/ibm-spectrum-scale-csi-dxslh                       2/2     Running   0          4d2h
pod/ibm-spectrum-scale-csi-provisioner-0               1/1     Running   0          4d2h
pod/ibm-spectrum-scale-csi-operator-6ff9cf6979-v5g4c   1/1     Running   0          1h

NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/ibm-spectrum-scale-csi    1         1         1       1            1           <none>          4d2h

NAME                                                 READY   AGE
statefulset.apps/ibm-spectrum-scale-csi-attacher     1/1     4d2h
statefulset.apps/ibm-spectrum-scale-csi-provisioner  1/1     4d2h

Note: It can take some time for the CSI driver pods to be scheduled and running.
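To follow progress until all pods reach the Running state, a watch such as the following can be left running and interrupted with Ctrl+C once everything is up:

oc get pods -n ibm-spectrum-scale-csi-driver -w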

Maintenance for a deployed cluster

The maintenance of a deployed IBM Spectrum® Scale Container Native Storage Access (CNSA) cluster can include the following:

Shutting down a cluster
Red Hat OpenShift Container Platform Upgrade
Starting the cluster after shutdown
Adding a new node to an existing cluster

Shut down a Cluster

Shutting down CSI
Note: The CSI documentation for shutting down CSI can be found here.


1. Move or stop all pods that use volumes that are managed by the IBM Spectrum® Scale Container Storage Interface driver.

2. Drain the nodes so that StatefulSets move to other nodes.

oc adm drain <node1> <node2> --ignore-daemonsets=true --delete-local-data=true

3. Verify that CSI is stopped and that CSI volumes are neither attached nor in use.

oc get nodes -l=node-role.kubernetes.io/worker= \
  -ojsonpath="{range .items[*]}{'Node: '}{.metadata.name}{'\n'}\
{'Volumes Attached\n'}{.status.volumesAttached}{'\n'}\
{'Volumes In Use\n'}{.status.volumesInUse}{'\n\n'}{end}"

The following sample output shows nodes with volumes that are attached or in use:

Node: worker0.example.com
Volumes Attached

Volumes In Use


Node: worker1.example.com
Volumes Attached
[map[devicePath: name:kubernetes.io/csi/spectrumscale.csi.ibm.com^11503701103687025168;0A0B1D40:5F88C327;filesetName=pvc-ad01f21c-0159-4468-9e27-dfd79ec21617;path=/mnt/cluster-fs1/cluster-fset-nov23/.volumes/pvc-ad01f21c-0159-4468-9e27-dfd79ec21617]]
Volumes In Use
[kubernetes.io/csi/spectrumscale.csi.ibm.com^11503701103687025168;0A0B1D40:5F88C327;filesetName=pvc-ad01f21c-0159-4468-9e27-dfd79ec21617;path=/mnt/cluster-fs1/cluster-fset-nov23/.volumes/pvc-ad01f21c-0159-4468-9e27-dfd79ec21617]

Node: worker2.example.com
Volumes Attached
[map[devicePath: name:kubernetes.io/csi/spectrumscale.csi.ibm.com^11503701103687025168;0A0B1D40:5F88C327;filesetName=pvc-ad01f21c-0159-4468-9e27-dfd79ec21617;path=/mnt/cluster-fs1/cluster-fset-nov23/.volumes/pvc-ad01f21c-0159-4468-9e27-dfd79ec21617]]
Volumes In Use
[kubernetes.io/csi/spectrumscale.csi.ibm.com^11503701103687025168;0A0B1D40:5F88C327;filesetName=pvc-ad01f21c-0159-4468-9e27-dfd79ec21617;path=/mnt/cluster-fs1/cluster-fset-nov23/.volumes/pvc-ad01f21c-0159-4468-9e27-dfd79ec21617]

To display more information on a node that has volumes attached or in use, issue the following command and look under the volumesAttached and volumesInUse fields:

oc get node <node-name> -o yaml
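If only those two fields are of interest, a narrower query is possible; the following is a sketch using standard jsonpath against the same node object:

oc get node <node-name> -o jsonpath="{'Volumes Attached: '}{.status.volumesAttached}{'\n'}{'Volumes In Use: '}{.status.volumesInUse}{'\n'}"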

4. Verify that there are no bind mounts by opening a remote shell into an IBM Spectrum® Scale pod and running the following commands:

oc rsh <scale-pod-name>
sh-4.4# python3 gpfs-namespace-mounts.py
sh-4.4#

Note: No output from the gpfs-namespace-mounts.py script means that there are no bind mounts.

5. Remove the node label that is assigned to the worker nodes.

oc label nodes -l node-role.kubernetes.io/worker= scale-
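To verify that the label was removed from every worker, a check such as the following should return no nodes (it assumes the scale=true label that CSI uses, as shown later under Adding a new node):

oc get nodes -l scale=true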


Shutting down a deployed cluster
1. Edit the existing ScaleCluster Custom Resource yaml and add a node selector that does not exist. This will ensure no pods are scheduled.

Get the name of the ScaleCluster custom resource, which will be the NAME field returned from the following command:

oc get gpfs -n ibm-spectrum-scale-ns
NAME                 CLUSTER NAME
ibm-spectrum-scale   ibm-spectrum-scale.ibm-spectrum-scale-ns

Using the name of the ScaleCluster custom resource, issue the following command to edit the custom resource:

oc edit gpfs ibm-spectrum-scale

Note: An editor will open with the contents of the ScaleCluster custom resource.

Locate the nodeSelector field and add a unique label that is not being used by the cluster. In this example, we are using spectrumscale.ibm.com/none as the label. Add this line under the nodeSelector field, save, and exit the file.

...
nodeSelector:
  node-role.kubernetes.io/worker: ""
  spectrumscale.ibm.com/none: ""
...
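As an alternative to editing the custom resource interactively, a patch along the following lines could apply the same selector. This is a sketch only; it assumes the nodeSelector field sits under spec in this CR and reuses the resource name from the example above:

oc patch gpfs ibm-spectrum-scale -n ibm-spectrum-scale-ns --type merge -p '{"spec":{"nodeSelector":{"node-role.kubernetes.io/worker":"","spectrumscale.ibm.com/none":""}}}'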

After applying the new label, the ibm-spectrum-scale-core pods will begin to terminate, and shut down in the process.

Example:

# oc get pods -n ibm-spectrum-scale-ns
NAME                                           READY   STATUS        RESTARTS   AGE
ibm-spectrum-scale-core-7mc5l                  1/1     Terminating   0          13d
ibm-spectrum-scale-core-g4jgg                  0/1     Terminating   0          13d
ibm-spectrum-scale-core-xxj22                  0/1     Terminating   0          13d
ibm-spectrum-scale-gui-0                       9/9     Running       0          13d
ibm-spectrum-scale-operator-79cd7dfb68-srm6r   1/1     Running       1          13d
ibm-spectrum-scale-pmcollector-0               2/2     Running       0          13d
ibm-spectrum-scale-pmcollector-1               2/2     Running       0          13d

2. To verify the IBM Spectrum® Scale cluster has been shut down, run oc get pods -n ibm-spectrum-scale-ns, and there should be no remaining ibm-spectrum-scale-core pods. It is expected that the operator, gui, and pmcollector pods will remain running.

oc get pods -n ibm-spectrum-scale-ns
NAME                                           READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-gui-0                       9/9     Running   0          13d
ibm-spectrum-scale-operator-79cd7dfb68-srm6r   1/1     Running   0          13d
ibm-spectrum-scale-pmcollector-0               2/2     Running   0          13d
ibm-spectrum-scale-pmcollector-1               2/2     Running   0          13d

Red Hat® OpenShift® Container Platform Upgrade

In preparation for an upgrade, you will need to stop all applications that use the IBM Spectrum® Scale file system to ensure that there is no workload running. Once the workload has stopped, proceed with the Red Hat® OpenShift® Container Platform upgrade.

As described in Red Hat® OpenShift® Container Platform Updating Cluster documentation:

During the upgrade process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. It cordons the number of nodes that is specified by the maxUnavailable field on the machine configuration pool and marks them as unavailable. By default, this value is set to 1. It then applies the new configuration and reboots the machine.

As the worker nodes are updated by Red Hat® OpenShift® Container Platform, the IBM Spectrum® Scale pods will be restarted and returned to a schedulable state.

Note: If any problems are encountered during OpenShift Upgrade, gather cluster data to open a support case, as it is helpful to provide debugging information about your cluster to Red Hat Support.

Upgrade from Red Hat OpenShift 4.5 to 4.6.x
When upgrading to Red Hat OpenShift Container Platform 4.6.x, where 'x' is greater than or equal to 6, an additional step must be performed to properly configure the worker nodes via the Machine Config Operator.

Note: IBM Spectrum Scale Container Native Storage Access requires a level of Red Hat OpenShift Container Platform 4.6.x, where 'x' is greater than or equal to 6.

1. Ensure that you have properly shut down the IBM Spectrum Scale cluster by following the procedures outlined here: Shutting down a Cluster

2. Run the Red Hat OpenShift Container Platform 4.6 upgrade procedures.

3. Apply Machine Config Operator instructions to configure the worker nodes.

4. Start up the IBM Spectrum Scale cluster by following the procedures outlined here: Starting the cluster after shutdown

Adding a new node

Adding a new node to an existing cluster
When a new node with labels that match the existing cluster's node selectors is added, a pod will be created on the new node. The new pod will be up and running within a few minutes.

To check on the progress of the creation of the new pod, issue the following command:


oc get pods -n ibm-spectrum-scale-ns

To ensure the new pod is ready, issue the following command:

oc exec <scale-pod> -n ibm-spectrum-scale-ns -- mmgetstate -a

The output will look something like this.

 Node number  Node name   GPFS state
-------------------------------------------
       1      worker1     active
       2      new node    arbitrating
       3      worker0     active

Once the pod has finished arbitrating and enters the active state, CSI is ready to be enabled on this node.

Configuring CSI on new nodes
Note: CSI should only be configured on new nodes after they have finished arbitrating and are in the active state. Applying the CSI node label before nodes are in an active state can cause unexpected behavior.

For CSI to recognize the newly added node, apply the label to the node:

oc label node <node-name> scale=true

The newly added node can now be used for running applications.
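A quick way to confirm that the CSI driver has picked up the new node is to check for a CSI driver pod scheduled there, for example (assuming the CSI namespace used earlier in this guide):

oc get pods -n ibm-spectrum-scale-csi-driver -o wide | grep <node-name>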

Cleanup

To safely remove the pods or perform other maintenance actions, the IBM Spectrum® Scale Container Native Storage Access (CNSA) cluster needs to be manually shut down prior to performing these operations. The following will outline how to validate if it is safe to shut down and how to perform the actions.

Delete a cluster
Remove applications
Clean up IBM Spectrum Scale Container Storage Interface
Clean up IBM Spectrum Scale Operator
Clean up the worker nodes
Clean up on the storage cluster

Delete a Cluster

When deleting the entire cluster, all applications and the IBM Spectrum® Scale Container Storage Interface driver must be unloaded prior to the umount and shutdown steps.

Remove applications

Note: Ensure you are in the project for the IBM Spectrum® Scale Container Storage Interface (CSI) driver.

1. Query the PVC to identify the applications that are active:


oc describe <csi pvc>

2. Remove all the applications (May require a drain of the node):

oc delete <application deployment or daemonSet from csi pvc describe output>

3. Remove the PV / PVC:

oc delete pvc <pvc name>
oc delete pv <pv name>
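To confirm that no dynamically provisioned volumes from the CSI driver remain, a filtered listing such as the following can help. This is a sketch: it matches PVs whose CSI driver field is spectrumscale.csi.ibm.com (the driver name visible in the volume attachments earlier in this guide) and may not catch statically defined PVs.

oc get pv -o jsonpath='{range .items[?(@.spec.csi.driver=="spectrumscale.csi.ibm.com")]}{.metadata.name}{"\n"}{end}'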

Clean up IBM Spectrum® Scale Container Storage Interface (CSI)

For steps to remove CSI from your cluster, see Cleaning up IBM Spectrum® Scale Container Storage Interface driver and Operator using CLI.

Clean up IBM Spectrum® Scale Operator

Note: Ensure you are in the project that you have created for your IBM Spectrum® Scale cluster.

1. Stop and uninstall the IBM Spectrum® Scale pods:

oc delete -f spectrumscale/deploy/crds/ibm_v1_scalecluster_cr.yaml -n ibm-spectrum-scale-ns

2. Uninstall the Operator:

oc delete -f spectrumscale/deploy/operator.yaml -n ibm-spectrum-scale-ns

3. Delete the Spectrum Scale Project to remove the secrets, config maps, role, role binding, and service account objects:

oc delete project ibm-spectrum-scale-ns

4. Delete the remaining Operator objects:

oc delete -f spectrumscale/deploy/crds/ibm_v1_scalecluster_crd.yaml -n ibm-spectrum-scale-ns
oc delete -f spectrumscale/deploy/cluster_role_binding.yaml -n ibm-spectrum-scale-ns
oc delete -f spectrumscale/deploy/cluster_role.yaml -n ibm-spectrum-scale-ns
oc delete -f spectrumscale/deploy/role_binding.yaml -n ibm-spectrum-scale-ns
oc delete -f spectrumscale/deploy/role.yaml -n ibm-spectrum-scale-ns
oc delete -f spectrumscale/deploy/role_binding_scale_core.yaml -n ibm-spectrum-scale-ns
oc delete -f spectrumscale/deploy/role_scale_core.yaml -n ibm-spectrum-scale-ns

5. Delete IBM Spectrum® Scale Operator Security Context Constraints (SCCs):

oc delete scc ibm-spectrum-scale-restricted
oc delete scc ibm-spectrum-scale-privileged

6. Clean up the performance monitoring artifacts:


Note: The local PVs used by performance monitoring may already have been deleted when the project was deleted in Step 3.

List the PVs with a claim of datadir-ibm-spectrum-scale-pmcollector. Two will be returned.

oc get pv -l app=scale-pmcollector
oc delete pv -l app=scale-pmcollector

Delete the PVC created by performance monitoring:

oc get pvc | grep pmcollector
oc delete pvc <pvc-name>

Delete the Storage Class created by performance monitoring:

oc delete sc -l app=scale-pmcollector

Clean up the worker nodes

IBM Spectrum® Scale requires host path volume mounts and creates directories on each worker node.

Note: At this point, the project has been deleted. Ensure you are in the default namespace using the command: oc project default

1. List the nodes that have the node-role.kubernetes.io/worker= label:

oc get nodes -l 'node-role.kubernetes.io/worker=' -o jsonpath="{range .items[*]}{.metadata.name}{'\n'}{end}"

2. For each of the listed worker nodes, run the following command to create a debug pod that will remove the host path volume mounted directories used by IBM Spectrum® Scale.

oc debug node/<openshift_worker_node> -T -- chroot /host sh -c "rm -rf /var/mmfs; rm -rf /var/adm/ras"

Example:

oc debug node/worker0.example.com -T -- chroot /host sh -c "rm -rf /var/mmfs; rm -rf /var/adm/ras"
Starting pod/worker0examplecom-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
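To run the same cleanup across every worker node returned in step 1, a loop along these lines can be used (a sketch that reuses the label selector and debug command from the steps above):

for node in $(oc get nodes -l 'node-role.kubernetes.io/worker=' -o jsonpath="{range .items[*]}{.metadata.name}{'\n'}{end}"); do
  oc debug node/$node -T -- chroot /host sh -c "rm -rf /var/mmfs; rm -rf /var/adm/ras"
done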

3. Ensure that none of the files are left by issuing the following command:

oc debug node/<openshift_worker_node> -T -- chroot /host sh -c "ls /var/mmfs; ls /var/adm/ras"

Example:

oc debug node/worker0.example.com -T -- chroot /host sh -c "ls /var/mmfs; ls /var/adm/ras"
Starting pod/worker0examplecom-debug ...
To use host binaries, run `chroot /host`
ls: cannot access '/var/mmfs': No such file or directory
ls: cannot access '/var/adm/ras': No such file or directory
Removing debug pod ...
error: non-zero exit code from debug container


4. Remove the node annotations created by the Container Native Storage Access Operator:

oc get nodes -ojsonpath="{range .items[*]}{.metadata.name}{'\n'}{end}" | xargs -I{} oc annotate node {} scale.ibm.com/nodedesc-

Clean up on the storage cluster

To delete the access that has been granted to the IBM Spectrum® Scale client cluster for mounting the remote file system, issue the following commands on the IBM Spectrum® Scale storage cluster:

1. Query the name of the containerized client cluster, for example:

$ mmauth show all | grep ibm-spectrum-scale
Cluster name: ibm-spectrum-scale.clustername.example.com

2. Remove the client cluster authorization, for example:

$ mmauth delete ibm-spectrum-scale.clustername.example.com
mmauth: Propagating the cluster configuration data to all affected nodes.
mmauth: Command successfully completed

Troubleshooting

Use the following sections to help troubleshoot and debug specific issues with IBM Spectrum® Scale Container Native Storage Access (CNSA):

Debugging the IBM Spectrum Scale CNSA operator
Debugging IBM Spectrum Scale Deployment
Debugging IBM Spectrum Scale Container Storage Interface (CSI) Deployment
Common Issues
Known Issues
Collecting Data for Support

Debugging the IBM Spectrum® Scale CNSA operator

Problem: The operator pod is not successfully deployed
No operator pod appears when running oc get pods -n ibm-spectrum-scale-ns.

Verify that all worker nodes in the Red Hat® OpenShift® Container Platform cluster are in a "Ready" state. If all worker nodes are in "NotReady" state, the operator pod will not have an eligible node to be deployed to.

[user@example ~]# oc get nodes -n ibm-spectrum-scale-ns
NAME                  STATUS     ROLES    AGE   VERSION
master0.example.com   Ready      master   65d   v1.18.3+6c42de8
master1.example.com   Ready      master   65d   v1.18.3+6c42de8
master2.example.com   Ready      master   65d   v1.18.3+6c42de8
worker0.example.com   NotReady   worker   65d   v1.18.3+6c42de8
worker1.example.com   NotReady   worker   65d   v1.18.3+6c42de8
worker2.example.com   NotReady   worker   65d   v1.18.3+6c42de8

Verify that all yaml files in the spectrumscale/deploy directory have been created.


oc get serviceaccount -n ibm-spectrum-scale-ns | grep ibm-spectrum-scale-operator
oc get role -n ibm-spectrum-scale-ns | grep ibm-spectrum-scale-operator
oc get rolebinding -n ibm-spectrum-scale-ns | grep ibm-spectrum-scale-operator
oc get clusterrole -n ibm-spectrum-scale-ns | grep ibm-spectrum-scale-operator
oc get clusterrolebinding -n ibm-spectrum-scale-ns | grep ibm-spectrum-scale-operator

If any of the above commands do not find a matching line, refer to Step 3 under Operator deployment preparations to create missing resources.

Look for more details via the following commands:

oc get deployment -n ibm-spectrum-scale-ns
oc describe deployment ibm-spectrum-scale-operator -n ibm-spectrum-scale-ns

oc get replicasets -n ibm-spectrum-scale-ns
oc describe replicaset <replicaset name> -n ibm-spectrum-scale-ns

Problem: The operator pod is in ErrImagePull or ImagePullBackOff state
When viewing oc get pods -n ibm-spectrum-scale-ns, the operator pod is in ErrImagePull or ImagePullBackOff state.

Verify the namespace you are currently in has images in the image stream. Run oc get is -n ibm-spectrum-scale-ns to see the contents of the local registry. If empty, images may have been loaded into a different namespace.

Verify the image name supplied in the operator.yaml file contains the correct registry. Describe the operator pod with oc describe pod <operator pod name> -n ibm-spectrum-scale-ns and look at "Events" at the bottom of the output.

[root@example ~]# oc get pods -n ibm-spectrum-scale-ns
NAME                                          READY   STATUS         RESTARTS   AGE
ibm-spectrum-scale-operator-6cb86cc4f-jsmp9   0/1     ErrImagePull   0          2s

[root@example ~]# oc describe pod ibm-spectrum-scale-operator-6cb86cc4f-jsmp9 -n ibm-spectrum-scale-ns
Name:       ibm-spectrum-scale-operator-6cb86cc4f-jsmp9
Namespace:  ibm-spectrum-scale-ns
Priority:   0
...
Type     Reason          Age                      From                          Message
----     ------          ----                     ----                          -------
Normal   Scheduled       <unknown>                default-scheduler             Successfully assigned ibm-spectrum-scale-ns/ibm-spectrum-scale-operator-6cb86cc4f-jsmp9 to worker1.example.com
Normal   AddedInterface  24s                      multus                        Add eth0 [10.254.20.3/22]
Normal   Pulling         10s (x2 over 24s)        kubelet, worker1.example.com  Pulling image "default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-core-operator-beta3"
Warning  Failed          10s (x2 over 24s)        kubelet, worker1.example.com  Failed to pull image "default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-core-operator-beta3": rpc error: code = Unknown desc = error pinging docker registry default-route-openshift-image-registry.apps.example.com: Get https://default-route-openshift-image-registry.apps.example.com/v2/: x509: certificate signed by unknown authority
Warning  Failed          10s (x2 over 24s)        kubelet, worker1.example.com  Error: ErrImagePull
Normal   BackOff         <invalid> (x3 over 24s)  kubelet, worker1.example.com  Back-off pulling image "default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-core-operator-beta3"
Warning  Failed          <invalid> (x3 over 24s)  kubelet, worker1.example.com  Error: ImagePullBackOff

The message Failed to pull image "default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-core-operator-beta3": rpc error: code = Unknown desc = error pinging docker registry default-route-openshift-image-registry.apps.example.com: Get https://default-route-openshift-image-registry.apps.example.com/v2/: x509: certificate is indicative of using an incorrect value for the image path. The correct path is image-registry.openshift-image-registry.svc:5000

Verify the correct image tag was supplied in the operator.yaml file. Describe the operator pod with oc describe pod <operator pod name> -n ibm-spectrum-scale-ns and look at "Events" at the bottom of the output.

[root@example ~]# oc describe pod ibm-spectrum-scale-operator-bd54ccf5c-rsmbg -n ibm-spectrum-scale-ns
Name:       ibm-spectrum-scale-operator-bd54ccf5c-rsmbg
Namespace:  ibm-spectrum-scale-ns
...
Events:
  Type     Reason          Age              From                          Message
  ----     ------          ----             ----                          -------
  Normal   Scheduled       <unknown>        default-scheduler             Successfully assigned ibm-spectrum-scale-ns/ibm-spectrum-scale-operator-bd54ccf5c-rsmbg to worker1.example.com
  Normal   AddedInterface  7s               multus                        Add eth0 [10.254.20.8/22]
  Normal   Pulling         7s               kubelet, worker1.example.com  Pulling image "image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core-operator:beta3"
  Warning  Failed          6s               kubelet, worker1.example.com  Failed to pull image "image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core-operator:beta3": rpc error: code = Unknown desc = Error reading manifest beta3 in image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core-operator: name unknown
  Warning  Failed          6s               kubelet, worker1.example.com  Error: ErrImagePull
  Normal   BackOff         5s (x2 over 6s)  kubelet, worker1.example.com  Back-off pulling image "image-registry.openshift-image-registry.svc:5000/ibm-spectrum-scale-ns/scale-core-operator:beta3"
  Warning  Failed          5s (x2 over 6s)  kubelet, worker1.example.com  Error: ImagePullBackOff

Reference the images that were uploaded to the local registry and verify they were correctly supplied in the operator.yaml file. In this scenario, the pod was trying to pull scale-core-operator:beta3 but the local registry contains scale-core-operator-beta3:latest.

[root@example ~]# oc get is -n ibm-spectrum-scale-ns
NAME                        IMAGE REPOSITORY                                                                                          TAGS     UPDATED
scale-core-beta3            default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-core-beta3            latest   About an hour ago
scale-core-operator-beta3   default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-core-operator-beta3   latest   About an hour ago
scale-gui-beta3             default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-gui-beta3             latest   About an hour ago
scale-monitor-beta3         default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-monitor-beta3         latest   About an hour ago
scale-pmcollector-beta3     default-route-openshift-image-registry.apps.example.com/ibm-spectrum-scale-ns/scale-pmcollector-beta3     latest   About an hour ago

Problem: Operator pod shows container restarts
Kubernetes will keep the logs of the current container and the previous container. Take a look at the previous container's logs for any clues using the following command:

oc logs -p <operator pod> -n ibm-spectrum-scale-ns

Debugging IBM Spectrum® Scale deployment

Problem: ScaleCluster CustomResource was created but no pods are created.

[root@example deploy]# oc create -f crds/ibm_v1_scalecluster_cr.yaml -n ibm-spectrum-scale-ns
scalecluster.scale.ibm.com/ibm-spectrum-scale created

[root@example deploy]# oc get pods -n ibm-spectrum-scale-ns
NAME                                           READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-operator-7dbdb67f9f-wkrtg   1/1     Running   0          2m21s

Check that the correct namespace was supplied in spectrumscale/deploy/cluster_role_binding.yaml. In most cases, this value was an incorrect namespace or it was not changed at all from the default "<namespace>" placeholder value that must be updated.

[root@example ~]# cat spectrumscale/deploy/cluster_role_binding.yaml
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale-core-operator
    app.kubernetes.io/managed-by: ibm-spectrum-scale-core-operator
    app.kubernetes.io/name: ibm-spectrum-scale-core-operator
    product: ibm-spectrum-scale-core
    release: ibm-spectrum-scale-core-operator
  name: ibm-spectrum-scale-operator
roleRef:
  kind: ClusterRole
  name: ibm-spectrum-scale-operator
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: ibm-spectrum-scale-operator
    namespace: <namespace>

In this case, the cluster_role_binding.yaml was not updated to contain the proper namespace value. To remedy the issue, edit the spectrumscale/deploy/cluster_role_binding.yaml file and replace <namespace> with the correct value. Save the changes and run oc apply -f spectrumscale/deploy/cluster_role_binding.yaml -n ibm-spectrum-scale-ns. If this does not work, try deleting the operator resource, deleting the cluster_role_binding, recreating the cluster_role_binding, and finally recreating the operator.

Problem: Core, GUI, or collector pod(s) are in ErrImagePull or ImagePullBackOff state

When viewing oc get pods -n ibm-spectrum-scale-ns, the core, pmcollector, or GUI pods are in ErrImagePull or ImagePullBackOff state.

Refer to the Debugging the operator page for steps to debug issues with images. The commands are the same, but in this scenario use the name of the pod that is having issues rather than the operator pod. High-level commands for checking the status of the deployment and determining whether there is an issue loading the images:

oc get ds -n ibm-spectrum-scale-ns
oc describe ds -n ibm-spectrum-scale-ns
oc get po -n ibm-spectrum-scale-ns
oc describe po -n ibm-spectrum-scale-ns <scale pod>

Problem: Core, GUI, or collector pod(s) are not up
If the pods are not deployed or the cluster is not created, examine operator pod logs:

oc logs <scale operator pod> -n ibm-spectrum-scale-ns

Problem: Core, GUI, or collector pod(s) show container restarts
Kubernetes will keep the logs of the current container and the previous container. Take a look at the previous container's logs for any clues using the following command:

oc logs -p <scale pod> -n ibm-spectrum-scale-ns

Problem: All pods have been deployed but GPFS cluster is stuck in the "arbitrating" state.

If the cluster is stuck in arbitrating state:

oc exec <scale pod> -n ibm-spectrum-scale-ns -- mmlscluster
oc exec <scale pod> -n ibm-spectrum-scale-ns -- vi /var/adm/ras/mmfs.log.latest

Note: The GPFS logs are mapped to /var/adm/ras/gpfslogs/ on the worker nodes

Problem: Remote Mount file system not getting configured or mounted
Verify the network between the clusters using mmnetverify.


Common issues

Error: daemon and kernel extension do not match.
This error occurs when there is an unintentional upgrade of GPFS code.

The issue will present itself as the GPFS state being down and the above error being found in the GPFS logs.

To resolve the issue, follow proper upgrade procedures. The issue occurs because the kernel module cannot be unloaded while a file system is in use. Rebooting the node will resolve the problem, or follow procedures to remove application workloads and then issue the following on the node:

rmmod tracedev mmfs26 mmfslinux

controllers.RemoteMount rest error: context deadline exceeded or Unauthorized 401

ERROR controllers.RemoteMount rest error: Get "https://ibm-spectrum-scale-gui.ibm-spectrum-scale-ns/scalemgmt/v2/remotemount/remotefilesystems/fs1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The IBM Spectrum® Scale GUI REST credentials are stored as Kubernetes secrets. In some scenarios, such as secret deletion, the credentials may become out of sync with what is known to the IBM Spectrum® Scale GUI.

When this occurs, it requires manual intervention.

1. Read the credentials from the kubernetes secret created by the operator for accessing the in-cluster IBM Spectrum® Scale GUI REST API.

oc get secret ibm-spectrum-scale-gui-containeroperator -n ibm-spectrum-scale-ns -ojsonpath='{.data.password}' | base64 -d -

Note: In some shells, the end of the line will have a highlighted %. This denotes that there is no newline; it should not be included when updating the password.

Note: It is also possible to see this issue with the storage cluster GUI REST credentials. This can happen if the remote storage GUI password expires. In this scenario, the error message URL will reference the hostname of the storage cluster GUI.
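To avoid copying the password by hand (and accidentally including the trailing % that some shells display), the value can first be captured into a shell variable; a sketch using the same secret name:

CONTAINER_OPERATOR_PASSWORD=$(oc get secret ibm-spectrum-scale-gui-containeroperator -n ibm-spectrum-scale-ns -ojsonpath='{.data.password}' | base64 -d)
echo "$CONTAINER_OPERATOR_PASSWORD"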

2. Connect to the in-cluster GUI pod

oc rsh -n ibm-spectrum-scale-ns ibm-spectrum-scale-gui-0

3. Update the password to match that expected by the operator per the kubernetes secret.

/usr/lpp/mmfs/gui/cli/chuser ContainerOperator -p <password from step 1.>

Note: For the in-cluster GUI, ContainerOperator is the username; in cases involving the remote storage GUI, the username will be as configured during installation.

MountVolume.SetUp failed for volume "ssh-keys"

Warning  FailedMount  83m (x5 over 83m)  kubelet, worker-0.example.ibm.com  MountVolume.SetUp failed for volume "ssh-keys" : secret "ibm-spectrum-scale-ssh-key-secret" not found

IBM Spectrum Scale Container Native Storage Access Guide 5.1.0.1 41

The pod creation times show that the SSH key secret was created after the deployment was created. This means that the deployment rightfully could not find the secret to mount, as it did not yet exist.

This message can be misleading, as the pods should resolve themselves once the secret is created. If core pods are not in a Running state and the secret is already created, deleting the ibm-spectrum-scale-core pods should resolve the issue. This will restart the pods and allow the mount to complete successfully for the already created SSH key.

pmcollector pod is in pending state during OpenShift Container Platform upgrade or reboot

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  65s (x202 over 4h43m)  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

This issue is caused by a problem during an OpenShift Container Platform upgrade or when a worker node has not been reset to schedulable after a reboot. The pmcollector pod will remain in a Pending state until the pod itself and its respective Persistent Volume can be bound to a worker node.

[root@example ~]# oc get nodes
NAME                  STATUS                     ROLES    AGE     VERSION
master0.example.com   Ready                      master   5d18h   v1.18.3+2fbd7c7
master1.example.com   Ready                      master   5d18h   v1.18.3+2fbd7c7
master2.example.com   Ready                      master   5d18h   v1.18.3+2fbd7c7
worker0.example.com   Ready                      worker   5d18h   v1.17.1+45f8ddb
worker1.example.com   Ready,SchedulingDisabled   worker   5d18h   v1.17.1+45f8ddb
worker2.example.com   Ready                      worker   5d18h   v1.17.1+45f8ddb

If the Persistent Volume has Node Affinity to the host that has SchedulingDisabled, the pmcollector pod will remain in Pending state until the node associated with the PV becomes schedulable.

[root@example-inf ~]# oc describe pv worker1.example.com-pv
Name:              worker1.example.com-pv
Labels:            app=scale-pmcollector
Annotations:       pv.kubernetes.io/bound-by-controller: yes
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      ibm-spectrum-scale-internal
Status:            Bound
Claim:             example/datadir-ibm-spectrum-scale-pmcollector-1
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          25Gi
Node Affinity:
  Required Terms:
    Term 0:        kubernetes.io/hostname in [worker1.example.com]
Message:
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /var/mmfs/pmcollector

If the issue was with OpenShift Container Platform upgrade, fixing the upgrade issue should resolve the pending pod.

42 IBM Spectrum Scale Container Native Storage Access Guide 5.1.0.1

If the issue is due to a worker node in SchedulingDisabled state and not due to a failed OpenShift Container Platform upgrade, re-enable scheduling for the worker with oc adm uncordon.

Known Issues

Custom Resource (CR) was modified and applied but Pods did not restart

The ibm-spectrum-scale DaemonSet update strategy was changed to "OnDelete". This was done to prevent unexpected restarts of pods.

If applying changes to a cluster with pods already running, the pods will need to be deleted in order for the changes to be applied.

pmsensors showing null after failure of pmcollector node
If a node that is running the pmcollector pod is drained, the pmcollector pods get new IPs assigned when the node is uncordoned. This leads to the pmsensors process issue. It reports:

Connection to scale-pmcollector-0.scale-pmcollector successfully established.

but then reports the error:

Error on socket to scale-pmcollector-0.scale-pmcollector: No route to host (113)

See /var/log/zimon/ZIMonSensors.log. This issue can also be seen on the pmcollector pod:

sh-4.4$ echo "get metrics cpu_user bucket_size 5 last 10" | /opt/IBM/zimon/zc 0
1: worker1
2: worker2
Row   Timestamp            cpu_user
  1   2020-11-16 05:27:25  null
  2   2020-11-16 05:27:30  null
  3   2020-11-16 05:27:35  null
  4   2020-11-16 05:27:40  null
  5   2020-11-16 05:27:45  null
  6   2020-11-16 05:27:50  null
  7   2020-11-16 05:27:55  null
  8   2020-11-16 05:28:00  null
  9   2020-11-16 05:28:05  null
 10   2020-11-16 05:28:10  null

If the scale-pmcollector pods get their IP addresses changed, the pmsensors process will need to be killed and restarted manually on all scale-core pods, to get the performance metrics collection resumed.

To kill the pmsensors process, run these commands on all the ibm-spectrum-scale-core pods. The PMSENSORPID variable holds the result of the oc exec command. If this variable is empty, there is no process running, and you do not need to issue the following command to kill the process.

PMSENSORPID=`oc exec <ibm-spectrum-scale-core> -n ibm-spectrum-scale-ns -- pgrep -fx '/opt/IBM/zimon/sbin/pmsensors -C /etc/scale-pmsensors-configuration/ZIMonSensors.cfg -R /var/run/perfmon'`
echo $PMSENSORPID
oc exec <scale-pod> -n ibm-spectrum-scale-ns -- kill $PMSENSORPID

To start the service again, run this command on all scale pods.


oc exec <scale-pod> -n ibm-spectrum-scale-ns -- /opt/IBM/zimon/sbin/pmsensors -C /etc/scale-pmsensors-configuration/ZIMonSensors.cfg -R /var/run/perfmon
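To apply the kill-and-restart sequence across every core pod in one pass, a loop such as the following may be used. It is a sketch that identifies the core pods by name and skips the kill when no pmsensors process is found:

for pod in $(oc get pods -n ibm-spectrum-scale-ns -o name | grep ibm-spectrum-scale-core); do
  # Look up the pmsensors PID in this pod; empty means the process is not running
  PMSENSORPID=$(oc exec $pod -n ibm-spectrum-scale-ns -- pgrep -fx '/opt/IBM/zimon/sbin/pmsensors -C /etc/scale-pmsensors-configuration/ZIMonSensors.cfg -R /var/run/perfmon')
  if [ -n "$PMSENSORPID" ]; then
    oc exec $pod -n ibm-spectrum-scale-ns -- kill $PMSENSORPID
  fi
  # Restart the sensor with the same configuration used above
  oc exec $pod -n ibm-spectrum-scale-ns -- /opt/IBM/zimon/sbin/pmsensors -C /etc/scale-pmsensors-configuration/ZIMonSensors.cfg -R /var/run/perfmon
done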

Collecting data for support

Generating GPFS trace reports
Configuring GPFS trace reports from cluster creation
Kernel crash dumps
Log Collection for IBM Spectrum® Scale Container Native Storage Access Deployment
    Kubernetes Resources
    GPFS Snap
Gather logs for local GUI
Gather data about the Red Hat® OpenShift® Container Platform cluster

Generating GPFS trace reports
Some issues might require low-level system detail accessible only through the IBM Spectrum® Scale daemon and the IBM Spectrum® Scale Linux® kernel trace facilities.

In such instances the IBM® Support Center might request such GPFS trace reports to facilitate rapid problem determination of failures.

The level of detail that is gathered by the trace facility is controlled by setting the trace levels using the mmtracectl command.

Note: The following should be performed under the direction of the IBM Support Center.

1. Access a running ibm-spectrum-scale-core pod by issuing the following command:

oc rsh -n ibm-spectrum-scale-ns <ibm-spectrum-scale-core-pod>

Note: The pod must be in Running status to connect. It is best to pick a pod running on a node that is not exhibiting issues.

The remaining steps should be completed while connected to this shell running inside the gpfs container of this running core pod.

2. Issue the mmchconfig dataStructureDump command to change the default location where trace data will be stored. The directory /var/adm/ras/ will allow the data generated within the container to persist on the host machine.

mmchconfig dataStructureDump=/var/adm/ras/

3. Set desired trace classes and levels. This part of the process is identical to classic IBM Spectrum® Scale installs. See more details about trace classes and levels.

mmtracectl --set --trace={io | all | def | "Class Level [Class Level ...]"}

4. To start the trace facility on all nodes, issue the following command:

mmtracectl --start

5. Re-create the problem.

6. Stop the trace generation as soon as the problem to be captured occurs.

mmtracectl --stop


7. Turn off trace generation.

mmtracectl --off

Configuring GPFS trace reports from cluster creation
In some rare scenarios, it may be required to configure GPFS tracing from cluster creation. This can be accomplished using the cluster core profile and settings directed by the IBM Support Center.

Kernel crash dumps
For Red Hat® OpenShift® Container Platform 4.6 and below, the Red Hat Enterprise Linux® CoreOS (RHCOS) based machines do not support configuring kdump or generating kernel crash dumps.

In some virtual machine installations it may be possible to generate a vmcore crash dump from the hypervisor.

In lieu of kernel dumps, CoreOS currently recommends using pstore, even if only small snippets of diagnostic data can be collected.

Log Collection for IBM Spectrum® Scale Container Native Storage Access Deployment

Kubernetes Resources
To gather logs and diagnostic data to assist IBM Support in debugging an issue, run the provided log collection script located at spectrumscale/scripts/collect_logs.sh.

Note: The collect_logs.sh script requires the user to be logged in to an account on the Red Hat® OpenShift® Container Platform cluster that has sufficient privileges to query OpenShift and Kubernetes resources. Collaboration with the customer may be needed to get the necessary credentials for oc login -u <username> to successfully query OpenShift and Kubernetes resources.

[root@example ~]# ./spectrumscale/scripts/collect_logs.sh
Gathering pods information... done
Gathering secrets information... done
Gathering configmaps information... done
Gathering services information... done
Gathering serviceaccounts information... done
Gathering deployments information... done
Gathering statefulsets information... done
Gathering replicasets information... done
Gathering nodes information... done
Gathering scc information... done
Gathering daemonsets information... done
Gathering description of scalecluster...done
Gathering high level node view with labels...done
Gathering OpenShift events...done
Gathering GUI logs...done

This produces a zipped tar file in the directory that the user was in when collect_logs.sh was executed.

[root@example ~]# ls -ltr
total 3978292
...
-rw-r--r-- 1 root root 282197 Dec  4 07:28 cnsalogcapture_2020-12-04-07.27.57.1607095677.tar.gz

Note: This script only gathers diagnostic data for Red Hat® OpenShift® Container Platform and Kubernetes resources. To gather GPFS specific diagnostic data, proceed to the next bullet point.

GPFS Snap
To gather GPFS-specific data, execute gpfs.snap in one of the core pods and extract the resulting tar file from the core pod. The GPFS snap will be stored in /tmp of the pod that gpfs.snap was executed on.

Note: Executing gpfs.snap can take a while and oc rsh can time out after 1m of inactivity. To increase this timeout value, execute the following commands and set the timeout value to 30m. Once diagnostic data has been extracted, revert to the original config and restart HAProxy so that the node is running with the original configuration.

cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak              # Backup current HAProxy configuration
sed -i -e "/timeout client/s/ [0-9].*/ 30m/" /etc/haproxy/haproxy.cfg   # Change value for timeout client
sed -i -e "/timeout server/s/ [0-9].*/ 30m/" /etc/haproxy/haproxy.cfg   # Change value for timeout server
diff /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak            # Verify differences
systemctl restart haproxy                                             # Restart haproxy so that changes take effect

Get a core pod to execute gpfs.snap:

[root@example ~]# oc get pods -n ibm-spectrum-scale-ns
NAME                                           READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-core-98bzj                  1/1     Running   0          45m
ibm-spectrum-scale-core-9x29g                  1/1     Running   0          45m
ibm-spectrum-scale-core-n8fg6                  1/1     Running   0          45m
ibm-spectrum-scale-gui-0                       9/9     Running   0          45m
ibm-spectrum-scale-operator-7dbdb67f9f-wkrtg   1/1     Running   0          52m
ibm-spectrum-scale-pmcollector-0               2/2     Running   0          45m
ibm-spectrum-scale-pmcollector-1               2/2     Running   0          44m

Remote shell into one of the core pods:

[root@example ~]# oc rsh -n ibm-spectrum-scale-ns ibm-spectrum-scale-core-98bzj
sh-4.4# gpfs.snap
gpfs.snap: started at Fri Dec  4 15:57:19 UTC 2020.
Gathering common data...
Gathering Linux specific data...
Gathering extended network data...
Gathering local callhome data...
Gathering local perfmon data...
Gathering local sysmon data...
Gathering local cnss data...
Gathering local gui data...
Gathering trace reports and internal dumps...
Gathering Transparent Cloud Tiering data at level BASIC...
gpfs.snap: The Transparent Cloud Tiering snap software was not detected on node worker0.example.com
Gathering QoS data at level FULL...
gpfs.snap: No QoS configuration was found for this cluster.
gpfs.snap: QoS configuration collection complete.
Gathering cluster wide gui data...
Gathering cluster wide sysmon data...
Gathering cluster wide cnss data...
Gathering cluster wide callhome data...
Gathering cluster wide perfmon data...
gpfs.snap: Spawning remote gpfs.snap calls. Master is worker0.example.com. This may take a while.

Copying file /var/mmfs/tmp/cleanupAllow/gpfs.snap.worker1_20201204155940.12093.out.tar.gz from worker1.example.com ...
gpfs.snap.worker1_20201204155940.12093.out.tar.gz      100% 1041KB 134.1MB/s   00:00
Successfully copied file /var/mmfs/tmp/cleanupAllow/gpfs.snap.worker1_20201204155940.12093.out.tar.gz from worker1.example.com.

Copying file /var/mmfs/tmp/cleanupAllow/gpfs.snap.worker2_20201204155941.29630.out.tar.gz from worker2.example.com ...
gpfs.snap.worker2_20201204155941.29630.out.tar.gz      100% 1479KB 124.3MB/s   00:00
Successfully copied file /var/mmfs/tmp/cleanupAllow/gpfs.snap.worker2_20201204155941.29630.out.tar.gz from worker2.example.com.

Gathering cluster wide protocol data...
Gathering waiters from all nodes...
Gathering mmfs.logs from all nodes. This may take a while...
All data has been collected. Processing collected data. This may take a while...
Packaging master node data...
Writing * to file /tmp/gpfs.snapOut/32092/collect/gpfs.snap.worker0_master_20201204155719.32092.out.tar.gz
Packaging all data.
Writing . to file /tmp/gpfs.snapOut/32092/all.20201204155719.32092.tar
gpfs.snap completed at Fri Dec  4 16:00:12 UTC 2020
###############################################################################
Send file /tmp/gpfs.snapOut/32092/all.20201204155719.32092.tar to IBM Service
Examine previous messages to determine additional required data.
###############################################################################

Once completed, copy the output of the GPFS snap to the infrastructure node:

[root@example ~]# oc cp -n ibm-spectrum-scale-ns ibm-spectrum-scale-core-98bzj:/tmp/gpfs.snapOut/32092/all.20201204155719.32092.tar ./all.20201204155719.32092.tar
tar: Removing leading `/' from member names

[root@example ~]# ls -ltr
...
-rw-r--r-- 1 root root 3819520 Dec  4 08:05 all.20201204155719.32092.tar

Gather logs for local GUI
To aid with debugging, the local GUI logs may be requested. Issue the following commands from the Red Hat® OpenShift® Container Platform infrastructure node (or wherever the oc command is installed).

Note: Running the spectrumscale/scripts/collect_logs.sh will include collection of the GUI logs.

mkdir /tmp/gui_diag
oc cp -n ibm-spectrum-scale-ns -c liberty ibm-spectrum-scale-gui-0:/var/log/cnlog/mgtsrv /tmp/gui_diag
oc exec -n ibm-spectrum-scale-ns -c postgres ibm-spectrum-scale-gui-0 -- pg_dump -U postgres postgres > /tmp/gui_diag/pg_dump
tar -czvf /tmp/gui_diag.tgz /tmp/gui_diag
rm -rf /tmp/gui_diag

Note: After the second command, you may see tar: Removing leading '/' from member names returned; however, the command has completed successfully.

Gather data about the Red Hat® OpenShift® Container Platform cluster
For issues with the Red Hat® OpenShift® Container Platform cluster itself, where a ticket will need to be opened with Red Hat Support, debugging information about the cluster will need to be provided for problem determination. Refer to Gathering data about your cluster in the Red Hat documentation on how to gather debugging information from the Red Hat® OpenShift® Container Platform cluster.

Monitoring

The IBM Spectrum® Scale Container Native Storage Access (CNSA) cluster is monitored by sending the health status and events between its pods.

System monitor and Kubernetes readiness probe
Viewing and analyzing the performance data with the IBM Spectrum Scale bridge for Grafana

System Monitor and Kubernetes Readiness Probe

General Overview
The scale-monitor sidecar container has the following objectives:

Runs the containermon service, which monitors the services (GUI, pmcollector) in the same pod
Provides a readiness probe API (HTTPS)
Sends the health status and events back to the core pod on the same worker node
Core pod forwards the events to GUI / mmhealth
Provides an API for call home data collection
Has several debug tools installed and can be used for problem determination

Note: To learn more about Kubernetes Container Probes, click here

Readiness Probe
If the monitoring status is HEALTHY, the probe returns success 200. When the "unreadyOnFailed" option is enabled in containermon.conf (default=true), any FAILED state will cause the probe to return 500. When a critical event occurs that has the "container_unready=True" flag, the probe will return 501. When something goes wrong with the service (for example, no service found), it will return 502.

Viewing and analyzing the performance data with the IBM Spectrum Scale bridge for Grafana


The IBM Spectrum Scale Container Storage system has a built-in performance monitoring tool that collects metrics from various GPFS components. These metrics can provide you with a status overview and trends of the key performance indicators. You can view and analyze the collected performance data with Grafana, third-party visualization software. For more information, see Grafana.

To use Grafana, you need a running Grafana instance and the IBM Spectrum Scale performance monitoring bridge for Grafana deployed on your IBM Spectrum Scale Storage Access cluster. The IBM Spectrum Scale bridge for Grafana is an open source tool, available for free usage on IBM Spectrum Scale devices. It translates the IBM Spectrum Scale metadata and performance data collected by the IBM Spectrum Scale performance monitoring tool (ZiMon) to query requests acceptable by the Grafana-integrated openTSDB plugin. For more information on how to set up the IBM Spectrum Scale performance monitoring bridge for Grafana in a Kubernetes or Red Hat OpenShift environment, see the Installation guides section at https://github.com/IBM/ibm-spectrum-scale-bridge-for-grafana.

References

IBM Spectrum Scale
Red Hat OpenShift / Kubernetes

IBM Spectrum® Scale

Administration Guide
For Linux® on Z: Changing the kernel settings
mmchconfig
mmnetverify
Accessing a remote GPFS file system
Defining the cluster topology for the installation toolkit
Node quorum
Install CSI

Red Hat® OpenShift® / Kubernetes

How to find PV Usage in Kubernetes
Updating CRIO.conf
KDUMP Support Statement
Red Hat OpenShift Install Documentation
Adding day-1 kernel arguments during Red Hat® OpenShift® Container Platform install phase
Configuring htpasswd Identity Provider
