Clustering Linux Servers with the Concurrent Deployment of HP Serviceguard for Linux and Red Hat Global File System

February 2006

Executive Summary
Introduction
    Audience
HP Serviceguard for Linux and Red Hat GFS Co-existence
    Background
    Dual Cluster Co-existence
        Keeping the Red Hat GFS clusters highly available
        Lock Manager Access to Configuration Files
        Avoiding Cluster Network Partition
        Storage paths
        Heartbeat settings
Recommended Configuration
Package Control Script Modifications
    Differences in specifying and mounting Red Hat GFS partitions
    Package script use of Volume groups
Conclusion
Related Materials


Executive Summary

Organizations today are deploying critical applications on Linux clusters that require high availability, ease of cluster management, and access to large pools of stored information. HP Serviceguard for Linux provides high availability for applications running on a cluster of servers. Red Hat’s Global File System (GFS) enables a cluster of Linux servers to simultaneously read and write to a single shared file system on the SAN. Together, HP Serviceguard for Linux and Red Hat GFS provide a best-of-breed, complete availability solution, from applications to data, for the most mission-critical datacenter workloads.

Organizations that deploy both HP Serviceguard for Linux and Red Hat GFS on the same set of servers need to be confident that these two clusters will operate concurrently in both normal operation and during failures. HP and Red Hat Engineering have tested and validated the ability of these two clusters to co-exist on the same set of servers. This paper recommends configurations, based on the results of those tests, which will achieve a stable cluster of HP Serviceguard for Linux and Red Hat GFS.

February 2006 – Updated to incorporate support for coexistence with Red Hat GFS 6.1.

Introduction

Organizations are gaining confidence in the robust and maturing Linux environment and are increasingly targeting inexpensive Linux servers for complex and critical tasks. These tasks include data management, large-scale file serving, and business processing applications. Such workloads require clustering multiple Linux servers together, all of which need access to the same storage pool. They also require that the applications deployed on the cluster be highly available and not vulnerable to any single point of failure.

HP Serviceguard for Linux clusters a number of Linux servers and makes application services available despite hardware or software failures or the planned downtime required for maintenance or system upgrades. HP Serviceguard for Linux can respond to single or, sometimes, multiple failures within a cluster. When it detects the failure of a node, it fails over applications running on that node to other nodes in the cluster. The applications are restarted on the new nodes by HP Serviceguard for Linux.

Red Hat’s GFS is a cluster file system that enables multiple Red Hat Enterprise Linux servers to simultaneously read and write to a single shared file system on the SAN. This increases performance and reduces management complexity by eliminating the need for multiple copies of data.

When HP Serviceguard for Linux and Red Hat GFS are both deployed on the same set of servers, they act together when a failure occurs to quickly bring the application up on the new node. Without GFS, HP Serviceguard for Linux moves the application package over to the new node and mounts the file system that the application needs. When the two are deployed together, Red Hat GFS has already made the file system visible to the new server, and the file system can be used immediately after GFS completes its recovery, with no additional time lost for file system checks. Downtime is therefore limited to the time required for:


• Serviceguard and GFS to detect the failure
• GFS to recover the file system
• Serviceguard to shut down and start up the application on the other node

The concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters on the same group of servers must ensure that the two clusters don’t interfere with each other, especially with regard to ensuring data integrity during failures. Both clusters can co-exist with no interference on the same set of servers by following some simple configuration rules. HP and Red Hat engineering have tested these two clusters in a variety of failure scenarios. This paper is a guide for using these two products together, including the Serviceguard package control script changes that are necessary to support the “gfs” file system type.

Audience

This document is for users of HP Serviceguard on Linux who want to take advantage of the features in Red Hat GFS and for users of Red Hat GFS clusters who need to make the applications on those clusters highly available. It documents guidelines for implementing a stable dual-cluster (two clusters sharing the same hardware) with HP Serviceguard for Linux and Red Hat GFS.

It is assumed that the reader has a general understanding of HP Serviceguard for Linux and Red Hat GFS features. Please see www.hp.com/go/sglx and www.redhat.com/software/rha/gfs for detailed information on each solution.

HP Serviceguard for Linux and Red Hat GFS Co-existence

Background

Red Hat GFS can be deployed in many configurations, with single or redundant (multiple) Red Hat GFS lock managers as well as distributed lock managers. A lock manager controls access across nodes to shared files.

Red Hat GFS version 6.0 implements the GULM (Grand Unified Lock Manager) locking mechanism, where single or redundant lock manager nodes can be configured in a cluster. In addition to GULM, Red Hat GFS version 6.1 [1] implements a DLM (Distributed Lock Manager) based locking mechanism where the lock management responsibility is distributed across all nodes of the cluster. Redundant lock manager nodes of GULM can be configured as external to the Red Hat GFS nodes or they may remain embedded (within the cluster).

Both HP Serviceguard for Linux and Red Hat GFS have cluster management functionality, including cluster membership. Both monitor the status of all nodes in their clusters by listening to their heartbeat signals through the network. Independently running two cluster software products on the same set of nodes and hardware resources means there must be no interference between the two clusters in all possible failure scenarios and configurations.

[1] The user should be aware of two issues that may affect GFS 6.1 operation:

An issue with Red Hat GFS 6.1 fencing on RHEL4 U1 and RHEL4 U2 can be worked around by replacing the default line “action=/sbin/shutdown -h now” in the file /etc/acpi/events/sample.conf with “action=/sbin/poweroff -f”.
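For reference, a minimal sketch of that edit; only the “action” line changes, and the rest of the stock sample.conf is left as shipped:

# /etc/acpi/events/sample.conf  (only the action line changes)
# before: action=/sbin/shutdown -h now
# after:
action=/sbin/poweroff -f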

Another defect (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175805) can cause a node in the Red Hat GFS 6.1 cluster to hang while it is booting. This is expected to be fixed in RHEL4 U3.


One significant difference between the clusters is how they take a node out of the cluster on failures. GFS “fences” nodes; a node that is “fenced” is prevented from writing to the file system. GFS supports various fencing mechanisms, but the only one supported in conjunction with Serviceguard is Integrated Lights-Out (iLO) fencing. With that mechanism, a message is sent to the iLO of a server to restart that server. Use of iLO is less costly and easier to manage than most other methods. Serviceguard, by contrast, “reforms” the cluster without the failed node, and that node typically resets itself.

The two clusters have negligible performance impact on each other. HP Serviceguard for Linux has no impact on application performance.

Dual Cluster Co-existence

The function of a high availability cluster is to keep application services available, even though there are one or more failures in the cluster. In order to provide that capability the cluster software needs to be able to monitor the state of applications and the state of the various nodes in a cluster. In the case of HP Serviceguard for Linux and Red Hat GFS co-existence, only Serviceguard monitors the health of the applications. Both HP Serviceguard for Linux and Red Hat GFS monitor the same nodes using their respective heartbeat mechanisms.

Both HP Serviceguard for Linux and Red Hat GFS recognize node failures by the loss of heartbeats and resolve them by rebooting the affected node. Most failures that cause a loss of heartbeat are due to a node failure, such as a hardware problem or an OS crash. In these instances, both clusters readily recognize (and implicitly agree on) which node has failed, and adjust their membership accordingly.

There are other types of failures that can affect either one cluster or both. These can be avoided by following a set of simple configuration rules described later in the “Recommended Configuration” section.

Keeping the Red Hat GFS clusters highly available

Red Hat supports a variety of Red Hat GFS configurations to cover a broad range of customer needs. Some of these configurations could have a single point of failure, where the failure of a single hardware component or piece of software could cause the entire cluster to fail. The simplest example of this is the Single Lock Manager (SLM) in the GULM configuration. A configuration with an SLM would be unusable if the node with the SLM were to fail. Because clusters using Serviceguard need to be configured with no single point of failure, clusters combining HP Serviceguard for Linux and Red Hat GFS must use a Redundant Lock Manager (RLM) configuration.

An example RLM embedded configuration is depicted in Figure 1 (below). The three nodes are embedded lock managers, one of which is the master at any given time. All nodes are members of both Serviceguard and GFS clusters. For simplicity, Figure 1 shows single connections for the network and storage. In an actual deployment these would be redundant connections.


Figure 1: A 5 node HP Serviceguard for Linux and Red Hat GFS cluster

A combined HP Serviceguard for Linux and Red Hat GFS cluster must have at least two nodes in the DLM configuration. The DLM configuration supports a two node cluster, unlike GULM. Figure 2, below, depicts an example of a two node DLM cluster.

Lock Manager Access to Configuration Files

The Cluster Configuration System (CCS) archives represent the Red Hat GFS cluster configuration. In Red Hat GFS 6.0, these contain the list of all lock manager nodes, Red Hat GFS nodes (those that only mount GFS partitions), and their fencing mechanisms. The Red Hat GFS 6.0 lock manager nodes need access to these archives in order to fence a node. The archives can be located on local disks or on the SAN. Red Hat GFS does not immediately detect the failure of the Fibre Channel (FC) links on a node; the link failure is detected when an attempt to access storage on the SAN fails, and the lock manager then fences that node. If the CCS archives are stored on the SAN, the impact of an FC link failure depends on how Red Hat GFS is configured, since the archives will no longer be accessible from the node that lost its links.

If the FC link failure occurs on the master lock manager node in a cluster configured with embedded Redundant Lock Managers, that node loses its ability to fence any Red Hat GFS node, since it no longer has access to the archives. As a result, some nodes that should be fenced will still have access to the SAN, possibly creating a single point of failure in the cluster. To avoid this, Serviceguard, in this configuration, supports only redundant paths to highly available storage; this is the same requirement as for most Serviceguard clusters. With two paths to storage on the SAN, at least two failures would have to occur before a node loses its ability to fence any Red Hat GFS node. Keeping the archives local to the lock manager nodes avoids this problem entirely and is good practice for a high availability cluster.

Figure 2: A 2 node HP Serviceguard for Linux and Red Hat GFS 6.1 DLM cluster

If the FC link failure occurred in a cluster configured with external Redundant Lock Managers (RLM) and the CCS archives are on the SAN, then access to the archives is lost and the lock managers can no longer fence other nodes. To avoid this, the CCS archives should be on the local disks. This avoids the loss of access to CCS when FC connectivity is lost. When the Red Hat GFS cluster is configured with the external RLM and the CCS archives are on local disks, there is no need to have a connection to the SAN for the RLM nodes and the overall (external RLM) configuration is less expensive. If there is some need to keep the archives on the SAN in this configuration, then the RLM nodes can be part of another Serviceguard cluster. The nodes in this Serviceguard cluster should be configured with multiple paths to storage and also have a Serviceguard package that uses the Serviceguard disk monitor to check that the disks used by the CCS archives are accessible. Then, configure the package to reboot the node if access to storage is lost.

Red Hat GFS 6.1 stores the cluster configuration information in an ASCII file, /etc/cluster/cluster.conf, locally on all nodes in the cluster instead of in CCS archives. It handles an FC link failure on a DLM cluster node by recovering the file system locks held by that node from another cluster node, without actually rebooting the failed node. As a result, access to GFS mount points from the failed node reports an I/O error. To regain access to the GFS file system once the FC link is restored, the node must be rebooted; unmounting the GFS mount point and then mounting it again does not restore access to the file system. The same failure scenario in a GULM configuration results in the node being fenced.

GFS 6.1 does not reset a node when access to storage is lost, so another mechanism must be used to force a package failover when Serviceguard is also running. In a recommended Serviceguard configuration [2], there are dual paths to storage, so two failures are necessary to lose access to storage. For customers who want their systems to survive that dual failure, the disk monitor can be used in all the packages on a dual-cluster. This ensures that when a node has failed FC link(s), the packages on that node are moved to other adoptive nodes. If Serviceguard attempts to move a package back to such a failed node, the package will fail again and move on to other adoptive nodes. The failed node must be rebooted after the FC link(s) are restored in order to restore the GFS mount points.

Avoiding Cluster Network Partition

A cluster network partition occurs when network failures prevent heartbeat messages from passing between one subset of nodes in a cluster and the rest of the nodes. If the subset is just one node, most clusters, including Serviceguard and GFS, simply reboot that node. Partitions where there are two or more nodes in each subset are problematic when multiple clusters share the same servers, because there could be differences in the resulting memberships. In many clusters, these partitions would have to be of equal size to cause a problem; a GFS cluster using RLM is susceptible even with unequal-sized partitions because of the way the RLM operates. Such differences in membership could result in the entire cluster going down. Any configuration where this type of partition can happen with a single failure cannot be supported. The required configuration therefore has multiple heartbeat paths for Serviceguard, where at least one path is independent of the GFS heartbeat path(s). This partition problem cannot occur in a DLM configuration, where all nodes can handle locking. To avoid these problems, and because of other differences between DLM and RLM, it is recommended that DLM be used when possible. Multiple heartbeat paths for Serviceguard are also essential in a two node DLM dual-cluster to prevent instability due to differences in the default heartbeat rates and timeout values of the two clusters.

Storage paths

Loss of the storage system makes data unavailable and causes most services to fail over to an alternate node. Therefore, all storage systems must have redundant controllers and power supplies. Multiple paths to shared storage are required so that the loss of storage path connectivity does not require a failover between nodes but instead causes a failover to the redundant path on the same node. If a node has only a single path to shared storage, then any failure in that path may cause all packages relying on that shared storage to fail over to another node in the cluster. With GFS 6.0, redundant storage paths also help ensure access to the CCS archives if they are on the SAN.

[2] See the Serviceguard for Linux “Configuration Guide” available at http://www.hp.com/info/sglx


Heartbeat settings

It is recommended that the default values for heartbeat and the node timeout intervals for both Serviceguard and Red Hat GFS be used. Since the default values are close to one another, the recovery time will not be affected. Changes to these values, if required, will not affect the overall reliability of the cluster.

Recommended Configuration

The general recommended network configuration for Serviceguard calls for a dedicated heartbeat LAN with a bonded pair of NICs (using Linux Channel Bonding) for both the Serviceguard and Red Hat GFS heartbeats. Red Hat GFS does not support multiple heartbeat networks, although it can be configured to use channel bonding for additional HA protection.

Figure 3 is an example of the network configuration for a four node dual-cluster. It has one bonded pair of network paths for both the Serviceguard and Red Hat GFS heartbeats and another network for communication external to the cluster (bonded NICs are shown for the external connections). This is the minimum supported network configuration. Dual-port NICs can be used for bonding, but for availability, a bonded pair should not use two ports of the same dual-port NIC. More NICs can be added at the user’s option. If possible, these should be made highly available via bonding, and should also be configured as a Serviceguard heartbeat.
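As an illustration only, a minimal sketch of a RHEL4-style active-backup bond that could carry the shared heartbeat LAN; the interface names, IP address, and bonding mode are assumptions and are not taken from this paper:

# /etc/modprobe.conf  (assumed options; mode=1 is active-backup)
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.10.11
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1  (repeat for the second slave, e.g. eth2,
# using a port on a different physical NIC)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none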

The focus of the recommendations has been to eliminate the possibility of both clusters going down by increasing redundancy of FC links and heartbeat networks.

Some latent failures, such as an FC link failure, can cause the entire cluster to go down later when certain additional failures occur. This is especially true for the external RLM configuration of a Red Hat GFS 6.0 cluster with the CCS archives located on the SAN. To avoid this situation, it is recommended either to keep the CCS files local or to create a separate Serviceguard cluster on all external lock manager nodes of Red Hat GFS, with Serviceguard packages configured to monitor the disks used for the Red Hat GFS configuration files.

Latent failures such as FC link failures in Red Hat GFS 6.1 can leave the GFS mount points on shared storage unavailable even after the FC link is restored; the failed node must be rebooted after the FC link(s) are restored. Hence it is recommended to configure a Serviceguard for Linux disk monitoring service in all packages of a dual-cluster. More details on how to configure a disk monitoring service are available in the section “Creating a Disk Monitor Configuration” of the document “Managing HP Serviceguard for Linux, Fifth Edition”.

When co-existing on a two node Red Hat GFS 6.1 cluster in DLM configuration, a second heartbeat LAN exclusively for HP Serviceguard for Linux must be configured to avoid the entire cluster going down during network partition on one LAN.


Some extra care needs to be taken when administering the clusters. For example, on a two node cluster, if an administrator halts only Serviceguard on a node and does not exclude that node from the Red Hat GFS cluster, then a subsequent network partition may result in HP Serviceguard for Linux and the applications it protects becoming unavailable. This would happen if the only remaining Serviceguard node is chosen to be fenced by Red Hat GFS as a result of the network partition. The converse is true as well.

Figure 3: Four node example of a recommended configuration for HP Serviceguard for Linux and Red Hat GFS Co-existence

The solution is simple: whenever an administrator takes an action that can affect the membership of one cluster (for example, “cmhaltnode” for Serviceguard), the corresponding command for the other cluster should be performed on that same node (“service gfs stop” for Red Hat GFS). In this case, halt Serviceguard on the node first and then halt GFS. When manually starting the clusters, start GFS before Serviceguard. In the case of an embedded RLM with three RLM nodes, if one RLM node fails during the period that another has been taken out for maintenance, then the entire GFS cluster will fail.
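A minimal sketch of that sequence for planned maintenance on one node. The node name is a placeholder; only “cmhaltnode” and “service gfs stop” are taken from this paper, while “cmrunnode” and “service gfs start” are assumed to be the usual counterparts, and additional GFS-related services may also need to be handled per the Red Hat GFS documentation:

# Taking a node out for maintenance: halt Serviceguard first, then GFS
cmhaltnode node1          # remove node1 from the Serviceguard cluster
service gfs stop          # remove node1 from the Red Hat GFS cluster

# Returning the node to service: start GFS before Serviceguard
service gfs start
cmrunnode node1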


Package Control Script Modifications

This section describes why changes are necessary to the package control scripts when Serviceguard is combined with GFS, and shows the detailed changes. Users do not have to make these changes themselves; a complete package control script, with all of these changes, is available at http://www.hp.com/go/softwaredepot/ha.

Differences in specifying and mounting Red Hat GFS partitions

In a Serviceguard cluster without a clustered file system, the package control script has the commands to mount and unmount logical volumes. Following are some sample lines from the package control scripts:

Example of the default package control script using the ext2 file system:

# For example, if this package uses the following:
# logical volume: /dev/vg01/lvol1
# mount point: /pkg1a
# file system type: ext2
# mount options: read/write
#
# Then the following would be entered:
# LV[0]=/dev/vg01/lvol1; FS[0]=/pkg1a; FS_TYPE[0]="ext2";
# FS_MOUNT_OPT[0]="-o rw"; FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]="";

A modified package control script for use with Red Hat GFS is available for download in the “Serviceguard for Linux” section of http://docs.hp.com/en/linux.html. Customers should always use the HP-supplied version of the control script rather than make the changes themselves. Change the filename to one appropriate for your environment. This example shows the comment-block modifications that are needed for a package to use Red Hat GFS:

# For example, if this package uses the following:
# GFS 6.0 uses pool for logical volume management whereas
# GFS 6.1 uses LVM2. Their device name formats differ and
# an example for each is shown below. Please use the
# appropriate one.
# Pool : /dev/pool/pool1 (GFS 6.0) OR
# LVM2 : /dev/mapper/vgX-lvY (GFS 6.1)
# mount point : /pkg1a
# file system type : gfs
# mount options : read/write
#
# Then the following would be entered:
# LV[0]=/dev/pool/pool1; (GFS 6.0) OR
# LV[0]=/dev/mapper/vgX-lvY; (GFS 6.1)
# FS[0]=/pkg1a; FS_TYPE[0]="gfs";
# FS_MOUNT_OPT[0]="-o rw";
LV[0]=""; FS[0]=""; FS_TYPE[0]=""; FS_MOUNT_OPT[0]=""

LV designators are used in the package control script to specify the GFS pools in the case of Red Hat GFS 6.0 and LVM2 volumes in the case of Red Hat GFS 6.1. LVM2 device names are of the form /dev/mapper/vgX-lvY, where vgX is the volume group and lvY is the logical volume. Fill in the values for LV[0], FS[0], etc., and any additional volumes, as appropriate to your configuration. Note that FS_UMOUNT_OPT and FS_FSCK_OPT are not used for clustered file systems and so must not be set.
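For illustration, a hypothetical filled-in entry for a GFS 6.1 package, using the placeholder LVM2 volume and mount point from the template above; substitute the device and mount point names from your own configuration:

LV[0]=/dev/mapper/vgX-lvY
FS[0]=/pkg1a
FS_TYPE[0]="gfs"
FS_MOUNT_OPT[0]="-o rw"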


For those familiar with HP Serviceguard for Linux, it may be useful to understand what changes were made to the script. The function check_and_mount was changed so that it does not perform fsck operations if the FS_TYPE is “gfs”. The modified function is shown below; lines 4, 5, and 16 were added, and the remaining lines are unchanged.

1  function check_and_mount
2  {
3     typeset -i j=0
4     if [ ${FS_TYPE[0]} != "gfs" ]
5     then
         # Verify that there is at least one file system to check and what
         # type.
6        if (( ${#LV[*]} > 0 ))
7        then
8           ...
9           ...
10       fi
         # Check exit value (set if any preceding fsck calls failed)
11       if (( $exit_value == 1 ))
12       then
13          echo "###### Node \"$(hostname)\": Package start FAILED at $(date) ######"
14          exit 1
15       fi
16    fi
17    typeset -i F=0
18    typeset -i j
19    typeset -i L=${#LV[*]}
20    while (( F < L ))
21    do
         ...
22       rm -f mount_pids$$
23    done
24 }

A major advantage of running the Serviceguard and Red Hat GFS clusters together is having a package restart as quickly as possible by avoiding the fsck (file system check) and the mounting of the file system. An administrator of a dual Serviceguard and Red Hat GFS cluster therefore needs to ensure that any GFS mount point needed by a package is already mounted. HP recommends that all Red Hat GFS partitions be mounted on a Red Hat GFS node during system initialization, as suggested by the Red Hat GFS Administration manual. This is done by adding appropriate entries to the /etc/fstab file. The device name entries needed in the /etc/fstab file differ between Red Hat GFS 6.0 and 6.1 because of the difference in device name formats between the pool and LVM2 volume managers.
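For illustration, hypothetical /etc/fstab entries for the example mount point used earlier; the device names are the placeholders from the package script comments, and the “defaults” mount options and dump/pass fields are assumptions:

# GFS 6.0 (pool device)
/dev/pool/pool1        /pkg1a   gfs   defaults   0 0
# GFS 6.1 (LVM2 device)
/dev/mapper/vgX-lvY    /pkg1a   gfs   defaults   0 0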

To handle the case where a file system is not already mounted on a node during package startup, the script checks for the Red Hat GFS mount points used by the package and mounts them if required. This is achieved by specifying the Red Hat GFS partitions required by the application in its package control script.
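As a simplified illustration of that behavior (not the logic of the HP-supplied script itself), a check of this kind could look as follows, using the variables from the earlier example:

# Mount the GFS file system only if it is not already mounted
if ! grep -q " ${FS[0]} gfs " /proc/mounts
then
   mount -t gfs ${FS_MOUNT_OPT[0]} ${LV[0]} ${FS[0]}
fi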


Package script use of Volume groups

Volume groups (LVM) and MD devices are not supported with Red Hat GFS in this configuration. When using Red Hat GFS, the LV designators in the package control script are used for the Red Hat GFS pools instead of for logical volumes. This simplifies the package control script modifications. Note that MD[x] and VG[x] must not be set, and that a Red Hat GFS file system should not be unmounted.

In the package control script (the file with the .sh extension), the function umount_fs was modified to do nothing when the file system is a Red Hat GFS file system (FS_TYPE is “gfs”). The following shows the beginning of the unmodified function and then the modified version with the added lines indicated. These are snippets of the code, not the entire function.

From:

function umount_fs
{
   typeset -i j=0
   typeset -i L
   typeset -i UM_CNT
   typeset -i ret
   UM_CNT=${FS_UMOUNT_COUNT:-1}
   if (( ${UM_CNT} < 1 ))
   then
      UM_CNT=1
   fi
   L=${#FS[*]}
   . . .
}

To:

(lines 7, 8, 9, and 10 added)

1  function umount_fs
2  {
3     typeset -i j=0
4     typeset -i L
5     typeset -i UM_CNT
6     typeset -i ret
7     if [ ${FS_TYPE[0]} == "gfs" ]
8     then
9        return 0
10    fi
11    UM_CNT=${FS_UMOUNT_COUNT:-1}
12    if (( ${UM_CNT} < 1 ))
13    then
14       UM_CNT=1
15    fi
16    L=${#FS[*]}
      . . .
17 }

Note: The following parameters are incompatible and should not be used with a Red Hat GFS file system:

RAIDTAB, RAIDSTART, RAIDSTOP, VGCHANGE, MD[x], VG[x].

Even though Red Hat GFS 6.1 uses LVM2 as the volume manager, HP recommends that customers not set the VGCHANGE variable. For the GFS mount points to remain mounted at all times on all nodes in the cluster, the LVM2 volume groups must not be deactivated on any node in the cluster. It is also not recommended to include a volume group used for anything other than GFS in the same package control script.

The variable FS_MOUNT_RETRY_COUNT should be set to “0” for a Red Hat GFS file system.

Data replication is not yet supported in a Serviceguard for Linux environment that is concurrently deployed with Red Hat GFS, so the variable DATA_REP should be set to “none”.
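To summarize, a hypothetical consolidated view of the GFS-related settings discussed in this section, using the placeholder names from the earlier example:

LV[0]=/dev/mapper/vgX-lvY        # GFS 6.1 LVM2 volume (or /dev/pool/pool1 for GFS 6.0)
FS[0]=/pkg1a
FS_TYPE[0]="gfs"
FS_MOUNT_OPT[0]="-o rw"
FS_MOUNT_RETRY_COUNT=0
DATA_REP="none"
# Leave unset for GFS: FS_UMOUNT_OPT, FS_FSCK_OPT, RAIDTAB, RAIDSTART,
# RAIDSTOP, VGCHANGE, MD[x], VG[x]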

Conclusion

HP Serviceguard for Linux and Red Hat GFS clusters can co-exist on the same set of servers, adding value to each other. Serviceguard can provide HA encapsulation for multiple instances of the same application that simultaneously access a single copy of the data from multiple nodes in the cluster. Stable co-existence of the two clusters can be achieved with a proper choice of redundant hardware and software components and configurations, as summarized below:

• Use two FC links on all cluster nodes for external shared storage, with two FC HBAs and multipathing configured.
• Use redundant heartbeat networks for Serviceguard and Red Hat GFS (via channel bonding). Use other networks as Serviceguard heartbeat paths whenever possible.
• Place the CCS archives on local storage on all lock manager nodes of Red Hat GFS 6.0.
• If an external RLM must be configured with the CCS archives on the SAN, create a separate Serviceguard cluster with a package monitoring the disks hosting the CCS archives.
• Configure and run a disk monitoring service in all packages in the dual-cluster. This ensures that the packages are moved to other adoptive nodes when access to shared storage is lost as a result of FC link failure.
• Use default values for the heartbeat settings of the Serviceguard and Red Hat GFS clusters.
• On a two node DLM dual-cluster, configure a second heartbeat LAN exclusively for Serviceguard. This ensures dual-cluster stability even during a network partition on one LAN.
• Create Serviceguard package control scripts that allow the file systems to be mounted at all times.

Related Materials

HP Serviceguard for Linux product documentation at http://docs.hp.com

“Red Hat GFS 6.0 Administrator’s Guide” at http://www.redhat.com/docs/manuals/csgfs

“Red Hat GFS 6.1 Administrator’s Guide” at http://www.redhat.com/docs/manuals/csgfs

“Red Hat Cluster Suite Configuration Guide” at http://www.redhat.com/docs/manuals/csgfs

HP Serviceguard for Linux certification matrix showing servers, storage, and software versions supported: http://www.hp.com/info/sglx
