AIX Clusters


Simple Overview AIX HACMP Cluster implementations.

Version: 0.9
Date: 05/01/2011
By: Albert van der Sel
For who: Anyone who likes a very simplified, high-level overview of the main HACMP concepts and components.

This note describes HACMP HA clusters; it does not directly focus on the newer PowerHA implementation.

Section 1: HACMP - High Availability Cluster Multi-Processing

1.1 Conceptual view of a simple 2 Node HACMP configuration:

Fig. 1: A simple 2-node HACMP cluster. Node A (172.18.17.6) and Node B (172.18.17.5) are on a public network; a service IP address (172.18.17.10) for an application is owned by Node A and can be taken over by Node B. A private network (10.10.10.6 / 10.10.10.5) carries the KA (keepalive) heartbeat messages. A second, non-IP heartbeat is also present: either an rs232-like connection (old method), or a "disk based" heartbeat using FC adapters to a disk in a concurrent VG. Each node owns a resource group (RG1 with application_01 on Node A, RG2 with application_02 on Node B) containing volume groups, filesystems, etc. on shared disks (hdisk1, hdisk2, hdisk3, ...), attached via for example fiber (SCSI, SSA, or Fibre Channel). Here, two SANs (SAN A and SAN B) with replication between them are shown, so each node has its "own" online application and RG, which can fail over to the other node; other setups might just use one shared disk subsystem. As of HACMP 5.1, up to 32 nodes are possible. (In these simplistic figures, elements like SAN switches are left out.)

There are several cluster implementations possible on AIX, of which the most well known are:

- GPFS cluster systems, mainly used for High Availability and parallel access of multiple Nodes on shared filesystem objects.
- HACMP systems, mainly used for High Availability through Failover of services (applications) to other Node(s).

Other elements shown in figure 1, and touched upon in this note:
- /etc/hosts: lists all IP addresses
- The RSCT and HACMP daemons: rscd, clstrmgr, clcomd, grpglsmd, hatsd, hagsd
- Resource group descriptions
- Start/stop scripts for the application
- Resource Group takeover
- Service IP address takeover
- Logfiles


HACMP Main Function:

Resources (like volumes, filesystems, the application service IP, etc.) are grouped together in Resource Groups (RGs), which HACMP keeps highly available as a single entity. When a Node that owns a Resource Group fails, the Resource Group "fails over" to another node. Since an "application" is determined by its filesystems and its IP parameters, the RG is a sort of "wrapper" or "container" that can fail over from one Node to another Node. This is "high availability" of the application.

If the Node providing the application goes "down", the RG will "move" to the other Node. Stop and start scripts of the application ensure that the application stops at the first Node, and starts (a bit later) on the other Node. So "wrapping" stop and start shell scripts, present on both nodes, are an element of the HA environment too.

Of course, a Resource Group will not magically fly from one Node to the other Node, so it's probably better to say that the other node "acquires" the Resource Group (opens the Volume Group, mounts the filesystems, acquires the Service IP address, etc.).
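Purely as an illustrative sketch (the script names, paths and application commands below are hypothetical, not taken from any particular HACMP setup), such start/stop wrappers are usually nothing more than small ksh scripts that exist identically on both nodes, log what they do, and return 0 on success:

#!/usr/bin/ksh
# /usr/local/bin/app01_start -- hypothetical RG start script.
# At this point HACMP has already varied on the VG, mounted the filesystems
# and acquired the service IP of the resource group.
su - app01adm -c "/opt/app01/bin/startserver" >> /tmp/app01_start.log 2>&1
exit 0

#!/usr/bin/ksh
# /usr/local/bin/app01_stop -- hypothetical RG stop script.
# Called by HACMP before it unmounts the filesystems and releases the service IP.
su - app01adm -c "/opt/app01/bin/stopserver" >> /tmp/app01_stop.log 2>&1
exit 0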

Some HACMP network keypoints:

The application-associated IP (the service label IP) can "fail over" either by:
- IP address takeover (IPAT) via IP aliases
- IPAT via IP Replacement.

IP address takeover is a mechanism for recovering a service IP label by moving it to another physical network adapter on another node.

An IP alias is an IP address that is configured on an interface in addition to the base IP address. An IP alias is an AIX function that is supported by HACMP. Multiple aliases are possible, on different subnets. A "Boot" IP address is the (regular) boot or base address configured for an interface in AIX. The "service label IP" is the address on which clients can connect to the service (application). The service IP address will be added on top of the base IP address (IPAT via aliasing), or will replace the base (boot) IP address of the communication interface, depending on the IP address takeover (IPAT) mechanism.
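Just to illustrate what IPAT via aliasing amounts to at the AIX level (HACMP does this for you; the addresses are the ones from figure 1, while the interface name en0 and the /24 netmask are assumptions):

# ifconfig en0                 (before takeover: only the boot/base address)
en0: flags=...
        inet 172.18.17.6 netmask 0xffffff00 broadcast 172.18.17.255

(what HACMP effectively does for IPAT via aliasing:)
# ifconfig en0 alias 172.18.17.10 netmask 255.255.255.0

# ifconfig en0                 (after takeover: boot address plus the service alias)
en0: flags=...
        inet 172.18.17.6 netmask 0xffffff00 broadcast 172.18.17.255
        inet 172.18.17.10 netmask 0xffffff00 broadcast 172.18.17.255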

Some HACMP heartbeat keypoints:

A private IP network based heartbeat will always be implemented. Secondly, a non-IP based heartbeat must be present. This could be realized using rs232 on both nodes (old method), or using a "disk based" heartbeat (using FC adapters) to a specially configured disk in a concurrent VG. In this way, HACMP maintains information about the status of the cluster nodes and their respective network interfaces. So, if only the IP network temporarily fails, but the rs232 or disk heartbeats still work, failover will not take place. Indeed, in that situation the nodes are healthy, and the IP network only had a temporary "hiccup". So, it's a sort of "safeguard" to avoid a "split brain" or a partitioned (or ill-functioning) cluster.

1.2 A view on the latest versions and AIX matrix:

                  AIX 4.3.3  AIX 5.1  AIX 5.1 64b  AIX 5.2  AIX 5.3  AIX 6.1    AIX 7.1   Comment
HACMP 4.5         No         Yes      No           Yes      No       No         No
HACMP/ES 4.5      No         Yes      Yes          Yes      No       No         No
HACMP/ES 5.1      No         Yes      Yes          Yes      Yes      No         No        new heartbeating over disk;
                                                                                           it uses the daemons hatsd and hats_nim
HACMP/ES 5.2      No         Yes      Yes          Yes      Yes      No         No
HACMP/ES 5.3      No         No       No           Yes      Yes      Yes        No
HACMP/ES 5.4.0    No         No       No           TL8+     TL4+     No         No
HACMP/ES 5.4.1    No         No       No           TL8+     TL4+     Yes        Yes
PowerHA 5.5       No         No       No           No       TL9+     TL2,SP1+   Yes
PowerHA 6.1       No         No       No           No       TL9+     TL2,SP1+   Yes
PowerHA 7.1       No         No       No           No       No       TL6+       Yes

The ES stands for "Enhanced Scalability". As of 5.1, HACMP/ES is solely based on RSCT, short for Reliable Scalable Cluster Technology. As from version 5.5, HACMP is renamed to PowerHA.

1.3 Main HACMP daemons:

Notice that if you list the daemons in the AIX System Resource Controller (SRC), using the "lssrc" command, you will see ES appended to their names. The actual executables do not have the ES appended.
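For example, a quick check with lssrc might look like this (illustrative output; the exact set of subsystems and the PIDs differ per version and configuration):

# lssrc -g cluster
Subsystem         Group            PID          Status
 clstrmgrES        cluster          282814       active
 clinfoES          cluster          307358       active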

clstrmgr (Cluster Manager daemon)
This daemon monitors the status of the nodes and their interfaces, and invokes the appropriate scripts in response to node or network events. It also centralizes the storage of, and publishes, updated information about HACMP-defined resource groups. The Cluster Manager on each node coordinates information gathered from the HACMP global ODM, and other Cluster Managers in the cluster, to maintain updated information about the content, location, and status of all HACMP resource groups. This information is updated and synchronized among all nodes whenever an event occurs that affects resource group configuration, status, or location. All cluster nodes must run the clstrmgr daemon. From HACMP V5.3, the clstrmgr daemon is started via the init process and should always be running.

clcomd (Cluster Communication daemon)
As of 5.2, the dependency on rhosts and the r commands has been removed; that functionality is now handled by clcomd. Starting with Version 5.2, clcomdES must be running before any cluster services can be started. The clcomd daemon is started automatically at boot time by the init process. It provides secure remote command execution and HACMP ODM configuration file updates.

clsmuxpd (Cluster SMUX Peer daemon; only for versions lower than HACMP 5.3)
This daemon maintains status information about cluster objects. It works in conjunction with the Simple Network Management Protocol (snmpd) daemon. All cluster nodes must run the clsmuxpd daemon. Note: the clsmuxpd daemon cannot be started unless the snmpd daemon is running. It no longer exists as of HACMP 5.3.

RSCT (Reliable Scalable Cluster Technology; the "glue" in AIX clustering, a standard component in AIX 5L as of AIX 5.1)
Since HACMP 5.1, HACMP relies on RSCT, so in modern HACMP, RSCT is a necessary component or subsystem. For example, HACMP uses the heartbeat facility of RSCT. Reliable Scalable Cluster Technology, or RSCT, is a set of software components that together provide a comprehensive clustering environment for AIX and Linux. RSCT is the infrastructure used by a variety of IBM products to provide clusters with improved system availability, scalability, and ease of use. RSCT includes these components:

- Resource Monitoring and Control (RMC)
- Resource managers (RM)
- Cluster Security Services (CtSec)
- Group Services
- Topology Services

The following daemons are related to the RSCT framework.

topsvcsd / hatsd (RSCT Topology Services subsystem)
The RSCT Topology Services subsystem monitors the status of network interfaces. All cluster nodes must run the topsvcsd daemon.

hagsd (RSCT Group Services subsystem)
This RSCT subsystem provides reliable communication and protocols required for cluster operation.

grpglsmd
This RSCT daemon operates as a Group Services client; its function is to make switch adapter membership global across all cluster nodes. All cluster nodes must run the grpglsmd daemon.


rmcd (RSCT Resource Monitoring and Control subsystem)
This RSCT subsystem acts as a resource monitor for the event management subsystem and provides information about the operating system characteristics and utilization. The RMC subsystem must be running on each node in the cluster. By default, the rmcd daemon is set up to start from inittab when it is installed. The rc.cluster script ensures the RMC subsystem is running.

1.4 Main HACMP logs:

Main HACMP (up to v. 5.5) logs, and PowerHA logs:

/tmp/hacmp.out or /var/hacmp/log/hacmp.out
This is your main logfile. Contains time-stamped, formatted messages generated by the HACMP for AIX scripts. In verbose mode, this log file contains a line-by-line record of each command executed in the scripts, including the values of the arguments passed to the commands. By default, the HACMP for AIX software writes verbose information to this log file; however, you can change this default. Verbose mode is recommended.

/usr/es/adm/cluster.log or /var/hacmp/adm/cluster.log
Contains time-stamped, formatted messages generated by HACMP for AIX scripts and daemons. In this log file, there is one line written for the start of each event, and one line written for the completion.

The regular AIX system error log (use the "errpt" command to view this log)
Contains time-stamped, formatted messages from all AIX subsystems, including the HACMP for AIX scripts and daemons.

/usr/sbin/cluster/history/cluster.mmdd or /var/hacmp/adm/history/cluster.mmddyyyy
Contains time-stamped, formatted messages generated by the HACMP for AIX scripts. The system creates a new cluster history log file every day that has a cluster event occurring. It identifies each day's file by the file name extension, where mm indicates the month and dd indicates the day.

/tmp/clstrmgr.debug
Contains time-stamped, formatted messages generated by HACMP for AIX clstrmgr activity. Information in this file is used by IBM Support personnel when the clstrmgr is in debug mode. Note that this file is overwritten every time cluster services are started; so, you should be careful to make a copy of it before restarting cluster services on a failed node.

/tmp/cspoc.log
Contains time-stamped, formatted messages generated by HACMP for AIX C-SPOC commands. Because the C-SPOC utility lets you start or stop the cluster from a single cluster node, /tmp/cspoc.log is stored on the node that initiates a C-SPOC command.

/var/hacmp/clverify/clverify.log
Contains messages written when cluster verification has run.

/var/hacmp/log/clutils.log
Every day a health check is performed at 00:00h, and logged in clutils.log.

/usr/es/sbin/cluster/utilities/clsnapshots
Not a logfile; only for later versions. The HACMP cluster snapshot facility (/usr/es/sbin/cluster/utilities/clsnapshots) allows you to save, in a file, a record of all the data that defines a particular cluster configuration. It also allows you to create your own custom snapshot methods, to save additional information important to your configuration. You can use such a snapshot for troubleshooting cluster problems.
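A common way of using these logs during, say, a failover test (a sketch; the grep pattern assumes the usual "EVENT START" / "EVENT COMPLETED" lines written by the event scripts):

# tail -f /tmp/hacmp.out                    # watch the event scripts run in real time
# grep -E "EVENT START|EVENT COMPLETED" /usr/es/adm/cluster.log | tail -20
# errpt | more                              # check the AIX error log for related entries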

1.5 Checking HACMP processes:

(1): View the logs:

First and foremost, you can check your processes by viewing the logs as described in section 1.4.

(2): Viewing running processes:


Checking the cluster processes:

(1): using smitty:

smitty hacmp > System Management (C-SPOC) > Manage HACMP Services > Show Cluster Services

# smitty hacmp

HACMP for AIX

Move cursor to desired item and press Enter.

  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools

(2): using SRC list commands:

# lssrc -a | grep active   # shows all active daemons under the control of the SRC
# lssrc -g cluster         # shows all processes in this group
# lssrc -ls topsvcs        # shows the heartbeats

(3): using ps -ef:

# ps -ef | grep clstrmgr
# ps -ef | grep clcomd
etc..

1.6 Some remarks on the shared storage:

In figure 1, a situation is depicted where each node is connected to two separate storage systems. But, for each node, one such storage system (e.g. on a SAN) can be considered to be "owned" by, or be active for, that node. Then, in case of failover, the associated Resource Groups, including the volumes and filesystems, will be acquired by the other node.

Of course, both nodes can be attached to one local storage system (one SAN) as well. Disk subsystems that support access from multiple hosts include SCSI, SSA, ESS and others. (Note: the distance involved should not be too large, otherwise additional components are needed.)

On a logical level, the Volume Groups (with their Physical Volumes, Logical Volumes and filesystems) are defined as follows. A Volume Group is "shared" because both nodes can access it, but only one Node at a time, until failover occurs. If the resource group containing the shared disk space moves to another node, the new node will activate the disks, and check the current state of the volume groups, logical volumes, and file systems.

In a non-concurrent access configuration, only one cluster node can access the shared data at a time. HACMP non-concurrent access environments use normal journaled file systems such as JFS2 to manage data, while concurrent access environments often use RAW logical volumes, which are thus not regular filesystems. True concurrent access is also possible using a true cluster filesystem like the GPFS filesystem, which, as the name already shows, allows the use of regular filesystems.

But again, only with non-concurrent access will you have the regular filesystems which most applications use. The exception, of course, is using a true cluster filesystem such as GPFS, which allows for concurrent access and gives the "looks and feel" of any other filesystem. Otherwise you should use RAW volumes for true concurrent access. With concurrent access, the application needs to be able to deal with the "concurrent" access mode.

A Concurrent Capable or Enhanced Concurrent Capable volume group is a volume group that you can vary online on more than one AIX instance at a time. So, if you have a Concurrent VG, it's online on all nodes. The Concurrent Capable option can only be used in configurations with HACMP and HACMP/ES. With HACMP you can create enhanced concurrent volume groups. Enhanced concurrent volume groups can be used for both concurrent and non-concurrent access.

The enhanced concurrent volume groups are varied on all nodes in the resource group, and the data access is coordinated by HACMP. Only the node that has the resource group active will vary on the volume group in "concurrent active mode"; the other nodes will vary on the volume group in "passive mode". In "passive" mode, no high-level operations are permitted on that volume group.

An example of physical paths to Storage:

When, for example, SDD MPIO "multipath" is used, the connections of the nodes to a storage system typically look like this:

Figure: Node A with FC cards 1 and 2, and Node B with FC cards 3 and 4, each have FC connections through switches (cards 1 and 3 via one path, 2 and 4 via the other) to the storage system.

1.7 Some general remarks on Starting and Stopping of HACMP:

1.7.1 HACMP shutdown modes:

- Graceful: the local machine shuts itself down gracefully. The remote machine interprets this as a graceful down and does not take over resources.
- Takeover (Graceful with takeover): the local machine shuts itself down gracefully. The remote machine interprets this as a non-graceful down and takes over resources.
- Forced: the local machine shuts down cluster services without releasing any resources. The remote machine does not take over any resources. This mode is useful for system maintenance.
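On the command line, these three modes correspond roughly to the clstop flags below (a sketch based on what smitty clstop typically generates at HACMP 5.x levels; verify the exact flags at your version):

# /usr/es/sbin/cluster/utilities/clstop -y -N -g     # graceful: stop now, no takeover
# /usr/es/sbin/cluster/utilities/clstop -y -N -gr    # takeover: the other node acquires the resource groups
# /usr/es/sbin/cluster/utilities/clstop -y -N -f     # forced: stop cluster services, leave resources where they are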

If you do a "shutdown", or a reboot with "shutdown -r", the rc.shutdown script will stop the cluster services with a graceful shutdown. So the other node won't take over the resources; in other words, if you reboot your system, the other node will not take over.

1.7.2 Starting and stopping HACMP:

Starting and stopping the HACMP services can be done in the following ways:

1. Using smitty clstop and smitty clstart:

The easiest way to stop and start HACMP is using "smitty clstop" and "smitty clstart". Suppose you just need to reboot a node, without the resources needing to fail over. Then you would choose the "graceful" shutdown of HACMP. When that's done, and no applications are active, you can use the "shutdown -Fr" command to reboot the node. But first shut down HACMP using:

# smitty clstop

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                       [Entry Fields]
* Stop now, on system restart or both                   now          +
  Stop Cluster Services on these nodes                 [starboss]    +
  BROADCAST cluster shutdown?                           true         +
* Shutdown mode                                         graceful     +

You can view "/tmp/hacmp.out" for any messages and see if it has shut down in good order. Then you can use "shutdown -Fr" to reboot the machine.
Note: Make sure no apps are running, and users are out of the system, so that all filesystems are able to unmount.

Depending on how IP and HACMP are configured, you may see that at system boot, you can only ping the machine on its "boot address". After the node is up, and there is no "autostart" of HACMP, you need to start HACMP manually, for example using "smitty clstart" or "smitty hacmp".

# smitty clstart

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                       [Entry Fields]
* Start now, on system restart or both                  now          +
  Start Cluster Services on these nodes                [starboss]    +
  BROADCAST message at startup?                         false        +
  Startup Cluster Information Daemon?                   true         +
  Reacquire resources after forced down ?               false        +

During and after HACMP startup, you may experience that your remote session to the machine using the boot IP address has stopped working, and that the machine is accessible again, with a remote terminal, using the service IP address.

Note: if HACMP starts from the inittab, then you do not need to start HACMP manually. See section 1.5 for information on how to check if HACMP is running.

Note: be sure you have documented all IP parameters of the machine.

2. Using "smitty hacmp":

You can also use the main smitty HACMP menu system, like so:

# smitty hacmp

Move cursor to desired item and press Enter.

  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools

Note that "smitty hacmp" can lead you to the socalled "C-SPOC" utility, shown by the "System Management" menu option.

Move cursor to desired item and press Enter.

  Manage HACMP Services
  HACMP Communication Interface Management
  HACMP Resource Group and Application Management
  HACMP Log Viewing and Management
  HACMP File Collection Management
  HACMP Security and Users Management
  HACMP Logical Volume Management
  HACMP Concurrent Logical Volume Management
  HACMP Physical Volume Management
  Configure GPFS

Move cursor to desired item and press Enter.

  Start Cluster Services
  Stop Cluster Services
  Show Cluster Services

3. Using scripts:

Starting:
The "/usr/es/sbin/cluster/etc/rc.cluster" script initializes the environment required for HACMP/ES, and then calls the "/usr/es/sbin/cluster/utilities/clstart" script to start the HACMP daemons. The clstart script calls the SRC startsrc command to start the specified subsystem or group. Thus, clstart invokes the AIX System Resource Controller (SRC) facility to start the cluster daemons.


The following figure illustrates the major commands and scripts called at cluster startup:

rc.cluster -> clstart -> startsrc

Using the C-SPOC utility, you can start cluster services on any node (or on all nodes) in a cluster by executing the C-SPOC /usr/es/sbin/cluster/sbin/cl_rc.cluster command on a single cluster node. The C-SPOC cl_rc.cluster command calls the rc.cluster command to start cluster services on the nodes specified from the one node. The nodes are started in sequential order, not in parallel. The output of the command run on the remote node is returned to the originating node. Because the command is executed remotely, there can be a delay before the command output is returned.

Note that the clcomd daemon (called clcomdES) is started from "/etc/inittab". You will probably find the following record in inittab:

clcomdES:2:once:startsrc -s clcomdES >/dev/console 2>&1

Depending on how HACMP is configured, the rc.cluster script might be called from inittab as well. In that case, HACMP is started from inittab at boot time. The following record might be present in inittab:

hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot > /dev/console 2>&1

In fact, if the "rc.cluster" script is called using the parameter "-R", that inittab entry will be created. Below is a small fragment from the comments that can be found in rc.cluster:

# Arguments: -boot : configures service adapter to use boot address
#            -i    : start client information daemon
#            -b    : broadcast these start events
#            -N    : start now
#            -R    : start on system restart
#            -B    : both
#            -r    : re-acquire resources after forced down
#
# Usage: rc.cluster [-boot] [-l] [-c] [-b] [-N | -R | -B] [-r]
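So, going by that usage text, a manual start from the command line, comparable to what "smitty clstart" generates, might look like this (a sketch; the flags are simply the ones listed in the comments above):

# /usr/es/sbin/cluster/etc/rc.cluster -boot -N -b -i
      -boot : configure the service adapter to use the boot address first
      -N    : start now
      -b    : broadcast the start events
      -i    : also start the client information daemon (clinfo)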

Shutdown:
Newer AIX /usr/sbin/shutdown commands automatically call the PowerHA, or HACMP, "/usr/es/sbin/cluster/etc/rc.shutdown" command, which will stop the HACMP services. Since many shops have a custom "/etc/rc.shutdown" script (which contains statements to stop all sorts of other processes), the HACMP rc.shutdown version will also call that /etc/rc.shutdown script.

1.7.3 Entries in /etc/inittab:

What you might find in the startup file "/etc/inittab", are the following records:

- During installation, the following entry is made to the /etc/inittab file to start the Cluster Communication Daemon at boot:
  clcomdES:2:once:startsrc -s clcomdES >/dev/console 2>&1

- Usually, during install, the following entry is added to inittab to autostart the HACMP daemons at boot time. Also, if you use the "rc.cluster" script with the "-R" parameter, the entry will be added if it's not present already:
  hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot > /dev/console 2>&1

  For PowerHA (the renamed HACMP), a similar entry is present:
  hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init

- Because of the specific actions needed to implement IP Address Takeover (IPAT), a dedicated script is implemented. That's why you will find the following entry in inittab:


harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP network startup

1.8 Some Management issues with Shared Volume Groups in HACMP:

1.8.1. Addition of volumes and issues:

In HACMP, a shared VG must be installed. Several management issues may present themselves at later times.

For illustration purposes, here are some "real life" questions and answers as found on various AIX sites.

You can access C-SPOC by using "smitty hacmp".

The C-SPOC commands operate only on shared and concurrent LVM components that are defined as part of an HACMP resource group. When you use SMIT HACMP C-SPOC, it executes the command on the node that "owns" the LVM component, that is, the node that has it varied on.

The examples below are for illustration purposes only.

Question: I've got a HACMP (4.4) cluster with SAN-attached ESS storage. SDD is installed. How can I add volumes to one of the shared VGs?

Answer:
1) Acquire the new disks on the primary node (where the VG is in service) with:
   # cfgmgr -Svl fcs0     (repeat this for all fcs adapters in the system)
2) Convert hdisks to vpaths. Note: use the smit screens for this, because the commands have changed from version to version.
3) Add the vpaths to the VG with:
   # extendvg4vp vgname vpath#
4) Create LVs/filesystems on the vpaths.
5) Break the VG/scsi locks so that other systems can see the disks, with:
   # varyonvg -b -u vgname
6) Perform steps 1 & 2 for all failover nodes in the cluster.
7) Refresh the VG definitions on all the failover nodes with:
   # importvg -L vgname vpath#
8) Reestablish the disk locks on the service node with:
   # varyonvg vgname
9) Add the new filesystems to the HA configuration.
10) Synchronise the HA resources to the cluster.

(Note: it can be done as above, but normally I would advise using C-SPOC through "smitty hacmp".)

Question: How do I add a vpath to a running HACMP cluster?

Answer:
On the node where the VG is active:
  # extendvg4vp vg00 vpath10 vpath11
  # smitty chfs          (increase the filesystems as required)
  # varyonvg -bu vg00    (this is to un-lock the VG)

On the secondary node, where the VG is not active:
  # cfgmgr -vl fscsi0    (also for fscsi1, fcs0 and fcs1; this finds the new vpaths)


  # chdev -l vpath10 -a pv=yes    (also for vpath11)   (Note: I don't think you need to set the pvid)
  # lsvg vg00 | grep path         (just note down any one vpath from this output, e.g. vpath0)
  # importvg vg00 vpath0

Once that's done, go to the Primary Node:

  # varyonvg vg00    (locking the VG)

Although many sysadmins use the command line for changing LVM objects, in general you should use cluster methods, like C-SPOC, for any change. This will ensure ODM updates on all involved nodes.

Question: I have an HACMP cluster with an enhanced concurrent resource group. What is the best way to add a (LUN) disk to it?

Answer:
Once the LUN is visible on your cluster nodes, you should use HACMP C-SPOC in order to add the new LUN to an enhanced concurrent volume group.

smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared Volume Groups > Set Characteristics of a Shared Volume Group > Add a Volume to a Shared Volume Group > <select your VG> > Select Disk

If you cannot select any new volume in the last C-SPOC screen, then follow this:
1. Allocate the LUN to both cluster nodes (this is already done by the SAN administrator).
2. Run cfgmgr on the 1st node to pick up the new LUN.
3. Set a pvid on the new LUN:
   chdev -l hdiskx -a pv=yes
4. Set the no-reserve attribute:
   chdev -l hdiskx -a reserve_policy=no_reserve
5. On the 2nd cluster node, run cfgmgr to pick up the new LUN. Ensure it picks up the correct pvid as created on node 1 in step 3.
6. Set the no-reserve attribute on your hdisk on node 2:
   chdev -l hdiskx -a reserve_policy=no_reserve

Now, when you go into C-SPOC, you should see the new LUN when you go to add a new volume to a shared volume group.

7. smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared Volume Groups > Set Characteristics of a Shared Volume Group > Add a Volume to a Shared Volume Group > <select your VG>
8. Once the LUN has been added, check on cluster node 2 that the VG has the new LUN associated:
   lsvg -p <vgname>

Note: the reserve_policy=no_reserve allows the 2 cluster nodes to see the lun, without one node locking the lun from the other.
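To verify these attributes afterwards on each node (standard AIX commands; "hdiskx" is whatever device name the new LUN got):

# lsattr -El hdiskx -a reserve_policy     # should show no_reserve
# lspv | grep hdiskx                      # both nodes should show the same pvid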

Question: How can I increase a filesystem on a shared VG in HACMP? What is different from just running "chfs"?

Answer:
You have 2 (or more) machines in a cluster, but only one of them has access to the filesystems at a time. If you change the filesystem, you have to make sure that not only this one machine, but all machines in the cluster get this information and update their bookkeeping data. Again, use C-SPOC in "smitty hacmp".

1.8.2 More on varyonvg in HACMP:

In HACMP, on the shared disk subsystem, one or more "concurrent capable" VGs will be defined. In normal operation, at startup of a node, or even if a failover occurs, you should not be bothered with varyonvg commands: it should all work "automatically" in HACMP.

However, in certain circumstances, you need to be able to perform some actions manually.


-> On a stand-alone AIX machine, if you want to vary on a volume group, you would simply use:

# varyonvg volumegroup_name

The normal varyonvg command, as used above, will "lock" or reserve the disks for the current machine. The base design of LVM assumes that only one initiator can access a volume group.

-> In a HACMP environment, you have a shared Volume Group, which is called "concurrent capable".

The varyonvg command knows many switches, but for HACMP, the following are the most relevant ones.

-b   This flag unlocks all disks in a given volume group.
-c   Varies the volume group on in Enhanced Concurrent mode. This can only be done on a concurrent capable VG, which is the case for the shared VGs used in a HACMP environment.
-u   Varies on a volume group, but leaves the disks that make up the volume group in an unlocked state.

Regular Failover configuration:

In a regular Failover HACMP configuration, one node actively accesses the VG, while the other node has the potential to take over the Resource Group containing the VG. So, the active node will have done a "varyonvg vgname" to set the reservations. Although only one node is active at a time, the VG is still configured as "concurrent capable". This is sometimes also called an "HACMP non-concurrent access configuration".
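Purely as a sketch of what "acquiring" such a non-concurrent resource group means on the takeover node (normally the HACMP event scripts do exactly this for you; the VG, LV and mount point names are made up):

# varyonvg appvg              # activate the VG and set the disk reservations
# fsck -y /dev/lv_app01       # check the filesystem after an unclean failover
# mount /app01                # mount the shared filesystem
# (and the service IP address is acquired as well, via IPAT)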

Concurrent access to the VG:

If RAW volumes, or a true cluster filesystem such as GPFS, is used, multiple nodes can truly access the VG simultaneously. This is sometimes also called an "HACMP concurrent access configuration". An example of such a configuration might be an Oracle RAC environment, used in conjunction with HACMP.

Active and Passive Varyon in Enhanced Concurrent Mode:

An enhanced concurrent volume group can be made active on a node, or varied on, in two states: active or passive. Note that active or passive state varyons are done automatically by HACMP.

- Active state varyon behaves as ordinary varyon, and makes the logical volumes normally available.

The node that opens the VG in "active state varyon" can mount all filesystems, and all other usual operations are possible, like running applications from them.

- Passive state Varyon: limited access.

When an enhanced concurrent volume group is varied on in passive state, the LVM provides an equivalent of fencing for the volume group at the LVM level. Only limited operations are possible. The other nodes in a failover configuration will open the VG in "passive state varyon".
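You can see in which state a node currently has an enhanced concurrent VG with plain "lsvg" (an illustrative fragment; the exact field names can differ a bit per AIX level):

# lsvg appvg
VOLUME GROUP:  appvg              VG STATE:         active
Concurrent:    Enhanced-Capable   Auto-Concurrent:  Disabled
VG Mode:       Concurrent         VG PERMISSION:    read/write    (passive-only on the standby node)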

As said before, in certain circumstances "true" concurrent access is possible using RAW volumes or using a true cluster file system (like GPFS, and not using JFS/JFS2). This type of cluster then would not be a "failover cluster". It places certain requirements on the applications accessing the VG: they should be able to handle "parallel" access, like Oracle RAC Clusterware can.

1.9 Some HACMP utilities:

Some HACMP utilities can provide you with the cluster status and other information.


- The "clfindres" and "clRGinfo" commands:

These commands show you the status of Resource Groups and where they are active.

# /usr/es/sbin/cluster/utilities/clfindres

Example output:

-----------------------------------------------------------------------------
Group Name          Type             State      Location
-----------------------------------------------------------------------------
vgprod_resource     non-concurrent   ONLINE     P550LP1
                                     OFFLINE    P550LP2
vgtest_resource     non-concurrent   ONLINE     P550LP2

Example output:

GroupName     Type        State   Location   Sticky Loc
----------    ----------  ------  --------   ----------
C37_CAS_01    cascading   UP      P520-1
C38_CAS_01    cascading   UP      P520-2

The same information can be obtained using the clRGinfo command. In fact, clfindres is a link to clRGinfo.

- The "cllsserv" command:

Use this command to list all the applications configured in HACMP, including their start and stop scripts.

# cllsserv

Example output:

OraDB_Appl  /usr/local/bin/dbstart   /usr/local/bin/dbstop
SapCI_Appl  /usr/local/bin/sapstart  /usr/local/bin/sapstop

- The "clstat" command:

This command shows you the overall cluster status.

-a                     ascii mode
-n name                shows information for the cluster with the specified name
-r tenths-of-seconds   determines the refresh rate used to update the information

# /usr/es/sbin/cluster/clstat -a

Shows a list of nodes with their interfaces and status.

clstat - HACMP Cluster Status Monitor
-------------------------------------

Cluster: Unix_cluster01 (1110212176)
Wed Jan 5 09:04:53 NFT 2011
        State: UP               Nodes: 2
        SubState: STABLE

Node: starboss State: UP


   Interface: star-oraprod_boot (2)    Address: 3.223.224.137    State: DOWN
   Interface: star-oraprod_stb (2)     Address: 10.80.16.1       State: UP
   Interface: prodlun498 (0)           Address: 0.0.0.0          State: UP
   Interface: star-oraprod (2)         Address: 3.223.224.135    State: UP
   Resource Group: staroraprod_resource                          State: On line

Node: stargate State: UP
   Interface: star-oratest_boot (2)    Address: 3.223.224.141    State: DOWN
   Interface: star-oratest_stb (2)     Address: 10.80.16.2       State: UP
   Interface: testlun498 (0)           Address: 0.0.0.0          State: UP
   Interface: star-oratest (2)         Address: 3.223.224.139    State: UP
   Resource Group: staroratest_resource                          State: On line

- Commands to document the HACMP cluster:

Alongside the above commands (clfindres, cllsserv, clstat), the output of the commands below can be used to document your HACMP environment.

/usr/es/sbin/cluster/utilities/cllscf
/usr/es/sbin/cluster/utilities/cllsnw
/usr/es/sbin/cluster/utilities/cltopinfo
/usr/es/sbin/cluster/utilities/clshowres
/usr/es/sbin/cluster/utilities/cllsserv

Note: Also, don't forget to document the outputs of "lsvg", "lsvg -l", "lspv", "df -g", and the contents of "/etc/filesystems", "/etc/hosts", and all other relevant configuration files.
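A small sketch of collecting that documentation into one file (generic ksh, nothing HACMP-specific; adjust the command list to your site):

#!/usr/bin/ksh
# collect_hacmp_doc.ksh -- dump cluster and LVM configuration into one file (illustrative)
OUT=/tmp/hacmp_doc.$(hostname).$(date +%Y%m%d)
for CMD in "/usr/es/sbin/cluster/utilities/cltopinfo" \
           "/usr/es/sbin/cluster/utilities/clshowres" \
           "/usr/es/sbin/cluster/utilities/clfindres" \
           "lsvg" "lspv" "df -g"
do
  echo "===== $CMD =====" >> $OUT
  $CMD >> $OUT 2>&1             # run the command and append its output
done
cat /etc/hosts /etc/filesystems >> $OUT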

Of course, this was only a small portion of all utilities.

1.10 Some notes on Disk based Heartbeat.

The whole idea here is to have additional Keep Alive messages, or heartbeats, across a non-IP network. An additional path, alongside the private network, makes sure that HACMP can determine that the nodes are still up, even if the private network is malfunctioning for some reason. So, if the private network is just temporarily down, then a "Take Over" does not need to take place.

To create this additional path, serial links might be used, or you could use a "heartbeat over disk". The latter provides the ability to use existing shared disks to provide a "serial network like" heartbeat path. For that, it is sufficient that the disk resides in an enhanced concurrent volume group. In HACMP 5.x, the RSCT component "/usr/sbin/rsct/bin/hats_diskhb_nim" has the functionality to make it all happen. There is no SCSI reservation on the disk, because both nodes must be able to read and write to that disk.

There is of course a difference between a concurrent Volume Group and a concurrent Resource Group.
- With the older AIX and HACMP versions, a concurrent Volume Group can only be used in a concurrent Resource Group.
- Nowadays, a concurrent Volume Group can be used in a non-concurrent and in a concurrent Resource Group. An important feature of a concurrent Resource Group is that it's online and open on both nodes.
As of AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG that resides in a non-concurrent resource group.

How would one install Disk Heartbeat on a two node HACMP cluster:

Say you have the nodes "starboss" and "stargate". Suppose we use ESS storage with vpath devices.

Starboss 'sees' the vpath4 device.
Stargate 'sees' the vpath5 device.

If a PVID does not exist on each system, you should run "chdev -l <devicename> -a pv=yes" on both systems. This will ensure that smitty C-SPOC will recognize it as a disk in shared storage. Both vpaths (vpath4 and vpath5) are pointing to the same virtual disk.
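For the two nodes in this example that boils down to (the same chdev command as above, filled in with the device names used here):

starboss # chdev -l vpath4 -a pv=yes
stargate # chdev -l vpath5 -a pv=yes

starboss # lspv | grep vpath4      # both nodes should now show the same pvid
stargate # lspv | grep vpath5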

Let's now use C-SPOC to create an "Enhanced Concurrent volume group".

# smitty cl_admin

System Management (C-SPOC)

Move cursor to desired item and press Enter.

  Manage HACMP Services
  HACMP Communication Interface Management
  HACMP Resource Group and Application Management
  HACMP Log Viewing and Management
  HACMP File Collection Management
  HACMP Security and Users Management
  HACMP Logical Volume Management
  HACMP Concurrent Logical Volume Management
  HACMP Physical Volume Management
  Configure GPFS

HACMP Concurrent Logical Volume Management

Move cursor to desired item and press Enter.

  Concurrent Volume Groups
  Concurrent Logical Volumes
  Synchronize Concurrent LVM Mirrors

Concurrent Volume Groups

Move cursor to desired item and press Enter.

  List All Concurrent Volume Groups
  Create a Concurrent Volume Group
  Create a Concurrent Volume Group with Data Path Devices
  Set Characteristics of a Concurrent Volume Group
  Import a Concurrent Volume Group
  Mirror a Concurrent Volume Group
  Unmirror a Concurrent Volume Group


Then choose the nodes, and after that add the appropriate shared storage devices based on pvids, which here are vpath4 and vpath5. Then choose that you want to create an Enhanced Concurrent VG, with for example the name "examplevg".

A check on the disk devices, after the volume group was created, could be this:

starboss # lspv
vpath4          000a7f5pe78e9ed5    examplevg

stargate # lspv
vpath5          000a7f5pe78e9ed5    examplevg

Now that the enhanced concurrent Volume Group is available, we need to create the "heartbeat" network. Since the "physical" path is just along the fiber cards (or whatever physical connection to shared storage you may use), you may wonder why it's called a "network". Well, actually it resembles a heartbeat network so much that people call it a network too (it functions quite similarly to the private IP network). A network of this type is called a "diskhb" network.

To create it, use "smitty hacmp" from your primary node (for example starboss). Instead of showing all smitty menus, here we will only show the menu choices:

smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Networks > Add a Network to the HACMP cluster > select diskhb > enter an appropriate network name

Suppose we gave our new diskhb network the name "hbdisknet". When the above actions are done, we have added a diskhb network definition. Next we need to associate our new diskhb network with our vpath devices.

smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices > Add Communication Interfaces >

Then, you need to fill in a screen similar to:

Add a Communication Device

Type or select values in entry fields.

* Device Name                        [starboss_hb]
* Network Type                        diskhb
* Network Name                        hbdisknet
* Device Path                        [/dev/vpath4]
* Node Name                          [starboss]

When done, perform similar actions from your second node. To get a fully functioning heartbeat network, you need to do some more work. This section was only provided to give a taste of a real installation of an HACMP component.
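One of those extra steps is usually a manual test of the diskhb path. RSCT ships a small test utility for this, commonly found as /usr/sbin/rsct/bin/dhb_read (treat the exact path and flags as something to verify at your level): run it in receive mode on one node and in transmit mode on the other.

starboss # /usr/sbin/rsct/bin/dhb_read -p vpath4 -r     # wait in receive mode
stargate # /usr/sbin/rsct/bin/dhb_read -p vpath5 -t     # transmit over the shared disk

If the path works, both sides should report that the link is operating normally.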

Well, that's it. Hopefully this document was of some use.

Page 17: Aix Clusters

Simple Overview AIX HACMP Cluster implementations.

This note describes HACMP HA clusters, and will not directly focus on the

Section 1: HACMP - High Availability Cluster Multi-Processing

public network As of 5.1, 32 nodes are possible.

service ip address 172.18.17.5 node B

for an application,owned by node A,

taken over by Node B

private network

10.10.10.5KA messages=heartbeats on:(1): private IP network

(2): non IP heartbeat- Might be through rs232 like.connection (old method).- Or this type ofheartbeat is "disk"based, using FC adapters toa disk in a concurrent VG.

hdisk1 resource group RG2 Here, two SAN's are present.hdisk2 application_02 Other setups might just usehdisk3 volume groups one shared disksubsystem.etc.. filesystems In this specific situation,

high-level overview on the main HACMP concepts and components.

There are several cluster implementations possible on AIX, of which the most well known are:

parallel access of multiple Nodes on shared filesystem objects.Failover of services (applications) to other Node(s).

/etc/hosts lists all ip adresses

rsct hacmp daemons: rscd clstrmgr clcomd grpglsmd hatsd hagsd

- Resource group descriptions- start/stop scripts for the application- Resource Group takeover- Service IP address takeover

logfiles

Page 18: Aix Clusters

replication SAN B each node has it's "own"<------------> online application and RG.

SCSI, SSA, or Fibre Channel. Such an application and RG,can fail over to the other node.

Resources (like volumes, filesystems, application service IP etc..) are grouped together in Resource Groups (RGs),

When a Node who owns a Resource Group fails, the Resource Group "fails over" to another node.Since an "application" is determined by it's filesystems, and it's IP parameters, the RG is a sort of "wrapper" or "container", that can fail over from one Node to another node. This is "high availability" of the application.

If a Node providing the appliction, goes "down", the RG will "move" to the other Node. Stop and Start scripts of the application will ensure that the application stops at the first Node, and start (a bit later) on the other Node.So, "wrapping" stop and start shell scripts, on both nodes, are an element too in the HA environment.

The application associated IP (service label IP) can "fail over" either by:

IP address takeover is a mechanism for recovering a service IP label by moving it to another

An IP alias is an IP address that is configured on a interface in addition to the base IP address. An IP alias is an AIX function that is supported by HACMP. Multiple aliases are possible, on different subnets.A "Boot" IP address is the (regular) boot or base address configured for an interface in AIX.The "service lable IP" is the address on which clients can connect to the service (application).The service IP address will be added on top of the base IP address (IPAT via aliasing), or will replacethe base (boot) IP address of the communication interface, depending on the IP address take over (IPAT) mechanism.

Secondly, a Non IP based heartbeat must be present. This could be realized using rs232 on both nodes (old method),or using a "disk based" heartbeat (using FC's) to a specially configured disk in a conncurent VG.In this way, HACMP maintains information about the status of the cluster nodes and their respective network interfaces.So, if only the IP network temporarily fails, but the rs232 or disk heartbeats still work, fail over will not take place.Indeed, in that situation the nodes are healthy, but the IP network only had a temporarily "hickup".So, it's a sort of "saveguard" to avoid "split brain" or a partitioned (or ill-functioning), cluster.

AIX 7.1 Comment

No

No

No new heartbeating over disk

No

No

No

from one Node to the other Node, so it's probably better to say that" the Resource Group (open the Volume Group, mount filesystems, aquire the Service IP address etc..).

YesYesYes

Page 19: Aix Clusters

The ES stands for "Enhanced Scalability". As of 5.1, HACMP/ES is solely based on RSCT, short for Reliable Scalable Cluster Technology. As from version 5.5, HACMP is renamed to PowerHA.

Notice that if you list the daemons in the AIX System Resource Controller (SRC), using the "lssrc" command, you will see ES appended to their names. The actual executables do not have the ES appended.

This daemon monitors the status of the nodes and their interfaces, and invokes the appropriate scripts in response to node or network events. It also centralizes the storage of and publishes updated information about HACMP-defined resource groups. The Cluster Manager on each node coordinates information gathered from the HACMP global ODM, and other Cluster Managers in the cluster to maintain updated information about the content, location, and status of all HACMP resource groups. This information is updated and synchronized among all nodes whenever an event occurs that affects resource group configuration, status, or location.All cluster nodes must run the clstrmgr daemon.From HACMP V5.3 the clstrmgr daemon is started via init process and should always be running.As of 5.2, the dependency on rhosts and r commands has been removed, and are done by clcomd.Starting with Version 5.2, clcomdES must be running before any cluster services can be started. The clcomd daemon is started automatically at boot time by the init process. It provides secure remote command execution and HACMP ODM configuration file updates.

This daemon maintains status information about cluster objects. This daemon works in conjunction with the Simple Network Management Protocol (snmpd) daemon. All cluster nodes must run the clsmuxpd daemon.Note: The clsmuxpd daemon cannot be started unless the snmpd daemon is running.

Reliable Scalable Cluster Technology. Since HACMP 5.1, HACMP relies on RSCT. So, in modern HACMP, RSCT isa neccessary component or subsystem. For example, HACMP uses the heartbeat facility of RSCT.

RSCT is a standard component in AIX5L.

Reliable Scalable Cluster Technology, or RSCT, is a set of software components that together provide a comprehensive clustering environment for AIX and Linux. RSCT is the infrastructure used by a variety of IBMr products to provide clusters with improved system availability, scalability, and ease of use.

The following daemons are related to the RSCT framework.The RSCT Toplogy Services subsystem monitors the status of network interfaces.All cluster nodes must run the topsvcsd daemon.

This RSCT subsystem provides reliable communication and protocols required for cluster operation.

This RSCT daemon operates as a Group Services client; its function is to make switch adapter membership global across all cluster nodes. All cluster nodes must run the grpglsmd daemon

Yes

and hats_nim.

Page 20: Aix Clusters

This RSCT subsystem acts as a resource monitor for the event management subsystem and provides information about the operating system characteristics and utilization.The RMC subsystem must be running on each node in the cluster. By default the rmcd daemon is setup to start from inittab when it is installed. The rc.cluster script ensures the RMC subsystem is running.

Contains time-stamped, formatted messages generated by the HACMP for AIX scripts. In verbose mode, this log file contains a line-by-line record of each command executed in the scripts, including the values of the arguments passed to the commands. By default, the HACMP for AIX software writes verbose information to this log file; however, you can change this default. Verbose mode is recommended. Contains time-stamped, formatted messages generated by HACMP for AIX scripts and daemons. In this log file, there is one line written for the start of each event, and one line written

Contains time-stamped, formatted messages from all AIX subsystems, including the HACMPfor AIX scripts and daemons.

Contains time-stamped, formatted messages generated by the HACMP for AIX scripts. The system creates a new cluster history log file every day that has a cluster event occurring. It identifies each day's file by the file name extension, where mm indicates the month and dd indicates the day. Contains time-stamped, formatted messages generated by HACMP for AIX clstrmgr activity. Information in this file is used by IBM Support personnel when the clstrmgr is in debug mode. Note that this file is overwritten every time cluster services are started; so, you should be careful to make a copy of it before restarting cluster services on a failed node. Contains time-stamped, formatted messages generated by HACMP for AIX C-SPOC commands. Because the C-SPOC utility lets you start or stop the cluster from a single cluster node, the /tmp/cspoc.log is stored on the node that initiates a C-SPOC command.

Contains messages when the cluster verification has run.

Every day a health check is performed at 00:00h, and logged on clutils.log

Not a logfile. Only for later versions.The HACM cluster snapshot facility (/usr/es/sbin/cluster/utilities/clsnapshots) allows you to savein a file, a record all the data that defines a particular cluster configuration. It also allows you to create your own custom snapshot methods,to save additional information important to your configuration.You can use this snapshot for troubleshooting cluster problems.

First and foremost, you can check your processes by viewing the logs as described in section 1.4.

Page 21: Aix Clusters

smitty hacmp > System Management (C-SPOC) > Manage HACMP Services > Show Cluster Services

# shows all active daemons under the control of the SRC

In figure 1, a situation is depicted, where each node is connected to two separate storage systems.But, for each node, one such storage system (e.g. on a San), can be considered to be "owned", or be active for that node.Then, in case of failover, the associated Resource Groups including the volumes and filesystems,

Ofcourse, both nodes can be attached to one local storage system (one SAN) as well.

in these simplistic figures,elements like SAN switches are left out

not too large distance, otherwise additional components

Disk subsystems that support access from multiple hosts include SCSI, SSA, ESS and others.

It's "shared" because both nodes can access the Volume Group, but only one Node at the time, until fail-over occurs.If the resource group containing the shared disk space moves to another node, the new node will activate the disks, and check the current state of the volume groups, logical volumes, and file systems.

access configuration, only one cluster node can access the shared data at a time.

node 2

Page 22: Aix Clusters

A Concurrent Capable or Enhanced Concurrent Capable volume group is a volume group that you can vary online

Concurrent Capable option can only be used in configurations with HACMP and HACMP/ESWith HACMP you can create enhanced concurrent volume groups. Enhanced concurrent volume groups can be used

With concurrent access, the application needs to be able to deal with the "concurrent" access mode.

the "looks and feel" of any other filesystem. Or you should use RAW volumes for true concurrent access.

The enhanced concurrent volume groups are varied on all nodes in the resource group, and the data accessis coordinated by HACMP. Only the node that has the resource group active, will vary on the

In “passive” mode, no high level operations are permitted on that volume group.

When for example SDD MPIO "multipath" is used, the connections of the nodes to a storage system,

access environments, use normal journaled file systemsas JFS2 to manage data, access environments often use RAW logical volumes. These are thus not the regular filesystems.

true cluster filesystem like the GPFS filesystem, which,

you will have the regular filesystems which most applications use.

as GPFS, which allows for concurrent access, and gives

"; the other nodes will vary on the volume group in “passive mode".

NodeB

FC cards

Page 23: Aix Clusters

- Graceful: Local machine shuts itself gracefully. The remote machine interprets this as a graceful down and

- Takeover (Gracefull with takeover): Local machine shuts itself down gracefully. The remote machine interpret this as a non-graceful down and takes over resources.- Forced: Local machine shuts down cluster services without releasing any resources. Remote machine does not

If you do a "shutdown", or reboot "shutdown -r", the rc.shutdown script will stop the cluster services with a graceful shutdown. So, the other node won't takeover the resourses.

The easiest way to stop and start HACMP, is using "smitty clstop" and "smitty clstart".Suppose you just need to reboot a node without that resources need to failover. Then you would choosethe "graceful" shutdown of HACMP. When that's done, and no applications are active, you can use

* Stop now, on system restart or both now + Stop Cluster Services on these nodes [starboss]

+* Shutdown mode graceful

You can view "/tmp/hacmp.out" for any messages and see if it has shutdown in good order.

Note: Make sure no apps are running, and users are out of the system, so that all filesystems are able to unmount.

Depending on how IP and HACMP is configured, you may see that at the system boot, you can only ping the machine on

After the node is up, and there is no "autostart" of HACMP, you need to start HACMP manually, for example using

d

* Start now, on system restart or both now + Start Cluster Services on these nodes [starboss]

+ Startup Cluster Information Daemon? true + Reacquire resources after forced down ? false

Page 24: Aix Clusters

has stopped working, and that the machine is accesible again, with a remote terminal, using the service IP address.

Note: if HACMP starts from the inittab, then you do not need to start HACMP manually.

Note that "smitty hacmp" can lead you to the socalled "C-SPOC" utility, shown by the "System Management" menu option.

The "/usr/es/sbin/cluster/etc/rc.cluster" script initializes the environment required for HACMP/ES, and then callsthe "/usr/es/sbin/cluster/utilities/clstart" script to start the HACMP daemons.The clstart script calls the SRC startsrc command to start the specified subsystem or group. Thus, clstart invokes the AIX System Resource Controller (SRC) facility, to start the cluster daemons.

During and after HACMP startup, you may experience that your remote session to the machine using the boot ip adress

Page 25: Aix Clusters

The following figure illustrates the major commands and scripts called at cluster startup:

Using the C-SPOC utility, you can start cluster services on any node (or on all nodes) in a cluster by executing the C-SPOC /usr/es/sbin/cluster/sbin/cl_rc.cluster command on a single cluster node. The C-SPOC cl_rc.cluster command calls the rc.cluster command to start cluster services on the nodes specified from the one node. The nodes are started in sequential order, not in parallel. The output of the command run on the remote node is returned to the originating node. Because the command is executed remotely,

Note that the clcomd daemon (called clcomdES) is started from "/etc/inittab".

Depending on how HACMP is configured, the rc.cluster script might be called from inittab as well.In that case, HACMP is started from inittab at boottime. The following record might be present in inittab:

hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot> /dev/console 2>&1

In fact, if the "rc.cluster" script is called using the parameter "-R", that inittab entry will be created.Below, is a small fragment from the comments that can be found in rc.cluster:

Newer AIX /usr/sbin/shutdown commands, automatically calls the PowerHA, or HACMP, /"usr/es/sbin/cluster/etc/rc.shutdown" command, which will stop the HACMP services. Since many shops have a custom "/etc/rc.shutdown" script (which contain statementsto stop all sorts of other processes), the HACMP rc.shutdown version, will also call that

What you might find in the startup file "/etc/inittab", are the following records:

- During installation, the following entry is made to the /etc/inittab file to start the

- Usually, during install, the following entry is added to inittab for autostart the HACMP daemons at boottime.Also, if you use the "rc.cluster" script with the "-R" parameter, the entry will be added if it's not present already.hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot> /dev/console 2>&1

- Because of the specific actions needed to implement IP Address Take over (IPAT), a dedicated script is implemented.

Page 26: Aix Clusters

In HACMP, a shared VG must be installed. Several management issues may present themselves at later times.

For illustrational purpose, here are some "real life" questions and answers as found in various AIX sites.

The C-SPOC commands only operate on both shared and concurrent LVM components that aredefined as part of an HACMP resource group. When you use SMIT HACMP C-SPOC, it executes the command on the node that "owns" the LVM component. This is the node that has it varied on.

I've got a HACMP (4.4) cluster with SAN-attached ESS storage. SDD is installed. How do I add new disks to the shared volume group?

1) acquire the new disks on the primary node (where the VG is in service); see the sketch after this answer.

2) convert hdisks to vpaths. Note: use the smit screens for this, rather than running the commands by hand.

(note: it can be done as above, but normally I would advise C-SPOC through "smitty hacmp".)

Although many sysadmins use the command line for changing LVM objects, in general you should use the cluster methods. This will ensure ODM updates on all involved nodes.


(Note: I don't think you need to set the pvid.)

  # lsvg -p vg00 | grep vpath       (just note down any one vpath from this output, for example vpath0)
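A minimal sketch of step 1 and a follow-up check, assuming the new LUNs are simply picked up with cfgmgr and that the SDD smit screens are used afterwards for the hdisk-to-vpath conversion:

  # cfgmgr                          (let AIX discover the newly assigned disks on the primary node)
  # lsdev -Cc disk                  (list the disk devices; the new hdisks/vpaths should show up)
  # lspv                            (check the PVIDs and volume group membership)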

I have an HACMP cluster with enhanced concurrent resource group. What is the best way to add a (LUN) disk to it?

Once the lun is visible on your cluster nodes, you should use HACMP C-SPOC in order to add it to the shared Volume Group.

smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared Volume Groups > Set Characteristics of a Shared Volume Group > Add a Volume to a Shared Volume Group > <select your VG> > Select Disk

If you cannot select any new volume in the last C-SPOC screen, then follow this:

1. Allocate the lun to both cluster nodes (this is already done by the SAN administrator).
   (You don't need to set the pvid.)

Now when you go into C-SPOC, you should see the new lun when you go to add a new volume to a shared volume group.

7. smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared Volume Groups >
   Set Characteristics of a Shared Volume Group > Add a Volume to a Shared Volume Group > <select your VG>
8. Once the lun has been added, check on your cluster node 2 that the VG has the new lun associated.

Note: the reserve_policy=no_reserve allows the 2 cluster nodes to see the lun, without one node placing a (SCSI) reserve on it, which would lock out the other node.
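A minimal sketch of the device settings involved, assuming a hypothetical disk name hdisk4; the exact attribute name for the reservation policy can differ per multipath driver:

  # chdev -l hdisk4 -a pv=yes                      (assign a PVID; run on both nodes)
  # chdev -l hdisk4 -a reserve_policy=no_reserve   (do not place a reserve; run on both nodes)
  # lspv | grep hdisk4                             (both nodes should now show the same PVID)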

How can I increase the filesystem on a shared VG in HACMP. What is different from just running "chfs"?

You have 2 (or more) machines in a cluster, but only one of them has access to the filesystems at a time. If you change the filesystem, you have to make sure that not only this one machine, but all machines in the cluster, are aware of the change (that is, that the ODM on all nodes gets updated).
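As a sketch only: on a stand-alone machine you would simply grow the filesystem with chfs, but in the cluster the same change should be made through the C-SPOC menus, so that the ODM on all nodes stays in sync. The filesystem name and menu path below are indicative and may differ per HACMP version:

  # chfs -a size=+512M /data        (stand-alone AIX; older AIX levels expect the size in 512-byte blocks)

  In the cluster, instead use:
  smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared File Systems > ...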

In HACMP, on the shared disksubsystem, one or more "concurrent capable" VG's will be defined. In normal operation, at startup of a node, or even if a fail-over occurs, you should not be bothered with manual varyon or other LVM actions: HACMP takes care of that.

However, in certain circumstances, you need to be able to perform some actions manually.


-> At a stand-alone AIX machine, if you would varyon a volumegroup, you would simply use "varyonvg <vgname>".

The normal varyonvg command as used above will "lock" or reserve the disks for the current machine. The base design of LVM assumes that only one initiator can access a volume group.

-> In a HACMP environment, you have a shared Volume Group, which is called "concurrent capable".

The varyonvg command knows many switches, but for HACMP, the following are the most relevant ones.

-c   Varies the volume group on in Enhanced Concurrent mode. This can only be done on a concurrent capable VG, which is used for the shared VG's in a HACMP environment.
-u   Varies on a volume group, but leaves the disks that make up the volume group in an unlocked state.
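A sketch of how these switches would be used manually (normally HACMP issues these varyon commands for you), assuming the hypothetical VG name "examplevg":

  # varyonvg -c examplevg           (vary the VG on in enhanced concurrent mode)
  # varyonvg -u examplevg           (vary the VG on, leaving the disks unlocked / unreserved)
  # varyoffvg examplevg             (vary the VG off again)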

In a regular Failover HACMP configuration, one node actively accesses the VG, while the other node has the potential to take it over.

So, the active node will have implemented "varyonvg vgname" to set the reservations.Although one node is active at the time, the VG still is configured as "concurrent capable".This is sometimes also called an "HACMP nonconcurrent access configuration".

In contrast, in a "concurrent access configuration", multiple nodes access the VG at the same time. An example of such a configuration might be an Oracle RAC environment, used in conjunction with HACMP.

An enhanced concurrent volume group can be made active on the node, or varied on, in two states: active or passive. Note that active or passive state varyons are done automatically by HACMP.

- Active state varyon behaves as ordinary varyon, and makes the logical volumes normally available.

The node that opens the VG in "active state varyon" can mount all filesystems, and perform all other usual operations on the volume group.

When an enhanced concurrent volume group is varied on in passive state, the LVM provides a form of "fencing" at the LVM level: only limited operations are possible. The other nodes in a failover configuration will open the VG in this passive state.
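To see how a VG is currently varied on, on a node, the lsvg output can be inspected. A rough sketch; the exact field names ("Concurrent:", "VG Mode:") may differ per AIX level, and "examplevg" is just a placeholder name:

  # lsvg examplevg | grep -i -E "concurrent|vg mode|vg state"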

As said before, in certain circumstances "true" concurrent access is possible, using RAW volumes, or when a parallel filesystem such as GPFS is used, so that multiple nodes can truly access the VG at the same time. This type of cluster then would not be a "failover cluster". It places certain requirements on the applications accessing the VG: they should be able to handle "parallel" access, like Oracle RAC Clusterware can.

Some HACMP utilities can provide you the cluster status, and other information.


The "clfindres" command shows you the status of the Resource Groups and where they are active.


The same information can be obtained using the clRGinfo command. In fact, clfindres is a link to clRGinfo.

Use the "cllsserv" command to list all the application servers configured in HACMP, including the start and stop scripts.
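A sketch of what this could look like; the application server name and script paths below are purely hypothetical:

  # /usr/es/sbin/cluster/utilities/cllsserv
  app_01   /usr/local/cluster/start_app01.sh   /usr/local/cluster/stop_app01.sh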

The clstat command accepts several options, among which: one that shows information of the cluster with the specified name (-n <clustername>), and one that determines the refresh rate used to update the information (-r <seconds>).
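A minimal invocation sketch; the exact path and flags may vary slightly per HACMP release:

  # /usr/es/sbin/cluster/clstat -a          (ASCII mode, useful on a plain terminal)
  # /usr/es/sbin/cluster/clstat -a -r 5     (refresh the display every 5 seconds)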


Resource Group: staroraprod_resource State: On line

Resource Group: staroratest_resource State: On line

Alongside the above commands (clfindres, cllsserv, clstat), the output of the commands below is also very useful to document.

Note: Also, don't forget to document the outputs of "lsvg", "lsvg -l", "lspv", "df -g", and the contents of "/etc/filesystems", "/etc/hosts", and all other relevant configuration files.
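A minimal sketch of collecting such documentation in one go; the output file names are just hypothetical examples:

  # for vg in $(lsvg -o); do lsvg $vg; lsvg -l $vg; done > /tmp/vg_layout.txt
  # lspv > /tmp/lspv.txt; df -g > /tmp/df.txt
  # cp /etc/filesystems /etc/hosts /tmp/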

The whole idea about this is to have additional Keep Alive messages, or heartbeats, across a non-IP network. This additional path, alongside the private IP network, makes sure that HACMP can determine that the nodes are still up, even if the IP network itself fails. If the other node turns out to be still alive, a "Take Over" does not need to take place.

To create this additional path, serial links might be used, or you could use a "heartbeat over disk". The latter provides the ability to use existing shared disks to provide a "serial network like" heartbeat path. In HACMP 5.x, the RSCT component "/usr/sbin/rsct/bin/hats_diskhb_nim" has the functionality to make it all happen.

There is no SCSI reservation on the heartbeat disk, because both nodes must be able to read and write to that disk. The disk must therefore reside in an enhanced concurrent volume group to meet this requirement.

- Nowadays, a concurrent Volume Group can be used in both a nonconcurrent and a concurrent Resource Group. An important feature of a concurrent Resource Group is that it is online and open on both nodes.


- With the older AIX and HACMP versions, a concurrent Volume Group could only be used in a concurrent Resource Group.

As of AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG that resides in a non-concurrent resource group.

Say you have the nodes "starboss" and "stargate". Suppose we use ESS storage with vpath devices.

If a PVID does not exist on each system, you should run "chdev -l <devicename> -a pv=yes" on both systems. This will ensure that smitty C-SPOC will recognize it as a disk in shared storage.
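For example (the device name vpath4 is just the one used in this example setup):

  # chdev -l vpath4 -a pv=yes       (run on both nodes)
  # lspv | grep vpath4              (both nodes should now report the same PVID)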


Then, choose the nodes, and after that add the appropriate shared storage devices, based on their pvids.

Then, choose that you want to create an Enhanced concurrent VG, with for example the name "examplevg".

A check on the disk devices, after the volume group was created, could be this:
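For example, the output could look roughly like this; the device names and PVIDs below are purely illustrative, and the state column may show "active" or "concurrent", depending on how the VG is varied on:

  # lspv
  hdisk0    00c8d2f05a1b2c3d    rootvg        active
  vpath4    00c8d2f0d4e5f6a7    examplevg     concurrent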

Now that the enhanced concurrent Volume Group is available, we need to create the "heartbeat" network. Since the "physical" path just runs along the fiber cards (or whatever physical connection to shared storage you may use), you may wonder why it's called a "network". Well, actually it resembles a heartbeat network so much that people call it a network too (it functions quite similarly to the private IP network).

To create it, use "smitty hacmp" from your primary node (for example starboss). Instead of showing all smitty menus, here we will only show the menu choices:

smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Networks >Add a Network to the HACMP cluster > select diskhb > enter an appropriate network name

When the above actions are done, we have added a diskhb network definition.

smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices >

Type or select values in entry fields.

* Device Name [starboss_hb]

* Device Path [/dev/vpath4]

To get a fully functioning heartbeat network, you need to do some more work. This section was only provided to give a taste of a real installation of an HACMP component.
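One of those extra steps is testing the disk heartbeat path. This is commonly done with the RSCT dhb_read utility, run in receive mode on one node and transmit mode on the other; the flags shown are the commonly documented ones and may differ per RSCT level:

  On starboss:  # /usr/sbin/rsct/bin/dhb_read -p vpath4 -r
  On stargate:  # /usr/sbin/rsct/bin/dhb_read -p vpath4 -t

  If the path works, both sides should report that the link is operating normally.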


As of HACMP 5.1, up to 32 nodes are possible.

(Figure: here, two SANs are present; other setups might just use one shared disksubsystem. In this specific situation, each node has its "own" online application and RG. Such an application and RG can fail over to the other node.)


Stop Cluster Services (smitty screen):

  * Stop now, on system restart or both          now
  + Stop Cluster Services on these nodes         [starboss]
  +* Shutdown mode                               graceful

Start Cluster Services (smitty screen):

  * Start now, on system restart or both         now
  + Start Cluster Services on these nodes        [starboss]
  + Startup Cluster Information Daemon?          true
  + Reacquire resources after forced down ?      false
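As a closing note: these stop and start panels can usually be reached directly with the smitty fastpaths below, assuming a standard HACMP installation:

  # smitty clstop
  # smitty clstart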