Sun Cluster 3.2 Cheat Sheet

Sun Cluster Cheat Sheet

This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. Some information is still missing (e.g. zones, NAS devices), and I hope to fill it in over time.

Both versions of Cluster also have a text-based, menu-driven tool, so don't be afraid to use it, especially if the task is a simple one:

• scsetup (3.1)
• clsetup (3.2)

All the commands in version 3.1 are also available in version 3.2.

Daemons and Processes

At the bottom of the installation guide I listed the daemons and processes running after a fresh install. Now is the time to explain what these processes do. I have managed to obtain information on most of them, but I am still looking for details on the others.

Versions 3.1 and 3.2

clexecd

This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.

cl_ccrad

This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.

cl_eventd

The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.

cl_eventlogd

The cluster event log daemon logs cluster events into a binary log file. At the time of writing, there is no published interface to this log. It is automatically restarted if it is stopped.

failfastd

This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.

rgmd

The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.fed

This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.pmfd

This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.

9/15/2010 http://www.datadisk.co.uk/html_docs/sun/sun_cluster_cs.htm

pnmd

The public network management service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.

scdpmd

Disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. It is automatically restarted if it is stopped.

The multi-threaded DPM daemon runs on each node. It is automatically started by an rc script when a node boots. It monitors the availability of the logical paths that are visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc). It is automatically restarted by rpc.pmfd if it dies.

Version 3.2 only

qd_userd

This daemon serves as a proxy whenever any quorum device activity requires execution of some command in userland, e.g. for a NAS quorum device.

cl_execd

ifconfig_proxy_serverd

rtreg_proxy_serverd

cl_pnmd

This is a daemon for the public network management (PNM) module. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups.

scprivipd

This daemon provisions IP addresses on the clprivnet0 interface, on behalf of zones.

sc_zonesd

This daemon monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone booting failure.

cznetd

This daemon is used for reconfiguring and plumbing the private IP address in a local zone after a virtual cluster is created; see also the cznetd.xml file.

rpc.fed

This is the "fork and exec" daemon, which handles requests from rgmd to spawn methods for specific data services. Failfast will panic the node if this daemon is killed and not restarted in 30 seconds.

scqdmd

The quorum server daemon; this was possibly formerly called "scqsd".

pnm_mod_serverd

File locations

Both Versions (3.1 and 3.2)

man pages: /usr/cluster/man

log files: /var/cluster/logs, /var/adm/messages

Configuration files (CCR, eventlog, etc): /etc/cluster/

Cluster and other commands: /usr/cluster/lib/sc

Version 3.1 Only

sccheck logs: /var/cluster/sccheck/report.<date>

Cluster infrastructure file: /etc/cluster/ccr/infrastructure

Version 3.2 Only

sccheck logs: /var/cluster/logs/cluster_check/remote.<date>

Cluster infrastructure file: /etc/cluster/ccr/global/infrastructure

Command Log: /var/cluster/logs/commandlog

SCSI Reservations

Display reservation keys

scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2

scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

determine the device owner

scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2

scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2
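
The four commands above differ only in the binary and the sub-command name, so they are easy to wrap. Below is a minimal sketch of such a wrapper. The function name reservation_cmd is hypothetical (it is not part of Sun Cluster), it assumes slice 2 of the DID device as in the examples above, and it only prints the command rather than running it, because the tools exist only on a cluster node.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sun Cluster): print the command that
# displays the SCSI reservation keys or the reservation owner of a DID device.
# usage: reservation_cmd <scsi2|scsi3> <keys|owner> <did-device, e.g. d4>
reservation_cmd() {
    proto=$1
    what=$2
    did=$3
    case "$proto:$what" in
        scsi2:keys)  tool=pgre; sub=pgre_inkeys ;;
        scsi2:owner) tool=pgre; sub=pgre_inresv ;;
        scsi3:keys)  tool=scsi; sub=inkeys ;;
        scsi3:owner) tool=scsi; sub=inresv ;;
        *)
            echo "usage: reservation_cmd <scsi2|scsi3> <keys|owner> <did>" >&2
            return 1
            ;;
    esac
    # Slice 2 of the raw DID device, as in the examples above.
    echo "/usr/cluster/lib/sc/$tool -c $sub -d /dev/did/rdsk/${did}s2"
}
```

For example, reservation_cmd scsi3 keys d4 prints the scsi3 "display reservation keys" command line shown above; review the output before running it on a node.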

Command shortcuts

In version 3.2 there are a number of shortcut command names, which I have detailed below. I have left the full command name in the rest of the document so it is obvious what we are performing. All the commands are located in /usr/cluster/bin.

command                 shortcut

cldevice                cldev
cldevicegroup           cldg
clinterconnect          clintr
clnasdevice             clnas
clquorum                clq
clresource              clrs
clresourcegroup         clrg
clreslogicalhostname    clrslh
clresourcetype          clrt
clressharedaddress      clrssa

Shutting down and Booting a Cluster



shutdown entire cluster
  3.1: ## other nodes in cluster
       scswitch -S -h <host>
       shutdown -i5 -g0 -y
       ## last remaining node
       scshutdown -g0 -y
  3.2: cluster shutdown -g0 -y

shutdown single node
  3.1: scswitch -S -h <host>
       shutdown -i5 -g0 -y
  3.2: clnode evacuate <node>
       shutdown -i5 -g0 -y

reboot a node into non-cluster mode
  3.1: ok> boot -x
  3.2: ok> boot -x
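
The single-node shutdown differs between the two versions only in the evacuation step. As a rough sketch, the helper below prints the steps for a given version so they can be reviewed before being typed on the node. The function name node_shutdown_steps is hypothetical and the script deliberately does not execute anything.

```shell
#!/bin/sh
# Hypothetical helper (not a Sun Cluster command): print, without running,
# the steps for shutting down a single node.
# usage: node_shutdown_steps <3.1|3.2> <node>
node_shutdown_steps() {
    version=$1
    node=$2
    if [ -z "$node" ]; then
        echo "usage: node_shutdown_steps <3.1|3.2> <node>" >&2
        return 1
    fi
    case "$version" in
        3.1) echo "scswitch -S -h $node" ;;    # evacuate the node (3.1 syntax)
        3.2) echo "clnode evacuate $node" ;;   # evacuate the node (3.2 syntax)
        *)   echo "unknown version: $version" >&2; return 1 ;;
    esac
    # Run on the node being shut down, once the evacuation has finished.
    echo "shutdown -i5 -g0 -y"
}
```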

Cluster information

Cluster
  3.1: scstat -pv
  3.2: cluster list -v
       cluster show
       cluster status

Nodes
  3.1: scstat -n
  3.2: clnode list -v
       clnode show
       clnode status

Devices
  3.1: scstat -D
  3.2: cldevice list
       cldevice show
       cldevice status

Quorum
  3.1: scstat -q
  3.2: clquorum list -v
       clquorum show
       clquorum status

Transport info
  3.1: scstat -W
  3.2: clinterconnect show
       clinterconnect status

Resources
  3.1: scstat -g
  3.2: clresource list -v
       clresource show
       clresource status

Resource Groups
  3.1: scstat -g
       scrgadm -pv
  3.2: clresourcegroup list -v
       clresourcegroup show
       clresourcegroup status

Resource Types
  3.2: clresourcetype list -v
       clresourcetype list-props -v
       clresourcetype show

IP Networking Multipathing
  3.1: scstat -i
  3.2: clnode status -m

Installation info (prints packages and version)
  3.1: scinstall -pv
  3.2: clnode show-rev -v
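
In 3.2 every object type has its own status sub-command, so a quick health check is just the same sub-command run for each type. The sketch below prints those commands one per line. The function name cluster_status_commands is hypothetical, and the list of command names is taken from the 3.2 commands in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the 3.2 "status" command for each object type
# listed in this section, one per line. Nothing is executed here.
cluster_status_commands() {
    for cmd in cluster clnode cldevice clquorum clinterconnect \
               clresourcegroup clresource; do
        echo "$cmd status"
    done
}
```

On a cluster node the output could be piped to sh (cluster_status_commands | sh) to run all of the checks in sequence, assuming /usr/cluster/bin is in the PATH.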



Cluster Configuration

Release
  cat /etc/cluster/release

Integrity check
  3.1: sccheck
  3.2: cluster check

Configure the cluster (add nodes, add data services, etc)
  3.1: scinstall
  3.2: scinstall

Cluster configuration utility (quorum, data services, resource groups, etc)
  3.1: scsetup
  3.2: clsetup

Rename
  3.2: cluster rename -c <cluster_name>

Set a property
  3.2: cluster set -p <name>=<value>

List
  3.2: ## List cluster commands
       cluster list-cmds
       ## Display the name of the cluster
       cluster list
       ## List the checks
       cluster list-checks
       ## Detailed configuration
       cluster show -t global

Status
  3.2: cluster status

Reset the cluster private network settings
  3.2: cluster restore-netprops <cluster_name>

Place the cluster into install mode
  3.2: cluster set -p installmode=enabled

Add a node
  3.1: scconf -a -T node=<host>
  3.2: clnode add -c <clustername> -n <nodename> -e endpoint1,endpoint2

Remove a node
  3.1: scconf -r -T node=<host>
  3.2: clnode remove

Prevent new nodes from entering
  3.1: scconf -a -T node=.

Put a node into maintenance state
  3.1: scconf -c -q node=<node>,maintstate
       Note: use the scstat -q command to verify that the node is in maintenance mode; the vote count should be zero for that node.

Get a node out of maintenance state
  3.1: scconf -c -q node=<node>,reset
       Note: use the scstat -q command to verify that the node is out of maintenance mode; the vote count should be one for that node.



Node Configuration

Add a node to the cluster
  3.2: clnode add [-c <cluster>] [-n <sponsornode>] \
         -e <endpoint> \
         -e <endpoint> <node>

Remove a node from the cluster
  3.2: ## Make sure you are on the node you wish to remove
       clnode remove

Evacuate a node from the cluster
  3.1: scswitch -S -h <node>
  3.2: clnode evacuate <node>

Cleanup the cluster configuration (used after removing nodes)
  3.2: clnode clear <node>

List nodes
  3.2: ## Standard list
       clnode list [+|<node>]
       ## Detailed list
       clnode show [+|<node>]

Change a node's property
  3.2: clnode set -p <name>=<value> [+|<node>]

Status of nodes
  3.2: clnode status [+|<node>]

Admin Quorum Device

Quorum votes come from nodes and from quorum devices (such as disks), so the total quorum will be all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2) interface to add or remove quorum devices, or use the commands below.

Adding a SCSI device to the quorum
  3.1: scconf -a -q globaldev=d11
       Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
  3.2: clquorum add [-t <type>] [-p <name>=<value>] [+|<devicename>]

Adding a NAS device to the quorum
  3.1: n/a
  3.2: clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnumdevice nasdevice>

Adding a Quorum Server
  3.1: n/a
  3.2: clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber>

Removing a device from the quorum
  3.1: scconf -r -q globaldev=d11
  3.2: clquorum remove [-t <type>] [+|<devicename>]

Remove the last quorum device
  3.1: ## Evacuate all nodes
       ## Put cluster into maint mode
       scconf -c -q installmode
       ## Remove the quorum device
       scconf -r -q globaldev=d11
       ## Check the quorum devices
       scstat -q
  3.2: ## Place the cluster in install mode
       cluster set -p installmode=enabled
       ## Remove the quorum device
       clquorum remove <device>
       ## Verify the device has been removed
       clquorum list -v
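
Removing the last quorum device is an ordered procedure, so it helps to script it with a dry-run switch. The sketch below covers the 3.2 steps only. The names run, DRY_RUN and remove_last_quorum_device are hypothetical, and by default the script only prints each command with a leading "+" instead of executing it.

```shell
#!/bin/sh
# Sketch only: the 3.2 steps for removing the last quorum device.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 on a cluster node
# to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: remove_last_quorum_device <device, e.g. d11>
remove_last_quorum_device() {
    device=$1
    if [ -z "$device" ]; then
        echo "usage: remove_last_quorum_device <device>" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    run cluster set -p installmode=enabled &&   # place the cluster in install mode
    run clquorum remove "$device" &&            # remove the quorum device
    run clquorum list -v                        # verify the device has been removed
}
```

The && chaining is a design choice: if placing the cluster into install mode fails, the removal is not attempted.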

List
  3.2: ## Standard list
       clquorum list -v [-t <type>] [-n <node>] [+|<devicename>]
       ## Detailed list
       clquorum show [-t <type>] [-n <node>]
       ## Status
       clquorum status [-t <type>] [-n <node>] [+|<devicename>]

Resetting quorum info
  3.1: scconf -c -q reset
       Note: this will bring all offline quorum devices online
  3.2: clquorum reset

Bring a quorum device into maintenance mode (known as disabling in 3.2)
  3.1: ## Obtain the device number
       scdidadm -L
       scconf -c -q globaldev=<device>,maintstate
  3.2: clquorum disable [-t <type>] [+|<devicename>]

Bring a quorum device out of maintenance mode (known as enabling in 3.2)
  3.1: scconf -c -q globaldev=<device>,reset
  3.2: clquorum enable [-t <type>] [+|<devicename>]

Device Configuration

Check device
  3.2: cldevice check [-n <node>] [+]

Remove all devices from node
  3.2: cldevice clear [-n <node>]

Monitoring
  3.2: ## Turn on monitoring
       cldevice monitor [-n <node>] [+|<device>]
       ## Turn off monitoring
       cldevice unmonitor [-n <node>] [+|<device>]

Rename
  3.2: cldevice rename -d <destination_device_name>

Replicate
  3.2: cldevice replicate [-S <source-node>]

Set properties of a device
  3.2: cldevice set -p default_fencing={global|pathcount|scsi3}

Status
  3.2: ## Standard display
       cldevice status [-s <state>] [-n <node>]
       ## Display failed disk paths
       cldevice status -s fail

List all the configured devices including paths across all nodes
  3.1: scdidadm -L
  3.2: ## Standard list
       cldevice list [-n <node>] [+|<device>]
       ## Detailed list
       cldevice show [-n <node>] [+|<device>]

List all the configured devices including paths on this node only
  3.1: scdidadm -l
  3.2: see above

Reconfigure the device database, creating new instance numbers if required
  3.1: scdidadm -r
  3.2: cldevice populate
       cldevice refresh [-n <node>] [+]

Perform the repair procedure for a particular path (use this when a disk gets replaced)
  3.1: scdidadm -R <c0t0d0s0>    (device)
       scdidadm -R 2             (device id)
  3.2: cldevice repair [-n <node>] [+|<device>]

Disk groups

Create a device group
  3.1: n/a
  3.2: cldevicegroup create

Remove a device group
  3.1: n/a
  3.2: cldevicegroup delete <devgrp>

Adding
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
  3.2: cldevicegroup add

Removing
  3.1: scconf -r -D name=<disk group>
  3.2: cldevicegroup remove

Set a property
  3.2: cldevicegroup set [

List
  3.1: scstat
  3.2: ## Standard list
       cldevicegroup list [
       ## Detailed configuration report
       cldevicegroup show [

Status
  3.1: scstat
  3.2: cldevicegroup status [

Adding single node
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>
  3.2: cldevicegroup add

Removing single node
  3.1: scconf -r -D name=<disk group>,nodelist=<host>
  3.2: cldevicegroup remove

Switch
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup switch

Put into maintenance mode
  3.1: scswitch -m -D <disk group>
  3.2: n/a

Take out of maintenance mode
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: n/a

Onlining a disk group
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup online <devgrp>

Offlining a disk group
  3.1: scswitch -F -D <disk group>
  3.2: cldevicegroup offline <devgrp>

Resync a disk group
  3.1: scconf -c -D name=appdg,sync
  3.2: cldevicegroup sync [

Transport Cable



Add
  3.2: clinterconnect add <endpoint>,<endpoint>

Remove
  3.2: clinterconnect remove <endpoint>,<endpoint>

Enable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=enabled
  3.2: clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>]

Disable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=disabled
       Note: it gets deleted
  3.2: clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>]

List
  3.1: scstat
  3.2: ## Standard and detailed list
       clinterconnect show [-n <node>][+|<endpoint>,<endpoint>]

Status
  3.1: scstat
  3.2: clinterconnect status [-n <node>][+|<endpoint>,<endpoint>]

Resource Groups

Adding (failover)
  3.1: scrgadm -a -g <res_group> -h <host>,<host>
  3.2: clresourcegroup create <res_group>

Adding (scalable)
  3.2: clresourcegroup create -S <res_group>

Adding a node to a resource group
  3.2: clresourcegroup add-node -n <node> <res_group>

Removing
  3.1: scrgadm -r -g <group>
  3.2: ## Remove a resource group
       clresourcegroup delete <res_group>
       ## Remove a resource group and all its resources
       clresourcegroup delete -F <res_group>

Removing a node from a resource group
  3.2: clresourcegroup remove-node -n <node> <res_group>

Changing properties
  3.1: scrgadm -c -g <resource group> -y <property=value>
  3.2: clresourcegroup set -p Failback=true + <name=value>

Status
  3.1: scstat -g
  3.2: clresourcegroup status [-n <node>][-

Listing
  3.1: scstat -g
  3.2: clresourcegroup list [-n <node>][-r

Detailed List
  3.1: scrgadm -pv -g <res_group>
  3.2: clresourcegroup show [-n <node>][-r <resource>][

Display mode type (failover or scalable)
  3.1: scrgadm -pv -g <res_group> | grep 'Res Group mode'

Offlining
  3.1: scswitch -F -g <res_group>
  3.2: ## All resource groups
       clresourcegroup offline +
       ## Individual group
       clresourcegroup offline [-n <node>] <res_group>
       clresourcegroup evacuate [+|-n <node>]

Onlining
  3.1: scswitch -Z -g <res_group>
  3.2: ## All resource groups
       clresourcegroup online +
       ## Individual groups
       clresourcegroup online [-n <node>] <res_group>

Evacuate all resource groups from a node (used when shutting down a node)
  3.2: clresourcegroup evacuate [+|-n <node>]

Unmanaging
  3.1: scswitch -u -g <res_group>
       Note: all resources in the group must be disabled
  3.2: clresourcegroup unmanage <res_group>

Managing
  3.1: scswitch -o -g <res_group>
  3.2: clresourcegroup manage <res_group>

Switching
  3.1: scswitch -z -g <res_group> -h <host>
  3.2: clresourcegroup switch -n <node> <res_group>

Suspend
  3.1: n/a
  3.2: clresourcegroup suspend [+|<res_group>]

Resume
  3.1: n/a
  3.2: clresourcegroup resume [+|<res_group>]

Remaster (move the resource group/s to their preferred node)
  3.1: n/a
  3.2: clresourcegroup remaster [+|<res_group>]

Restart a resource group (bring offline then online)
  3.1: n/a
  3.2: clresourcegroup restart [-n <node>]

Resources

Adding failover network resource
  3.1: scrgadm -a -L -g <res_group> -l <logicalhost>
  3.2: clreslogicalhostname create

Adding shared network resource
  3.1: scrgadm -a -S -g <res_group> -l <logicalhost>
  3.2: clressharedaddress create

Adding a failover apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> -y Scalable=False -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin

Adding a shared apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> -y Scalable=True -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin

Create a HAStoragePlus failover resource
  3.1: scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
         -x FileSystemMountPoints=/oracle/data01 \
         -x Affinityon=true
  3.2: clresource create -t HAStorage -p FilesystemMountPoints=<mount-p Affinityon=true <rs-hasp>

Removing
  3.1: scrgadm -r -j res-ip
       Note: must disable the resource first
  3.2: clresource delete [-g <res_group>][

Changing or adding properties
  3.1: scrgadm -c -j <resource> -y <property=value>
  3.2: ## Changing
       clresource set -t <type>
       ## Adding
       clresource set -p <name>+=<value>

List
  3.1: scstat -g
  3.2: clresource list [-g <res_group>][
       ## List properties
       clresource list-props [-g <res_group>][

Detailed List
  3.1: scrgadm -pv -j res-ip
       scrgadm -pvv -j res-ip
  3.2: clresource show [-n <node>] [

Status
  3.1: scstat -g
  3.2: clresource status [-s <state>][

Disable resource monitor
  3.1: scswitch -n -M -j res-ip
  3.2: clresource unmonitor [-n <node>] [

Enable resource monitor
  3.1: scswitch -e -M -j res-ip
  3.2: clresource monitor [-n <node>] [

Disabling
  3.1: scswitch -n -j res-ip
  3.2: clresource disable <resource>

Enabling
  3.1: scswitch -e -j res-ip
  3.2: clresource enable <resource>

Clearing a failed resource
  3.1: scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
  3.2: clresource clear -f STOP_FAILED <resource>

Find the network of a resource
  3.1: scrgadm -pvv -j <resource> | grep -i network

Removing a resource and resource group
  3.1: ## offline the group
       scswitch -F -g rgroup-1
       ## remove the resource
       scrgadm -r -j res-ip
       ## remove the resource group
       scrgadm -r -g rgroup-1
  3.2: ## offline the group
       clresourcegroup offline <res_group>
       ## remove the resource
       clresource delete [-g <res_group>][
       ## remove the resource group
       clresourcegroup delete <res_group>
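
The teardown order matters: the group goes offline first, the resource must be disabled before it can be deleted (see the note under Removing), and the group is deleted last. A minimal sketch of that order for 3.2 follows. The function name resource_teardown_steps is hypothetical, it handles a single resource per group, and it only prints the commands so they can be checked before use.

```shell
#!/bin/sh
# Hypothetical helper: print the 3.2 commands, in order, for removing one
# resource and its resource group. Nothing is executed here.
# usage: resource_teardown_steps <res_group> <resource>
resource_teardown_steps() {
    rg=$1
    rs=$2
    if [ -z "$rg" ] || [ -z "$rs" ]; then
        echo "usage: resource_teardown_steps <res_group> <resource>" >&2
        return 1
    fi
    printf '%s\n' \
        "clresourcegroup offline $rg" \
        "clresource disable $rs" \
        "clresource delete $rs" \
        "clresourcegroup delete $rg"
}
```

A group holding several resources would need the disable and delete steps repeated for each resource before the final clresourcegroup delete.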

Resource Types

Adding (register in 3.2)
  3.1: scrgadm -a -t <resource type>    (e.g. SUNW.HAStoragePlus)
  3.2: clresourcetype register <type>

Register a resource type to a node
  3.1: n/a
  3.2: clresourcetype add-node -

Deleting (unregister in 3.2)
  3.1: scrgadm -r -t <resource type>
  3.2: clresourcetype unregister

Deregistering a resource type from a node
  3.1: n/a
  3.2: clresourcetype remove-node

Listing
  3.1: scrgadm -pv | grep 'Res Type name'
  3.2: clresourcetype list [<type>]

Listing resource type properties
  3.2: clresourcetype list-props

Show resource types
  3.2: clresourcetype show [<type>]

Set properties of a resource type
  3.2: clresourcetype set [-p <name>=<value>] <type>


