Sun Cluster 3.2 Cheat Sheet
This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. There is some missing information (e.g. zones, NAS devices) which I hope to complete over time.
Both versions of Cluster also include a text-based GUI tool, so don't be afraid to use it, especially if the task is a simple one:
scsetup (3.1) / clsetup (3.2)
All the version 3.1 commands are also available in version 3.2.
Daemons and Processes
At the bottom of the installation guide I listed the daemons and processes running after a fresh install; now is the time to explain what these processes do. I have managed to obtain information on most of them, but am still looking for the rest.
Versions 3.1 and 3.2
clexecd
This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.
cl_ccrad
This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.
cl_eventd
The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.
cl_eventlogd
The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped.
failfastd
This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.
rgmd
The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.fed
This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.pmfd
This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.
Page 1 of 12Sun Cluster 3.2 - Cheat Sheet
9/15/2010http://www.datadisk.co.uk/html_docs/sun/sun_cluster_cs.htm
pnmd
The public network management service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.
scdpmd
The disk path monitoring daemon monitors the status of disk paths so that they can be reported in the output of the cldev status command. This multi-threaded daemon runs on each node and is started automatically by an rc script when the node boots. It monitors the availability of the logical paths visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc). It is automatically restarted by rpc.pmfd if it dies.
Version 3.2 only
qd_userd
This daemon serves as a proxy whenever quorum device activity requires the execution of a userland command (e.g. for a NAS quorum device).
cl_execd
ifconfig_proxy_serverd
rtreg_proxy_serverd
cl_pnmd
A daemon for the public network management (PNM) module. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups.
scprivipd
This daemon provisions IP addresses on the clprivnet0 interface, on behalf of zones.
sc_zonesd
This daemon monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone boot failures.
cznetd
Used for reconfiguring and plumbing the private IP addresses in a local zone after a virtual cluster is created; see also the cznetd.xml file.
rpc.fed
This is the "fork and exec" daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
scqdmd
The quorum server daemon; this was possibly called "scqsd" in earlier releases.
pnm mod serverd
File locations
Both Versions (3.1 and 3.2)
man pages: /usr/cluster/man
log files: /var/cluster/logs and /var/adm/messages
Configuration files (CCR, eventlog, etc): /etc/cluster/
Cluster and other commands: /usr/cluster/lib/sc
Version 3.1 Only
sccheck logs /var/cluster/sccheck/report.<date>
Cluster infrastructure file /etc/cluster/ccr/infrastructure
Version 3.2 Only
sccheck logs /var/cluster/logs/cluster_check/remote.<date>
Cluster infrastructure file /etc/cluster/ccr/global/infrastructure
Command Log /var/cluster/logs/commandlog
SCSI Reservations
Display reservation keys
scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
Determine the device owner
scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2
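As a worked example (assuming d4 is a SCSI-3 quorum device; substitute your own DID device name), checking the registered keys and the current owner might look like:

```shell
#!/bin/sh
# Hypothetical example: inspect SCSI-3 reservations on DID device d4.
# Run on a cluster node as root; d4 is an assumed device name.
DID=/dev/did/rdsk/d4s2

# List the registered reservation keys (one per registered node)
/usr/cluster/lib/sc/scsi -c inkeys -d $DID

# Show which node currently holds the reservation
/usr/cluster/lib/sc/scsi -c inresv -d $DID
```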
Command shortcuts
In version 3.2 there are a number of shortcut command names, which I have detailed below. I have left the full command names in the rest of the document so it is obvious what is being performed. All the commands are located in /usr/cluster/bin.
Full command    Shortcut
cldevice cldev
cldevicegroup cldg
clinterconnect clintr
clnasdevice clnas
clquorum clq
clresource clrs
clresourcegroup clrg
clreslogicalhostname clrslh
clresourcetype clrt
clressharedaddress clrssa
Shutting down and Booting a Cluster
Task    3.1    3.2
Shut down the entire cluster
3.1:
## on all but the last node
scswitch -S -h <host>
shutdown -i5 -g0 -y
## on the last remaining node
scshutdown -g0 -y
3.2: cluster shutdown -g0 -y
Shut down a single node
3.1:
scswitch -S -h <host>
shutdown -i5 -g0 -y
3.2:
clnode evacuate <node>
shutdown -i5 -g0 -y
Reboot a node into non-cluster mode    ok> boot -x    ok> boot -x
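Putting the 3.2 commands together, a clean shutdown looks like this (the node name is hypothetical):

```shell
#!/bin/sh
# Sketch, 3.2 syntax: cleanly stop the whole cluster from any one node.
# 'cluster shutdown' evacuates resource groups and halts every node.
cluster shutdown -g0 -y

# Or, to take just one node (the assumed node 'node2') down:
clnode evacuate node2        # move its resource/device groups elsewhere
shutdown -i5 -g0 -y          # then halt the OS (run on node2 itself)
```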
Cluster information
Task    3.1    3.2
Cluster    scstat -pv    cluster list -v / cluster show / cluster status
Nodes    scstat -n    clnode list -v / clnode show / clnode status
Devices    scstat -D    cldevice list / cldevice show / cldevice status
Quorum    scstat -q    clquorum list -v / clquorum show / clquorum status
Transport info    scstat -W    clinterconnect show / clinterconnect status
Resources    scstat -g    clresource list -v / clresource show / clresource status
Resource Groups    scstat -g / scrgadm -pv    clresourcegroup list -v / clresourcegroup show / clresourcegroup status
Resource Types    n/a    clresourcetype list -v / clresourcetype list-props -v / clresourcetype show
IP Networking Multipathing    scstat -i    clnode status -m
Installation info (prints packages and version)    scinstall -pv    clnode show-rev -v
Cluster Configuration
Task    3.1    3.2
Release cat /etc/cluster/release
Integrity check sccheck cluster check
Configure the cluster (add nodes, add data services, etc)    scinstall    scinstall
Cluster configuration utility (quorum, data services, resource groups, etc)    scsetup    clsetup
Rename    cluster rename -c <cluster_name>
Set a property cluster set -p <name>=<value>
List
## List cluster commands
cluster list-cmds
## Display the name of the cluster
cluster list
## List the checks
cluster list-checks
## Detailed configuration
cluster show -t global
Status cluster status
Reset the cluster private network settings cluster restore-netprops <cluster_name>
Place the cluster into install mode cluster set -p installmode=enabled
Add a node    scconf -a -T node=<host>    clnode add -c <clustername> -n <sponsornode> -e <endpoint1>,<endpoint2>
Remove a node    scconf -r -T node=<host>    clnode remove
Prevent new nodes from entering    scconf -a -T node=.
Put a node into maintenance state
3.1: scconf -c -q node=<node>,maintstate
Note: use the scstat -q command to verify that the node is in maintenance state; the vote count should be zero for that node.
Get a node out of maintenance state
3.1: scconf -c -q node=<node>,reset
Note: use the scstat -q command to verify that the node is out of maintenance state; the vote count should be one for that node.
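The 3.1 maintenance-state steps above can be strung together like this (the node name is an assumption):

```shell
#!/bin/sh
# Sketch, 3.1 syntax: place node2 in maintenance state and verify.
scconf -c -q node=node2,maintstate
scstat -q            # node2's vote count should now be 0

# Later, bring it back and verify the vote count returns to 1
scconf -c -q node=node2,reset
scstat -q
```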
Node Configuration
Task    3.1    3.2
Add a node to the cluster
clnode add [-c <cluster>] [-n <sponsornode>] \
  -e <endpoint> \
  -e <endpoint> \
  <node>
Remove a node from the cluster
## Make sure you are on the node you wish to remove
clnode remove
Evacuate a node from the cluster scswitch -S -h <node> clnode evacuate <node>
Cleanup the cluster configuration (used after removing nodes)
clnode clear <node>
List nodes
## Standard list
clnode list [+|<node>]
## Detailed list
clnode show [+|<node>]
Change a node's property    clnode set -p <name>=<value> [+|<node>]
Status of nodes clnode status [+|<node>]
Admin Quorum Device
Quorum devices are nodes and disk devices, so the total quorum will be all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2) interface to add/remove quorum devices, or use the commands below.
Task    3.1    3.2
Adding a SCSI device to the quorum
3.1: scconf -a -q globaldev=d11
Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
3.2: clquorum add [-t <type>] [-p <name>=<value>] [+|<devicename>]
Adding a NAS device to the quorum    n/a    clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnumber> <nasdevice>
Adding a Quorum Server    n/a    clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber> <devicename>
Removing a device from the quorum    scconf -r -q globaldev=d11    clquorum remove [-t <type>] [+|<devicename>]
Remove the last quorum device
3.1:
## Evacuate all nodes
## Put the cluster into install mode
scconf -c -q installmode
## Remove the quorum device
scconf -r -q globaldev=d11
## Check the quorum devices
scstat -q
3.2:
## Place the cluster in install mode
cluster set -p installmode=enabled
## Remove the quorum device
clquorum remove <device>
## Verify the device has been removed
clquorum list -v
List
## Standard list clquorum list -v [-t <type>] [-n <node>] [+|<devicename>]
## Detailed list clquorum show [-t <type>] [-n <node>]
## Status clquorum status [-t <type>] [-n <node>] [+|<devicename>]
Resetting quorum info
3.1: scconf -c -q reset
3.2: clquorum reset
Note: this will bring all offline quorum devices online
Bring a quorum device into maintenance mode (known as disabled in 3.2)
3.1:
## Obtain the device number
scdidadm -L
scconf -c -q globaldev=<device>,maintstate
3.2: clquorum disable [-t <type>] [+|<devicename>]
Bring a quorum device out of maintenance mode (known as enabled in 3.2)
3.1: scconf -c -q globaldev=<device>,reset
3.2: clquorum enable [-t <type>] [+|<devicename>]
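For example, in 3.2 a disk quorum device can be cycled through maintenance like this (d11 is an assumed device name):

```shell
#!/bin/sh
# Sketch, 3.2 syntax: take quorum device d11 out of service and back.
clquorum disable d11     # into maintenance - its vote count drops to 0
clquorum status d11      # confirm it is offline

clquorum enable d11      # out of maintenance - vote count restored
clquorum status d11
```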
Device Configuration
Task    3.1    3.2
Check device cldevice check [-n <node>] [+]
Remove all devices from node cldevice clear [-n <node>]
Monitoring
## Turn on monitoring cldevice monitor [-n <node>] [+|<device>]
## Turn off monitoring cldevice unmonitor [-n <node>] [+|<device>]
Rename cldevice rename -d <destination_device_name>
Replicate cldevice replicate [-S <source-node>]
Set properties of a device cldevice set -p default_fencing={global|pathcount|scsi3}
Status
## Standard display cldevice status [-s <state>] [-n <node>]
## Display failed disk paths cldevice status -s fail
List all the configured devices, including paths, across all nodes
3.1: scdidadm -L
3.2:
## Standard list
cldevice list [-n <node>] [+|<device>]
## Detailed list
cldevice show [-n <node>] [+|<device>]
List all the configured devices, including paths, on the local node only
3.1: scdidadm -l
3.2: see above
Reconfigure the device database, creating new instance numbers if required
3.1: scdidadm -r
3.2:
cldevice populate
cldevice refresh [-n <node>] [+]
Perform the repair procedure for a particular path (use this when a disk gets replaced)
3.1:
scdidadm -R <c0t0d0s0>   (by device)
scdidadm -R 2   (by device id)
3.2: cldevice repair [-n <node>] [+|<device>]
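A typical disk-replacement sequence combining these commands might look like this (the DID instance numbers are assumptions):

```shell
#!/bin/sh
# Sketch: after physically replacing a failed disk, refresh the DID
# database and repair the path (3.1 and 3.2 variants shown).
scdidadm -L                  # find the DID instance of the replaced disk

# 3.1: re-register the new disk under its old DID instance (id 2 assumed)
scdidadm -R 2

# 3.2 equivalent
cldevice populate            # create instances for any new devices
cldevice repair d2           # repair the path for the replaced device
```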
Disk groups
Task    3.1    3.2
Create a device group    n/a    cldevicegroup create <devgrp>
Remove a device group    n/a    cldevicegroup delete <devgrp>
Adding    scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true    cldevicegroup create -t vxvm -n <host>,<host> -p preferenced=true <devgrp>
Removing    scconf -r -D name=<disk group>    cldevicegroup delete <devgrp>
Set a property    n/a    cldevicegroup set -p <name>=<value> <devgrp>
List
3.1: scstat -D
3.2:
## Standard list
cldevicegroup list [+|<devgrp>]
## Detailed configuration report
cldevicegroup show [+|<devgrp>]
Status    scstat -D    cldevicegroup status [+|<devgrp>]
Adding a single node    scconf -a -D type=vxvm,name=appdg,nodelist=<host>    cldevicegroup add-node -n <node> <devgrp>
Removing a single node    scconf -r -D name=<disk group>,nodelist=<host>    cldevicegroup remove-node -n <node> <devgrp>
Switch    scswitch -z -D <disk group> -h <host>    cldevicegroup switch -n <node> <devgrp>
Put into maintenance mode    scswitch -m -D <disk group>    n/a
Take out of maintenance mode    scswitch -z -D <disk group> -h <host>    n/a
Onlining a disk group    scswitch -z -D <disk group> -h <host>    cldevicegroup online <devgrp>
Offlining a disk group    scswitch -F -D <disk group>    cldevicegroup offline <devgrp>
Resync a disk group    scconf -c -D name=appdg,sync    cldevicegroup sync <devgrp>
Transport Cable
Task    3.1    3.2
Add clinterconnect add <endpoint>,<endpoint>
Remove clinterconnect remove <endpoint>,<endpoint>
Enable    scconf -c -m endpoint=<host>:qfe1,state=enabled    clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>]
Disable    scconf -c -m endpoint=<host>:qfe1,state=disabled (Note: it gets deleted)    clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>]
List    scstat -W
## Standard and detailed list
clinterconnect show [-n <node>] [+|<endpoint>,<endpoint>]
Status    scstat -W    clinterconnect status [-n <node>] [+|<endpoint>,<endpoint>]
Resource Groups
Task    3.1    3.2
Adding (failover) scrgadm -a -g <res_group> -h <host>,<host> clresourcegroup create <res_group>
Adding (scalable) clresourcegroup create -S <res_group>
Adding a node to a resource group clresourcegroup add-node -n <node> <res_group>
Removing
3.1: scrgadm -r -g <group>
3.2:
## Remove a resource group
clresourcegroup delete <res_group>
## Remove a resource group and all its resources
clresourcegroup delete -F <res_group>
Removing a node from a resource group    clresourcegroup remove-node -n <node> <res_group>
Changing properties    scrgadm -c -g <resource group> -y <property>=<value>    clresourcegroup set -p <name>=<value> [+|<res_group>]
Status    scstat -g    clresourcegroup status [-n <node>] [+|<res_group>]
Listing    scstat -g    clresourcegroup list [-n <node>] [+|<res_group>]
Detailed List    scrgadm -pv -g <res_group>    clresourcegroup show [-n <node>] [+|<res_group>]
Display mode type (failover or scalable) scrgadm -pv -g <res_group> | grep 'Res Group mode'
Offlining
3.1: scswitch -F -g <res_group>
3.2:
## All resource groups
clresourcegroup offline +
## Individual group
clresourcegroup offline [-n <node>] <res_group>
Onlining scswitch -Z -g <res_group>
## All resource groups clresourcegroup online +
## Individual groups clresourcegroup online [-n <node>] <res_group>
Evacuate all resource groups from a node (used when shutting down a node)
clresourcegroup evacuate [+|-n <node>]
Unmanaging
3.1: scswitch -u -g <res_group>
3.2: clresourcegroup unmanage <res_group>
Note: all resources in the group must be disabled first
Managing    scswitch -o -g <res_group>    clresourcegroup manage <res_group>
Switching    scswitch -z -g <res_group> -h <host>    clresourcegroup switch -n <node> <res_group>
Suspend n/a clresourcegroup suspend [+|<res_group>]
Resume n/a clresourcegroup resume [+|<res_group>]
Remaster (move the resource group/s to their preferred node)
n/a clresourcegroup remaster [+|<res_group>]
Restart a resource group (bring offline then online)
n/a clresourcegroup restart [-n <node>]
Resources
Task    3.1    3.2
Adding a failover network resource    scrgadm -a -L -g <res_group> -l <logicalhost>    clreslogicalhostname create -g <res_group> <resource>
Adding a shared network resource    scrgadm -a -S -g <res_group> -l <logicalhost>    clressharedaddress create -g <res_group> <resource>
Adding a failover apache application and attaching the network resource
scrgadm -a -j apache_res -g <res_group> \
  -t SUNW.apache -y Network_resources_used=<logicalhost> \
  -y Scalable=False -y Port_list=80/tcp \
  -x Bin_dir=/usr/apache/bin
Adding a shared (scalable) apache application and attaching the network resource
scrgadm -a -j apache_res -g <res_group> \
  -t SUNW.apache -y Network_resources_used=<logicalhost> \
  -y Scalable=True -y Port_list=80/tcp \
  -x Bin_dir=/usr/apache/bin
Create a HAStoragePlus failover resource
3.1:
scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
  -x FileSystemMountPoints=/oracle/data01 \
  -x AffinityOn=true
3.2:
clresource create -g <res_group> -t SUNW.HAStoragePlus \
  -p FileSystemMountPoints=<mount-point> -p AffinityOn=true <rs-hasp>
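Putting the 3.2 resource commands together, a minimal failover group with a logical hostname and an HAStoragePlus resource might be built like this (all names and the mount point are hypothetical, and the logical host must resolve in /etc/hosts):

```shell
#!/bin/sh
# Sketch, 3.2 syntax: build a failover resource group for Oracle.
clresourcetype register SUNW.HAStoragePlus   # once per cluster
clresourcegroup create rg_oracle

# Logical hostname resource ('oralh' is an assumed hostname)
clreslogicalhostname create -g rg_oracle oralh

# HAStoragePlus resource managing the data filesystem
clresource create -g rg_oracle -t SUNW.HAStoragePlus \
    -p FileSystemMountPoints=/oracle/data01 \
    -p AffinityOn=true hasp_data01

# Manage and bring the whole group online on its primary node
clresourcegroup online -M rg_oracle
```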
Removing
3.1: scrgadm -r -j res-ip
Note: you must disable the resource first
3.2: clresource delete [-g <res_group>] [+|<resource>]
Changing or adding properties
3.1: scrgadm -c -j <resource> -y <property>=<value>
3.2:
## Changing
clresource set -p <name>=<value> <resource>
## Adding a value to a list property
clresource set -p <name>+=<value> <resource>
List
3.1: scstat -g
3.2:
clresource list [-g <res_group>] [+|<resource>]
## List properties
clresource list-props [-g <res_group>] [+|<resource>]
Detailed List
3.1:
scrgadm -pv -j res-ip
scrgadm -pvv -j res-ip
3.2: clresource show [-n <node>] [+|<resource>]
Status    scstat -g    clresource status [-s <state>] [+|<resource>]
Disable resource monitor    scrgadm -n -M -j res-ip    clresource unmonitor [-n <node>] [+|<resource>]
Enable resource monitor    scrgadm -e -M -j res-ip    clresource monitor [-n <node>] [+|<resource>]
Disabling    scswitch -n -j res-ip    clresource disable <resource>
Enabling    scswitch -e -j res-ip    clresource enable <resource>
Clearing a failed resource    scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED    clresource clear -f STOP_FAILED <resource>
Find the network of a resource    scrgadm -pvv -j <resource> | grep -i network
Removing a resource and resource group
3.1:
## offline the group
scswitch -F -g rgroup-1
## remove the resource
scrgadm -r -j res-ip
## remove the resource group
scrgadm -r -g rgroup-1
3.2:
## offline the group
clresourcegroup offline <res_group>
## remove the resource
clresource delete <resource>
## remove the resource group
clresourcegroup delete <res_group>
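In 3.2, the teardown sequence with concrete (assumed) names looks like this; remember that a resource must be disabled before it can be deleted:

```shell
#!/bin/sh
# Sketch, 3.2 syntax: remove a resource and its resource group.
clresourcegroup offline rg_oracle    # stop the group everywhere
clresource disable hasp_data01       # resources must be disabled first
clresource delete hasp_data01
clresourcegroup delete rg_oracle     # or delete -F to remove group
                                     # and all its resources in one go
```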
Resource Types
Task    3.1    3.2
Adding (register in 3.2)    scrgadm -a -t <resource type>, e.g. SUNW.HAStoragePlus    clresourcetype register <type>
Register a resource type to a node    n/a    clresourcetype add-node -n <node> <type>
Deleting (unregister in 3.2)    scrgadm -r -t <resource type>    clresourcetype unregister <type>
Deregistering a resource type from a node    n/a    clresourcetype remove-node -n <node> <type>
Listing    scrgadm -pv | grep 'Res Type name'    clresourcetype list [<type>]
Listing resource type properties clresourcetype list-props
Show resource types clresourcetype show [<type>]
Set properties of a resource type clresourcetype set [-p <name>=<value>] <type>