Sun Cluster Commands


This document collects the commands used to manage a Sun Cluster installation.


New command line cheat sheet

File locations

man pages
    3.0/3.1: /usr/cluster/man
    3.2:     /usr/cluster/man

log files
    3.0/3.1: /var/cluster/logs, /var/adm/messages
    3.2:     /var/cluster/logs, /var/adm/messages

sccheck logs
    3.0/3.1: /var/cluster/sccheck/report.
    3.2:     /var/cluster/sccheck/report.

cluster check logs
    3.0/3.1: N/A
    3.2:     /var/cluster/logs/cluster_check/ (U2)

CCR files
    3.0/3.1: /etc/cluster/ccr/
    3.2:     /etc/cluster/ccr/

Cluster infrastructure file
    3.0/3.1: /etc/cluster/ccr/infrastructure
    3.2:     /etc/cluster/ccr/global/infrastructure (U2)

SCSI Reservations (same commands in 3.0/3.1 and 3.2)

Display reservation keys
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2

Cluster information

Quorum info
    3.0/3.1: scstat -q
    3.2:     clquorum show

Cluster components
    3.0/3.1: scstat -pv
    3.2:     cluster show

Resource/Resource group status
    3.0/3.1: scstat -g
    3.2:     clrg show, clrs show

IP Networking Multipathing
    3.0/3.1: scstat -i
    3.2:     clnode status -m

Status of all nodes
    3.0/3.1: scstat -n
    3.2:     clnode show

Disk device groups
    3.0/3.1: scstat -D
    3.2:     cldg show

Transport info
    3.0/3.1: scstat -W
    3.2:     clintr show

Detailed resource/resource group
    3.0/3.1: scrgadm -pv
    3.2:     clrs show -v, clrg show -v

Cluster configuration info
    3.0/3.1: scconf -p
    3.2:     cluster show -v

Installation info (prints packages and version)
    3.0/3.1: scinstall -pv
    3.2:     scinstall -pv
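
As a quick health check the status commands from this table can simply be run back to back from any node; this is a minimal sketch and needs no arguments for cluster-wide output:

## 3.0/3.1
# scstat -q
# scstat -g
# scstat -n
## 3.2 equivalents
# clquorum status
# clrg status
# clnode status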

Cluster Configuration

Integrity check
    3.0/3.1: sccheck
    3.2:     cluster check (U2)

Configure the cluster (add nodes, add data services, etc.)
    3.0/3.1: scinstall
    3.2:     scinstall

Cluster configuration utility (quorum, data services, resource groups, etc.)
    3.0/3.1: scsetup
    3.2:     clsetup

Add a node
    3.0/3.1: scconf -a -T node=<host>
    3.2:     clnode add

Remove a node
    3.0/3.1: scconf -r -T node=<host>
    3.2:     clnode remove

Prevent new nodes from entering
    3.0/3.1: scconf -a -T node=.

Put a node into maintenance state
    3.0/3.1: scconf -c -q node=<node>,maintstate
             Note: use the scstat -q command to verify that the node is in maintenance mode; the vote count should be zero for that node.
    3.2:     clnode evacuate <node>
             Note: use the clquorum status command to verify that the node is in maintenance mode; the vote count should be zero for that node.

Get a node out of maintenance state
    3.0/3.1: scconf -c -q node=<node>,reset
             Note: use the scstat -q command to verify that the node is out of maintenance mode; the vote count should be one for that node.
    3.2:     clquorum reset
             Note: use the clquorum status command to verify that the node is out of maintenance mode; the vote count should be one for that node.
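
A worked example of the maintenance-state sequence, assuming a hypothetical node named node2; the commands are exactly the ones from the table above:

## 3.0/3.1: drop node2's quorum vote, verify, then restore it and verify again
# scconf -c -q node=node2,maintstate
# scstat -q
# scconf -c -q node=node2,reset
# scstat -q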

Admin Quorum Device

Quorum devices can be nodes, disk devices, or quorum servers, so the total quorum vote count is all nodes and quorum devices added together.

Adding a device to the quorum
    3.0/3.1: scconf -a -q globaldev=d11
             Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
    3.2:     clquorum add <deviceid>
             Note: if you get the error message "unable to scrub device", use cldevice to add the device to the global device namespace.

Removing a device from the quorum
    3.0/3.1: scconf -r -q globaldev=d11
    3.2:     clquorum remove <deviceid>

Remove the last quorum device
    3.0/3.1: ## Evacuate all nodes
             ## Put the cluster into install mode
             scconf -c -q installmode
             ## Remove the quorum device
             scconf -r -q globaldev=d11
             ## Check the quorum devices
             scstat -q
    3.2:     cluster set -p installmode=enabled
             clquorum remove <deviceid>
             ## Check the quorum devices
             clquorum show

Resetting quorum info
    3.0/3.1: scconf -c -q reset
    3.2:     clquorum reset
    Note: this will bring all offline quorum devices online.

Bring a quorum device into maintenance mode
    3.0/3.1: ## Obtain the device number
             scdidadm -L
             scconf -c -q globaldev=<device>,maintstate
    3.2:     clquorum disable <deviceid>

Bring a quorum device out of maintenance mode
    3.0/3.1: scconf -c -q globaldev=<device>,reset
    3.2:     clquorum enable <deviceid>
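
For instance, using the DID device d11 that this table already uses as its example, adding it as a quorum device and confirming the vote count looks like this:

## 3.0/3.1
# scconf -a -q globaldev=d11
# scstat -q
## 3.2
# clquorum add d11
# clquorum status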

Device Configuration

Lists all the configured devices including paths across all nodes
    3.0/3.1: scdidadm -L
    3.2:     cldevice list -v

List all the configured devices including paths on one node only
    3.0/3.1: scdidadm -l
    3.2:     cldevice list -v -n <node>

Reconfigure the device database, creating new instance numbers if required
    3.0/3.1: scdidadm -r
    3.2:     scdidadm -r

Lists all the configured devices including paths & fencing
    3.0/3.1: N/A
    3.2:     cldevice show -v

Rename a did instance
    3.0/3.1: N/A
    3.2:     cldevice rename -d <destination-device> <device>

Clearing no longer used did instances
    3.0/3.1: scdidadm -C
    3.2:     cldevice clear

Perform the repair procedure for a particular path (use this when a disk gets replaced)
    3.0/3.1: scdidadm -R <device>  or  scdidadm -R 2 (by device id)
    3.2:     cldevice repair <device>

Configure the global device namespace
    3.0/3.1: scgdevs
    3.2:     cldevice populate

Status of all disk paths
    3.0/3.1: scdpm -p all:all    (format is <node>:<disk>)
    3.2:     cldevice status

Monitor device path
    3.0/3.1: scdpm -m <node>:<disk>
    3.2:     cldevice monitor -n <node> <device>

Unmonitor device path
    3.0/3.1: scdpm -u <node>:<disk>
    3.2:     cldevice unmonitor -n <node> <device>
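
After physically replacing a disk, the repair procedure above is re-run against the affected DID instance; the sketch below assumes the replaced disk is DID instance 4 (d4), a hypothetical value:

## re-run the repair procedure for DID instance 4 (3.0/3.1, then the 3.2 equivalent)
# scdidadm -R 4
# cldevice repair d4
## if a brand new device was added instead, rebuild the global device namespace
# scgdevs
# cldevice populate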

Device group

Adding/Registering
    3.0/3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
    3.2:     cldg create -t <type> -n <nodelist> -d <device> <devgrp>

Removing
    3.0/3.1: scconf -r -D name=<devgrp>
    3.2:     cldg remove-node [-t <type>] -n <node> <devgrp>
             cldg remove-device -d <device> <devgrp>

Adding a single node
    3.0/3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>
    3.2:     cldg add-node -t <type> -n <node> <devgrp>

Removing a single node
    3.0/3.1: scconf -r -D name=<devgrp>,nodelist=<host>
    3.2:     cldg remove-node -t <type> -n <node> <devgrp>

Switch
    3.0/3.1: scswitch -z -D <devgrp> -h <host>
    3.2:     cldg switch -n <node> <devgrp>

Put into maintenance mode
    3.0/3.1: scswitch -m -D <devgrp>
    3.2:     cldg disable [-t <type>] <devgrp>

Take out of maintenance mode
    3.0/3.1: scswitch -z -D <devgrp> -h <host>
    3.2:     cldg enable [-t <type>] <devgrp>

Onlining a device group
    3.0/3.1: scswitch -z -D <devgrp> -h <host>
    3.2:     cldg online <devgrp>

Offlining a device group
    3.0/3.1: scswitch -F -D <devgrp>
    3.2:     cldg offline [-t <type>] <devgrp>

Resync a device group
    3.0/3.1: scconf -c -D name=appdg,sync
    3.2:     cldg sync [-t <type>] <devgrp>
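
Putting the table together, a minimal sketch that registers the hypothetical VxVM disk group appdg on nodes node1/node2 and then moves and resyncs it (all names are assumptions):

## 3.0/3.1
# scconf -a -D type=vxvm,name=appdg,nodelist=node1:node2,preferenced=true
# scswitch -z -D appdg -h node1
## 3.2: switch it to node2, then resync after adding volumes
# cldg switch -n node2 appdg
# cldg sync appdg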

Transport cable

Enable
    3.0/3.1: scconf -c -m endpoint=<host>:qfe1,state=enabled
    3.2:     clintr enable <host>:<interface>

Disable
    3.0/3.1: scconf -c -m endpoint=<host>:qfe1,state=disabled
             Note: it gets deleted
    3.2:     clintr disable <host>:<interface>
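
For example, taking the qfe1 endpoint on a hypothetical node node1 out of service and back again, checking the transport status afterwards:

## 3.0/3.1
# scconf -c -m endpoint=node1:qfe1,state=disabled
# scconf -c -m endpoint=node1:qfe1,state=enabled
# scstat -W
## 3.2
# clintr disable node1:qfe1
# clintr enable node1:qfe1
# clintr status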

Resource Groups

Adding
    3.0/3.1: scrgadm -a -g <rg> -h <node>,<node>
    3.2:     clrg create -n <node>,<node> <rg>

Removing
    3.0/3.1: scrgadm -r -g <rg>
    3.2:     clrg delete <rg>

Changing properties
    3.0/3.1: scrgadm -c -g <rg> -y <property>=<value>
    3.2:     clrg set -p <property>=<value> <rg>

Listing
    3.0/3.1: scstat -g
    3.2:     clrg show

Detailed list
    3.0/3.1: scrgadm -pv -g <rg>
    3.2:     clrg show -v <rg>

Display mode type (failover or scalable)
    3.0/3.1: scrgadm -pv -g <rg> | grep 'Res Group mode'
    3.2:     clrg show -v <rg>

Offlining
    3.0/3.1: scswitch -F -g <rg>
    3.2:     clrg offline <rg>

Onlining
    3.0/3.1: scswitch -Z -g <rg>
    3.2:     clrg online <rg>

Unmanaging
    3.0/3.1: scswitch -u -g <rg>
    3.2:     clrg unmanage <rg>
    Note: all resources in the group must be disabled first.

Managing
    3.0/3.1: scswitch -o -g <rg>
    3.2:     clrg manage <rg>

Switching
    3.0/3.1: scswitch -z -g <rg> -h <node>
    3.2:     clrg switch -n <node> <rg>
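
Stringing the resource group commands together, a minimal sketch that creates a hypothetical failover group oracle-rg hosted on node1/node2, brings it online and then switches it over (all names are assumptions):

## 3.0/3.1
# scrgadm -a -g oracle-rg -h node1,node2
# scswitch -Z -g oracle-rg
# scswitch -z -g oracle-rg -h node2
## 3.2
# clrg create -n node1,node2 oracle-rg
# clrg manage oracle-rg
# clrg online oracle-rg
# clrg switch -n node2 oracle-rg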

Resources

Adding a failover network resource
    3.0/3.1: scrgadm -a -L -g <rg> -l <logicalhost>
    3.2:     clreslogicalhostname create -g <rg> <resource>

Adding a shared network resource
    3.0/3.1: scrgadm -a -S -g <rg> -l <sharedaddress>
    3.2:     clressharedaddress create -g <rg> <resource>

Adding a failover apache application and attaching the network resource
    3.0/3.1: scrgadm -a -j apache_res -g <rg> \
             -t SUNW.apache -y Network_resources_used=<logicalhost> -y Scalable=False -y Port_list=80/tcp \
             -x Bin_dir=/usr/apache/bin

Adding a shared apache application and attaching the network resource
    3.0/3.1: scrgadm -a -j apache_res -g <rg> \
             -t SUNW.apache -y Network_resources_used=<sharedaddress> -y Scalable=True -y Port_list=80/tcp \
             -x Bin_dir=/usr/apache/bin

Create a HAStoragePlus failover resource
    3.0/3.1: scrgadm -a -g <rg> -j <resource> -t SUNW.HAStoragePlus \
             -x FileSystemMountPoints=/oracle/data01 -x Affinityon=true
    3.2:     clresource create -g <rg> -t SUNW.HAStoragePlus \
             -p FileSystemMountPoints=/test2 -p Affinityon=true <resource>

Removing
    3.0/3.1: scrgadm -r -j <resource>
    3.2:     clresource delete <resource>
    Note: must disable the resource first.

Changing properties
    3.0/3.1: scrgadm -c -j <resource> -y <property>=<value>
    3.2:     clresource set -p <property>=<value> <resource>

List
    3.0/3.1: scstat -g
    3.2:     clresource list

Detailed list
    3.0/3.1: scrgadm -pv -j <resource>, scrgadm -pvv -j <resource>
    3.2:     clresource list -v

Disable resource monitor
    3.0/3.1: scrgadm -n -M -j <resource>
    3.2:     clresource unmonitor <resource>

Enable resource monitor
    3.0/3.1: scrgadm -e -M -j <resource>
    3.2:     clresource monitor <resource>

Disabling
    3.0/3.1: scswitch -n -j <resource>
    3.2:     clresource disable <resource>

Enabling
    3.0/3.1: scswitch -e -j <resource>
    3.2:     clresource enable <resource>

Clearing a failed resource
    3.0/3.1: scswitch -c -h <node>,<node> -j <resource> -f STOP_FAILED
    3.2:     clrs clear -f STOP_FAILED <resource>

Find the network of a resource
    3.0/3.1: scrgadm -pvv -j <resource> | grep -i network
    3.2:     scrgadm -pvv -j <resource> | grep -i network

Removing a resource and resource group
    3.0/3.1: ## offline the group
             scswitch -F -g <rg>
             ## remove the resource
             scrgadm -r -j <resource>
             ## remove the resource group
             scrgadm -r -g <rg>
    3.2:     clrg offline <rg>
             clrs delete <resource>
             clrg delete <rg>
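
As a worked 3.2 example that strings several of these rows together, the sketch below builds a small failover group with a logical hostname and an HAStoragePlus resource. All names (oracle-rg, ora-lh, ora-hasp) and the mount point are hypothetical, and ora-lh is assumed to be a resolvable hostname in /etc/hosts:

# clrt register SUNW.HAStoragePlus
# clrg create -n node1,node2 oracle-rg
# clreslogicalhostname create -g oracle-rg ora-lh
# clresource create -g oracle-rg -t SUNW.HAStoragePlus -p FileSystemMountPoints=/oracle/data01 ora-hasp
# clrg manage oracle-rg
# clrg online oracle-rg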

Resource Types

Adding
    3.0/3.1: scrgadm -a -t <resource type>   (e.g. SUNW.HAStoragePlus)
    3.2:     clrt register <resource type>

Deleting
    3.0/3.1: scrgadm -r -t <resource type>
    3.2:     clrt unregister <resource type>
    Note: set the RT_SYSTEM property on the resource type to false first.

Listing
    3.0/3.1: scrgadm -pv | grep 'Res Type name'
    3.2:     clrt list
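
For example, registering the HAStoragePlus type used earlier in this sheet and confirming it is known to the cluster:

## 3.0/3.1
# scrgadm -a -t SUNW.HAStoragePlus
## 3.2
# clrt register SUNW.HAStoragePlus
# clrt list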

Sun Cluster 3.1 Cheat Sheet:

Sun Cluster 3.1 cheat sheet

Daemons

clexecd: This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.

cl_ccrad: This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.

cl_eventd: The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.

cl_eventlogd: The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped.

failfastd: This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.

rgmd: The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.fed: This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.pmfd: This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.

pnmd: The public management network service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.

scdpmd: The disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. It is automatically restarted if it is stopped.

File locations

man pages: /usr/cluster/man

log files: /var/cluster/logs, /var/adm/messages

sccheck logs: /var/cluster/sccheck/report.

CCR files: /etc/cluster/ccr

Cluster infrastructure file: /etc/cluster/ccr/infrastructure

SCSI Reservations

Display reservation keys
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2

Cluster information

Quorum info: scstat -q

Cluster components: scstat -pv

Resource/Resource group status: scstat -g

IP Networking Multipathing: scstat -i

Status of all nodes: scstat -n

Disk device groups: scstat -D

Transport info: scstat -W

Detailed resource/resource group: scrgadm -pv

Cluster configuration info: scconf -p

Installation info (prints packages and version): scinstall -pv

Cluster Configuration

Integrity check: sccheck

Configure the cluster (add nodes, add data services, etc.): scinstall

Cluster configuration utility (quorum, data services, resource groups, etc.): scsetup

Add a node: scconf -a -T node=<host>

Remove a node: scconf -r -T node=<host>

Prevent new nodes from entering: scconf -a -T node=.

Put a node into maintenance state: scconf -c -q node=<node>,maintstate
Note: use the scstat -q command to verify that the node is in maintenance mode; the vote count should be zero for that node.

Get a node out of maintenance state: scconf -c -q node=<node>,reset
Note: use the scstat -q command to verify that the node is out of maintenance mode; the vote count should be one for that node.

Admin Quorum Device

Quorum devices are nodes and disk devices, so the total quorum will be all nodes and devices added together. You can use the scsetup interface to add/remove quorum devices or use the commands below.

Adding a device to the quorum: scconf -a -q globaldev=d11
Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.

Removing a device from the quorum: scconf -r -q globaldev=d11

Remove the last quorum device:
    ## Evacuate all nodes
    ## Put the cluster into install mode
    # scconf -c -q installmode
    ## Remove the quorum device
    # scconf -r -q globaldev=d11
    ## Check the quorum devices
    # scstat -q

Resetting quorum info: scconf -c -q reset
Note: this will bring all offline quorum devices online.

Bring a quorum device into maintenance mode:
    ## Obtain the device number
    # scdidadm -L
    # scconf -c -q globaldev=<device>,maintstate

Bring a quorum device out of maintenance mode: scconf -c -q globaldev=<device>,reset

Device Configuration

Lists all the configured devices including paths across all nodes: scdidadm -L

List all the configured devices including paths on one node only: scdidadm -l

Reconfigure the device database, creating new instance numbers if required: scdidadm -r

Perform the repair procedure for a particular path (use this when a disk gets replaced): scdidadm -R <device>  or  scdidadm -R 2 (by device id)

Configure the global device namespace: scgdevs

Status of all disk paths: scdpm -p all:all    (format is <node>:<disk>)

Monitor device path: scdpm -m <node>:<disk>

Unmonitor device path: scdpm -u <node>:<disk>

Disk groups

Adding/Registering: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true

Removing: scconf -r -D name=<devgrp>

Adding a single node: scconf -a -D type=vxvm,name=appdg,nodelist=<host>

Removing a single node: scconf -r -D name=<devgrp>,nodelist=<host>

Switch: scswitch -z -D <devgrp> -h <host>

Put into maintenance mode: scswitch -m -D <devgrp>

Take out of maintenance mode: scswitch -z -D <devgrp> -h <host>

Onlining a disk group: scswitch -z -D <devgrp> -h <host>

Offlining a disk group: scswitch -F -D <devgrp>

Resync a disk group: scconf -c -D name=appdg,sync

Transport cable

Enable: scconf -c -m endpoint=<host>:qfe1,state=enabled

Disable: scconf -c -m endpoint=<host>:qfe1,state=disabled
Note: it gets deleted

Resource Groups

Adding: scrgadm -a -g <rg> -h <node>,<node>

Removing: scrgadm -r -g <rg>

Changing properties: scrgadm -c -g <rg> -y <property>=<value>

Listing: scstat -g

Detailed list: scrgadm -pv -g <rg>

Display mode type (failover or scalable): scrgadm -pv -g <rg> | grep 'Res Group mode'

Offlining: scswitch -F -g <rg>

Onlining: scswitch -Z -g <rg>

Unmanaging: scswitch -u -g <rg>
Note: all resources in the group must be disabled first.

Managing: scswitch -o -g <rg>

Switching: scswitch -z -g <rg> -h <node>

Resources

Adding a failover network resource: scrgadm -a -L -g <rg> -l <logicalhost>

Adding a shared network resource: scrgadm -a -S -g <rg> -l <sharedaddress>

Adding a failover apache application and attaching the network resource:
    scrgadm -a -j apache_res -g <rg> \
    -t SUNW.apache -y Network_resources_used=<logicalhost> -y Scalable=False -y Port_list=80/tcp \
    -x Bin_dir=/usr/apache/bin

Adding a shared apache application and attaching the network resource:
    scrgadm -a -j apache_res -g <rg> \
    -t SUNW.apache -y Network_resources_used=<sharedaddress> -y Scalable=True -y Port_list=80/tcp \
    -x Bin_dir=/usr/apache/bin

Create a HAStoragePlus failover resource:
    scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
    -x FileSystemMountPoints=/oracle/data01 \
    -x Affinityon=true

Removing: scrgadm -r -j res-ip
Note: must disable the resource first.

Changing properties: scrgadm -c -j <resource> -y <property>=<value>

List: scstat -g

Detailed list: scrgadm -pv -j res-ip, scrgadm -pvv -j res-ip

Disable resource monitor: scrgadm -n -M -j res-ip

Enable resource monitor: scrgadm -e -M -j res-ip

Disabling: scswitch -n -j res-ip

Enabling: scswitch -e -j res-ip

Clearing a failed resource: scswitch -c -h <node>,<node> -j <resource> -f STOP_FAILED

Find the network of a resource: # scrgadm -pvv -j <resource> | grep -i network

Removing a resource and resource group:
    ## offline the group
    # scswitch -F -g rgroup-1
    ## remove the resource
    # scrgadm -r -j res-ip
    ## remove the resource group
    # scrgadm -r -g rgroup-1

Resource Types

Adding: scrgadm -a -t <resource type>   (e.g. SUNW.HAStoragePlus)

Deleting: scrgadm -r -t <resource type>

Listing: scrgadm -pv | grep 'Res Type name'

Sun Cluster 3.1 Command Line Cheat Sheet

This page describes basic Sun Cluster 3.1 administration commands, including the following command line interface commands: scrgadm, scswitch, scconf, scstat, and scdpm.

scrgadm ----- Manage registration and unregistration of resource types, resource groups, and resources -----

Action:  -a (add)   -c (change)   -r (remove)   -p[vv] (print; v = state of resources, vv = parameter values)
Object:  -g (group)   -j (resource)   -t (resource type)   -L (for logical hostnames)   -S (for shared addresses)
Who:     -h (physical host)   -l (logical host)   -n netiflist   -X auxnodelist
What:    -x (extension param=value)   -y (param=value)   -f registration-file path

Examples

Register a resource type named MyDatabase:

# scrgadm -a -t MyDatabase

Create a failover resource group named MyDatabaseRG:

# scrgadm -a -g MyDatabaseRG

Create a scalable resource group named MyWebServerRG:

# scrgadm -a -g MyWebServerRG \
  -y Maximum_primaries=integer \
  -y Desired_primaries=integer

Create a resource of a given type in a resource group:

# scrgadm -a -j resource-name -t resource-type-name -g RG-name

See the rg_properties(5) man page for a description of the resource group properties. See the r_properties(5) man page for a description of the resource properties.

scconf ----- Update the Sun Cluster software configuration -----

Actions: -a (add)   -c (change)   -r (remove)   -p[vv] (print)

Objects and their arguments:

-C (cluster option)
    cluster=clustername

-A (adapter)
    add: trtype=type,name=name,node=node[,other-options]
    change: name=name,node=node[,state=state][,other-options]
    remove: name=name,node=node

-B (transport junction)
    add: type=type,name=name[,other-options]
    change: name=name[,state=state][,other-options]
    remove: name=name

-m (cable endpoint)
    add: endpoint=[node:]name[@port],endpoint=[node:]name[@port][,noenable]
    change: endpoint=[node:]name[@port],state=state
    remove: endpoint=[node:]name[@port]

-P (private net name)
    add: node=node[,privatehostname=hostalias]

-q (quorum)
    add: globaldev=devicename[,node=node,node=node[,...]]
    change: node=node,{maintstate | reset}
    change: globaldev=devicename,{maintstate | reset}
    change: reset
    change: installmode
    remove: globaldev=devicename

-D (device group)
    add: type=type,name=name[,nodelist=node[:node]...][,preferenced={true | false}][,failback={enabled | disabled}][,other-options]
    change: name=name[,nodelist=node[:node]...][,preferenced={true | false}][,failback={enabled | disabled}][,other-options]
    remove: name=name[,nodelist=node[:node]...]

-T (authentication)
    add: node=nodename[,...][,authtype=authtype]
    change: authtype=authtype
    remove: {node=nodename[,...] | all}

-h (nodes)
    add: node=nodename
    remove: node=nodename or nodeID

Examples

Register a new disk group:

# scconf -a -D type=vxvm,name=new-disk-group,nodelist=nodex:nodex

Synchronize device group information after adding a volume:

# scconf -c -D name=diskgroup,sync

Add a shared quorum device to the cluster:

# scconf -a -q globaldev=nodename

Clear "installmode":

# scconf -c -q reset

Configure a second set of cluster transport connections:

# scconf -a \
  -A trtype=transport,name=ifname1,node=nodename1 \
  -A trtype=transport,name=ifname2,node=nodename2 \
  -m endpoint=nodename1:ifname1,endpoint=nodename2:ifname2

Secure the cluster against other machines that might attempt to add themselves to the cluster:

# scconf -a -T node=.

scswitch ----- Perform ownership/state change of resource groups and disk device groups in Sun Cluster configurations -----

-z (bring online): -g (resource group) or -D (device group), -h (target host); -h "" (no receiver) takes the resource group offline

-Z (bring everything online): -g (resource group); with no -g, brings all resource groups online

-F (take offline on all nodes): -g (resource group) or -D (device group)

-S (switch all DGs and RGs): -h (losing host); -K <seconds> specifies the number of seconds to keep resource groups from switching back onto a node after that node has been successfully evacuated. Default is 60 seconds, and it can be set up to 65535 seconds. Starting with Sun Cluster 3.1 Update 3.

-R (restart all RGs): -h (target host)

-m (set maintenance mode): -D (device group)

-u (unmanage RG): -g (resource group); -M disables monitoring only

-o (online RG): -g (resource group); -M disables monitoring only

-e (enable resource): -j (resource)

-n (disable resource): -j (resource)

-c (clear flag STOP_FAILED): -j (resource), -f (flag name), -h (target host)

-Q (quiesce resource group, starting in Sun Cluster 3.1 Update 3): -g (resource group)

Examples

Switch over resource-grp-2 to be mastered by node1:

# scswitch -z -h node1 -g resource-grp-2

Switch over resource-grp-3, a resource group configured to have multiple primaries, to be mastered by node1, node2, node3:

# scswitch -z -h node1,node2,node3 -g resource-grp-3

Switch all managed resource groups online on their most preferred node or nodes:

# scswitch -z

Quiesce resource-grp-2. Stops RG from continuously bouncing around from one node to another in the event of the failure of a START or STOP method:

# scswitch -Q -g resource-group-2

Switch over all resource groups and disk device groups from node1 to a new set of primaries:

# scswitch -S -h node1

Restart some resource groups on specified nodes:

node1# scswitch -R -h node1,node2 -g resource-grp-1,resource-grp-2

Disable some resources:

# scswitch -n -j resource-1,resource-2

Enable a resource:

# scswitch -e -j resource-1

Take resource groups to the unmanaged state:

# scswitch -u -g resource-grp-1,resource-grp-2

Take resource groups to the online state:

# scswitch -o -g resource-grp-1,resource-grp-2

Switch over device-group-1 to be mastered by node2:

# scswitch -z -h node2 -D device-group-1

Put device-group-1 into maintenance mode:

# scswitch -m -D device-group-1

Move all resource groups and disk device groups persistently off of a node:

# scswitch -S -h iloveuamaya -K 120

This situation arises when resource groups attempt to switch back automatically when strong negative affinities have been configured (with RG_affinities).

scstat ----- Monitor the status of Sun Cluster -----

-D         shows status for all disk device groups

-g         shows status for all resource groups

-i         shows status for all IPMP groups and public network adapters

-n         shows status for all nodes

-h host    shows status for the specified node (all components related to that host)

-p         shows status for all components in the cluster

-q         shows status for all device quorums and node quorums

-W         shows status for the cluster transport paths

-v[v]      verbose output

Examples

Show status of all resource groups followed by the status of all components related to node1:

# scstat -g -h node1

or:

# scstat -g

and

# scstat -h node1

scdpm ----- Disk-path monitoring administration command (available starting in Sun Cluster 3.1 Update 3) -----

-m            monitor the new disk path specified by node:disk-path (all is the default option)
              [node] [node:all] [node:/dev/did/rdsk/dN]

-u            unmonitor a disk path; the daemon on each node stops monitoring the specified path (all is the default option)
              [node] [node:all] [node:/dev/did/rdsk/dN]

-p [-F]       print the current status of a specified disk path from all the nodes attached to the storage (all is the default option; -F prints only faulty disk paths)
              [node] [node:all] [node:/dev/did/rdsk/dN]

-f filename   read the list of disk paths to monitor or unmonitor from the specified file

Examples

Force daemon to monitor all disk paths in the cluster infrastructure:

# scdpm -m all

Monitor a new path on all nodes where path is valid:

# scdpm -m /dev/did/dsk/d3

Monitor new paths on just node1:

# scdpm -m node1:d4 -m node1:d5

Print all disk paths in the cluster and their status:

# scdpm -p all:all

Print all failed disk paths:

# scdpm -p -F all

Print the status of all disk paths from node1:

# scdpm -p node1:all

Sun Cluster 3.2 Cheat Sheet:

Sun Cluster Cheat Sheet

This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. There is some missing information, and over time I hope to complete it (e.g. zones, NAS devices, etc.).

Both versions of Cluster also have a text-based menu tool, so don't be afraid to use it, especially if the task is a simple one:

scsetup (3.1)

clsetup (3.2)

All the commands available in version 3.1 are also available in version 3.2.

Daemons and Processes

At the bottom of the installation guide I listed the daemons and processes running after a fresh install; now is the time to explain what these processes do. I have managed to obtain information on most of them but am still looking for others.

Versions 3.1 and 3.2

clexecd: This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.

cl_ccrad: This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.

cl_eventd: The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.

cl_eventlogd: The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped.

failfastd: This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.

rgmd: The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.fed: This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.pmfd: This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.

pnmd: The public management network service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.

scdpmd: The disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. It is automatically restarted if it is stopped. This multi-threaded DPM daemon runs on each node and is automatically started by an rc script when a node boots. It monitors the availability of logical paths that are visible through various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies.

Version 3.2 only

qd_userd: This daemon serves as a proxy whenever any quorum device activity requires the execution of some command in userland (e.g. a NAS quorum device).

cl_execd

ifconfig_proxy_serverd

rtreg_proxy_serverd

cl_pnmd: A daemon for the public network management (PNM) module. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups.

scprivipd: This daemon provisions IP addresses on the clprivnet0 interface on behalf of zones.

sc_zonesd: This daemon monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone booting failure.

cznetd: Used for reconfiguring and plumbing the private IP addresses in a local zone after a virtual cluster is created; also see the cznetd.xml file.

rpc.fed: This is the "fork and exec" daemon which handles requests from rgmd to spawn methods for specific data services. Failfast will panic the node if this is killed and not restarted in 30 seconds.

scqdmd: The quorum server daemon; this possibly used to be called "scqsd".

pnm mod serverd

File locations

Both Versions (3.1 and 3.2)

man pages: /usr/cluster/man

log files: /var/cluster/logs, /var/adm/messages

Configuration files (CCR, eventlog, etc.): /etc/cluster/

Cluster and other commands: /usr/cluster/lib/sc

Version 3.1 Only

sccheck logs: /var/cluster/sccheck/report.

Cluster infrastructure file: /etc/cluster/ccr/infrastructure

Version 3.2 Only

sccheck logs: /var/cluster/logs/cluster_check/remote.

Cluster infrastructure file: /etc/cluster/ccr/global/infrastructure

Command log: /var/cluster/logs/commandlog

SCSI Reservations

Display reservation keys
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner
    scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
    scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2

Command shortcuts

In version 3.2 there are a number of shortcut command names, which I have detailed below. I have left the full command names in the rest of the document so it is obvious what we are performing. All the commands are located in /usr/cluster/bin.

Full command            Shortcut

cldevice                cldev

cldevicegroup           cldg

clinterconnect          clintr

clnasdevice             clnas

clquorum                clq

clresource              clrs

clresourcegroup         clrg

clreslogicalhostname    clrslh

clresourcetype          clrt

clressharedaddress      clrssa
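
The long and short names are interchangeable, so for example both of the following list the configured resource groups:

# /usr/cluster/bin/clresourcegroup list
# /usr/cluster/bin/clrg list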

Shutting down and Booting a Cluster

Shutdown entire cluster
    3.1: ## on the other nodes in the cluster
         scswitch -S -h <node>
         shutdown -i5 -g0 -y
         ## on the last remaining node
         scshutdown -g0 -y
    3.2: cluster shutdown -g0 -y

Shutdown a single node
    3.1: scswitch -S -h <node>
         shutdown -i5 -g0 -y
    3.2: clnode evacuate <node>
         shutdown -i5 -g0 -y

Reboot a node into non-cluster mode
    3.1: ok> boot -x
    3.2: ok> boot -x
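
For a planned outage of a whole 3.1 cluster the sequence works out as shown below (node1/node2 are hypothetical names); in 3.2 a single cluster shutdown from any node replaces it:

node1# scswitch -S -h node1
node1# shutdown -i5 -g0 -y
node2# scshutdown -g0 -y
## 3.2, from any node
# cluster shutdown -g0 -y
## bring a node back up outside the cluster for maintenance
ok> boot -x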

Cluster information

Cluster
    3.1: scstat -pv
    3.2: cluster list -v, cluster show, cluster status

Nodes
    3.1: scstat -n
    3.2: clnode list -v, clnode show, clnode status

Devices
    3.1: scstat -D
    3.2: cldevice list, cldevice show, cldevice status

Quorum
    3.1: scstat -q
    3.2: clquorum list -v, clquorum show, clquorum status

Transport info
    3.1: scstat -W
    3.2: clinterconnect show, clinterconnect status

Resources
    3.1: scstat -g
    3.2: clresource list -v, clresource show, clresource status

Resource Groups
    3.1: scstat -g, scrgadm -pv
    3.2: clresourcegroup list -v, clresourcegroup show, clresourcegroup status

Resource Types
    3.2: clresourcetype list -v, clresourcetype list-props -v, clresourcetype show

IP Networking Multipathing
    3.1: scstat -i
    3.2: clnode status -m

Installation info (prints packages and version)
    3.1: scinstall -pv
    3.2: clnode show-rev -v

Cluster Configuration

Release
    cat /etc/cluster/release

Integrity check
    3.1: sccheck
    3.2: cluster check

Configure the cluster (add nodes, add data services, etc.)
    3.1: scinstall
    3.2: scinstall

Cluster configuration utility (quorum, data services, resource groups, etc.)
    3.1: scsetup
    3.2: clsetup

Rename
    3.2: cluster rename -c <newname>

Set a property
    3.2: cluster set -p <name>=<value>

List
    3.2: ## List cluster commands
         cluster list-cmds
         ## Display the name of the cluster
         cluster list
         ## List the checks
         cluster list-checks
         ## Detailed configuration
         cluster show -t global

Status
    3.2: cluster status

Reset the cluster private network settings
    3.2: cluster restore-netprops

Place the cluster into install mode
    3.2: cluster set -p installmode=enabled

Add a node
    3.1: scconf -a -T node=<host>
    3.2: clnode add -c <clustername> -n <sponsornode> -e endpoint1,endpoint2 -e endpoint3,endpoint4 <node>

Remove a node
    3.1: scconf -r -T node=<host>
    3.2: clnode remove

Prevent new nodes from entering
    3.1: scconf -a -T node=.

Put a node into maintenance state
    3.1: scconf -c -q node=<node>,maintstate
    Note: use the scstat -q command to verify that the node is in maintenance mode; the vote count should be zero for that node.

Get a node out of maintenance state
    3.1: scconf -c -q node=<node>,reset
    Note: use the scstat -q command to verify that the node is out of maintenance mode; the vote count should be one for that node.

Node Configuration

Add a node to the cluster
    clnode add [-c <clustername>] [-n <sponsornode>] \
    -e <endpoint1>,<endpoint2> -e <endpoint3>,<endpoint4> <node>

Remove a node from the cluster
    ## Make sure you are on the node you wish to remove
    clnode remove

Evacuate a node from the cluster
    3.1: scswitch -S -h <node>
    3.2: clnode evacuate <node>

Cleanup the cluster configuration (used after removing nodes)
    clnode clear <node>

List nodes
    ## Standard list
    clnode list [+ | <node>]
    ## Detailed list
    clnode show [+ | <node>]

Change a node's property
    clnode set -p <name>=<value> [+ | <node>]

Status of nodes
    clnode status [+ | <node>]
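
Before taking a node down for maintenance, the evacuate/status pair above is usually enough; a minimal sketch for a hypothetical node node2:

## 3.2
# clnode evacuate node2
# clnode status
## 3.1 equivalent
# scswitch -S -h node2
# scstat -n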

Admin Quorum Device

Quorum devices are nodes and disk devices, so the total quorum will be all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2) interface to add/remove quorum devices or use the commands below.

Adding a SCSI device to the quorum
    3.1: scconf -a -q globaldev=d11
    Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
    3.2: clquorum add [-t <type>] [-p <name>=<value>] [+ | <devicename>]

Adding a NAS device to the quorum
    3.1: n/a
    3.2: clquorum add -t netapp_nas -p filer=<filername>,lun_id=<lun> <devicename>

Adding a Quorum Server
    3.1: n/a
    3.2: clquorum add -t quorumserver -p qshost=<host>,port=<port> <devicename>

Removing a device from the quorum
    3.1: scconf -r -q globaldev=d11
    3.2: clquorum remove [-t <type>] [+ | <devicename>]

Remove the last quorum device
    3.1: ## Evacuate all nodes
         ## Put the cluster into install mode
         scconf -c -q installmode
         ## Remove the quorum device
         scconf -r -q globaldev=d11
         ## Check the quorum devices
         scstat -q
    3.2: ## Place the cluster in install mode
         cluster set -p installmode=enabled
         ## Remove the quorum device
         clquorum remove <devicename>
         ## Verify the device has been removed
         clquorum list -v

List
    3.2: ## Standard list
         clquorum list -v [-t <type>] [-n <node>] [+ | <devicename>]
         ## Detailed list
         clquorum show [-t <type>] [-n <node>] [+ | <devicename>]
         ## Status
         clquorum status [-t <type>] [-n <node>] [+ | <devicename>]

Resetting quorum info
    3.1: scconf -c -q reset
    3.2: clquorum reset
    Note: this will bring all offline quorum devices online.

Bring a quorum device into maintenance mode (known in 3.2 as disabling it)
    3.1: ## Obtain the device number
         scdidadm -L
         scconf -c -q globaldev=<device>,maintstate
    3.2: clquorum disable [-t <type>] [+ | <devicename>]

Bring a quorum device out of maintenance mode (known in 3.2 as enabling it)
    3.1: scconf -c -q globaldev=<device>,reset
    3.2: clquorum enable [-t <type>] [+ | <devicename>]
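
As a small example, temporarily taking quorum device d11 (the same hypothetical DID used above) out of service while its array is maintained, then returning it:

# clquorum disable d11
# clquorum status
# clquorum enable d11
# clquorum status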

Device Configuration

Check device
    3.2: cldevice check [-n <node>] [+]

Remove all devices from a node
    3.2: cldevice clear [-n <node>]

Monitoring
    3.2: ## Turn on monitoring
         cldevice monitor [-n <node>] [+ | <device>]
         ## Turn off monitoring
         cldevice unmonitor [-n <node>] [+ | <device>]

Rename
    3.2: cldevice rename -d <destination-device> <device>

Replicate
    3.2: cldevice replicate [-S <source-node>] -D <destination-node> [+]

Set properties of a device
    3.2: cldevice set -p default_fencing={global|pathcount|scsi3} [-n <node>] <device>

Status
    3.2: ## Standard display
         cldevice status [-s <state>] [-n <node>] [+ | <device>]
         ## Display failed disk paths
         cldevice status -s fail

Lists all the configured devices including paths across all nodes
    3.1: scdidadm -L
    3.2: ## Standard list
         cldevice list [-n <node>] [+ | <device>]
         ## Detailed list
         cldevice show [-n <node>] [+ | <device>]

List all the configured devices including paths on one node only
    3.1: scdidadm -l
    3.2: see above

Reconfigure the device database, creating new instance numbers if required
    3.1: scdidadm -r
    3.2: cldevice populate
         cldevice refresh [-n <node>] [+]

Perform the repair procedure for a particular path (use this when a disk gets replaced)
    3.1: scdidadm -R <device>  or  scdidadm -R 2 (by device id)
    3.2: cldevice repair [-n <node>] [+ | <device>]

Disk groups

Create a device group
    3.1: n/a
    3.2: cldevicegroup create -t vxvm -n <nodelist> -p failback=true <devgrp>

Remove a device group
    3.1: n/a
    3.2: cldevicegroup delete <devgrp>

Adding
    3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
    3.2: cldevicegroup add-device -d <device> <devgrp>

Removing
    3.1: scconf -r -D name=<devgrp>
    3.2: cldevicegroup remove-device -d <device> <devgrp>

Set a property
    3.2: cldevicegroup set [-p <name>=<value>] [+ | <devgrp>]

List
    3.1: scstat
    3.2: ## Standard list
         cldevicegroup list [-n <node>] [-t <type>] [+ | <devgrp>]
         ## Detailed configuration report
         cldevicegroup show [-n <node>] [-t <type>] [+ | <devgrp>]

Status
    3.1: scstat
    3.2: cldevicegroup status [-n <node>] [-t <type>] [+ | <devgrp>]

Adding a single node
    3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>
    3.2: cldevicegroup add-node [-n <node>] [-t <type>] [+ | <devgrp>]

Removing a single node
    3.1: scconf -r -D name=<devgrp>,nodelist=<host>
    3.2: cldevicegroup remove-node [-n <node>] [-t <type>] [+ | <devgrp>]

Switch
    3.1: scswitch -z -D <devgrp> -h <host>
    3.2: cldevicegroup switch -n <node> <devgrp>

Put into maintenance mode
    3.1: scswitch -m -D <devgrp>
    3.2: n/a

Take out of maintenance mode
    3.1: scswitch -z -D <devgrp> -h <host>
    3.2: n/a

Onlining a disk group
    3.1: scswitch -z -D <devgrp> -h <host>
    3.2: cldevicegroup online <devgrp>

Offlining a disk group
    3.1: scswitch -F -D <devgrp>
    3.2: cldevicegroup offline <devgrp>

Resync a disk group
    3.1: scconf -c -D name=appdg,sync
    3.2: cldevicegroup sync [-t <type>] [+ | <devgrp>]

Transport Cable

Add
    3.2: clinterconnect add <endpoint>,<endpoint>

Remove
    3.2: clinterconnect remove <endpoint>,<endpoint>

Enable
    3.1: scconf -c -m endpoint=<host>:qfe1,state=enabled
    3.2: clinterconnect enable [-n <node>] [+ | <endpoint>,<endpoint>]

Disable
    3.1: scconf -c -m endpoint=<host>:qfe1,state=disabled
         Note: it gets deleted
    3.2: clinterconnect disable [-n <node>] [+ | <endpoint>,<endpoint>]

List
    3.1: scstat -W
    3.2: ## Standard and detailed list
         clinterconnect show [-n <node>] [+ | <endpoint>,<endpoint>]

Status
    3.1: scstat -W
    3.2: clinterconnect status [-n <node>] [+ | <endpoint>,<endpoint>]

Resource Groups

Adding (failover)
    3.1: scrgadm -a -g <rg> -h <node>,<node>
    3.2: clresourcegroup create <rg>

Adding (scalable)
    3.2: clresourcegroup create -S <rg>

Adding a node to a resource group
    3.2: clresourcegroup add-node -n <node> <rg>

Removing
    3.1: scrgadm -r -g <rg>
    3.2: ## Remove a resource group
         clresourcegroup delete <rg>
         ## Remove a resource group and all its resources
         clresourcegroup delete -F <rg>

Removing a node from a resource group
    3.2: clresourcegroup remove-node -n <node> <rg>

Changing properties
    3.1: scrgadm -c -g <rg> -y <property>=<value>
    3.2: clresourcegroup set -p Failback=true +

Status
    3.1: scstat -g
    3.2: clresourcegroup status [-n <node>] [-r