Group and Resource Failure Problems

  • 8/8/2019 Group and Resource Failure Problems

    Group and resource failure problems

    What problem are you having?

    A resource fails, but is not brought back online.

    You cannot bring a resource online.

    You cannot bring the default physical disk resource online in Cluster Administrator.

    In Disk Management, you do not see the disk for the group that is online on that node.

    You are unable to manually move a group, or it does not fail over to another node when it is supposed to.

    A group failed over but did not fail back.

    The entire group failed and has not restarted.

    All nodes are functioning, but resources fail back repeatedly.

    The Cluster service does not successfully fail over resources.

    You fail over a resource group from one node to another, but it automatically fails back.

    The Network Name resource fails when you change to a system locale that is different from the input language used by the Network Name resource.

    The Message Queuing resource fails to handle message activity correctly, which may result in resource failures.

    A third-party resource fails to come online in a mixed-version cluster or while upgrading a cluster.

    A resource fails, but is not brought back online.

    Cause: A resource may depend on another resource that has failed.

    Solution: In the resource Properties dialog box, make sure that the Do not restart check box is cleared. If the resource needs another resource to function, and if the second resource fails, confirm that the dependencies are correctly configured.
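
    The dependency check described above can be sketched in code. The following is only an illustration, not a Cluster service API: the resource names, the `depends_on` mapping, and the `failed_dependencies` helper are all hypothetical, and the dependency data is assumed to have been copied by hand from each resource's Properties dialog box.

```python
from typing import Dict, List, Set


def failed_dependencies(
    resource: str,
    depends_on: Dict[str, List[str]],
    failed: Set[str],
) -> List[str]:
    """Return every failed resource that `resource` depends on,
    directly or indirectly, sorted by name."""
    blocked = []
    seen = set()
    stack = list(depends_on.get(resource, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.add(dep)
        if dep in failed:
            blocked.append(dep)
        # Follow indirect dependencies as well.
        stack.extend(depends_on.get(dep, []))
    return sorted(blocked)


# Hypothetical dependency data for one group.
depends_on = {
    "File Share": ["Network Name", "Disk Q:"],
    "Network Name": ["IP Address"],
}

print(failed_dependencies("File Share", depends_on, {"IP Address"}))
```

    If the returned list is not empty, the resources in it are the ones to bring online (or to reconfigure) before expecting the dependent resource to restart.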

    You cannot bring a resource online.

    Cause: The resource is not properly installed.

    Solution: Make sure the application or service associated with the resource is properly installed.

    Cause: The resource is not properly configured.

    Solution: Make sure the properties are set correctly for the resource.

    Cause: The resource is not compatible with server clusters.

    Solution: Not all applications can be configured to fail over in a cluster. For more information, see Choosing applications to run on a server cluster.

    Cause: The resource is generating a specific error.

    Solution: Review the system Event Log (look for ClusSvc entries under the Source column) to see if that resource is generating a specific error message.
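
    When the log is large, the same filtering can be done on an exported copy. The sketch below assumes the system log has been saved as comma-separated text with a header row containing `Type`, `Source`, `Event`, and `Description` columns; that layout, the `clussvc_errors` helper, and the sample rows are assumptions made for illustration, not the output of any particular tool.

```python
import csv
import io
from typing import Dict, List


def clussvc_errors(csv_text: str, source: str = "ClusSvc") -> List[Dict[str, str]]:
    """Return the rows whose Source matches `source` and whose Type is Error."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["Source"] == source and row["Type"] == "Error"
    ]


# Made-up sample data; the event numbers and descriptions are placeholders.
sample = """Type,Source,Event,Description
Error,ClusSvc,1,Placeholder: a cluster resource reported an error
Information,ClusSvc,2,Placeholder: informational entry
Error,Disk,3,Placeholder: entry from another source
"""

for row in clussvc_errors(sample):
    print(row["Event"], row["Description"])
```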

    You cannot bring the default physical disk resource online in Cluster Administrator.

    Most cluster configuration problems result from improper configuration of the shared storage bus or the restart of servers.

    Cause: You may not have restarted the servers after installing the Cluster service.

    Solution: Make sure that you restarted all servers after installing the Cluster service.

    When the servers are restarted, the signature of each disk in the cluster storage is read, and the registries are updated with the signature information.

    Cause: There may be hardware errors or transport problems.

    Solution: Make sure that there are no hardware errors or transport problems.

    Using Event Viewer (on the Start menu, under Programs and Administrative Tools (Common)), look in the event log for disk I/O error messages or indications of problems with the communications transport.

    Cause: You may not have waited long enough for the registries to be updated.

    Solution: Make sure that you waited long enough for the registries to be updated.

    Cluster Administrator takes a backup of the registry when it starts up. However, it can take up to a minute after the second server restarts for the disk signatures to be written to the registries. Wait a minute, and then click Refresh.

    Cause: One or more adapters on the shared storage bus are configured incorrectly.

    Solution: Make sure that the adapters are configured correctly.

    Cause: The shared storage bus exceeds the maximum cable length.

    Solution: Make sure that the shared storage bus does not exceed the maximum cable length.

    Cause: The disk is not supported.

    Solution: Make sure that the disk is supported and that the disk hardware or firmware revision level is not outdated.

    Cause: The bus adapter is not supported, or the adapter hardware or firmware revision level is outdated.

    Solution: Make sure that the bus adapter is supported, and that the adapter hardware or firmware revision level is current.

    Cause: If you move your storage bus adapter to another I/O slot, add or remove bus adapters, or install a new version of the bus adapter driver, the cluster software may not be able to access disks on your shared storage bus.

    Solution: To accommodate these changes, make sure that your shared storage bus adapter has been properly reconfigured.

    Cause: The operating system is incorrectly configured to access the shared storage bus.

    Solution: Verify that the operating system can detect the shared storage bus adapter.

    In Disk Management, you do not see the disk for the group that is online on that node.

    Where? Computer Management/Storage/Disk Management

    Cause: You may not be looking at the right disks.

    Solution: Make sure that you are looking at the right disks.

    All nodes are functioning, but resources fail back repeatedly.

    Cause: Power may be intermittent or failing.

    Solution: Ensure that your power is not intermittent or failing. You can correct this by using an uninterruptible power supply (UPS) or, if possible, by changing power companies.

    The Cluster service does not successfully fail over resources.

    Cause: Cluster storage device is not properly configured.

    Solution: Verify that the cluster storage device is properly configured and that all cables are properly connected.

    You fail over a resource group from one node to another, but it automatically fails back.

    Cause: One or more resources fail to come online on the new node.

    Solution: Use a process of elimination to determine which resource is failing to come online. For more information, see article Q303431, "Explanation of Why Server Clusters Do Not Verify that Resources will Work Properly on All Nodes" in the Microsoft Knowledge Base.

    The Network Name resource fails when you change to a system locale that is different from the input language used by the Network Name resource.

    Cause: The system locale must be the same on all nodes of a cluster and on the computer used to connect to the cluster.

    Solution: Change the system locale. For more information, see Connect to a cluster with Cluster Administrator.

    The Message Queuing resource fails to handle message activity correctly, which may result in resource failures.

    Cause: Each instance of Message Queuing on a server maps 4 MB of the system view space when handling message activity. This results in a default limit of three active, working instances of Message Queuing on a cluster node. In a server cluster with three Message Queuing resources, a node could have four concurrent Message Queuing services running (the service running on the local node plus the three services associated with the Message Queuing resources). In this scenario, message activity could be limited, resulting in resource failures.

    Solution: Increase the system view space memory pool on each node of a server cluster with three or more Message Queuing resources. (We also recommend that you increase the system view space memory pool even for nodes running fewer than three Message Queuing resources.)

    Open Registry Editor.

    Open the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management.

    Create a new DWORD value called SystemViewSize.

    Calculate and enter the data for this DWORD value using the following formula: (16 + (the number of Message Queuing resources x 4)).

    For example, the calculation result for a cluster with three Message Queuing resources is 28.

    Reboot each node.
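
    The formula in the steps above can be checked with a few lines of code. This is a minimal sketch: the `system_view_size` function name is hypothetical, and the script only computes the number to type into Registry Editor; it does not touch the registry.

```python
def system_view_size(mq_resources: int) -> int:
    """Data for the SystemViewSize DWORD value:
    16 + (number of Message Queuing resources x 4)."""
    if mq_resources < 0:
        raise ValueError("the number of resources cannot be negative")
    return 16 + mq_resources * 4


# Print the value for a few cluster sizes.
for count in (1, 2, 3, 4):
    print(count, "Message Queuing resources ->", system_view_size(count))
```

    For three Message Queuing resources the function returns 28, which matches the worked example in the steps.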

    A third-party resource fails to come online in a mixed-version cluster or while upgrading a cluster.

    Cause: If a resource uses a cryptographic provider not supplied by Microsoft to export (encrypt) and import (decrypt) resource data (cluster and cluster application cryptographic checkpoints), the default encryption key lengths may be different in the Windows 2000 and the Windows Server 2003 family operating systems. The result is that the resource might fail to come online and the cluster and event logs might contain cryptographic checkpoint synchronization errors for that resource.

    Solution: Use the cluster.exe "CSP" private property to set the key length and effective key length for the third-party cryptographic provider that encrypts and decrypts data for the failing resource type.

    Open Command Prompt.

    Type cluster ClusterName /priv "CSP"=key_length,effective_key_length:MULTISTR

    ClusterName is the name of the cluster, CSP is the name of the cryptographic provider, and key_length and effective_key_length are the key and effective key lengths for the RC2 encryption algorithm, in bits.

    For more information on using cluster.exe, see Cluster.

    Depending on the resource, either bring the resource online or recreate the resource to add the new cryptographic checkpoint.

    Note

    Review the documentation for your cryptographic provider to obtain valid values for the following RC2 encryption algorithm parameters: key_length and effective_key_length. Also review the cryptographic provider documentation for the correct procedure for adding the cryptographic checkpoint.

    For information about how to obtain product support, see Technical support options.
