Cluster Network Communications

For the cluster to function properly, the two PowerEdge systems must be able to communicate with one another. This communication includes the exchange of “heartbeat” messages, whereby the two nodes inquire about each other’s status, or “health,” and acknowledge all such inquiries.

To verify network communications between the nodes, open a command prompt on each node. Type ipconfig /all at the prompt, and press <Enter> to display all known IP addresses on that node. From each remote computer, issue the ping command to test the responsiveness of each IP address. Perform the same check with the cluster IP address and the IP address for each disk recovery group. Also ping the cluster name and the name of each disk recovery group (if any).
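
The ping checks described above can also be scripted. The following is a minimal sketch in Python, not part of the Dell or Microsoft tooling; it assumes a Python interpreter is available on the computer running the check. The function names (ping_command, is_reachable, check_all) are hypothetical, and the addresses and cluster name in the usage comment are placeholders. The sketch treats a zero exit status from ping as success; on some Windows versions ping may exit with zero even when the replies read "Destination host unreachable", so a pass is a hint rather than proof.

```python
import platform
import subprocess


def ping_command(address, count=2):
    """Build the ping command line for the local operating system."""
    # Windows ping uses -n for the echo count; most Unix-like pings use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), address]


def is_reachable(address, run=subprocess.run):
    """Return True if ping exits with status 0 for the given address or name."""
    try:
        result = run(
            ping_command(address),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The ping program itself could not be started.
        return False
    return result.returncode == 0


def check_all(targets, run=subprocess.run):
    """Ping every target and map each one to True (replied) or False."""
    return {target: is_reachable(target, run) for target in targets}


# Example (placeholder addresses and name; substitute the values reported
# by ipconfig /all and the names configured for your cluster):
# for target, ok in check_all(["10.0.0.1", "10.0.0.2", "CLUSTER1"]).items():
#     print(target, "responds" if ok else "NO RESPONSE")
```

The run parameter exists only so that the command runner can be replaced when trying the sketch without a network; in normal use it can be left at its default.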

Cluster Service

The Cluster Service performs most of the cluster functionality, including membership management, communication management, and fail-over management. When the Cluster Server is properly installed, the Cluster Service starts on each node and activates automatically in the event that one of the nodes fails or goes off-line.

To verify that the Cluster Service is running on a node, click the Start button, point to Settings, and then click Control Panel. Double-click the Services icon. The Cluster Service is listed in the dialog box. Repeat this check on the second node to make sure that the Cluster Service is running there also.
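
The same check can be made from a command prompt, because the net start command, typed with no arguments, lists the services that are currently started. The following Python sketch scans that listing for one display name. It is an illustration under stated assumptions, not a documented procedure: it assumes Python is available on the node, the helper names are hypothetical, and the display name passed in ("Cluster Service" in the usage comment) is an assumption that should be replaced with the name shown in the Services dialog box.

```python
import subprocess


def started_services(net_start_output):
    """Extract service display names from the text printed by `net start`."""
    names = set()
    for line in net_start_output.splitlines():
        # Service names are printed indented; header and footer lines are not.
        if line[:1] in (" ", "\t") and line.strip():
            names.add(line.strip())
    return names


def is_service_started(display_name, net_start_output):
    """Case-insensitive check for one display name in the `net start` listing."""
    wanted = display_name.strip().lower()
    return wanted in {name.lower() for name in started_services(net_start_output)}


def query_net_start():
    """Run `net start` on the local Windows node and return what it prints."""
    completed = subprocess.run(
        ["net", "start"], capture_output=True, text=True, check=True
    )
    return completed.stdout


# Example, to be run on each node in turn (display name is an assumption):
# print(is_service_started("Cluster Service", query_net_start()))
```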

Availability of Cluster Resources

In the context of clustering, a resource is a basic unit of fail-over. Application programs are made up of resources that are grouped together for the purpose of recovery. All recovery groups, and therefore the resources they contain, must be online (or in a ready state) for the cluster to function properly.

To verify that the cluster resources are online, start the Cluster Administrator on the monitoring node. Click the Start button, point to Programs, point to Administrative Tools (Common), and then click Cluster Administrator. Open a connection to the cluster, and observe the running state of each recovery group. If a group has failed, one or more of its resources may be off-line.
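
The rule described in this section, that a recovery group is healthy only when every one of its resources is online, can be illustrated with a short Python sketch. It does not query the cluster; the states are entered by hand from what Cluster Administrator displays, and the group names, resource names, and function names are all made up for the example.

```python
def offline_resources(groups):
    """Map each group that has a problem to its resources that are not online."""
    problems = {}
    for group, resources in groups.items():
        down = sorted(
            name for name, state in resources.items() if state.lower() != "online"
        )
        if down:
            problems[group] = down
    return problems


def cluster_ready(groups):
    """True only when every resource in every group is online."""
    return not offline_resources(groups)


# Hypothetical snapshot copied by hand from Cluster Administrator.
observed = {
    "Disk Group 1": {"Disk E:": "Online", "File Share": "Online"},
    "Disk Group 2": {"Disk F:": "Online", "Print Spooler": "Offline"},
}

for group, down in offline_resources(observed).items():
    print(f"{group}: not online -> {', '.join(down)}")
# prints: Disk Group 2: not online -> Print Spooler
```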

Troubleshooting the reasons that resources might be failing is beyond the scope of this guide, but examining the properties of each resource and ensuring that the specified parameters are correct is a first step in the troubleshooting process. In general, if a resource is off-line, it can be brought online by selecting it, right-clicking it, and choosing Bring Online from the pull-down menu. For information about troubleshooting resource failures, refer to the Microsoft Windows NT Server, Enterprise Edition Administrator’s Guide.

3-12 Dell PowerEdge Cluster FE100 Installation and Troubleshooting Guide
