8 Troubleshooting Your Cluster

This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems. Topics are as follows:

Testing Cluster Operation

Monitoring Hardware (page 322)

Replacing Disks (page 324)

Replacing I/O Cards (page 326)

Replacing LAN or Fibre Channel Cards (page 327)

Replacing a Failed Quorum Server System (page 328)

Troubleshooting Approaches (page 329)

Solving Problems (page 332)

Testing Cluster Operation

Once you have configured your Serviceguard cluster, verify that its components behave correctly when a failure occurs. The procedures in this section test that the cluster responds properly to a package failure, a node failure, or a LAN failure.

CAUTION: The following procedures deliberately cause cluster components to fail so that you can confirm that the cluster responds correctly to failure situations. As a result, the availability of nodes and applications may be disrupted.

Start the Cluster using Serviceguard Manager

If you have just finished configuring your cluster, it starts automatically. If it is halted later, restart it: from the System Management Homepage (SMH), select the cluster and choose Administration -> Run Cluster...

Testing the Package Manager

You can test that the package manager is operating correctly. Perform the following procedure for each package on the cluster:

1. Obtain the PID number of a service in the package by entering ps -ef | grep <service_cmd>

where service_cmd is the executable specified in the package control script with the parameter SERVICE_CMD. The service selected must not have SERVICE_RESTART specified.

2. To kill the service_cmd PID, enter kill PID

3. To view the package status, enter cmviewcl -v

The package should be running on the specified adoptive node.

4. Move the package back to the primary node (see “Moving a Failover Package” (page 286)).
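
Steps 1 through 3 can be sketched as a small shell helper. This is only an illustration and not part of Serviceguard: the function name service_pids and the path /opt/app/bin/my_service are hypothetical, and the sketch assumes that ps -ef prints the PID in the second column.

```sh
#!/bin/sh
# Illustrative helper, not a Serviceguard tool. The name service_pids is
# hypothetical.
#
# Reads `ps -ef` output on stdin and prints the PID (second column) of every
# line that contains the fixed string given as $1. Lines that mention grep
# are dropped so that the search process does not report itself.
service_pids() {
    grep -F -- "$1" | grep -v grep | awk '{ print $2 }'
}

# Typical use on a cluster node (the service path is a placeholder for the
# SERVICE_CMD executable of your package):
#
#   ps -ef | service_pids /opt/app/bin/my_service   # step 1: list the PID(s)
#   kill <PID>                                      # step 2: fail the service
#   cmviewcl -v                                     # step 3: check package status
```

The kill and cmviewcl calls are left as manual steps on purpose, so that you can confirm you have the right PID before failing the service.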

You can also test the package manager using generic resources. Perform the following procedure for each package on the cluster:

1. Obtain the generic resource that is configured in a package by entering cmviewcl -v -p <pkg_name>

2. Set the status of the generic resource to DOWN using the following command:
