IBM SG24-5131-00 manual Node Failure / Reintegration, AIX Crash, CPU Failure

Verify that all sharedvg file systems and paging spaces are accessible (df -k and lsps -a).
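The paging space part of this check can be scripted. The following is a minimal sketch in POSIX sh with awk. It assumes that lsps -a prints one header line followed by one line per paging space and physical volume, with the Active flag in the sixth column (Page Space, Physical Volume, Volume Group, Size, %Used, Active, ...). Confirm that layout on your AIX level. The function name inactive_paging_spaces is hypothetical.

```sh
#!/bin/sh
# Sketch: list paging spaces that "lsps -a" reports as not active.
# Assumption (check on your AIX level): one header line, then one line per
# paging space and physical volume, with Active in the sixth column:
#   Page Space  Physical Volume  Volume Group  Size  %Used  Active  Auto  Type
inactive_paging_spaces() {
    # Skip the header; print the paging space name from every data line
    # whose Active column is anything other than "yes".
    awk 'NR > 1 && $6 != "yes" { print $1 }'
}
```

On a live node, run lsps -a | inactive_paging_spaces. No output means every listed paging space is active. File system accessibility (df -k) still has to be checked by inspection.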

6.2.2 Node Failure / Reintegration

The following sections deal with issues of node failure and reintegration.

6.2.2.1 AIX Crash

Perform the following steps in the event of an AIX crash:

Using the verification commands, check that all nodes in the cluster are up and running.

Optional: Prune the error log on NodeF (errclear 0).

If NodeF is an SMP, you may want to set the fast reboot switch (mpcfg -cf 11 1).

Monitor the cluster log files on NodeT.

Crash NodeF by entering cat /etc/hosts > /dev/kmem. (The LED on NodeF will display 888.)

The OS failure on NodeF will cause a node failover to NodeT.

Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).

Power cycle NodeF. When NodeF restarts, start HACMP on it (smit clstart) if HACMP is not configured to start from /etc/inittab. NodeF will take back its cascading resource groups.

Verify that reintegration has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).
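The failover and reintegration checks above use the same commands. They can therefore be collected into one script and run first on NodeT, then on NodeF after it takes back its resource groups. The following is a minimal sketch in POSIX sh and is not part of HACMP. The function names (vg_online, has_processes, verify_node) are hypothetical. The volume group, a service address to ping, and the application user are assumed to be passed as arguments. The parsing helpers read command output on stdin and assume that lsvg -o prints one volume group per line and that ps -U prints a single header line before any process lines.

```sh
#!/bin/sh
# Sketch of the failover/reintegration checks (hypothetical helper names).
# The parsing helpers read command output on stdin, so they can be tried
# on any host; verify_node runs the real AIX commands.

# Succeeds if volume group $1 appears as a whole line in "lsvg -o" output.
vg_online() {
    grep -qxF "$1"
}

# Succeeds if "ps -U <user>" output has at least one line after the header.
has_processes() {
    awk 'NR > 1 { found = 1 } END { exit !found }'
}

# Live checks for the node that should currently own the resources.
# Arguments (all supplied by the operator): VG name, service address, user.
verify_node() {
    vg=$1; addr=$2; user=$3; rc=0
    netstat -i                      # interface listing, for visual inspection
    if ping -c 1 "$addr" >/dev/null 2>&1; then
        echo "ping $addr: ok"
    else
        echo "ping $addr: FAILED"; rc=1
    fi
    if lsvg -o | vg_online "$vg"; then
        echo "volume group $vg: varied on"
    else
        echo "volume group $vg: NOT varied on"; rc=1
    fi
    if ps -U "$user" | has_processes; then
        echo "processes for $user: running"
    else
        echo "processes for $user: NONE"; rc=1
    fi
    return $rc
}
```

For example, call verify_node sharedvg with the service address and <appuid> on NodeT after failover, then again on NodeF after reintegration. A nonzero return flags at least one failed check. The vi test of a file on the shared file systems is still done by hand.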

6.2.2.2 CPU Failure

Perform the following steps in the event of CPU failure:

Using the verification commands, check that all nodes in the cluster are up and running.

Optional: Prune the error log on NodeF (errclear 0).

If NodeF is an SMP, you may want to set the fast reboot switch (mpcfg -cf 11 1).

Monitor the cluster log files on NodeT.

Power off NodeF. This will cause a node failover to NodeT.
