Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).
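The volume group and application checks above can be scripted rather than read by eye. The following is a minimal sketch, not part of HACMP: the function names, the volume group name sharedvg, and the user appuser are hypothetical, and it assumes that lsvg -o prints one active volume group name per line and that ps -U prints a header line followed by one line per process.

```shell
#!/bin/sh
# Hypothetical helpers for the verification step; names are illustrative only.

# vg_is_online VGNAME
# Reads `lsvg -o` output on stdin (assumed: one active volume group per line)
# and succeeds if VGNAME appears as a whole line.
vg_is_online() {
    grep -Fqx -- "$1"
}

# app_is_running
# Reads `ps -U <appuid>` output on stdin (assumed: one header line, then one
# line per process) and succeeds if at least one process line is present.
app_is_running() {
    awk 'NR > 1 { found = 1 } END { exit !found }'
}

# Possible use on the takeover node (sharedvg and appuser are placeholders):
#   lsvg -o | vg_is_online sharedvg && echo "sharedvg is varied on"
#   ps -U appuser | app_is_running && echo "application processes found"
```

The helpers read command output from standard input so they can be tried against saved output before being used on a live cluster node.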

Power cycle NodeF. If HACMP is not configured to start from /etc/inittab (on restart), start HACMP on NodeF (smit clstart). NodeF will take back its cascading Resource Groups.

Verify that re-integration has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).

6.2.2.3 TCP/IP Subsystem Failure

Check, by way of the verification commands, that all the Nodes in the cluster are up and running.

Optional: Prune the error log on NodeF (errclear 0).

Monitor the cluster log files on NodeT.

On NodeF, stop the TCP/IP subsystem (sh /etc/tcp.clean) or crash the subsystem by increasing the size of the sb_max and thewall parameters to large values (no -o sb_max=10000; no -o thewall=10000), and then ping NodeT. Record the values of sb_max and thewall before modifying them; as an extra check, you may want to add the original values to the end of /etc/rc.net.
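Recording the original sb_max and thewall values can be sketched as follows. This is an illustration under assumptions: the helper names and the scratch file are hypothetical, and the parser assumes that no -o <parameter> prints a single line of the form "sb_max = 65536".

```shell
#!/bin/sh
# Hypothetical helpers for saving network option values before the test.

# parse_no_value
# Reads one line such as "sb_max = 65536" on stdin (the output format
# assumed for `no -o sb_max`) and prints only the value.
parse_no_value() {
    awk -F'=' 'NF == 2 { gsub(/[ \t]/, "", $2); print $2 }'
}

# restore_cmd NAME VALUE
# Prints the `no` command line that would set NAME back to VALUE.
restore_cmd() {
    printf 'no -o %s=%s\n' "$1" "$2"
}

# Possible use on NodeF before changing anything (scratch file is a placeholder):
#   orig_sb_max=$(no -o sb_max | parse_no_value)
#   orig_thewall=$(no -o thewall | parse_no_value)
#   restore_cmd sb_max  "$orig_sb_max"  >> /tmp/no_restore.sh
#   restore_cmd thewall "$orig_thewall" >> /tmp/no_restore.sh
```

The generated lines are the kind of entries the text suggests adding to the end of /etc/rc.net; writing them to a scratch file first allows them to be reviewed before any system file is touched.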

The TCP/IP subsystem failure on NodeF will cause a network failure of all the TCP/IP networks on NodeF. Unless there has been some customization done to promote this type of failure to a node failure, only the network failure will occur. The presence of a non-TCP/IP network (RS232, target mode SCSI or target mode SSA) should prevent the cluster from triggering a node down in this situation.

Verify that the network_down event has been run by checking the /tmp/hacmp.out file on either node. By default, the network_down script does nothing, but it can be customized to do whatever is appropriate for that situation in your environment.
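Checking the log for the event can be done with a short search. The sketch below assumes that /tmp/hacmp.out marks the start of each event with a line containing "EVENT START: <event name>" followed by the event arguments; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for checking whether a cluster event has been logged.

# event_started LOGFILE EVENT
# Succeeds if LOGFILE contains an "EVENT START: EVENT" marker (assumed log
# format). The event name must be followed by a space or the end of the line,
# so network_down does not match network_down_complete.
event_started() {
    grep -Eq "EVENT START: $2( |\$)" "$1"
}

# Possible use on either node:
#   event_started /tmp/hacmp.out network_down && echo "network_down was run"
```

The same helper can be reused for any other event name that appears in the log.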

On NodeF, issue the command startsrc -g tcpip. This should restart the TCP/IP daemons, and should cause a network_up event to be triggered in the cluster for each of your TCP/IP networks.
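To confirm that the TCP/IP daemons came back, the subsystem status can be listed with lssrc -g tcpip and compared with what was active before the test. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes the usual lssrc layout of a header line followed by one line per subsystem, with the subsystem name in the first column and the status (for example, active or inoperative) in the last.

```shell
#!/bin/sh
# Hypothetical helper for summarizing `lssrc -g tcpip` output.

# active_subsystems
# Reads `lssrc -g tcpip` output on stdin (assumed: header line, then
# "Subsystem Group [PID] Status" per line) and prints the names of the
# subsystems whose status is "active".
active_subsystems() {
    awk 'NR > 1 && $NF == "active" { print $1 }'
}

# Possible use on NodeF (file names are placeholders):
#   lssrc -g tcpip | active_subsystems > /tmp/tcpip_before   # before the test
#   lssrc -g tcpip | active_subsystems > /tmp/tcpip_after    # after startsrc -g tcpip
#   diff /tmp/tcpip_before /tmp/tcpip_after
```

Comparing against a list taken before the test is deliberate: some subsystems in the tcpip group may normally be inoperative, so an inoperative entry alone does not indicate a problem.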

6.2.3 Network Failure

Check, by way of the verification commands, that all the Nodes in the cluster are up and running.

Optional: Prune the error log on NodeF (errclear 0).

138 IBM Certification Study Guide AIX HACMP
