IBM SG24-5131-00 manual Eprimary Management, Switch Failures


9.4.2 Eprimary Management

The SP switch has a built-in primary/backup concept: the primary node, known as the Eprimary, is automatically backed up by a backup node. If a serious failure occurs on the primary, it resigns, and the backup node takes over the handling of the switch network, which includes keeping track of routes and working on events.

HACMP/ES versions below 4.3 included an Eprimary management function. If you upgrade to Version 4.3, also upgrade your switch to the SP switch, and had previously configured Eprimary management within the HACMP definitions, you have to unmanage it.

To check whether the Eprimary is set to be managed, issue the following command:

odmget -q'name=EPRIMARY' HACMPsp2

If the Eprimary is set to MANAGE, run the following script before changing to the new switch:

/usr/es/sbin/cluster/events/utils/cl_HPS_Eprimary unmanage
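
The check and the unmanage step can be combined in a small wrapper. The following is a minimal sketch, not part of HACMP: the function names eprimary_is_managed and unmanage_eprimary_if_needed are hypothetical, and the sketch assumes that the odmget output contains the standalone word MANAGE when the Eprimary is managed (and, for example, UNMANAGE when it is not). Verify the actual stanza text on your system before relying on it.

```shell
#!/bin/sh
# Sketch only. Assumption: a managed Eprimary shows the standalone word
# MANAGE in the odmget stanza; any other value (for example UNMANAGE)
# means it is not managed by HACMP.

# Hypothetical helper: succeeds if the stanza text in $1 reports MANAGE.
eprimary_is_managed() {
    # -w matches MANAGE only as a whole word, so UNMANAGE does not match.
    printf '%s\n' "$1" | grep -qw 'MANAGE'
}

# Hypothetical wrapper: query the ODM, then unmanage only if needed.
unmanage_eprimary_if_needed() {
    stanza=$(odmget -q'name=EPRIMARY' HACMPsp2 2>/dev/null)
    if eprimary_is_managed "$stanza"; then
        /usr/es/sbin/cluster/events/utils/cl_HPS_Eprimary unmanage
    else
        echo "Eprimary is not managed by HACMP; nothing to do."
    fi
}
```

After sourcing this file on a cluster node, calling unmanage_eprimary_if_needed runs the script only when the ODM entry reports MANAGE. Keeping the text check in its own function lets you try it against captured odmget output before touching the cluster.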

Because the SP switch has its availability concept built in, there is no need to provide it outside the PSSP software, so HACMP no longer has to take care of it.

9.4.3 Switch Failures

As mentioned before, a node in the SP is still restricted to a maximum of one switch adapter. Therefore, even though the software can assign a new primary node within the SP and outside of HACMP, the switch adapter is still a single point of failure.

If the switch adapter in a node resigns from work due to a software or hardware problem, the switch network is down for that node.

If any application running on that node relies on the switch network, the application has effectively died on that node. Therefore, it might be advisable to promote the switch network failure into a node failure, as described in 2.6.2.1, “Single Point-of-Failure Hardware Component Recovery” on page 46. If you configure the switch network as an HACMP network, HACMP would be able to recognize the network failure and would react with a network_down event, which in turn would shut down the node from HACMP, causing a takeover.

196 IBM Certification Study Guide AIX HACMP
