Eprimary Management, Switch Failures | IBM SG24-5131-00 guide

9.4.2 Eprimary Management

The SP switch has a built-in primary/backup concept: the primary node, known as the Eprimary, is backed up automatically by a backup node. If a serious failure occurs on the primary, it resigns, and the backup node takes over the handling of the switch network, which includes keeping track of routes, working on events, and so on.

HACMP/ES versions earlier than 4.3 included an Eprimary management function. If you upgrade to Version 4.3 and also upgrade your switch to the SP switch, and you had previously configured Eprimary management within the HACMP definitions, you have to unmanage it.

To check whether the Eprimary is set to be managed, issue the following command:

odmget -q'name=EPRIMARY' HACMPsp2

If the Eprimary is set to MANAGE, run the following script before changing to the new switch:

/usr/es/sbin/cluster/events/utils/cl_HPS_Eprimary unmanage

Because the SP switch has its availability concept built in, there is no need to manage the Eprimary outside the PSSP software, so HACMP no longer has to take care of it.
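The check and the unmanage step above can be combined in a small shell sketch. It is only an illustration: the text does not show what odmget prints for this query, so the stanza line format (value = "MANAGE") and the function name eprimary_is_managed are assumptions. The odmget query and the cl_HPS_Eprimary path are taken from the text.

```shell
#!/bin/sh
# Sketch: unmanage the Eprimary only if HACMP still manages it.
# Assumption: odmget prints the setting as a stanza line such as
#   value = "MANAGE"
# The attribute name and exact value string are not confirmed by the text.

UNMANAGE=/usr/es/sbin/cluster/events/utils/cl_HPS_Eprimary

# Reads odmget output on stdin; succeeds if an attribute is set to MANAGE.
# "UNMANAGE" does not match because "manage" must directly follow the "=".
eprimary_is_managed() {
    grep -Eiq '=[[:space:]]*"?manage"?[[:space:]]*$'
}

# Touch the cluster only on a node where odmget is available.
if command -v odmget >/dev/null 2>&1; then
    if odmget -q'name=EPRIMARY' HACMPsp2 | eprimary_is_managed; then
        "$UNMANAGE" unmanage
    else
        echo "Eprimary is not managed by HACMP; nothing to do."
    fi
fi
```

Before relying on the pattern, compare it with the real odmget output on one node and adjust it if the stanza looks different.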

9.4.3 Switch Failures

As mentioned before, a node in the SP is still restricted to a maximum of one switch adapter. Therefore, even though the software can assign a new primary node within the SP and outside of HACMP, the switch adapter is still a single point of failure.

If the switch adapter in a node resigns from work due to a software or hardware problem, the switch network is down for that node.

If any application running on that node relies on the switch network, the application has effectively died on that node. It might therefore be advisable to promote the switch network failure to a node failure, as described in 2.6.2.1, “Single Point-of-Failure Hardware Component Recovery” on page 46. If you configure the switch network as an HACMP network, HACMP would be able to recognize the network failure and would react with a network_down event, which in turn would shut down the node from HACMP, causing a takeover.
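The decision described above, whether a network_down event should be promoted to a node failure, can be sketched as a dry-run helper. Everything specific here is an assumption for illustration: that the event passes the node name and the network name as its first two arguments, the network name sp_switch_net, the use of hostname as the local node name, and the function names. The sketch only prints what it would do and runs no HACMP command.

```shell
#!/bin/sh
# Hypothetical dry-run helper for a network_down post-event (illustration only).
# Assumption: $1 is the node name and $2 is the network name of the event.
# SWITCH_NET is a placeholder for the HACMP network defined on the SP switch.

SWITCH_NET="${SWITCH_NET:-sp_switch_net}"

# Succeeds when the failed network is the switch network on the local node.
is_local_switch_failure() {
    event_node="$1"
    event_net="$2"
    local_node="$3"
    [ "$event_net" = "$SWITCH_NET" ] && [ "$event_node" = "$local_node" ]
}

# Prints the decision; a real script would trigger the node shutdown here.
promote_if_needed() {
    if is_local_switch_failure "$1" "$2" "$3"; then
        echo "promote: $2 is down on $1, treat as node failure"
    else
        echo "ignore: $2 on $1"
    fi
}

# Run only when called with event arguments.
if [ "$#" -ge 2 ]; then
    promote_if_needed "$1" "$2" "$(hostname)"
fi
```

Keeping the decision in a separate function makes it possible to test the matching logic without a cluster. The action that actually shuts down the node is deliberately left out and should follow the procedure in section 2.6.2.1.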

196 IBM Certification Study Guide AIX HACMP

