Node Failure / Reintegration, AIX Crash | IBM SG24-5131-00 manual

•Verify that all sharedvg file systems and paging spaces are accessible (df -k and lsps -a).
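The file system part of this check could be scripted. The following is a minimal sketch, not part of the original procedure: the helper name fs_is_mounted and the mount point /sharedfs are hypothetical, and the helper reads df -k style output on standard input so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper; the name and the example mount point are illustrative.

# fs_is_mounted MOUNTPOINT: succeeds if MOUNTPOINT appears in the last
# column of `df -k` style output read from stdin (header line skipped).
fs_is_mounted() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { found = 1 } END { exit !found }'
}

# Example use (commented out; /sharedfs is an assumed mount point):
#   df -k | fs_is_mounted /sharedfs && echo "/sharedfs is mounted"
#   lsps -a    # review the paging space list by eye
```

Reading from standard input keeps the helper independent of the node it runs on; the paging space check is left as a manual review of the lsps -a listing.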

6.2.2 Node Failure / Reintegration

The following sections deal with issues of node failure and reintegration.

6.2.2.1 AIX Crash

Perform the following steps in the event of an AIX crash:

•Check, by way of the verification commands, that all the Nodes in the cluster are up and running.

•Optional: Prune the error log on NodeF (errclear 0).

•If NodeF is an SMP, you may want to set the fast reboot switch (mpcfg -cf 11 1).

•Monitor cluster logfiles on NodeT.

•Crash NodeF by entering cat /etc/hosts > /dev/kmem. (The LED on NodeF will display 888.)

•The OS failure on NodeF will cause a node failover to NodeT.

•Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).

•Power cycle NodeF. If HACMP is not configured to start from /etc/inittab, start HACMP on NodeF manually after the restart (smit clstart). NodeF will take back its cascading Resource Groups.

•Verify that re-integration has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).
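The volume group and process checks in the verification steps above could be wrapped in small helpers. This is a sketch under stated assumptions, not the manual's own tooling: the function names vg_is_online and count_procs, the volume group name sharedvg, and the user appuser are hypothetical. Both helpers read command output on standard input.

```shell
#!/bin/sh
# Hypothetical helpers for the verification bullets; names are illustrative.

# vg_is_online VG: succeeds if VG appears as a whole line in `lsvg -o`
# style output (one varied-on volume group per line) read from stdin.
vg_is_online() {
    grep -F -q -x -- "$1"
}

# count_procs: prints the number of process lines in `ps -U <appuid>`
# style output read from stdin, not counting the header line.
count_procs() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Example use on the node that should hold the resources
# (commented out; sharedvg and appuser are assumed names):
#   lsvg -o | vg_is_online sharedvg && echo "sharedvg is varied on"
#   [ "$(ps -U appuser | count_procs)" -gt 0 ] && echo "application is running"
```

After failover the checks would be run on NodeT, and after re-integration on NodeF. The network checks (netstat -i and ping) and the test file edit are left manual here.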

6.2.2.2 CPU Failure

Perform the following steps in the event of CPU failure:

•Check, by way of the verification commands, that all the Nodes in the cluster are up and running.

•Optional: Prune the error log on NodeF (errclear 0).

•If NodeF is an SMP, you may want to set the fast reboot switch (mpcfg -cf 11 1).

•Monitor cluster logfiles on NodeT.

•Power off NodeF. This will cause a node failover to NodeT.
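While NodeF is down, the log monitoring step on NodeT can be narrowed to event lines. The sketch below is illustrative only. It assumes that event lines in the cluster log contain the markers "EVENT START" or "EVENT COMPLETED" and that the log is /tmp/hacmp.out; both are assumptions to check against your installation, and the function name filter_events is hypothetical.

```shell
#!/bin/sh
# filter_events: reads log text on stdin and prints only the lines that
# mention the start or completion of a cluster event.
# ASSUMPTION: event lines contain "EVENT START" or "EVENT COMPLETED".
filter_events() {
    grep -E 'EVENT (START|COMPLETED)'
}

# Example use on NodeT (commented out; the log path is an assumption):
#   tail -f /tmp/hacmp.out | filter_events
```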
