Network Failure, 2.3 TCP/IP Subsystem Failure | IBM SG24-5131-00

•Verify that failover has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).

•Power cycle NodeF. If HACMP is not configured to start from /etc/inittab (on restart), start HACMP on NodeF (smit clstart). NodeF will take back its cascading Resource Groups.

•Verify that re-integration has occurred (netstat -i and ping for networks, lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for application processes).
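The verification commands used after failover and after re-integration can be collected into a small script. The following is a minimal sketch and not part of the original guide: the function name verify_node_state and its four arguments (peer address, volume group name, application user ID, test file path) are hypothetical, and a simple write to the test file stands in for the manual vi check.

```shell
#!/bin/sh
# Sketch only: verify_node_state and its arguments are hypothetical names.
# Usage: verify_node_state <peer_address> <volume_group> <app_uid> <test_file>
verify_node_state() {
    peer=$1; vg=$2; appuid=$3; testfile=$4
    rc=0

    # Network: show interface state, then confirm the peer answers one ping.
    netstat -i || rc=1
    if ping -c 1 "$peer" >/dev/null 2>&1; then
        echo "OK   network: $peer answers ping"
    else
        echo "FAIL network: $peer does not answer ping"; rc=1
    fi

    # Volume group: it must appear in the list of varied-on volume groups.
    if lsvg -o | grep -x "$vg" >/dev/null; then
        echo "OK   volume group: $vg is varied on"
    else
        echo "FAIL volume group: $vg is not varied on"; rc=1
    fi

    # File system: write to a test file (assumption: replaces the vi check).
    if { echo "hacmp test" > "$testfile"; } 2>/dev/null; then
        echo "OK   file system: $testfile is writable"
    else
        echo "FAIL file system: cannot write $testfile"; rc=1
    fi

    # Application: at least one process line must follow the ps header.
    if ps -U "$appuid" | sed 1d | grep . >/dev/null; then
        echo "OK   application: processes found for $appuid"
    else
        echo "FAIL application: no processes found for $appuid"; rc=1
    fi

    return $rc
}
```

Run on the node that should currently hold the resources, a call might look like verify_node_state nodet_svc sharedvg appuser /sharedfs/testfile, where every argument is a placeholder for your own cluster's names. A non-zero return code means at least one check failed.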

6.2.2.3 TCP/IP Subsystem Failure

•Check, by way of the verification commands, that all the Nodes in the cluster are up and running.

•Optional: Prune the error log on NodeF (errclear 0).

•Monitor the cluster log files on NodeT.

•On NodeF, stop the TCP/IP subsystem (sh /etc/tcp.clean), or crash the subsystem by setting the sb_max and thewall parameters to large values (no -o sb_max=10000; no -o thewall=10000) and then pinging NodeT. Record the values of sb_max and thewall before modifying them; as an extra safeguard, you may want to add the original values to the end of /etc/rc.net.

•The TCP/IP subsystem failure on NodeF will cause a network failure of all the TCP/IP networks on NodeF. Unless there has been some customization done to promote this type of failure to a node failure, only the network failure will occur. The presence of a non-TCP/IP network (RS232, target mode SCSI or target mode SSA) should prevent the cluster from triggering a node down in this situation.

•Verify that the network_down event has been run by checking the /tmp/hacmp.out file on either node. By default, the network_down script does nothing, but it can be customized to do whatever is appropriate for that situation in your environment.

•On NodeF, issue the command startsrc -g tcpip. This should restart the TCP/IP daemons, and should cause a network_up event to be triggered in the cluster for each of your TCP/IP networks.
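Two parts of this test lend themselves to scripting: recording sb_max and thewall before changing them, and checking the log for the network_down and network_up events. The following is a minimal sketch under stated assumptions, not the guide's own procedure: the function names save_no_values and count_event are hypothetical, no -o <option> is assumed to print a line of the form "name = value", and the log is assumed to mark each event with a line containing "EVENT START: <event_name>".

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# save_no_values <file>
# Record the current sb_max and thewall settings as ready-to-run
# "no -o name=value" commands, so they can be replayed after the test.
# Assumption: "no -o name" prints "name = value".
save_no_values() {
    for opt in sb_max thewall; do
        val=$(no -o "$opt" | awk '{ print $NF }')
        echo "no -o $opt=$val"
    done > "$1"
}

# count_event <event_name> [logfile]
# Print how many times the named event was started according to the log.
# The event name is compared as a whole field, so network_down does not
# also count network_down_complete.
# Assumption: events are logged as "... EVENT START: <event_name> ...".
count_event() {
    awk -v ev="$1" '
        { for (i = 1; i < NF; i++)
              if ($i == "START:" && $(i + 1) == ev) n++ }
        END { print n + 0 }
    ' "${2:-/tmp/hacmp.out}"
}
```

Before the test, save_no_values /tmp/no.orig writes the restore commands (the file name is a placeholder); after the test, sh /tmp/no.orig puts the original values back, and the same two lines are what you would append to /etc/rc.net. Running count_event network_down and count_event network_up on either node then shows whether each event was logged.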

6.2.3 Network Failure

•Check, by way of the verification commands, that all the Nodes in the cluster are up and running.

•Optional: Prune the error log on NodeF (errclear 0).

138 IBM Certification Study Guide AIX HACMP
