Chapter 15 High Availability Troubleshooting

The ATCA environment usually contains a high-availability failover configuration between the two ATCA switches in the chassis. The failover features are configurable: a switch can be directed to fail over all of its processing when a single port or link goes down, or it can perform a port-to-port or VLAN-to-VLAN failover in which both partner switches continue to process a portion of the network traffic.

Before replacing a switch that has gone out of service because of a switch-level failover, you need to understand how the high-availability features have been configured. If the switch failover was triggered by a port or link failure, first isolate the cause of the link failure to confirm that the problem is not external to the switch (for example, a bad or loose cable on a wired port).

Spontaneous Failover Activity

If rebooting the inactive switch in a chassis causes the active switch to reboot, or causes an unexpected failover, try setting vrrp_msg_rate to 500 in the zsp.conf file.

The vrrp_msg_rate is the time in milliseconds between transmissions of VRRP messages on the inter-switch link (ISL). VRRP requires the absence of three VRRP messages before concluding that the remote switch has failed. The msg_rate must match the msg_rate of all siblings. Any value that is not a multiple of one second does not conform to the VRRP specification and will only run with vrrpd.
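
As a rough illustration of the timing rule above, the following Python sketch estimates how long a switch waits before concluding that its partner has failed, assuming detection happens after exactly three missed message intervals. It also checks the two constraints mentioned in the text. The function names and sample values are hypothetical and are not part of the switch software.

```python
# Sketch only: helper names are illustrative, not part of vrrpd or zsp.conf.
MISSED_MESSAGES_BEFORE_FAILURE = 3  # per the text: three absent VRRP messages


def detection_time_ms(msg_rate_ms):
    """Approximate time (ms) before the remote switch is considered failed."""
    if msg_rate_ms <= 0:
        raise ValueError("msg_rate must be a positive number of milliseconds")
    return MISSED_MESSAGES_BEFORE_FAILURE * msg_rate_ms


def siblings_match(msg_rates_ms):
    """True when every sibling in the list uses the same msg_rate."""
    return len(set(msg_rates_ms)) == 1


def is_whole_seconds(msg_rate_ms):
    """True when the interval is a multiple of one second (1000 ms)."""
    return msg_rate_ms % 1000 == 0


if __name__ == "__main__":
    rate = 500  # the value suggested for spontaneous failover problems
    print("detection time (ms):", detection_time_ms(rate))
    print("siblings match:", siblings_match([rate, rate]))
    print("whole seconds:", is_whole_seconds(rate))
```

With a msg_rate of 500, the estimate is about 1.5 seconds, and the value is not a multiple of one second, so by the rule above it would only run with vrrpd.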

Unexpected Fail-back Activity

If you observe unexpected fail-back activity, check that only one switch is set up as the Master switch (vrrpd -M option); otherwise, the switches will oscillate. See the Ethernet Switch Blade User's Guide for more information on setting the failover priority level.
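
The check described above can be sketched in Python as follows. The sketch assumes you have already recorded, by whatever means, whether each switch's vrrpd was started with the Master option. The dictionary layout, switch names, and function names are hypothetical.

```python
# Sketch only: the input format is an assumption, not something the switch produces.
def masters(switches):
    """Return the sorted names of switches flagged as Master."""
    return sorted(name for name, is_master in switches.items() if is_master)


def check_single_master(switches):
    """Summarize whether exactly one switch is configured as Master."""
    found = masters(switches)
    if len(found) == 1:
        return "ok: %s is the only Master" % found[0]
    if not found:
        return "note: no switch is configured as Master"
    return "warning: multiple Masters (%s); switches may oscillate" % ", ".join(found)


if __name__ == "__main__":
    # Hypothetical chassis in which both switches were started as Master.
    print(check_single_master({"switch-a": True, "switch-b": True}))
```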

Ethernet Switch Blade User's Guide

release 3.2.2j

page 183

Znyx Networks bh5700 manual