NAD 3020 manual Primary root switch failure and recovery

● Primary root switch failure and recovery

● Secondary root switch failure and recovery

These tests revealed the intricacies of fast convergence in the data center and the necessity for a holistic approach to high availability. Test cases that did not involve the failure of the active HSRP aggregation switch resulted in an average failover time of about 1 second. Failing the active HSRP device requires convergence at Layer 3 and resulted in a recovery time that reflected the settings of the HSRP timers.

It is possible to tune the HSRP timers for subsecond convergence. However, when multiple HSRP devices are involved, the recovery time is typically in the 5-second range.
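The link between the HSRP timers and the Layer 3 recovery time can be approximated with a short calculation. The Python sketch below is illustrative only. The function name `hsrp_detection_window` is hypothetical, and it models only the time the standby router needs to detect the failure (hold-timer expiry), not any further routing or ARP convergence. The 3-second hello and 10-second hold values are the HSRP defaults; the subsecond values are assumed example settings, not values taken from these tests.

```python
from typing import Tuple


def hsrp_detection_window(hello_s: float, hold_s: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds until the standby declares the active router down.

    The standby restarts its hold timer whenever it receives a hello.
    If the active router fails just before its next hello was due, almost one
    hello interval of the hold time has already elapsed (best case).
    If it fails right after sending a hello, the full hold time must expire
    (worst case).
    """
    if hello_s <= 0 or hold_s <= hello_s:
        raise ValueError("hold time must exceed a positive hello time")
    return (hold_s - hello_s, hold_s)


if __name__ == "__main__":
    print(hsrp_detection_window(3.0, 10.0))    # HSRP default timers
    print(hsrp_detection_window(0.25, 0.75))   # assumed subsecond tuning
```

With the default timers, the detection window in this simplified model is 7 to 10 seconds, which is why failures of the active HSRP device dominate the measured recovery time unless the timers are tuned.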

In this topology, two to four Gigabit Ethernet links make up each port-channel uplink between the access and aggregation layers. This configuration allows a single link to fail without triggering Spanning Tree Protocol convergence.
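The reason a single link failure does not disturb Spanning Tree can be shown with a small model: Spanning Tree sees the port channel as one logical link, and that logical link stays up while any member link is up. The Python sketch below is a simplified illustration, not switch behavior in full detail. The class `PortChannel` and the interface names are hypothetical, and the model ignores any path-cost change that reduced bundle bandwidth might cause.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PortChannel:
    # Member interface name -> True if the physical link is up.
    members: Dict[str, bool] = field(default_factory=dict)

    def is_up(self) -> bool:
        """The logical link is up while at least one member link is up."""
        return any(self.members.values())

    def fail_link(self, name: str) -> bool:
        """Mark one member down.

        Returns True only if the logical link itself went down, which is
        the event that would trigger Spanning Tree convergence in this model.
        """
        was_up = self.is_up()
        self.members[name] = False
        return was_up and not self.is_up()


if __name__ == "__main__":
    uplink = PortChannel({"Gi0/17": True, "Gi0/18": True})
    print(uplink.fail_link("Gi0/17"))  # one member lost, bundle still up
    print(uplink.fail_link("Gi0/18"))  # last member lost, bundle down
```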

Note: The default gateway for the servers is the HSRP address of the Layer 3 aggregation switches. Failover times may be affected if the default gateway of the server is located on another device, such as a load balancer or firewall.

The recommended topology provides a high level of availability to the blade servers except in one failure scenario. If all the uplinks to each of the aggregation switches from a single Cisco Catalyst Blade Switch 3020 are unavailable, the server NICs homed to that Cisco Catalyst Blade Switch 3020 are not notified by default. The blade servers are unaware of the disconnection between the access layer switches (Cisco Catalyst Blade Switch 3020s) and the aggregation layer switches, so they continue to forward traffic. To address this breakdown in network connectivity, use one of the following methods:

● Use the NIC teaming features of the ProLiant blade servers

● Deploy the Layer 2 trunk failover feature in the Cisco Catalyst Blade Switch 3020s
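The idea behind Layer 2 trunk failover can be sketched as a simple rule: the server-facing (downstream) ports follow the combined state of the uplinks (upstream ports), so a server NIC sees link loss when every uplink is gone. The Python sketch below illustrates that rule only. The function name `downstream_state` and the port names are hypothetical, and it assumes all ports belong to a single tracking group.

```python
from typing import Dict, List


def downstream_state(upstream_up: Dict[str, bool], downstream: List[str]) -> Dict[str, bool]:
    """Return the state to apply to each downstream (server-facing) port.

    Downstream ports stay enabled while at least one upstream link is up.
    When every upstream link is down, all downstream ports are disabled so
    the attached server NICs detect the failure and NIC teaming can fail over.
    """
    any_uplink_up = any(upstream_up.values())
    return {port: any_uplink_up for port in downstream}


if __name__ == "__main__":
    servers = ["Gi0/1", "Gi0/2"]
    print(downstream_state({"Po1": True, "Po2": False}, servers))   # ports stay up
    print(downstream_state({"Po1": False, "Po2": False}, servers))  # ports disabled
```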

In addition, the NIC teaming features of the blade servers provide redundancy at the network-adapter level. Stagger the preferred primary NICs between the two Cisco switches in the enclosure to increase server availability. Assigning the primary NIC is a straightforward process. The NIC teaming software provides a GUI or a small configuration file, depending on the operating system, to construct the team. HP also offers network-aware teaming software to verify and detect network routes. For more information about these features, visit the ProLiant Essential Intelligent Network Pack at http://h18004.www1.hp.com/products/servers/proliantessentials/inp/index.html.
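Staggering the primary NICs amounts to alternating the preferred switch from one server to the next, so that the loss of either switch affects the active path of only about half the servers. The Python sketch below shows one way to plan such an assignment. The function name `stagger_primary_nics`, the server names, and the switch labels are hypothetical; the actual assignment is made in the NIC teaming software.

```python
from typing import Dict, List, Sequence


def stagger_primary_nics(
    servers: List[str],
    switches: Sequence[str] = ("switch-A", "switch-B"),
) -> Dict[str, str]:
    """Map each server to the switch that should carry its primary NIC.

    Servers are assigned in round-robin order so that primaries are spread
    evenly across the switches in the enclosure.
    """
    return {server: switches[i % len(switches)] for i, server in enumerate(servers)}


if __name__ == "__main__":
    plan = stagger_primary_nics(["blade1", "blade2", "blade3", "blade4"])
    for server, switch in plan.items():
        print(f"{server}: primary NIC on {switch}")
```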

By monitoring the health of a server farm, a load balancer can bypass the network failure by redirecting traffic to available servers, helping ensure fulfillment of end-user requests despite the network failure.
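The load balancer behavior described above can be sketched as health-aware server selection: servers that fail their health check, for example because their access switch has lost its uplinks, are skipped when requests are distributed. The Python sketch below is a minimal illustration assuming a round-robin policy. The function name `pick_servers` and the server names are hypothetical, and real load balancers use their own probes and algorithms.

```python
from itertools import cycle
from typing import Dict, List


def pick_servers(pool: Dict[str, bool], requests: int) -> List[str]:
    """Distribute `requests` round-robin across servers marked healthy.

    `pool` maps a server name to the result of its latest health check.
    Unhealthy or unreachable servers are bypassed entirely.
    """
    healthy = [name for name, ok in pool.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy servers available")
    rotation = cycle(healthy)
    return [next(rotation) for _ in range(requests)]


if __name__ == "__main__":
    # blade2 is unreachable because of a network failure.
    farm = {"blade1": True, "blade2": False, "blade3": True}
    print(pick_servers(farm, 4))
```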

The recommended network topology allows for traffic monitoring either locally or remotely using Switched Port Analyzer (SPAN). Local SPAN supports monitoring of network traffic within one switch, whereas Remote SPAN (RSPAN) allows the destination of mirrored traffic to be another switch within the data center. The source of mirrored traffic for a SPAN or RSPAN session can be one or more ports or VLANs.

Local SPAN is readily supported by the Cisco Catalyst Blade Switch 3020 over any of the external Gigabit Ethernet ports. This connection is an ideal location to attach an intrusion detection system (IDS) or other network-analysis device.

© 2008 Cisco Systems, Inc. All rights reserved. This document is Cisco Public Information.
