Intel® NetStructure™ ZT 7102 Chassis Management Module

CMM Redundancy, Synchronization and Failover

Note: The frequency of the ping to the first trap address can vary from one second to approximately 20 seconds.

2. Unhealthy Ethernet Switch:

A condition is asserted and a health score is computed if the active CMM’s corresponding Ethernet switch is not healthy or not present. Switch health is determined by the state of the HEALTHY# hardware signal coming from the Ethernet switch. Refer to the chassis specification to see which switch corresponds to each CMM. If the switches for both CMMs are unhealthy or not present in the chassis, a failover can still occur based on other failover conditions, depending on the CMM health scores.

3. Critical events on the active CMM:

A condition occurs if the active CMM has critical events for any of the CMM sensors (not chassis or blade sensors). Critical events are events associated with crossing an upper or lower nonrecoverable threshold of a sensor. If both CMMs have critical CMM events, then the number of major and minor CMM events is examined to decide if a failover should occur. The number of major events is compared, and if they are equal, the number of minor events is used.
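The tie-breaking order described above can be sketched in Python. This is an illustration only, not the CMM firmware logic: the `CmmEvents` type and the `should_fail_over` function are hypothetical names, the sketch assumes that fewer events means a healthier CMM, and it assumes no failover when all counts are equal (the text does not state that case). The real decision is made on health scores, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmmEvents:
    """Hypothetical tally of active events on a CMM's own sensors."""
    critical: int
    major: int
    minor: int


def should_fail_over(active: CmmEvents, standby: CmmEvents) -> bool:
    """Return True if the event counts favor the standby CMM.

    Assumption: fewer events means a healthier CMM.
    """
    if active.critical == 0:
        # No critical-event condition exists on the active CMM.
        return False
    if standby.critical == 0:
        # Only the active CMM has critical events.
        return True
    # Both CMMs have critical events: compare major events first,
    # then fall back to minor events if the major counts are equal.
    if active.major != standby.major:
        return active.major > standby.major
    # Assumption: equal minor counts do not trigger a failover.
    return active.minor > standby.minor
```

For example, with one critical event on each CMM, two major events on the active CMM, and one on the standby, the sketch favors the standby regardless of the minor counts.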

4.4.4 Scenarios That Failover to an Equally Healthy CMM

The following conditions will cause a failover only if the health score of the standby CMM is equal to that of the active CMM:

1. The ejector latch on the active CMM is opened.

2. A manual failover is executed on the active CMM.

4.4.5 Failover Timing

The time required to detect a failover condition and to synchronize data varies. For example, detecting network connection loss can take up to approximately 20 seconds. Complete synchronization typically takes 7 to 30 seconds, assuming both CMMs are fully booted and a healthy Ethernet network connection and IPMB connection exist between the two CMMs. Synchronization with a newly inserted CMM can take two minutes, because a newly inserted CMM needs that time to boot and initialize.

Once the CMM data is initially synchronized, failover happens instantaneously at the hardware level. However, the CMM software requires some time to initialize various components following a failover. Software-based remote management applications accessing the CMM will need to reconnect to the newly active CMM. The newly active CMM may respond with unexpected errors while initializing.
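Because the newly active CMM may return unexpected errors while it initializes, a remote management application may want to retry a failed request a few times before giving up. The following Python sketch shows one way to do that. The `call_with_retry` helper, its parameters, and the default attempt count and delay are assumptions chosen for illustration; they are not values from this specification, and `operation` stands in for whatever request the application makes to the CMM.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on any exception with a fixed delay.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                sleep(delay_s)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests. An application would typically pass a function that reconnects to the CMM address and reissues the request.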

4.4.6 Manual Failover

The following command can be issued to the active CMM to cause a failover manually to the standby CMM:

cmmset -l cmm -d failover -v 1

A manual failover can only be initiated on the active CMM. A failover will only occur if the standby CMM is at least as healthy as the active CMM. Once the command executes, the former standby CMM immediately becomes the active CMM.

If the failover cannot occur, the CLI indicates the reason, and a SEL event is recorded.
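A script that triggers a manual failover could wrap the command shown above, as in the following Python sketch. It assumes a Python interpreter is available in an environment where `cmmset` can be run against the active CMM, which this specification does not state. The function names and the timeout value are hypothetical. Because the exit codes and message format of `cmmset` are not described here, the sketch returns the raw result and leaves interpretation to the caller.

```python
import subprocess
from typing import List


def build_failover_command() -> List[str]:
    """Return the manual failover command as an argument list."""
    return ["cmmset", "-l", "cmm", "-d", "failover", "-v", "1"]


def request_failover(timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run the failover command and return its captured output.

    Assumption: cmmset is on the PATH of the environment running this.
    The caller should inspect stdout and stderr for the reason text
    that the CLI prints when a failover cannot occur.
    """
    return subprocess.run(
        build_failover_command(),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # exit-code meaning is not documented here
    )
```

Passing the command as a list rather than a single string avoids shell quoting issues.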

Technical Product Specification
