Intel MPCMM0001 manual Successful Failover/Reboot Recovery



Process Monitoring and Integrity

6.7.4 Successful Failover/Reboot Recovery

In this scenario, PMS detects a process fault. The configured recovery action is failover and reboot: fail over to the standby CMM and, once the failover succeeds, reboot the now-standby CMM. Both recovery actions succeed.

Table 9. Successful Failover/Reboot Recovery

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure |
| The recovery action specified is "failover & reboot". | Attempting failover & reboot recovery action | # | N/A | Configure |
| PMS executes a failover. Note: this step is skipped when running on the standby CMM. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A |
| When PMS is running on the standby CMM (failover was successful, or PMS was already running on the standby), PMS recovers the CMM by rebooting. Upon initialization of PMS after the reboot, the monitor will de-assert the event. | Monitoring initialized | # | De-assert | OK |
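The sequence in Table 9 can be sketched in code. This is an illustrative model only, not the PMS implementation: the names `Mechanism`, `Event`, and `successful_failover_reboot` are hypothetical, the UID column is omitted, and "Configure" in the Severity column is read here as "the severity configured for the monitored process".

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    """Detection mechanisms named in Table 9, mapped to their event prefixes."""
    EXISTENCE = "Process existence fault"
    THREAD_WATCHDOG = "Thread watchdog fault"
    INTEGRITY = "Process integrity fault"


@dataclass(frozen=True)
class Event:
    text: str        # event string from Table 9
    assertion: str   # "Assert", "De-assert", or "N/A"
    severity: str    # configured severity, or "OK"


def successful_failover_reboot(mechanism, configured_severity, on_standby):
    """Return (recovery steps, PMS events) for the successful scenario.

    Hypothetical helper: it only mirrors the rows of Table 9.
    """
    events = [
        Event(f"{mechanism.value}; attempting recovery",
              "Assert", configured_severity),
        Event("Attempting failover & reboot recovery action",
              "N/A", configured_severity),
    ]
    steps = []
    if not on_standby:
        # Failover events are generated by existing code outside process
        # monitoring, so no PMS event is recorded for this step.
        steps.append("failover")
    steps.append("reboot")
    # After the reboot, PMS initializes and de-asserts the fault event.
    events.append(Event("Monitoring initialized", "De-assert", "OK"))
    return steps, events
```

For example, calling `successful_failover_reboot(Mechanism.THREAD_WATCHDOG, "major", on_standby=False)` yields the steps failover then reboot, while `on_standby=True` skips the failover step, as the table notes. The severity label `"major"` is a placeholder value.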

6.7.5 Failed Failover/Reboot Recovery, Non-Critical

In this scenario, PMS is running on the active CMM and detects a fault in a monitored process. The severity configured for the process is not critical. The configured recovery action is failover and reboot: fail over to the standby CMM and, once the failover succeeds, reboot the now-standby CMM. The failover fails (for example, because the standby is not available). Because the monitored process is not of critical severity, the CMM is not rebooted.
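The decision in this scenario can be sketched as a small function. It is a minimal illustration under stated assumptions, not the PMS implementation: the function name and the severity labels are hypothetical, and the branch for a critical process is an inference from this section's title, since the text here only states that a non-critical process does not trigger the reboot when the failover fails.

```python
from typing import List


def failover_reboot_actions(failover_succeeded: bool, severity: str) -> List[str]:
    """Return the recovery actions carried out on the active CMM.

    Hypothetical helper; severity labels such as "minor" and "critical"
    are placeholders for the configured process severity.
    """
    if failover_succeeded:
        # Successful scenario: fail over, then reboot the now-standby CMM.
        return ["failover", "reboot"]
    if severity == "critical":
        # ASSUMPTION: not described in this section; inferred from the
        # "Non-Critical" qualifier in the section title.
        return ["reboot"]
    # Failed failover with a non-critical process: the CMM is not rebooted.
    return []
```

With a failed failover and a non-critical severity, the function returns an empty list, which matches the outcome described above.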

MPCMM0001 Chassis Management Module Software Technical Product Specification
