Process Monitoring and Integrity

Table 10. Failed Failover/Reboot Recovery, Non-Critical

Columns: Description, Event String, UID, Assert, Severity

Description: PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used.
Event String: Process existence fault; attempting recovery
    or Thread watchdog fault; attempting recovery
    or Process integrity fault; attempting recovery
UID: #
Assert: Assert
Severity: Configure

Description: The recovery action specified is "failover & reboot".
Event String: Attempting failover & reboot recovery action
UID: #
Assert: N/A
Severity: Configure

Description: The existing code generates the events for failover. They are separate from process monitoring events and are not described here.
Event String: PMS executes a failover
UID: -
Assert: N/A
Severity: N/A

Description: PMS detects that it is still running on the active CMM. The process is not critical and therefore the reboot operation will not be performed.
Event String: Failover & reboot recovery failure
UID: #
Assert: N/A
Severity: Configure

Description: No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, "Process Administrative Action" on page 53, for information about how to re-enable monitoring and de-assert the event.
Event String: Process existence fault; monitoring disabled
    or Thread watchdog fault; monitoring disabled
    or Process integrity fault; monitoring disabled
UID: #
Assert: Assert
Severity: Configure
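The event strings in Table 10 follow a fixed pattern: the fault type named by the detection mechanism, followed by either "attempting recovery" or "monitoring disabled". The Python sketch below illustrates that pattern and the process-monitoring events of a failed failover for a non-critical process. It is illustrative only: Mechanism, fault_event_string, and non_critical_failover_events are hypothetical names, and the ordering assumes the events are raised in the order the table lists them, which the table itself does not state.

```python
from enum import Enum


class Mechanism(Enum):
    """The three fault-detection mechanisms named in Table 10 (hypothetical type)."""
    EXISTENCE = "Process existence fault"
    THREAD_WATCHDOG = "Thread watchdog fault"
    INTEGRITY = "Process integrity fault"


def fault_event_string(mechanism: Mechanism, monitoring_disabled: bool) -> str:
    """Build the Table 10 event string for a detected fault.

    The suffix depends on whether PMS is about to attempt recovery or has
    stopped monitoring the process.
    """
    suffix = "monitoring disabled" if monitoring_disabled else "attempting recovery"
    return f"{mechanism.value}; {suffix}"


def non_critical_failover_events(mechanism: Mechanism) -> list[str]:
    """Process-monitoring events for a failed "failover & reboot" recovery
    of a non-critical process, assuming table order is raise order.

    The failover events themselves ("PMS executes a failover") come from
    separate code and are omitted, as in the table.
    """
    return [
        fault_event_string(mechanism, monitoring_disabled=False),
        "Attempting failover & reboot recovery action",
        "Failover & reboot recovery failure",
        fault_event_string(mechanism, monitoring_disabled=True),
    ]
```

Only the first and last events depend on the detection mechanism; the two middle events are the same whichever mechanism detected the fault.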

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6.7.6 Failed Failover/Reboot Recovery, Critical

In this scenario, PMS is running on the active CMM and detects a monitored process fault. The severity of the process is configured to be critical. The configured recovery action is to fail over to the standby CMM and, once the failover has executed successfully, reboot the now-standby CMM. The failover recovery action is unsuccessful (for example, because the standby CMM is not available). Because the monitored process is of critical severity, the reboot of the CMM is still performed.
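The critical and non-critical failed-failover scenarios differ in a single decision made after the failover attempt. A minimal Python sketch of that decision follows, assuming PMS knows whether it is still running on the active CMM and knows the configured severity of the faulted process; Action and next_action are hypothetical names, not part of the PMS interface.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical outcomes of a "failover & reboot" recovery action."""
    REBOOT_STANDBY = auto()      # failover succeeded: reboot the now-standby CMM
    REBOOT_THIS_CMM = auto()     # failover failed, critical process: reboot anyway
    DISABLE_MONITORING = auto()  # failover failed, non-critical: no reboot, stop monitoring


def next_action(still_on_active_cmm: bool, critical: bool) -> Action:
    """Decide what PMS does once the failover step has been attempted.

    still_on_active_cmm: True if PMS finds it is still running on the active
        CMM, i.e. the failover did not happen (for example, no standby).
    critical: True if the faulted process is configured as critical.
    """
    if not still_on_active_cmm:
        # Normal path: failover worked, so the old active CMM is now standby.
        return Action.REBOOT_STANDBY
    if critical:
        # Failed failover, critical process: the reboot is performed anyway.
        return Action.REBOOT_THIS_CMM
    # Failed failover, non-critical process: no reboot; monitoring is disabled.
    return Action.DISABLE_MONITORING
```

For example, next_action(still_on_active_cmm=True, critical=False) returns Action.DISABLE_MONITORING, which corresponds to the last two rows of Table 10.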

MPCMM0001 Chassis Management Module Software Technical Product Specification
