Process Monitoring and Integrity

Table 12. Existence Fault, Excessive Restarts, Escalate No Action (Sheet 2 of 2)

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure |
| PMS attempts to execute the escalated recovery action. Since the recovery action is "no action", PMS disables monitoring of the process. | Take no action specified for escalated recovery | # | N/A | Configure |
| No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event. | "Process existence fault; monitoring disabled" or "Thread watchdog fault; monitoring disabled" or "Process integrity fault; monitoring disabled" | # | Assert | Configure |
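The end state of this scenario can be pictured as a small state change: monitoring is switched off and the fault event stays asserted until an administrator intervenes. The Python sketch below is illustrative only. The `MonitoredProcess` class, its fields, and the `admin_reenable` method are hypothetical names chosen for this example, not part of the CMM software; the actual re-enable procedure is the one described in Section 6.7.11.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitoredProcess:
    """Hypothetical model of one process tracked by PMS (not a real CMM API)."""
    name: str
    monitoring_enabled: bool = True
    asserted_event: Optional[str] = None  # event string currently asserted, if any

    def escalate_no_action(self, fault_kind: str) -> str:
        """Escalated action is "no action": stop monitoring and assert the event."""
        self.monitoring_enabled = False
        self.asserted_event = f"{fault_kind} fault; monitoring disabled"
        return self.asserted_event

    def admin_reenable(self) -> None:
        """Assumed stand-in for the administrative action of Section 6.7.11."""
        self.monitoring_enabled = True
        self.asserted_event = None  # the event is de-asserted


proc = MonitoredProcess("example_daemon")  # process name is made up
print(proc.escalate_no_action("Process existence"))
proc.admin_reenable()
print(proc.monitoring_enabled, proc.asserted_event)
```

The point of the sketch is that nothing in this scenario clears the asserted event automatically; only the administrative action does.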

 

 

 

 

 

 

 

 

 

 

 

 

6.7.8 Excessive Restarts, Successful Escalate Failover/Reboot

In this scenario, PMS detects a process fault. The configured recovery action is to restart the process. However, PMS also detects that the process has exceeded the threshold for excessive restarts, so it executes the escalated recovery action instead. The configured escalated recovery action is to fail over to the standby CMM and, once the failover has succeeded, reboot the CMM that is now the standby. In this scenario, the escalated recovery action succeeds.
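The decision described above, restart unless the restart threshold has been exceeded, can be sketched as follows. This is a minimal Python illustration under stated assumptions: this section does not say how the threshold is defined, so the restart count, the sliding time window, and the names `RestartTracker` and `choose_action` are all hypothetical.

```python
from collections import deque
from typing import Deque


class RestartTracker:
    """Counts recent restarts of one process (hypothetical helper, not PMS code)."""

    def __init__(self, max_restarts: int, window_seconds: float) -> None:
        self.max_restarts = max_restarts      # assumed threshold
        self.window_seconds = window_seconds  # assumed sliding window
        self._restart_times: Deque[float] = deque()

    def record_restart(self, now: float) -> None:
        self._restart_times.append(now)

    def is_excessive(self, now: float) -> bool:
        # Forget restarts that fall outside the window.
        while self._restart_times and now - self._restart_times[0] > self.window_seconds:
            self._restart_times.popleft()
        return len(self._restart_times) >= self.max_restarts


def choose_action(tracker: RestartTracker, now: float) -> str:
    """Return the action taken for a newly detected fault."""
    if tracker.is_excessive(now):
        return "failover and reboot"  # configured escalated recovery action
    tracker.record_restart(now)
    return "restart process"          # configured recovery action


tracker = RestartTracker(max_restarts=3, window_seconds=60.0)
for t in (0.0, 10.0, 20.0, 30.0):
    print(t, choose_action(tracker, t))
```

With these assumed settings, the first three faults each lead to a restart, and the fourth fault triggers the escalated action.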

Table 13. Excessive Restarts, Successful Escalate Failover/Reboot

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism used to detect the fault (existence, thread watchdog, or integrity) determines which of the event strings is used. | "Process existence fault; attempting recovery" or "Thread watchdog fault; attempting recovery" or "Process integrity fault; attempting recovery" | # | Assert | Configure |
| The recovery action specified is "restart process". | Attempting process restart recovery action | # | N/A | Configure |
| PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure |
| The escalated recovery action specified is "failover and reboot". | Attempting failover & reboot escalated recovery action | # | N/A | Configure |
| PMS executes a failover. Note: this step is skipped when PMS is running on the standby CMM. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A |
| Once PMS is running on the standby CMM (the failover was successful, or PMS was already running on the standby), PMS recovers the CMM by rebooting. Upon initialization of PMS after the reboot, the monitor de-asserts the event. | Monitoring initialized | # | De-assert | OK |
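The order of events and actions in Table 13 can be traced with a short Python sketch. It is illustrative only: the `scenario_trace` function and the `EVENT`/`ACTION` labels are hypothetical, and only the event strings and the rule that the failover step is skipped on the standby CMM come from the table.

```python
from typing import List

# The three detection mechanisms named in Table 13.
FAULT_KINDS = ("Process existence", "Thread watchdog", "Process integrity")


def scenario_trace(fault_kind: str, on_active_cmm: bool) -> List[str]:
    """Return the ordered steps of the scenario (hypothetical helper)."""
    if fault_kind not in FAULT_KINDS:
        raise ValueError(f"unknown fault kind: {fault_kind!r}")
    trace = [
        f"EVENT  {fault_kind} fault; attempting recovery (Assert)",
        "EVENT  Attempting process restart recovery action",
        "EVENT  Recovery failure due to excessive restarts",
        "EVENT  Attempting failover & reboot escalated recovery action",
    ]
    if on_active_cmm:
        # Failover events are generated by the existing failover code, not by PMS.
        trace.append("ACTION failover to the standby CMM")
    trace.append("ACTION reboot the CMM that is now the standby")
    trace.append("EVENT  Monitoring initialized (De-assert)")
    return trace


for line in scenario_trace("Thread watchdog", on_active_cmm=True):
    print(line)
```

Calling the function with `on_active_cmm=False` yields the same trace without the failover step, which matches the note in the table.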

 

 

 

 

 

 

 

 

 

MPCMM0001 Chassis Management Module Software Technical Product Specification
