Process Monitoring and Integrity

Table 11. Failed Failover/Reboot Recovery, Critical

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure |
| The recovery action specified is "failover & reboot". | Attempting failover & reboot recovery action | # | N/A | Configure |
| PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A |
| PMS detects that it is still running on the active CMM. The process is critical and therefore the reboot operation is performed. Upon initialization of PMS after the reboot, the monitor will de-assert the event. | Monitoring initialized | # | De-assert | OK |
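
The event sequence in Table 11 can be sketched in a few lines of Python. This is only an illustration of how the detection mechanism selects the first event string and which events follow; the enum, dictionary, and function names are hypothetical and are not part of the PMS software. The event strings, assert states, and severities are copied from the table.

```python
from enum import Enum


class Mechanism(Enum):
    """Detection mechanisms named in the table (enum names are hypothetical)."""
    EXISTENCE = "existence"
    THREAD_WATCHDOG = "thread watchdog"
    INTEGRITY = "integrity"


# Event strings copied from the Event String column of the table.
FAULT_EVENT_STRINGS = {
    Mechanism.EXISTENCE: "Process existence fault; attempting recovery",
    Mechanism.THREAD_WATCHDOG: "Thread watchdog fault; attempting recovery",
    Mechanism.INTEGRITY: "Process integrity fault; attempting recovery",
}


def failover_reboot_events(mechanism):
    """Return (event string, assert state, severity) tuples in table order.

    The failover row is omitted: its events come from the existing failover
    code and are not process monitoring events.
    """
    return [
        (FAULT_EVENT_STRINGS[mechanism], "Assert", "Configure"),
        ("Attempting failover & reboot recovery action", "N/A", "Configure"),
        ("Monitoring initialized", "De-assert", "OK"),
    ]


# Illustrative use: list the events for a thread watchdog fault.
for event in failover_reboot_events(Mechanism.THREAD_WATCHDOG):
    print(event)
```

Only the first event depends on the detection mechanism; the remaining events are the same for all three mechanisms.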


6.7.7 Excessive Restarts, Escalate No Action

In this scenario, PMS detects a process fault, and the configured recovery action is to restart the process. However, PMS also detects that the process has exceeded the threshold for excessive process restarts, so it executes the escalation action, which is configured for no action.
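
The decision described above can be sketched as a simple threshold check. This is a minimal illustration, not the PMS implementation: the function and parameter names are hypothetical, and it assumes that "exceeded" means the restart count is strictly greater than the threshold. How PMS actually counts restarts (for example, over what time window) is not stated here.

```python
def select_action(restart_count, restart_threshold,
                  recovery_action="process restart",
                  escalation_action="no action"):
    """Pick the action for a faulty process (hypothetical helper).

    Assumption: the escalation action applies only when the restart count
    is strictly greater than the configured threshold.
    """
    if restart_count > restart_threshold:
        return escalation_action
    return recovery_action


# Illustrative values only; real thresholds are set in the PMS configuration.
print(select_action(restart_count=2, restart_threshold=3))
print(select_action(restart_count=4, restart_threshold=3))
```

With the escalation action set to no action, as in this scenario, a process that keeps failing is no longer restarted once the threshold is exceeded.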

Table 12. Existence Fault, Excessive Restarts, Escalate No Action (Sheet 1 of 2)

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure |
| The recovery action specified is "process restart". | Attempting process restart recovery action | # | N/A | Configure |

50 MPCMM0001 Chassis Management Module Software Technical Product Specification
