Process Monitoring and Integrity

6.7.3 Successful Failover/Restart Recovery

In this scenario, PMS detects a process fault. The configured recovery action is to fail over to the standby CMM and then restart the failed process. PMS successfully recovers the process by restarting it.
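The ordering in this scenario can be sketched in a few lines of Python. This is an illustrative model only, not the CMM implementation: the names `RecoveryAction`, `STEPS`, and `run_recovery` are hypothetical, and the step callables stand in for whatever PMS actually does to fail over and to restart a process.

```python
from enum import Enum
from typing import Callable, Dict, List


class RecoveryAction(Enum):
    # Hypothetical identifier for the action configured in this scenario.
    FAILOVER_AND_RESTART = "failover and restart"


# Ordered steps per action: failover to the standby CMM comes first,
# then the failed process is restarted.
STEPS: Dict[RecoveryAction, List[str]] = {
    RecoveryAction.FAILOVER_AND_RESTART: ["failover", "restart"],
}


def run_recovery(action: RecoveryAction,
                 handlers: Dict[str, Callable[[], bool]]) -> bool:
    """Run each step of the action in order and report the restart outcome."""
    results: Dict[str, bool] = {}
    for step in STEPS[action]:
        results[step] = handlers[step]()
    # Assumption: recovery counts as successful when the restart succeeds,
    # which matches how this scenario describes the outcome.
    return results["restart"]
```

A caller would supply one callable per step, for example `run_recovery(RecoveryAction.FAILOVER_AND_RESTART, {"failover": do_failover, "restart": do_restart})`, where `do_failover` and `do_restart` are placeholders.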

Table 8. Successful Failover/Restart Recovery

| Description | Event String | UID | Assert | Severity |
|---|---|---|---|---|
| PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | "Process existence fault; attempting recovery" or "Thread watchdog fault; attempting recovery" or "Process integrity fault; attempting recovery" | # | Assert | Configure |
| The recovery action specified is "failover and restart". | "Attempting process failover & restart recovery action" | # | N/A | Configure |
| PMS executes a failover. Note: this step is skipped when running on the standby CMM. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A |
| PMS was successfully able to restart the process. Note: PMS will execute this step even if the failover is unsuccessful (standby not available, unhealthy, etc.). | "Recovery successful" | # | De-assert | OK |
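Read top to bottom, the table gives the order of process monitoring events expected for this scenario. The sketch below encodes that order in Python so it could be compared against a captured event log. The names `Event`, `FAULT_STRINGS`, and `expected_events` are hypothetical. The UID column ("#") is omitted because the table gives no concrete values, and the failover step is left out because its events come from the existing failover code and are not described in this section.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    text: str      # event string from the table
    state: str     # "Assert", "De-assert", or "N/A"
    severity: str  # severity column value as printed ("Configure" or "OK")


# Event string for each detection mechanism, as listed in the table.
FAULT_STRINGS = {
    "existence": "Process existence fault; attempting recovery",
    "thread watchdog": "Thread watchdog fault; attempting recovery",
    "integrity": "Process integrity fault; attempting recovery",
}


def expected_events(mechanism: str) -> List[Event]:
    """Return the process monitoring events for a successful
    failover/restart recovery, in table order."""
    return [
        Event(FAULT_STRINGS[mechanism], "Assert", "Configure"),
        Event("Attempting process failover & restart recovery action",
              "N/A", "Configure"),
        Event("Recovery successful", "De-assert", "OK"),
    ]
```

Calling `expected_events("thread watchdog")` yields three entries: the thread watchdog fault (asserted), the recovery action notice, and the de-asserting "Recovery successful" event.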

MPCMM0001 Chassis Management Module Software Technical Product Specification
