Intel MPCMM0001 manual Process Administrative Action

Models: MPCMM0001


Process Monitoring and Integrity

recovery action is unsuccessful (standby is not available, etc.). The process being monitored is of critical severity, so the CMM reboot is still executed even though this CMM is still the active CMM.
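The escalation rule described above can be expressed as a small decision function. This is a minimal sketch, not the PMS implementation: the function name reboot_after_escalation and its two boolean inputs are hypothetical, and the branch for a non-critical process with no standby available is an assumption, because this excerpt does not describe that case.

```python
def reboot_after_escalation(standby_available: bool, process_critical: bool) -> bool:
    """Return True if the CMM reboots after a "failover and reboot" escalation."""
    if standby_available:
        # Failover to the standby CMM can proceed, followed by the reboot.
        return True
    if process_critical:
        # Failover is unsuccessful (no standby), but the process is critical,
        # so the reboot still runs even though this CMM is still active.
        return True
    # ASSUMPTION: the excerpt does not describe a non-critical process with
    # no standby; this sketch assumes the reboot is withheld in that case.
    return False


for standby in (True, False):
    for critical in (True, False):
        decision = reboot_after_escalation(standby, critical)
        print(f"standby={standby} critical={critical} -> reboot={decision}")
```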

Table 15. Excessive Restarts, Failed Escalate Failover/Reboot, Critical

Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery" or "Thread watchdog fault; attempting recovery" or "Process integrity fault; attempting recovery" | # | Assert | Configure
The recovery action specified is "restart process". | Attempting process restart recovery action | # | N/A | Configure
PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure
The escalated recovery action specified is "failover and reboot". | Attempting failover & reboot escalated recovery action | # | N/A | Configure
PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is critical, and therefore the reboot operation is performed. Upon initialization of PMS after the reboot, the monitor de-asserts the event. | Monitoring initialized | # | De-assert | OK
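The sequence of events in Table 15 can be traced with a small model of one monitored process. This is a sketch under stated assumptions: the class MonitoredProcess, its methods, the mechanism keys, and the max_restarts threshold are hypothetical; "restarted excessively" is modeled as a plain counter with no time window; and the event strings are copied from the table. The failover itself is not modeled, since the table notes that failover events are generated separately.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Event strings from Table 15, keyed by a hypothetical mechanism name.
FAULT_EVENTS = {
    "existence": "Process existence fault; attempting recovery",
    "thread_watchdog": "Thread watchdog fault; attempting recovery",
    "integrity": "Process integrity fault; attempting recovery",
}
RESTART_EVENT = "Attempting process restart recovery action"
EXCESSIVE_EVENT = "Recovery failure due to excessive restarts"
ESCALATE_EVENT = "Attempting failover & reboot escalated recovery action"
INIT_EVENT = "Monitoring initialized"


@dataclass
class MonitoredProcess:
    name: str
    # Hypothetical threshold: the table only says "restarted excessively".
    max_restarts: int
    restarts: int = 0
    # (event string, Assert column value) pairs in the order they occur.
    log: List[Tuple[str, str]] = field(default_factory=list)

    def on_fault(self, mechanism: str) -> None:
        """Record the events for one detected fault and pick the recovery action."""
        self.log.append((FAULT_EVENTS[mechanism], "Assert"))
        if self.restarts < self.max_restarts:
            # Configured recovery action: "restart process".
            self.restarts += 1
            self.log.append((RESTART_EVENT, "N/A"))
        else:
            # Restart budget exhausted: escalate to "failover and reboot".
            self.log.append((EXCESSIVE_EVENT, "N/A"))
            self.log.append((ESCALATE_EVENT, "N/A"))

    def on_reboot_complete(self) -> None:
        """PMS initializes after the reboot and de-asserts the fault event."""
        self.restarts = 0
        self.log.append((INIT_EVENT, "De-assert"))


p = MonitoredProcess("example_daemon", max_restarts=2)
for _ in range(3):
    p.on_fault("existence")
p.on_reboot_complete()
for event, state in p.log:
    print(f"{state:<10}{event}")
```

With max_restarts=2 and three existence faults, the log shows two restarts, then the excessive-restart failure and the escalation, and finally the "Monitoring initialized" de-assert after the reboot.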


6.7.11 Process Administrative Action

In this scenario, PMS has detected a fault in a process, but has not been able to recover the process (recovery is configured for no action, etc.). This causes PMS to operationally disable monitoring of the process. To re-enable monitoring of the process, an operator must administratively lock the process, take the necessary actions to fix the process, and administratively unlock the process.

Table 16. Administrative Action

Description | Event String | UID | Assert | Severity
Operator administratively locks monitoring of the process | None | - | N/A | N/A
Operator takes actions to fix the problem | N/A | - | N/A | N/A
Operator administratively unlocks monitoring of the process causing monitoring to restart | Monitoring initialized | # | De-assert | OK
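The lock, fix, and unlock procedure in Table 16 behaves like a small state machine over an operational state and an administrative state. The sketch below is illustrative only: the class and method names, the enum values, and the choice to reject an unlock that was not preceded by a lock are assumptions, not documented PMS behavior.

```python
from enum import Enum
from typing import List, Tuple


class OperState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"  # PMS could not recover the process


class AdminState(Enum):
    UNLOCKED = "unlocked"
    LOCKED = "locked"


class ProcessMonitor:
    """Hypothetical model of one monitored process's lock/unlock cycle."""

    def __init__(self) -> None:
        self.oper_state = OperState.ENABLED
        self.admin_state = AdminState.UNLOCKED
        self.events: List[Tuple[str, str]] = []

    def recovery_failed(self) -> None:
        # Unrecoverable fault: PMS operationally disables monitoring.
        self.oper_state = OperState.DISABLED

    def lock(self) -> None:
        # Table 16: administratively locking generates no event.
        self.admin_state = AdminState.LOCKED

    def unlock(self) -> None:
        # Sketch choice (assumption): an unlock must follow a lock.
        if self.admin_state is not AdminState.LOCKED:
            raise RuntimeError("process is not administratively locked")
        self.admin_state = AdminState.UNLOCKED
        # Unlocking restarts monitoring and de-asserts the event.
        self.oper_state = OperState.ENABLED
        self.events.append(("Monitoring initialized", "De-assert"))


m = ProcessMonitor()
m.recovery_failed()  # monitoring is now operationally disabled
m.lock()             # operator locks, then fixes the problem
m.unlock()           # monitoring restarts
print(m.oper_state, m.events)
```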


MPCMM0001 Chassis Management Module Software Technical Product Specification

53
