A hardware event monitor watches the hardware for unusual behavior (known as an event). When an event occurs, the monitor sends a message to EMS, which notifies the system administrator and suggests how to correct the problem.

A disk monitor is provided for all disks attached to the A7173A adapter.

For more information about EMS and other online diagnostic tools, see the documents at: http://docs.hp.com/en/diag.html#2%20Online%20Diagnostics

HP Offline Diagnostics Environment (ODE)

The A7173A adapter supports HP’s Offline Diagnostics Environment (ODE). ODE is an offline support tools platform for troubleshooting systems that are running without an operating system or systems that cannot be tested using online tools. The offline environment is also useful for testing that needs to be done before a system is booted.

ODE provides a user-friendly interface for diagnostics and utilities that have been developed to run in this environment.

The Offline Diagnostics Environment has a distributed architecture consisting of several modules. Each module has a specific function and uses well-defined protocols to communicate with the other modules.

You can use ODE with either a command-line interface or a menu-driven interface. The command-line interface enables you to select specific tests and utilities to run on a specific hardware module. The menu-driven interface enables you to specify the hardware module to be tested, then automatically selects and runs the necessary tests.

The Offline Diagnostics Environment consists of:

A Test Controller, which acts as the user interface and launches the execution of the Test Modules.

Test Modules, which consist of diagnostic or utility programs designed to execute within ODE. These modules exercise or diagnose user-specified hardware units.

A System Library (SysLib), which consists of a set of common routines for use by both the Test Controller and the Test Modules. These routines perform I/O, string parsing, and system control.
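The relationship among the three components can be pictured with a small Python model. This is only an illustrative sketch and is not ODE code: the class and function names (Controller, SysLib, example_disk_test), the module name "disktest", and the command format are all hypothetical and chosen for this example.

```python
from typing import Callable, Dict, List


class SysLib:
    """Common routines shared by the controller and the test modules."""

    @staticmethod
    def parse_command(line: str) -> List[str]:
        # String parsing: split a command line into whitespace-separated tokens.
        return line.split()

    @staticmethod
    def write(message: str) -> str:
        # I/O: return the formatted text so the caller decides where it goes.
        return f"[ODE] {message}"


class Controller:
    """Stand-in for the Test Controller: user interface that launches modules."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[[str], bool]] = {}

    def register(self, name: str, module: Callable[[str], bool]) -> None:
        self.modules[name] = module

    def run(self, command: str) -> str:
        # Hypothetical command form: "<module-name> <hardware-unit>"
        tokens = SysLib.parse_command(command)
        if len(tokens) != 2 or tokens[0] not in self.modules:
            return SysLib.write("unknown command")
        name, unit = tokens
        passed = self.modules[name](unit)
        return SysLib.write(f"{name} on {unit}: {'PASS' if passed else 'FAIL'}")


def example_disk_test(unit: str) -> bool:
    # Placeholder test module; a real module would exercise the hardware unit.
    return unit.startswith("disk")


controller = Controller()
controller.register("disktest", example_disk_test)
print(controller.run("disktest disk0"))
```

In this model the controller never touches hardware itself. It only parses the request and hands the named unit to a registered module, which mirrors the division of labor described above.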

For more information about ODE, see the documents at: http://docs.hp.com/en/diag.html#3%20Offline%20Diagnostics

PCI Error Recovery

The PCI Error Recovery feature provides the ability to detect, isolate, and automatically recover from a PCI error, avoiding a system crash. PCI Error Recovery is included with the HP-UX 11i v3 operating system, and it is enabled by default.

NOTE: PCI Error Recovery is not supported on all platforms. To determine if PCI Error Recovery is supported on your system, see the PCI Error Recovery Support Matrix:

http://www.docs.hp.com/en/ha.html#PCI%20Error%20Recovery

With the PCI Error Recovery feature enabled, if an error occurs on a PCI bus containing an I/O card that supports PCI Error Recovery, the following events occur:

The PCI bus is quarantined to isolate the system from further I/O and prevent the error from damaging the system.

The PCI Error Recovery feature attempts to recover from the error and reinitialize the bus so that I/O can resume.

If the PCI Error Recovery feature is disabled and an error occurs on a PCI bus, a Machine Check Abort (MCA) or a High Priority Machine Check (HPMC) occurs, and the system crashes.
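The two documented cases can be summarized as a simple decision function. The Python sketch below is a reading aid only, not an HP-UX interface: the function name pci_error_events and its parameters are hypothetical. The case where the feature is enabled but the I/O card does not support it is not described in this section, so the sketch marks it as such rather than guessing.

```python
from typing import List


def pci_error_events(recovery_enabled: bool, card_supports_recovery: bool) -> List[str]:
    """Return the sequence of events described above for a PCI bus error."""
    if not recovery_enabled:
        # Feature disabled: the error is fatal to the running system.
        return ["MCA or HPMC occurs", "system crashes"]
    if card_supports_recovery:
        # Feature enabled and the card supports it: isolate, then try to recover.
        return [
            "quarantine the PCI bus",
            "attempt recovery and reinitialize the bus",
        ]
    # Feature enabled but the card lacks support: not covered by this section.
    return ["behavior not described in this section"]


for enabled, supported in [(True, True), (False, True)]:
    print(enabled, supported, pci_error_events(enabled, supported))
```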
