Error Handling and Reporting

Remove/replace procedures

Service labels that contain remove/replace procedures are often found on a system cover or in other locations accessible to the servicer. These labels provide step-by-step procedures, including diagrams, that detail how to remove and replace certain serviceable hardware components.

Arrows

Numbered arrows indicate the order of operation and the service direction of components. Some serviceable parts, such as latches, levers, and touch points, must be pulled or pushed in a specific direction and order for their mechanisms to engage or disengage. Arrows therefore make these components easier to service.

Packaging for service

The following service enhancements are included in the physical packaging of the systems to facilitate service:

•Color coding (touch points): Terracotta-colored touch points indicate that a component (FRU/CRU) can be concurrently maintained. Blue-colored touch points mark components that cannot be concurrently maintained, that is, components that require the system to be turned off for removal or repair.

•Tool-less design: Selected IBM systems support tool-less or simple-tool designs. These designs require either no tools or only simple tools, such as flathead screwdrivers, to service the hardware components.

•Positive retention: Positive retention mechanisms help ensure proper connections between hardware components, such as cables to connectors and between two cards that attach to each other. Without positive retention, hardware components risk becoming loose during shipping or installation, which prevents a good electrical connection. Positive retention mechanisms such as latches, levers, thumbscrews, pop Nylatches (U-clips), and cables are included to help prevent loose connections and to aid in installing (seating) parts correctly. These positive retention items do not require tools.
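The touch-point color convention above amounts to a simple lookup. The following Python sketch encodes it for illustration only; `TouchPoint` and `requires_power_off` are hypothetical names and are not part of any IBM service tool or procedure.

```python
from enum import Enum


class TouchPoint(Enum):
    """Touch-point colors described in the packaging section."""
    TERRACOTTA = "terracotta"  # component can be concurrently maintained
    BLUE = "blue"              # system must be turned off for removal or repair


def requires_power_off(color: TouchPoint) -> bool:
    """Return True if a FRU/CRU with this touch-point color needs the system off.

    Hypothetical helper: it only encodes the color convention.
    """
    if color is TouchPoint.TERRACOTTA:
        return False
    if color is TouchPoint.BLUE:
        return True
    raise ValueError(f"unknown touch-point color: {color!r}")


for color in TouchPoint:
    action = "power off first" if requires_power_off(color) else "concurrent repair"
    print(f"{color.value}: {action}")
```

Raising on an unknown color, rather than defaulting to either answer, keeps the sketch from silently treating an unrecognized part as safe to service concurrently.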

Error Handling and Reporting

In the unlikely event of a hardware or environmentally induced system failure, the system's runtime error capture capability systematically analyzes the hardware error signature to determine the cause of the failure. The analysis result is stored in system NVRAM. When the system is successfully restarted, either manually or automatically, the error is reported to the operating system. Error Log Analysis (ELA) can then be used to display the failure cause and the physical location of the failing hardware.
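The sequence above (capture the error, store the analysis in NVRAM, report it to the operating system after a successful restart, view it with ELA) can be modeled as a small stateful object. The Python sketch below is a toy illustration under stated assumptions: `SystemModel`, its methods, and the location string are hypothetical, and it does not reflect the actual firmware, NVRAM format, or ELA interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorRecord:
    cause: str     # failure cause determined from the hardware error signature
    location: str  # physical location of the failing hardware


@dataclass
class SystemModel:
    """Toy model: error capture -> NVRAM -> report to the OS after a restart."""
    nvram: list[ErrorRecord] = field(default_factory=list)
    os_log: list[ErrorRecord] = field(default_factory=list)

    def capture_failure(self, cause: str, location: str) -> None:
        # The runtime error capture stores its analysis result in NVRAM.
        self.nvram.append(ErrorRecord(cause, location))

    def restart(self, successful: bool) -> None:
        # Errors reach the OS only once a restart (manual or automatic) succeeds.
        if not successful:
            return
        for record in self.nvram:
            if record not in self.os_log:  # avoid reporting the same record twice
                self.os_log.append(record)

    def error_log_analysis(self) -> list[tuple[str, str]]:
        # ELA view: failure cause plus physical location of the failing hardware.
        return [(r.cause, r.location) for r in self.os_log]


system = SystemModel()
system.capture_failure("illustrative memory fault", "hypothetical-location-A")
system.restart(successful=False)
print(system.error_log_analysis())  # []
system.restart(successful=True)
print(system.error_log_analysis())
```

Records stay in NVRAM after being reported because the text does not say they are cleared; the duplicate check only keeps a second restart from reporting the same record again.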

With the integrated Service Processor, the system can automatically send an alert over a phone line to a pager, or call for service, in the event of a critical system failure. A hardware fault also turns on the amber system fault LED on the system unit to alert the user to an internal hardware problem. The operator can also set this indicator to blink to help identify the system.

For identification, the blue locate LED on the enclosure and at the system level will turn on solid. The amber system fault LED will be on solid when an error condition occurs.

On POWER7 processor-based servers, hardware and software failures are recorded in the system log. When an HMC is attached, an ELA routine analyzes the error, forwards the event to the Service Focal Point (SFP) application running on the HMC, and notifies the system administrator that it has isolated a likely cause of the system problem. The Service Processor event log also records unrecoverable checkstop conditions, forwards them to the SFP application, and notifies the system administrator. After the information is logged in the SFP application, and if the system is properly configured, a call-home service request is initiated and the pertinent failure data, including service parts information and part locations, is sent to an IBM Service organization. Customer contact information and specific system-related

IBM United States Hardware Announcement 110-009

IBM is a registered trademark of International Business Machines Corporation

Models: 750