12 SPARC Enterprise T5440 Server Service Manual • June 2011
■ “Memory Fault Handling Overview” on page 24
Server Diagnostics Overview
You can use a variety of diagnostic tools, commands, and indicators to monitor and
troubleshoot a server:
■ LEDs – Provide a quick visual notification of the status of the server and of some
of the FRUs. See “Detecting Faults Using LEDs” on page 34.
■ ILOM firmware – This system firmware runs on the service processor. In addition
to providing the interface between the hardware and OS, ILOM also tracks and
reports the health of key server components. ILOM works closely with POST and
Oracle Solaris Operating System (Oracle Solaris OS) Predictive Self-Healing
technology to keep the system up and running even when there is a faulty
component. See “ILOM Overview” on page 18.
■ Power-on self-test (POST) – POST performs diagnostics on system components
upon system reset to ensure the integrity of those components. POST is
configurable and works with ILOM to take faulty components offline if needed.
See “POST Fault Management Overview” on page 22.
■ Oracle Solaris OS Predictive Self-Healing (PSH) – This technology continuously
monitors the health of the processor and memory, and works with ILOM to take a
faulty component offline if needed. The Predictive Self-Healing technology
enables systems to accurately predict component failures and mitigate many
serious problems before they occur. See “Identifying Faults Detected by PSH” on
page 48.
■ Log files and console messages – The Oracle Solaris OS log files and the ILOM system
event log can be accessed and displayed on the device of your choice. For more
information, see “Detecting Faults (Oracle Solaris OS Files and Commands)” on
page 39 and “Detecting Faults (ILOM Event Log)” on page 40.
■ Oracle VTS software – The Oracle VTS software exercises the system, provides
hardware validation, and discloses possible faulty components with
recommendations for repair. See “About Oracle VTS Software” on page 42.
The LEDs, ILOM, Oracle Solaris OS PSH, and many of the log files and console
messages are integrated. For example, when the Oracle Solaris software detects a
fault, it displays the fault, logs it, and passes the information to ILOM, where the
fault is also logged. Depending on the fault, one or more LEDs might be illuminated.
See TABLE: Diagnostic Flowchart Actions on page 15 and Parameter on page 30 for
an approach for using the server diagnostics to identify a faulty field-replaceable
unit (FRU). The diagnostics you use, and the order in which you use them, depend
on the nature of the problem you are troubleshooting, so you might perform some
actions and not others.
Before referring to the flowchart, perform some basic troubleshooting tasks: