Intel® Server Board SE7520JR2

Error Reporting and Handling

6. Error Reporting and Handling

This section defines how errors are handled. It covers the role of the BIOS in error handling and the interaction between the BIOS, platform hardware, and server management firmware. It also describes error-logging techniques and defines beep codes and POST messages.

Note: The generic term “BMC” may be used throughout this section when the feature or function being described is common to both the mBMC and the Sahalee BMC. If a described feature or function is unique to one of them, the specific management controller is named.

6.1 Fault Resilient Booting (FRB)

Fault Resilient Booting (FRB) is a set of BIOS and BMC algorithms and hardware support that allow a multiprocessor system to boot in case of failure of the bootstrap processor (BSP) under certain conditions. FRB functionality will differ depending on whether standard onboard platform instrumentation is used (mBMC) or whether an Intel Management Module is used.

With on-board platform instrumentation, the mBMC cannot disable a failed or failing processor that is detected during POST, so the system may or may not continue to boot. An FRB2 error is logged to the System Event Log (SEL) and an error is displayed at POST. FRB2 is a BIOS-based algorithm that uses the mBMC IPMI watchdog timer to protect against BIOS hangs during the POST process.
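The watchdog behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `WatchdogModel`, `run_post`, the 360-second timeout, and the SEL message text are hypothetical names and values chosen for illustration, not part of the BIOS, the mBMC firmware, or the IPMI command set. It models a timer that is armed at the start of POST, disabled if POST completes, and that records an FRB2 entry in a simulated SEL if POST hangs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WatchdogModel:
    """Toy stand-in for the BMC watchdog timer (not a real IPMI interface)."""
    remaining_s: Optional[float] = None            # None means the timer is disabled
    sel: List[str] = field(default_factory=list)   # simulated System Event Log

    def arm(self, timeout_s: float) -> None:
        self.remaining_s = timeout_s

    def disable(self) -> None:
        self.remaining_s = None

    def advance(self, elapsed_s: float) -> bool:
        """Advance simulated time; return True if the timer expired."""
        if self.remaining_s is None:
            return False
        self.remaining_s -= elapsed_s
        if self.remaining_s <= 0:
            self.sel.append("FRB2 watchdog timeout")  # hypothetical message text
            self.remaining_s = None
            return True
        return False


def run_post(step_durations_s, timeout_s, hang_at=None):
    """Simulate POST steps; hang_at is the index of a step that never returns."""
    wd = WatchdogModel()
    wd.arm(timeout_s)
    for index, duration in enumerate(step_durations_s):
        if index == hang_at:
            # A hung BIOS never disables the timer, so it eventually expires.
            wd.advance(timeout_s)
            return wd
        if wd.advance(duration):
            return wd
    wd.disable()  # POST finished, so the BIOS turns the timer off
    return wd


if __name__ == "__main__":
    print(run_post([10, 20, 30], timeout_s=360).sel)
    print(run_post([10, 20, 30], timeout_s=360, hang_at=1).sel)
```

In this model the only consequence of an expiry is the SEL entry, which mirrors the mBMC case where no processor can be disabled.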

On systems that have an Intel Management Module installed, several different levels of FRB are supported: FRB1, FRB2, FRB3, and OS Watchdog Timer. The FRB algorithms detect BSP failures and take steps to disable that processor and reset the system so another processor will run as the BSP.

6.1.1 FRB1 – BSP Self-Test Failures

The BIOS provides an FRB1 timer. Early in POST, the BIOS checks the Built-in Self Test (BIST) results of the BSP. If the BSP fails BIST, the BIOS requests the Sahalee BMC to disable the BSP. The Sahalee BMC disables the BSP, selects a new BSP, and generates a system reset. If no alternate processor is available, the Sahalee BMC generates a beep code and halts the system. If the Sahalee BMC is not installed, the BIOS can only notify the user that the BIST failed; no processors are disabled.

The BIST failure is displayed during POST and an error is logged to the SEL.
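The FRB1 decision flow described above can be summarized as a small function. This is a minimal sketch, not firmware code: `frb1_actions` and the action strings are hypothetical, and the relative order of the display, logging, and BMC steps is an assumption, since the text does not specify it. The sketch also assumes that when no alternate processor exists, the BMC goes straight to the beep code and halt.

```python
from typing import List


def frb1_actions(bist_passed: bool,
                 sahalee_bmc_installed: bool,
                 alternate_cpu_available: bool) -> List[str]:
    """Return the sequence of actions for a given BSP BIST outcome."""
    if bist_passed:
        return ["continue POST"]

    # A BIST failure is always reported, regardless of the management controller.
    actions = ["display BIST failure during POST", "log BIST failure to SEL"]

    if not sahalee_bmc_installed:
        # Without the Sahalee BMC, the BIOS can only notify the user.
        actions.append("notify user only; no processor disabled")
        return actions

    actions.append("BIOS requests BMC to disable BSP")
    if alternate_cpu_available:
        actions += ["BMC disables BSP", "BMC selects new BSP", "BMC resets system"]
    else:
        actions += ["BMC generates beep code", "BMC halts system"]
    return actions


if __name__ == "__main__":
    for step in frb1_actions(False, True, True):
        print(step)
```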

6.1.2 FRB2 – BSP POST Failures

A second timer (FRB2) is set to several minutes by BIOS and is designed to guarantee that the system completes POST. The FRB2 timer is enabled just before the FRB3 timer is disabled to prevent any “unprotected” window of time. Near the end of POST, the BIOS disables the FRB2 timer. If the system contains more than 1 GB of memory and the user chooses to test every DWORD of memory, the watchdog timer is extended before the extended memory test starts, because the memory test can exceed the timer duration. The BIOS will also disable the watchdog timer before prompting the user for a boot password. If the system hangs during POST, before the BIOS disables the FRB2 timer, the Sahalee BMC generates an asynchronous

Revision 1.0


C78844-002
