Intel® Server Board SE7520JR2

Error Reporting and Handling

If the BIOS is about to boot from a known PXE-compliant device, it reads the user option for OS Watchdog Timer for PXE Boots and either disables the timer or enables it with the value selected in that option (5, 10, 15, or 20 minutes). When the timer is enabled in this way, it is repurposed as an OS Watchdog Timer and is referred to by that name.
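The option handling described above can be sketched as follows. This is a minimal illustration and not BIOS code: the names `PXE_WATCHDOG_MINUTES` and `configure_pxe_watchdog` are hypothetical, and only the four timeout values are taken from the text.

```python
from typing import Optional

# Timeout values (in minutes) named in the text; all identifiers are illustrative.
PXE_WATCHDOG_MINUTES = (5, 10, 15, 20)


def configure_pxe_watchdog(option: Optional[int]) -> Optional[int]:
    """Return the OS Watchdog Timer timeout in seconds, or None if disabled.

    `option` models the user setting: None means the timer is disabled for
    PXE boots, otherwise it must be one of the supported minute values.
    """
    if option is None:
        return None  # timer disabled before the PXE boot
    if option not in PXE_WATCHDOG_MINUTES:
        raise ValueError(f"unsupported watchdog timeout: {option} minutes")
    return option * 60
```

For example, `configure_pxe_watchdog(10)` yields a 600-second timeout, while `configure_pxe_watchdog(None)` leaves the timer disabled.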

If both the OS Watchdog Timer and a boot password are enabled, the BIOS disables the OS Watchdog Timer before prompting the user for the boot password, regardless of the OS Watchdog Timer option setting. Likewise, if the user chooses to enter BIOS Setup, the timer is disabled regardless of the option settings. Otherwise, if the system hangs during POST before the BIOS disables the timer, the BMC generates an asynchronous system reset (ASR). The BMC retains status bits that the BIOS can read later in the POST in order to disable the previously failing processor, log the appropriate event to the SEL, and display an appropriate error message to the user. If no Intel Management Module (IMM) is present, no processors will be disabled. Because the timer may be repurposed, the BIOS and BMC also keep track of which timer expired (early FRB2, late FRB2, or OS Watchdog) so that the appropriate error message is displayed.
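The recovery decision made after such a reset can be sketched as below. This is a simplified model under stated assumptions: the type and function names are hypothetical, the message wording is a placeholder rather than the actual BIOS text, and the rule that an OS Watchdog expiry never disables a processor follows this document's description of OS Watchdog Timer failures.

```python
from dataclasses import dataclass
from enum import Enum


class ExpiredTimer(Enum):
    """The three timer roles the BIOS and BMC distinguish."""
    EARLY_FRB2 = "early FRB2"
    LATE_FRB2 = "late FRB2"
    OS_WATCHDOG = "OS Watchdog"


@dataclass(frozen=True)
class RecoveryActions:
    log_to_sel: bool
    message: str
    disable_processor: bool


def plan_recovery(expired: ExpiredTimer, imm_present: bool) -> RecoveryActions:
    """Decide what the BIOS does on the POST that follows a watchdog reset."""
    is_frb2 = expired in (ExpiredTimer.EARLY_FRB2, ExpiredTimer.LATE_FRB2)
    return RecoveryActions(
        log_to_sel=True,
        message=f"{expired.value} timer expired",  # placeholder wording
        # Only an FRB2 failure may disable a processor, and only with an IMM.
        disable_processor=is_frb2 and imm_present,
    )
```

Keeping the expired-timer role as an explicit value mirrors the status bits the BMC retains: the same hardware timer can expire in any of the three roles, and the role alone determines the message and whether a processor is taken out of service.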

All of the user options are intended to allow a system administrator to set up a system such that, during a normal boot, no gap exists during POST that is not covered by the watchdog timer. The BIOS provides options to control the policy applied to OS Watchdog Timer failures. By default, an OS Watchdog Timer failure does not cause any action. The other options cause the system to reset or power off on a watchdog timer failure. Note, however, that these failures will NOT result in a processor being disabled (as could happen with an FRB2 failure).
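The failure policy can be modeled as a three-way choice with a default of no action. The sketch below is illustrative only; `WatchdogPolicy` and `on_os_watchdog_expiry` are hypothetical names, not BIOS or BMC interfaces.

```python
from enum import Enum


class WatchdogPolicy(Enum):
    NO_ACTION = "no action"   # default per the text
    RESET = "reset"
    POWER_OFF = "power off"


DEFAULT_POLICY = WatchdogPolicy.NO_ACTION


def on_os_watchdog_expiry(policy: WatchdogPolicy = DEFAULT_POLICY) -> dict:
    """Describe the system response to an OS Watchdog Timer failure."""
    return {
        "system_action": policy.value,
        # An OS Watchdog failure never disables a processor,
        # unlike what can happen with an FRB2 failure.
        "disable_processor": False,
    }
```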

6.1.5 AP Failures

In systems configured with an Intel Management Module, the BIOS and Sahalee BMC implement additional safeguards to detect and disable the application processors (AP) in a multiprocessor system. If an AP fails to complete initialization within a certain time, it is assumed to be nonfunctional. If the BIOS detects that an AP has failed BIST or is nonfunctional, it requests that the Sahalee BMC disable that processor. Processors disabled by the Sahalee BMC are not available for use by the BIOS or the operating system. Because these processors are unavailable, they are not listed in any configuration tables, including the SMBIOS tables.
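The AP check described above can be sketched as follows. This is a minimal model, not firmware code: `ApStatus`, `find_failed_aps`, and `configuration_table_ids` are hypothetical names, and the timeout is left as a parameter because the text does not give its value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ApStatus:
    cpu_id: int
    bist_passed: bool
    init_seconds: Optional[float]  # None if the AP never completed initialization


def find_failed_aps(aps: List[ApStatus], timeout_seconds: float) -> List[int]:
    """Return the IDs of APs that failed BIST or did not initialize in time."""
    failed = []
    for ap in aps:
        timed_out = ap.init_seconds is None or ap.init_seconds > timeout_seconds
        if not ap.bist_passed or timed_out:
            failed.append(ap.cpu_id)  # the BIOS would ask the BMC to disable this AP
    return failed


def configuration_table_ids(aps: List[ApStatus], timeout_seconds: float) -> List[int]:
    """Return the AP IDs that remain listed in configuration tables."""
    failed = set(find_failed_aps(aps, timeout_seconds))
    return [ap.cpu_id for ap in aps if ap.cpu_id not in failed]
```

An AP is treated as failed on either condition alone, so a processor that passes BIST but never checks in is excluded from the tables just like one that fails BIST.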

6.1.6 Treatment of Failed Processors

All failures (FRB3, FRB2, FRB1, and AP failures) are recorded in the system event log (SEL), together with the identity of the failing processor. The FRB3 failure is recorded automatically by the BMC, while the FRB2, FRB1, and AP failures are logged to the SEL by the BIOS. In the case of an FRB2 failure, some systems log additional information in the OEM data byte fields of the SEL entry. This additional data indicates the last POST task that was executed before the FRB2 timer expired and may be useful for failure analysis.
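The division of logging responsibility can be summarized in a small sketch. The dictionary below is an abstract stand-in and does not reflect the actual SEL record layout; `LOGGED_BY` and `make_sel_entry` are hypothetical names, and the `oem_data` field simply models the optional last-POST-task information that some systems add for FRB2 failures.

```python
from typing import Optional

# Which agent records each failure type, per the text.
LOGGED_BY = {"FRB3": "BMC", "FRB2": "BIOS", "FRB1": "BIOS", "AP": "BIOS"}


def make_sel_entry(failure: str, processor: int,
                   last_post_task: Optional[int] = None) -> dict:
    """Build an abstract log entry for a processor failure."""
    if failure not in LOGGED_BY:
        raise ValueError(f"unknown failure type: {failure}")
    entry = {
        "failure": failure,
        "processor": processor,
        "logged_by": LOGGED_BY[failure],
    }
    # Only FRB2 entries may carry the last POST task in the OEM data bytes.
    if failure == "FRB2" and last_post_task is not None:
        entry["oem_data"] = last_post_task
    return entry
```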

The Sahalee BMC maintains failure history for each processor in non-volatile storage. This history is used to store a processor’s track record. Once a processor is marked “failed,” it remains “failed” until the user forces the system to retest the processor by entering BIOS Setup and selecting the “Processor Retest” option. The BIOS reminds the user about a previous processor failure during each boot cycle until all processors have been retested and successfully pass the FRB tests or AP initialization. If all the processors are bad, the system

Revision 1.0


C78844-002
