Enhanced error detection of the high-speed interconnect
Intricate error handling through
Because higher-speed interconnects are implemented to increase system performance, there is a higher probability that interference noise will cause errors along these interconnects. One method of handling these interconnect errors is to disable the affected interconnect and operate in a degraded mode.
In addition to the above method, the Express5800/1000 series servers have implemented a methodology prevalent in supercomputers, whereby intricate
the Express5800/1000 series servers to handle the intermittent errors which occur along the
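The difference between disabling an errored link and riding through an intermittent error can be illustrated with a bounded retry loop. The sketch below is a minimal Python model, assuming that intermittent errors are handled by retransmitting the transfer and that only a persistent error falls back to the degraded mode described above. The `Link` class, the `send` function, and the limit of three retries are hypothetical; the actual retry mechanism of the Express5800/1000 chipset is not described here.

```python
from collections import deque


class Link:
    """Toy interconnect link whose transfer outcomes are scripted in advance.

    True means the packet arrived intact (for example, its CRC check passed);
    False means noise corrupted it. This is a hypothetical model.
    """

    def __init__(self, outcomes):
        self._outcomes = deque(outcomes)
        self.enabled = True

    def transmit(self, packet):
        # Once the scripted outcomes run out, treat every transfer as failing.
        return self._outcomes.popleft() if self._outcomes else False


def send(link, packet, max_retries=3):
    """Transmit with bounded retries; disable the link only if all attempts fail.

    Returns the number of retries used, or None if the link is (now) disabled.
    """
    if not link.enabled:
        return None
    for attempt in range(max_retries + 1):
        if link.transmit(packet):
            return attempt  # an intermittent error was absorbed by retrying
    link.enabled = False  # persistent fault: caller continues in degraded mode
    return None
```

For example, `send(Link([False, True]), b"data")` returns 1: the first transfer is corrupted, the retry succeeds, and the link stays enabled. A link that fails all four attempts is disabled instead.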
[Figure: Error detection without and with circuit check features. Labels: chipset, data, logic circuits, 1-bit error, ECC, error detection circuits, failure, bad data.]
Bad data resulting from a simple error, such as a single-bit error, cannot be blocked if a failure exists within the error detection circuits themselves.
Diagnosing the error detection circuits at every system boot ensures data integrity.
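Why a boot-time check of the detection circuits matters can be shown with a toy model. The sketch below uses a single even-parity bit as a simplified stand-in for ECC, which is enough to detect a 1-bit error, and compares a healthy checker with one whose output is stuck at "no error". The function names and the injected test pattern are hypothetical; the servers perform the equivalent diagnostics in hardware, and those details are not given here.

```python
def parity(bits):
    """Even parity over a sequence of 0/1 bits."""
    return sum(bits) % 2


def encode(data_bits):
    """Append an even-parity check bit (a simplified stand-in for ECC)."""
    return list(data_bits) + [parity(data_bits)]


def healthy_checker(word):
    """Report True if the word contains a detectable (odd-count) bit error."""
    return parity(word) != 0


def stuck_checker(word):
    """A failed checker stuck at 'no error': bad data passes unnoticed."""
    return False


def boot_self_test(checker):
    """Inject a known single-bit error and confirm the checker flags it."""
    good = encode([1, 0, 1, 1])
    bad = good.copy()
    bad[0] ^= 1  # flip one bit to simulate a 1-bit error
    # A working checker must accept the good word and reject the corrupted one.
    return not checker(good) and checker(bad)
```

`boot_self_test(healthy_checker)` returns True, while `boot_self_test(stuck_checker)` returns False, so a failed checker is caught at boot instead of silently passing bad data during operation.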
Two independent power sources
Avoid system shutdown due to failures of the power distribution units
The previous 32-processor and 16-processor models supported two independent power supplies, whereas the 8-processor model did not. This feature is now available on the new 8-processor system (1080Rf), so the system can continue operating even if a power distribution unit fails.
Implementing an Uninterruptible Power Supply (UPS) can further increase availability. The two independent power sources feature is standard on the 1320Xf and optional on the 1160Xf and 1080Rf.
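The availability benefit of a second independent power source can be estimated with the standard formula for parallel redundancy, A_sys = 1 - (1 - A)^2, assuming the two sources fail independently and either one alone can power the system. The per-source availability of 99.9% used below is a hypothetical value, not a figure from the Express5800/1000 specifications, and the model covers only the power feed, not the rest of the system.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(unit_availability, units=2):
    """Availability of `units` independent redundant sources, any one of which suffices."""
    # The system loses power only if every source is down at the same time.
    return 1 - (1 - unit_availability) ** units


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per 365-day year."""
    return (1 - availability) * MINUTES_PER_YEAR


single = 0.999  # hypothetical availability of one power source
dual = parallel_availability(single)
print(f"single source: {downtime_minutes_per_year(single):.1f} min/year")
print(f"dual sources:  {downtime_minutes_per_year(dual):.4f} min/year")
```

Under these assumptions, roughly 526 minutes of expected downtime per year for a single source drops to about half a minute with two independent sources.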
Autonomic reporting of error logs with pinpoint prognosis of failed components
Realization of mainframe-class platform serviceability
The Express5800/1000 series servers are equipped with a service processor that handles server management and platform error handling. The service processor can be considered the core component supporting the system's RAS features. One feature of the service processor is its ability to analyze detailed logs (BID:
[Figure: Autonomic error log reporting. In the customer environment, the service processor and log manager collect a detailed hardware error log, including transaction history, and a diagnostics agent diagnoses the retry tendency and confirms whether a threshold was exceeded, triggering preventive maintenance (replacement of the failed component). The error information is sent as an encrypted message via email over the Internet to the maintenance group, which analyzes the error information summary to determine the cause of the failure and may contact the development group for assistance; if required, the development group analyzes the detailed log further.]
In the event of a failure, the Express5800/1000 series servers can also automatically send detailed error logs to maintenance personnel, further reducing the time required to resolve a system error. Furthermore, to minimize the possibility of a critical error, the diagnostics engine can predict errors proactively rather than only react to them.
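One common way to predict failures proactively is to track the correctable-error (retry) tendency of each component and flag it for preventive maintenance once a threshold is exceeded within an observation window. The sketch below is a minimal Python version of that idea; the `RetryTrendMonitor` class, the sliding-window approach, and the threshold values are hypothetical illustrations, not the algorithm used by the Express5800/1000 diagnostics engine.

```python
from collections import defaultdict, deque


class RetryTrendMonitor:
    """Flag components whose retry count within a time window exceeds a threshold.

    Hypothetical sketch: timestamps are in seconds and must be non-decreasing.
    """

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self._events = defaultdict(deque)  # component -> timestamps of retries

    def record_retry(self, component, timestamp):
        """Log one retry; return True if the component should be replaced proactively."""
        events = self._events[component]
        events.append(timestamp)
        # Drop retries that have slid out of the observation window.
        while events and events[0] <= timestamp - self.window:
            events.popleft()
        return len(events) > self.threshold
```

With `threshold=3` and a 60-second window, a fourth retry on the same component within 60 seconds returns True, while retries spread farther apart age out of the window and do not trigger a replacement.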