NEC 5800/1000 manual: Enhanced error detection of the high-speed interconnect

Page 10

Mainframe-class RAS Features

Enhanced error detection of the high-speed interconnect

Intricate error handling through multi-bit error detection and resending of errored data

Because higher-speed interconnects are used to increase system performance, there is a greater probability that interference noise will cause errors along these interconnects. One way to handle such interconnect errors is to disable the failing interconnect and operate in a degraded mode.

In addition to this method, the Express5800/1000 series servers implement a technique common in supercomputers: multi-bit error detection is performed on the transferred data, and errored data is resent whenever an error is detected. This allows the Express5800/1000 series servers to handle intermittent errors on the high-speed interconnects without impacting system performance.
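The detect-and-resend idea can be sketched in a few lines of Python. This is a minimal illustration, not NEC's interconnect protocol: it assumes CRC-32 (from the standard `zlib` module) as the multi-bit error detection code, a hypothetical retry limit `MAX_ATTEMPTS`, and a simulated noisy link; all function names are hypothetical.

```python
import random
import zlib
from typing import List, Optional, Tuple

MAX_ATTEMPTS = 3  # hypothetical: transfers tried before the link is declared failed


def frame(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> Optional[bytes]:
    """Receiver side: return the payload if the CRC matches, else None."""
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def noisy_link(framed: bytes, flips: int) -> bytes:
    """Simulate interference by flipping `flips` distinct bits of the frame."""
    data = bytearray(framed)
    for pos in random.sample(range(len(data) * 8), flips):
        data[pos // 8] ^= 1 << (pos % 8)
    return bytes(data)


def transfer(payload: bytes, flips_per_attempt: List[int]) -> Tuple[bytes, int]:
    """Detect-and-resend: retry on CRC mismatch, give up after MAX_ATTEMPTS.

    flips_per_attempt[i] is the number of bits corrupted on attempt i+1
    (0 once the list is exhausted), standing in for intermittent noise.
    """
    framed = frame(payload)
    for attempt in range(1, MAX_ATTEMPTS + 1):
        flips = flips_per_attempt[attempt - 1] if attempt <= len(flips_per_attempt) else 0
        received = check(noisy_link(framed, flips))
        if received is not None:
            return received, attempt
    # Persistent errors: fall back to the conventional approach of
    # disabling the link and running in degraded mode.
    raise RuntimeError("link error persisted; disable link and run degraded")


print(transfer(b"cache line", [2, 1]))  # (b'cache line', 3)
```

Here the first two attempts are corrupted and caught by the CRC, and the third, clean attempt delivers the data, so the intermittent noise costs only retries rather than disabling the link.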

Figure: Without check features vs. with check features
Without check features: a 1-bit error in the chipset's data reaches ECC error detection circuits that have themselves failed; the error cannot be detected or reported, and bad data results.
With check features (circuit check): the error detection circuits are checked, so the error is detected and reported.

Bad data resulting from a simple error, such as a single-bit error, cannot be blocked if the error detection circuits themselves have failed.

Diagnosing the error detection circuits at every system boot ensures data integrity.
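The boot-time check of the error detection circuits can be illustrated with a small fault-injection self-test. This sketch assumes a single parity bit as a stand-in for the chipset's ECC and models a failed detection circuit as one whose output is stuck at "no error"; the names `healthy_checker`, `stuck_checker` and `boot_self_test` are hypothetical.

```python
from typing import Callable

# (data_word, stored_parity_bit) -> True if the circuit reports an error.
Checker = Callable[[int, int], bool]


def parity(word: int) -> int:
    """Even-parity bit of a data word (stand-in for real ECC check bits)."""
    return bin(word).count("1") & 1


def healthy_checker(word: int, stored: int) -> bool:
    return parity(word) != stored


def stuck_checker(word: int, stored: int) -> bool:
    # Hypothetical failed detection circuit: output stuck at "no error".
    return False


def boot_self_test(checker: Checker, test_word: int = 0b1011_0010, width: int = 8) -> bool:
    """Inject known errors and confirm the checker reports every one of them."""
    stored = parity(test_word)
    if checker(test_word, stored):  # clean data must not raise an alarm
        return False
    for bit in range(width):
        if not checker(test_word ^ (1 << bit), stored):  # single-bit error missed
            return False
    return True


print(boot_self_test(healthy_checker))  # True
print(boot_self_test(stuck_checker))    # False
```

A checker that silently passes every word would let bad data through at run time; injecting known errors at boot exposes that failure before any real data depends on it.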

Two independent power sources

Avoid system shutdown due to failures of the power distribution units

The previous 32-processor and 16-processor models supported two independent power supplies, whereas the 8-processor model did not. This feature is now available on the new 8-processor system (1080Rf), so the system can continue operating even if a power distribution unit fails.

Adding an Uninterruptible Power Supply (UPS) can further increase availability. Two independent power sources are standard on the 1320Xf and optional on the 1160Xf and 1080Rf.
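As a rough illustration of why a second, independent power source raises availability: if each source is independently available with probability A, the system loses power only when both fail, so the combined availability is 1 - (1 - A)^2. The value 0.99 below is hypothetical, not an NEC figure.

```python
def dual_source_availability(a: float) -> float:
    """Availability with two independent sources, either of which can power the system.

    Assumes failures are statistically independent; a shared upstream fault
    (for example, loss of building power) would break this assumption.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return 1.0 - (1.0 - a) ** 2


single = 0.99  # hypothetical availability of one power source
print(f"single: {single:.4f}, dual: {dual_source_availability(single):.4f}")
# single: 0.9900, dual: 0.9999
```

The independence assumption is the weak point of the formula: a failure shared by both sources defeats the redundancy, which is one reason a UPS adds availability on top of dual sources.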

Autonomic reporting of error logs with pinpoint prognosis of failed components

Realization of mainframe-class platform serviceability

The Express5800/1000 series servers are equipped with a service processor that handles server management and platform error handling. The service processor is the core component supporting the system's RAS features. One of its features is the ability to analyze the detailed logs collected by the chipset when an error occurs (BID: built-in diagnosis). BID diagnoses the location of the error and pinpoints the FRU (Field Replaceable Unit) to be replaced, minimizing the time required to replace the component and recover the system.
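To illustrate the idea of mapping chipset error logs to a field-replaceable unit, here is a deliberately simple sketch. The internal BID algorithm is not described here; this version assumes a hypothetical log record format (`ErrorRecord`), a hypothetical component-to-FRU table, and a plain majority vote over the records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ErrorRecord:
    source: str  # hypothetical component ID logged by the chipset
    kind: str    # hypothetical error type, e.g. "ecc" or "link_crc"


# Hypothetical mapping from logged component IDs to replaceable units.
FRU_FOR_SOURCE = {
    "cell0.dimm3": "Cell card 0, DIMM 3",
    "cell0.cpu1": "Cell card 0, processor 1",
    "centerplane.link2": "Center plane",
}


def pinpoint_fru(records: Iterable[ErrorRecord]) -> Optional[str]:
    """Return the FRU implicated by the most error records, or None if there are none."""
    votes = Counter(FRU_FOR_SOURCE.get(r.source, "unknown") for r in records)
    if not votes:
        return None
    fru, _count = votes.most_common(1)[0]
    return fru


log = [ErrorRecord("cell0.dimm3", "ecc")] * 3 + [ErrorRecord("centerplane.link2", "link_crc")]
print(pinpoint_fru(log))  # Cell card 0, DIMM 3
```

A real diagnosis engine would weigh error types and propagation paths rather than simply counting, but the output has the same shape: one named FRU that a service engineer can replace directly.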

Figure: Autonomic error reporting
Customer environment: the diagnostics agent on the service processor monitors the hardware, diagnoses the retry tendency, and confirms whether a threshold was exceeded. Outcomes: preventive maintenance, or replacement of a failed component.
A detailed hardware error log, including transaction history, is collected.
The error information is sent by email, as an encrypted message over the Internet, to the log manager.
Maintenance group: the error information summary is analyzed to determine the cause of the failure. The development team may be contacted for assistance.
Development group: if required, the detailed log is analyzed further.

When a failure occurs, the Express5800/1000 series servers can also automatically send detailed error logs to maintenance personnel, enabling us to further reduce the time required to resolve a system error. Furthermore, to minimize the possibility of a critical error, the diagnostics engine can proactively predict errors rather than only react to them.
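Proactive prediction based on retry tendency (checking whether the number of retries exceeds a threshold) can be sketched as a sliding-window counter per component. The window length, threshold, and component name below are hypothetical; the actual values and method used by the diagnostics agent are not stated.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RetryTrendMonitor:
    """Count retried transfers per component over a sliding time window."""

    def __init__(self, window_s: float = 24 * 3600, threshold: int = 10) -> None:
        self.window_s = window_s    # hypothetical observation window (seconds)
        self.threshold = threshold  # hypothetical retry count before action
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_retry(self, component: str, t: float) -> bool:
        """Log one retry at time t; return True if the threshold is now exceeded."""
        events = self._events[component]
        events.append(t)
        # Discard retries that have aged out of the window.
        while events and events[0] <= t - self.window_s:
            events.popleft()
        return len(events) > self.threshold


monitor = RetryTrendMonitor(window_s=60, threshold=3)
for t in range(5):
    if monitor.record_retry("cell0.link1", float(t)):
        print(f"t={t}: schedule preventive maintenance for cell0.link1")
```

Each retry counted here was already corrected by the resend mechanism, so nothing has failed yet; a rising retry rate is treated as an early sign of a degrading part, and crossing the threshold is the point at which a report to maintenance personnel could be sent.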

Contents: NEC Express5800/1000 Series
Dedicated Cache Coherency Interface (CCI)
Crossbar-less configuration (available only on 1080Rf)
Very Large Cache (VLC) Architecture
Resource virtualization through floating I/O
A3 Chipset
Cell
Dual-Core Intel Itanium processor
Compiler features for performance improvement
Intel Itanium processor supported compiler
Directory Based Cache Coherency
Framework for hardware, firmware and OS error handling
RAS Design Philosophy
Memory Mirroring
Partial Chipset degradation
Highly Available Center Plane
Complete modularization and redundancy
Diagnostics of the error detection circuits
Substantial strengthening of data integrity
Cell card
Realization of mainframe-class platform serviceability
Enhanced error detection of the high-speed interconnect
Two independent power sources
Superior standard chassis configuration
Investment Protection
Multi OS support / Rich application lineup

Processor: Dual-Core Intel Itanium processor
L1 cache/core: 16KB I / 16KB D
L2 cache/core: 1MB I / 256KB D
L3 cache/core: 12MB