Modularization, redundancy and domain segmentation of the system clock

Minimizes downtime, and avoids multi-partition shutdown due to clock failure

Modularization and redundancy minimize system downtime caused by clock failures, and the Express5800/1000 series server takes this one step further. In many cases, a system said to have a redundant clock actually has redundancy only in the oscillator. Integral clock distribution mechanisms such as the clock driver or the amplifier are often not redundant, which leaves single points of failure in the system. The Express5800/1000 series servers have redundancy not only in the oscillator but also in the clock distribution mechanisms, so that system downtime can be minimized.
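The difference between oscillator-only redundancy and redundancy along the whole clock path can be illustrated with a minimal sketch. It treats the clock path as a chain of stages in series and reports every stage that has only one unit. The `Stage` type, the function name, and the unit counts are hypothetical and chosen for illustration; they are not NEC specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of a clock path (hypothetical model, not an NEC data structure)."""
    name: str
    units: int  # number of interchangeable units; 1 means no redundancy


def single_points_of_failure(path):
    """Return the names of stages in a series path that have no redundant unit."""
    return [stage.name for stage in path if stage.units < 2]


# Illustrative clock paths; the unit counts are assumptions.
oscillator_only = [Stage("oscillator", 2), Stage("clock driver", 1)]
fully_redundant = [Stage("oscillator", 2), Stage("clock driver", 2)]

print(single_points_of_failure(oscillator_only))  # ['clock driver']
print(single_points_of_failure(fully_redundant))  # []
```

In this simplified model, a path is free of single points of failure only when every stage in the chain is duplicated, which is the property the text attributes to the Express5800/1000 series clock path.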

The 1320Xf system can be divided into two 16-processor segments, with one segment using one system clock and the other 16-processor segment using the remaining system clock. A failure in one system clock therefore does not shut down the entire system.
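The effect of domain segmentation can be sketched as a mapping from processors to the clock that drives them. The example below is a simplified illustration only: the clock names and the function are hypothetical, and it ignores the redundancy inside each clock so that it shows only how segmentation limits the spread of a failure.

```python
def surviving_processors(clock_of, failed_clock):
    """Return the processors whose clock is still running after `failed_clock` fails."""
    return sorted(p for p, clock in clock_of.items() if clock != failed_clock)


# 32 processors driven by one shared clock (hypothetical unsegmented layout).
shared = {p: "clock0" for p in range(32)}

# The same 32 processors split into two 16-processor segments, one clock each.
segmented = {p: ("clock0" if p < 16 else "clock1") for p in range(32)}

print(len(surviving_processors(shared, "clock0")))     # 0
print(len(surviving_processors(segmented, "clock0")))  # 16
```

With a shared clock, one clock failure stops all 32 processors in this model; with two segments, the 16 processors on the other clock keep running.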

[Figure: clock configurations of the Express5800/1000 Series, Redundant Configuration A, and Redundant Configuration B. Diagram labels: chipset, Clock Distribution, Clock Module, 16 Processor Domain, SPOF, "Redundant: Active, Standby", "Hot pluggable", "Not hot pluggable".]

[Table: comparison of the Express5800/1000 Series, Redundant Configuration A, and Redundant Configuration B. Rows: "Replacement of failed component without system halt" (available on the 1320Xf/1160Xf; one entry reads "Redundant *1") and "Minimized spread of failure" (16 processor Domain Segmentation, available on the 1320Xf).]

*1: Hot plugging of the redundant oscillator is possible; however, hot plugging of the single clock driver is not possible.

Diagnostics of the error detection circuits

Substantial strengthening of data integrity

The main data paths of the A3 chipset on the Express5800/1000 series servers are protected by ECC. When a single-bit error is detected, it is corrected in hardware. Furthermore, the paths between the A3 chipset interfaces support multi-bit error detection and retransmission of erroneous data.
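The behavior described above (correct a single-bit error, detect a multi-bit error so the data can be resent) can be illustrated with a textbook SECDED code. The sketch below uses an extended Hamming (8,4) code in software. This is an assumption made for illustration: the source does not state which ECC code the A3 chipset uses, and the function names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1, 2 and 4 hold Hamming parity.
    """
    if not 0 <= nibble < 16:
        raise ValueError("nibble must be in the range 0..15")
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of set positions is 0 for a valid word
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd parity: treat as a single-bit error. The syndrome is the failing
        # position (0 means the overall parity bit itself was flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, cannot correct.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                    # inject a single-bit error
print(decode(word))             # ('corrected', 11)
word[2] ^= 1                    # inject a second error in the same word
print(decode(word))             # ('uncorrectable', None)
```

In a link protocol built on such a code, an "uncorrectable" result is the point at which the receiver would request retransmission of the data.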

In addition to maintaining data integrity through these RAS features, the Express5800/1000 series server can run diagnostics on its own error detection circuits. During every system boot, all error detection circuits are diagnosed for possible failures. Without this feature, a failure in these circuits could leave errors undetected during system operation.

[Figure: Built-in high-speed error check for inter-chipset paths. Diagram labels: CPU, Memory, Memory Controller, Cell Controller, "To other CELL", Cell card, Crossbar, Controller, Crossbar Card, I/O, Router, PCI BOX.]
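The idea of diagnosing an error detection circuit at boot can be sketched as a self-test that feeds the detector a known-good word and words with deliberately injected errors, then checks that the detector responds correctly in each case. The parity checker, the stuck-at fault, and all names below are hypothetical stand-ins; the source does not describe how the hardware diagnostics are implemented.

```python
def parity_checker(word, parity_bit):
    """Healthy detector: returns True when even parity is violated."""
    return (bin(word).count("1") + parity_bit) % 2 == 1


def stuck_checker(word, parity_bit):
    """Faulty detector whose output is stuck at 'no error'."""
    return False


def self_test(checker, width=8):
    """Return True if `checker` reports no error on good data and flags every injected error."""
    good_word = 0b10110010
    parity = bin(good_word).count("1") % 2

    if checker(good_word, parity):
        return False  # false alarm on known-good data
    for bit in range(width):
        if not checker(good_word ^ (1 << bit), parity):
            return False  # missed an injected single-bit data error
    if not checker(good_word, parity ^ 1):
        return False  # missed an injected parity-bit error
    return True


print(self_test(parity_checker))  # True
print(self_test(stuck_checker))   # False
```

The stuck detector never raises a false alarm, so it would look healthy during normal operation; only the injected errors reveal that it cannot detect anything, which is the situation the boot-time diagnostics are meant to catch.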

 
