Memory Mirroring
Continuous operation even in the event of a
The Express5800/1000 series server supports
A memory scan is run on all loaded memory modules at each OS boot. If the system detects a memory failure, the failed component is immediately isolated and detached from the system, preventing possible downtime during business operations.
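The boot-time scan described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `MemoryModule` class, the "stuck cell" fault model, and the 0x55/0xAA test patterns are hypothetical and do not represent the server's actual firmware.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryModule:
    """Hypothetical model of one memory module (not an NEC interface)."""
    name: str
    size: int = 8
    # Assumed fault model: addresses listed here always store 0.
    stuck_cells: set = field(default_factory=set)

    def __post_init__(self):
        self.cells = [0] * self.size

    def write(self, addr, value):
        self.cells[addr] = 0 if addr in self.stuck_cells else value

    def read(self, addr):
        return self.cells[addr]


def scan_module(module, patterns=(0x55, 0xAA)):
    """Write each test pattern to every cell and read it back."""
    for pattern in patterns:
        for addr in range(module.size):
            module.write(addr, pattern)
            if module.read(addr) != pattern:
                return False  # first mismatch marks the module as failed
    return True


def boot_scan(modules):
    """Scan all modules; return (healthy, isolated) lists."""
    healthy, isolated = [], []
    for module in modules:
        (healthy if scan_module(module) else isolated).append(module)
    return healthy, isolated
```

In this sketch, only the modules in the `healthy` list would be handed to the operating system; the `isolated` list stands in for components detached before business operations start.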
Chip sparing (SDDC*) memory is a memory system built from multiple DRAM chips that can correct errors at the chip level. If a failure occurs in a memory chip, the error is corrected immediately, allowing operation to continue.
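The idea of correcting a whole-chip failure can be shown with a deliberately simplified sketch. It assumes one extra chip holding an XOR parity symbol and assumes the failing chip is already known. Real SDDC implementations typically use a stronger symbol-based error-correcting code that also locates the failing chip, so this is an illustration of the principle only, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def encode_word(data_symbols):
    """Append one XOR parity symbol to the per-chip data symbols."""
    return list(data_symbols) + [reduce(xor, data_symbols, 0)]


def correct_word(stored, failed_chip):
    """Rebuild the symbol of the chip known to have failed.

    The XOR of all surviving symbols (data plus parity) equals the
    missing symbol, because the XOR of a complete word is zero.
    """
    others = [s for i, s in enumerate(stored) if i != failed_chip]
    repaired = list(stored)
    repaired[failed_chip] = reduce(xor, others, 0)
    return repaired[:-1]  # data symbols only, parity dropped
```

Each list element stands for the bits one DRAM chip contributes to a memory word, so losing one element models the failure of one chip.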
[Figure: memory mirroring. Each Cell contains CPUs, memory controllers, memory interfaces (I/F), and memory. The memory image (Data 0, Data 1, Data 2, Data 3) is held in both the memory and its mirror.]
Memory scrubbing checks memory content regularly (every few milliseconds) during operation without affecting performance. When an error is detected, it is corrected and then reported. The scrubbing function detects errors in a timely manner, which ultimately results in the reduction of
Memory mirroring takes place continuously: the same data is written to two separate memory blocks instead of one (available only on the 1160Xf and 1320Xf). In the event of a
[Figure labels: components covered by the memory mirroring; components covered by the standard chip sparing; unit of degradation on the Express5800/1000 series.]
This construct allows for continuous operation through all non-correctable memory errors, not only in the memory itself but also in the memory interfaces and the memory controllers.
* Single Device Data Correction
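The memory mirroring behavior described in this section can be sketched as follows. This is a minimal model under stated assumptions: the class names are hypothetical, and a single `failed` flag stands in for a non-correctable error anywhere in one copy's path (memory, memory interface, or memory controller).

```python
class UncorrectableError(Exception):
    """Raised when a memory block cannot return valid data."""


class MemoryBlock:
    """Hypothetical model of one memory copy and its access path."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.failed = False  # assumed stand-in for any non-correctable fault

    def write(self, addr, value):
        if not self.failed:
            self.data[addr] = value

    def read(self, addr):
        if self.failed:
            raise UncorrectableError(self.name)
        return self.data[addr]


class MirroredMemory:
    """Writes go to both copies; reads fall back to the mirror."""

    def __init__(self):
        self.primary = MemoryBlock("primary")
        self.mirror = MemoryBlock("mirror")

    def write(self, addr, value):
        self.primary.write(addr, value)
        self.mirror.write(addr, value)

    def read(self, addr):
        try:
            return self.primary.read(addr)
        except UncorrectableError:
            return self.mirror.read(addr)
```

Because every write lands in both blocks, a read can still be served from the mirror after the primary copy fails, which is the property that allows operation to continue.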
Partial Chipset degradation
Avoid
In certain instances when multiple server partitions share a common crossbar controller, effects of a single partition failure may result in a
Within each of the LSI chips, which make up the chipset, multiple LSI
Furthermore, after isolating the failed subsystem, the downed partition can automatically reboot itself and resume operations in a degraded mode without the intervention of a system administrator. On the Express5800/1000 series servers, this is made possible by the redundant paths between the Cells and the I/O.
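The isolate-and-reboot sequence can be sketched as a small state model. The `Partition` class, the state names, and the path labels "A" and "B" are assumptions chosen for illustration; they are not the server's management interface.

```python
class Partition:
    """Hypothetical model of a server partition and its I/O paths."""

    def __init__(self, number, io_paths):
        self.number = number
        self.io_paths = set(io_paths)  # redundant paths between Cell and I/O
        self.state = "running"


def handle_failure(partition, failed_path):
    """Shut down, isolate the failed path, and reboot if a path remains."""
    partition.state = "down"                 # partition is shut down
    partition.io_paths.discard(failed_path)  # failed subsystem is isolated
    if partition.io_paths:
        partition.state = "degraded"         # automatic reboot on remaining path
    return partition.state
```

Only the partition passed to `handle_failure` changes state; other partitions are separate objects and keep running, which mirrors the goal of containing a failure to the partition where it occurred.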
[Figure: partial degradation. Cell 0 and Cell 1 connect through Crossbar Controllers A and B, each divided into sub units, to PCI Box 0 and PCI Box 1 (n specifies the partition number). A failure occurs in partition 0: partition 0 is shut down so that the failed component can be isolated, and is then rebooted. Partition 1 is not affected.]