Chapter 6
Reliability and Serviceability Features
Reliability, availability, and serviceability are key requirements for platforms running
Reliability Features
The BladeSymphony 1000 is intended to execute core business operations, and its modular design increases reliability through the high availability of redundant components. Rather than focusing on creating individual highly available components, the BladeSymphony 1000 utilizes multiple
The BladeSymphony 1000 includes features that help keep the system from crashing when a failure occurs and that minimize the effects of a failure. These features are listed in Table 11.
Table 11: Reliability features

| Function | Feature |
| --- | --- |
| Quickly detect/diagnose failed part | BIOS |
| | Memory scrubbing function (Intel Itanium Server Blade) |
| Failure recovery by retry and correction | ECC function (memory, CPU bus, SMP link (Intel Itanium Server Blade)), CRC retry function (PCIe, SCSI) |
| Dynamic isolation of failed part | Advanced ECC, online spare memory |
| Redundant configurations | HDD Modules, redundant Switch & Management Modules, Power Modules, and Cooling Fan Modules |
| | Memory mirroring (Intel Xeon Server Blades) |
| Redundant system configurations | Redundant LAN/FC modules |
| | Cluster system configuration, N+1/N+M configurations |
| Obtain failure information | Isolation of failed part using System Event Log, BladeSymphony Management Suite, and Storage Manager |
| | Automatic notification of failure by ASSIST via email |
| Block failed part | Isolation of failed part upon system boot |
| Repair failed part during operation | Repair PCI adapter, Switch & Management Module, Power Module, Cooling Fan Module while system is operating |
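The "failure recovery by retry" entry in Table 11 can be illustrated with a small software model. The sketch below is not how the BladeSymphony 1000 hardware implements its CRC retry function (on PCIe and SCSI links this is done by the link hardware and protocol); it only shows the general idea under simplifying assumptions: a sender appends a CRC-32 to each frame, the receiver verifies it, and a corrupted frame is retransmitted up to a fixed limit. The names `make_frame`, `check_frame`, `transfer_with_retry`, and `FlakyChannel` are hypothetical and introduced here for illustration.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload to form a frame."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the frame's payload."""
    if len(frame) < 4:
        return False
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload) == int.from_bytes(crc, "big")


def transfer_with_retry(payload: bytes, channel, max_retries: int = 3):
    """Send the payload over `channel`, retransmitting on CRC mismatch.

    Returns (payload, attempts_used). Raises RuntimeError if every
    attempt arrives corrupted.
    """
    frame = make_frame(payload)
    for attempt in range(1, max_retries + 1):
        received = channel(frame)
        if check_frame(received):
            return received[:-4], attempt
    raise RuntimeError(f"transfer failed after {max_retries} attempts")


class FlakyChannel:
    """Hypothetical channel that flips one bit on its first `failures` sends."""

    def __init__(self, failures: int):
        self.failures = failures

    def __call__(self, frame: bytes) -> bytes:
        if self.failures > 0:
            self.failures -= 1
            corrupted = bytearray(frame)
            corrupted[0] ^= 0x01  # simulate a transient single-bit error
            return bytes(corrupted)
        return frame


if __name__ == "__main__":
    # Two transient errors are absorbed by retries; the third attempt succeeds.
    data, attempts = transfer_with_retry(b"block-0001", FlakyChannel(failures=2))
    print(data, attempts)
```

In this model a transient error costs only a retransmission, while a persistent error exhausts the retry budget and is reported upward, which is the point at which features such as isolation of the failed part would take over.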