IBM 750 manual Memory error correction extensions, Redundancy for array self-healing


integration testing processes. During system manufacturing, systems go through a thorough testing process to help ensure the highest level of product quality.

The system cache and memory offer ECC (error checking and correcting) fault-tolerant features. ECC is designed to correct environmentally induced, single-bit, intermittent memory failures and single-bit hard failures. With ECC, the likelihood of memory failures will be substantially reduced. ECC also provides double-bit memory error detection that helps protect data in the event of a double-bit memory failure.

The AIX and IBM i operating systems provide disk drive mirroring and disk drive controller duplexing. The Linux operating system supports disk drive mirroring (RAID 1) through software, while other RAID protection schemes are provided via hardware RAID adapters.

The Journaled File System, also known as JFS or JFS2, helps maintain file system consistency and reduces the likelihood of data loss when the system is abnormally halted due to a power failure. JFS, the recommended file system for 32-bit kernels, now supports extents on the Linux operating system. This feature is designed to substantially reduce or eliminate fragmentation. Its successor, JFS2, is the recommended file system for 64-bit kernels.

With 64-bit addressing, a maximum file system size of 32 TB and maximum file size of 16 TB, JFS2 is highly recommended for systems running the AIX operating system.

Memory error correction extensions

The memory has single-bit-error correction and double-bit-error detection ECC circuitry. The ECC code is also designed such that the failure of any one specific memory module within an ECC word by itself can be corrected absent any other fault.
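The single-bit-correct, double-bit-detect behavior described above can be illustrated with a small sketch. This is an assumption-laden toy: it uses a textbook extended Hamming (8,4) code, not the actual ECC code used in the memory subsystem (which the source does not specify), and it does not model the correction of a whole failed memory module. The function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1, 2 and 4 hold the
    Hamming parity bits; indices 3, 5, 6 and 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        for i in range(1, 8):
            if i & p and i != p:
                code[p] ^= code[i]
    for i in range(1, 8):
        code[0] ^= code[i]
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the indices of all set bits
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped, and the syndrome is
        # its index (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

In this sketch, flipping any one bit of a codeword still decodes to the original data, while flipping any two bits is reported as uncorrectable rather than silently miscorrected.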

Memory protection features include scrubbing to detect errors, a means to call for the deallocation of memory pages for a pattern of correctable errors detected, and signaling deallocation of a logical memory block when an error occurs that cannot be corrected by the ECC code.
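The deallocation policy described above can be sketched as a small bookkeeping class. The source does not say what constitutes a "pattern" of correctable errors, so this sketch assumes a simple per-page count that must exceed a threshold; the class name, method names, and threshold value are all hypothetical.

```python
from collections import Counter


class MemoryHealthMonitor:
    """Toy model of the page and logical memory block deallocation policy."""

    def __init__(self, threshold=2):
        # Assumption: a "pattern" means more than `threshold` correctable
        # errors on one page. The real criterion is not given in the source.
        self.threshold = threshold
        self.ce_counts = Counter()          # page -> correctable error count
        self.pages_to_deallocate = set()
        self.blocks_to_deallocate = set()

    def report(self, page, block, correctable):
        """Record one error found by scrubbing or by a normal access."""
        if not correctable:
            # ECC could not correct the error: signal that the whole
            # logical memory block should be deallocated.
            self.blocks_to_deallocate.add(block)
            return "deallocate_block"
        self.ce_counts[page] += 1
        if self.ce_counts[page] > self.threshold:
            self.pages_to_deallocate.add(page)
            return "deallocate_page"
        return "logged"
```

With the default threshold of 2, the third correctable error reported against the same page returns `"deallocate_page"`, and any uncorrectable error returns `"deallocate_block"` immediately.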

Redundancy for array self-healing

Although the most likely failure event in a processor is a soft single-bit error in one of its caches, other events can occur, and they need to be distinguished from one another. For caches and their directories, hardware and firmware keep track of whether errors are being corrected beyond a threshold. If the threshold is exceeded, a deferred repair error log is created.

Caches and directories on the POWER7 chip are manufactured with spare bits in their arrays that can be accessed via programmable steering logic to replace faulty bits in the respective arrays. This is analogous to the redundant bit steering employed in main storage as a mechanism that is designed to help avoid physical repair, and is also implemented in POWER7 systems. The steering logic is activated during processor initialization and is initiated by the built-in self-test (BIST) at power-on time.
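The idea of steering a faulty bit to a spare can be sketched in software. This is only an illustration under stated assumptions: the real mechanism is hardware steering logic whose details the source does not give, faults are modeled here as stuck-at bits, spares are assumed to be fault-free, and `SteeredArray` and its methods are hypothetical names.

```python
class SteeredArray:
    """One row of `width` logical bits backed by width + spares physical columns."""

    def __init__(self, width, spares):
        self.width = width
        self.physical = [0] * (width + spares)
        self.steer = list(range(width))     # logical column -> physical column
        self.free_spares = list(range(width, width + spares))
        self.stuck = {}                     # physical column -> stuck value

    def inject_fault(self, phys_col, stuck_value):
        """Simulate a hard fault: the column always holds stuck_value."""
        self.stuck[phys_col] = stuck_value
        self.physical[phys_col] = stuck_value

    def write(self, bits):
        for logical, bit in enumerate(bits):
            col = self.steer[logical]
            self.physical[col] = self.stuck.get(col, bit)

    def read(self):
        return [self.physical[self.steer[i]] for i in range(self.width)]

    def self_test_and_repair(self):
        """BIST-like pass: write all-0 and all-1, steer failing columns to spares."""
        repaired = []
        for pattern in ([0] * self.width, [1] * self.width):
            self.write(pattern)
            for logical, (want, got) in enumerate(zip(pattern, self.read())):
                if want != got:
                    if not self.free_spares:
                        raise RuntimeError("no spare bits left")
                    self.steer[logical] = self.free_spares.pop(0)
                    repaired.append(logical)
        return repaired
```

For example, a 4-bit row with one spare and a stuck-at-0 fault in column 2 reads back corrupted data before `self_test_and_repair()` runs and correct data afterward, because logical column 2 is then mapped to the spare.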

When the number of correctable cache errors exceeds a set threshold, systems using the POWER7 processor invoke a dynamic cache line delete function, which enables them to stop using the bad cache line and eliminates exposure to greater problems.
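A minimal sketch of the cache line delete idea follows, assuming one set of a set-associative cache in which each way keeps its own correctable error count. The source does not describe the actual bookkeeping or the threshold value, so `CacheSet`, its methods, and the numbers used are hypothetical.

```python
class CacheSet:
    """Toy model of one cache set with per-way correctable error tracking."""

    def __init__(self, ways, threshold):
        self.ways = ways
        self.threshold = threshold          # assumed value, not from the source
        self.ce_counts = [0] * ways
        self.deleted = [False] * ways

    def record_correctable_error(self, way):
        """Count an error; delete the line once the count exceeds the threshold."""
        self.ce_counts[way] += 1
        if self.ce_counts[way] > self.threshold:
            self.deleted[way] = True
        return self.deleted[way]

    def usable_ways(self):
        """Ways still eligible for allocation; deleted lines are skipped."""
        return [w for w in range(self.ways) if not self.deleted[w]]
```

In this model a deleted line is simply never chosen for allocation again, so the set keeps operating with reduced associativity instead of continuing to use a line that keeps producing errors.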

Fault monitoring functions

When a POWER7 processor-based system is powered on, BIST and POST (power-on self-test) check processor, cache, memory, and associated hardware required for proper booting of the operating system. If a noncritical error is detected or if the errors occur in resources that can be removed from the system configuration, the restarting process is designed to proceed to completion. The errors are logged in the system nonvolatile RAM (NVRAM).

IBM United States Hardware Announcement 110-009

IBM is a registered trademark of International Business Machines Corporation
