IBM P5 570 manual First Failure Data Capture, Permanent monitoring, Mutual surveillance


3.2.2 First Failure Data Capture

If a problem occurs, the ability to diagnose it correctly is a fundamental requirement upon which improved availability is based. The p5-570 incorporates advanced capability in start-up diagnostics and in run-time First Failure Data Capture (FFDC) based on strategic error checkers built into the chips.

Any errors that are detected by the pervasive error checkers are captured into Fault Isolation Registers (FIRs, shown in Figure 3-1), which can be interrogated by the service processor (SP). The SP in the p5-570 has the capability to access system components using special-purpose service processor ports or by access to the error registers.

 

[Figure labels: Error Checkers; CPU; L1 Cache; L2/L3 Cache; Memory; Disk; Fault Isolation Register (FIR) (unique fingerprint of each error captured); Service Processor; Log Error; Non-volatile RAM]

Figure 3-1 Schematic of Fault Isolation Register implementation

The FIRs are important because they enable an error to be uniquely identified, which in turn enables the appropriate action to be taken. Appropriate actions might include a bus retry, ECC correction, or system firmware recovery routines. Recovery routines could include dynamic deallocation of potentially failing components.
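The idea that a unique FIR value selects a recovery action can be illustrated with a small sketch. It is not the p5-570 firmware interface: the bit layout, the `Fir` names, and the bit-to-action mapping are all hypothetical, chosen only to mirror the actions named in the text.

```python
from enum import IntFlag


class Fir(IntFlag):
    """Hypothetical FIR bit layout: one bit per error checker (assumption)."""
    BUS_PARITY = 0x1
    MEMORY_ECC = 0x2
    L2_CACHE = 0x4


# Hypothetical mapping from a latched bit to a recovery action from the text.
ACTIONS = {
    Fir.BUS_PARITY: "bus retry",
    Fir.MEMORY_ECC: "ECC correction",
    Fir.L2_CACHE: "firmware recovery routine",
}


def actions_for(fir_value: int) -> list[str]:
    """Return the recovery action for every bit set in the register value."""
    return [action for bit, action in ACTIONS.items() if fir_value & bit]
```

Because each checker owns its own bit in this model, a register value identifies exactly which checkers fired, so `actions_for(0x6)` yields the ECC and firmware recovery actions and nothing else.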

Errors are logged into the system non-volatile random access memory (NVRAM) and the SP event history log, along with a notification of the event to AIX for capture in the operating system error log. Diagnostic Error Log Analysis (diagela) routines analyze the error log entries and invoke a suitable action such as issuing a warning message. If the error can be recovered, or after suitable maintenance, the service processor resets the FIRs so that they can accurately record any future errors.
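The capture, log, and reset sequence described above can be sketched as a toy model. Everything here is an assumption made for illustration: the class names, the three in-memory lists standing in for NVRAM, the SP event history log, and the AIX error log, and the simplification that a FIR holds a single error until it is reset.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FaultIsolationRegister:
    """Toy FIR that latches only the first error until reset (assumption)."""
    captured: Optional[str] = None

    def capture(self, error: str) -> bool:
        """Latch the error if the register is empty; report whether it latched."""
        if self.captured is None:
            self.captured = error
            return True
        return False

    def reset(self) -> None:
        """Clear the register so it can record a future error."""
        self.captured = None


@dataclass
class ServiceProcessor:
    """Toy service processor with three hypothetical log destinations."""
    nvram: list = field(default_factory=list)
    event_history: list = field(default_factory=list)
    os_error_log: list = field(default_factory=list)

    def handle(self, fir: FaultIsolationRegister, recovered: bool) -> None:
        """Log a captured error everywhere, then reset the FIR if recovered."""
        if fir.captured is None:
            return
        for log in (self.nvram, self.event_history, self.os_error_log):
            log.append(fir.captured)
        if recovered:
            fir.reset()
```

In this model a second error arriving before the reset is rejected, which keeps the first failure intact for diagnosis; once the service processor has logged it and recovery succeeds, the register is free to record the next error.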

The ability to correctly diagnose any pending or firm errors is a key requirement before any dynamic or persistent component deallocation or any other reconfiguration can take place.

3.2.3 Permanent monitoring

The SP included in the p5-570 provides a way to monitor the system even when the main processor is inoperable. The next subsection describes the monitoring functions of the p5-570 in more detail.

Mutual surveillance

The SP can monitor the operation of the firmware during the boot process, and it can monitor the operating system for loss of control. This enables the service processor to take appropriate action, including calling for service, when it detects that the firmware or the

p5-570 Technical Overview and Introduction


P5 570 specifications

The IBM P5 570 is a high-performance server that was designed for enterprise-scale computing, offering a blend of advanced technologies and a flexible architecture. Launched as part of IBM's Power5 server line, the P5 570 stands out for its robust processing capabilities and extensive scalability, making it a preferred choice for businesses requiring reliable and efficient computing solutions.

At the heart of the P5 570 is the IBM POWER5 processor, which employs simultaneous multi-threading (SMT) technology. SMT lets each core run two threads at the same time, which can raise throughput for workloads that are well suited to multi-threading. The server scales up to a 16-way POWER5 configuration, providing compute power for demanding applications ranging from databases to complex enterprise resource planning (ERP) systems.

The P5 570 architecture supports a wide range of memory configurations, with a maximum memory capacity of up to 512 GB. Utilizing IBM’s proprietary Chip Memory technology, it can deliver high bandwidth and low latency, significantly enhancing performance for memory-intensive applications. Furthermore, the integrated memory controller architecture optimizes memory access, ensuring that critical workloads run smoothly.

Scalability is a key characteristic of the P5 570, with the ability to expand processing power and memory capacity as an organization’s needs grow. The server supports various operating systems, including AIX, Linux, and IBM i, which provides flexibility for diverse IT environments. This versatility ensures that companies can run their preferred applications without the need for substantial system overhauls.

In terms of storage, the P5 570 utilizes advanced RAID technology and supports a variety of disk configurations, ensuring that data integrity and availability are maintained. Coupled with built-in security features, such as the IBM Trusted Foundation, which establishes a secure boot environment, the P5 570 offers a reliable platform for mission-critical workloads.

Finally, the IBM P5 570 is designed for high availability and redundancy. Features like hot-swappable components and advanced error detection and recovery mechanisms minimize downtime, making it a dependable choice for businesses that operate around the clock. Combined with its powerful hardware and versatile software support, the IBM P5 570 remains a formidable player in the high-performance server arena.