3.2.2 First Failure Data Capture

If a problem should occur, the ability to diagnose it correctly is a fundamental requirement upon which improved availability is based. The p5-570 incorporates advanced capability in start-up diagnostics and in run-time First Failure Data Capture (FFDC) based on strategic error checkers built into the chips.

Any errors that are detected by the pervasive error checkers are captured into Fault Isolation Registers (FIRs, shown in Figure 3-1), which can be interrogated by the service processor (SP). The SP in the p5-570 can access system components through special-purpose service processor ports or by reading the error registers.

[Figure 3-1 labels: Error Checkers; CPU; L1 Cache; L2/L3 Cache; Memory; Disk; Fault Isolation Register (FIR) (unique fingerprint of each error captured); Service Processor; Log Error; Non-volatile RAM]

Figure 3-1 Schematic of Fault Isolation Register implementation

The FIRs are important because they enable an error to be uniquely identified, thus enabling the appropriate action to be taken. Appropriate actions might include a bus retry, ECC correction, or system firmware recovery routines. Recovery routines could include dynamic deallocation of potentially failing components.
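The mapping from a captured error to an action can be pictured as a lookup on the bits of a fault register. The following is a minimal sketch only: the bit positions, the error descriptions, and the names FIR_BITS and decode_fir are hypothetical and do not reflect the actual p5-570 register layout, which is not described here.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    """Actions named in the text above."""
    BUS_RETRY = "bus retry"
    ECC_CORRECTION = "ECC correction"
    FIRMWARE_RECOVERY = "system firmware recovery routine"


# Hypothetical bit assignments, for illustration only.
FIR_BITS = {
    0: ("bus parity error", Action.BUS_RETRY),
    1: ("single-bit memory error", Action.ECC_CORRECTION),
    2: ("L2/L3 cache error", Action.FIRMWARE_RECOVERY),
}


def decode_fir(value: int) -> list[tuple[str, Action]]:
    """Return (description, action) for every known bit set in value."""
    return [
        entry
        for bit, entry in sorted(FIR_BITS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    for description, action in decode_fir(0b101):
        print(f"{description}: {action.value}")
```

Because each bit stands for one checker, a register value identifies which checkers fired, which is the property that lets the right action be selected.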

Errors are logged into the system non-volatile random access memory (NVRAM) and the SP event history log, along with a notification of the event to AIX for capture in the operating system error log. Diagnostic Error Log Analysis (diagela) routines analyze the error log entries and invoke a suitable action such as issuing a warning message. If the error can be recovered, or after suitable maintenance, the service processor resets the FIRs so that they can accurately record any future errors.
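The sequence described above (log the error, notify the operating system, analyze the entry, then reset the FIRs) can be sketched as a small model. This is an illustration under stated assumptions, not the service processor's implementation: the class ErrorFlow, its fields, and the message format are hypothetical, and the analyze method is only a stand-in for the diagela routines.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorFlow:
    """Toy model of the logging path; all names are hypothetical."""
    fir: int = 0
    nvram: list = field(default_factory=list)
    sp_history: list = field(default_factory=list)
    os_error_log: list = field(default_factory=list)

    def record(self, fir_value: int, recoverable: bool) -> str:
        """Log an error in all three places and return the analysis message."""
        self.fir = fir_value
        entry = {"fir": fir_value, "recoverable": recoverable}
        self.nvram.append(entry)         # system NVRAM
        self.sp_history.append(entry)    # SP event history log
        self.os_error_log.append(entry)  # notification captured by the OS
        message = self.analyze(entry)
        if recoverable:
            self.fir = 0  # reset so future errors are recorded accurately
        return message

    def reset_after_maintenance(self) -> None:
        """Clear the register once maintenance has been completed."""
        self.fir = 0

    @staticmethod
    def analyze(entry: dict) -> str:
        """Stand-in for error log analysis: issue a warning message."""
        kind = "recovered" if entry["recoverable"] else "needs service"
        return f"WARNING: error 0x{entry['fir']:x} ({kind})"
```

In this model an unrecoverable error leaves the register set until reset_after_maintenance is called, mirroring the two reset conditions in the text.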

The ability to correctly diagnose any pending or firm errors is a key requirement before any dynamic or persistent component deallocation or any other reconfiguration can take place.

3.2.3 Permanent monitoring

The SP that is included in the p5-570 provides a way to monitor the system even when the main processor is inoperable. The next subsection offers a more detailed description of the monitoring functions in the p5-570.

Mutual surveillance

The SP can monitor the operation of the firmware during the boot process, and it can monitor the operating system for loss of control. This enables the service processor to take appropriate action, including calling for service, when it detects that the firmware or the

56 p5-570 Technical Overview and Introduction

