Configuring and Deconfiguring Processors or Memory

All failures that crash the system with a machine check or check stop, even if intermittent, are reported as a diagnostic callout for service repair. To prevent the recurrence of intermittent problems and improve the availability of the system until a scheduled maintenance window, processors and memory books with a failure history are marked "bad" to prevent their being configured on subsequent boots.

A processor or memory book is marked "bad" under the following circumstances:

• A processor or memory book fails built-in self-test (BIST) or power-on self-test (POST) testing during boot (as determined by the service processor).

• A processor or memory book causes a machine check or check stop during runtime, and the failure can be isolated specifically to that processor or memory book (as determined by the processor runtime diagnostics in the service processor).

• A processor or memory book reaches a threshold of recovered failures that results in a predictive callout (as determined by the processor run-time diagnostics in the service processor).

During boot time, the service processor does not configure processors or memory books that are marked "bad."
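
The three conditions and the boot-time rule above can be summarized in a short Python sketch. It is an illustrative model only, not service processor firmware: the `Resource` class, its field names, and the `configure_at_boot` function are hypothetical names introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical model of a processor or memory book."""
    name: str
    failed_bist_or_post: bool = False       # failed BIST or POST during boot
    isolated_runtime_failure: bool = False  # machine check/check stop isolated to this unit
    predictive_callout: bool = False        # recovered-failure threshold reached

    def is_marked_bad(self) -> bool:
        # Any one of the three conditions is enough to mark the unit "bad".
        return (self.failed_bist_or_post
                or self.isolated_runtime_failure
                or self.predictive_callout)


def configure_at_boot(resources):
    """Return the names of the units that would be configured at boot."""
    return [r.name for r in resources if not r.is_marked_bad()]


# Example inventory (unit names are made up for illustration).
inventory = [
    Resource("proc0"),
    Resource("proc1", predictive_callout=True),
    Resource("membook0", failed_bist_or_post=True),
    Resource("membook1"),
]
print(configure_at_boot(inventory))
```

In this model, only `proc0` and `membook1` are configured, because each of the other two units satisfies one of the listed conditions.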

If a processor or memory book is deconfigured, the processor or memory book remains offline for subsequent reboots until it is replaced or repeat gard is disabled. The repeat gard function also provides the user with the option of manually deconfiguring a processor or memory book, or re-enabling a previously deconfigured processor or memory book. For information on configuring or deconfiguring a processor, see the Processor Configuration/Deconfiguration Menu on page 33.

For information on configuring or deconfiguring a memory book, see the Memory Configuration/Deconfiguration Menu on page 35. Both of these menus are submenus under the System Information Menu.

You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu.
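
As a rough illustration of the repeat gard behavior described in this section, the following Python sketch models when a deconfigured unit stays offline. The `GardState` class and the `offline_at_next_boot` function are hypothetical names, and the sketch assumes that disabling repeat gard simply makes the unit eligible for configuration again; the actual firmware behavior may involve further checks.

```python
class GardState:
    """Hypothetical record for one processor or memory book."""

    def __init__(self, name: str):
        self.name = name
        self.deconfigured = False

    def deconfigure(self) -> None:
        # Set after a qualifying failure, or manually from the menu.
        self.deconfigured = True

    def reenable(self) -> None:
        # Manual re-enable of a previously deconfigured unit.
        self.deconfigured = False

    def replace(self) -> None:
        # Assumption: a replacement part starts with no failure history.
        self.deconfigured = False


def offline_at_next_boot(unit: GardState, repeat_gard_enabled: bool) -> bool:
    # A deconfigured unit stays offline only while repeat gard is enabled.
    return unit.deconfigured and repeat_gard_enabled


book = GardState("membook2")  # made-up unit name
book.deconfigure()
print(offline_at_next_boot(book, repeat_gard_enabled=True))
print(offline_at_next_boot(book, repeat_gard_enabled=False))
```

The two calls show the same deconfigured unit staying offline with repeat gard enabled and becoming eligible again once repeat gard is disabled.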

Run-Time CPU Deconfiguration (CPU Gard)

L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache correctable errors are monitored by the processor runtime diagnostics (PRD) code running in the service processor. When a predefined error threshold is met, an error log with warning severity and threshold exceeded status is returned to AIX. At the same time, PRD marks the CPU for deconfiguration at the next boot. AIX will attempt to migrate all resources associated with that processor to another processor and then stop the defective processor.
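
The run-time sequence described above (count recoverable errors, log a warning at the threshold, mark the CPU for deconfiguration at the next boot, then migrate work and stop the CPU) can be sketched in Python as follows. This is a simplified model under stated assumptions: the threshold value of 3, the `RuntimeDiagnostics` class, and the `migrate_and_stop` function are all hypothetical, since the source says only that the threshold is predefined and does not describe the PRD or AIX interfaces.

```python
from collections import Counter

ERROR_THRESHOLD = 3  # hypothetical value; the source only says "predefined"


class RuntimeDiagnostics:
    """Toy stand-in for the PRD error counting described in the text."""

    def __init__(self, threshold: int = ERROR_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.error_log = []
        self.deconfigure_at_next_boot = set()

    def record_recoverable_error(self, cpu: str) -> bool:
        """Count one recoverable error; return True when the threshold is met."""
        self.counts[cpu] += 1
        if self.counts[cpu] == self.threshold:
            # Warning-severity log entry with threshold-exceeded status.
            self.error_log.append(
                {"cpu": cpu, "severity": "warning", "status": "threshold exceeded"}
            )
            # The CPU is marked for deconfiguration at the next boot.
            self.deconfigure_at_next_boot.add(cpu)
            return True
        return False


def migrate_and_stop(cpu, assignments, online):
    """Model of the OS response: move work off `cpu`, then take it offline."""
    remaining = [c for c in online if c != cpu]
    if not remaining:
        raise RuntimeError("no other processor available to take over")
    for task, owner in assignments.items():
        if owner == cpu:
            assignments[task] = remaining[0]
    return remaining


prd = RuntimeDiagnostics()
assignments = {"db": "cpu2", "web": "cpu0"}  # made-up workloads
online = ["cpu0", "cpu1", "cpu2"]
for _ in range(3):
    if prd.record_recoverable_error("cpu2"):
        online = migrate_and_stop("cpu2", assignments, online)
print(online, assignments, prd.deconfigure_at_next_boot)
```

Keeping the "mark for next boot" set separate from the list of online CPUs mirrors the text: the run-time stop and the boot-time deconfiguration are two distinct effects of the same threshold event.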


pSeries690 specifications

The IBM pSeries690, also known as the IBM eServer p690, is a high-performance server in IBM's eServer pSeries line, designed primarily for enterprise-level workloads. Introduced in 2001, it was one of IBM's flagship offerings, aimed at delivering scalability, reliability, and processing power.

At the heart of the pSeries690 is the IBM POWER4 processor, a revolutionary architecture that significantly enhanced processing capabilities. The system supports multiple processor configurations, allowing businesses to customize performance based on their specific applications. Each pSeries690 can be configured with up to 32 POWER4 processors, providing a maximum processing power that is well-suited for demanding enterprise applications, complex databases, and large-scale transaction processing.

One of the standout features of the pSeries690 is its ability to handle massive amounts of memory — up to 1TB of total RAM. This is particularly advantageous for businesses that run memory-intensive applications or require high availability for critical tasks. The server’s advanced memory architecture contributes to faster data access and improved overall system performance.

Additionally, the IBM pSeries690 runs the AIX operating system, which is designed for high performance and reliability. The platform supports logical partitioning (LPAR), including dynamic LPAR, allowing resources to be allocated flexibly and efficiently across different workloads. Partitioning enables multiple operating environments to run concurrently on a single physical server, improving resource utilization.

The pSeries690 also includes robust fault tolerance and reliability features. Redundant components, hot-swappable parts, and advanced error detection mechanisms ensure that the system remains operational even in the event of hardware failures. This reliability is crucial for mission-critical applications where downtime can lead to substantial financial losses.

Networking capabilities of the pSeries690 are enhanced by support for high-speed connections, including Gigabit Ethernet and the ability to integrate with various network topologies. This versatility facilitates seamless connectivity in complex IT environments.

In conclusion, the IBM pSeries690 remains a powerful and reliable choice for enterprises seeking to manage and optimize their most demanding workloads. Its combination of cutting-edge processing capabilities, expansive memory support, and robust virtualization features makes it a significant player in the realm of enterprise computing. As businesses continue to evolve, the core strengths of the pSeries690, especially in terms of scalability and reliability, ensure that it continues to meet the needs of demanding applications.