Sun Microsystems T6320 service manual Memory Fault Handling, Troubleshooting Memory Faults

Models: T6320

TABLE 2-2 FB-DIMM Configuration and Installation (Continued)

Branch Name  Channel Name  FRU Name                  Motherboard FB-DIMM Connector  FB-DIMM Installation Order*  FB-DIMM Pair†
                           /SYS/MB/CMP0/BR3/CH0/D1   J2501                          3                            H
             Channel 1     /SYS/MB/CMP0/BR3/CH1/D0   J2601                          2                            G
                           /SYS/MB/CMP0/BR3/CH1/D1   J2701                          3                            H

* Upgrade path: DIMMs should be added with each group populated in the order shown.
† Fault replacement path: Each pair is addressed as a unit, and each pair must be identical.
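The FRU names in TABLE 2-2 encode the branch, channel, and DIMM position, so a fault message that names a FRU can be mapped back to a connector and pair. The following is a minimal sketch of that lookup, covering only the three rows shown in this part of the table. The names `parse_fru_name`, `DimmLocation`, `TABLE_ROWS`, and `pair_members` are hypothetical and are not part of any Sun tool.

```python
import re
from typing import List, NamedTuple

# FRU names in TABLE 2-2 have the form /SYS/MB/CMP<n>/BR<n>/CH<n>/D<n>.
FRU_PATTERN = re.compile(r"^/SYS/MB/CMP(\d+)/BR(\d+)/CH(\d+)/D(\d+)$")


class DimmLocation(NamedTuple):
    cmp: int
    branch: int
    channel: int
    dimm: int


def parse_fru_name(fru_name: str) -> DimmLocation:
    """Split a FRU name into its CMP, branch, channel, and DIMM numbers."""
    match = FRU_PATTERN.match(fru_name)
    if match is None:
        raise ValueError(f"not an FB-DIMM FRU name: {fru_name!r}")
    return DimmLocation(*(int(group) for group in match.groups()))


# Only the rows visible in this part of TABLE 2-2:
# FRU name -> (connector, installation order, pair)
TABLE_ROWS = {
    "/SYS/MB/CMP0/BR3/CH0/D1": ("J2501", 3, "H"),
    "/SYS/MB/CMP0/BR3/CH1/D0": ("J2601", 2, "G"),
    "/SYS/MB/CMP0/BR3/CH1/D1": ("J2701", 3, "H"),
}


def pair_members(pair: str) -> List[str]:
    """Return the FRU names listed under the given pair letter."""
    return sorted(name for name, row in TABLE_ROWS.items() if row[2] == pair)
```

For example, `pair_members("H")` returns both DIMMs of pair H, which the fault replacement path treats as a unit.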

2.2.1.2 Memory Fault Handling

The Sun Blade T6320 server module uses advanced ECC technology, also called chipkill, that corrects up to 4 bits in error on nibble boundaries, as long as all of the bits in error are in the same DRAM. If a DRAM fails, the DIMM continues to function.

Note – The chipkill function is only supported on DIMMs that use “x4” DRAMs.
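The "nibble boundary" condition can be illustrated with a small model. This is only a sketch under stated assumptions, not the server module's ECC logic: it assumes a data word is represented as an integer, that each "x4" DRAM supplies one aligned 4-bit group of that word, and that the error pattern is available as a bit mask. The function names are hypothetical.

```python
from typing import List


def nibbles_in_error(error_mask: int) -> List[int]:
    """Return the indices of the aligned 4-bit groups that contain an error bit.

    error_mask has a 1 in every bit position that is in error
    (for example, expected_word XOR observed_word).
    """
    if error_mask < 0:
        raise ValueError("error_mask must be non-negative")
    indices = []
    index = 0
    while error_mask:
        if error_mask & 0xF:  # any error bit in this aligned nibble
            indices.append(index)
        error_mask >>= 4
        index += 1
    return indices


def confined_to_one_dram(error_mask: int) -> bool:
    """True if all error bits fall in a single aligned nibble (one x4 DRAM in this model)."""
    return len(nibbles_in_error(error_mask)) == 1
```

In this model, a mask of `0b11110000` has all four error bits in one aligned nibble, while `0b00011000` has only two error bits but they straddle a nibble boundary and so belong to two different DRAMs.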

The following server module features manage memory faults independently.

POST – Runs when the server module is powered on (based on configuration variables) and thoroughly tests the memory subsystem.

If a memory fault is detected, POST displays the fault with the FRU name of the faulty DIMMs, logs the fault, and disables the faulty DIMMs by placing them in the Automatic System Recovery (ASR) blacklist. For a given memory fault, POST disables half of the physical memory in the system. When this occurs, you must replace the faulty DIMMs based on the fault message and enable the disabled DIMMs with the ILOM command set /SYS/component component_state=enabled.

Solaris Predictive Self-healing (PSH) technology – A feature of the Solaris OS that uses the fault manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, it is assigned a unique fault ID (UUID) and logged. PSH reports the fault and recommends proactive replacement of the DIMMs associated with the fault.
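After faulty DIMMs disabled by POST have been replaced, each one is re-enabled with the ILOM set command described above. The sketch below only builds the command strings. It assumes that /SYS/component stands for the full FRU name of the DIMM as listed in TABLE 2-2, and the function name `enable_command` is hypothetical.

```python
def enable_command(fru_name: str) -> str:
    """Build the ILOM command that re-enables a component disabled by POST.

    Assumption: the FRU name (for example, from the POST fault message)
    is substituted for the /SYS/component placeholder.
    """
    if not fru_name.startswith("/SYS/"):
        raise ValueError(f"expected a FRU name under /SYS/: {fru_name!r}")
    return f"set {fru_name} component_state=enabled"


# Example: both DIMMs of a replaced pair.
for name in ("/SYS/MB/CMP0/BR3/CH0/D1", "/SYS/MB/CMP0/BR3/CH1/D1"):
    print(enable_command(name))
```

The printed lines are intended to be typed at the ILOM prompt; the sketch does not connect to the service processor.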

2.2.1.3 Troubleshooting Memory Faults

If you suspect that the server module has a memory problem, follow the flowchart (see FIGURE 2-1). Type the ILOM command: show /SP/faultmgmt. The faultmgmt command lists memory faults and lists the specific DIMMs that are

Chapter 2 Sun Blade T6320 Server Module Diagnostics 2-11
