1.4 Memory Error Detection and Correction

IBM PC servers implement four different memory systems:

Standard (parity) memory

Error Correcting Code-Parity (ECC-P) memory

Error Correcting Code (ECC) memory

ECC on SIMMs (EOS) memory

1.4.1 Standard (Parity) Memory

Parity memory is standard IBM memory with 32 bits of data space and 4 bits of parity information (one check bit per byte of data). The 4 parity bits can indicate that an error has occurred, but they do not carry enough information to identify which bit is in error. In the event of a parity error, the system generates a non-maskable interrupt (NMI), which halts the system. Double-bit errors within the same byte go undetected with parity memory, because the two flipped bits leave the parity of that byte unchanged.
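The behavior described above can be illustrated with a minimal sketch. It assumes even parity and the function names `byte_parity`, `parity_bits`, and `check`, none of which come from this document; the parity sense used by the actual hardware may differ.

```python
def byte_parity(byte: int) -> int:
    """Return the even-parity check bit for one 8-bit value (assumed parity sense)."""
    return bin(byte & 0xFF).count("1") % 2


def parity_bits(word: int) -> list[int]:
    """Compute 4 check bits for a 32-bit word, one per byte, lowest byte first."""
    return [byte_parity((word >> (8 * i)) & 0xFF) for i in range(4)]


def check(word: int, stored: list[int]) -> list[int]:
    """Return the indices of bytes whose recomputed parity disagrees with the stored bits."""
    return [i for i, p in enumerate(parity_bits(word)) if p != stored[i]]


word = 0x12345678
stored = parity_bits(word)

single = word ^ (1 << 5)             # one flipped bit in byte 0
double = word ^ (1 << 5) ^ (1 << 6)  # two flipped bits in the same byte

print(check(word, stored))    # [] : no mismatch
print(check(single, stored))  # [0]: byte 0 is flagged, but not which bit
print(check(double, stored))  # [] : the two flips cancel and nothing is flagged
```

The single-bit case shows why parity can only report an error, not repair it: the mismatch identifies a byte, not a bit position.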

Standard memory is implemented in the PC Servers 300 and 320 as well as in the majority of IBM desktops (for example, the IBM PC 300, IBM PC 700, and PC Power Series 800).

1.4.2 Error Correcting Code (ECC)

The requirements for system memory in PC servers have increased dramatically over the past few years. Reasons include the availability of 32-bit operating systems and the caching of hard disk data on file servers.

As system memory increases, the probability of memory errors increases as well. Thus, protection against system memory failures becomes increasingly important. Traditionally, systems which implement only parity memory halt on single-bit errors and fail to detect double-bit errors entirely. Clearly, as memory is increased, better techniques are required.

To combat this problem, the IBM PC servers employ schemes to detect and correct memory errors. These schemes are called Error Correcting Code (sometimes Error Checking and Correcting, but more commonly just ECC). ECC can detect and correct single-bit errors, detect double-bit errors, and detect some triple-bit errors.

Like parity, ECC generates extra check bits from the data as it is stored in memory. However, while parity uses only 1 check bit per byte of data, ECC uses 7 check bits for a 32-bit word and 8 check bits for a 64-bit word. These extra check bits, along with a special hardware algorithm, allow single-bit errors to be detected and corrected in real time as the data is read from memory.
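The check-bit counts quoted above match what a Hamming-style single-error-correcting, double-error-detecting (SEC-DED) code would need. The sketch below assumes such a code; this document does not name the actual algorithm, and `secded_check_bits` is a hypothetical helper used only for the arithmetic.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming-style SEC-DED code (an assumed scheme).

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1,
    so that every bit position plus the "no error" case gets its own syndrome.
    One further overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


for width in (32, 64):
    print(width, secded_check_bits(width))
```

Under this assumption, a 32-bit word needs 7 check bits and a 64-bit word needs 8, so the overhead per data bit falls as the word gets wider, whereas parity stays at 1 bit per byte.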

Figure 5 on page 10 shows how the ECC circuits operate. The data is scanned as it is written to memory. This scan generates a 7-bit check pattern derived from the data being stored. This pattern is then stored in the 7-bit check space.
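The write-side scan and the matching read-side check can be sketched in software. The example below assumes a textbook extended Hamming code over 32 data bits; the real ECC circuits may use a different code, and the names `encode`, `read`, `hamming_bits`, and `DATA_POSITIONS` are illustrative only.

```python
# Codeword positions 1..38: powers of two hold check bits, the other 32 hold data.
DATA_POSITIONS = [p for p in range(1, 39) if p & (p - 1) != 0]


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") % 2


def hamming_bits(data: int) -> int:
    """Return the 6 Hamming check bits as the XOR of the positions of set data bits."""
    s = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            s ^= pos
    return s


def encode(data: int) -> int:
    """Write side: build the 7-bit check pattern (6 Hamming bits + overall parity)."""
    h = hamming_bits(data)
    return ((parity(data) ^ parity(h)) << 6) | h


def read(data: int, stored: int) -> tuple[int, str]:
    """Read side: return (data, status), repairing a single flipped bit if possible."""
    syndrome = hamming_bits(data) ^ (stored & 0x3F)
    odd_error = parity(data) ^ parity(stored)  # parity over all 39 bits
    if syndrome == 0 and not odd_error:
        return data, "ok"
    if odd_error:
        if syndrome in DATA_POSITIONS:
            # The syndrome names the position of the flipped data bit.
            return data ^ (1 << DATA_POSITIONS.index(syndrome)), "corrected"
        if syndrome & (syndrome - 1) == 0:
            # The flipped bit was a check bit or the overall parity bit.
            return data, "corrected"
    # Even number of flips with a nonzero syndrome, or an impossible position.
    return data, "uncorrectable"


data = 0xDEADBEEF
stored = encode(data)
print(read(data, stored)[1])                         # ok
print(read(data ^ (1 << 17), stored)[1])             # corrected
print(read(data ^ (1 << 3) ^ (1 << 20), stored)[1])  # uncorrectable
```

In this sketch the check pattern is kept separate from the data, mirroring the separate 7-bit check space described above.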

Chapter 1. IBM PC Server Technologies 9

IBM SG24-4576-00 manual Memory Error Detection and Correction, Standard Parity Memory, Error Correcting Code ECC