Intel 253668-032US, 8.1 LOCKED ATOMIC OPERATIONS

8-2 Vol. 3

MULTIPLE-PROCESSOR MANAGEMENT

• To distribute interrupt handling among a group of processors. When several processors are operating in a system in parallel, it is useful to have a centralized mechanism for receiving interrupts and distributing them to available processors for servicing.
• To increase system performance by exploiting the multi-threaded and multi-process nature of contemporary operating systems and applications.

The caching mechanism and cache consistency of Intel 64 and IA-32 processors are discussed in Chapter 11. The APIC architecture is described in Chapter 10. Bus and memory locking, serializing instructions, memory ordering, and Intel Hyper-Threading Technology are discussed in the following sections.

8.1 LOCKED ATOMIC OPERATIONS

The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:

• Guaranteed atomic operations
• Bus locking, using the LOCK# signal and the LOCK instruction prefix
• Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors
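The following is a minimal C11 sketch of how software typically uses a locked atomic operation to guard a shared data structure. It is a test-and-test-and-set spinlock. It assumes a compiler that provides `<stdatomic.h>`. The type `tas_lock_t` and the `tas_*` functions are hypothetical names chosen for illustration.

On IA-32 and Intel 64 targets, compilers typically emit XCHG for `atomic_exchange`. XCHG with a memory operand is always executed with the processor's locking protocol, whether or not a LOCK prefix is written. The hardware then decides whether to satisfy the operation with a bus lock or a cache lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type guarding some shared data structure. */
typedef struct {
    atomic_bool held;  /* true while some processor owns the lock */
} tas_lock_t;

static void tas_init(tas_lock_t *l) {
    atomic_init(&l->held, false);  /* start unlocked */
}

/* Atomically set held = true and report whether it was previously false.
   The exchange is a single read-modify-write, so two processors cannot
   both observe "free" and both take the lock. */
static bool tas_trylock(tas_lock_t *l) {
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void tas_lock(tas_lock_t *l) {
    while (!tas_trylock(l)) {
        /* Wait with plain loads until the lock looks free, so that the spin
           does not issue a locked operation on every iteration.
           No fairness: a waiter may lose every race to reacquire. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed)) {
        }
    }
}

static void tas_unlock(tas_lock_t *l) {
    /* Debug check: releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) && "unlock of a free lock");
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Each waiter spins on ordinary loads, which lets the cache line stay shared in every waiter's cache. The locked exchange is retried only once the lock appears free. The lock still makes no fairness guarantee among waiters.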

These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.
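The difference between a guaranteed-atomic single access and a read-modify-write that needs locking can be sketched with a counting semaphore. This is a minimal C11 sketch, assuming `<stdatomic.h>` is available. `csem_t`, `csem_init`, `csem_trywait`, and `csem_post` are hypothetical names and are not the POSIX `sem_t` API.

A plain `if (count > 0) count--;` performs the read and the write as separate accesses. Each access may be atomic on its own, but another processor can modify the count between them. The compare-exchange below performs the check and the update as one atomic read-modify-write. On IA-32 and Intel 64, compilers typically emit LOCK CMPXCHG for it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore (illustration only, not POSIX sem_t). */
typedef struct {
    atomic_int count;  /* number of available resources */
} csem_t;

static void csem_init(csem_t *s, int initial) {
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* Try to take one resource without blocking; returns false if none is left. */
static bool csem_trywait(csem_t *s) {
    int cur = atomic_load_explicit(&s->count, memory_order_relaxed);
    while (cur > 0) {
        /* Succeeds only if count still equals cur; otherwise cur is
           reloaded with the current value and the check is repeated. */
        if (atomic_compare_exchange_weak_explicit(&s->count, &cur, cur - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

/* Release one resource with a single atomic add
   (typically LOCK ADD or LOCK XADD on x86). */
static void csem_post(csem_t *s) {
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}
```

Usage: after `csem_init(&s, 2)`, two calls to `csem_trywait(&s)` succeed and a third fails until `csem_post(&s)` returns a resource.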

NOTE

Where there are contested lock accesses, software may need to implement algorithms that ensure fair access to resources in order to prevent lock starvation. The hardware provides no resource that guarantees fairness to participating agents. It is the responsibility of