8-2 Vol. 3
MULTIPLE-PROCESSOR MANAGEMENT
To distribute interrupt handling among a group of processors — When several
processors are operating in a system in parallel, it is useful to have a centralized
mechanism for receiving interrupts and distributing them to available processors
for servicing.
To increase system performance by exploiting the multi-threaded and multi-process
nature of contemporary operating systems and applications.
The caching mechanism and cache consistency of Intel 64 and IA-32 processors are
discussed in Chapter 11. The APIC architecture is described in Chapter 10. Bus and
memory locking, serializing instructions, memory ordering, and Intel Hyper-Threading
Technology are discussed in the following sections.
8.1 LOCKED ATOMIC OPERATIONS
The 32-bit IA-32 processors support locked atomic operations on locations in system
memory. These operations are typically used to manage shared data structures (such
as semaphores, segment descriptors, system segments, or page tables) in which two
or more processors may try simultaneously to modify the same field or flag. The
processor uses three interdependent mechanisms for carrying out locked atomic
operations:
Guaranteed atomic operations
Bus locking, using the LOCK# signal and the LOCK instruction prefix
Cache coherency protocols that ensure that atomic operations can be carried out
on cached data structures (cache lock); this mechanism is present in the
Pentium 4, Intel Xeon, and P6 family processors
These mechanisms are interdependent in the following ways. Certain basic memory
transactions (such as reading or writing a byte in system memory) are always
guaranteed to be handled atomically. That is, once such a transaction starts, the
processor guarantees that it completes before another processor or bus agent is
allowed access to the memory location. The processor also supports bus locking for
performing selected memory operations (such as a read-modify-write operation in a
shared area of memory) that typically need to be handled atomically, but are not
automatically handled this way. Because frequently used memory locations are often
cached in a processor’s L1 or L2 caches, atomic operations can often be carried out
inside a processor’s caches without asserting the bus lock. Here the processor’s
cache coherency protocols ensure that other processors that are caching the same
memory locations are managed properly while atomic operations are performed on
cached memory locations.
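
The read-modify-write case described above can be illustrated with a minimal
sketch in C11, assuming a compiler that provides <stdatomic.h>. The type and
function names (sketch_sem, sketch_sem_init, sketch_sem_try_acquire,
sketch_sem_release) are hypothetical and chosen only for this example. On
Intel 64 and IA-32 targets, compilers typically implement these atomic calls
with LOCK-prefixed instructions, but that is an assumption about code
generation, not something the C language guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore; all names here are illustrative only. */
typedef struct {
    atomic_int count;   /* number of units currently available */
} sketch_sem;

void sketch_sem_init(sketch_sem *s, int initial)
{
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* Try to take one unit without blocking. Returns true on success. */
bool sketch_sem_try_acquire(sketch_sem *s)
{
    int seen = atomic_load(&s->count);
    while (seen > 0) {
        /*
         * Atomic read-modify-write: store seen - 1 only if count still
         * equals seen. On failure, seen is refreshed with the current
         * value and the loop re-checks whether a unit is available.
         */
        if (atomic_compare_exchange_weak(&s->count, &seen, seen - 1))
            return true;
    }
    return false;   /* no units available */
}

/* Return one unit with a single atomic increment. */
void sketch_sem_release(sketch_sem *s)
{
    atomic_fetch_add(&s->count, 1);
}
```

A caller would initialize the semaphore once, for example with
sketch_sem_init(&s, 2), and then pair each successful sketch_sem_try_acquire
with a later sketch_sem_release. The sketch makes no fairness guarantee: a
thread that repeatedly loses the compare-and-exchange race can be starved.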
NOTE
Where there are contested lock accesses, software may need to
implement algorithms that ensure fair access to resources in order to
prevent lock starvation. The hardware provides no resource that
guarantees fairness to participating agents. It is the responsibility of