MULTIPLE-PROCESSOR MANAGEMENT

8.2.3 Examples Illustrating the Memory-Ordering Principles

This section provides a set of examples that illustrate the behavior of the memory-
ordering principles introduced in Section 8.2.2. They are designed to give software
writers an understanding of how memory ordering may affect the results of different
sequences of instructions.
These examples are limited to accesses to memory regions defined as write-back
cacheable (WB). (Section 8.2.3.1 describes other limitations on the generality of the
examples.) The reader should understand that they describe only software-visible
behavior. A logical processor may reorder two accesses even if one of the examples
indicates that they may not be reordered. Such an example states only that software
cannot detect that such a reordering occurred. Similarly, a logical processor may
execute a memory access more than once as long as the behavior visible to software
is consistent with a single execution of the memory access.

8.2.3.1 Assumptions, Terminology, and Notation

As noted above, the examples in this section are limited to accesses to memory
regions defined as write-back cacheable (WB). They apply only to ordinary loads and
stores and to locked read-modify-write instructions. They do not necessarily apply to
any of the following: out-of-order stores for string instructions (see Section 8.2.4);
accesses with a non-temporal hint; reads from memory by the processor as part of
address translation (e.g., page walks); and updates to segmentation and paging
structures by the processor (e.g., to update “accessed” bits).
The principles underlying the examples in this section apply to individual memory
accesses and to locked read-modify-write instructions. The Intel-64 memory-
ordering model guarantees that, for each of the following memory-access instruc-
tions, the constituent memory operation appears to execute as a single memory
access:
• Instructions that read or write a single byte.
• Instructions that read or write a word (2 bytes) whose address is aligned on a 2
  byte boundary.
• Instructions that read or write a doubleword (4 bytes) whose address is aligned
  on a 4 byte boundary.
• Instructions that read or write a quadword (8 bytes) whose address is aligned on
  an 8 byte boundary.
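
The four cases listed above can be written as a small predicate. The following C
sketch is illustrative only: the function name appears_as_single_access is
hypothetical and is not part of any processor or compiler interface. A result of
false means only that the list above gives no guarantee for that access, not that
the access is certain to be split.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "a quadword is assumed to be 8 bytes");

/* Hypothetical helper: returns true when an access of `size` bytes at
 * address `addr` falls into one of the four listed cases (a single byte,
 * or a 2-, 4-, or 8-byte access aligned on its own size). */
bool appears_as_single_access(uintptr_t addr, size_t size)
{
    switch (size) {
    case 1:
        return true;                 /* any single byte */
    case 2:
    case 4:
    case 8:
        return (addr % size) == 0;   /* naturally aligned word/dword/qword */
    default:
        return false;                /* not covered by the list */
    }
}
```

For instance, a quadword access at address 0x1000 satisfies the predicate, while a
doubleword access at 0x1001 does not.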
Any locked instruction (either the XCHG instruction or another read-modify-write
instruction with a LOCK prefix) appears to execute as an indivisible and uninterrupt-
ible sequence of load(s) followed by store(s) regardless of alignment.
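
As an illustration of how software commonly relies on locked instructions, the
following C11 sketch uses the standard <stdatomic.h> operations atomic_exchange
and atomic_fetch_add. Compilers targeting Intel 64 typically implement these with
XCHG and with a LOCK-prefixed read-modify-write instruction respectively, but that
mapping is an assumption about the compiler, not something this section specifies.
The names demo_lock, demo_try_lock, demo_unlock, and demo_fetch_add are
hypothetical.

```c
#include <assert.h>     /* static_assert */
#include <stdatomic.h>
#include <stdbool.h>

/* The sketch is only meaningful where atomic_int needs no hidden lock. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic_int assumed always lock-free");

/* Hypothetical test-and-set lock: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} demo_lock;

/* Atomically writes 1 and returns true if the previous value was 0.
 * The load and the store of the exchange cannot be separated by another
 * processor's access to the same location. */
bool demo_try_lock(demo_lock *lock)
{
    return atomic_exchange(&lock->held, 1) == 0;
}

/* Releases the lock with an ordinary atomic store. */
void demo_unlock(demo_lock *lock)
{
    atomic_store(&lock->held, 0);
}

/* Atomically adds `delta` to `*counter` and returns the previous value. */
int demo_fetch_add(atomic_int *counter, int delta)
{
    return atomic_fetch_add(counter, delta);
}
```

A caller would normally retry demo_try_lock in a loop until it returns true; the
retry policy is left out to keep the sketch short.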
Other instructions may be implemented with multiple memory accesses. From a
memory-ordering point of view, there are no guarantees regarding the relative order
in which the constituent memory accesses are made. There is also no guarantee that
the constituent operations of a store are executed in the same order as the constit-
uent operations of a load.
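
The consequence for software can be shown with a toy model. The C sketch below
treats a 16-byte value as two 8-byte halves and models a store that is carried out
as two constituent accesses, observed after zero, one, or both of them have
completed. The two-way split, the type wide_value, and the function
store_partially are assumptions made for illustration; the text above does not
say how a particular instruction is divided or in what order.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* Toy model of a 16-byte value as two 8-byte halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} wide_value;

static_assert(sizeof(wide_value) == 16, "model assumes two packed 8-byte halves");

/* Returns what an observer would see if a store of `new_v` over `old_v`
 * were performed as two constituent accesses and observed after
 * `steps_done` of them (0, 1, or 2). `hi_first` selects which half the
 * model writes first, since no order is guaranteed. */
wide_value store_partially(wide_value old_v, wide_value new_v,
                           int steps_done, int hi_first)
{
    wide_value seen = old_v;
    for (int step = 0; step < steps_done && step < 2; step++) {
        int write_hi = (step == 0) ? hi_first : !hi_first;
        if (write_hi)
            seen.hi = new_v.hi;
        else
            seen.lo = new_v.lo;
    }
    return seen;
}
```

With an old value of {1, 1} and a new value of {2, 2}, an observation after one
step yields either {2, 1} or {1, 2} depending on the order chosen, a value that
was never stored as a whole. Software that needs an indivisible update of such
data therefore has to use one of the guaranteed access forms or a locked
instruction.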