8-18 Vol. 3
MULTIPLE-PROCESSOR MANAGEMENT

8.2.4 Out-of-Order Stores For String Operations

The Intel Core 2 Duo, Intel Core, Pentium 4, and P6 family processors modify the
processor's operation during string store operations (initiated with the MOVS and
STOS instructions) to maximize performance. Once the initial conditions for “fast
string” operation are met (as described below), the processor, viewed from an
external perspective, operates on the string one cache line at a time. The processor
loops for the length of the string, issuing a cache-line read for the source address
and an invalidation on the external bus for the destination address; it can invalidate
the destination line because it knows that all bytes in that line will be modified. In
this mode, interrupts are accepted by the processor only on cache-line boundaries. It
is possible in this mode that the destination line invalidations, and therefore the
stores, will be issued on the external bus out of order.
Code that depends on sequential store ordering should not use string operations
for the entire data structure to be stored. Data and semaphores should be separated.
Order-dependent code should write to a discrete semaphore variable after any
string operations so that all processors see correctly ordered data.
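The separation of data and semaphore described above can be sketched in C11. This is a minimal illustration, not a definitive implementation: the `message` structure and the `publish` and `try_consume` functions are hypothetical names introduced here, and the sketch assumes that the compiler may lower `memcpy` to a string instruction. The payload is written by the bulk copy only, and the flag is written afterward by one discrete store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared record: the payload and the ready flag are separate
 * objects, so the bulk copy never writes the flag. */
struct message {
    unsigned char payload[256];
    atomic_int ready;            /* the discrete "semaphore" */
};

/* Producer: perform the bulk copy first, then publish with a single store.
 * The release ordering keeps the compiler from moving the flag store ahead
 * of the copy. */
static void publish(struct message *m, const void *src, size_t len)
{
    assert(len <= sizeof m->payload);
    memcpy(m->payload, src, len);    /* assumption: may become REP MOVS */
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: read the payload only after the flag has been observed.
 * Returns 1 if data was copied out, 0 if nothing has been published. */
static int try_consume(struct message *m, void *dst, size_t len)
{
    assert(len <= sizeof m->payload);
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        return 0;
    memcpy(dst, m->payload, len);
    return 1;
}
```

A consumer that polls `try_consume` never depends on the order in which the individual payload stores became visible; it depends only on the flag store following the whole copy.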
“Fast string” operation can be disabled by clearing the fast-string-enable bit (bit 0) of
the IA32_MISC_ENABLE MSR.
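The bit manipulation involved can be sketched as follows. Reading and writing an MSR requires RDMSR and WRMSR at privilege level 0, so this sketch covers only the computation of the new register value; the macro and function names are hypothetical, and the MSR address 1A0H is the architectural address of IA32_MISC_ENABLE.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; only the address and bit position are architectural. */
#define IA32_MISC_ENABLE_ADDR   0x1A0u
#define FAST_STRING_ENABLE_BIT  (UINT64_C(1) << 0)

/* Returns nonzero if the fast-string-enable bit is set in an MSR value. */
static int fast_string_is_enabled(uint64_t msr_value)
{
    return (msr_value & FAST_STRING_ENABLE_BIT) != 0;
}

/* Returns the MSR value with bit 0 cleared and all other bits preserved.
 * Privileged code would write this value back with WRMSR. */
static uint64_t fast_string_disabled(uint64_t msr_value)
{
    uint64_t updated = msr_value & ~FAST_STRING_ENABLE_BIT;
    assert(!fast_string_is_enabled(updated));
    return updated;
}
```

Privileged code would read the current value, pass it through `fast_string_disabled`, and write the result back, so that unrelated enable bits in the same register are left unchanged.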
Initial conditions for “fast string” operations are implementation-specific. Example
conditions include:
• EDI and ESI must be 8-byte aligned for the Pentium III processor. EDI must be
  8-byte aligned for the Pentium 4 processor.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (64 bytes for
  Intel Core 2 Duo, Intel Core, Pentium M, and Pentium 4 processors; 32 bytes for
  P6 family and Pentium processors).
• The memory type for both source and destination addresses must be either WB
  or WC.
NOTE
Initial conditions for “fast string” operation in future Intel 64 or IA-32 processor
families may differ from above.
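The example conditions listed above can be collected into a single predicate. This is a sketch under stated assumptions, not a description of any processor's actual check: the `string_op` structure, the `mem_type` enumeration, and the function name are hypothetical; the alignment test applies the stricter Pentium III rule to both pointers; and the overlap condition is read here as requiring the source and destination addresses to be at least one cache line apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-type tags for the sketch. */
enum mem_type { MEM_UC, MEM_WC, MEM_WT, MEM_WP, MEM_WB };

/* Hypothetical description of one string operation's starting state. */
struct string_op {
    uintptr_t dst;            /* destination address (EDI) */
    uintptr_t src;            /* source address (ESI) */
    size_t count;             /* initial operation counter (ECX) */
    bool ascending;           /* true when addresses ascend */
    enum mem_type dst_type;   /* memory type of the destination */
    enum mem_type src_type;   /* memory type of the source */
};

static bool is_wb_or_wc(enum mem_type t)
{
    return t == MEM_WB || t == MEM_WC;
}

/* Checks the example conditions only; line_size is 64 or 32 bytes
 * depending on the processor family. */
static bool meets_example_conditions(const struct string_op *op,
                                     size_t line_size)
{
    assert(op != NULL && line_size != 0);
    /* Distance between the two start addresses (assumed reading of the
     * overlap condition). */
    uintptr_t gap = op->dst > op->src ? op->dst - op->src
                                      : op->src - op->dst;
    bool aligned = (op->dst % 8 == 0) && (op->src % 8 == 0);
    return aligned
        && op->ascending
        && op->count >= 64
        && gap >= line_size
        && is_wb_or_wc(op->dst_type)
        && is_wb_or_wc(op->src_type);
}
```

Because the real conditions are implementation-specific, a predicate of this kind is useful only for reasoning about when out-of-order stores might occur, not for guaranteeing that they will or will not.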

8.2.4.1 Memory-Ordering Model for String Operations on Write-back (WB) Memory

This section deals with the memory-ordering model for string operations on
write-back (WB) memory for the Intel 64 architecture.
The memory-ordering model respects the following principles:
1. Stores within a single string operation may be executed out of order.