Chapter 5 Cache and Memory Optimizations 99
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
5.4 Multiprocessor Considerations
In a multiprocessor system, performance can suffer when processors share data that resides in a
single cache line, because each write by one processor forces that line to be invalidated in, or
transferred between, the other processors' caches. In certain cases (for example, semaphores), this
kind of cache-line data sharing cannot be avoided, but it should be minimized where possible.
Data can often be restructured so that such sharing does not occur. Cache lines on AMD Athlon 64 and
AMD Opteron processors are presently 64 bytes, but a scheme that avoids this problem regardless of
cache-line size makes for more performance-portable code.
For example, per-thread data can be allocated on the heap (for example, via calls to malloc()), and
this is preferred over statically defined shared arrays and variables, which may end up located in a
single cache line. Furthermore, some software environments provide special versions of malloc()
that guarantee data alignment to a specified value; these can be useful for aligning data and
eliminating unwanted cache-line overlap.
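
The heap-allocation approach described above might be sketched in C as follows. This is a minimal
illustration, not code from this guide: the thread_data type, the CACHE_LINE_BYTES constant, and the
helper function names are hypothetical, and the sketch assumes a C11 environment in which
aligned_alloc() plays the role of the aligned variant of malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment and padding granularity. 64 bytes matches the cache-line
   size cited in the text; keeping it in one constant makes it easy to change
   for a target with a different line size. */
#define CACHE_LINE_BYTES 64

/* Hypothetical per-thread record; real code would hold whatever state each
   thread updates frequently. */
typedef struct {
    unsigned long counter;
} thread_data;

/* Round n up to the next multiple of CACHE_LINE_BYTES. */
static size_t round_up_to_line(size_t n)
{
    return (n + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
}

/* Allocate one zeroed record that starts on a cache-line boundary and spans a
   whole number of lines, so it does not overlap a line used by another
   allocation. Returns NULL on failure; release the record with free(). */
static thread_data *alloc_thread_data(void)
{
    size_t size = round_up_to_line(sizeof(thread_data));
    thread_data *p = aligned_alloc(CACHE_LINE_BYTES, size);

    if (p != NULL)
        memset(p, 0, size);
    return p;
}
```

Under these assumptions, each thread would call alloc_thread_data() once at startup and update only
its own record. Because every record begins on a line boundary and is padded to a multiple of the
line size, two threads' records never occupy the same cache line, provided CACHE_LINE_BYTES is a
multiple of the processor's actual line size.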
Application
This optimization applies to:
32-bit software
64-bit software