25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

5.4 Multiprocessor Considerations

In a multiprocessor system, data within a single cache line that is shared between processors can reduce performance, because each write by one processor forces the line to be invalidated in, or transferred between, the other processors' caches. In certain cases (for example, semaphores), this kind of cache-line data sharing cannot be avoided, but it should be minimized where possible.

Data can often be restructured so this does not occur. Cache lines on AMD Athlon 64 and AMD Opteron processors are presently 64 bytes, but a scheme that avoids this problem regardless of cache-line size makes for more performance-portable code.
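One possible restructuring is sketched below. It is an illustration only, not taken from this guide: the names CACHE_LINE_SIZE, MAX_THREADS, packed_counters, padded_slot, and padded_counters are hypothetical, and the code assumes a C11 compiler (for alignas and static_assert). The line size is kept in a single constant so that the layout can be adjusted if the cache-line size changes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on AMD Athlon 64 and AMD Opteron.
   Only this constant needs to change for a different line size. */
#define CACHE_LINE_SIZE 64

/* Hypothetical thread count, for illustration only. */
#define MAX_THREADS 4

/* Layout prone to sharing: counters of different threads sit next to
   each other, so several of them fall within one cache line. */
struct packed_counters {
    unsigned long count[MAX_THREADS];
};

/* Restructured layout: each slot is aligned to a cache-line boundary.
   The compiler pads the struct so that its size is a multiple of its
   alignment, which gives every thread a line of its own. */
struct padded_slot {
    alignas(CACHE_LINE_SIZE) unsigned long count;
};

struct padded_counters {
    struct padded_slot slot[MAX_THREADS];
};

/* Compile-time check that a slot occupies whole cache lines. */
static_assert(sizeof(struct padded_slot) % CACHE_LINE_SIZE == 0,
              "padded_slot must occupy whole cache lines");
```

The padded layout trades memory for isolation: each counter now occupies a full line instead of a few bytes, which is usually acceptable for a small number of frequently written per-thread fields.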

For example, per-thread data can be allocated on the heap (via calls to malloc(), for instance), and this is preferred over statically defined shared arrays and variables that are potentially located in a single cache line. Furthermore, some software environments provide special versions of malloc that guarantee data alignment to a specified value, and these can be useful for aligning data and eliminating unwanted cache-line overlap.
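The heap-based approach might look like the following sketch. It assumes a C11 environment that provides aligned_alloc() as the aligned variant of malloc (on POSIX systems, posix_memalign() is an alternative; other environments have their own equivalents). The names CACHE_LINE_SIZE, thread_counter, round_up_to_line, and alloc_thread_counter are hypothetical and are not part of this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, as on AMD Athlon 64 and AMD Opteron. */
#define CACHE_LINE_SIZE 64

/* Hypothetical per-thread record, for illustration only. */
struct thread_counter {
    unsigned long hits;
    unsigned long misses;
};

/* Round a byte count up to the next multiple of the cache-line size. */
size_t round_up_to_line(size_t n)
{
    return (n + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Allocate one record for one thread. The block starts on a cache-line
   boundary and its size is a whole number of lines, so the record does
   not share a line with another block obtained the same way.
   Returns NULL on failure; release the block with free(). */
struct thread_counter *alloc_thread_counter(void)
{
    /* C11 requires the size to be a multiple of the alignment. */
    size_t size = round_up_to_line(sizeof(struct thread_counter));
    struct thread_counter *c = aligned_alloc(CACHE_LINE_SIZE, size);

    if (c != NULL) {
        c->hits = 0;
        c->misses = 0;
    }
    return c;
}
```

Each thread would call alloc_thread_counter() once (or be handed the result at creation) and update only its own record. Plain malloc() gives no guarantee that two small blocks land in different cache lines, which is why the aligned allocation with a rounded-up size is used here.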

Application

This optimization applies to:

32-bit software

64-bit software

Chapter 5

Cache and Memory Optimizations
