40555 Rev. 3.00 June 2006 | Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems
SPECjbb2005 was run using the NUMA tools provided by Linux® to measure the performance improvement from node interleaving. The results were obtained on the same internal 4P Quartet system used for the synthetic tests.
3.3 Avoid Cache Line Sharing
In a ccNUMA multiprocessor system, sharing data within a single cache line between cores, even cores on the same node, can reduce performance. In certain cases, such as semaphores, this kind of sharing is unavoidable. Data can often be restructured, however, so that such sharing does not occur. Cache lines on AMD Athlon™ 64 and AMD Opteron™ processors are currently 64 bytes, but a scheme that avoids this problem regardless of cache-line size is preferable.
3.4 Common Hop Myths Debunked
This section addresses several commonly held beliefs concerning the effect of memory access hops on system performance.
3.4.1 Myth: All Equal Hop Cases Take Equal Time
As a general rule, any n-hop case performs the same as any other n-hop case, provided the only difference between the two is thread and memory placement. There are, however, exceptions to this rule.
The following example demonstrates how one 1-hop case can perform differently from another 1-hop case.
In the graphs that follow, we compare the following three cases:
• Threads access local data
The first thread runs on node 0 and writes to memory on node 0 (0 hop). The second thread runs on node 1 and writes to memory on node 1 (0 hop).
• Threads do not fire at each other (no crossfire)
The first thread runs on node 0 and writes to memory on node 1 (1 hop). The second thread runs on node 1 and writes to memory on node 3 (1 hop).
Chapter 3: Analysis and Recommendations