

40555 Rev. 3.00 June 2006

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

SPECjbb2005 was run using the NUMA tools provided by Linux® to measure the performance improvement from node interleaving. The results were obtained on the same internal 4P Quartet system used for the synthetic tests.
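For illustration only, this kind of run can be set up with numactl, the standard Linux NUMA policy tool. The JVM heap flag, benchmark class, and property file named below are assumptions of this sketch, not the configuration used for the published results:

```shell
# Interleave the JVM's memory pages round-robin across all NUMA nodes.
numactl --interleave=all java -Xmx2g spec.jbb.JBBmain -propfile SPECjbb.props

# Baseline for comparison: confine both threads and memory to node 0.
numactl --cpunodebind=0 --membind=0 java -Xmx2g spec.jbb.JBBmain -propfile SPECjbb.props
```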

3.3 Avoid Cache Line Sharing

In a ccNUMA multiprocessor system, sharing data within a single cache line between cores, even cores on the same node, can reduce performance. In certain cases, such as semaphores, this kind of cache-line sharing cannot be avoided, but it should be minimized where possible.

Data can often be restructured so that such cache-line sharing does not occur. Cache lines on AMD Athlon™ 64 and AMD Opteron™ processors are currently 64 bytes, but a scheme that avoids this problem regardless of cache-line size makes for more performance-portable code. For example, a multithreaded application should avoid statically defined shared arrays and variables that are potentially located in a single cache line and shared between threads.

3.4 Common Hop Myths Debunked

This section addresses several commonly held beliefs concerning the effect of memory access hops on system performance.

3.4.1 Myth: All Equal Hop Cases Take Equal Time

As a general rule, any n-hop case performs the same as any other n-hop case if the only difference between the two is the placement of threads and memory. However, there are exceptions to this rule.

The following example uses the synthetic test to demonstrate that one 1-hop/1-hop case is not equivalent in performance to another 1-hop/1-hop case. It shows how saturating HyperTransport™ link throughput and stressing the HyperTransport queue buffers can cause this exception.
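Placements of this kind can be set up explicitly with the libnuma API on Linux. The sketch below is an assumption of this rewrite, not code from the manual (it requires libnuma and linking with -lnuma); it pins the calling thread to node 0 and writes to memory allocated on node 1, that is, a 1-hop access:

```c
#include <numa.h>     /* libnuma: numa_available, numa_run_on_node, ... */
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() == -1) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 0;
    }
    numa_run_on_node(0);                   /* pin this thread to node 0 */

    size_t len = 1u << 20;                 /* 1 MB of remote memory */
    char *buf = numa_alloc_onnode(len, 1); /* back pages with node 1 memory */
    if (buf != NULL) {
        memset(buf, 0xA5, len);            /* remote (1-hop) writes */
        numa_free(buf, len);
    }
    return 0;
}
```

Running two such processes with the node numbers swapped, or pointed at a third node, reproduces the thread/memory placements compared below.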

In the graphs that follow, we compare the following three cases:

Threads access local data

The first thread runs on node 0 and writes to memory on node 0 (0 hop). The second thread runs on node 1 and writes to memory on node 1 (0 hop).

Threads not firing at each other (no crossfire)

The first thread runs on node 0 and writes to memory on node 1 (1 hop). The second thread runs on node 1 and writes to memory on node 3 (1 hop).

Chapter 3

Analysis and Recommendations

