Keeping Data Local by Virtue of First Touch

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems
40555 Rev. 3.00 June 2006

[Figure 5: bar chart. Y axis: time for write. X axis: configurations 0.0.w.0 (0 hop), 0.0.w.1 (1 hop), 0.0.w.2 (1 hop), and 0.0.w.3 (2 hop). Bar labels recovered from the plot: 113%, 127%, 129%, 149%.]

Figure 5. Write-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System

In this test case, a write access generates coherent HyperTransport™ link traffic and memory traffic much as a read access does, with some key differences. A write access brings data into the cache much like a read and then modifies it in the cache. In this particular synthetic test, however, the thread performs successive writes to sequential cache lines in a 64-MB array. The result is a steady state in which each write access also causes a cache line eviction, or write-back. This raises the memory and HyperTransport traffic of a write-only thread to almost twice that of a read-only thread. On our test bench, a thread performing local read-only accesses generates a memory bandwidth load of 1.64 GB/s, and a thread performing local write-only accesses generates a memory bandwidth load of 2.98 GB/s. As a result, not only do writes take longer than reads at any given hop distance, they also slow down more quickly as hop distance increases.
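The write-only access pattern described above can be sketched in C as follows. This is a minimal illustration, not the benchmark used for these measurements: the function name write_sweep and the 64-byte cache line size are assumptions. For reference, the two measured loads give a ratio of 2.98 / 1.64 ≈ 1.8, which is the "almost twice" figure.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes; adjust for the target machine. */
#define CACHE_LINE_BYTES 64

/* Write one byte in every cache line of buf, in address order.
 * Once the buffer is much larger than the cache, each new line brought in
 * for writing pushes an older dirty line out, so every store is paired
 * with a write-back. Returns the number of cache lines written. */
size_t write_sweep(volatile unsigned char *buf, size_t bytes)
{
    assert(buf != NULL || bytes == 0);

    size_t lines = 0;
    for (size_t off = 0; off < bytes; off += CACHE_LINE_BYTES) {
        buf[off] = (unsigned char)lines;  /* the store that dirties the line */
        lines++;
    }
    return lines;
}
```

Calling write_sweep on a 64-MB buffer writes 1,048,576 cache lines under the 64-byte assumption. Timing repeated sweeps over local and remote buffers would be one way to approximate the comparison in Figure 5.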

3.2.1 Keeping Data Local by Virtue of First Touch

To keep data local, the following principles are recommended.

As long as a thread initializes the data it needs (writes to it for the first time) and does not rely on any other thread to perform the initialization, a ccNUMA-aware OS keeps the data local to the node where the thread runs. This policy of keeping data local by writing to it for the first time is known as the local allocation policy by virtue of first touch. It is the default policy used by a ccNUMA-aware OS.
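A minimal sketch of this principle in C is shown below, assuming a threaded program in which each worker owns its own array. The struct work type and the worker function are hypothetical names introduced for illustration. The point is only that allocation and the first write both happen inside the thread that later uses the data, rather than in a main thread that initializes everything before the workers start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread work descriptor (not from the source document). */
struct work {
    size_t  n;     /* number of elements this thread owns */
    double *data;  /* allocated and first written by the owning thread */
    double  sum;   /* result of the thread's computation */
};

/* Thread entry point. It has the signature that pthread_create expects,
 * so each worker thread can run it directly.
 * Returns arg on success, NULL if the allocation fails. */
void *worker(void *arg)
{
    struct work *w = arg;
    assert(w != NULL);

    /* For large requests, malloc typically reserves address space only;
     * physical pages are assigned when they are first written. */
    w->data = malloc(w->n * sizeof *w->data);
    if (w->data == NULL)
        return NULL;

    /* First touch: the owning thread writes every element itself. */
    for (size_t i = 0; i < w->n; i++)
        w->data[i] = 1.0;

    /* Later accesses come from the same thread that touched the data first. */
    w->sum = 0.0;
    for (size_t i = 0; i < w->n; i++)
        w->sum += w->data[i];

    return w;
}
```

The pattern to avoid is the reverse: a single thread that allocates and fills every array and then hands them to the workers, which under first touch would place all of the data on that one thread's node. The caller is responsible for freeing w->data once the worker has finished.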

A ccNUMA-aware OS ensures local allocation by taking a page fault at the time of the first touch to the data. When the page fault occurs, the OS maps the virtual pages associated with the data to zeroed-out physical pages. The data is then resident on the node where the first touch occurred, and any subsequent accesses to the data must be serviced from that node.
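Because placement happens one page at a time, a thread only needs to write once to each page to trigger the page faults described above. The following sketch assumes a POSIX system (it uses sysconf to query the page size). The names touch_pages and alloc_local are hypothetical. Whether the pages actually land on the local node depends on the OS policy in effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one byte in each page-sized step of [base, base + bytes).
 * Intended for freshly allocated buffers, since it overwrites data.
 * Returns the number of page-sized steps touched by the loop. */
size_t touch_pages(void *base, size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);  /* POSIX query for the page size */
    assert(base != NULL && page > 0);

    volatile unsigned char *p = base;
    size_t touched = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        p[off] = 0;  /* first write to an unmapped page takes the fault */
        touched++;
    }
    if (bytes > 0)
        p[bytes - 1] = 0;  /* covers a final partial page if base is unaligned */
    return touched;
}

/* Allocate a buffer and fault its pages in from the calling thread. */
void *alloc_local(size_t bytes)
{
    void *base = malloc(bytes);
    if (base != NULL)
        touch_pages(base, bytes);
    return base;
}
```

A thread would call alloc_local for the buffers it intends to use itself. Calling it from a different thread than the eventual consumer would defeat the purpose, because the pages follow whichever thread touches them first.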

Chapter 3 Analysis and Recommendations

AMD64 specifications

AMD64 is a 64-bit architecture developed by Advanced Micro Devices (AMD) as an extension of the x86 architecture. Introduced in the early 2000s, it was designed to offer enhanced performance and capabilities for modern computing systems. One of its main features is the ability to address significantly more memory than its 32-bit predecessors. While a 32-bit x86 address space is limited to 4 GB, a 64-bit address space can theoretically span up to 16 exabytes, which suits applications that require large datasets, such as scientific computing and complex simulations.

Another key characteristic of AMD64 is its support for backward compatibility. This means that it can run existing 32-bit applications seamlessly, allowing users to upgrade their hardware without losing access to their existing software libraries. This backward compatibility is achieved through a mode known as Compatibility Mode, enabling users to benefit from both newer 64-bit applications and older 32-bit applications.

Processors that implement AMD64 also incorporate several technologies to optimize performance. One is support for multiple cores and, on some designs, simultaneous multithreading (SMT), which lets a processor run multiple threads concurrently and improves performance in multitasking and multithreaded applications. With the rise of multi-core processors, AMD64 has gained traction in both consumer and enterprise markets.

Additionally, later AMD64 processors support Advanced Vector Extensions (AVX), which enhance the ability of processors to perform single instruction, multiple data (SIMD) operations. This is particularly beneficial for tasks such as video encoding, scientific simulations, and cryptography, which can then execute faster and achieve higher overall throughput.

Security features are also integrated within the AMD64 architecture. Technologies such as AMD Secure Encrypted Virtualization (SEV) and Secure Memory Encryption (SME) help protect sensitive data and provide an enhanced security environment for virtualized systems.

In summary, AMD64 is a powerful and versatile architecture that extends the capabilities of x86, offering enhanced memory addressing, backward compatibility, multi-core processing, vector extensions, and robust security features. These innovations have positioned AMD as a strong competitor in the computing landscape, catering to the demands of modern users and applications. The continuous evolution of AMD64 technology demonstrates AMD's commitment to pushing the boundaries of computing performance and efficiency.
