Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™

40555 Rev. 3.00 June 2006

ccNUMA Multiprocessor Systems

 

[Bar chart: "Very High: Total Time for both threads (write-write)". Vertical axis: total time, 0 to 1.8. Configurations shown: 0.0.w.0 0.1.w.0 (0 hops, 0 hops); 0.0.w.0 0.1.w.1 (0 hops, 1 hop); 0.0.w.0 0.1.w.2 (0 hops, 1 hop); 0.0.w.0 0.1.w.3 (0 hops, 2 hops). Bar labels as extracted: 147%, 158%, 158%, 169%.]

Figure 15. Both Write-Only Threads Running on Node 0 (Different Cores) under Very High Background Load (High Subscription)

Under a very high background load, the 0 hop-1 hop case has a total memory access rate of 4.78 GB/s on node 1, and several buffer queues on node 1 are saturated. For detailed analysis, refer to Section A.5 on page 43.

Thus, a greater hop distance does not always mean a slower time. It is still advisable for the developer to keep data local as much as possible. In the analogy used above, if the local queue has 20 customers and the remote queue has two, the customer would much rather have joined the queue with two customers and made it his local queue in the first place. In the synthetic case above, the fastest arrangement would be to keep the first thread on node 0 doing local writes and the second thread on node 1 doing local writes.
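The recommendation above can be sketched in C as follows. This is a minimal illustration, not the benchmark used for the figures: the names `writer_t`, `writer_main`, and `run_local_writers` are hypothetical, and the sketch assumes an operating system that places a page on the node of the thread that first touches it. Pinning each thread to its node is assumed to be done separately through the platform's affinity interface and is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread work item; all names are illustrative. */
typedef struct {
    size_t bytes;        /* size of this thread's private buffer */
    unsigned char fill;  /* byte value this thread writes */
    unsigned char *buf;  /* allocated and first touched by the thread itself */
} writer_t;

static void *writer_main(void *arg) {
    writer_t *w = arg;
    /* Allocate and write inside the thread, so the first write to each page
     * comes from the thread that will keep using it. Under a first-touch
     * placement policy (an assumption about the OS), the pages then land on
     * the node where this thread is running. */
    w->buf = malloc(w->bytes);
    if (w->buf != NULL)
        memset(w->buf, w->fill, w->bytes);
    return NULL;
}

/* Runs two write-only threads, each on its own buffer.
 * Returns 0 on success, -1 on failure. The caller frees the buffers. */
static int run_local_writers(writer_t *a, writer_t *b) {
    pthread_t ta, tb;
    assert(a != b);  /* the two threads must not share a work item */
    if (pthread_create(&ta, NULL, writer_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, writer_main, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return (a->buf != NULL && b->buf != NULL) ? 0 : -1;
}
```

The design point is that neither thread initializes the other's buffer. If a single setup thread allocated and zeroed both buffers, a first-touch policy would place both on that thread's node, and one of the writers would end up doing remote writes.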

3.5 Locks

In general, it is good practice for user-level and kernel-level code to keep locks aligned to their natural boundaries. In some hardware implementations, locks that are not naturally aligned are handled with the mechanisms used for legacy memory-mapped I/O and should be avoided whenever possible.

If a lock is aligned properly, it is treated as a cache lock, which is fast. The significantly slower alternative is a bus lock, which should be avoided at all costs. Bus locks are very slow and force serialization of many operations within the processor that are unrelated to the lock. Furthermore, bus locks prevent the entire HyperTransport fabric from making forward progress until the bus lock completes. Cache locks, on the other hand, guarantee atomicity by using the underlying cache coherence of the ccNUMA system and are much faster.
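One way to keep a lock on its natural boundary is to state the alignment in the type itself. The following C11 sketch is illustrative only: `demo_lock_t`, `demo_lock`, `demo_unlock`, and `is_naturally_aligned` are hypothetical names, and the lock is a simple test-and-set spinlock chosen for brevity rather than a recommended lock design.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock type: a 32-bit word forced to its natural
 * (4-byte) alignment, so it can never straddle a 4-byte boundary. */
typedef struct {
    alignas(sizeof(uint32_t)) _Atomic uint32_t word;  /* 0 = free, 1 = held */
} demo_lock_t;

/* Fail the build if the type ever loses its natural alignment. */
static_assert(alignof(demo_lock_t) >= sizeof(uint32_t),
              "lock word must be naturally aligned");

/* Returns nonzero if p is aligned to a multiple of size bytes. */
static inline int is_naturally_aligned(const void *p, size_t size) {
    return ((uintptr_t)p % size) == 0;
}

/* Spin until the previous value of the word was 0 (lock was free). */
static inline void demo_lock(demo_lock_t *l) {
    while (atomic_exchange_explicit(&l->word, 1u, memory_order_acquire) != 0u) {
        /* busy-wait */
    }
}

/* Release the lock by storing 0. */
static inline void demo_unlock(demo_lock_t *l) {
    atomic_store_explicit(&l->word, 0u, memory_order_release);
}
```

Misalignment usually comes from packed structures or from casting an arbitrary byte offset to a lock pointer. A runtime check such as `is_naturally_aligned(&lock->word, sizeof(uint32_t))` can catch the second case in debug builds.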


Analysis and Recommendations

Chapter 3

