Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™

40555 Rev. 3.00 June 2006

ccNUMA Multiprocessor Systems

 

Data placement tools are also useful when a thread needs more data than the physical memory available on a node. Some operating systems also allow data migration with these tools or APIs: data can be migrated from the node where it was first touched to the node where it is subsequently accessed. Migration has a cost, so it should not be used frequently. For additional details on the tools and APIs that various operating systems offer for thread and memory placement, refer to Section A.7 on page 44.
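As an illustration of the migration feature described above, the following is a minimal sketch that assumes a Linux system, where the `move_pages` system call can move the pages of a buffer to another node. Linux is an assumption made for this example only. The helper names `pages_spanned` and `migrate_buffer`, and the constant `SKETCH_MPOL_MF_MOVE`, are hypothetical and do not come from this guide. On systems without this call, or where it is not permitted, the sketch reports an error.

```c
#define _GNU_SOURCE            /* exposes syscall() under strict -std=c11 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/* Value of MPOL_MF_MOVE in the Linux headers: move pages owned by this process. */
#define SKETCH_MPOL_MF_MOVE 2

/* Number of pages touched by the byte range [addr, addr + len). */
size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    assert(page_size > 0);
    if (len == 0)
        return 0;
    return (size_t)((addr + len - 1) / page_size - addr / page_size + 1);
}

/* Ask the kernel to move every page of buf to target_node.
 * Returns 0 if the request was accepted for all pages, a positive count of
 * pages that were not migrated (newer kernels), or -1 with errno set.
 * Per-page results land in status[] and are discarded in this sketch. */
int migrate_buffer(void *buf, size_t len, int target_node)
{
#if defined(__linux__) && defined(SYS_move_pages)
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page_size = (size_t)ps;
    size_t count = pages_spanned((uintptr_t)buf, len, page_size);
    if (count == 0)
        return 0;

    void **pages = malloc(count * sizeof *pages);
    int *nodes = malloc(count * sizeof *nodes);
    int *status = malloc(count * sizeof *status);
    long rc = -1;
    if (pages && nodes && status) {
        /* Round the start address down to its page boundary. */
        uintptr_t base = (uintptr_t)buf - (uintptr_t)buf % page_size;
        for (size_t i = 0; i < count; i++) {
            pages[i] = (void *)(base + i * page_size);
            nodes[i] = target_node;
        }
        /* pid 0 means the calling process. */
        rc = syscall(SYS_move_pages, 0, (unsigned long)count,
                     pages, nodes, status, SKETCH_MPOL_MF_MOVE);
    } else {
        errno = ENOMEM;
    }
    int saved = errno;
    free(pages);
    free(nodes);
    free(status);
    errno = saved;
    return rc < 0 ? -1 : (int)rc;
#else
    (void)buf; (void)len; (void)target_node;
    errno = ENOSYS;            /* no page migration call in this sketch */
    return -1;
#endif
}
```

Because every call walks and copies whole pages, a routine like this fits the guidance above only when it is invoked rarely, for example once after a thread has been moved to a different node.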

Avoid sharing data that resides within a single cache line between threads running on different cores.
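One common way to follow this recommendation is to give each thread's frequently written data its own cache line. The sketch below assumes a C11 compiler and a 64-byte cache line, which is the line size on AMD Athlon 64 and AMD Opteron processors. The names `CACHE_LINE_BYTES`, `packed_counters`, `padded_counter`, and `same_cache_line` are illustrative and do not come from this guide.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64    /* assumption: 64-byte cache lines */

/* Two counters packed side by side. If thread 0 updates a and thread 1
 * updates b, both cores keep writing to the same cache line. */
struct packed_counters {
    unsigned long a;
    unsigned long b;
};

/* One counter per cache line: the alignment requirement also rounds the
 * structure size up to a full line, so array elements never share a line. */
struct padded_counter {
    alignas(CACHE_LINE_BYTES) unsigned long value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE_BYTES,
              "each padded counter should occupy exactly one cache line");

/* Returns 1 if the two addresses fall within the same cache line. */
int same_cache_line(const void *p, const void *q)
{
    return (uintptr_t)p / CACHE_LINE_BYTES == (uintptr_t)q / CACHE_LINE_BYTES;
}
```

With an array such as `struct padded_counter per_thread[N]`, thread `i` can update `per_thread[i].value` without touching a line that another thread writes. The cost is memory: each counter now occupies 64 bytes, so this layout suits a small number of hot, per-thread variables rather than large arrays.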

Advanced developers may also run into interesting cases when experimenting with the thread and data placement tools and APIs. Sometimes, when comparing workloads that are symmetrical in all respects except for the thread and data placement used, the expected symmetry may be obscured.

These cases can mostly be explained by understanding the underlying system and avoiding saturation of resources due to an imbalanced load.

The buffer queues constitute one such resource. The lengths of these queues are configured by the BIOS, within hardware-specific limits that are specified in the BIOS and Kernel Developer's Guide for the particular processor. Following AMD recommendations, the BIOS allocates these buffers on a link-by-link basis to optimize for the most common workloads.

In general, certain pathological access patterns should be avoided when possible. Several nodes accessing data on one node, or the crossfire scenario, can saturate underlying resources such as the HyperTransport™ link bandwidth and the HyperTransport buffer queues. AMD makes event profiling tools available that developers can use to analyze whether their application exhibits such behavior.

AMD very strongly recommends keeping user-level and kernel-level locks aligned to their natural boundaries.
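A lock is naturally aligned when its address is a multiple of its size. The sketch below shows one way to check this in C, and to check whether an object would straddle a cache line boundary. It assumes 64-byte cache lines and a 4-byte lock word; the helper names `is_naturally_aligned` and `straddles_cache_line` are hypothetical and are not part of any AMD or operating system API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64    /* assumption: 64-byte cache lines */

/* Returns 1 if an object of the given size at address p is aligned to its
 * natural boundary, that is, its address is a multiple of its size. */
int is_naturally_aligned(const void *p, size_t size)
{
    assert(size > 0);
    return (uintptr_t)p % size == 0;
}

/* Returns 1 if the object's first and last bytes lie in different cache lines. */
int straddles_cache_line(const void *p, size_t size)
{
    assert(size > 0);
    uintptr_t first = (uintptr_t)p / CACHE_LINE_BYTES;
    uintptr_t last = ((uintptr_t)p + size - 1) / CACHE_LINE_BYTES;
    return first != last;
}
```

A naturally aligned object whose size is a power of two no larger than the cache line can never span two lines, so a check like `is_naturally_aligned(&lock, sizeof lock)` in a debug build is one way to catch locks that were misplaced by packed structures or hand-computed offsets.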

Some compilers for AMD multiprocessor systems provide additional hooks to allow for automatic parallelization of otherwise serial programs. There is also support for extensions to the OpenMP directives that can be used by OpenMP programs to improve performance.

While all the previous conclusions are stated in the context of threads, they can also be applied to processes.


Conclusions

Chapter 4

