
40555 Rev. 3.00 June 2006	Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™
	ccNUMA Multiprocessor Systems

Chapter 3 Analysis and Recommendations

This section lays out recommendations to developers. Several of these recommendations are accompanied by empirical results collected from test cases with analysis, as applicable.

In addition to making recommendations for performance improvement, this section clarifies some common perceptions developers have about performance on AMD ccNUMA systems and, at the same time, reveals the impact of low-level system resources on performance. The extent of that impact on any given application depends on the nature of the application. The goal is to help developers think like the machine when interpreting "counterintuitive" behavior during performance tuning.

Although all analysis and recommendations are framed in terms of threads, they can also be applied to processes.

3.1 Scheduling Threads

Scheduling multiple threads across nodes and cores of a system is complicated by a number of factors:

• Whether the system is idle.

• Whether multiple threads access independent data.

• Whether multiple threads access shared data.

3.1.1 Multiple Threads-Independent Data

When scheduling multiple threads that access independent data on an idle system, it is preferable to first schedule one thread on an idle core of each node until every node has a thread, and only then to schedule threads on the other idle core of each node. In other words, schedule using node major order first, followed by core major order. This is the suggested policy for a ccNUMA-aware operating system on an AMD dual-core multiprocessor system.

For example, when scheduling threads that access independent data on the dual-core Quartet, the following order is recommended:

• Core 0 on node 0, node 1, node 2, and node 3, in any order

• Core 1 on node 0, node 1, node 2, and node 3, in any order

The two cores on each node of the dual-core AMD Opteron™ processor share the Northbridge resources, which include the memory controller and the physical memory that is connected to that node. The main motivation for this recommendation is load balancing: it avoids overloading the resources of a single node while the resources of the rest of the system sit unused.
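The placement order described above can be illustrated with a short sketch. This is not code from the guide: the function names `placement_order` and `assign_threads` are hypothetical, the defaults of four nodes and two cores per node simply mirror the dual-core Quartet example, and the nodes are visited in ascending order even though the recommendation allows any order within each pass. The sketch only computes (node, core) slots; actually binding threads to cores depends on the operating system and is not shown.

```python
from typing import Iterator, List, Tuple


def placement_order(num_nodes: int, cores_per_node: int) -> Iterator[Tuple[int, int]]:
    """Yield (node, core) slots: one core on every node first, then the next core.

    Hypothetical helper for illustration only. Ascending node order is one
    valid choice; the recommendation permits any node order within a pass.
    """
    for core in range(cores_per_node):      # outer loop: which core of each node
        for node in range(num_nodes):       # inner loop: spread across all nodes
            yield node, core


def assign_threads(num_threads: int,
                   num_nodes: int = 4,
                   cores_per_node: int = 2) -> List[Tuple[int, int]]:
    """Return the (node, core) slot for each thread on an otherwise idle system."""
    slots = list(placement_order(num_nodes, cores_per_node))
    if num_threads > len(slots):
        raise ValueError("more threads than cores; oversubscription is not modeled")
    return slots[:num_threads]


if __name__ == "__main__":
    for thread_id, (node, core) in enumerate(assign_threads(6)):
        print(f"thread {thread_id}: node {node}, core {core}")
```

With six threads, the first four each land on core 0 of a different node, so each has a memory controller to itself; only the fifth and sixth threads share a node with an earlier thread.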


AMD64 specifications

AMD64 is a 64-bit architecture developed by Advanced Micro Devices (AMD) as an extension of the x86 architecture. Introduced in the early 2000s, it aimed to offer enhanced performance and capabilities for modern computing systems. One of the main features of AMD64 is its ability to address a significantly larger amount of memory than its 32-bit predecessors. While 32-bit x86 addressing was limited to a 4 GB address space, AMD64 can theoretically address up to 16 exabytes (2^64 bytes) of memory, making it well suited to applications requiring large datasets, such as scientific computing and complex simulations.

Another key characteristic of AMD64 is its support for backward compatibility. This means that it can run existing 32-bit applications seamlessly, allowing users to upgrade their hardware without losing access to their existing software libraries. This backward compatibility is achieved through a mode known as Compatibility Mode, enabling users to benefit from both newer 64-bit applications and older 32-bit applications.

AMD64 also incorporates several advanced technologies to optimize performance. One such technology is the support for multiple cores and simultaneous multithreading (SMT). This allows processors to handle multiple threads concurrently, improving overall performance, especially in multi-tasking and multi-threaded applications. With the rise of multi-core processors, AMD64 has gained traction in both consumer and enterprise markets, providing users with an efficient computing experience.

Additionally, AMD64 processors support Advanced Vector Extensions (AVX), which enhance the capability of processors to perform single instruction, multiple data (SIMD) operations. This is particularly beneficial for tasks such as video encoding, scientific simulations, and cryptography, where processing several data elements per instruction can increase overall throughput.

Security features are also integrated within the AMD64 architecture. Technologies like AMD Secure Execution and Secure Memory Encryption help protect sensitive data and provide an enhanced security environment for virtualized systems.

In summary, AMD64 is a powerful and versatile architecture that extends the capabilities of x86, offering enhanced memory addressing, backward compatibility, multi-core processing, vector extensions, and robust security features. These innovations have positioned AMD as a strong competitor in the computing landscape, catering to the demands of modern users and applications. The continuous evolution of AMD64 technology demonstrates AMD's commitment to pushing the boundaries of computing performance and efficiency.