Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

40555 Rev. 3.00 June 2006

 

resources approach saturation. The test has two modes: read-only and write-only. When the test threads are read-only, the throughput does not stress the capacity of the system resources, so the test is more sensitive to latency. When the threads are write-only, however, they place a heavy throughput load on the system. This is described in detail in later sections of this document.

Each thread is successively placed on all possible cores in the system. The data (array) accessed by each thread is also successively placed on all possible nodes in the system. Several Linux application programming interfaces (APIs) are used to explicitly pin a thread to a specified core and data to a specified node, thus allowing full control over thread and memory placement. (For additional details on the Linux APIs, refer to section A.1 on page 39.) Once a thread or its data is pinned to a core or node, it remains resident there for its entire lifetime. The test thus runs through all possible permutations of thread and memory placement for the two threads. Because the test does not rely on the OS for thread and memory placement, its results are independent of the low-level placement decisions made by the OS and are therefore OS-agnostic.
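As a rough illustration of the thread-pinning half of this setup, the sketch below uses Python's `os.sched_setaffinity`, which wraps the Linux `sched_setaffinity` system call. It is not the test harness described in this document, and the helper name `run_pinned` is hypothetical. Pinning data to a node requires a NUMA interface such as libnuma and is not shown here.

```python
import os

def run_pinned(core, work):
    """Run work() with the caller restricted to a single core.

    `core` must be a CPU id the caller is currently allowed to use.
    The previous affinity mask is restored afterwards.
    """
    previous = os.sched_getaffinity(0)      # 0 means "the caller"
    os.sched_setaffinity(0, {core})         # pin to exactly one core
    try:
        return work()
    finally:
        os.sched_setaffinity(0, previous)   # undo the pinning

if __name__ == "__main__":
    first_core = min(os.sched_getaffinity(0))
    mask = run_pinned(first_core, lambda: os.sched_getaffinity(0))
    print(f"affinity while pinned: {sorted(mask)}")
```

Unlike the test described above, where a pinned thread stays on its core for its entire lifetime, this helper restores the original mask so that it can be called repeatedly from one process.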

First, the two-thread experiments are run on an idle system, generating a truth table of 4096 timing entries for the two threads. The results are then mined to evaluate interesting scenarios of thread and memory placement. Several of these scenarios are presented in graphs in this document.
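The following sketch shows how such a truth table could be enumerated. This section states only the total of 4096 entries. The system size (8 cores, 4 nodes) and the idea that each thread is independently read-only or write-only are assumptions chosen because they happen to reproduce that total. The function name `placements` is hypothetical.

```python
from itertools import product

def placements(cores, nodes):
    """Yield every (core_a, node_a, core_b, node_b) placement for two threads."""
    per_thread = list(product(range(cores), range(nodes)))
    for (core_a, node_a), (core_b, node_b) in product(per_thread, repeat=2):
        yield core_a, node_a, core_b, node_b

# Assumed system size and access modes; not stated in this section.
CORES, NODES = 8, 4
MODES = ("read", "write")

# One table entry per thread/memory placement and per-thread mode pair.
table = [
    (placement, mode_a, mode_b)
    for placement in placements(CORES, NODES)
    for mode_a, mode_b in product(MODES, repeat=2)
]
print(len(table))  # 4096 under the assumptions above
```

Under these assumptions each thread has 8 × 4 = 32 core/node placements, the pair has 32 × 32 = 1024, and the four mode combinations bring the total to 4096.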

Next, the experiments are enhanced by adding a variable load of background threads. The behavior of the two test (or foreground) threads is studied under the impact of these variable load background threads.

Each of the background threads reads a local 64-MB array. The rate at which each background thread accesses memory can be adjusted from low to medium to high to very high to control the background load. Table 1 defines these rate qualifiers.

Table 1. Data Access Rate Qualifiers

Data Access Rate Qualifier    Memory Bandwidth Demanded by a Background Thread on an Idle System
Low                           0.5 GB/s
Medium                        1 GB/s
High                          2 GB/s
Very High                     4 GB/s
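To make the rates in Table 1 concrete, the sketch below computes how long one full pass over a background thread's 64-MB array would take at each demanded bandwidth. It is an illustrative calculation, not the throttling mechanism used by the test. The name `seconds_per_pass` is hypothetical, and treating MB and GB as binary units (2^20 and 2^30 bytes) is an assumption.

```python
ARRAY_BYTES = 64 * 2**20          # 64-MB array read by each background thread

# Table 1 rates in GB/s; treating 1 GB as 2**30 bytes is an assumption.
RATE_GB_PER_S = {"Low": 0.5, "Medium": 1.0, "High": 2.0, "Very High": 4.0}

def seconds_per_pass(qualifier):
    """Time one full pass over the array should take at the demanded rate."""
    bytes_per_second = RATE_GB_PER_S[qualifier] * 2**30
    return ARRAY_BYTES / bytes_per_second

for name in RATE_GB_PER_S:
    print(f"{name:<9} {seconds_per_pass(name) * 1000:.3f} ms per pass")
```

Under these unit assumptions a "Low" thread would sweep its array once every 125 ms, and each step up in the table halves that interval.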

The number of background threads is also varied as needed to make an increasing number of cores and nodes on the system busy—in other words, to increase the subscription. Full subscription means that every core in the system is busy running a thread. High subscription means that while several cores are busy, there are still some cores left free in the system.

The data mining suggested several basic recommendations for performance enhancement on these systems. It also revealed some interesting cases of asymmetry that allowed the low level


Experimental Setup

Chapter 2

