Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems | 40555 Rev. 3.00 June 2006
resources approach saturation. The test has two modes:
Each thread is successively placed on every core in the system, and the data (array) accessed by each thread is successively placed on every node in the system. Several Linux application programming interfaces (APIs) are used to explicitly pin a thread to a specified core and data to a specified node, allowing full control over thread and memory placement. (For additional details on the Linux APIs, refer to Section A.1 on page 39.) Once a thread or its data is pinned to a core or node, it remains resident there for its entire lifetime. The test therefore runs through all permutations of thread and memory placement possible for the two threads. Because the test does not rely on the OS for thread and memory placement, its results are independent of the low-level placement decisions made by the OS and are thus OS-agnostic.
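The thread half of this pinning can be illustrated with a minimal sketch. It assumes a Linux host and uses Python's `os.sched_setaffinity`, which wraps the `sched_setaffinity(2)` system call. It is not necessarily the API the test itself uses (that is described in Section A.1), and the helper name `pin_current_thread` is hypothetical. Placing data on a specific node is not shown; that requires a NUMA-aware interface such as libnuma's `numa_alloc_onnode` or the `mbind` system call.

```python
import os

def pin_current_thread(core):
    """Restrict the caller to a single core and return the resulting affinity set.

    Hypothetical helper for illustration; Linux only.
    """
    allowed = os.sched_getaffinity(0)  # cores the caller may currently run on
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})    # pid 0 refers to the caller
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Pin to the lowest-numbered core that is currently allowed.
    first = min(os.sched_getaffinity(0))
    print(pin_current_thread(first))
```

Once the affinity mask holds a single core, the scheduler cannot migrate the caller elsewhere, which is the property the test relies on when it says a pinned thread stays resident for its lifetime.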
First, the two-thread experiments are run on an idle system, generating a truth table of 4096 timing entries for the two threads. The results are then mined to evaluate interesting scenarios of thread and memory placement. Several of these scenarios are presented in graphs in this document.
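To make the size of such a truth table concrete, the sketch below enumerates every placement combination for two threads. The text here does not break down how the 4096 entries arise. The configuration used (8 cores, 4 nodes, and two access modes per thread) is an assumption chosen only because it happens to yield that count, and all names are hypothetical.

```python
from itertools import product

# Assumed system shape, for illustration only (not stated in the text above).
N_CORES = 8                 # cores a thread can be pinned to
N_NODES = 4                 # nodes the thread's data can be placed on
MODES = ("read", "write")   # assumed access modes

def placements():
    """All (core, data_node, mode) choices for a single thread."""
    return list(product(range(N_CORES), range(N_NODES), MODES))

def experiment_table():
    """Map every (thread0, thread1) placement pair to a timing slot."""
    per_thread = placements()
    return {pair: None for pair in product(per_thread, repeat=2)}

table = experiment_table()
print(len(table))  # 64 placements per thread, 64 * 64 = 4096 pairs

# "Mining" the table: select the pairs where both threads use data on node 0.
same_node = [pair for pair in table if pair[0][1] == 0 and pair[1][1] == 0]
print(len(same_node))
```

Each key identifies one experiment. A real harness would store the measured time in place of `None` and then filter the keys, as in the last lines, to pull out a scenario of interest.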
Next, the experiments are extended by adding a variable load of background threads. The behavior of the two test (or foreground) threads is studied under the impact of these variable-load background threads.
Each of the background threads reads a local
load. Table 1 defines these rate qualifiers.
Table 1. Data Access Rate Qualifiers

| Data Access Rate Qualifier | Memory Bandwidth Demanded by a Background Thread on an Idle System |
|---|---|
| Low | 0.5 GB/s |
| Medium | 1 GB/s |
| High | 2 GB/s |
| Very High | 4 GB/s |
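One way to read the qualifiers in Table 1 is as a time budget per block of data. The sketch below converts a qualifier into the time a background thread could spend on each chunk to average the stated rate. How the test actually throttles its background threads is not described here, so the pacing idea, the function name, and the choice of 1 GB = 10^9 bytes are all assumptions.

```python
# Rates from Table 1, in GB/s.
RATE_GB_PER_S = {
    "Low": 0.5,
    "Medium": 1.0,
    "High": 2.0,
    "Very High": 4.0,
}

def seconds_per_chunk(qualifier, chunk_bytes):
    """Time budget for one chunk so the average rate matches the qualifier.

    Hypothetical helper; assumes 1 GB = 10**9 bytes.
    """
    bytes_per_second = RATE_GB_PER_S[qualifier] * 1e9
    return chunk_bytes / bytes_per_second

if __name__ == "__main__":
    chunk = 64 * 1024 * 1024  # an arbitrary 64 MiB chunk, for illustration
    for name in RATE_GB_PER_S:
        print(name, seconds_per_chunk(name, chunk))
```

A paced reader would time each chunk and wait out whatever remains of the budget, so a higher qualifier simply means a shorter budget for the same chunk size.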
The number of background threads is also varied as needed to make an increasing number of cores and nodes on the system
The