Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems | 40555 Rev. 3.00 June 2006
3.1.2 Multiple Threads-Shared Data
When scheduling multiple threads that share data on an idle system, it is preferable to schedule the threads on both cores of an idle node first, then on both cores of the next idle node, and so on. In other words, schedule in core-major order first, followed by node-major order.
For example, when scheduling threads that share data on an idle four-node system with two cores per node, schedule them in the following order:
•Core 0 and core 1 on node 0 in any order
•Core 0 and core 1 on node 1 in any order
•Core 0 and core 1 on node 2 in any order
•Core 0 and core 1 on node 3 in any order
3.1.3 Scheduling on a Non-Idle System
Scheduling multiple threads for an application optimally on a non-idle system is more difficult, because thread placement must account for the load already present on each node and core.
In general, most developers will achieve good performance by relying on the operating system scheduler, which is responsible for placing threads and balancing load across cores and nodes.
In addition to the scheduler, several operating system tools and API libraries allow threads to be bound explicitly to specific cores or nodes.
For additional details on the tools and API libraries supported in various OSs, refer to Section A.7 on page 44.
3.2 Data Locality Considerations
It is best to keep data local to the node from which it is being accessed. Accessing data remotely is slower than accessing data locally, and the farther the hop distance to the data, the greater the cost of the remote access. For most applications, keeping accesses local yields the lowest memory latency and the best performance.
As explained in Section 2.1 on page 13, if a thread is running and accessing data on the same node, it is considered a local access. If a thread is running on one node but accessing data resident on a different node, it is considered a remote access. If the node where the thread is running and the node where the data is resident are directly connected to each other, it is considered a 1-hop access.
20 | Analysis and Recommendations | Chapter 3 |