Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

5.3 Cache-Coherent Nonuniform Memory Access (ccNUMA)

Optimization

For applications with multiple threads, use OS functions to run each thread on a particular node, and let that thread allocate the memory it requires so that the memory is local to that node. In the Microsoft Windows environment, the function to run a thread on a particular node is SetThreadAffinityMask(), which restricts the thread to the processors named in an affinity mask.
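The following is a minimal sketch of this recommendation, not code from the guide. It pins the calling thread to one logical processor and then allocates and touches memory from that same thread. The helper names pin_current_thread and alloc_and_touch are hypothetical. The Linux branch uses sched_setaffinity as a counterpart to SetThreadAffinityMask. The sketch assumes that the OS places a page on the node of the thread that first writes to it (the default policy on Linux, an assumption on other systems) and that pages are 4096 bytes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for cpu_set_t and sched_setaffinity */
#endif
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: restrict the calling thread to one logical CPU.
   cpu is a zero-based index. Returns 0 on success, -1 on failure. */
int pin_current_thread(unsigned cpu)
{
#ifdef _WIN32
    if (cpu >= sizeof(DWORD_PTR) * 8)
        return -1;
    DWORD_PTR mask = (DWORD_PTR)1 << cpu;
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0 ? 0 : -1;
#else
    cpu_set_t set;
    if (cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 selects the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set);
#endif
}

/* Hypothetical helper: allocate a buffer and write one byte per page from
   the calling thread, so that the pages are first touched by a thread that
   is already pinned. The 4096-byte page size is an assumption. */
void *alloc_and_touch(size_t bytes)
{
    const size_t page = 4096;
    volatile unsigned char *p = malloc(bytes);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < bytes; i += page)
        p[i] = 0;
    return (void *)p;
}
```

A worker thread would call pin_current_thread with a processor that belongs to the intended node before calling alloc_and_touch. Which logical processors belong to which node is system specific and is not determined by this sketch.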

Be sure operating systems are properly configured to support ccNUMA. All versions of Microsoft Windows XP for AMD64 and Windows Server for AMD64 support ccNUMA without any changes. The 32-bit versions of Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition require the /PAE boot parameter to support ccNUMA.

For 64-bit Linux, a separate kernel with ccNUMA support may be available; if so, select that kernel.
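One way to check what the running Linux kernel reports is sketched below. This is an illustration rather than a procedure from the guide. It assumes a kernel that exposes memory nodes as node<N> directories under /sys/devices/system/node in sysfs. The function name count_numa_nodes is hypothetical.

```c
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: count entries named node<N> under the sysfs node
   directory. Returns 0 if the directory does not exist, which is assumed
   to mean that the kernel exposes no NUMA topology. */
int count_numa_nodes(void)
{
    DIR *dir = opendir("/sys/devices/system/node");
    if (dir == NULL)
        return 0;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Match "node" followed by at least one digit, e.g. "node0". */
        if (strncmp(entry->d_name, "node", 4) == 0 &&
            isdigit((unsigned char)entry->d_name[4]))
            count++;
    }
    closedir(dir);
    return count;
}
```

A result greater than one suggests that the kernel has detected more than one memory node. A result of zero or one on multiprocessor hardware may indicate that a kernel without ccNUMA support is running, although other causes such as firmware settings are possible.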

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

Most multiprocessor systems available today employ a symmetric multiprocessing (SMP) architecture. Processors on an SMP platform generally share a common, centralized memory bus and therefore see identical memory access latencies regardless of processor position. Because all processors use the same bus and memory, increased demand on the single memory bus can create bottlenecks that degrade system performance. Figure 1 shows a simplified block diagram of a two-processor SMP system.


Cache and Memory Optimizations

Chapter 5
