40555 Rev. 3.00 June 2006

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

However, as shown in Figure 11 on page 31, when both threads are write-only, the 0 hop-1 hop and 0 hop-2 hop cases are faster than the 0 hop-0 hop case.

[Bar chart: Total Time for both threads (write-write). Vertical axis from 0 to 1.8. Bars and data labels, in the order extracted:
0.0.w.0 0.1.w.0 (0 Hops) (0 Hops): 147%
0.0.w.0 0.1.w.1 (0 Hops) (1 Hop): 126%
0.0.w.0 0.1.w.2 (0 Hops) (1 Hop): 125%
0.0.w.0 0.1.w.3 (0 Hops) (2 Hops): 136%]

Figure 11. Both Write-Only Threads Running on Node 0 (Different Cores) on an Idle System

When a single thread reads locally, it generates a memory bandwidth load of 1.64 GB/s. Assuming a sustained memory bandwidth of 70% of the theoretical maximum of 6.4 GB/s (PC3200 DDR memory), the cumulative bandwidth demanded by two read-only threads does not exceed the sustained memory bandwidth on that node and hence the local or 0 hop-0 hop case is the fastest.

However, when a single thread writes locally it generates a memory bandwidth load of 2.98 GB/s. This is because each write in this test case results in a cache line eviction and thus generates twice the memory traffic generated by a read. The cumulative memory bandwidth demanded by 2 write-only threads now exceeds the sustained memory bandwidth on that node. The 0 hop-0 hop case now incurs the penalty of saturating the memory bandwidth on that node. For detailed analysis, refer to Section A.4 on page 42.
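The saturation argument above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted in the text (6.4 GB/s theoretical peak, an assumed 70% sustained fraction, and per-thread loads of 1.64 GB/s and 2.98 GB/s); the function and constant names are illustrative and do not come from the original test program.

```python
# Figures quoted in the text; names are illustrative assumptions.
THEORETICAL_PEAK_GBPS = 6.4   # PC3200 DDR memory, per node
SUSTAINED_FRACTION = 0.70     # assumed sustained share of the theoretical peak
READ_LOAD_GBPS = 1.64         # load generated by one local read-only thread
WRITE_LOAD_GBPS = 2.98        # load generated by one local write-only thread


def saturates(per_thread_gbps, threads=2):
    """Return (demand, budget, saturated) for threads sharing one node's memory."""
    budget = THEORETICAL_PEAK_GBPS * SUSTAINED_FRACTION
    demand = per_thread_gbps * threads
    return demand, budget, demand > budget


for name, load in (("read-read", READ_LOAD_GBPS), ("write-write", WRITE_LOAD_GBPS)):
    demand, budget, over = saturates(load)
    print(f"{name}: demand {demand:.2f} GB/s, budget {budget:.2f} GB/s, saturated={over}")
```

Two read-only threads demand 3.28 GB/s against a sustained budget of 4.48 GB/s, so the node is not saturated. Two write-only threads demand 5.96 GB/s, which exceeds that budget.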

It is useful to study whether this observation is also applicable under a variable background load.

One would expect that, if the memory bandwidth demanded of the remote node were increased, at some point the 0 hop-1 hop case would become as slow as, and perhaps slower than, the 0 hop-0 hop case for the write-only threads.

The same two write-only threads as before run on node 0 and go through the following cases:

Both threads access local memory.

First thread accesses local memory and second thread accesses memory that is remote by one hop.

First thread accesses local memory and second thread accesses memory that is remote by two hops.
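The cases above can be related to the bar labels of Figure 11 with a small helper. This is only a sketch: it assumes each label reads as thread node, core, access type, and memory node (an interpretation inferred from the labels, not stated on this page), it assumes `r` would denote a read (only `w` appears here), and it takes the hop distances from node 0 directly from the hop counts printed under the bars.

```python
# Hop distance from node 0 to each memory node, read off the Figure 11 bar labels.
HOPS_FROM_NODE0 = {0: 0, 1: 1, 2: 1, 3: 2}

# Assumed meaning of the access field; only "w" appears in the figure.
ACCESS_NAMES = {"r": "read", "w": "write"}


def describe(label):
    """Split a label such as '0.1.w.3' into its fields (interpretation assumed)."""
    cpu_node, core, access, mem_node = label.split(".")
    if int(cpu_node) != 0:
        raise ValueError("hop table only covers threads running on node 0")
    return {
        "cpu_node": int(cpu_node),
        "core": int(core),
        "access": ACCESS_NAMES[access],
        "mem_node": int(mem_node),
        "hops": HOPS_FROM_NODE0[int(mem_node)],
    }


first = describe("0.0.w.0")  # first thread always accesses local memory
for second_label in ("0.1.w.0", "0.1.w.1", "0.1.w.2", "0.1.w.3"):
    second = describe(second_label)
    print(f"0.0.w.0 + {second_label}: {first['hops']} hop-{second['hops']} hop")
```

The labels ending in `.1` and `.2` both fall under the one-hop case, which is why Figure 11 shows four bars for three hop combinations.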

Chapter 3

Analysis and Recommendations

31

