40555 Rev. 3.00 June 2006

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

4.4 GB/s necessary. The two coherent HyperTransport links are loaded at 3.5 GB/s each. Thus the utilization of each of the two coherent HyperTransport links that connect node 0 and node 1 is roughly 87% (3.5 ÷ 4 ≈ 0.875).
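The utilization figure above is simply observed load divided by peak link capacity. A minimal sketch of that arithmetic follows; the function name `link_utilization` is hypothetical and used only for illustration, and the 3.5 GB/s and 4 GB/s values are the ones quoted in the text.

```python
def link_utilization(observed_gb_per_s: float, peak_gb_per_s: float) -> float:
    """Return the fraction of a link's peak bandwidth that is in use."""
    if peak_gb_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    return observed_gb_per_s / peak_gb_per_s


# Values taken from the text: 3.5 GB/s observed on a 4 GB/s link.
utilization = link_utilization(3.5, 4.0)
print(f"{utilization:.1%}")  # prints 87.5%
```

The exact ratio is 87.5%, which the text quotes as 87%.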

A.2.3 What Role Do Buffers Play in the Throughput Observed?

Node 0 queues up packets in HyperTransport buffers and sends them on the outgoing link only if node 1 can accommodate them. Likewise node 1 queues up packets in HyperTransport buffers and sends them on the outgoing link only if node 0 can accept them.

When the HyperTransport buffers are saturated, they can prevent the coherent HyperTransport links from reaching their full throughput capacity of 4 GB/s and, therefore, from reaching 100% utilization.

Also, saturating the HyperTransport buffers in the XBar has a domino effect on the other buffers in the system. Remember, the SRI is connected to the XBar, which is connected to the coherent HyperTransport links.

When packets stall in the XBar buffer queue while waiting to be sent over the coherent HyperTransport links, a chain effect can cause packets to stall in the SRI buffer queue while waiting to be sent to the XBar.
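The back-pressure behavior described above can be illustrated with a toy discrete-time model. This is only a sketch under stated assumptions, not the actual HyperTransport flow-control protocol: the function `simulate`, all of its parameters, and the packets-per-tick units are hypothetical. The model has a source feeding an SRI queue, which feeds an XBar queue, which feeds a link whose receiver has a finite buffer that drains at a fixed rate.

```python
def simulate(ticks, link_rate, recv_capacity, recv_drain,
             xbar_capacity, sri_capacity, offered):
    """Toy model: source -> SRI queue -> XBar queue -> link -> receiver buffer.

    All quantities are in packets (or packets per tick). Returns a summary dict.
    """
    sri = xbar = recv = 0
    sent = stalled = 0
    for _ in range(ticks):
        # The receiver consumes packets, freeing buffer space.
        recv -= min(recv, recv_drain)
        # The link moves packets only if the receiver can accommodate them.
        moved = min(xbar, link_rate, recv_capacity - recv)
        xbar -= moved
        recv += moved
        sent += moved
        # The SRI forwards to the XBar only if the XBar queue has room.
        forwarded = min(sri, xbar_capacity - xbar)
        sri -= forwarded
        xbar += forwarded
        # The source is stalled for every packet the SRI cannot accept.
        accepted = min(offered, sri_capacity - sri)
        sri += accepted
        stalled += offered - accepted
    return {
        "link_utilization": sent / (ticks * link_rate),
        "xbar_occupancy": xbar,
        "sri_occupancy": sri,
        "source_stalls": stalled,
    }


# Receiver keeps up with the link: queues stay shallow, link is nearly full.
fast = simulate(1000, link_rate=4, recv_capacity=8, recv_drain=4,
                xbar_capacity=8, sri_capacity=8, offered=4)
# Receiver drains more slowly: its buffer saturates, then the XBar queue,
# then the SRI queue, and finally the source itself stalls.
slow = simulate(1000, link_rate=4, recv_capacity=8, recv_drain=3,
                xbar_capacity=8, sri_capacity=8, offered=4)
print(fast)
print(slow)
```

In the second run the link settles near 75% utilization even though the link itself could carry more, and both upstream queues end up full, which mirrors the domino effect from the XBar back to the SRI.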

AMD makes several event profiling tools available under NDA to monitor the HyperTransport bandwidth and buffer queue usage patterns.

The buffer lengths are BIOS-configurable within hardware-specific limits that are specified in the appropriate BIOS and Kernel Developer's Guide for the processor under consideration. Following AMD recommendations, the BIOS allocates these buffers on a link-by-link basis to optimize for the most common workloads.

A.2.4 What Resources Are Used When Write-Only Threads Do Not Fire at Each Other (No Crossfire) on an Idle System?

Now consider the case in which the writer threads do not fire at each other: i.e., the first thread runs on node 0 and writes to memory on node 1, and the second thread runs on node 1 and writes to memory on node 3.

In this case, the bidirectional link from node 0 to node 1 is under substantial use (60% utilization in each direction). In addition, the bidirectional link from node 1 to node 3 is also under substantial use (54% utilization in each direction).

Because the load is now spread over two bidirectional links instead of one, the performance is better than in the crossfire case.
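The difference between the two placements can be seen by counting how many writer threads share each link. The sketch below is illustrative only: the helper `links_used` is hypothetical, it assumes each thread's CPU node and memory node are directly connected by a single link, and it assumes the crossfire case means a thread on node 0 writing to node 1 while a thread on node 1 writes to node 0. It counts threads per link and does not model bandwidth.

```python
from collections import Counter


def links_used(placements):
    """Count writer threads per link.

    placements: list of (cpu_node, memory_node) pairs, where the two nodes
    are assumed to be directly connected by one bidirectional link.
    """
    load = Counter()
    for cpu_node, mem_node in placements:
        # A link is identified by its two endpoints, regardless of direction.
        link = tuple(sorted((cpu_node, mem_node)))
        load[link] += 1
    return dict(load)


crossfire = links_used([(0, 1), (1, 0)])
no_crossfire = links_used([(0, 1), (1, 3)])
print(crossfire)     # {(0, 1): 2}
print(no_crossfire)  # {(0, 1): 1, (1, 3): 1}
```

In the crossfire placement both threads contend for the link between node 0 and node 1, whereas the no-crossfire placement puts one thread on each of two links.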

Appendix A

