Appendix A

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™	40555 Rev. 3.00 June 2006
ccNUMA Multiprocessor Systems

Likewise, packets to be transmitted from the MCT to the XBar are queued in the “MCT-to-XBar” buffers. The buffers in the SRI, XBar, and MCT can be viewed as staggered queues on the various units.

A.2 Why Is the Crossfire Case Slower Than the No Crossfire Case on an Idle System?

The following analysis highlights some of the important characteristics of the underlying resources that come into play when there is crossfire versus no crossfire.

A.2.1 What Resources Are Used When a Single Read-Only or Write-Only Thread Accesses Remote Data?

When a thread running on node 0 reads data from node 1, on an otherwise idle system, there is traffic on both the incoming and outgoing links.

When a node issues a memory read request, it first sends the request to the memory controller that owns the memory, which can be local or remote. That memory controller then sends probes to all other nodes in the system to check whether they hold the requested memory in their caches. Once it receives the responses from those nodes, it sends a response to the requesting node. Finally, it sends the read data to the requesting node.
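The four steps above can be sketched as a short message list. The Python below is illustrative only: the `Message` type, the `read_messages` function, and the four-node example are assumptions made for this sketch, not part of the processor documentation. The model follows the simplified description in the text and ignores routing through intermediate nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    kind: str  # "request", "probe", "probe_response", "response", or "data"
    src: int   # sending node
    dst: int   # receiving node


def read_messages(requester: int, home: int, num_nodes: int) -> List[Message]:
    """List the messages for one read, in the order the text describes.

    `home` is the node whose memory controller owns the requested memory.
    """
    # Step 1: the requester sends the read request to the memory controller.
    msgs = [Message("request", requester, home)]
    # Step 2: the memory controller probes every other node in the system.
    others = [n for n in range(num_nodes) if n != home]
    msgs += [Message("probe", home, n) for n in others]
    # Step 3: the probed nodes answer, then the controller responds.
    msgs += [Message("probe_response", n, home) for n in others]
    msgs.append(Message("response", home, requester))
    # Step 4: the read data follows to the requester.
    msgs.append(Message("data", home, requester))
    return msgs


# Hypothetical example: a thread on node 0 reads memory owned by node 1
# in a four-node system.
msgs = read_messages(requester=0, home=1, num_nodes=4)
outgoing = [m.kind for m in msgs if (m.src, m.dst) == (0, 1)]
incoming = [m.kind for m in msgs if (m.src, m.dst) == (1, 0)]
print(outgoing)  # ['request', 'probe_response']
print(incoming)  # ['probe', 'response', 'data']
```

In this example, the direction from node 0 to node 1 carries only non-data messages, while the opposite direction carries both the data and non-data messages.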

When a thread running on node 0 reads data from node 1, it sees non-data traffic (loaded at 752 MB/s) on the outgoing link and both data and non-data traffic on the incoming link (2.2 GB/s). There is also some non-data traffic on the coherent HyperTransport links that connect nodes other than nodes 0 and 1 because of the probes and the responses.

When a thread running on node 0 writes data to node 1, it sees as much data traffic on the incoming link as it does on the outgoing link (incoming and outgoing links each at 2.2 GB/s). In this synthetic test case, several successive writes go to successive cache line elements of a 64 MB array. These result in a steady-state condition of one cache line eviction, or write-back, for each write access. Each write access from node 0 to node 1 therefore triggers a data read from node 1 and then a data write to node 1.
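The steady state described above can be reproduced with a toy cache model. This is a rough sketch under stated assumptions, not a description of the actual hardware: it assumes a write-allocate cache with 64-byte lines, a hypothetical capacity of 1 MB, oldest-first eviction, and one write per cache line. The function name `simulate_sequential_writes` is made up for this illustration.

```python
from collections import deque

CACHE_LINE_BYTES = 64            # assumed cache-line size
ARRAY_BYTES = 64 * 1024 * 1024   # the 64 MB array from the text
CACHE_BYTES = 1024 * 1024        # assumed cache capacity, illustrative only


def simulate_sequential_writes(array_bytes=ARRAY_BYTES,
                               cache_bytes=CACHE_BYTES,
                               line_bytes=CACHE_LINE_BYTES):
    """Return (fill_bytes, writeback_bytes) for one pass over the array.

    fill_bytes:      data read from the remote node (incoming link)
    writeback_bytes: dirty lines written back to it (outgoing link)
    """
    capacity_lines = cache_bytes // line_bytes
    cache = deque()  # dirty lines currently held, oldest first
    fill_bytes = 0
    writeback_bytes = 0
    for line_index in range(array_bytes // line_bytes):
        # Every write touches a new line, so it misses and the line is
        # first read from the remote node.
        fill_bytes += line_bytes
        if len(cache) == capacity_lines:
            # Cache is full: evict the oldest dirty line, which is
            # written back to the remote node.
            cache.popleft()
            writeback_bytes += line_bytes
        cache.append(line_index)
    return fill_bytes, writeback_bytes


fills, writebacks = simulate_sequential_writes()
print(f"read from remote node:  {fills} bytes")
print(f"written to remote node: {writebacks} bytes")
```

Once the assumed cache has filled, every write adds one line of incoming data and one line of outgoing data, which is consistent with the equal load reported on both links. The two totals differ only by the lines still resident in the cache at the end of the pass.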

A.2.2 What Resources Are Used When Two Write-Only Threads Fire at Each Other (Crossfire) on an Idle System?

Assuming the coherent HyperTransport links between node 0 and node 1 have infinite throughput capacity, it is expected that, when the write-only threads fire at each other, the throughput on each of these links would be twice that observed when a single write-only thread running on node 0 is writing to node 1, i.e., 2*(2.2 GB/s).
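The arithmetic behind this expectation is small enough to write out. The sketch below uses only the figures quoted in this section (2.2 GB/s per link for a single writer, and a 4 GB/s theoretical maximum per link); the constant and function names are hypothetical, and the model compares demand against the ceiling without predicting the throughput actually measured.

```python
SINGLE_WRITER_LINK_LOAD_GBPS = 2.2  # per-link load with one write-only thread
LINK_THEORETICAL_MAX_GBPS = 4.0     # theoretical maximum per link


def crossfire_demand_gbps(num_writers: int = 2) -> float:
    """Per-link load expected if the links had infinite capacity."""
    return num_writers * SINGLE_WRITER_LINK_LOAD_GBPS


demand = crossfire_demand_gbps()
excess = demand - LINK_THEORETICAL_MAX_GBPS
print(f"expected load per link: {demand:.1f} GB/s")  # 4.4 GB/s
print(f"excess over link max:   {excess:.1f} GB/s")  # 0.4 GB/s
```

Under these figures, the idealized crossfire load exceeds the theoretical link maximum by roughly 10 percent, so the links cannot carry the doubled traffic.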

The theoretical maximum bandwidth of each coherent HyperTransport link between node 0 and node 1 is 4 GB/s. Hence we cannot expect the HyperTransport bandwidth to reach the


