AMD 64 manual Myth Greater Hop Distance Always Means Slower Time

Page 29

40555 Rev. 3.00 June 2006

Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

[Chart: "VERY HIGH: Total Time for both threads (write-write)". Vertical axis from 0 to 2.4. Thread placements plotted: 0.0.w.0 1.0.w.1 (0 Hops) (0 Hops); 0.0.w.1 1.0.w.3 (1 Hops) (1 Hops); 0.0.w.1 1.0.w.0 (1 Hops) (1 Hops). Bar annotations: "NO Xfire" and "Xfire". Data labels: 216%, 202%, 156%.]

Figure 9. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Very High Background Load (Full Subscription)
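The placement labels in the chart can be decoded mechanically. The sketch below is illustrative only and rests on two assumptions that this page does not state: that a label such as 0.0.w.1 reads as thread node, core, access type, memory node, and that the four-node quartet is wired as a square (0-1, 0-2, 1-3, 2-3). Both readings match the hop counts printed beside the labels. The function names are hypothetical.

```python
from collections import deque

# Assumed quartet wiring (a square): each node links to two neighbours.
# This adjacency is an assumption for illustration, not taken from this page.
QUARTET_LINKS = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}


def parse_label(label):
    """Split 'node.core.access.memnode', e.g. '0.0.w.1', into its fields."""
    node, core, access, mem_node = label.split(".")
    return int(node), int(core), access, int(mem_node)


def hop_count(src, dst, links=QUARTET_LINKS):
    """Breadth-first search for the fewest links between two nodes."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    raise ValueError(f"node {dst} is unreachable from node {src}")


def describe(pair):
    """Return per-thread hop counts and whether the pair is a crossfire."""
    a, b = (parse_label(label) for label in pair.split())
    hops = (hop_count(a[0], a[3]), hop_count(b[0], b[3]))
    # Crossfire, as read here: each thread targets the other thread's node.
    crossfire = a[0] != b[0] and a[3] == b[0] and b[3] == a[0]
    return hops, crossfire


for pair in ("0.0.w.0 1.0.w.1", "0.0.w.1 1.0.w.3", "0.0.w.1 1.0.w.0"):
    print(pair, describe(pair))
```

Under these assumptions, only the last pair is classified as a crossfire, because each thread writes to the node on which the other thread runs.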

In the no crossfire case, the total memory bandwidth observed on the memory controller on node 3 is 4.5 GB/s, and several buffer queues on node 3 are saturated. For detailed analysis, refer to Section A.3 on page 42.

Thus, while equal hop cases generally take equal time, there can be exceptions to this rule if some resources in the system, such as HyperTransport link bandwidth and HyperTransport buffer capacity, are saturated.

3.4.2 Myth: Greater Hop Distance Always Means Slower Time

As a general rule, a 2 hop case will be slower than a 1 hop case, which, in turn, will be slower than a 0 hop case, if the only change between the cases is thread and memory placement. This rule does not always hold, however. For example, the synthetic test demonstrates how a given 0 hop-0 hop case is slower than a 0 hop-1 hop case. The example shows how saturating memory resources can cause this to occur.

Imagine yourself in the following situation: you are ready to check out at your favorite grocery store with a shopping cart full of groceries. Directly in front of you is a check-out lane with 20 customers standing in line, but 50 feet to your left is another check-out lane with only two customers standing in line. Which would you go to? The check-out lane closest to your position has the lowest latency because you don't have far to travel, whereas the lane 50 feet away has much greater latency because you have to walk 50 feet.

Clearly, most people would walk the 50 feet, suffer the latency, and arrive at a check-out lane with only two customers instead of 20. Experience tells us that the time spent waiting to check out behind 20 people is far longer than the time needed to walk to the “remote” check-out lane and wait for only two people.
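The trade-off in the analogy can be put into rough numbers. This is a minimal sketch, not a measurement: the service time per customer and the walking pace are hypothetical values chosen only for illustration, and checkout_time is a made-up helper name. Only the 20-customer, two-customer, and 50-foot figures come from the text.

```python
def checkout_time(walk_seconds, customers_ahead, service_seconds):
    """Total time until you finish paying: walk, wait for the line, then pay."""
    return walk_seconds + (customers_ahead + 1) * service_seconds


# Hypothetical numbers, chosen only to illustrate the analogy.
SERVICE_SECONDS = 120          # assumed time to serve one customer
WALK_SECONDS_PER_FOOT = 0.25   # assumed walking pace

# Near lane: no walk, 20 customers ahead.
near = checkout_time(0, 20, SERVICE_SECONDS)
# Far lane: a 50-foot walk, 2 customers ahead.
far = checkout_time(50 * WALK_SECONDS_PER_FOOT, 2, SERVICE_SECONDS)

print(f"near lane: {near:.1f} s, far lane: {far:.1f} s")
```

With these assumed values the near lane costs 2520 seconds and the far lane 372.5 seconds: the walk (the latency) is small next to the time spent queueing. The same reasoning applies when a local memory controller is saturated and a remote one is lightly loaded.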


AMD64 specifications

AMD64 is a 64-bit architecture developed by Advanced Micro Devices (AMD) as an extension of the x86 architecture. Introduced in the early 2000s, it aimed to offer the performance and capabilities needed to power modern computing systems. One of the main features of AMD64 is its ability to address a significantly larger amount of memory than its 32-bit predecessors. While a 32-bit x86 address can span only 4 GB, a 64-bit AMD64 address can theoretically span up to 16 exabytes, making the architecture well suited to applications requiring large datasets, such as scientific computing and complex simulations.
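The 4 GB and 16 exabyte figures follow directly from the address widths. The short check below assumes binary units (1 GiB = 2^30 bytes, 1 EiB = 2^60 bytes) and uses a hypothetical helper name; it shows only the theoretical span of an address of a given width, not what any particular processor implements.

```python
GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte


def address_space_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


print(address_space_bytes(32) // GIB, "GiB")  # 32-bit addresses
print(address_space_bytes(64) // EIB, "EiB")  # 64-bit addresses
```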

Another key characteristic of AMD64 is its support for backward compatibility. This means that it can run existing 32-bit applications seamlessly, allowing users to upgrade their hardware without losing access to their existing software libraries. This backward compatibility is achieved through a mode known as Compatibility Mode, enabling users to benefit from both newer 64-bit applications and older 32-bit applications.

AMD64 also incorporates several advanced technologies to optimize performance. One such technology is the support for multiple cores and simultaneous multithreading (SMT). This allows processors to handle multiple threads concurrently, improving overall performance, especially in multi-tasking and multi-threaded applications. With the rise of multi-core processors, AMD64 has gained traction in both consumer and enterprise markets, providing users with an efficient computing experience.

Additionally, AMD64 supports advanced vector extensions (AVX), which enhance the capability of processors to perform single instruction, multiple data (SIMD) operations. This is particularly beneficial for tasks such as video encoding, scientific simulations, and cryptography, allowing these processes to be executed much faster, thereby increasing overall throughput.

Security features are also integrated into AMD processors built on the AMD64 architecture. Technologies such as AMD Secure Encrypted Virtualization (SEV) and Secure Memory Encryption (SME) help protect sensitive data and provide an enhanced security environment for virtualized systems.

In summary, AMD64 is a powerful and versatile architecture that extends the capabilities of x86, offering enhanced memory addressing, backward compatibility, multi-core processing, vector extensions, and robust security features. These innovations have positioned AMD as a strong competitor in the computing landscape, catering to the demands of modern users and applications. The continuous evolution of AMD64 technology demonstrates AMD's commitment to pushing the boundaries of computing performance and efficiency.