Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ | 40555 Rev. 3.00 June 2006 |
ccNUMA Multiprocessor Systems |
|
• Threads firing at each other (crossfire)
The first thread runs on node 0 and writes to memory on node 1 (1 hop). The second thread runs on node 1 and writes to memory on node 0 (1 hop).
In each case, the two threads are run on core 0 of whichever node they are running on. The system is left idle except for the two threads. As shown in Figure 6 on page 26, the crossfire 1
[Bar chart. Y-axis: Total Time for both threads (scale 0 to 2.2). Bar annotations: 113%, 130%, 149%. Bar labels include 0 Hop, 1 Hop, NO Xfire, and Xfire. X-axis configurations: 0.0.w.0 1.0.w.1 (0 Hops) (0 Hops); 0.0.w.1 1.0.w.3 (1 Hops) (1 Hops); 0.0.w.1 1.0.w.0 (1 Hops) (1 Hops).]
Figure 6. Crossfire 1
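The configuration labels on the x-axis of Figure 6 can be read mechanically. The following minimal sketch assumes each label has the form node.core.access.memory_node, an interpretation inferred from the crossfire description above (a thread on node 0, core 0, writing to memory on node 1 would be 0.0.w.1). The names ThreadConfig, parse_label, and is_crossfire are hypothetical and exist only for illustration. The sketch classifies a pair of threads as crossfire when each thread writes to memory on the node where the other thread runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadConfig:
    node: int         # node the thread runs on
    core: int         # core on that node
    access: str       # access type, e.g. "w" for write
    memory_node: int  # node whose memory the thread accesses


def parse_label(label: str) -> ThreadConfig:
    """Parse a label such as '0.0.w.1' (assumed: node.core.access.memory_node)."""
    node, core, access, memory_node = label.split(".")
    return ThreadConfig(int(node), int(core), access, int(memory_node))


def is_crossfire(a: ThreadConfig, b: ThreadConfig) -> bool:
    """True when the threads run on different nodes and each one
    accesses memory on the node where the other thread runs."""
    return (
        a.node != b.node
        and a.memory_node == b.node
        and b.memory_node == a.node
    )


if __name__ == "__main__":
    # The three thread pairs taken from the x-axis labels of Figure 6.
    pairs = [
        ("0.0.w.0", "1.0.w.1"),
        ("0.0.w.1", "1.0.w.3"),
        ("0.0.w.1", "1.0.w.0"),
    ]
    for first, second in pairs:
        a, b = parse_label(first), parse_label(second)
        kind = "crossfire" if is_crossfire(a, b) else "no crossfire"
        print(f"{first} {second}: {kind}")
```

Under this reading, only the pair 0.0.w.1 and 1.0.w.0 is a crossfire pair. The pair 0.0.w.1 and 1.0.w.3 also makes 1-hop accesses, but the two threads do not target each other's nodes. Hop counts are not computed here because they depend on the system topology.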
When the
On the other hand, when the
The saturation of these coherent HyperTransport links is responsible for the poor performance for the crossfire case compared to the no crossfire case. For detailed analysis, refer to Section A.2 on page 40.
In this synthetic test,
It is also useful to study whether this observation holds on a system that is not idle. The following analysis explores the behavior of the two foreground threads under a variable background load.
26 | Analysis and Recommendations | Chapter 3 |