Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems

40555 Rev. 3.00 June 2006

Threads firing at each other (crossfire)

The first thread runs on node 0 and writes to memory on node 1 (1 hop). The second thread runs on node 1 and writes to memory on node 0 (1 hop).

In each case, each thread runs on core 0 of its node. The system is left idle except for the two threads. As shown in Figure 6 on page 26, the crossfire 1 hop-1 hop case is the worst performer.

[Bar chart: Total Time for both threads (write-write); vertical axis from 0 to 2.2. Three configurations are plotted: 0.0.w.0 with 1.0.w.1 (0 hops, 0 hops); 0.0.w.1 with 1.0.w.3 (1 hop, 1 hop, no crossfire); and 0.0.w.1 with 1.0.w.0 (1 hop, 1 hop, crossfire). Bar labels recovered from the chart: 149%, 130%, 113%.]

Figure 6. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case on an Idle System
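The configuration labels in Figure 6 can be read mechanically. The following is a minimal sketch, assuming each label has the form node.core.access.memory-node (so 0.0.w.1 means a thread on node 0, core 0, writing to memory on node 1); that field order is inferred from the text above rather than stated on this page, and the names ThreadConfig, parse_config, and is_crossfire are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadConfig:
    cpu_node: int   # node the thread runs on
    core: int       # core index within that node
    access: str     # "w" for write (other letters are not interpreted here)
    mem_node: int   # node whose memory the thread targets


def parse_config(label: str) -> ThreadConfig:
    """Parse a label such as '0.0.w.1' (assumed order: node.core.access.memory-node)."""
    node, core, access, mem = label.split(".")
    return ThreadConfig(int(node), int(core), access, int(mem))


def is_crossfire(a: ThreadConfig, b: ThreadConfig) -> bool:
    """True when each thread targets memory on the node the other thread runs on."""
    return (
        a.cpu_node != b.cpu_node
        and a.mem_node == b.cpu_node
        and b.mem_node == a.cpu_node
    )


# The three thread pairs listed under Figure 6.
cases = {
    "0 hop, 0 hop": ("0.0.w.0", "1.0.w.1"),
    "1 hop, 1 hop, no crossfire": ("0.0.w.1", "1.0.w.3"),
    "1 hop, 1 hop, crossfire": ("0.0.w.1", "1.0.w.0"),
}

for name, (first, second) in cases.items():
    flag = is_crossfire(parse_config(first), parse_config(second))
    print(f"{name}: crossfire={flag}")
```

Only the last pair is flagged, which matches the definition of crossfire given at the start of this section.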

When the write-only threads fire at each other (crossfire), the bidirectional HyperTransport link between node 0 and node 1 is saturated, carrying 3.5 GB/s in each direction. The theoretical maximum bandwidth of the HyperTransport link is 4 GB/s in each direction. Thus, the utilization of that link is roughly 87% (3.5 ÷ 4 = 0.875) in each direction.
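The utilization figure is simply observed bandwidth divided by theoretical peak. A small sketch of that arithmetic, using the 3.5 GB/s and 4 GB/s values quoted above (the function name link_utilization is illustrative, not from the guide):

```python
def link_utilization(observed_gb_s: float, peak_gb_s: float) -> float:
    """Return the fraction of one link direction's peak bandwidth in use."""
    if peak_gb_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    return observed_gb_s / peak_gb_s


# Values quoted in the text for the crossfire case, per direction.
crossfire = link_utilization(3.5, 4.0)
print(f"crossfire utilization: {crossfire:.1%}")      # prints 87.5%
print(f"remaining headroom: {4.0 - 3.5:.1f} GB/s")    # prints 0.5 GB/s
```

The exact ratio is 87.5%; the guide quotes it as 87%.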

On the other hand, when the write-only threads do not fire at each other (no crossfire), the utilization of the bidirectional link between node 0 and node 1 is 60% in each direction, and the utilization of the bidirectional link between node 1 and node 3 is 54% in each direction. Because the load is now spread over two bidirectional HyperTransport links instead of one, the performance is better.
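The difference between the two cases comes down to how many write streams share a link. The sketch below counts streams per link for the two 1 hop-1 hop configurations. It is a deliberately simplified model: it treats each thread's writes as one stream over the single link between its node and the target node, and it does not model whatever other traffic causes the measured utilization to appear in both directions. The helper name links_used is hypothetical.

```python
from collections import Counter


def links_used(routes):
    """Count write streams per link.

    routes: iterable of (source node, destination node) pairs, each one hop apart.
    A link is identified by the unordered pair of nodes it connects.
    """
    return Counter(frozenset(route) for route in routes)


# Crossfire: node 0 writes to node 1 while node 1 writes to node 0.
crossfire = links_used([(0, 1), (1, 0)])
# No crossfire: node 0 writes to node 1 while node 1 writes to node 3.
no_crossfire = links_used([(0, 1), (1, 3)])

print("crossfire: links =", len(crossfire),
      "max streams per link =", max(crossfire.values()))
print("no crossfire: links =", len(no_crossfire),
      "max streams per link =", max(no_crossfire.values()))
```

In the crossfire case both streams land on the one link between nodes 0 and 1; in the no crossfire case each stream has a link to itself, consistent with the lower per-link utilization reported above.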

The saturation of these coherent HyperTransport links is responsible for the poor performance for the crossfire case compared to the no crossfire case. For detailed analysis, refer to Section A.2 on page 40.

In this synthetic test, read-only threads do not result in poor performance. Throughput of such threads is not high enough to exhaust the HyperTransport link resources. When both threads are read-only, the crossfire case is equivalent in performance to the no crossfire case.

It is also useful to study whether this observation holds on a system that is not idle. The following analysis explores the behavior of the two foreground threads under a variable background load.

Chapter 3: Analysis and Recommendations, page 26
