Figure 6. The Six-Core AMD Opteron processor operates in the same power and thermal envelope as the Quad-Core Opteron processor while improving performance by up to 50%.

HT Assist

HT Assist helps increase performance of six-core AMD Opteron processor-based systems with four or eight sockets. It is designed to maintain data correctness (coherence) between the processors and minimize inter-processor communication traffic on the HyperTransport links.

In a multi-socket system, each processor must ensure that it is operating on the most recent copy of the data, or cache line, to maintain coherence. Before a processor can complete a transaction, it probes the caches of the other processors by broadcasting probe requests, and it requests data from system memory only if there is a cache miss. All of these latency-sensitive messages (probe requests, probe responses, data requests, and data responses) are transmitted over the HyperTransport links. For example, one cache line coherency check in a 4-socket system can generate 10 or more messages over the four HyperTransport links between the processors. In a 4- or 8-socket system with six-core AMD Opteron processors (a total of 24 or 48 processor cores), this traffic can severely load the HyperTransport links.
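To illustrate why broadcast probing scales poorly, the following minimal Python sketch counts the messages for one cache line check. It is a simplified model, not the actual HyperTransport protocol: the function name `broadcast_messages` is hypothetical, and the model assumes one probe request and one probe response per remote socket plus a single data request/response pair, ignoring link topology and multi-hop forwarding.

```python
def broadcast_messages(sockets: int) -> dict:
    """Count messages for one cache line check under broadcast probing.

    Assumed toy model (not the real protocol): the requester probes every
    other socket once, each of them answers once, and one data
    request/response pair fetches the line.
    """
    others = sockets - 1
    return {
        "probe_requests": others,
        "probe_responses": others,
        "data_requests": 1,
        "data_responses": 1,
    }


for n in (2, 4, 8):
    counts = broadcast_messages(n)
    print(f"{n} sockets: {sum(counts.values())} messages {counts}")
```

Even in this simplified model the message count grows with every added socket. The count it gives for four sockets is lower than the 10 or more messages cited above, because the sketch leaves out link topology and any protocol details not described here.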

HT Assist uses 1 MB of each processor's 6-MB L3 cache as a directory cache to track all cache lines stored in the multi-socket system. When checking a cache line, a multi-core processor can probe the directory in its own L3 cache, an operation called a Probe Filter Lookup, instead of broadcasting numerous cache probes over the HyperTransport links. With HT Assist, a cache line coherency check in the 4-socket system described above may generate only two to three messages. The Probe Filter Lookup also reduces latency for accesses to local DRAM because there is no need to broadcast probe requests and wait for responses.
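The directory idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not AMD's implementation: the `ProbeFilter` class, its methods, and the `probe_targets` helper are hypothetical names, and a single dictionary stands in for the directory held in the L3 cache.

```python
class ProbeFilter:
    """Toy directory mapping a cache line address to the sockets caching it."""

    def __init__(self):
        self._holders = {}

    def record(self, line_addr, socket):
        # Note that `socket` now holds a copy of the line.
        self._holders.setdefault(line_addr, set()).add(socket)

    def evict(self, line_addr, socket):
        # Forget the copy; drop the entry once no socket holds the line.
        holders = self._holders.get(line_addr)
        if holders is not None:
            holders.discard(socket)
            if not holders:
                del self._holders[line_addr]

    def lookup(self, line_addr):
        # An empty result means no cache holds the line: go to memory.
        return set(self._holders.get(line_addr, ()))


def probe_targets(pf, line_addr, requester, sockets, ht_assist):
    """Return the set of sockets that must be probed for one check."""
    if not ht_assist:
        # Broadcast: every other socket is probed.
        return {s for s in range(sockets) if s != requester}
    # Directory lookup: probe only the sockets known to hold the line.
    return pf.lookup(line_addr) - {requester}


pf = ProbeFilter()
pf.record(0x1000, socket=2)
print(probe_targets(pf, 0x1000, requester=0, sockets=4, ht_assist=False))
print(probe_targets(pf, 0x1000, requester=0, sockets=4, ht_assist=True))
print(probe_targets(pf, 0x2000, requester=0, sockets=4, ht_assist=True))
```

In this sketch the broadcast case probes all three remote sockets, the directory case probes only the one socket recorded as holding the line, and an untracked line needs no probes at all, so the request can go straight to memory.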

The performance benefits of HT Assist in 4- and 8-socket systems outweigh the small decrease in available L3 data cache (5 MB instead of 6 MB per processor). HT Assist does not need to be enabled on 2-socket systems, where there is much less cache probe traffic.
