User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

data cache. If there is a miss in the L2 cache, then the request is passed on to the bus interface unit (BIU) via three additional L2-to-BIU reload-request queues. Data returned from the bus is loaded into the data-cache reload buffer, one of the L2 reload buffers, and the critical word is forwarded to the load/store unit.

A dedicated snoop copyback queue has been added, which enables a fifth transaction to pipeline on the bus. It supports enveloped write transactions with the assertion of DBWO. All snoop copybacks are issued from this queue.

A maximum of four reloads can be in progress through the L2 cache. The instruction cache requests only one reload at a time, and the data cache can request up to four. Therefore, at most one instruction-cache reload plus three data-cache reloads, or four data-cache reloads, can be outstanding.
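The reload-slot constraints above can be modeled as a simple admission check. This is an illustrative sketch only (the structure and function names are assumptions, not part of the manual); it shows the rule that four reload slots are shared, with at most one held by the instruction cache.

```c
#include <stdbool.h>

/* Illustrative model of the 750GX reload constraints: at most four
 * reloads in flight through the L2, at most one of them from the
 * instruction cache. Names are hypothetical. */
typedef struct {
    int icache_reloads; /* 0 or 1: I-cache requests one reload at a time */
    int dcache_reloads; /* 0..4: D-cache can request up to four */
} reload_state;

/* Returns true if a new reload request can be accepted. */
bool can_accept_reload(const reload_state *s, bool is_icache)
{
    int total = s->icache_reloads + s->dcache_reloads;
    if (total >= 4)
        return false;                  /* all four reload slots busy */
    if (is_icache)
        return s->icache_reloads == 0; /* only one I-cache reload at a time */
    return true;                       /* D-cache may use any free slot */
}
```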

An example of 1-level address pipelining is shown in Figure 8-5 on page 287. Note that to support address pipelining, the memory system must not require the first address to remain on the bus in order to complete the first data tenure, and it can also queue the second address to maximize parallelism on the bus.

Figure 8-5. First Level Address Pipelining

[Timing diagram not reproduced: address tenures Addr #1 and Addr #2 (qualified by BG, ABB, and AACK) overlap data tenures Data #1 and Data #2 (qualified by DBG and DBB). (1), (2) indicate masters 1 and 2, or transactions 1 and 2 from the same master.]

8.2.2.1 Miss-under-Miss and System Performance

The MuM feature allows loads and stores that miss in the L1 cache to continue to the L2 cache, even though the L1 cache is busy reloading a prior miss; hence the name, miss-under-miss (MuM). If MuM requests also miss in the L2 cache, they proceed to the 60x bus in a pipelined fashion. Pipelining on the 60x bus yields a performance benefit because the full memory-latency penalty is incurred only on the first memory access. The greatest advantage is achieved when MuM requests can be sustained for as long as possible; load/store instruction sequences determine how much benefit the MuM feature produces.

The best sequence is a series of load instructions that reference a different cache-line index (EA[20:26]). Blocks of memory can be efficiently loaded into the data cache with tight loops that increment the address by x'20', as in the following example.
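The manual's original example code is not reproduced in this excerpt, so the following is an assumed illustration in C. It touches one word per 32-byte cache line (the address incremented by x'20' each iteration), so consecutive loads hit distinct cache-line indexes and generate back-to-back L1 misses that the MuM feature can pipeline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: load one word from each 32-byte cache line of a
 * memory block. Each iteration advances the address by 0x20, so every
 * load references a different cache-line index (EA[20:26]), the pattern
 * that sustains miss-under-miss pipelining. */
uint32_t touch_lines(const uint8_t *block, size_t bytes)
{
    uint32_t sum = 0;
    for (size_t off = 0; off < bytes; off += 0x20) /* one load per line */
        sum += *(const uint32_t *)(block + off);
    return sum;
}
```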

The load/store (L/S) unit's interface to the data cache has two request lines: one for the normal request path and one for the MuM request path. MuM can serially request up to three additional loads (hold, Eib0, and Eib1), but the address queues actually reside in the BIU, which can hold up to four loads. MuM is also throttled by other events, such as a full three-entry store queue in the L/S unit.
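The throttling conditions above can be summarized as a gating check. This is an illustrative model only (the state fields and function name are assumptions, not taken from the manual): a new MuM load issues only while the BIU has a free load slot and the store queue is not full.

```c
#include <stdbool.h>

/* Illustrative model of MuM issue gating on the 750GX. Names are
 * hypothetical; capacities come from the text: the BIU holds up to
 * four loads, and the L/S store queue has three entries. */
typedef struct {
    int biu_loads;   /* loads currently queued in the BIU, 0..4 */
    int store_queue; /* entries in the three-entry L/S store queue */
} mum_state;

/* Returns true if a new MuM load request can issue. */
bool can_issue_mum_load(const mum_state *s)
{
    if (s->biu_loads >= 4)
        return false; /* BIU address queues are full */
    if (s->store_queue >= 3)
        return false; /* a full store queue throttles MuM */
    return true;
}
```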


Bus Interface Operation

March 27, 2006

Page 287 of 377