
User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

9.8.2 L2 Cache Testing

A typical test for verifying the proper operation of the 750GX microprocessor’s L2-cache memory follows this sequence:

1. Initialize the L2 test sequence by disabling address translation to invoke the default WIMG setting (0011). Set L2CR[DO] and L2CR[TS], and perform a global invalidation of the L1 data cache and the L2 cache. The L1 instruction cache can remain enabled to improve execution efficiency.

2. Test the L2-cache SRAM by enabling the L1 data cache and executing a sequence of dcbz, store word (stw), and dcbf instructions to initialize the L2 cache with a desired range of consecutive addresses and with cache data consisting of zeros. Once the L2 cache holds a sequential range of addresses, disable the L1 data cache and execute a series of single-beat load and store operations employing a variety of bit patterns to test for stuck bits and pattern sensitivities in the L2-cache SRAM. The performance monitor can be used to verify that the number of L2-cache hits and misses corresponds to the tests performed.

3. Test the L2-cache tag memory by enabling the L1 data cache and executing a sequence of dcbz, stw, and dcbf instructions to initialize the L2 cache with a wide range of addresses and cache data. Once the L2 cache is populated with a known range of addresses and data, disable the L1 data cache and execute a series of store operations to addresses not previously in the L2 cache. These store operations should miss in every case. Note that setting L2CR[TS] inhibits L2-cache misses from being forwarded to the 60x bus interface, thereby avoiding the potential for bus errors due to addressing hardware or nonexistent memory. The L2 cache can then be further verified by reading the previously loaded addresses and checking that all the tags hit and that the associated data compares correctly. The performance monitor can also be used to verify that the number of L2-cache hits and misses corresponds to the test operations performed.

4. The entire L2 cache can be tested by clearing L2CR[DO] and L2CR[TS], restoring the L1 and L2 caches to their normal operational state, and executing a comprehensive test program designed to exercise all the caches. The test program should include operations that cause L2 hit, reload, and castout activity that can be subsequently verified through the performance monitor.
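
The bit-pattern portion of step 2 can be illustrated with a minimal C sketch. It is not taken from this manual: the function names and the choice of patterns (all zeros, all ones, two checkerboards, and a walking one) are assumptions made for illustration. The sketch covers only the write and read-back loop. It assumes the caller has already set L2CR[DO] and L2CR[TS], populated the L2 cache with the address range, and disabled the L1 data cache as described above, so that each access is a single-beat operation that reaches the L2-cache SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one pattern to every word in the range,
 * then read each word back. Returns the number of mismatching words. */
static size_t pattern_pass(volatile uint32_t *base, size_t words,
                           uint32_t pattern)
{
    size_t errors = 0;

    assert(base != NULL);
    for (size_t i = 0; i < words; i++)
        base[i] = pattern;              /* single-word store */
    for (size_t i = 0; i < words; i++)
        if (base[i] != pattern)         /* single-word load and compare */
            errors++;
    return errors;
}

/* Hypothetical driver: run a small set of patterns intended to expose
 * stuck bits and simple pattern sensitivities. Returns total mismatches. */
static size_t l2_sram_pattern_test(volatile uint32_t *base, size_t words)
{
    size_t errors = 0;

    errors += pattern_pass(base, words, 0x00000000u);   /* all zeros */
    errors += pattern_pass(base, words, 0xFFFFFFFFu);   /* all ones */
    errors += pattern_pass(base, words, 0xAAAAAAAAu);   /* checkerboard */
    errors += pattern_pass(base, words, 0x55555555u);   /* inverse checkerboard */
    for (unsigned bit = 0; bit < 32; bit++)             /* walking one */
        errors += pattern_pass(base, words, (uint32_t)1u << bit);
    return errors;
}
```

A return value of zero means every word read back the value written. On real hardware, the result would normally be cross-checked against the hit and miss counts from the performance monitor.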

9.9 L2 Cache Timing

Loading the L2-cache SRAM can occur from the store data queue (which includes single-beat stores and L1 castouts) or from the 2-entry L2 reload data queue. When data is available in either queue, arbitration for the L2 cache takes place. The requests for the L2 cache are, in prioritized order: a snoop request, an L2 castout, the store data queue, a lookup request, and an L2 reload. Loads always take four beats, starting in the cycle in which a request is granted. A double word of data with the ECC correction bits is written with each beat, filling a 32-byte cache line. The arbitration phase and data phase are pipelined, allowing new arbitration during a previous data phase.
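
The priority order and the beat count described above can be modeled with a short C sketch. This is a software illustration only, not a description of the hardware arbiter: the enumeration names, the bitmask encoding of pending requests, and the function names are hypothetical. Only the priority order, the 32-byte line, and the double word per beat come from the text.

```c
#include <assert.h>

/* Hypothetical request identifiers, listed from highest to lowest
 * priority in the order given in the text. */
enum l2_request {
    L2_REQ_NONE = -1,
    L2_REQ_SNOOP = 0,
    L2_REQ_CASTOUT,
    L2_REQ_STORE_QUEUE,
    L2_REQ_LOOKUP,
    L2_REQ_RELOAD,
    L2_REQ_COUNT
};

/* Return the highest-priority request whose bit is set in `pending`
 * (bit n corresponds to request n), or L2_REQ_NONE if nothing is pending. */
static int l2_arbitrate(unsigned pending)
{
    assert((pending >> L2_REQ_COUNT) == 0);   /* only defined requests */
    for (int r = 0; r < L2_REQ_COUNT; r++)
        if (pending & (1u << r))
            return r;
    return L2_REQ_NONE;
}

enum { L2_LINE_BYTES = 32, L2_BEAT_BYTES = 8 };   /* double word per beat */

/* Beats needed to write one cache line into the L2-cache SRAM. */
static unsigned l2_beats_per_line(void)
{
    return L2_LINE_BYTES / L2_BEAT_BYTES;
}
```

For example, if a store-queue entry and an L2 reload are pending at the same time, the model grants the store data queue first. The beat count works out to 32 / 8 = 4, matching the four beats stated above.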

L1 misses that hit in the L2 cache will incur a 5-cycle latency for the critical word returned to the L1. This latency includes one cycle for ECC correction. The L2-cache read data path is 256 bits, which loads the L1 data-cache reload buffer in one cycle, at the same time forwarding the critical word to the load/store unit. Instruction-cache misses, however, will be serviced in four beats from the L2 with the critical word first.
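
The difference between the two reload paths comes down to simple width arithmetic, sketched below in C. The function name is hypothetical. The 64-bit width used for the instruction-cache case is an inference from the four beats per 32-byte line stated above, not a figure quoted from this manual.

```c
#include <assert.h>

enum { CACHE_LINE_BYTES = 32 };

/* Number of transfers needed to move one 32-byte cache line over a data
 * path that is `path_bits` wide. */
static unsigned transfers_per_line(unsigned path_bits)
{
    assert(path_bits != 0 && path_bits % 8 == 0);
    unsigned path_bytes = path_bits / 8;
    return (CACHE_LINE_BYTES + path_bytes - 1) / path_bytes;  /* round up */
}
```

With the 256-bit read path, `transfers_per_line(256)` gives one transfer, which is why the L1 data-cache reload buffer fills in a single cycle. With an assumed 64-bit path, `transfers_per_line(64)` gives four, consistent with the four beats of an instruction-cache reload.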

March 27, 2006	Page 333 of 377