User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
3. Instruction-Cache and Data-Cache Operation
The 750GX microprocessor contains separate 32-KB, 8-way set-associative instruction and data caches to allow the execution units and registers rapid access to instructions and data. This chapter describes the organization of the on-chip instruction and data caches, the modified, exclusive, invalid (MEI) cache-coherency protocol, cache-control instructions, various cache operations, and the interaction between the caches, the load/store unit (LSU), the instruction unit, and the bus interface unit (BIU).
Note that in this chapter, the term ‘multiprocessor’ is used in the context of maintaining cache coherency. These multiprocessor devices could be actual processors or other devices that can access system memory, maintain their own caches, and function as bus masters requiring cache coherency. If the L2 cache is enabled, read Chapter 9, L2 Cache, on page 323 before reading this chapter.
The 750GX L1 cache implementation has the following characteristics:
•The instruction cache and the data cache are separate, and each is 32 KB (Harvard architecture).
•Both instruction and data caches are 8-way set-associative.
•The caches implement a pseudo least-recently-used (PLRU) replacement algorithm within each set.
•The cache directories are physically addressed. The physical (real) address tag is stored in the cache directory.
•Both the instruction and data caches have 32-byte cache blocks. A cache block is the block of memory that a coherency state describes, also referred to as a cache line.
•Two coherency state bits for each data-cache block allow encoding for three states:
–Exclusive Modified (M)
–Exclusive Unmodified (E)
–Invalid (I)
•A single coherency state bit for each instruction-cache block allows encoding for two possible states:
–Invalid (INV)
–Valid (VAL)
•Each cache can be invalidated or locked by setting the appropriate bits in the Hardware-Implementation-Dependent Register 0 (HID0), a Special-Purpose Register (SPR) specific to the 750GX.
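The sizes listed above determine the cache geometry: 32 KB divided by (8 ways × 32 bytes per block) gives 128 sets per cache. The following C fragment is a minimal sketch of how a 32-bit physical address could be split into tag, set index, and byte offset under those parameters. The names (cache_addr_t, split_physical_address) are illustrative only and do not come from this manual. The fields are extracted by numeric value, so the PowerPC bit-numbering convention (bit 0 is the most-significant bit) is not reflected in the shift amounts.

```c
#include <stdint.h>

/* Parameters stated in the text above. */
#define CACHE_BYTES  (32u * 1024u)  /* 32 KB per cache          */
#define CACHE_WAYS   8u             /* 8-way set-associative    */
#define BLOCK_BYTES  32u            /* 32-byte cache block      */

/* Derived values (arithmetic only, not quoted from the manual). */
#define CACHE_SETS   (CACHE_BYTES / (CACHE_WAYS * BLOCK_BYTES))  /* 128 */
#define OFFSET_BITS  5u             /* log2(BLOCK_BYTES)        */
#define INDEX_BITS   7u             /* log2(CACHE_SETS)         */

/* Hypothetical helper type: the three fields of a physical address. */
typedef struct {
    uint32_t tag;     /* compared against the directory tag */
    uint32_t set;     /* selects one of the 128 sets        */
    uint32_t offset;  /* byte within the 32-byte block      */
} cache_addr_t;

/* Split a 32-bit physical address into tag, set index, and offset. */
cache_addr_t split_physical_address(uint32_t pa)
{
    cache_addr_t a;
    a.offset = pa & (BLOCK_BYTES - 1u);
    a.set    = (pa >> OFFSET_BITS) & (CACHE_SETS - 1u);
    a.tag    = pa >> (OFFSET_BITS + INDEX_BITS);
    return a;
}
```

With these parameters the offset and set index together occupy the low-order 12 bits of the address, which leaves a 20-bit tag for a 32-bit physical address.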
The 750GX supports a fully-coherent 4-GB physical memory address space. Bus snooping is used to drive the MEI 3-state cache-coherency protocol that ensures the coherency of global memory with respect to the processor’s data cache. The MEI protocol is described in Section 3.3.2 on page 126.
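As an informal illustration of a three-state protocol with no shared state, the following C fragment sketches a simplified MEI state machine for one data-cache block. It assumes cacheable, write-back accesses and treats every snoop hit the same way: the block is given up, and it is written back first if it was modified. The type and function names are hypothetical, and the special cases that depend on the memory/cache access attributes or on the type of snooped transaction are deliberately left out. Section 3.3.2 remains the authoritative description.

```c
/* Coherency states of a data-cache block. */
typedef enum { MEI_INVALID, MEI_EXCLUSIVE, MEI_MODIFIED } mei_state_t;

/* Simplified event set (an assumption of this sketch). */
typedef enum {
    EV_LOCAL_READ,   /* load by this processor               */
    EV_LOCAL_WRITE,  /* store by this processor              */
    EV_SNOOP_HIT     /* another bus master accesses the block */
} mei_event_t;

typedef struct {
    mei_state_t next;       /* state after the event                 */
    int         writeback;  /* nonzero if modified data is pushed out */
} mei_result_t;

/* Compute the next state of one block for one event. */
mei_result_t mei_step(mei_state_t s, mei_event_t ev)
{
    mei_result_t r = { s, 0 };

    switch (ev) {
    case EV_LOCAL_READ:
        /* A read miss fills the block; a read hit changes nothing. */
        if (s == MEI_INVALID)
            r.next = MEI_EXCLUSIVE;
        break;
    case EV_LOCAL_WRITE:
        /* After a store the block holds the only up-to-date copy. */
        r.next = MEI_MODIFIED;
        break;
    case EV_SNOOP_HIT:
        /* No shared state exists, so the block is invalidated;
           modified data must reach memory first. */
        r.writeback = (s == MEI_MODIFIED);
        r.next = MEI_INVALID;
        break;
    }
    return r;
}
```

Because there is no shared state, at most one cache holds a valid copy of a given block at any time in this model.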
On a cache miss, the 750GX’s cache blocks are filled in four beats of 64 bits each. The burst fill is performed as a critical-double-word-first operation. The critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to cache fill latency. The data-cache line is first loaded into a 32-byte reload buffer; once the buffer is full, its contents are written into the data cache in one cycle. This minimizes contention between the load/store unit and the line reload function. See Figure 9-1 on page 327.
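The beat ordering of a critical-double-word-first fill can be sketched as follows. The fragment computes the address of each of the four 64-bit beats for a given miss address. It assumes that, after the critical double word, the remaining double words follow in ascending order and wrap around at the 32-byte block boundary. That ordering is an assumption of this sketch, not a statement from this section, and the function name burst_fill_order is hypothetical.

```c
#include <stdint.h>

#define BLOCK_BYTES 32u  /* cache block size            */
#define BEAT_BYTES  8u   /* one 64-bit beat             */
#define BEATS       4u   /* BLOCK_BYTES / BEAT_BYTES    */

/* Fill beat_addr[] with the address of each double word in the
   order it is transferred, starting with the critical double word
   (the one that contains miss_addr). */
void burst_fill_order(uint32_t miss_addr, uint32_t beat_addr[BEATS])
{
    uint32_t line_base = miss_addr & ~(BLOCK_BYTES - 1u);
    uint32_t critical  = (miss_addr / BEAT_BYTES) % BEATS;

    for (uint32_t i = 0; i < BEATS; i++) {
        /* Assumed ordering: ascending, wrapping within the block. */
        uint32_t dw = (critical + i) % BEATS;
        beat_addr[i] = line_base + dw * BEAT_BYTES;
    }
}
```

For a miss at address 0x1014, this model transfers the double words at 0x1010, 0x1018, 0x1000, and 0x1008, in that order, so the requesting unit receives its data on the first beat rather than the third.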
The instruction and data caches are integrated into the 750GX as shown in Figure 3-1.
gx_03.fm.(1.2) | Instruction-Cache and Data-Cache Operation |
March 27, 2006 | Page 121 of 377 |