User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

instruction-cache flash invalidate bit (HID0[ICFI]). The instruction cache can be locked by setting HID0[ILOCK]. The instruction cache supports only the valid and invalid states, and requires software to maintain coherency if the underlying program changes.

The 750GX also implements a 64-entry (16-set, 4-way set-associative) branch target instruction cache (BTIC). The BTIC is a cache of branch instructions that have been encountered in branch/loop code sequences. If the target instruction is in the BTIC, it is fetched into the instruction queue a cycle sooner than it can be made available from the instruction cache. Typically, the BTIC contains the first two instructions in the target stream. The BTIC can be disabled and invalidated through software.

Coherency of the BTIC is transparent to the running software and is coupled with various functions in the 750GX processor. When the BTIC is enabled and loaded with instruction pairs to support zero-cycle delay on taken branches, its contents must be invalidated if the underlying program changes. (This is also true for the instruction cache.) The BTIC is invalidated on an instruction-cache flash invalidate, an icbi or rfi instruction, and any exception.
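The invalidation rule above can be sketched as a small software model. This is an illustration only, not a description of the hardware logic: the type and function names (cpu_event, btic_model, btic_apply_event, and so on) are hypothetical, and the model assumes that every listed event clears all 64 entries at once.

```c
#include <assert.h>
#include <stdbool.h>

#define BTIC_ENTRIES 64  /* 16 sets x 4 ways, per the text */

/* Hypothetical event names for this sketch. */
typedef enum {
    EV_ICACHE_FLASH_INVALIDATE,  /* HID0[ICFI] set */
    EV_ICBI,                     /* icbi instruction */
    EV_RFI,                      /* rfi instruction */
    EV_EXCEPTION,                /* any exception */
    EV_OTHER                     /* anything else */
} cpu_event;

typedef struct {
    bool valid[BTIC_ENTRIES];
} btic_model;

/* True for the events the text lists as invalidating the BTIC. */
bool btic_invalidated_by(cpu_event ev)
{
    return ev == EV_ICACHE_FLASH_INVALIDATE || ev == EV_ICBI ||
           ev == EV_RFI || ev == EV_EXCEPTION;
}

/* Assumption of this sketch: an invalidating event clears every entry. */
void btic_apply_event(btic_model *b, cpu_event ev)
{
    assert(b);
    if (!btic_invalidated_by(ev))
        return;
    for (int i = 0; i < BTIC_ENTRIES; i++)
        b->valid[i] = false;
}

/* Number of entries currently marked valid. */
unsigned btic_valid_count(const btic_model *b)
{
    unsigned n = 0;
    assert(b);
    for (int i = 0; i < BTIC_ENTRIES; i++)
        n += b->valid[i] ? 1u : 0u;
    return n;
}
```

In this model, software that rewrites code and then executes icbi on the affected blocks leaves no stale target instructions behind in the BTIC.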

For more information and timing examples showing cache hit and cache miss latencies, see Section 6.3.2, Instruction Fetch Timing, on page 216.

1.2.5 On-Chip Level 2 Cache Implementation

The L2 cache is a unified cache that receives memory requests from both the L1 instruction and data caches independently. The L2 cache is implemented with an L2 Cache Control Register (L2CR), an on-chip, 4-way set-associative tag array, and a 1-MB integrated SRAM for data storage. The L2 cache normally operates in write-back mode and supports cache coherency through snooping. The write interface to the L2 is 64 bits wide and requires four cycles to write a single cache block. The read interface to the L2 is 256 bits wide and requires one cycle to read a single cache block. The L2 uses ECC on a double word; it corrects most single-bit errors and detects the remaining single-bit errors and all double-bit errors. See Figure 9-1, L2 Cache, on page 327.
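The figures above can be checked with simple arithmetic. The following sketch uses the 64-byte line and 32-byte block sizes given in this section to derive the number of sets and the per-block transfer cycles. The address split assumes a 32-bit physical address and a conventional tag/set/offset layout, which is an assumption of this example and not a statement of the actual hardware indexing. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry taken from the text: 1-MB data array, 4 ways, 64-byte lines,
 * 32-byte blocks. L2_SETS is derived: 1 MB / (4 * 64 B) = 4096. */
enum {
    L2_SIZE_BYTES  = 1024 * 1024,
    L2_WAYS        = 4,
    L2_LINE_BYTES  = 64,
    L2_BLOCK_BYTES = 32,
    L2_SETS        = L2_SIZE_BYTES / (L2_WAYS * L2_LINE_BYTES)
};

typedef struct {
    uint32_t tag;     /* address bits above the set index */
    uint32_t set;     /* selects one of L2_SETS sets */
    uint32_t block;   /* 0 or 1: which 32-byte block of the line */
    uint32_t offset;  /* byte offset within the 32-byte block */
} l2_addr_fields;

/* Assumed layout: | tag | set | block | offset |, low bits on the right. */
l2_addr_fields l2_split_address(uint32_t addr)
{
    l2_addr_fields f;
    f.offset = addr % L2_BLOCK_BYTES;
    f.block  = (addr / L2_BLOCK_BYTES) % (L2_LINE_BYTES / L2_BLOCK_BYTES);
    f.set    = (addr / L2_LINE_BYTES) % L2_SETS;
    f.tag    = addr / (L2_LINE_BYTES * L2_SETS);
    return f;
}

/* Cycles to move one 32-byte block across an interface of the given width. */
unsigned l2_block_transfer_cycles(unsigned interface_bits)
{
    unsigned bytes_per_cycle = interface_bits / 8;
    assert(bytes_per_cycle > 0 && L2_BLOCK_BYTES % bytes_per_cycle == 0);
    return L2_BLOCK_BYTES / bytes_per_cycle;
}
```

With these definitions, a 64-bit interface gives 32 / 8 = 4 cycles per block and a 256-bit interface gives 32 / 32 = 1 cycle, matching the write and read figures in the text.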

The L2 cache is organized with 64-byte lines, each subdivided into two 32-byte blocks, the unit at which cache coherency is maintained. Because one tag supports two cache blocks, this organization reduces the size of the tag array. Each 32-byte cache block has its own valid and modified status bits. When a cache line is removed, the contents of both blocks and the tag are removed from the L2 cache. A cache block is written to system memory only if its modified bit is set.
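A minimal sketch of this line organization follows: one tag shared by two blocks, each with its own valid and modified bits, plus a removal routine that reports which blocks need a write-back. The structure and function names are hypothetical, and the sketch assumes that a modified block is always valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCKS_PER_LINE 2  /* 64-byte line / 32-byte block */

typedef struct {
    bool valid;
    bool modified;
} l2_block_state;

typedef struct {
    uint32_t       tag;                     /* one tag covers both blocks */
    l2_block_state block[BLOCKS_PER_LINE];  /* per-block status bits */
} l2_line;

/* Remove a line from the cache model. Returns a bit mask in which bit i
 * is set if block i must be written to system memory (its modified bit
 * was set). Both blocks and the tag are cleared in every case. */
unsigned l2_evict_line(l2_line *line)
{
    unsigned writeback_mask = 0;
    assert(line);
    for (int i = 0; i < BLOCKS_PER_LINE; i++) {
        /* Assumption of this sketch: modified implies valid. */
        assert(line->block[i].valid || !line->block[i].modified);
        if (line->block[i].modified)
            writeback_mask |= 1u << i;
        line->block[i].valid = false;
        line->block[i].modified = false;
    }
    line->tag = 0;
    return writeback_mask;
}
```

Removing a line whose first block is clean and whose second block is modified yields a mask with only bit 1 set, so only one 32-byte block is written back even though the full 64-byte line leaves the cache.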

Requests from the L1 cache generally result from instruction misses, data load or store misses, write-through operations, or cache-management instructions. Misses from the L1 cache are looked up in the L2 tags and serviced by the L2 cache if they hit; they are forwarded to the 60x bus interface if they miss.

The L2 cache can accept multiple simultaneous accesses; however, they are serialized and processed one per cycle. The L1 instruction cache can request an instruction at the same time that the L1 data cache requests one load and two store operations. The L2 cache also services snoop requests from the bus. If there are multiple pending requests to the L2 cache, snoop requests have the highest priority. Load and store requests from the L1 data cache have the next highest priority. Instruction fetch requests from the L1 instruction cache have the lowest priority.
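The priority order described above can be sketched as a fixed-priority arbiter that grants one request per call, standing in for one request per cycle. The names are hypothetical, and the sketch does not model the order among the data cache's own load and store requests, which the text does not specify.

```c
#include <assert.h>

typedef enum {
    L2_REQ_NONE = 0,
    L2_REQ_SNOOP,   /* snoop request from the bus: highest priority */
    L2_REQ_DATA,    /* load or store from the L1 data cache */
    L2_REQ_IFETCH   /* fetch from the L1 instruction cache: lowest */
} l2_requester;

/* Counts of pending requests from each source. */
typedef struct {
    unsigned snoop;
    unsigned data;
    unsigned ifetch;
} l2_pending;

/* Grant one pending request (one call models one cycle) and return its
 * source, or L2_REQ_NONE when nothing is pending. */
l2_requester l2_arbitrate(l2_pending *p)
{
    assert(p);
    if (p->snoop)  { p->snoop--;  return L2_REQ_SNOOP;  }
    if (p->data)   { p->data--;   return L2_REQ_DATA;   }
    if (p->ifetch) { p->ifetch--; return L2_REQ_IFETCH; }
    return L2_REQ_NONE;
}
```

For example, with one snoop, one load plus two stores, and one instruction fetch all pending, successive calls grant the snoop, then the three data-cache requests, then the instruction fetch.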

1.2.6 System Interface/Bus Interface Unit (BIU)

The PowerPC 750GX uses a reduced system signal set, which eliminates some optional 60x bus protocol pins. The system designer needs to make note of these differences.


PowerPC 750GX Overview

March 27, 2006

Page 35 of 377