User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
In addition to the loads, stores, and instruction fetches, the 750GX performs hardware table-search operations following translation lookaside buffer (TLB) misses, L2 cache castout operations when the least-recently used (LRU) cache lines are written to memory after a cache miss, and cache-line snoop push-out operations when a modified cache line experiences a snoop hit from another bus master.
Figure 1-1, 750GX Microprocessor Block Diagram, on page 25 shows the address path from the execution units and instruction fetcher, through the translation logic to the caches and bus interface logic.
The 750GX uses separate address and data buses and a variety of control and status signals to perform reads and writes. The address bus is 32 bits wide, and the data bus is 64 bits wide. The interface is synchronous: all 750GX inputs are sampled at, and all outputs are driven from, the rising edge of the bus clock. The processor runs at a multiple of the bus-clock speed.
8.1.1 Operation of the Instruction and Data L1 Caches
The 750GX provides independent instruction and data L1 caches. Each cache is a physically addressed, 32-KB cache with 8-way set associativity. Both caches consist of 128 sets of eight cache lines, with eight words in each cache line.
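The geometry above (eight 4-byte words per line, 128 sets, eight ways) implies a conventional split of a 32-bit physical address into a 5-bit byte offset, a 7-bit set index, and a 20-bit tag. The following C fragment is a minimal sketch of that arithmetic. The type and function names (cache_addr_t, split_address, cache_size_bytes) are illustrative and do not come from this manual, and the field split is assumed from the geometry rather than taken from a documented 750GX tag layout.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the text: eight 4-byte words per line, 128 sets, 8 ways.
   All identifiers here are hypothetical, chosen for illustration. */
enum {
    LINE_BYTES  = 8 * 4,  /* eight words per cache line = 32 bytes */
    NUM_SETS    = 128,
    NUM_WAYS    = 8,
    OFFSET_BITS = 5,      /* log2(32)  */
    INDEX_BITS  = 7       /* log2(128) */
};

typedef struct {
    uint32_t tag;     /* remaining upper 20 bits of the address */
    uint32_t set;     /* selects one of the 128 sets */
    uint32_t offset;  /* byte offset within the 32-byte line */
} cache_addr_t;

/* Split a 32-bit physical address into tag, set index, and byte offset. */
static cache_addr_t split_address(uint32_t paddr)
{
    cache_addr_t a;
    a.offset = paddr & (LINE_BYTES - 1);
    a.set    = (paddr >> OFFSET_BITS) & (NUM_SETS - 1);
    a.tag    = paddr >> (OFFSET_BITS + INDEX_BITS);
    return a;
}

/* Total capacity implied by the geometry: 32 * 128 * 8 bytes. */
static uint32_t cache_size_bytes(void)
{
    return (uint32_t)LINE_BYTES * NUM_SETS * NUM_WAYS;
}
```

The product of line size, set count, and way count is 32,768 bytes, which agrees with the stated 32-KB capacity of each cache.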
Because the data cache on the 750GX is an on-chip, write-back primary cache, the predominant type of transaction for most applications is burst-read memory operations, followed by burst-write memory operations and single-beat (noncacheable or write-through) memory read and write operations. Additionally, there can be address-only operations, variants of the burst and single-beat operations (that is, global memory operations that are snooped, and atomic memory operations), and address retry activity (that is, when a snooped read access hits a modified line in the cache).
Since the 750GX data-cache tags are single ported, simultaneous load or store and snoop accesses cause resource contention. Snoop accesses have the highest priority and are given first access to the tags, unless the snoop access coincides with a tag write, in which case the snoop is retried and must rearbitrate for access to the cache. Loads or stores that are deferred due to snoop accesses are performed on the clock cycle following the snoop.
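The priority rule in the preceding paragraph can be sketched as a small decision function in C. This is a behavioral illustration under stated assumptions, not a description of the 750GX arbitration logic: the names (tag_grant_t, tag_arb_result_t, arbitrate_tags) are hypothetical, one call models one cycle, and the treatment of a load or store that coincides with a tag write is not described in the text and is left unmodeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Who gets the single tag port this cycle (hypothetical names). */
typedef enum {
    GRANT_NONE,
    GRANT_SNOOP,
    GRANT_LOAD_STORE,
    GRANT_TAG_WRITE
} tag_grant_t;

typedef struct {
    tag_grant_t grant;
    bool snoop_retried;        /* snoop lost to a tag write, must rearbitrate */
    bool load_store_deferred;  /* performed on the cycle following the snoop */
} tag_arb_result_t;

/* One arbitration decision for the single-ported data-cache tags. */
static tag_arb_result_t arbitrate_tags(bool snoop_req, bool load_store_req,
                                       bool tag_write)
{
    tag_arb_result_t r = { GRANT_NONE, false, false };

    if (tag_write) {
        /* A coinciding tag write is the one case where the snoop loses. */
        r.grant = GRANT_TAG_WRITE;
        r.snoop_retried = snoop_req;
    } else if (snoop_req) {
        /* Otherwise snoops have the highest priority. */
        r.grant = GRANT_SNOOP;
        r.load_store_deferred = load_store_req;
    } else if (load_store_req) {
        r.grant = GRANT_LOAD_STORE;
    }
    return r;
}
```

For example, arbitrate_tags(true, true, false) grants the tags to the snoop and marks the load or store as deferred, while arbitrate_tags(true, false, true) reports the snoop as retried.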
The 750GX implements a 3-state coherency protocol with the modified, exclusive, and invalid (MEI) cache states. The protocol is a subset of the modified, exclusive, shared, and invalid (MESI) 4-state protocol and operates coherently in systems that contain 4-state caches. With the exception of the Data Cache Block Set to Zero (dcbz) instruction (see footnote 1), the 750GX does not broadcast cache-control instructions. The cache-control instructions are intended for the management of the local cache, not of other caches in the system.
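As an illustration of how a 3-state protocol behaves, the following C fragment sketches a simplified MEI transition function for a single cache line. It is an assumption-laden model rather than the 750GX state machine: the names (mei_state_t, mei_event_t, mei_next) are hypothetical, only three event types are modeled, and the snoop event is assumed to be a cacheable access from another bus master that leaves the line invalid. Cache-control instructions and cache-inhibited snoops are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MEI_INVALID, MEI_EXCLUSIVE, MEI_MODIFIED } mei_state_t;

typedef enum {
    EV_LOCAL_LOAD,   /* load by this processor */
    EV_LOCAL_STORE,  /* store by this processor (write-back, cacheable) */
    EV_SNOOP_HIT     /* assumed: cacheable access by another bus master */
} mei_event_t;

/* Return the next state; *push_out is set when a modified line
   must be written back (snoop push-out) before it is given up. */
static mei_state_t mei_next(mei_state_t s, mei_event_t ev, bool *push_out)
{
    assert(push_out != 0);
    *push_out = false;

    switch (ev) {
    case EV_LOCAL_LOAD:
        /* A miss fills the line as exclusive; hits keep their state. */
        return (s == MEI_INVALID) ? MEI_EXCLUSIVE : s;
    case EV_LOCAL_STORE:
        /* Any store leaves the line modified (a miss fills it first). */
        return MEI_MODIFIED;
    case EV_SNOOP_HIT:
        /* With no shared state, the line is given up; modified data
           is pushed out to memory first. */
        *push_out = (s == MEI_MODIFIED);
        return MEI_INVALID;
    }
    return s;
}
```

Because there is no shared state, this model never keeps a valid copy of a line that another master has accessed, which is one way to see why a 3-state cache can coexist with 4-state caches on the same bus.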
Instruction-cache lines in the 750GX are loaded in four beats of 64 bits each. The burst load is performed as critical double word first. The critical double word is simultaneously written to the cache and forwarded to the instruction prefetch unit, thus minimizing stalls due to load delays. If subsequent loads follow in sequential order, the instructions will be forwarded to the requesting unit as the cache block is written.
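The critical-double-word-first ordering can be illustrated with a short C sketch that computes which of the four 64-bit double words in a 32-byte line arrives on each beat. The function name burst_order is hypothetical, and the wrap-around sequence after the critical double word (for example 2, 3, 0, 1) is an assumption made for this sketch; the paragraph above states only that the critical double word comes first.

```c
#include <assert.h>
#include <stdint.h>

enum { BEATS_PER_LINE = 4, BYTES_PER_BEAT = 8 };  /* four beats of 64 bits */

/* Fill order[] with the double-word index (0..3) delivered on each beat
   for a line fill triggered by an access to physical address paddr. */
static void burst_order(uint32_t paddr, unsigned order[BEATS_PER_LINE])
{
    /* Address bits 3..4 select the critical double word within the line. */
    unsigned critical = (paddr / BYTES_PER_BEAT) % BEATS_PER_LINE;

    for (unsigned beat = 0; beat < BEATS_PER_LINE; beat++) {
        /* Assumed: sequential order that wraps at the end of the line. */
        order[beat] = (critical + beat) % BEATS_PER_LINE;
    }
}
```

For a fetch from address 0x1010, the critical double word is index 2, so under the wrap-around assumption the beats deliver double words 2, 3, 0, and 1.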
Data-cache lines in the 750GX are loaded into the cache in one cycle of 256 bits. For a cache-line load due to the cache miss of a load instruction, the critical double word is simultaneously written to the 256-bit line fill buffer and forwarded to the requesting load/store unit. If subsequent loads follow in sequential order, the data will be forwarded to the load/store unit as the cache block is written into the cache.
1. And the Data Cache Block Invalidate (dcbi), Data Cache Block Store (dcbst), and Data Cache Block Flush (dcbf) instructions, if the address broadcast enable bit in Hardware-Implementation-Dependent 0 Register (HID0[ABE]) is enabled.
gx_08.fm.(1.2) | Bus Interface Operation
March 27, 2006 | Page 281 of 377