User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

1.2.2.3 Load/Store Unit (LSU)

The LSU executes all load-and-store instructions and provides the data-transfer interface between the GPRs, FPRs, and the data-cache/memory subsystem. The LSU functions as a 2-stage pipelined unit, which calculates effective addresses in the first stage. In the second stage, the address is translated, the cache is accessed, and the data is aligned if necessary. Unless extensive data alignment is required (for example, to cross a double-word boundary), the instructions complete in two cycles with a 1-cycle throughput. The LSU also provides sequencing for load/store string and multiple register transfer instructions.

Load-and-store instructions are translated and issued in program order. However, some memory accesses can occur out of order. Synchronizing instructions can be used to enforce strict ordering if necessary. When there are no data dependencies and the guard bit for the page or block is cleared, a maximum of one out-of- order cacheable load operation can execute per cycle, with a 2-cycle total latency on a cache hit. Data returned from the cache is held in a rename buffer until the completion logic commits the value to a GPR or FPR. Stores cannot be executed out of order and are held in the store queue until the completion logic signals that the store operation is to be completed to memory. The 750GX executes store instructions with a maximum throughput of one per cycle and a 3-cycle latency to the data cache. The time required to perform the actual load or store operation depends on the processor/bus clock ratio and whether the operation involves the L1 cache, the L2 cache, system memory, or an I/O device.

The L/S unit has two reservation stations, Eib0 and Eib1. For loads, there is also a hold queue and a miss queue. A load that misses in the dcache advances from Eib0 to the miss queue, where only necessary state for instruction completion like the instruction ID and register rename ID are stored. If another load misses under an outstanding miss, then it is held in the hold queue and Eib0 is free. Two more load instructions may now be dispatched to Eib0 and Eib1. The Miss-under-Miss feature allows the hold, Eib0, and Eib1 load requests to proceed out to the bus, even though there is an outstanding miss that would normally stall the pending loads.

1.2.2.4 System Register Unit (SRU)

The SRU executes various system-level instructions, as well as Condition Register logical operations and Move-to/Move-from Special-Purpose Register instructions. To maintain system state, most instructions executed by the SRU are execution-serialized with other instructions; that is, the instruction is held for execution in the SRU until all previously issued instructions have been retired. Results from execution-serialized instructions executed by the SRU are not available or forwarded for subsequent instructions until the instruction completes.

1.2.3 Memory Management Units (MMUs)

The 750GX’s MMUs support up to 4 petabytes (252) of virtual memory and 4 gigabytes (232) of physical memory for instructions and data. The MMUs also control access privileges for these spaces on block and page granularities. Referenced and changed status is maintained by the processor for each page to support demand-paged virtual memory systems.

The LSU, with the aid of the MMU, translates effective addresses for data loads and stores. The effective address is calculated on the first cycle, and the MMU translates it to a physical address at the same time it is accessing the L1 cache on the second cycle. The MMU also provides the necessary control and protection information to complete the access. By the end of the second cycle, the data and control information is available if no miss conditions for translate and cache access were encountered. This yields a 1-cycle throughput and a 2-cycle latency.

PowerPC 750GX Overview

gx_01.fm.(1.2)

Page 32 of 377

March 27,2006