User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
6.3.2.4 L2 Cache Access Timing Considerations
If an instruction fetch misses both the BTIC and the L1 instruction cache, the 750GX next looks in the L2 cache. If the requested instructions are there, they are burst into the 750GX in much the same way as shown in Figure 6-6 on page 223.
An instruction fetch from the L2 cache has a latency of five cycles.
6.3.2.5 Instruction Dispatch and Completion Considerations
Several factors affect the 750GX’s ability to dispatch instructions at a peak rate of two per cycle: the availability of the execution unit, destination Rename Registers, and completion queue, as well as the handling of completion-serialized instructions. Several of these limiting factors are illustrated in the previous instruction timing examples.
To reduce dispatch-unit stalls due to instruction data dependencies, the 750GX provides a single-entry reservation station for the FPU, SRU, and each IU, and a 2-entry reservation station for the LSU. If a data dependency keeps an instruction from starting execution, that instruction is dispatched to the reservation station associated with its execution unit (and the Rename Registers are assigned), thereby freeing the positions in the instruction queue so instructions can be dispatched to other execution units. Execution begins during the same clock cycle that the rename buffer is updated with the data the instruction is dependent on.
If the instructions in IQ0 and IQ1 both require the same execution unit, they must execute sequentially, with IQ1 following IQ0 through the execution unit. If these instructions require different execution units, they can be dispatched on the same cycle, execute in parallel on separate execution units, and can complete and be retired together on the same cycle.
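The dispatch conditions described above can be illustrated with a small Python sketch. It is a toy model, not a cycle-accurate description of the 750GX: the names Instr, STATION_SLOTS, and dispatch_count are hypothetical, Rename Register availability is ignored, and the model assumes for simplicity that two instructions requiring the same execution unit are never dispatched on the same cycle.

```python
from dataclasses import dataclass
from typing import Optional

# Reservation-station entries per unit, as stated in the text:
# one each for the FPU, SRU, and each IU; two for the LSU.
STATION_SLOTS = {"IU1": 1, "IU2": 1, "FPU": 1, "SRU": 1, "LSU": 2}


@dataclass
class Instr:
    name: str
    unit: str  # execution unit the instruction requires


def dispatch_count(iq0: Optional[Instr], iq1: Optional[Instr],
                   cq_free: int, station_free: dict) -> int:
    """Return how many of IQ0/IQ1 dispatch this cycle (0, 1, or 2).

    cq_free      -- free completion-queue entries
    station_free -- free reservation-station entries per unit
    Dispatch is modeled as in-order: IQ1 never dispatches unless IQ0 does.
    """
    free = dict(station_free)   # work on a copy
    used_units = set()
    count = 0
    for instr in (iq0, iq1):
        if instr is None:
            break
        if cq_free - count <= 0:          # no completion-queue entry left
            break
        if free.get(instr.unit, 0) <= 0:  # reservation station is full
            break
        if instr.unit in used_units:      # assumption: one per unit per cycle
            break
        free[instr.unit] -= 1
        used_units.add(instr.unit)
        count += 1
    return count
```

For example, an integer instruction in IQ0 and a floating-point instruction in IQ1 both dispatch when resources are free, whereas two floating-point instructions dispatch one at a time in this model.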
The completion unit maintains program order after instructions are dispatched from the instruction queue, guaranteeing in-order completion and a precise-exception model. Completing an instruction implies committing execution results to the architected destination registers. In-order completion ensures the correct architectural state when the 750GX must recover from a mispredicted branch or an exception.
Instruction state and all information required for completion are kept in the 6-entry, first-in/first-out completion queue. A completion queue entry is allocated for each instruction when it is dispatched to an execution unit. If no entry is available, the dispatch unit stalls. A maximum of two instructions per cycle can be completed and retired from the completion queue, and the flow of instructions can stall when a longer-latency instruction reaches the last position in the completion queue. Subsequent instructions cannot be completed and retired until that longer-latency instruction completes and retires. Examples of this are shown in Section 6.3.2.2, Cache Hit and Section 6.3.2.3, Cache Miss.
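The effect of a longer-latency instruction at the last position in the completion queue can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name retire_schedule and the ready_cycles input (the first cycle in which each instruction could retire, listed in program order) are hypothetical, and all instructions are assumed to be in the queue already.

```python
from collections import deque

CQ_ENTRIES = 6             # completion queue depth, from the text
MAX_RETIRE_PER_CYCLE = 2   # at most two instructions retire per cycle


def retire_schedule(ready_cycles):
    """Return the cycle in which each instruction retires.

    ready_cycles -- for each instruction, in program order, the first
                    cycle in which it has finished and could retire
                    (a hypothetical input for this sketch).
    """
    if len(ready_cycles) > CQ_ENTRIES:
        raise ValueError("more instructions than completion queue entries")

    queue = deque(enumerate(ready_cycles))   # oldest instruction at the left
    retired = [None] * len(ready_cycles)
    cycle = 0
    while queue:
        cycle += 1
        retired_now = 0
        # Retire strictly from the oldest entry; a not-yet-ready oldest
        # entry blocks every younger instruction behind it.
        while (queue and retired_now < MAX_RETIRE_PER_CYCLE
               and queue[0][1] <= cycle):
            index, _ = queue.popleft()
            retired[index] = cycle
            retired_now += 1
    return retired


# A 5-cycle instruction followed by three 1-cycle instructions:
# the short instructions wait behind the long one.
print(retire_schedule([5, 1, 1, 1]))   # [5, 5, 6, 6]
```

In the example, the three short instructions are ready in cycle 1 but cannot retire until the first instruction does in cycle 5, and the two-per-cycle limit pushes the last two into cycle 6.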
The 750GX can execute instructions out-of-order, but in-order completion by the completion unit ensures a precise-exception mechanism. Program-related exceptions are signaled when the instruction causing the exception reaches the last position in the completion queue. By this time, previous instructions are retired.
6.3.2.6 Rename Register Operation
To avoid contention for a given register file location in the course of out-of-order execution, the 750GX provides Rename Registers for holding instruction results before the completion unit commits them to the architected registers. There are six GPR Rename Registers, six FPR Rename Registers, and one each for the CR, LR, and CTR.
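As a rough illustration of how the Rename Register counts bound the number of results in flight, the following Python sketch tracks free Rename Registers per register class. The class name RenamePool and its methods are hypothetical; the sketch assumes a Rename Register is allocated at dispatch and freed when the completion unit commits the result, and it models only the counts, not the data held.

```python
# Rename Register counts per register class, from the text.
RENAME_CAPACITY = {"GPR": 6, "FPR": 6, "CR": 1, "LR": 1, "CTR": 1}


class RenamePool:
    """Counts free Rename Registers per register class (toy model)."""

    def __init__(self):
        self.free = dict(RENAME_CAPACITY)

    def allocate(self, kind: str) -> bool:
        """Claim a Rename Register at dispatch; False means dispatch stalls."""
        if self.free[kind] == 0:
            return False
        self.free[kind] -= 1
        return True

    def release(self, kind: str) -> None:
        """Free a Rename Register once its result has been committed."""
        if self.free[kind] >= RENAME_CAPACITY[kind]:
            raise ValueError(f"no {kind} Rename Register is in use")
        self.free[kind] += 1


pool = RenamePool()
granted = [pool.allocate("GPR") for _ in range(7)]
print(granted)   # six True values, then False: the seventh must wait
```

In this model, a seventh instruction with a GPR destination cannot be given a Rename Register until an earlier one is released.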
Instruction Timing | gx_06.fm.(1.2) |
Page 224 of 377 | March 27, 2006 |