User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

The instruction pipeline stages are described as follows:

The instruction fetch stage includes the clock cycles necessary to request instructions from the memory system and the time the memory system takes to respond to the request. Instruction fetch timing depends on many variables, such as whether the instruction is in the branch target instruction cache, the L1 instruction cache, or the L2 cache. If instructions must be fetched from system memory, other factors also affect instruction fetch timing, including the processor-to-bus clock ratio, the amount of bus traffic, and whether any cache-coherency operations are required.

Because there are so many variables, unless otherwise specified, the instruction timing examples below assume optimal performance and assume instructions are available in the instruction queue in the same clock cycle that they are requested. The fetch stage ends when instructions are loaded into the instruction queue.

The decode/dispatch stage consists of the time it takes to decode the instruction and dispatch it from the instruction queue to the appropriate execution unit. Instruction dispatch requires the following:

Instructions can be dispatched only from the two lowest instruction queue entries, IQ0 and IQ1.

A maximum of two instructions can be dispatched per clock cycle, and one additional branch instruction can be handled by the BPU.

Only one instruction can be dispatched to each execution unit per clock cycle.

There must be a vacancy in the specified execution-unit reservation station.

A rename register must be available for each destination operand specified by the instruction.

For an instruction to dispatch, the appropriate execution-unit reservation station must be available, and there must be an open position in the completion queue. If either is unavailable, the instruction remains in the instruction queue (IQ).
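The dispatch requirements above can be sketched as a small eligibility check. This is an illustrative model only, not the processor's actual logic: the names `Instr` and `dispatch_cycle` are hypothetical, rename registers are treated as a single pool for simplicity, branch handling by the BPU is not modeled, and the sketch assumes that dispatch is in order, so IQ1 is not considered if IQ0 cannot dispatch.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    unit: str            # hypothetical unit label, e.g. "IU1", "LSU", "FPU"
    dest_count: int = 1  # destination operands that each need a rename register


def dispatch_cycle(iq, free_stations, free_renames, free_cq_slots):
    """Return the instructions dispatched in one clock cycle.

    iq            -- list of Instr; index 0 is IQ0, index 1 is IQ1
    free_stations -- dict mapping unit label to True if its reservation
                     station has a vacancy
    free_renames  -- number of free rename registers (single pool: a
                     simplification made for this sketch)
    free_cq_slots -- number of open completion queue entries
    """
    dispatched = []
    used_units = set()
    # Only the two lowest queue entries are candidates, so at most two
    # instructions dispatch per cycle.
    for instr in iq[:2]:
        if instr.unit in used_units:
            break  # only one instruction per execution unit per cycle
        if not free_stations.get(instr.unit, False):
            break  # no vacancy in the reservation station
        if free_renames < instr.dest_count:
            break  # not enough rename registers for the destinations
        if free_cq_slots < 1:
            break  # no open position in the completion queue
        # Assumption: dispatch is in order, so a stall above stops the loop
        # instead of skipping ahead to the next entry.
        used_units.add(instr.unit)
        free_renames -= instr.dest_count
        free_cq_slots -= 1
        dispatched.append(instr)
    return dispatched
```

For example, with two integer instructions that both target the same unit in IQ0 and IQ1, only the first dispatches in this model; the second stays in the instruction queue until a later cycle.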

The execute stage consists of the time between dispatch to the execution unit (or reservation station) and the point at which the instruction vacates the execution unit.

Most integer instructions have a 1-cycle latency; results of these instructions can be used in the clock cycle after an instruction enters the execution unit. However, integer multiply and divide instructions take multiple clock cycles to complete. IU1 can process all integer instructions; IU2 can process all integer instructions except multiply and divide instructions.

The LSU and FPU are pipelined (as shown in Figure 6-2 on page 212).
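The division of work between the two integer units described above can be expressed as a simple lookup. This is a minimal sketch: the function name `eligible_integer_units` is hypothetical, and the mnemonic set is an illustrative subset of the PowerPC integer multiply and divide instructions, not a complete list.

```python
# Illustrative subset of PowerPC integer multiply and divide mnemonics.
MULTIPLY_DIVIDE = {"mulli", "mullw", "mulhw", "mulhwu", "divw", "divwu"}


def eligible_integer_units(mnemonic):
    """Return the integer units that can execute the given integer instruction.

    IU1 handles every integer instruction; IU2 handles every integer
    instruction except multiply and divide.
    """
    if mnemonic in MULTIPLY_DIVIDE:
        return ["IU1"]
    return ["IU1", "IU2"]
```

In this model, a multiply instruction such as `mullw` can go only to IU1, while an instruction such as `add` can go to either unit.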

The complete (complete/write-back) pipeline stage maintains the correct architectural machine state and commits the rename register values to the architectural registers at the proper time. If the completion logic detects an instruction containing an exception status, all subsequent instructions are cancelled, their execution results in the rename registers are discarded, and the correct instruction stream is fetched.

The complete stage ends when the instruction is retired. Two instructions can be retired per cycle. Instructions are retired only from the two lowest completion queue entries, CQ0 and CQ1.
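The retirement rules above can be sketched with a small queue model. This is an illustration under stated assumptions, not the actual completion logic: `Entry` and `retire_cycle` are hypothetical names, an entry retires only once it is marked finished, and the excepting instruction is reported separately because its own handling depends on the exception type, which this sketch does not model.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    finished: bool           # execution result is ready to commit
    exception: bool = False  # entry carries an exception status


def retire_cycle(cq):
    """Retire up to two instructions from the head of the completion queue.

    cq is a deque of Entry objects in program order; cq[0] is CQ0.
    Returns (retired, excepting, flushed) and mutates cq in place.
    """
    retired, excepting, flushed = [], None, []
    # At most two instructions retire per cycle, which corresponds to the
    # entries that start the cycle in CQ0 and CQ1.
    while cq and len(retired) < 2:
        head = cq[0]
        if not head.finished:
            break  # in-order retirement: an unfinished head blocks the rest
        cq.popleft()
        if head.exception:
            # All subsequent instructions are cancelled and their
            # rename-register results are discarded.
            excepting = head
            flushed = list(cq)
            cq.clear()
            break
        retired.append(head)
    return retired, excepting, flushed
```

With three finished entries in the queue, one call retires the first two and leaves the third at the head for the next cycle.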


Instruction Timing

March 27, 2006
