User’s Manual

IBM PowerPC 750GX and 750GL RISC Microprocessor

6.3 Timing Considerations

The 750GX is a superscalar processor: as many as three instructions can be issued to the execution units during each clock cycle (one branch instruction to the branch processing unit (BPU), and two instructions issued from the dispatch queue to the other execution units). Only one instruction can be dispatched to each execution unit per clock cycle.
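The per-cycle issue limits above can be sketched as a small Python function. This is a simplified model, not the 750GX's actual dispatch logic. The function name `issue_cycle`, the unit labels, and the rule that dispatch stops at the first instruction whose execution unit is already taken (in-order dispatch) are assumptions made for illustration.

```python
def issue_cycle(branch, dispatch_queue, max_dispatch=2):
    """Return the instructions issued in one clock cycle (simplified model).

    branch: name of a branch instruction for the BPU, or None.
    dispatch_queue: (name, unit) pairs in program order; the unit labels
        are hypothetical, e.g. "IU1", "LSU", "FPU".
    """
    dispatched = []
    busy_units = set()
    for name, unit in dispatch_queue:
        # At most two non-branch instructions per cycle and at most one per
        # execution unit; assume dispatch stalls, in order, at the first
        # instruction that cannot go.
        if len(dispatched) == max_dispatch or unit in busy_units:
            break
        busy_units.add(unit)
        dispatched.append(name)
    # The branch (if any) goes to the BPU, independent of the two slots.
    return ([branch] if branch is not None else []) + dispatched


# One branch plus two instructions from the dispatch queue: three issued.
print(issue_cycle("bc", [("add", "IU1"), ("lwz", "LSU"), ("fadd", "FPU")]))
# ['bc', 'add', 'lwz']

# Two instructions need the same unit, so only the first is dispatched.
print(issue_cycle(None, [("lwz", "LSU"), ("stw", "LSU")]))
# ['lwz']
```

In the first call, `fadd` is held back only because both dispatch slots are already used, not because its unit is busy.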

Although instructions appear to the programmer to execute in program order, the 750GX improves performance by executing multiple instructions at a time and using hardware to manage dependencies. When an instruction is dispatched, its source data is supplied to the execution unit either by the register file or by the Rename Register of a previous instruction that has not yet written back its result. The register files and Rename Registers have sufficient bandwidth to allow dispatch of two instructions per clock cycle under most conditions.
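A minimal sketch of where a source operand comes from at dispatch follows. It assumes a hypothetical `pending_writer` map from each architected register to the rename-register tag of its most recent in-flight producer. Real hardware does this with tag-comparison logic, not a dictionary, and the names here are illustrative only.

```python
def read_operand(reg, register_file, pending_writer, rename_registers):
    """Return (source, value) for architected register `reg`.

    pending_writer: hypothetical map, register number -> rename tag of the
        most recent earlier instruction that has not yet written back.
    rename_registers: rename tag -> result value; a missing tag means the
        producer has not finished, so the value is None (not yet ready).
    """
    tag = pending_writer.get(reg)
    if tag is None:
        # No earlier instruction still owns this register: use the register file.
        return "register file", register_file[reg]
    # An earlier instruction's result lives in its rename register.
    return "rename register", rename_registers.get(tag)


gprs = [0] * 32
gprs[3] = 7
print(read_operand(3, gprs, {4: 0}, {0: 42}))  # ('register file', 7)
print(read_operand(4, gprs, {4: 0}, {0: 42}))  # ('rename register', 42)
```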

The 750GX’s BPU decodes and executes branches immediately after they are fetched. When a conditional branch cannot be resolved because of a dependency on CR data (or any other dependency), the branch direction is predicted and execution continues along the predicted path. If the prediction is incorrect, the following steps are taken:

1. The instruction queue is purged and fetching continues from the correct path.

2. Any instructions in the completion queue that precede the predicted branch in program order are allowed to complete.

3. Instructions fetched on the mispredicted path of the branch are purged.

4. Fetching resumes along the correct (other) path.
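The recovery steps above can be sketched against a toy front end. This is a simplified model under stated assumptions: queue entries are hypothetical program-order sequence numbers, the branch's own entry is assumed not to sit in the completion queue, and `FrontEnd` and `recover_from_mispredict` are illustrative names, not 750GX structures.

```python
from dataclasses import dataclass, field


@dataclass
class FrontEnd:
    fetch_pc: int
    # Entries are hypothetical program-order sequence numbers.
    instruction_queue: list = field(default_factory=list)
    completion_queue: list = field(default_factory=list)


def recover_from_mispredict(fe, branch_seq, correct_target):
    """Apply the four recovery steps; return the purged completion entries."""
    # Step 1: purge the instruction queue.
    fe.instruction_queue.clear()
    # Step 2: instructions older than the branch stay and may complete.
    older = [s for s in fe.completion_queue if s < branch_seq]
    # Step 3: instructions from the mispredicted path are purged.
    purged = [s for s in fe.completion_queue if s > branch_seq]
    fe.completion_queue = older
    # Step 4: fetching resumes along the correct path.
    fe.fetch_pc = correct_target
    return purged


fe = FrontEnd(fetch_pc=0x200, instruction_queue=[8, 9], completion_queue=[3, 4, 6, 7])
print(recover_from_mispredict(fe, branch_seq=5, correct_target=0x300))  # [6, 7]
print(fe.completion_queue, hex(fe.fetch_pc))  # [3, 4] 0x300
```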

After an execution unit finishes executing an instruction, it places the result into the appropriate GPR or FPR Rename Register. The result is then written into the correct GPR or FPR during the write-back stage (retirement). If a subsequent instruction needs the result as a source operand, the result is simultaneously forwarded to the appropriate execution unit, which allows a data-dependent instruction to be decoded and dispatched without waiting to read the data from the register file. Branch instructions that update either the LR or the CTR write back their results in a similar fashion.
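The finish, forward, and retire sequence described above can be sketched as a small class. This is a hypothetical model for GPRs only: the class and method names, and the idea of tracking waiting consumers in a dictionary, are assumptions made for illustration rather than a description of the 750GX hardware.

```python
class RenameModel:
    """Simplified GPR rename model: finish -> rename register (+ forward), retire -> GPR."""

    def __init__(self, num_gprs=32):
        self.gprs = [0] * num_gprs
        self.rename = {}    # rename tag -> finished result
        self.waiting = {}   # rename tag -> consumers waiting for that result

    def wait_for(self, tag, consumer):
        """Record that `consumer` needs the result identified by `tag`."""
        self.waiting.setdefault(tag, []).append(consumer)

    def finish(self, tag, value):
        """Execution finished: write the rename register and forward the result."""
        self.rename[tag] = value
        # Waiting consumers receive the same value at the same time, so they
        # need not wait for write-back to the GPR.
        return [(consumer, value) for consumer in self.waiting.pop(tag, [])]

    def retire(self, tag, dest_reg):
        """Write-back stage: copy the result into the GPR and free the rename register."""
        self.gprs[dest_reg] = self.rename.pop(tag)


m = RenameModel()
m.wait_for(0, "addi r5,r4,1")
print(m.finish(0, 42))  # [('addi r5,r4,1', 42)]
print(m.gprs[4])        # 0 (not yet retired)
m.retire(0, dest_reg=4)
print(m.gprs[4])        # 42
```

The dependent `addi` receives its operand at `finish`, before `retire` has updated the GPR.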

Section 6.3.1 describes this process in greater detail.

6.3.1 General Instruction Flow

As many as four instructions can be fetched into the instruction queue (IQ) in a single clock cycle. Instructions enter the IQ and are issued to the various execution units from the dispatch queue. The 750GX tries to keep the IQ full at all times, unless instruction-cache throttling is active.
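The following sketch shows how the IQ fills when each cycle's fetch request is sized by the entries vacant in the previous cycle. It assumes a six-entry IQ (the depth is not stated in this section), two dispatches per cycle, and a hypothetical per-cycle `dispatch_limits` input that stands in for downstream stalls. It models occupancy only, not instruction contents.

```python
IQ_CAPACITY = 6   # assumption: IQ depth is not given in this section
MAX_FETCH = 4     # up to four instructions fetched per clock
MAX_DISPATCH = 2  # up to two instructions dispatched per clock


def simulate_iq(dispatch_limits):
    """Return IQ occupancy at the end of each cycle.

    dispatch_limits[c]: how many instructions the (hypothetical) downstream
    logic will accept in cycle c, e.g. 0 during a stall.
    """
    occupancy = 0
    requested = min(MAX_FETCH, IQ_CAPACITY - occupancy)
    history = []
    for limit in dispatch_limits:
        # Instructions requested in the previous cycle arrive in the IQ.
        occupancy += requested
        # Dispatch from the bottom of the IQ.
        occupancy -= min(occupancy, limit, MAX_DISPATCH)
        # The next request is sized by the entries vacant in this cycle.
        requested = min(MAX_FETCH, IQ_CAPACITY - occupancy)
        history.append(occupancy)
    return history


print(simulate_iq([2, 2, 2, 2]))  # [2, 4, 4, 4]
print(simulate_iq([0, 0, 0]))     # [4, 6, 6]
```

During a dispatch stall the request shrinks to the number of vacant entries (two, then zero), so the IQ never overflows.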

The number of instructions requested in a clock cycle is determined by the number of vacant entries in the IQ during the previous clock cycle, as the examples in this section show. Although the IQ can accept as many as four new instructions in a single clock cycle, only one instruction is fetched if only one IQ entry is vacant. Instructions are typically fetched from the L1 instruction cache, but they might also be fetched from the branch target instruction cache (BTIC) if a branch is taken. If the fetch request for a taken branch’s target hits in the BTIC, the BTIC can usually supply the first two instructions of the new instruction stream in the next clock cycle, which gives enough time for the next pair of instructions to be fetched from the L1 instruction cache. As a result, the instruction stream has no idle cycles (this is also known as a zero-cycle branch). If the instructions are in neither the BTIC nor the L1 instruction cache, they are fetched from the L2 cache or from system memory.
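The fetch-source priority described above can be sketched as a lookup chain. This is a simplified model: caches are represented as sets of addresses, the BTIC is assumed to be looked up by the taken branch's target address, and `fetch_source` and `FetchSource` are illustrative names. The two-instruction BTIC delivery and the resulting zero-cycle timing are not modeled.

```python
from enum import Enum


class FetchSource(Enum):
    BTIC = "branch target instruction cache"
    L1 = "L1 instruction cache"
    L2 = "L2 cache"
    MEMORY = "system memory"


def fetch_source(addr, is_taken_branch_target, btic, l1_icache, l2_cache):
    """Pick where an instruction fetch is satisfied (simplified model).

    btic, l1_icache, l2_cache: sets of cached addresses; real caches are
    looked up by index and tag, not by set membership.
    """
    # The BTIC is consulted only for the target of a taken branch.
    if is_taken_branch_target and addr in btic:
        return FetchSource.BTIC
    if addr in l1_icache:
        return FetchSource.L1
    if addr in l2_cache:
        return FetchSource.L2
    return FetchSource.MEMORY


btic, l1, l2 = {0x100}, {0x100, 0x200}, {0x300}
print(fetch_source(0x100, True, btic, l1, l2).name)   # BTIC
print(fetch_source(0x100, False, btic, l1, l2).name)  # L1
print(fetch_source(0x400, False, btic, l1, l2).name)  # MEMORY
```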

gx_06.fm.(1.2)

Instruction Timing

March 27, 2006

Page 215 of 377