User’s Manual

 

IBM PowerPC 750GX and 750GL RISC Microprocessor

 

 

Stage

The processing of instructions in the 750GX is done in stages: fetch, decode/dispatch, execute, complete, and retirement. The fetch unit brings instructions from the memory system into the instruction queue. Once an instruction is in the instruction queue, the dispatch unit must partially decode it to determine its type. If it is an integer instruction, it is passed to the integer execution unit. If it is a floating-point instruction, it is passed to the floating-point execution unit. If it is a branch, it is processed immediately by the branch folding and branch prediction functions. Instructions spend one or more cycles in each stage as they are processed by the 750GX processor.
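
The routing decision described in this entry can be sketched as a simple lookup. This is an illustrative model only, not the hardware's decode logic; the names InstrType and dispatch are hypothetical, and only the three instruction types named in this entry are covered.

```python
from enum import Enum


class InstrType(Enum):
    """Instruction types named in the glossary entry (hypothetical enum)."""
    INTEGER = "integer"
    FLOAT = "floating-point"
    BRANCH = "branch"


def dispatch(instr_type):
    """Return the destination the text gives for each instruction type."""
    routes = {
        InstrType.INTEGER: "integer execution unit",
        InstrType.FLOAT: "floating-point execution unit",
        # Branches are handled immediately rather than sent to an execution unit.
        InstrType.BRANCH: "branch folding and branch prediction",
    }
    return routes[instr_type]
```

For example, dispatch(InstrType.FLOAT) returns "floating-point execution unit".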

Stall

An occurrence in which an instruction cannot proceed to the next stage. An instruction can spend multiple cycles in one stage. An integer multiply, for example, takes multiple cycles in the execute stage. When this occurs, subsequent instructions might stall.
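
A minimal sketch of how a multi-cycle instruction can stall the instructions behind it, assuming a single in-order execution unit that receives one new instruction per cycle. The function name and the latency values used in the example are hypothetical and are not taken from the 750GX timing tables.

```python
def execute_finish_cycles(exec_latencies):
    """Model one in-order execution unit.

    exec_latencies[i] is the number of cycles instruction i spends in the
    execute stage. Instruction i is assumed to reach the execute stage at
    cycle i. Returns (finish, stalls): the cycle at which each instruction
    leaves execute, and the cycles each one waited for the unit.
    """
    finish = []
    stalls = []
    unit_free_at = 0  # first cycle at which the unit can accept work
    for i, latency in enumerate(exec_latencies):
        ready = i
        start = max(ready, unit_free_at)
        stalls.append(start - ready)
        unit_free_at = start + latency
        finish.append(unit_free_at)
    return finish, stalls
```

With a hypothetical 4-cycle multiply as the second instruction, execute_finish_cycles([1, 4, 1, 1]) returns ([1, 5, 6, 7], [0, 0, 3, 3]): the two instructions behind the multiply each wait three cycles.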

Superscalar

A superscalar processor is one that has multiple execution units. The 750GX processor has one floating-point unit, two integer units, one load/store unit, and a system unit for miscellaneous instructions. PowerPC instructions are processed in parallel by these execution units.

Throughput

A measure of the total number of instructions that are processed by all execution units per unit of time.
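
As a worked form of this definition, throughput can be expressed per cycle or per second. The sketch below assumes the instruction and cycle counts come from some measurement outside this text; the function names and the clock frequency in the example are hypothetical.

```python
def instructions_per_cycle(instructions, cycles):
    """Throughput expressed as instructions processed per clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def instructions_per_second(instructions, cycles, clock_hz):
    """Throughput per second: elapsed time is cycles / clock_hz."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions * clock_hz / cycles
```

For example, 300 instructions processed in 200 cycles gives 1.5 instructions per cycle.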

Write-back

Write-back, in the context of instruction handling, occurs when a result is written into the architectural registers (typically the GPRs and FPRs). For most instructions, results are written back from the rename registers at retirement time. The instruction is also removed from the completion queue at this time.
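
The retirement step can be sketched as follows. This is a simplified model, assuming that instructions retire oldest-first and that each has a single destination register; the class and method names are hypothetical and do not describe the actual hardware structures.

```python
from collections import deque


class RetireModel:
    """Toy model of write-back from rename registers at retirement."""

    def __init__(self):
        self.arch_regs = {}              # architectural registers (e.g. GPRs)
        self.rename = {}                 # rename register id -> result value
        self.completion_queue = deque()  # (dest_reg, rename_id), program order

    def dispatch(self, dest_reg, rename_id):
        """Allocate a completion queue entry for a dispatched instruction."""
        self.completion_queue.append((dest_reg, rename_id))

    def finish_execute(self, rename_id, value):
        """Place an execution result in its rename register."""
        self.rename[rename_id] = value

    def retire(self):
        """Retire the oldest instruction if its result is ready.

        Copies the result into the architectural register and removes the
        instruction from the completion queue. Returns True if one retired.
        """
        if not self.completion_queue:
            return False
        dest_reg, rename_id = self.completion_queue[0]
        if rename_id not in self.rename:
            return False  # oldest instruction has not finished executing
        self.arch_regs[dest_reg] = self.rename.pop(rename_id)
        self.completion_queue.popleft()
        return True
```

In this model a younger instruction whose result is already in a rename register still waits until every older instruction has retired before its result reaches the architectural registers.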

6.2 Instruction Timing Overview

The 750GX design minimizes average instruction execution latency, the number of clock cycles it takes to fetch, decode, dispatch, and execute instructions and make the results available for a subsequent instruction. Some instructions, such as loads and stores, access memory and require additional clock cycles between the execute phase and the write-back phase. These latencies vary depending on whether the access is to cacheable or noncacheable memory, whether it hits in the L1 or L2 cache, whether the cache access generates a write-back to memory, whether the access causes a snoop hit from another device that generates additional activity, and other conditions that affect memory accesses.

The 750GX implements many features to improve throughput, such as pipelining, superscalar instruction issue, branch folding, 2-level speculative branch handling, two types of branch prediction, and multiple execution units that operate independently and in parallel.

As an instruction passes from stage to stage in a pipelined system, multiple instructions are in various stages of execution at any given time. Also, with multiple execution units operating in parallel, more than one instruction can be completed in a single cycle.
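
The overlap described above can be illustrated with an idealized timeline. The sketch assumes every instruction spends exactly one cycle in each stage and that one instruction enters the pipeline per cycle, which ignores stalls and multiple issue. The stage names follow the glossary entry for Stage; the function name is hypothetical.

```python
STAGES = ["fetch", "decode/dispatch", "execute", "complete", "retirement"]


def pipeline_timeline(num_instructions, stages=STAGES):
    """Return, for each cycle, a dict mapping instruction index to its stage.

    Idealized model: instruction i enters the first stage at cycle i and
    advances one stage per cycle.
    """
    total_cycles = num_instructions + len(stages) - 1
    timeline = []
    for cycle in range(total_cycles):
        occupancy = {}
        for i in range(num_instructions):
            stage_index = cycle - i
            if 0 <= stage_index < len(stages):
                occupancy[i] = stages[stage_index]
        timeline.append(occupancy)
    return timeline
```

For three instructions, the entry for cycle 2 is {0: "execute", 1: "decode/dispatch", 2: "fetch"}: three instructions are in flight at once, each in a different stage.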

The 750GX contains the following execution units that operate independently and in parallel:

Branch processing unit (BPU)

Integer unit 1 (IU1)—executes all integer instructions

Integer unit 2 (IU2)—executes all integer instructions except multiplies and divides

gx_06.fm.(1.2)

Instruction Timing

March 27, 2006

Page 211 of 377