IBM 750GL, 750GX 6.2 Instruction Timing Overview

	User’s Manual
	IBM PowerPC 750GX and 750GL RISC Microprocessor

Stage	Instructions in the 750GX are processed in stages: fetch,
	decode/dispatch, execute, complete, and retirement. The fetch unit brings
	instructions from the memory system into the instruction queue. Once an
	instruction is in the queue, the dispatch unit partially decodes it to determine its
	type. An integer instruction is passed to an integer execution unit, and a
	floating-point instruction is passed to the floating-point execution unit. A branch
	is processed immediately by the branch folding and branch prediction functions.
	Instructions spend one or more cycles in each stage as they are processed
	by the 750GX processor.
Stall	A stall occurs when an instruction cannot proceed to the next stage. An instruction
	can spend multiple cycles in one stage; an integer multiply, for example, takes
	multiple cycles in the execute stage. While it occupies that stage, subsequent
	instructions might stall.
Superscalar	A superscalar processor is one that has multiple execution units. The 750GX
	processor has one floating-point unit, two integer units, one load/store unit, and a
	system unit for miscellaneous instructions. PowerPC instructions are processed in
	parallel by these execution units.
Throughput	A measure of the total number of instructions that are processed by all execution
	units per unit of time.
Write-back	Write-back, in the context of instruction handling, occurs when a result is written
	into the architectural registers (typically the GPRs and FPRs). Results are written
	back at retirement time from the Rename Registers for most instructions. The
	instruction is also removed from the completion queue at this time.
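
The stall and throughput definitions above can be illustrated with a minimal sketch. It assumes a single in-order execute stage that receives one new instruction per cycle; the function name and the latency values are hypothetical and are not 750GX timings. The model also ignores the second integer unit and all other execution units.

```python
def simulate_execute_stage(latencies):
    """Model one in-order execute stage fed with one instruction per cycle.

    latencies[i] is the number of cycles instruction i spends in execute
    (hypothetical values, not taken from the 750GX timing tables).
    Returns (finish_cycles, stall_cycles, throughput).
    """
    finish_cycles = []
    stall_cycles = []
    unit_free_at = 0  # first cycle in which the execute stage is free
    for arrival, latency in enumerate(latencies):
        # Instruction i is ready to enter execute at cycle i, but it must
        # wait (stall) until the previous instruction has left the stage.
        start = max(arrival, unit_free_at)
        stall_cycles.append(start - arrival)
        unit_free_at = start + latency
        finish_cycles.append(unit_free_at)
    # Throughput: instructions processed per cycle over the whole run.
    throughput = len(latencies) / finish_cycles[-1]
    return finish_cycles, stall_cycles, throughput


# An add, a 4-cycle multiply (assumed latency), then two more adds.
finish, stalls, ipc = simulate_execute_stage([1, 4, 1, 1])
print(finish, stalls, round(ipc, 3))
```

In this run the two instructions behind the multiply each stall for three cycles, so four instructions take seven cycles instead of four.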

6.2 Instruction Timing Overview

The 750GX design minimizes average instruction execution latency, the number of clock cycles it takes to fetch, decode, dispatch, and execute instructions and make the results available for a subsequent instruction. Some instructions, such as loads and stores, access memory and require additional clock cycles between the execute phase and the write-back phase. These latencies vary depending on whether the access is to cacheable or noncacheable memory, whether it hits in the L1 or L2 cache, whether the cache access generates a write-back to memory, whether the access causes a snoop hit from another device that generates additional activity, and other conditions that affect memory accesses.
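
As a rough illustration of how load latency depends on where an access hits, the sketch below computes an expected latency for a two-level cache hierarchy. The function name, hit rates, and cycle counts are placeholders chosen for the arithmetic, not measured or documented 750GX values. The model ignores write-backs, snoop activity, and noncacheable accesses.

```python
def average_load_latency(l1_hit_rate, l2_hit_rate,
                         l1_cycles, l2_cycles, memory_cycles):
    """Expected cycles per load for a two-level cache hierarchy.

    l2_hit_rate is the fraction of L1 misses that hit in the L2 cache.
    All inputs are hypothetical; substitute figures for the real system.
    """
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    # Cost of an L1 miss: either an L2 hit or a trip to main memory.
    miss_cost = l2_hit_rate * l2_cycles + l2_miss_rate * memory_cycles
    return l1_hit_rate * l1_cycles + l1_miss_rate * miss_cost


# Placeholder numbers: 90% L1 hits, half of the misses hit in L2.
print(average_load_latency(0.9, 0.5, 2, 10, 50))
```

Even with a high L1 hit rate in this example, the few accesses that reach memory account for a large share of the average, which is why the conditions listed above matter for timing.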

The 750GX implements many features to improve throughput, such as pipelining, superscalar instruction issue, branch folding, two-level speculative branch handling, two types of branch prediction, and multiple execution units that operate independently and in parallel.

As an instruction passes from stage to stage in a pipelined system, multiple instructions are in various stages of execution at any given time. Also, with multiple execution units operating in parallel, more than one instruction can be completed in a single cycle.
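
The effect of overlapping stages can be sketched with the standard ideal-pipeline cycle count. This sketch assumes every stage takes exactly one cycle and nothing stalls. The function names, the stage count, and the completion width are illustrative parameters, not 750GX specifications.

```python
import math


def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through every stage before the next begins."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages, width=1):
    """Ideal pipeline: fill the stages once, then complete up to `width`
    instructions per cycle (width > 1 models parallel execution units)."""
    if n_instructions == 0:
        return 0
    return n_stages + math.ceil(n_instructions / width) - 1


# Ten instructions through an assumed four-stage pipeline.
print(cycles_unpipelined(10, 4))         # 40
print(cycles_pipelined(10, 4))           # 13
print(cycles_pipelined(10, 4, width=2))  # 8
```

Real instruction streams fall short of these ideal counts because of multi-cycle instructions, data dependencies, and memory latency.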

The 750GX contains the following execution units that operate independently and in parallel:

•Branch processing unit (BPU)

•Integer unit 1 (IU1)—executes all integer instructions

•Integer unit 2 (IU2)—executes all integer instructions except multiplies and divides

gx_06.fm.(1.2)	Instruction Timing
March 27, 2006	Page 211 of 377