User’s Manual
IBM PowerPC 750GX and 750GL RISC Microprocessor
For a more detailed discussion of instruction completion, see Section 6.6.1, Branch, Dispatch, and Completion-Unit Resource Requirements, on page 237.
1.2.2 Independent Execution Units
In addition to the BPU, the 750GX has the following five execution units:
• Two integer units (IUs)
• Floating-point unit (FPU)
• Load/store unit (LSU)
• System register unit (SRU)
1.2.2.1 Integer Units (IUs)
The integer units, IU1 and IU2, are shown in Figure 1-1 on page 25. IU1 can execute any integer instruction; IU2 can execute any integer instruction except multiplication and division instructions. Each IU has a single-entry reservation station that can receive instructions from the dispatch unit and operands from the GPRs or the rename buffers. The output of the IU is latched in the rename buffer assigned to the instruction by the dispatch unit.
Each IU consists of three single-cycle subunits—a fast adder/comparator, a subunit for logical operations, and a subunit for performing rotates, shifts, and count-leading-zero operations. These subunits handle all 1-cycle arithmetic and logical integer instructions; only one subunit can execute an instruction at a time.
The IU1 has a 32-bit integer multiplier/divider, as well as the adder, shift, and logical units of the IU2. The multiplier supports early exit for operations that do not require full 32 × 32-bit multiplication. Multiply and divide instructions spend several cycles in the execution stage before the results are written to the output rename buffer.
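The division of work between the two integer units can be summarized in a short sketch. This is an illustrative Python model only, not IBM code: the function name `eligible_units` and the instruction-class labels are hypothetical, and the model ignores reservation-station occupancy and all dispatch rules.

```python
# Illustrative model only; the class labels and function name are hypothetical.

# Classes handled by the single-cycle subunits that both IU1 and IU2 contain.
SINGLE_CYCLE_CLASSES = {
    "add", "compare", "logical", "rotate", "shift", "count_leading_zeros",
}
# Classes that need the multiplier/divider, which only IU1 contains.
MULTI_CYCLE_CLASSES = {"multiply", "divide"}


def eligible_units(instr_class):
    """Return the integer units able to execute the given instruction class."""
    if instr_class in MULTI_CYCLE_CLASSES:
        return ["IU1"]
    if instr_class in SINGLE_CYCLE_CLASSES:
        return ["IU1", "IU2"]
    raise ValueError("not an integer instruction class: %r" % (instr_class,))
```

For example, `eligible_units("divide")` yields only IU1, while `eligible_units("shift")` yields both units.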
1.2.2.2 Floating-Point Unit (FPU)
The FPU, shown in Figure 1-1 on page 25, is designed as a 3-stage pipelined processing unit, where the first stage is for multiply, the second stage is for add, and the third stage is for normalize. A single-precision multiply/add operation is processed with 1-cycle throughput and 3-cycle latency. (A single-precision instruction spends one cycle in each stage of the FPU.) A double-precision multiply requires two cycles in the multiply stage and one cycle in each additional stage. A double-precision multiply/add has a 2-cycle throughput and a 4-cycle latency.
As instructions are dispatched to the FPU reservation station, source operand data can be accessed from the FPRs or from the FPR rename buffers. Results, in turn, are written to the rename buffers and are made available to subsequent instructions. Instructions pass through the reservation station and the pipeline stages in program order. Stalls due to contention for FPRs are minimized by automatic allocation of the six floating-point rename buffers. The completion unit writes the contents of the rename buffer to the appropriate FPR when floating-point instructions are retired.
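The latency and throughput figures above can be turned into a rough cycle estimate for a run of multiply/add operations. The sketch below is a simplified Python model, not a cycle-accurate simulator: it assumes an initially empty pipeline, operations that are independent of one another, a single precision throughout, and no stalls. The names `FPU_TIMING` and `fpu_cycles` are hypothetical.

```python
# Latency and throughput, in cycles, for multiply/add operations as stated in the text.
FPU_TIMING = {
    "single": {"latency": 3, "throughput": 1},
    "double": {"latency": 4, "throughput": 2},
}


def fpu_cycles(n_ops, precision):
    """Estimate cycles to finish n_ops independent, back-to-back multiply/add operations.

    Assumes an empty pipeline at the start, no stalls, and one precision throughout.
    """
    if n_ops <= 0:
        return 0
    timing = FPU_TIMING[precision]
    # The first result appears after the full latency; each later result
    # follows one throughput interval after the previous one.
    return timing["latency"] + (n_ops - 1) * timing["throughput"]
```

Under these assumptions, ten single-precision operations take 3 + 9 × 1 = 12 cycles, and ten double-precision operations take 4 + 9 × 2 = 22 cycles.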
The 750GX supports all IEEE 754-1985 floating-point data types (normalized, denormalized, not a number (NaN), zero, and infinity) in hardware, eliminating the latency incurred by software exception routines. (Note that “exception” is also referred to as “interrupt” in the architecture specification.)
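The five data types listed above are distinguished by the exponent and fraction fields of the IEEE 754 encoding. The following Python sketch classifies a double-precision value by inspecting those fields. It illustrates the encoding only and says nothing about how the 750GX hardware performs the check; the function name `classify_double` is hypothetical, and the code assumes that Python floats are IEEE 754 doubles, as they are on common platforms.

```python
import struct


def classify_double(x):
    """Classify a Python float (assumed IEEE 754 double) by its bit fields."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # 52-bit fraction
    if exponent == 0x7FF:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        # All-zeros exponent: zero if the fraction is zero, otherwise denormalized.
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"
```

The sign bit is ignored, so positive and negative zero are both reported as zero.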
gx_01.fm.(1.2) | PowerPC 750GX Overview |
March 27, 2006 | Page 31 of 377 |