21264/EV68A Microarchitecture

Figure 2–6 Integer Execution Unit—Clusters 0 and 1

[Block diagram. Labeled blocks: U0, L0, U1, L1, and two Register blocks. Labeled signals: iop_wr, Load/Store Data, eff_VA. Drawing ID: FM-05643.AI4]

Most instructions have 1-cycle latency for consumers that execute within the same cluster. Producing a value in one cluster and consuming it in the other cluster adds a further 1-cycle delay. The instruction issue queue minimizes the performance effect of this cross-cluster delay.

The Ebox contains the following resources:

Four 64-bit adders that are used to calculate results for integer add instructions (located in U0, U1, L0, and L1)

The adders in the lower subclusters that are used to generate the effective virtual address for load and store instructions (located in L0 and L1)

Four logic units

Two barrel shifters and associated byte logic (located in U0 and U1)

Two sets of conditional branch logic (located in U0 and U1)

Two copies of an 80-entry register file

One pipelined multiplier (located in U1) with 7-cycle latency for all integer multiply operations

One fully-pipelined unit (located in U0), with 3-cycle latency, that executes the following instructions:

CTLZ, CTPOP, CTTZ

PERR, MINxxx, MAXxxx, UNPKxx, PKxx
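The resource list and the latency rules above can be restated as a small lookup model. The following Python sketch is illustrative only and is not part of the hardware specification. All names in it (CLUSTER_OF, RESOURCES, RESULT_LATENCY, eligible_subclusters, consumer_latency) are hypothetical. It assumes that cluster 0 holds U0 and L0 and cluster 1 holds U1 and L1, that integer add and shift fall under "most instructions" with 1-cycle latency, and that the cross-cluster delay is also added to the multi-cycle units. The four logic units are left out because the list does not state their location, and load-to-use latency is not modeled.

```python
# Hypothetical model of the Ebox resources listed above (a sketch, not a spec).

# Assumption: cluster 0 = {U0, L0}, cluster 1 = {U1, L1}.
CLUSTER_OF = {"U0": 0, "L0": 0, "U1": 1, "L1": 1}

# Subclusters that hold each resource, taken from the list above.
RESOURCES = {
    "add": {"U0", "U1", "L0", "L1"},   # four 64-bit adders
    "address": {"L0", "L1"},           # effective virtual address generation
    "shift": {"U0", "U1"},             # barrel shifters and byte logic
    "branch": {"U0", "U1"},            # conditional branch logic
    "multiply": {"U1"},                # pipelined multiplier
    "count_mvi": {"U0"},               # CTLZ, CTPOP, CTTZ, PERR, MINxxx, ...
}

# Same-cluster result latency in cycles for value-producing classes.
# Assumption: add and shift are among the "most instructions" with 1 cycle.
RESULT_LATENCY = {"add": 1, "shift": 1, "multiply": 7, "count_mvi": 3}

# Extra delay when the consumer executes in the other cluster.
CROSS_CLUSTER_DELAY = 1


def eligible_subclusters(op_class):
    """Return the sorted subclusters that can execute op_class."""
    return sorted(RESOURCES[op_class])


def consumer_latency(op_class, producer, consumer):
    """Cycles until a consumer subcluster can use the producer's result."""
    if op_class not in RESULT_LATENCY:
        raise ValueError(f"no result latency modeled for {op_class!r}")
    if producer not in RESOURCES[op_class]:
        raise ValueError(f"{producer} cannot execute {op_class!r}")
    crosses = CLUSTER_OF[producer] != CLUSTER_OF[consumer]
    return RESULT_LATENCY[op_class] + (CROSS_CLUSTER_DELAY if crosses else 0)
```

Under these assumptions, consumer_latency("add", "U0", "L0") gives 1 cycle, while consumer_latency("add", "U0", "L1") gives 2 cycles because the value crosses from cluster 0 to cluster 1.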

21264/EV68A Hardware Reference Manual

Internal Architecture 2–9
