21264/EV68A Hardware Reference Manual
Internal Architecture 2–9
21264/EV68A Microarchitecture
Figure 2–6 Integer Execution Unit—Clusters 0 and 1
Most instructions have 1-cycle latency for consumers that execute within the same
cluster. Also, there is another 1-cycle delay associated with producing a value in one cluster
and consuming the value in the other cluster. The instruction issue queue minimizes the
performance effect of this cross-cluster delay. The Ebox contains the following
resources:
Four 64-bit adders that are used to calculate results for integer add instructions (located in U0, U1, L0, and L1)
The adders in the lower subclusters that are used to generate the effective virtual address for load and store instructions (located in L0 and L1)
Four logic units
Two barrel shifters and associated byte logic (located in U0 and U1)
Two sets of conditional branch logic (located in U0 and U1)
Two copies of an 80-entry register file
One pipelined multiplier (located in U1) with 7-cycle latency for all integer multiply operations
One fully-pipelined unit (located in U0), with 3-cycle latency, that executes the following instructions:
CTLZ, CTPOP, CTTZ
PERR, MINxxx, MAXxxx, UNPKxx, PKxx
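The resource list above can be read as a mapping from kinds of operation to the subclusters that are able to execute them. The following is a minimal sketch of that reading, not part of the manual: the dictionary keys and the function name `eligible_subclusters` are hypothetical labels chosen for illustration, and resources whose location is not stated in the list (the four logic units and the register file copies) are left out.

```python
# Illustrative lookup built only from the locations stated in the resource list.
# Keys are hypothetical labels for classes of operation, not terms from the manual.
EBOX_RESOURCES = {
    "integer_add": {"U0", "U1", "L0", "L1"},   # four 64-bit adders
    "load_store_address": {"L0", "L1"},        # lower-subcluster adders
    "shift_and_byte": {"U0", "U1"},            # two barrel shifters and byte logic
    "conditional_branch": {"U0", "U1"},        # two sets of branch logic
    "integer_multiply": {"U1"},                # one pipelined multiplier
    # CTLZ, CTPOP, CTTZ, PERR, MINxxx, MAXxxx, UNPKxx, PKxx
    "count_and_multimedia": {"U0"},            # one fully-pipelined unit
}

# Latencies that the list states explicitly, in cycles.
STATED_LATENCY = {
    "integer_multiply": 7,
    "count_and_multimedia": 3,
}


def eligible_subclusters(op_class):
    """Return the subclusters that can execute op_class, in sorted order."""
    return sorted(EBOX_RESOURCES[op_class])


if __name__ == "__main__":
    for op_class in EBOX_RESOURCES:
        print(op_class, eligible_subclusters(op_class))
```

Under this reading, an integer multiply can only be placed in U1, while an integer add can be placed in any of the four subclusters.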
[Figure 2–6 diagram. Blocks: L0, U0, L1, U1, and two Register blocks. Labeled signals: Load/Store Data, iop_wr, eff_VA. Art ID: FM-05643.AI4]
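The cross-cluster delay described in this section can be illustrated with a small timing model. This is a sketch under stated assumptions rather than a description of the hardware: the function name `result_ready_cycle` is hypothetical, the pairing of L0 and U0 with cluster 0 and of L1 and U1 with cluster 1 is inferred from the subcluster names in Figure 2–6, and the model ignores every other source of delay.

```python
# Assumed mapping of subclusters to clusters, inferred from the figure's naming.
CLUSTER_OF = {"L0": 0, "U0": 0, "L1": 1, "U1": 1}

CROSS_CLUSTER_DELAY = 1  # extra cycle when producer and consumer clusters differ


def result_ready_cycle(issue_cycle, producer, consumer, latency=1):
    """Earliest cycle at which a consumer can use the producer's result.

    issue_cycle: cycle in which the producing instruction issues
    producer, consumer: subcluster names ("L0", "U0", "L1", "U1")
    latency: producer latency in cycles (1 for most instructions)
    """
    same_cluster = CLUSTER_OF[producer] == CLUSTER_OF[consumer]
    extra = 0 if same_cluster else CROSS_CLUSTER_DELAY
    return issue_cycle + latency + extra


if __name__ == "__main__":
    # Same cluster: only the 1-cycle latency applies.
    print(result_ready_cycle(10, "U0", "L0"))
    # Different clusters: one additional cycle.
    print(result_ready_cycle(10, "U0", "L1"))
```

In this model a dependent instruction in the same cluster can use the result one cycle after issue, while a dependent instruction in the other cluster waits one more cycle, which is the cost the instruction issue queue tries to minimize.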