For example, the following fetch line:

    ADD CMOVx SUB CMOVy

results in the following three map cycles:

    ADD CMOVx1
    CMOVx2 SUB CMOVy1
    CMOVy2

Special Cases of Alpha Instruction Execution

If instruction 1 is dependent on the load instruction data and the load instruction hits, instruction 1 is removed from the queue one cycle later (at the start of cycle 8). If the load instruction misses, then instruction 1 is aborted from the Fbox pipeline and may request service again in cycle 7.

2.7.2 Floating-Point Store Instructions

Floating-point store instructions are duplicated and loaded into both the IQ and the FQ from the mapper. Each IQ entry contains a control bit, fpWait, that when set prevents that entry from asserting its requests. This bit is initially set for each floating-point store instruction that enters the IQ, unless it was the target of a replay trap. The instruction’s FQ clone is issued when its Ra register is about to become clean, resulting in its IQ clone’s fpWait bit being cleared and allowing the IQ clone to issue and be executed by the Mbox. This mechanism ensures that floating-point store instructions are always issued to the Mbox, along with the associated data, without requiring the floating-point register dirty bits to be available within the IQ.
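The fpWait handshake described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware logic: the names `IQEntry`, `enqueue_fp_store`, and `try_issue_fq_clone` are hypothetical, and the "Ra is about to become clean" condition is reduced to a boolean supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class IQEntry:
    """Hypothetical model of the IQ clone of a floating-point store."""
    name: str
    fp_wait: bool  # while set, the entry may not assert its requests

    def can_request(self) -> bool:
        return not self.fp_wait


def enqueue_fp_store(name: str, replay_target: bool = False) -> IQEntry:
    # fpWait starts out set unless the store was the target of a replay trap.
    return IQEntry(name, fp_wait=not replay_target)


def try_issue_fq_clone(iq_clone: IQEntry, ra_nearly_clean: bool) -> bool:
    # Assumption: the FQ clone issues only once its Ra register is about
    # to become clean; issuing it clears fpWait on the IQ clone.
    if not ra_nearly_clean:
        return False
    iq_clone.fp_wait = False
    return True


store = enqueue_fp_store("STT")
blocked_before = store.can_request()          # IQ clone is held back
try_issue_fq_clone(store, ra_nearly_clean=True)
ready_after = store.can_request()             # IQ clone may now issue to the Mbox
```

In this model the IQ never inspects a floating-point dirty bit directly; it only sees the single fpWait bit, which is the point of the mechanism.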

2.7.3 CMOV Instruction

For the 21264/EV68A, the Alpha CMOV instruction has three operands, and so presents a special case. The required operation is to move either the value in register Rb or the value from the old physical destination register into the new destination register, based upon the value in Ra. Since neither the mapper nor the Ebox and Fbox data paths are otherwise required to handle three-operand instructions, the CMOV instruction is decomposed by the Ibox pipeline into two 2-operand instructions:

The Alpha architecture instruction      CMOV Ra, Rb ⇒ Rc
Becomes the 21264/EV68A instructions    CMOV1 Ra, oldRc ⇒ newRc1
                                        CMOV2 newRc1, Rb ⇒ newRc2

The first instruction, CMOV1, tests the value of Ra and records the result of this test in a 65th bit of its destination register, newRc1. It also copies the value of the old physical destination register, oldRc, to newRc1.

The second instruction, CMOV2, then copies either the value in newRc1 or the value in Rb into a second physical destination register, newRc2, based on the CMOV predicate bit stored in newRc1.

In summary, the original CMOV instruction is decomposed into two dependent instructions that each use a physical register from the free list.
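As a rough illustration of why the two-instruction form is equivalent to the original three-operand CMOV, the following Python sketch models a physical register as a 64-bit value plus one extra predicate bit. The names `PhysReg`, `cmov1`, `cmov2`, and `cmov_reference` are hypothetical, and the polarity of the predicate bit (set when the test on Ra succeeds) is an assumption made for the example.

```python
from typing import Callable, NamedTuple

MASK64 = (1 << 64) - 1


class PhysReg(NamedTuple):
    value: int          # 64-bit data
    pred: bool = False  # the extra "65th" bit holding the CMOV test result


def cmov1(ra: int, old_rc: int, test: Callable[[int], bool]) -> PhysReg:
    # Tests Ra, records the outcome in the predicate bit, and copies oldRc.
    return PhysReg(old_rc & MASK64, test(ra))


def cmov2(new_rc1: PhysReg, rb: int) -> PhysReg:
    # Selects Rb or the value carried in newRc1, based on the predicate bit.
    chosen = rb if new_rc1.pred else new_rc1.value
    return PhysReg(chosen & MASK64)


def cmov_reference(ra: int, rb: int, old_rc: int,
                   test: Callable[[int], bool]) -> int:
    # Architectural view: Rc receives Rb if the test holds, else keeps its old value.
    return (rb if test(ra) else old_rc) & MASK64


def is_zero(x: int) -> bool:
    # A CMOVEQ-style condition: move when Ra equals zero.
    return x == 0


moved = cmov2(cmov1(0, 111, is_zero), 222).value   # test holds, Rb is selected
kept = cmov2(cmov1(5, 111, is_zero), 222).value    # test fails, old Rc is kept
```

Each half reads at most two source registers and writes one destination, which is what lets the mapper and data paths treat them as ordinary 2-operand instructions.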

To further simplify this operation, the two component instructions of a CMOV instruction are driven through the mappers in successive cycles. Hence, if a fetch line contains n CMOV instructions, it takes n+1 cycles to run that fetch line through the mappers.
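The n+1 rule can be reproduced with a small Python sketch. It assumes a simple splitting policy inferred from the text: the first half of a CMOV ends the current map cycle and the second half opens the next one. The function name `map_cycles` and the convention of recognizing CMOVs by their mnemonic prefix are hypothetical.

```python
def map_cycles(fetch_line):
    """Split a fetch line into mapper cycles (illustrative model only)."""
    cycles = [[]]
    for instr in fetch_line:
        if instr.startswith("CMOV"):
            cycles[-1].append(instr + "1")  # first half closes the current cycle
            cycles.append([instr + "2"])    # second half opens the next cycle
        else:
            cycles[-1].append(instr)
    return cycles


for cycle in map_cycles(["ADD", "CMOVx", "SUB", "CMOVy"]):
    print(" ".join(cycle))
```

For the fetch line ADD CMOVx SUB CMOVy, which contains two CMOV instructions, the sketch produces three cycles: ADD CMOVx1, then CMOVx2 SUB CMOVy1, then CMOVy2.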

