Compaq ECQD2KCTE manual Byte Swap, Stylized Code Forms, 4.1 NOP

Models: ECQD2KCTE


16-bit quotient digit plus a 48-bit new partial dividend. Three more such steps can generate the full quotient. Having prior knowledge of the possible sizes of the divisor and dividend, normalizing away leading bytes of zeros, and performing an early-out test can reduce the average number of multiplies to about five (compared to a best case of one and a worst case of nine).

A.4.3 Byte Swap

When it is necessary to swap all the bytes of a datum, perhaps because the datum originated on a machine of the opposite byte numbering convention, the simplest sequence is to use the VAX floating-point load instruction to swap words, followed by an integer sequence to swap four pairs of bytes. Assume as shown below that an aligned quadword datum is in memory at location X and is to be left in R1 after byte-swapping; temp is an aligned quadword temporary, and "." (period) in the comments stands for a byte of zeros. Similar sequences can be used for data in registers, sometimes doing the byte swaps first and word swap second:

                            ; X  = ABCD EFGH
    LDG   F0,X              ; F0 = GHEF CDAB
    STT   F0,temp
    LDQ   R1,temp           ; R1 = GHEF CDAB
    SLL   R1,#8,R2          ; R2 = HEFC DAB.
    SRL   R1,#8,R1          ; R1 = .GHE FCDA
    ZAP   R2,#55(hex),R2    ; R2 = H.F. D.B.
    ZAP   R1,#AA(hex),R1    ; R1 = .G.E .C.A
    OR    R1,R2,R1          ; R1 = HGFE DCBA

For bulk swapping of arrays, this sequence can be usefully unrolled about four times and scheduled, using four different aligned quadword memory temps.
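The data flow of the byte-swap sequence can be followed in a rough C model. This is only a sketch, not Alpha code: the helper names `zap`, `swap_words`, and `byte_swap64` are hypothetical, `swap_words` stands in for the word reordering that the LDG/STT/LDQ steps perform, and `zap` is assumed to clear byte i of its operand when bit i of the mask is set, which is what the register comments in the listing imply.

```c
#include <stdint.h>

/* Model of ZAP: clear byte i of v for every bit i set in mask
 * (byte 0 is the least significant byte). */
static uint64_t zap(uint64_t v, unsigned mask)
{
    for (int i = 0; i < 8; i++) {
        if (mask & (1u << i)) {
            v &= ~(0xFFull << (8 * i));
        }
    }
    return v;
}

/* Stand-in for LDG + STT + LDQ: reverse the order of the four
 * 16-bit words, so ABCD EFGH becomes GHEF CDAB. */
static uint64_t swap_words(uint64_t v)
{
    return (v << 48)
         | ((v & 0x00000000FFFF0000ull) << 16)
         | ((v >> 16) & 0x00000000FFFF0000ull)
         | (v >> 48);
}

/* Full byte reversal, following the listing step by step. */
static uint64_t byte_swap64(uint64_t x)
{
    uint64_t r1 = swap_words(x);   /* R1 = GHEF CDAB */
    uint64_t r2 = r1 << 8;         /* R2 = HEFC DAB. */
    r1 = r1 >> 8;                  /* R1 = .GHE FCDA */
    r2 = zap(r2, 0x55);            /* R2 = H.F. D.B. */
    r1 = zap(r1, 0xAA);            /* R1 = .G.E .C.A */
    return r1 | r2;                /* R1 = HGFE DCBA */
}
```

With A through H taken as the bytes 01 through 08, `byte_swap64(0x0102030405060708)` yields `0x0807060504030201`, matching the final comment in the listing.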

A.4.4 Stylized Code Forms

Using the same stylized code form for a common operation improves the readability of compiler output and increases the likelihood that an implementation will speed up the stylized form.

A.4.4.1 NOP

The universal NOP form is:

    UNOP  ==  LDQ_U R31,0(Rx)

In most implementations, UNOP should encounter no operand issue delays, no destination issue delay, and no functional unit issue delays. (In some implementations, it may encounter an operand issue delay for Rx.) Implementations are free to optimize UNOP into no action and zero execution cycles.

If the actual instruction is encoded as LDQ_U Rn,0(Rx), where n is other than 31, and such an instruction generates a memory-management exception, it is UNPREDICTABLE whether UNOP would generate the same exception. On most implementations, UNOP does not generate memory-management exceptions.
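As an illustration of what the stylized form looks like as a machine word, the following C sketch assembles the 32-bit encoding of LDQ_U R31,0(Rx). It rests on assumptions not stated in this section: the Alpha memory-instruction format (6-bit opcode in bits 31:26, Ra in bits 25:21, Rb in bits 20:16, 16-bit displacement in bits 15:0) and an LDQ_U opcode of 0B (hex). The function name `encode_unop` is hypothetical.

```c
#include <stdint.h>

/* Assumed LDQ_U opcode from the Alpha memory-instruction format. */
#define OPCODE_LDQ_U 0x0Bu

/* Build the instruction word for LDQ_U R31,0(Rx).
 * rx is the base register number; only the low 5 bits are used. */
static uint32_t encode_unop(unsigned rx)
{
    uint32_t ra   = 31u;            /* destination R31: result is discarded */
    uint32_t rb   = rx & 0x1Fu;     /* base register Rx */
    uint32_t disp = 0u;             /* zero displacement */
    return (OPCODE_LDQ_U << 26) | (ra << 21) | (rb << 16) | disp;
}
```

Under these assumptions, choosing R31 as the base register as well gives the word `0x2FFF0000`.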

Software Considerations A–11
