Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

3.4 32-Bit Legacy GPRs and Small Unsigned Integers

Optimization

Use the 32-bit legacy general-purpose registers (EAX through ESI) instead of their 64-bit extensions to store unsigned integer values whose range never requires more than 32 bits, even if subsequent statements use the 32-bit value in a 64-bit operation. (For example, use ECX instead of RCX until you need to perform a 64-bit operation; then use RCX.)

Rationale

In 64-bit mode, the machine-language representation of many instructions that operate on 64-bit register operands requires a REX prefix byte, which increases the size of the code. However, instructions that operate on a 32-bit legacy register operand do not require the prefix and have the desirable side-effect of clearing the upper 32 bits of the extended register to zero. For example, using the AND instruction on ECX clears the upper half of RCX.

Caution

Because the assembler also uses a REX prefix byte to encode the 32-bit sizes of the eight new 64-bit general-purpose registers (R8D–R15D), you should only use one of the original eight general-purpose registers (EAX through ESI) to implement this technique.

Example

The following example illustrates the unnecessary use of 64-bit registers to calculate the number of bytes remaining to be copied by an aligned block-copy routine after copying the first few bytes having addresses not meeting the routine’s 8-byte-alignment requirements. The first two statements, after the program comments, use the 64-bit R10 register (presumably because this value is later used to adjust a 64-bit value in R8), even though the value left in R10 takes no more than three bits to represent: the AND with 7 leaves a result from 0 through 7. Using R10 instead of a smaller register requires a REX prefix byte (in this case, 49), which increases the size of the machine-language code.

;Input:
;R10 = source address (src)
;R8 = number of bytes to copy (count)

49 F7 DA       neg r10        ; Subtract the source address from 2^64.
49 83 E2 07    and r10, 7     ; Determine how many bytes were copied separately.
4D 2B C2       sub r8, r10    ; Subtract the number of bytes already copied from
                              ; the number of bytes to copy.
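
The arithmetic behind the listing can be checked at the source level. The following C sketch is an illustration only, not code from this guide: the function names remaining_64 and remaining_32 are hypothetical, and the 32-bit variant assumes that only the low three bits of the negated address matter, so truncating the address to 32 bits before negating does not change the result. It mirrors the NEG, AND, and SUB steps and shows that keeping the small count in a 32-bit unsigned value gives the same answer once it is zero-extended for the final 64-bit subtraction.

```c
#include <stdint.h>

/* Mirrors the listing: neg r10 / and r10, 7 / sub r8, r10.
   All three steps are carried out on 64-bit values. */
uint64_t remaining_64(uint64_t src, uint64_t count)
{
    uint64_t head = (0 - src) & 7;   /* bytes copied separately, 0..7 */
    return count - head;
}

/* Hypothetical 32-bit variant: the small value lives in a 32-bit
   unsigned integer and is zero-extended only for the final
   64-bit subtraction. */
uint64_t remaining_32(uint64_t src, uint64_t count)
{
    uint32_t head = (0u - (uint32_t)src) & 7u;   /* same low 3 bits */
    return count - head;   /* head converts to 64 bits by zero-extension */
}
```

For example, with a source address of 0x1003 and a count of 100, both functions return 95, because five bytes are needed to reach the next 8-byte boundary. Whether a compiler actually emits 32-bit legacy-register forms for the second function depends on the compiler and its register allocation.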

 


General 64-Bit Optimizations

Chapter 3
