68 General 64-Bit Optimizations Chapter 3
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
3.4 32-Bit Legacy GPRs and Small Unsigned Integers
Optimization
Use the 32-bit legacy general-purpose registers (EAX through EDI) instead of their 64-bit extensions
to store unsigned integer values whose range never requires more than 32 bits, even if subsequent
statements use the 32-bit value in a 64-bit operation. (For example, use ECX instead of RCX until you
need to perform a 64-bit operation; then use RCX.)
Rationale
In 64-bit mode, the machine-language representation of many instructions that operate on 64-bit
register operands requires a REX prefix byte, which adds one byte to the encoding of each such
instruction. Instructions that operate on a 32-bit legacy register operand do not require the prefix,
and they have the desirable side effect of zeroing the upper 32 bits of the extended register. For
example, using the AND instruction on ECX clears the upper half of RCX.
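The two effects described above can be sketched in C. This is only a model under stated assumptions: and_r32_imm is a hypothetical helper (not an AMD API) that mimics what a 32-bit AND leaves in the full 64-bit register, and the two byte arrays hold the standard encodings of "and ecx, 7" and "and rcx, 7" so that their sizes can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not an AMD API: the value left in a 64-bit register
   after a 32-bit AND with an immediate writes its low half. */
static uint64_t and_r32_imm(uint64_t reg64, uint32_t imm)
{
    uint32_t low = (uint32_t)reg64 & imm;  /* operate on bits 31:0 only */
    return (uint64_t)low;                  /* bits 63:32 come back as zero */
}

/* Machine-code bytes for the 32-bit and 64-bit forms of the same AND. */
static const uint8_t and_ecx_7[] = { 0x83, 0xE1, 0x07 };        /* and ecx, 7 */
static const uint8_t and_rcx_7[] = { 0x48, 0x83, 0xE1, 0x07 };  /* and rcx, 7; 48h is the REX prefix */

/* The 64-bit form is exactly one byte (the REX prefix) longer. */
static_assert(sizeof and_rcx_7 == sizeof and_ecx_7 + 1,
              "REX prefix adds one byte");
```

Calling and_r32_imm with a register value whose upper half is all ones shows that the upper half does not survive the 32-bit operation, which is why the result can be used directly in a later 64-bit instruction.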

Caution

Because encoding the 32-bit forms of the eight new general-purpose registers (R8D–R15D) also
requires a REX prefix byte, use only the original eight general-purpose registers (EAX through EDI)
to implement this technique.
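The rule in this caution can be written as a small predicate. The sketch below is an illustration, not part of any assembler: needs_rex is a hypothetical helper, registers are numbered 0 through 15 in encoding order (0 through 7 for the legacy registers, 8 through 15 for R8 through R15), and it assumes an instruction whose default operand size is 32 bits and that does not use the byte registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does a register operand force a REX prefix?
   reg          : register number 0-15 (0-7 legacy, 8-15 are R8-R15)
   operand_bits : 32 or 64
   Assumes an instruction with a 32-bit default operand size. */
static bool needs_rex(unsigned reg, unsigned operand_bits)
{
    assert(reg < 16);
    assert(operand_bits == 32 || operand_bits == 64);

    bool wants_64bit  = (operand_bits == 64); /* REX.W selects 64-bit size   */
    bool extended_reg = (reg >= 8);           /* REX.R/X/B reach R8-R15      */
    return wants_64bit || extended_reg;
}
```

Under these assumptions, needs_rex(1, 32) (ECX) is false, while needs_rex(1, 64) (RCX), needs_rex(10, 32) (R10D), and needs_rex(10, 64) (R10) are all true, so only a 32-bit legacy register avoids the extra byte.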
Example
The following example illustrates the unnecessary use of 64-bit registers. An aligned block-copy
routine first copies, one at a time, the leading bytes whose addresses do not meet its 8-byte-alignment
requirement; the code then calculates the number of bytes that remain to be copied. The first two
statements after the program comments use the 64-bit R10 register (presumably because this value is
later used to adjust a 64-bit value in R8), even though the final value in R10 ranges from 0 through 7
and takes no more than three bits to represent. Using R10 instead of a 32-bit legacy register requires
a REX prefix byte (in this case, 49h), which increases the size of the machine-language code.
; Input:
;   R10 = source address (src)
;   R8  = number of bytes to copy (count)
49 F7 DA       neg  r10       ; Subtract the source address from 2^64.
49 83 E2 07    and  r10, 7    ; Determine how many bytes were copied separately.
4D 2B C2       sub  r8, r10   ; Subtract the number of bytes already copied from
                              ; the number of bytes to copy.
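The arithmetic in this listing can be expressed in C with a 32-bit intermediate, which shows that nothing is lost by keeping the small value in a 32-bit variable until the 64-bit subtraction. This is a sketch only: bytes_after_head is a hypothetical name, and whether a compiler actually chooses 32-bit register forms for it is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the listing: returns how many bytes remain
   after the unaligned head bytes have been copied separately. */
static uint64_t bytes_after_head(uint64_t src, uint64_t count)
{
    /* Negate and mask using only the low 32 bits of the address.
       The result is always in the range 0..7. */
    uint32_t head = (0u - (uint32_t)src) & 7u;

    assert(head <= count);  /* the caller must have at least 'head' bytes */

    /* 'head' is zero-extended to 64 bits for the final subtraction. */
    return count - head;
}
```

For a source address of 0x1003 and a count of 100, head is 5 (the distance to the next 8-byte boundary at 0x1008), so the function returns 95. The low three bits of the negated address are the same whether the negation is done in 32 or 64 bits, which is why the narrower intermediate is safe here.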