Chapter 3 General 64-Bit Optimizations 69
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
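To improve code density, the following rewritten code uses the 32-bit register ECX until the full 64-bit RCX is actually needed. This eliminates two REX prefix bytes, because the 64-bit forms NEG RCX and AND RCX, 7 each require a REX.W prefix and the 32-bit forms do not: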
F7 D9 neg ecx ; Subtract the source address from 2^32 (the processor
; clears the high 32 bits of RCX).
83 E1 07 and ecx, 7 ; Determine how many bytes were copied separately.
4C 2B C1 sub r8, rcx ; Subtract the number of bytes already copied from
; the number of bytes to copy.
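The listing is safe because AND ECX, 7 keeps only the low three bits of the negated address. Those bits are the same whether the negation is done on 32 or 64 bits, and writing ECX zero-extends the result into RCX before SUB R8, RCX uses it. The C sketch below models that arithmetic and is not the guide's copy routine. The helper names head_bytes_32, head_bytes_64 and remaining_after_head are hypothetical. As in the listing's comments, the sketch assumes that ECX/RCX starts out holding the source address, that R8 holds the byte count, and that the alignment unit is 8 bytes, as implied by the mask of 7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring "neg ecx / and ecx, 7".
 * uint32_t arithmetic wraps modulo 2^32, like NEG on a 32-bit register.
 * Returning the value as uint64_t zero-extends it, matching how the
 * processor clears the high 32 bits of RCX when ECX is written. */
static uint64_t head_bytes_32(uint64_t addr) {
    uint32_t ecx = (uint32_t)addr; /* low 32 bits of the source address */
    ecx = 0u - ecx;                /* neg ecx: 2^32 - ecx (mod 2^32) */
    ecx &= 7u;                     /* and ecx, 7: bytes to the next 8-byte boundary */
    return (uint64_t)ecx;          /* upper 32 bits are zero */
}

/* Hypothetical 64-bit reference mirroring "neg rcx / and rcx, 7",
 * the REX.W-prefixed forms that the rewritten listing avoids. */
static uint64_t head_bytes_64(uint64_t addr) {
    uint64_t rcx = 0u - addr;      /* neg rcx */
    return rcx & 7u;               /* and rcx, 7 */
}

/* Hypothetical helper mirroring "sub r8, rcx": bytes left to copy after
 * the unaligned head bytes have been copied separately. */
static uint64_t remaining_after_head(uint64_t addr, uint64_t count) {
    uint64_t head = head_bytes_32(addr);
    assert(head <= count);         /* assumes at least `head` bytes remain */
    return count - head;
}
```

For example, an address ending in binary 011 (such as 0x1003) needs 5 bytes to reach an 8-byte boundary, so a 100-byte copy leaves 95 bytes for the aligned part. Comparing the two head_bytes_* helpers on addresses above 2^32 shows that narrowing the negation to 32 bits does not change the result.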