25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

To improve code density, the following rewritten code uses ECX until it is absolutely necessary to use RCX, eliminating two REX prefix bytes:

F7 D9       neg  ecx       ; Subtract the source address from 2^32 (the processor
                           ; clears the high 32 bits of RCX).
83 E1 07    and  ecx, 7    ; Determine how many bytes were copied separately.
4C 2B C1    sub  r8, rcx   ; Subtract the number of bytes already copied from
                           ; the number of bytes to copy.

 

Chapter 3

General 64-Bit Optimizations
