22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

FPU into truncating mode, and performing all of the conversions before restoring the original control word.

The speed of the above code depends somewhat on the nature of the code surrounding it. For applications in which floating-point-to-integer conversion speed is critical to performance, experiment with either of the following substitutions, which may or may not be faster than the code above.

The first substitution simulates a truncating floating-point-to-integer conversion, provided that the input is not a NaN or an infinity and that the conversion does not overflow. This conversion is therefore not IEEE-754 compliant. The code works properly only if the current FPU rounding mode is round-to-nearest-even, which is usually the case.

Example 2 (Potentially faster).

    FLD    QWORD PTR [X]      ;load double to be converted
    FST    DWORD PTR [TX]     ;store X because sign(X) is needed
    FIST   DWORD PTR [I]      ;store rndint(X) as default result
    FISUB  DWORD PTR [I]      ;compute DIFF = X - rndint(X)
    FSTP   DWORD PTR [DIFF]   ;store DIFF as we need sign(DIFF)
    MOV    EAX, [TX]          ;X
    MOV    EDX, [DIFF]        ;DIFF
    TEST   EDX, EDX           ;DIFF == 0 ?
    JZ     $DONE              ;default result is OK, done
    XOR    EDX, EAX           ;need correction if sign(X) != sign(DIFF)
    SAR    EAX, 31            ;(X<0) ? 0xFFFFFFFF : 0
    SAR    EDX, 31            ;sign(X)!=sign(DIFF) ? 0xFFFFFFFF : 0
    LEA    EAX, [EAX+EAX+1]   ;(X<0) ? 0xFFFFFFFF : 1
    AND    EDX, EAX           ;correction: -1, 0, 1
    SUB    [I], EDX           ;trunc(X)=rndint(X)-correction
$DONE:
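The correction logic of Example 2 may be easier to follow in C. The following is a minimal sketch, not part of the original listing: the names `float_bits` and `trunc_from_rounded` are hypothetical, and the caller is assumed to supply `rounded`, the round-to-nearest-even integer that FIST would store (in portable C this could come from `lrint` under the default rounding mode). As in the assembly, only the sign bits of the single-precision images of X and DIFF are inspected.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its 32-bit pattern,
   as MOV EAX,[TX] does after FST DWORD PTR [TX]. */
static int32_t float_bits(float f)
{
    int32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* Hypothetical model of Example 2.
   x       : value to convert (assumed finite and within int32 range)
   rounded : rndint(x), i.e. x rounded to nearest-even (the FIST result)
   returns : trunc(x) */
int32_t trunc_from_rounded(double x, int32_t rounded)
{
    int32_t tx   = float_bits((float)x);                      /* sign(X)    */
    int32_t diff = float_bits((float)(x - (double)rounded));  /* sign(DIFF) */

    if (diff == 0)                    /* TEST EDX,EDX / JZ $DONE */
        return rounded;

    /* XOR + SAR EDX,31: all ones if the signs differ, else zero */
    int32_t mismatch = ((diff ^ tx) < 0) ? -1 : 0;
    /* SAR EAX,31 + LEA: -1 for negative X, +1 otherwise */
    int32_t step = (tx < 0) ? -1 : 1;
    /* AND: correction is -1, 0, or 1 */
    int32_t correction = mismatch & step;

    return rounded - correction;      /* SUB [I],EDX */
}
```

For instance, with X = 2.7 the rounded result is 3 and DIFF is negative while X is positive, so the correction is 1 and the result is 2. With X = -2.7 the rounded result is -3, DIFF is positive, the correction is -1, and the result is -2.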

 

 

 

 

The second substitution simulates a truncating floating-point-to-integer conversion using only integer instructions and therefore works correctly independent of the FPU's current rounding mode. It does not handle NaNs, infinities, and overflows according to the IEEE-754 standard. Note that the first instruction of this code may cause a store-to-load forwarding (STLF) size mismatch, resulting in performance degradation, if the variable to be converted has been stored recently.
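The idea behind an integer-only truncating conversion can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the manual's listing: the function name `trunc_double_int_only` is hypothetical, the input is assumed to be an IEEE-754 double, and returning `INT32_MIN` for NaNs, infinities, and out-of-range values is an arbitrary choice made here only to keep the sketch well defined. A C compiler is also free to use floating-point moves for the copy, so the sketch illustrates the bit manipulation rather than the exact instruction mix.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: truncate a double toward zero by decoding its
   bit pattern with integer operations only. */
int32_t trunc_double_int_only(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);            /* raw 64-bit image of x */

    int negative = (int)(bits >> 63);          /* sign bit */
    int exponent = (int)((bits >> 52) & 0x7FF) - 1023;  /* unbiased */

    if (exponent < 0)                          /* |x| < 1.0 truncates to 0 */
        return 0;
    if (exponent > 30)                         /* NaN, infinity, overflow:  */
        return INT32_MIN;                      /* arbitrary, see lead-in    */

    /* Restore the implicit leading 1, then drop the fractional bits. */
    uint64_t mantissa  = (bits & 0xFFFFFFFFFFFFFull) | (1ull << 52);
    uint32_t magnitude = (uint32_t)(mantissa >> (52 - exponent));

    return negative ? -(int32_t)magnitude : (int32_t)magnitude;
}
```

Because the result is derived purely from the stored bit pattern, it does not depend on the rounding mode in effect when the function runs.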

Minimize Floating-Point-to-Integer Conversions
