
AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

Example 3 (Potentially faster):

    MOV   ECX, DWORD PTR[X+4]   ;get upper 32 bits of double
    XOR   EDX, EDX              ;i = 0
    MOV   EAX, ECX              ;save sign bit
    AND   ECX, 07FF00000h       ;isolate exponent field
    CMP   ECX, 03FF00000h       ;if abs(x) < 1.0
    JB    $DONE2                ; then i = 0
    MOV   EDX, DWORD PTR[X]     ;get lower 32 bits of double
    SHR   ECX, 20               ;extract exponent
    SHRD  EDX, EAX, 21          ;extract mantissa
    NEG   ECX                   ;compute shift factor for extracting
    ADD   ECX, 1054             ;non-fractional mantissa bits
    OR    EDX, 080000000h       ;set integer bit of mantissa
    SAR   EAX, 31               ;x < 0 ? 0xffffffff : 0
    SHR   EDX, CL               ;i = trunc(abs(x))
    XOR   EDX, EAX              ;i = x < 0 ? ~i : i
    SUB   EDX, EAX              ;i = x < 0 ? -i : i
    $DONE2:
    MOV   [I], EDX              ;store result
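The steps of Example 3 can be mirrored in portable C, which may make the bit manipulation easier to follow. This is a minimal sketch, not code from this guide: the function name trunc_double_to_int32 is hypothetical, and it assumes x is finite and that trunc(x) fits in an int32_t. The assembly makes the same assumption, since it never checks for overflow.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: truncate a double toward zero using only integer
   operations on its IEEE-754 bit pattern, step by step as in Example 3.
   Assumes x is finite and trunc(x) fits in int32_t (no overflow check). */
int32_t trunc_double_to_int32(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the double's 64 bits */

    uint32_t hi = (uint32_t)(bits >> 32);    /* upper 32 bits: sign, exponent, top mantissa */
    uint32_t lo = (uint32_t)bits;            /* lower 32 bits of the mantissa */

    uint32_t exp_field = hi & 0x7FF00000u;   /* isolate exponent field */
    if (exp_field < 0x3FF00000u)             /* abs(x) < 1.0, so the result is 0 */
        return 0;

    uint32_t e = exp_field >> 20;            /* biased exponent, at least 1023 here */

    /* Same as SHRD EDX, EAX, 21: the top 21 bits come from hi, the low 11 from lo. */
    uint32_t mant = (lo >> 21) | (hi << 11);
    mant |= 0x80000000u;                     /* set the implicit integer bit (bit 31) */

    /* abs(x) = mant * 2^(e - 1023 - 31), so the integer part is
       mant >> (1054 - e): the shift factor computed by NEG/ADD 1054. */
    uint32_t i = mant >> (1054u - e);        /* i = trunc(abs(x)) */

    uint32_t sign = (hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;  /* like SAR EAX, 31 */
    i = (i ^ sign) - sign;                   /* XOR then SUB: negate i when x < 0 */
    return (int32_t)i;
}
```

For example, trunc_double_to_int32(-3.7) returns -3, the same value the C cast (int32_t)-3.7 produces.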

For applications that can tolerate a floating-point-to-integer conversion that is IEEE-754 compliant but not compliant with existing programming language standards (C, for instance, requires truncation toward zero), perform the conversion using the rounding mode currently in effect (usually round-to-nearest-even).

Example 4 (Fastest):

    FLD    QWORD PTR [X]    ;get double to be converted
    FISTP  DWORD PTR [I]    ;store integer result

Some compilers offer an option to use the code from Example 4 for floating-point-to-integer conversion, using the default rounding mode.

Lastly, consider setting the rounding mode for the entire application to truncate (round toward zero) and using the code from Example 4 to perform extremely fast conversions that comply with both language standards and IEEE-754. Some compilers also provide this mode as an option. Note that this technique also changes the rounding mode for every other FPU operation in the application, which can significantly change numerical results and even cause program failure (for example, through lack of convergence in iterative algorithms).

102

Minimize Floating-Point-to-Integer Conversions
