22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Because out-of-range arguments are extremely uncommon, the conditional branch is almost always predicted correctly, and the other instructions that guard the trigonometric instruction can execute in parallel with it.

Take Advantage of the FSINCOS Instruction

Frequently, a piece of code that needs to compute the sine of an argument also needs to compute the cosine of that same argument. In such cases, the FSINCOS instruction should be used to compute both trigonometric functions concurrently, which is faster than using separate FSIN and FCOS instructions to accomplish the same task.

Example 1 (Avoid):

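        FLD     QWORD PTR [x]
        FLD     DWORD PTR [two_to_the_63]
        FCOMIP  ST,ST(1)
        JBE     $in_range
        CALL    $reduce_range
$in_range:
        FLD     ST(0)                   ;duplicate the argument
        FCOS                            ;ST(0) = cos(x), ST(1) = x
        FSTP    QWORD PTR [cosine_x]    ;store cosine, pop
        FSIN                            ;ST(0) = sin(x)
        FSTP    QWORD PTR [sine_x]      ;store sine, pop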

Example 2 (Preferred):

        FLD     QWORD PTR [x]
        FLD     DWORD PTR [two_to_the_63]
        FCOMIP  ST,ST(1)
        JBE     $in_range
        CALL    $reduce_range
$in_range:
        FSINCOS                         ;ST(0) = cos(x), ST(1) = sin(x)
        FSTP    QWORD PTR [cosine_x]    ;store cosine, pop
        FSTP    QWORD PTR [sine_x]      ;store sine, pop
