AMD x86 manual Use 3DNow! Instructions for Fast Division, Optimized 14-Bit Precision Divide

AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

The FEMMS instruction is supported for backward compatibility with AMD-K6 family processors, and is aliased to the EMMS instruction.

3DNow! and MMX instructions are designed to be used concurrently with no switching issues. Likewise, enhanced 3DNow! instructions can be used simultaneously with MMX instructions. However, x87 and 3DNow! instructions share the same architectural registers, so there is no easy way to use them concurrently: the register file must be cleaned up with FEMMS/EMMS when switching between the two.

Use 3DNow!™ Instructions for Fast Division

3DNow! instructions can be used to compute a very fast, highly accurate reciprocal or quotient.

Optimized 14-Bit Precision Divide

This divide operation executes with a total latency of seven cycles, assuming that the program hides the latency of the first MOVD/MOVQ instructions within preceding code.

Example:

    MOVD       MM0, [MEM]     ; 0     | W
    PFRCP      MM0, MM0       ; 1/W   | 1/W   (approximate)
    MOVQ       MM2, [MEM]     ; Y     | X
    PFMUL      MM2, MM0       ; Y/W   | X/W
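
The data flow of the sequence above (one approximate reciprocal, then one multiply per lane) can be sketched in scalar C. This is only an illustration under stated assumptions: `approx_rcp14` and `divide2_approx` are hypothetical names, and the reciprocal is emulated by clearing low mantissa bits of an exact result so that about 14 significant bits remain. The real PFRCP instruction is table-based and will not produce the same bit patterns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: an exact reciprocal with the low 10
   stored mantissa bits cleared, leaving roughly 14 significant bits.
   This only mimics the precision, not the hardware lookup table. */
static float approx_rcp14(float w)
{
    assert(w != 0.0f);               /* the sketch does not handle W = 0 */
    float r = 1.0f / w;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFC00u;             /* keep sign, exponent, top 13 stored bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Mirrors the listing: xy[0] = X (low lane), xy[1] = Y (high lane). */
static void divide2_approx(const float xy[2], float w, float out[2])
{
    float r = approx_rcp14(w);       /* PFRCP: 1/W, shared by both lanes */
    out[0] = xy[0] * r;              /* PFMUL, low lane:  X/W */
    out[1] = xy[1] * r;              /* PFMUL, high lane: Y/W */
}
```

Calling `divide2_approx` with X = 3, Y = 10 and W = 7 should give quotients that agree with the exact results to roughly 1 part in 2^12, which is the trade-off this short sequence makes for its low latency.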

 

Optimized Full 24-Bit Precision Divide

This divide operation executes with a total latency of 15 cycles, assuming that the program hides the latency of the first MOVD/MOVQ instructions within preceding code.

Example:

    MOVD       MM0, [W]       ; 0     | W
    PFRCP      MM1, MM0       ; 1/W   | 1/W   (approximate)
    PUNPCKLDQ  MM0, MM0       ; W     | W     (MMX instr.)
    PFRCPIT1   MM0, MM1       ; 1/W   | 1/W   (refine)
    MOVQ       MM2, [X_Y]     ; Y     | X
    PFRCPIT2   MM0, MM1       ; 1/W   | 1/W   (final)
    PFMUL      MM2, MM0       ; Y/W   | X/W
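
The refinement performed by the PFRCPIT1/PFRCPIT2 pair corresponds to one Newton-Raphson step on the approximate reciprocal, r1 = r0 * (2 - W * r0), which roughly doubles the number of correct bits. The scalar C sketch below illustrates that idea only. The names `approx_rcp14`, `refine_rcp`, and `divide2_full` are hypothetical, the starting estimate is emulated by truncating an exact reciprocal, and ordinary float arithmetic is used in place of the hardware's table-assisted steps, so the last bit or two may differ from what the instructions return.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: about 14 significant bits of 1/W. */
static float approx_rcp14(float w)
{
    assert(w != 0.0f);               /* the sketch does not handle W = 0 */
    float r = 1.0f / w;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFC00u;             /* clear the low 10 stored mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for the reciprocal: r1 = r0 * (2 - w * r0).
   A relative error e in r0 becomes roughly e*e in r1. */
static float refine_rcp(float w, float r0)
{
    return r0 * (2.0f - w * r0);
}

/* Mirrors the listing: xy[0] = X (low lane), xy[1] = Y (high lane). */
static void divide2_full(const float xy[2], float w, float out[2])
{
    float r0 = approx_rcp14(w);      /* PFRCP: initial estimate          */
    float r1 = refine_rcp(w, r0);    /* PFRCPIT1 + PFRCPIT2 (assumption) */
    out[0] = xy[0] * r1;             /* PFMUL, low lane:  X/W */
    out[1] = xy[1] * r1;             /* PFMUL, high lane: Y/W */
}
```

With X = 3, Y = 10 and W = 7, the refined quotients should match the results of an ordinary single-precision divide to within a few units in the last place, which is why the longer sequence is described as full 24-bit precision.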

 
