25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Table 15. x87 Floating-Point Instructions (Continued)

 

 

                      ------------ Encoding -------------
Syntax                First byte  Second byte  ModRM byte  Decode type  FPU pipe(s)       Latency  Note
FMUL [mem32real]      D8h                      mm-001-xxx  DirectPath   FMUL              6
FMUL [mem64real]      DCh                      mm-001-xxx  DirectPath   FMUL              6
FMULP ST(i), ST       DEh                      11-001-xxx  DirectPath   FMUL              4        1
FNCLEX                DBh         E2h                      VectorPath                     16
FNINIT                DBh         E3h                      VectorPath                     89
FNOP                  D9h                      11-010-000  DirectPath   FADD/FMUL/FSTORE  2        2
FPATAN                D9h                      11-110-011  VectorPath   -                 136
FPREM                 D9h                      11-111-000  DirectPath   FMUL              9+e+n    4
FPREM1                D9h                      11-110-101  DirectPath   FMUL              9+e+n    4
FPTAN                 D9h                      11-110-010  VectorPath   -                 107
FRNDINT               D9h                      11-111-100  VectorPath   -                 10
FRSTOR [mem94byte]    DDh                      mm-100-xxx  VectorPath   -                 138
FRSTOR [mem108byte]   DDh                      mm-100-xxx  VectorPath   -                 138
FSAVE [mem94byte]     DDh                      mm-110-xxx  VectorPath   -                 159
FSAVE [mem108byte]    DDh                      mm-110-xxx  VectorPath   -                 159
FSCALE                D9h                      11-111-101  VectorPath   -                 9
FSIN                  D9h                      11-111-110  VectorPath   -                 93
FSINCOS               D9h                      11-111-011  VectorPath   -                 104
FSQRT                 D9h                      11-111-010  DirectPath   FMUL              35
FST [mem32real]       D9h                      mm-010-xxx  DirectPath   FSTORE            2
FST [mem64real]       DDh                      mm-010-xxx  DirectPath   FSTORE            2
FST ST(i)             DDh                      11-010-xxx  DirectPath   FADD/FMUL         2
FSTCW [mem16]         D9h                      mm-111-xxx  VectorPath   -                 4
FSTENV [mem14byte]    D9h                      mm-110-xxx  VectorPath   -                 89

Notes:

1. The last three bits of the ModRM byte select the stack entry ST(i).

2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.

3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).

4. There is additional latency associated with this instruction. “e” represents the difference between the exponents of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then n = (s+1)/2, where 0 <= n <= 32.

5. The latency provided for this operation is the best-case latency.

6. The three latency numbers represent the latency values for precision control settings of single precision, double precision, and extended precision, respectively.
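
Notes 1 and 4 can be made concrete with a short Python sketch. It is illustrative only and not part of the AMD guide: the function names st_index, encode_fmulp, and fprem_latency are hypothetical, the caller is assumed to supply "e" and "s" as non-negative integers, and the division in n = (s+1)/2 is assumed to round down, which the note does not state.

```python
def st_index(modrm: int) -> int:
    """Note 1: the last three bits of the ModRM byte select ST(i)."""
    return modrm & 0b111


def encode_fmulp(i: int) -> bytes:
    """Build FMULP ST(i), ST from the table row: first byte DEh, ModRM 11-001-xxx."""
    if not 0 <= i <= 7:
        raise ValueError("ST(i) index must be in 0..7")
    return bytes([0xDE, 0b11_001_000 | i])


def fprem_latency(e: int, s: int) -> int:
    """Note 4: FPREM/FPREM1 latency 9+e+n, with n = (s+1)/2 and 0 <= n <= 32.

    Assumption: (s+1)/2 is an integer division that rounds down.
    """
    n = (s + 1) // 2
    if not 0 <= n <= 32:
        raise ValueError("n is outside the range given in note 4")
    return 9 + e + n


if __name__ == "__main__":
    print(encode_fmulp(1).hex())     # bytes for FMULP ST(1), ST
    print(fprem_latency(e=3, s=5))   # n = 3, so 9 + 3 + 3
```

Under these assumptions, an exponent difference of 3 with 5 normalization shifts gives n = 3 and a latency estimate of 15 cycles, compared with the base of 9 when both e and n are zero.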

Appendix C

Instruction Latencies
