25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

Table 15. x87 Floating-Point Instructions (Continued)

Syntax	Encoding first byte	Encoding second byte	Encoding ModRM byte	Decode type	FPU pipe(s)	Latency	Note
FDIVRP ST(i), ST	DEh		11-110-xxx	DirectPath	FMUL	16/20/24	1, 6

FFREE ST(i)	DDh		11-000-xxx	DirectPath	FADD/FMUL/FSTORE	2	1, 2

FIADD [mem32int]	DAh		mm-000-xxx	Double	-	11

FIADD [mem16int]	DEh		mm-000-xxx	Double	-	11

FICOM [mem32int]	DAh		mm-010-xxx	Double	-	9

FICOM [mem16int]	DEh		mm-010-xxx	Double	-	9

FICOMP [mem32int]	DAh		mm-011-xxx	Double	-	9

FICOMP [mem16int]	DEh		mm-011-xxx	Double	-	9

FIDIV [mem32int]	DAh		mm-110-xxx	Double	-	18

FIDIV [mem16int]	DEh		mm-110-xxx	Double	-	18

FIDIVR [mem32int]	DAh		mm-111-xxx	Double	-	18

FIDIVR [mem16int]	DEh		mm-111-xxx	Double	-	18

FILD [mem16int]	DFh		mm-000-xxx	DirectPath	FSTORE	6

FILD [mem32int]	DBh		mm-000-xxx	DirectPath	FSTORE	6

FILD [mem64int]	DFh		mm-101-xxx	DirectPath	FSTORE	6

FIMUL [mem32int]	DAh		mm-001-xxx	Double	-	11

FIMUL [mem16int]	DEh		mm-001-xxx	Double	-	11

FINCSTP	D9h		11-110-111	DirectPath	FADD/FMUL/FSTORE	2	2

FINIT	DBh		11-100-011	VectorPath	-	~

FIST [mem16int]	DFh		mm-010-xxx	DirectPath	FSTORE	4

FIST [mem32int]	DBh		mm-010-xxx	DirectPath	FSTORE	4

FISTP [mem16int]	DFh		mm-011-xxx	DirectPath	FSTORE	4

FISTP [mem32int]	DBh		mm-011-xxx	DirectPath	FSTORE	4
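
The ModRM byte column in the rows above is written as three bit fields (two bits, three bits, three bits). A minimal Python sketch of how such a pattern can be checked against an actual byte follows. The function names `modrm_fields` and `matches` are hypothetical, and the reading of `mm` as "any mod value except 11" is an assumption based on the usual x86 convention that mod = 11 selects a register operand.

```python
def modrm_fields(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # top two bits: 11 means a register (stack) operand
    reg = (byte >> 3) & 0b111  # middle three bits: opcode extension for x87
    rm = byte & 0b111          # low three bits: ST(i) index when mod == 11
    return mod, reg, rm


def matches(pattern, byte):
    """Check a byte against a table pattern such as '11-110-xxx'."""
    bits = format(byte, "08b")
    template = pattern.replace("-", "")
    for want, got in zip(template, bits):
        # Only literal 0/1 positions constrain the byte; 'm' and 'x' are wildcards.
        if want in "01" and want != got:
            return False
    # Assumption: 'mm' marks a memory form, so mod must not be 11.
    if template.startswith("mm") and bits[:2] == "11":
        return False
    return True
```

For instance, `modrm_fields(0xF3)` yields mod 3, reg 6, rm 3, which under this reading corresponds to the FDIVRP row (first byte DEh, pattern 11-110-xxx) with stack entry ST(3).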

Notes:

1. The last three bits of the ModRM byte select the stack entry ST(i).

2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.

3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).

4. There is additional latency associated with this instruction. “e” represents the difference between the exponents of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then n = (s+1)/2, where 0 <= n <= 32.

5. The latency provided for this operation is the best-case latency.

6. The three latency numbers represent the latency values for precision control settings of single precision, double precision, and extended precision, respectively.
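
As an illustration of note 6, the sketch below picks one value out of a slash-separated latency entry such as 16/20/24. The helper names are hypothetical. The mapping from the x87 control word to a precision setting assumes the standard layout, in which bits 9:8 hold the precision-control field (00 = single, 10 = double, 11 = extended, 01 reserved). Returning single-valued entries unchanged is likewise an assumption of this sketch.

```python
def precision_from_control_word(cw):
    """Map the precision-control field (bits 9:8) of an x87 control word to a name."""
    pc = (cw >> 8) & 0b11
    names = {0b00: "single", 0b10: "double", 0b11: "extended"}
    if pc not in names:
        raise ValueError("precision-control value 01 is reserved")
    return names[pc]


def latency_for_precision(entry, precision):
    """Pick one value from an 'a/b/c' latency entry, using the order given in note 6."""
    order = ("single", "double", "extended")
    values = [int(v) for v in entry.split("/")]
    if len(values) == 1:
        # Assumption: an entry with one value is returned as-is.
        return values[0]
    return values[order.index(precision)]
```

With the default control word 037Fh that FINIT loads, `precision_from_control_word(0x037F)` gives extended precision, so the FDIVRP entry 16/20/24 would be read as 24 cycles.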

Appendix C

Instruction Latencies
