Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 15. x87 Floating-Point Instructions (Continued)

Syntax	Encoding: first byte	Encoding: second byte	Encoding: ModRM byte	Decode type	FPU pipe(s)	Latency	Note
FCOMIP ST, ST(i)	DFh		11-110-xxx	VectorPath	FADD	3	3

FCOMP ST(i)	D8h		11-011-xxx	DirectPath	FADD	2	1

FCOMP [mem32real]	D8h		mm-011-xxx	DirectPath	FADD	4

FCOMP [mem64real]	DCh		mm-011-xxx	DirectPath	FADD	4

FCOMPP	DEh		11-011-001	DirectPath	FADD	2

FCOS	D9h		11-111-111	VectorPath	-	92

FDECSTP	D9h		11-110-110	DirectPath	FADD/FMUL/FSTORE	2

FDIV ST, ST(i)	D8h		11-110-xxx	DirectPath	FMUL	16/20/24	1, 6

FDIV ST(i), ST	DCh		11-111-xxx	DirectPath	FMUL	16/20/24	1, 6

FDIV [mem32real]	D8h		mm-110-xxx	DirectPath	FMUL	18/22/26	6

FDIV [mem64real]	DCh		mm-110-xxx	DirectPath	FMUL	18/22/26	6

FDIVP ST(i), ST	DEh		11-111-xxx	DirectPath	FMUL	16/20/24	1, 6

FDIVR ST, ST(i)	D8h		11-111-xxx	DirectPath	FMUL	16/20/24	1, 6

FDIVR ST(i), ST	DCh		11-110-xxx	DirectPath	FMUL	16/20/24	1, 6

FDIVR [mem32real]	D8h		mm-111-xxx	DirectPath	FMUL	18/22/26	6

FDIVR [mem64real]	DCh		mm-111-xxx	DirectPath	FMUL	18/22/26	6

FDIVRP	DEh		11-110-001	DirectPath	FMUL	16/20/24	6

Notes:

1. The last three bits of the ModRM byte select the stack entry ST(i).

2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.

3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).

4. There is additional latency associated with this instruction. “e” represents the difference between the exponents of the divisor and the dividend. If “s” is the number of normalization shifts performed on the result, then n = (s+1)/2, where 0 <= n <= 32.

5. The latency provided for this operation is the best-case latency.

6. The three latency numbers represent the latency values for precision control settings of single precision, double precision, and extended precision, respectively.
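
As an illustration of Notes 1 and 6, the following minimal Python sketch splits a ModRM byte into the three bit fields written as mm-rrr-xxx in the table and picks one value out of a three-part latency entry. It is not part of the guide: the function and constant names are hypothetical, and the latency tuples are simply copied from the FDIV rows above.

```python
# Illustrative helpers only; names are assumptions, not taken from the guide.

# Order of the three latency values, per Note 6.
PRECISION_ORDER = ("single", "double", "extended")

# Latency triples copied from the FDIV rows of the table.
FDIV_REG_LATENCY = (16, 20, 24)  # register forms, e.g. FDIV ST, ST(i)
FDIV_MEM_LATENCY = (18, 22, 26)  # memory forms, e.g. FDIV [mem32real]


def split_modrm(modrm):
    """Split a ModRM byte into the (mod, reg, rm) fields shown as mm-rrr-xxx."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod, reg, rm


def stack_index(modrm):
    """Return i of ST(i) for a register-form encoding (mod == 11b), per Note 1."""
    mod, _, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("memory-form encoding: the low bits do not select ST(i)")
    return rm


def latency_for(latencies, precision):
    """Pick the latency matching a precision control setting, per Note 6."""
    return latencies[PRECISION_ORDER.index(precision)]


if __name__ == "__main__":
    # Bytes D8h F3h: F3h = 11-110-011, which matches the 11-110-xxx pattern
    # listed for FDIV ST, ST(i), with xxx = 011 selecting ST(3).
    print(split_modrm(0xF3), stack_index(0xF3))
    print(latency_for(FDIV_REG_LATENCY, "double"))
```

Under these assumptions, the byte pair D8h F3h decodes to the register form with ST(3), and a double-precision control setting selects the middle value of the 16/20/24 entry.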

308

Instruction Latencies

Appendix C
