Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 18. SSE Instructions (Continued)

                                --------- Encoding ---------------
Syntax                          Prefix  First  2nd   ModRM       Decode      FPU      Latency  Note
                                byte    byte   byte  byte        type        pipe(s)

SHUFPS xmmreg1, xmmreg2, imm8   0Fh     C6h          11-xxx-xxx  VectorPath  FMUL     4        1
SHUFPS xmmreg, mem128, imm8     0Fh     C6h          mm-xxx-xxx  VectorPath  FMUL     6        2
SQRTPS xmmreg1, xmmreg2         0Fh     51h          11-xxx-xxx  Double      FMUL     39
SQRTPS xmmreg, mem128           0Fh     51h          mm-xxx-xxx  Double      FMUL     41
SQRTSS xmmreg1, xmmreg2         F3h     0Fh    51h   11-xxx-xxx  DirectPath  FMUL     19
SQRTSS xmmreg, mem32            F3h     0Fh    51h   mm-xxx-xxx  DirectPath  FMUL     21
STMXCSR mem32                   0Fh     AEh          mm-011-xxx  VectorPath           11       4
SUBPS xmmreg1, xmmreg2          0Fh     5Ch          11-xxx-xxx  Double      FADD     5        1
SUBPS xmmreg, mem128            0Fh     5Ch          mm-xxx-xxx  Double      FADD     7        1
SUBSS xmmreg1, xmmreg2          F3h     0Fh    5Ch   11-xxx-xxx  DirectPath  FADD     4
SUBSS xmmreg, mem32             F3h     0Fh    5Ch   mm-xxx-xxx  DirectPath  FADD     6
UCOMISS xmmreg1, xmmreg2        0Fh     2Eh          11-xxx-xxx  VectorPath           4
UCOMISS xmmreg, mem32           0Fh     2Eh          mm-xxx-xxx  VectorPath           6
UNPCKHPS xmmreg1, xmmreg2       0Fh     15h          11-xxx-xxx  Double      FMUL     3        1
UNPCKHPS xmmreg, mem128         0Fh     15h          mm-xxx-xxx  Double      FMUL     5        1

Notes:

1. The low half of the result is available one cycle earlier than listed.

2. The second latency value indicates when the low half of the result becomes available.

3. The high half of the result is available one cycle earlier than listed.

4. The latency listed is the absolute minimum, while average latencies may be higher and are a function of internal pipeline conditions.

5. For the PREFETCHNTA/T0/T1/T2 instructions, the mem8 value refers to an address in the 64-byte line to be prefetched.

6. The 8-clock latency is only visible to younger stores that need to do an external write. The 2-clock latency is visible to the other stores and instructions.

7. This is the execution latency for the instruction. The time to complete the external write depends on the memory speed and the hardware implementation.
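
The ModRM byte column of Table 18 uses the pattern mod-reg-rm: two mode bits, a three-bit register (or opcode-extension) field, and a three-bit register/memory field. As an illustration only, the following C sketch assembles the register-to-register form of SUBPS from the bytes listed in the table. The helper names modrm and encode_subps_rr are hypothetical and are not part of this guide. The sketch assumes that the destination register occupies the reg field and the source register the rm field, and it covers only xmm0 through xmm7 (no REX prefix).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the three ModRM fields (mod: 2 bits,
 * reg: 3 bits, rm: 3 bits) into one byte, matching the
 * "11-xxx-xxx" notation used in the table. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Hypothetical helper: write the encoding of "SUBPS xmm<dst>, xmm<src>"
 * (0Fh 5Ch, ModRM = 11-dst-src) into buf and return its length in bytes.
 * Assumes registers xmm0..xmm7 only. */
static int encode_subps_rr(uint8_t buf[3], unsigned dst, unsigned src)
{
    buf[0] = 0x0F;                /* first encoding byte from the table  */
    buf[1] = 0x5C;                /* second encoding byte from the table */
    buf[2] = modrm(3, dst, src);  /* mod = 11b selects the register form */
    return 3;
}
```

Under these assumptions, encode_subps_rr(buf, 1, 2) fills buf with 0Fh 5Ch CAh for SUBPS xmm1, xmm2. For STMXCSR the table fixes the middle field at 011, so that field acts as an opcode extension rather than a register number.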

Instruction Latencies

Appendix C
