
Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 18. SSE Instructions (Continued)

Syntax                    Prefix  First  2nd   ModRM       Decode      FPU      Latency  Note
                          byte    byte   byte  byte        type        pipe(s)

MOVAPS xmmreg1, xmmreg2           0Fh    29h   11-xxx-xxx  Double               2
MOVAPS mem128, xmmreg             0Fh    29h   mm-xxx-xxx  Double               3        1
MOVHLPS xmmreg1, xmmreg2          0Fh    12h   11-xxx-xxx  DirectPath           2
MOVHPS xmmreg, mem64              0Fh    16h   mm-xxx-xxx  DirectPath           2
MOVHPS mem64, xmmreg              0Fh    17h   mm-xxx-xxx  DirectPath           2
MOVLHPS xmmreg1, xmmreg2          0Fh    16h   11-xxx-xxx  DirectPath           2
MOVLPS xmmreg, mem64              0Fh    12h   mm-xxx-xxx  DirectPath           2
MOVLPS mem64, xmmreg              0Fh    13h   mm-xxx-xxx  DirectPath           2
MOVMSKPS reg32, xmmreg            0Fh    50h   11-xxx-xxx  VectorPath           3
MOVNTPS mem128, xmmreg            0Fh    2Bh   mm-xxx-xxx  Double               3        7
MOVNTQ mem64, mmreg               0Fh    E7h   mm-xxx-xxx  DirectPath  FSTORE   2        7
MOVSS xmmreg1, xmmreg2    F3h     0Fh    10h   11-xxx-xxx  DirectPath           2
MOVSS xmmreg, mem32       F3h     0Fh    10h   mm-xxx-xxx  Double               3
MOVSS xmmreg1, xmmreg2    F3h     0Fh    11h   11-xxx-xxx  DirectPath           2

Notes:

1. The low half of the result is available one cycle earlier than listed.

2. The second latency value indicates when the low half of the result becomes available.

3. The high half of the result is available one cycle earlier than listed.

4. The latency listed is the absolute minimum, while average latencies may be higher and are a function of internal pipeline conditions.

5. For the PREFETCHNTA/T0/T1/T2 instructions, the mem8 value refers to an address in the 64-byte line to be prefetched.

6. The 8-clock latency is only visible to younger stores that need to do an external write. The 2-clock latency is visible to the other stores and instructions.

7. This is the execution latency for the instruction. The time to complete the external write depends on the memory speed and the hardware implementation.
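
As an illustration of how the encoding columns in the table combine into machine code, the following minimal C sketch assembles the register-to-register form MOVSS xmmreg1, xmmreg2 (prefix F3h, opcode bytes 0Fh 10h, ModRM 11-xxx-xxx). The helper names modrm_reg_reg and encode_movss_rr are hypothetical and do not come from this guide. The sketch assumes the standard x86 ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0), assumes that for the 10h form the reg field names the destination register, and covers only XMM0 through XMM7 (no REX prefix).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a register-to-register ModRM byte.
 * mod = 11b selects the register form ("11-xxx-xxx" in the table);
 * reg and rm are the two 3-bit register fields. */
static uint8_t modrm_reg_reg(unsigned reg, unsigned rm)
{
    return (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
}

/* Hypothetical helper: encode MOVSS xmm<dst>, xmm<src> using the
 * F3h 0Fh 10h row of the table. Assumes the reg field holds the
 * destination and the r/m field holds the source for this opcode.
 * Writes 4 bytes to out and returns the byte count. */
static size_t encode_movss_rr(uint8_t out[4], unsigned dst, unsigned src)
{
    out[0] = 0xF3;                     /* prefix byte */
    out[1] = 0x0F;                     /* first opcode byte */
    out[2] = 0x10;                     /* second opcode byte */
    out[3] = modrm_reg_reg(dst, src);  /* ModRM byte, mod = 11b */
    return 4;
}
```

Under these assumptions, calling encode_movss_rr(buf, 1, 2) fills buf with the bytes F3 0F 10 CA, which corresponds to MOVSS xmm1, xmm2.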

Instruction Latencies

Appendix C
