
Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 19. SSE2 Instructions (Continued)

                           Encoding
Syntax                     Prefix  First  Second  ModRM  Decode      FPU                Latency  Throughput  Note
                           byte    byte   byte    byte   type        pipe(s)
MOVUPD xmmreg1, xmmreg2    66h     0Fh    10h            Double      FADD/FMUL          2
MOVUPD xmmreg, mem128      66h     0Fh    10h            VectorPath  FADD/FMUL/FSTORE   7
MOVUPD xmmreg1, xmmreg2    66h     0Fh    11h            Double      FADD/FMUL          2
MOVUPD mem128, xmmreg      66h     0Fh    11h            VectorPath  FSTORE             4
MULPD xmmreg1, xmmreg2     66h     0Fh    59h            Double      FMUL               5        1/2
MULPD xmmreg, mem128       66h     0Fh    59h            Double      FMUL               7        1/2
MULSD xmmreg1, xmmreg2     F2h     0Fh    59h            DirectPath  FMUL               4        1/1
MULSD xmmreg, mem64        F2h     0Fh    59h            DirectPath  FMUL               6        1/1
ORPD xmmreg1, xmmreg2      66h     0Fh    56h            Double      FMUL               3        1/2
ORPD xmmreg, mem128        66h     0Fh    56h            Double      FMUL               5        1/2
PACKSSDW xmmreg1, xmmreg2  66h     0Fh    6Bh            VectorPath  ~                  4
PACKSSDW xmmreg, mem128    66h     0Fh    6Bh            VectorPath  ~                  6
PACKSSWB xmmreg1, xmmreg2  66h     0Fh    63h            VectorPath  ~                  4
PACKSSWB xmmreg, mem128    66h     0Fh    63h            VectorPath  ~                  6
PACKUSWB xmmreg1, xmmreg2  66h     0Fh    67h            VectorPath  ~                  4
PACKUSWB xmmreg, mem128    66h     0Fh    67h            VectorPath  ~                  6
PADDB xmmreg1, xmmreg2     66h     0Fh    FCh            Double      FADD/FMUL          2        1/1

Notes:

1. The low half of the result is available one cycle earlier than listed.

2. This is the execution latency for the instruction. The time to complete the external write depends on the memory speed and the hardware implementation.
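For readers working in C rather than assembly, the sketch below shows SSE2 intrinsics from <emmintrin.h> that compilers commonly lower to several instructions in Table 19: _mm_loadu_pd and _mm_storeu_pd (MOVUPD), _mm_mul_pd (MULPD), _mm_mul_sd (MULSD), _mm_or_pd (ORPD), _mm_packs_epi32 (PACKSSDW) and _mm_add_epi8 (PADDB). The helper functions (mul_pairs, mul_low, force_negative, pack_i32_to_i16, add_bytes) are hypothetical names chosen for illustration. The instruction named in each comment is the usual mapping, not a guarantee: a compiler may emit an equivalent encoding such as MOVUPS or ORPS, and the integer loads become MOVDQU, which this table does not list.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: multiply two pairs of doubles from possibly
 * unaligned memory. Typical lowering noted per line. */
void mul_pairs(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);        /* usually MOVUPD xmmreg, mem128 */
    __m128d vb = _mm_loadu_pd(b);
    __m128d prod = _mm_mul_pd(va, vb);   /* usually MULPD xmmreg1, xmmreg2 */
    _mm_storeu_pd(out, prod);            /* usually MOVUPD mem128, xmmreg */
}

/* Hypothetical helper: scalar multiply of the low doubles (MULSD). */
double mul_low(double x, double y)
{
    __m128d r = _mm_mul_sd(_mm_set_sd(x), _mm_set_sd(y));
    return _mm_cvtsd_f64(r);             /* extract the low double */
}

/* Hypothetical helper: OR the IEEE-754 sign bit into both lanes (ORPD),
 * which makes each value negative: out = -|in|. */
void force_negative(const double *in, double *out)
{
    const __m128d sign = _mm_set1_pd(-0.0);  /* only the sign bit is set */
    _mm_storeu_pd(out, _mm_or_pd(_mm_loadu_pd(in), sign));
}

/* Hypothetical helper: pack eight 32-bit ints into eight 16-bit ints
 * with signed saturation (PACKSSDW). Low four results come from in[0..3]. */
void pack_i32_to_i16(const int32_t in[8], int16_t out[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)in);
    __m128i hi = _mm_loadu_si128((const __m128i *)(in + 4));
    _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(lo, hi));
}

/* Hypothetical helper: wrap-around (modulo 256) byte addition (PADDB). */
void add_bytes(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}
```

When reading the table for code like mul_pairs, a MULPD whose input is the previous MULPD result pays the 5-cycle register-form latency at each step, whereas independent MULPDs are limited by the 1/2 throughput entry, read here as one MULPD every two cycles. The DirectPath MULSD (4 cycles, 1/1) handles only the low double but, under the same reading, can start every cycle.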

Instruction Latencies

Appendix C
