336, Fadd Fmul | AMD 250 specification

Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 19. SSE2 Instructions (Continued)

Syntax                          Prefix  First  2nd     ModRM  Decode type FPU pipe(s) Latency  Throughput  Note
PMULHUW xmmreg1, xmmreg2        66h     0Fh    E4h            Double      FMUL        4        1/2
PMULHUW xmmreg, mem128          66h     0Fh    E4h            Double      FMUL        6        1/2
PMULHW xmmreg1, xmmreg2         66h     0Fh    E5h            Double      FMUL        4        1/2
PMULHW xmmreg, mem128           66h     0Fh    E5h            Double      FMUL        6        1/2
PMULLW xmmreg1, xmmreg2         66h     0Fh    D5h            Double      FMUL        4        1/2
PMULLW xmmreg, mem128           66h     0Fh    D5h            Double      FMUL        6        1/2
PMULUDQ mmreg1, mmreg2                  0Fh    F4h            DirectPath  FMUL        3        1/2
PMULUDQ mmreg, mem64                    0Fh    F4h            DirectPath  FMUL        5        1/2
PMULUDQ xmmreg1, xmmreg2        66h     0Fh    F4h            Double      FMUL        4        1/2
PMULUDQ xmmreg, mem128          66h     0Fh    F4h            Double      FMUL        6        1/2
POR xmmreg1, xmmreg2            66h     0Fh    EBh            Double      FADD/FMUL   2        1/1
POR xmmreg, mem128              66h     0Fh    EBh            Double      FADD/FMUL   4        1/1
PSADBW xmmreg1, xmmreg2         66h     0Fh    F6h            Double      FADD        4        1/2
PSADBW xmmreg, mem128           66h     0Fh    F6h            Double      FADD        6        1/2
PSHUFD xmmreg1, xmmreg2, imm8   66h     0Fh    70h            VectorPath  ~           4
PSHUFD xmmreg, mem128, imm8     66h     0Fh    70h            VectorPath  ~           6
PSHUFHW xmmreg1, xmmreg2, imm8  F3h     0Fh    70h            Double      FADD/FMUL   2        1/1

Notes:

1. The low half of the result is available one cycle earlier than listed.

2. This is the execution latency for the instruction. The time to complete the external write depends on the memory speed and the hardware implementation.

336

Instruction Latencies

Appendix C
