Appendix C Instruction Latencies 335
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
Table 19. SSE2 Instructions (Continued)

                               Encoding
                               Prefix  First  2nd   ModRM  Decode      FPU
Syntax                         byte    byte   byte  byte   type        pipe(s)    Latency  Throughput  Note
PCMPGTD xmmreg, mem128         66h     0Fh    66h          Double      FADD/FMUL  4        1/1
PCMPGTW xmmreg1, xmmreg2       66h     0Fh    65h          Double      FADD/FMUL  2        1/1
PCMPGTW xmmreg, mem128         66h     0Fh    65h          Double      FADD/FMUL  4        1/1
PEXTRW reg32/64, xmmreg, imm8  66h     0Fh    C5h          Double      FSTORE     4        1/1
PINSRW xmmreg, reg32/64, imm8  66h     0Fh    C4h          VectorPath  FADD/FMUL  10       1/1
PINSRW xmmreg, mem128, imm8    66h     0Fh    C4h          Double      FADD/FMUL  4        1/1
PMADDWD xmmreg1, xmmreg2       66h     0Fh    F5h          Double      FMUL       4        1/2
PMADDWD xmmreg, mem128         66h     0Fh    F5h          Double      FMUL       6        1/2
PMAXSW xmmreg1, xmmreg2        66h     0Fh    EEh          Double      FADD/FMUL  2        1/1
PMAXSW xmmreg, mem128          66h     0Fh    EEh          Double      FADD/FMUL  4        1/1
PMAXUB xmmreg1, xmmreg2        66h     0Fh    DEh          Double      FADD/FMUL  2        1/1
PMAXUB xmmreg, mem128          66h     0Fh    DEh          Double      FADD/FMUL  4        1/1
PMINSW xmmreg1, xmmreg2        66h     0Fh    EAh          Double      FADD/FMUL  2        1/1
PMINSW xmmreg, mem128          66h     0Fh    EAh          Double      FADD/FMUL  4        1/1
PMINUB xmmreg1, xmmreg2        66h     0Fh    DAh          Double      FADD/FMUL  2        1/1
PMINUB xmmreg, mem128          66h     0Fh    DAh          Double      FADD/FMUL  4        1/1
PMOVMSKB reg32/64, xmmreg      66h     0Fh    D7h          VectorPath  FADD/FMUL  3        1/1
Notes:
1. The low half of the result is available one cycle earlier than listed.
2. This is the execution latency for the instruction. The time to complete the external write depends on the memory
speed and the hardware implementation.
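
As a rough illustration of how the Latency and Throughput columns can be combined, the following C sketch estimates cycle counts for a run of identical instructions. It assumes that a throughput of 1/2 means one instruction can start every two cycles, and it ignores decode limits, pipe contention, and memory effects, so the results are lower bounds at best. The names row, chain_cycles, and stream_cycles are hypothetical and do not come from this guide.

```c
#include <assert.h>

/* One table row: latency in cycles, throughput as thr_num instructions
   per thr_den cycles (assumed reading of the Throughput column). */
struct row {
    const char *name;
    int latency;
    int thr_num;
    int thr_den;
};

/* Values copied from the register forms in Table 19. */
static const struct row pmaddwd_reg = { "PMADDWD xmmreg1, xmmreg2", 4, 1, 2 };
static const struct row pmaxsw_reg  = { "PMAXSW xmmreg1, xmmreg2",  2, 1, 1 };

/* Dependent chain: each instruction needs the previous result,
   so the latencies add up. */
static long chain_cycles(const struct row *r, long n)
{
    assert(n >= 0);
    return n * r->latency;
}

/* Independent instructions: limited by the issue rate; the last
   result arrives one latency after the last instruction starts. */
static long stream_cycles(const struct row *r, long n)
{
    assert(n >= 0 && r->thr_num > 0);
    if (n == 0)
        return 0;
    /* Cycles until the n-th instruction can start, rounded up. */
    long issue = ((n - 1) * r->thr_den + r->thr_num - 1) / r->thr_num;
    return issue + r->latency;
}
```

Under these assumptions, eight dependent PMADDWD instructions cost about 32 cycles, while eight independent ones cost about 18, which is why interleaving independent work between dependent multiply-adds tends to pay off.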