Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 14. MMX™ Technology Instructions (Continued)

 

 

Encoding

Decode

 

 

 

Syntax

 

 

 

FPU pipe(s)

Latency

Note

Prefix

First

 

ModRM byte

type

 

 

 

 

 

byte

byte

 

 

 

 

 

PSUBSB mmreg1, mmreg2

0Fh

E8h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

 

 

 

 

 

 

 

 

PSUBSB mmreg, mem64

0Fh

E8h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PSUBSW mmreg1, mmreg2

0Fh

E9h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

 

 

 

 

 

 

 

 

PSUBSW mmreg, mem64

0Fh

E9h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PSUBUSB mmreg1, mmreg2

0Fh

D8h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

 

 

 

 

 

 

 

 

PSUBUSB mmreg, mem64

0Fh

D8h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PSUBUSW mmreg1, mmreg2

0Fh

D9h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

 

 

 

 

 

 

 

 

PSUBUSW mmreg, mem64

0Fh

D9h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PSUBW mmreg1, mmreg2

0Fh

F9h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

 

 

 

 

 

 

 

 

PSUBW mmreg, mem64

0Fh

F9h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PUNPCKHBW mmreg1,

0Fh

68h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

mmreg2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PUNPCKHBW mmreg, mem64

0Fh

68h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PUNPCKHDQ mmreg1,

0Fh

6Ah

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

mmreg2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PUNPCKHDQ mmreg, mem64

0Fh

6Ah

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PUNPCKHWD mmreg1,

0Fh

69h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

mmreg2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PUNPCKHWD mmreg, mem64

0Fh

69h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PUNPCKLBW mmreg1,

0Fh

60h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

mmreg2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PUNPCKLBW mmreg, mem64

0Fh

60h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PUNPCKLDQ mmreg1,

0Fh

62h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

mmreg2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PUNPCKLDQ mmreg, mem64

0Fh

62h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PUNPCKLWD mmreg1,

0Fh

61h

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

mmreg2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PUNPCKLWD mmreg, mem64

0Fh

61h

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

PXOR mmreg1, mmreg2

0Fh

EFh

11-xxx-xxx

DirectPath

FADD/FMUL

2

 

 

 

 

 

 

 

 

 

PXOR mmreg, mem64

0Fh

EFh

mm-xxx-xxx

DirectPath

FADD/FMUL

4

 

 

 

 

 

 

 

 

 

Notes:

1. Bits 2, 1, and 0 of the ModRM byte select the integer register.

2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.

306

Instruction Latencies

Appendix C

Page 322
Image 322
AMD 250 manual 306