Software Optimization Guide for AMD64 Processors

25112 Rev. 3.06 September 2005

Table 14. MMX™ Technology Instructions (Continued)

Syntax                    Prefix byte  First byte  ModRM byte  Decode type  FPU pipe(s)  Latency  Note
PADDSW mmreg1, mmreg2     0Fh          EDh         11-xxx-xxx  DirectPath   FADD/FMUL    2
PADDSW mmreg, mem64       0Fh          EDh         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PADDUSB mmreg1, mmreg2    0Fh          DCh         11-xxx-xxx  DirectPath   FADD/FMUL    2
PADDUSB mmreg, mem64      0Fh          DCh         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PADDUSW mmreg1, mmreg2    0Fh          DDh         11-xxx-xxx  DirectPath   FADD/FMUL    2
PADDUSW mmreg, mem64      0Fh          DDh         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PADDW mmreg1, mmreg2      0Fh          FDh         11-xxx-xxx  DirectPath   FADD/FMUL    2
PADDW mmreg, mem64        0Fh          FDh         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PAND mmreg1, mmreg2       0Fh          DBh         11-xxx-xxx  DirectPath   FADD/FMUL    2
PAND mmreg, mem64         0Fh          DBh         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PANDN mmreg1, mmreg2      0Fh          DFh         11-xxx-xxx  DirectPath   FADD/FMUL    2
PANDN mmreg, mem64        0Fh          DFh         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PCMPEQB mmreg1, mmreg2    0Fh          74h         11-xxx-xxx  DirectPath   FADD/FMUL    2
PCMPEQB mmreg, mem64      0Fh          74h         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PCMPEQD mmreg1, mmreg2    0Fh          76h         11-xxx-xxx  DirectPath   FADD/FMUL    2
PCMPEQD mmreg, mem64      0Fh          76h         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PCMPEQW mmreg1, mmreg2    0Fh          75h         11-xxx-xxx  DirectPath   FADD/FMUL    2
PCMPEQW mmreg, mem64      0Fh          75h         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PCMPGTB mmreg1, mmreg2    0Fh          64h         11-xxx-xxx  DirectPath   FADD/FMUL    2
PCMPGTB mmreg, mem64      0Fh          64h         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PCMPGTD mmreg1, mmreg2    0Fh          66h         11-xxx-xxx  DirectPath   FADD/FMUL    2
PCMPGTD mmreg, mem64      0Fh          66h         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PCMPGTW mmreg1, mmreg2    0Fh          65h         11-xxx-xxx  DirectPath   FADD/FMUL    2
PCMPGTW mmreg, mem64      0Fh          65h         mm-xxx-xxx  DirectPath   FADD/FMUL    4
PMADDWD mmreg1, mmreg2    0Fh          F5h         11-xxx-xxx  DirectPath   FMUL         3
PMADDWD mmreg, mem64      0Fh          F5h         mm-xxx-xxx  DirectPath   FMUL         5
PMULHW mmreg1, mmreg2     0Fh          E5h         11-xxx-xxx  DirectPath   FMUL         3
PMULHW mmreg, mem64       0Fh          E5h         mm-xxx-xxx  DirectPath   FMUL         5
PMULLW mmreg1, mmreg2     0Fh          D5h         11-xxx-xxx  DirectPath   FMUL         3
PMULLW mmreg, mem64       0Fh          D5h         mm-xxx-xxx  DirectPath   FMUL         5

Notes:

1. Bits 2, 1, and 0 of the ModRM byte select the integer register.

2. These instructions have the effective latency shown in the table. However, each of them also generates an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can execute at a rate of three per cycle and can use any of the three execution resources.

304

Instruction Latencies

Appendix C
