25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

C.3 MMX™ Technology Instructions

Table 14. MMX™ Technology Instructions

                         Encoding
Syntax                   Prefix  First  ModRM       Decode      FPU pipe(s)       Latency  Note
                         byte    byte   byte        type
-----------------------  ------  -----  ----------  ----------  ----------------  -------  ----
EMMS                     0Fh     77h                DirectPath  FADD/FMUL/FSTORE  6        2
MOVD mmreg, reg32        0Fh     6Eh    11-xxx-xxx  Double      -                 9        1
MOVD mmreg, reg64        0Fh     6Eh    11-xxx-xxx  Double      -                 9        1
MOVD mmreg, mem32        0Fh     6Eh    mm-xxx-xxx  DirectPath  FADD/FMUL/FSTORE  4        2
MOVD mmreg, mem64        0Fh     6Eh    mm-xxx-xxx  DirectPath  FADD/FMUL/FSTORE  4        2
MOVD reg32, mmreg        0Fh     7Eh    11-xxx-xxx  Double      -                 4        1
MOVD reg64, mmreg        0Fh     7Eh    11-xxx-xxx  Double      -                 4        1
MOVD mem32, mmreg        0Fh     7Eh    mm-xxx-xxx  DirectPath  FSTORE            2
MOVD mem64, mmreg        0Fh     7Eh    mm-xxx-xxx  DirectPath  FSTORE            2
MOVQ mmreg1, mmreg2      0Fh     6Fh    11-xxx-xxx  DirectPath  FADD/FMUL         2
MOVQ mmreg, mem64        0Fh     6Fh    mm-xxx-xxx  DirectPath  FADD/FMUL/FSTORE  4        2
MOVQ mmreg2, mmreg1      0Fh     7Fh    11-xxx-xxx  DirectPath  FADD/FMUL         2
MOVQ mem64, mmreg        0Fh     7Fh    mm-xxx-xxx  DirectPath  FSTORE            2
PACKSSDW mmreg1, mmreg2  0Fh     6Bh    11-xxx-xxx  DirectPath  FADD/FMUL         2
PACKSSDW mmreg, mem64    0Fh     6Bh    mm-xxx-xxx  DirectPath  FADD/FMUL         4
PACKSSWB mmreg1, mmreg2  0Fh     63h    11-xxx-xxx  DirectPath  FADD/FMUL         2
PACKSSWB mmreg, mem64    0Fh     63h    mm-xxx-xxx  DirectPath  FADD/FMUL         4
PACKUSWB mmreg1, mmreg2  0Fh     67h    11-xxx-xxx  DirectPath  FADD/FMUL         2
PACKUSWB mmreg, mem64    0Fh     67h    mm-xxx-xxx  DirectPath  FADD/FMUL         4
PADDB mmreg1, mmreg2     0Fh     FCh    11-xxx-xxx  DirectPath  FADD/FMUL         2
PADDB mmreg, mem64       0Fh     FCh    mm-xxx-xxx  DirectPath  FADD/FMUL         4
PADDD mmreg1, mmreg2     0Fh     FEh    11-xxx-xxx  DirectPath  FADD/FMUL         2
PADDD mmreg, mem64       0Fh     FEh    mm-xxx-xxx  DirectPath  FADD/FMUL         4
PADDSB mmreg1, mmreg2    0Fh     ECh    11-xxx-xxx  DirectPath  FADD/FMUL         2
PADDSB mmreg, mem64      0Fh     ECh    mm-xxx-xxx  DirectPath  FADD/FMUL         4


Notes:

  1. Bits 2, 1, and 0 of the ModRM byte select the integer register.

  2. These instructions have an effective latency as shown. However, these instructions generate an internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can be executed at a rate of three per cycle and can use any of the three execution resources.
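
The ModRM patterns in the table (11-xxx-xxx for register forms) and Note 1 can be illustrated with a small sketch. It packs and unpacks the three ModRM fields for the 32-bit register forms of MOVD. The helper names are hypothetical and not part of this guide. The sketch assumes the standard x86 convention that bits 5-3 of the ModRM byte select the MMX register, and the standard register numbers (EAX = 0, ECX = 1). The reg64 forms, which also need a REX prefix, are left out.

```python
# Illustrative sketch only: helper names are hypothetical.
# Opcode bytes (0Fh 6Eh / 0Fh 7Eh) come from Table 14.

def modrm(mod, reg, rm):
    """Pack the ModRM fields: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0)."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (reg << 3) | rm


def split_modrm(byte):
    """Unpack a ModRM byte into (mod, reg, rm)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def encode_movd_mm_from_reg32(mm, reg32):
    """MOVD mmreg, reg32: 0Fh 6Eh, ModRM = 11-mmreg-reg32."""
    return bytes([0x0F, 0x6E, modrm(0b11, mm, reg32)])


def encode_movd_reg32_from_mm(reg32, mm):
    """MOVD reg32, mmreg: 0Fh 7Eh, ModRM = 11-mmreg-reg32."""
    return bytes([0x0F, 0x7E, modrm(0b11, mm, reg32)])


# movd mm1, ecx  (assumed numbering: mm1 = 1, ecx = 1)
encoded = encode_movd_mm_from_reg32(1, 1)
mod, mm, int_reg = split_modrm(encoded[2])
print(encoded.hex(), mod, mm, int_reg)  # 0f6ec9 3 1 1
```

In both directions the integer register sits in bits 2-0, as Note 1 states. Only the first opcode byte after the 0Fh prefix (6Eh or 7Eh) changes with the direction of the move.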

 

 

 

 

 

 

 

 

 

 

 

 

Appendix C

Instruction Latencies
