AMD x86 manual 3DNow! and MMX Intra-Operand Swapping, AMD Athlon, Specific Code, Blended Code

AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

Example:

PXOR       MM2, MM2     ; 0    0    0    0
MOVD       MM0, [ab]    ; 0    0    b    a
MOVD       MM1, [cd]    ; 0    0    d    c
PUNPCKLWD  MM0, MM2     ; 0    b    0    a
PUNPCKLWD  MM1, MM2     ; 0    d    0    c
PMADDWD    MM0, MM1     ; b*d       a*c
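The lane arithmetic in the example above can be checked with a small C model. This is only a sketch: the `mmx_t` type and the helper names `punpcklwd`, `pmaddwd`, and `mul_pairs` are hypothetical stand-ins for the instructions, not part of any AMD library, and the model assumes the usual two's-complement conversion between `uint16_t` and `int16_t`.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit MMX register as four 16-bit words;
   w[0] is the least significant word. */
typedef struct { uint16_t w[4]; } mmx_t;

/* PUNPCKLWD dst, src: interleave the two low words of dst and src. */
static mmx_t punpcklwd(mmx_t dst, mmx_t src) {
    mmx_t r = {{ dst.w[0], src.w[0], dst.w[1], src.w[1] }};
    return r;
}

/* PMADDWD dst, src: multiply signed words lane by lane, then add each
   adjacent pair of 32-bit products. out[0] is the low doubleword. */
static void pmaddwd(mmx_t dst, mmx_t src, int32_t out[2]) {
    for (int i = 0; i < 2; i++) {
        int32_t lo = (int32_t)(int16_t)dst.w[2 * i]     * (int16_t)src.w[2 * i];
        int32_t hi = (int32_t)(int16_t)dst.w[2 * i + 1] * (int16_t)src.w[2 * i + 1];
        /* Add in unsigned arithmetic so the sum wraps the way the hardware does. */
        out[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}

/* The six-instruction sequence: out[0] = a*c, out[1] = b*d. */
static void mul_pairs(int16_t a, int16_t b, int16_t c, int16_t d, int32_t out[2]) {
    mmx_t mm2 = {{ 0, 0, 0, 0 }};                       /* PXOR MM2, MM2      */
    mmx_t mm0 = {{ (uint16_t)a, (uint16_t)b, 0, 0 }};   /* MOVD MM0, [ab]     */
    mmx_t mm1 = {{ (uint16_t)c, (uint16_t)d, 0, 0 }};   /* MOVD MM1, [cd]     */
    mm0 = punpcklwd(mm0, mm2);                          /* 0 b 0 a            */
    mm1 = punpcklwd(mm1, mm2);                          /* 0 d 0 c            */
    pmaddwd(mm0, mm1, out);                             /* b*d in high, a*c in low */
}
```

Because the unpack places a zero word above each operand, one product in every PMADDWD pair is zero, so each 32-bit result holds a single signed 16x16 product.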

3DNow!™ and MMX™ Intra-Operand Swapping

AMD Athlon™ Specific Code

If the swapping of MMX register halves is necessary, use the PSWAPD instruction, which is a new AMD Athlon 3DNow! DSP extension. Use this instruction only in AMD Athlon specific code. “PSWAPD MMreg1, MMreg2” performs the following operation:

mmreg1[63:32] = mmreg2[31:0]
mmreg1[31:0]  = mmreg2[63:32]

See the AMD Extensions to the 3DNow! and MMX Instruction Set Manual, order #22466, for more usage information.
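The PSWAPD operation described above can be modeled on a plain 64-bit integer. The following is a minimal sketch, assuming the register is represented as a `uint64_t`; the function name `pswapd` is a hypothetical helper, not a compiler intrinsic.

```c
#include <stdint.h>

/* Model of "PSWAPD MMreg1, MMreg2": returns the value written to mmreg1.
   The upper doubleword of the result is the lower doubleword of the
   source, and the lower doubleword is the source's upper doubleword. */
static uint64_t pswapd(uint64_t mmreg2) {
    uint64_t lo = mmreg2 & 0xFFFFFFFFu;   /* mmreg2[31:0]  */
    uint64_t hi = mmreg2 >> 32;           /* mmreg2[63:32] */
    return (lo << 32) | hi;
}
```

Applying the model twice returns the original value, which is a quick sanity check on any replacement sequence intended to match PSWAPD.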

Blended Code

Otherwise, for blended code, which needs to run well on AMD-K6 and AMD Athlon family processors, the following code is recommended:

Example 1 (Preferred, faster):

;MM1 = SWAP (MM0), MM0 destroyed
MOVQ       MM1, MM0     ;make a copy
PUNPCKLDQ  MM0, MM0     ;duplicate lower half
PUNPCKHDQ  MM1, MM0     ;combine lower halves

Example 2 (Preferred, fast):

;MM1 = SWAP (MM0), MM0 preserved
MOVQ       MM1, MM0     ;make a copy
PUNPCKHDQ  MM1, MM1     ;duplicate upper half
PUNPCKLDQ  MM1, MM0     ;combine upper halves

Both examples accomplish the swapping, but use the first example if the original contents of the register do not need to be preserved. The first example is faster because the MOVQ and PUNPCKLDQ instructions can execute in parallel. The instructions in the second example depend on one another and take longer to execute.
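Both blended-code sequences can be traced with a C model of the two unpack instructions. This is a sketch under stated assumptions: registers are modeled as `uint64_t` values, and `punpckldq`, `punpckhdq`, `swap_example1`, and `swap_example2` are hypothetical helper names chosen for illustration. In the comments, H and L denote the upper and lower doublewords of the original MM0.

```c
#include <stdint.h>

/* PUNPCKLDQ dst, src: result = [src low : dst low]. */
static uint64_t punpckldq(uint64_t dst, uint64_t src) {
    return (src << 32) | (dst & 0xFFFFFFFFu);
}

/* PUNPCKHDQ dst, src: result = [src high : dst high]. */
static uint64_t punpckhdq(uint64_t dst, uint64_t src) {
    return (src & 0xFFFFFFFF00000000ULL) | (dst >> 32);
}

/* Example 1: returns MM1 = SWAP(MM0); *mm0 is overwritten with [L:L]. */
static uint64_t swap_example1(uint64_t *mm0) {
    uint64_t mm1 = *mm0;               /* MOVQ      MM1, MM0 -> [H:L] */
    *mm0 = punpckldq(*mm0, *mm0);      /* PUNPCKLDQ MM0, MM0 -> [L:L] */
    mm1 = punpckhdq(mm1, *mm0);        /* PUNPCKHDQ MM1, MM0 -> [L:H] */
    return mm1;
}

/* Example 2: returns MM1 = SWAP(MM0); *mm0 is left unchanged. */
static uint64_t swap_example2(const uint64_t *mm0) {
    uint64_t mm1 = *mm0;               /* MOVQ      MM1, MM0 -> [H:L] */
    mm1 = punpckhdq(mm1, mm1);         /* PUNPCKHDQ MM1, MM1 -> [H:H] */
    mm1 = punpckldq(mm1, *mm0);        /* PUNPCKLDQ MM1, MM0 -> [L:H] */
    return mm1;
}
```

The trace also shows where the dependency difference comes from: in the first sequence the copy and the PUNPCKLDQ both read only the original MM0, whereas in the second sequence each instruction consumes the MM1 value produced by the one before it.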
