AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

/* block fill (destination QWORD aligned) */

__asm {
    mov     edx, [dst_ptr]
    mov     ecx, [blk_size]
    shr     ecx, 6
    movq    mm0, [fill_data]
    align   16
$fill_nc:
    movntq  [edx], mm0
    movntq  [edx+8], mm0
    movntq  [edx+16], mm0
    movntq  [edx+24], mm0
    movntq  [edx+32], mm0
    movntq  [edx+40], mm0
    movntq  [edx+48], mm0
    movntq  [edx+56], mm0
    add     edx, 64
    dec     ecx
    jnz     $fill_nc
    femms
    sfence
}
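
As a rough illustration of the loop arithmetic in the listing above, the following portable C sketch models the same fill with ordinary stores. The function name block_fill_ref and its return value are hypothetical and are not part of the AMD listing. The sketch does not reproduce the non-temporal behavior of MOVNTQ, nor FEMMS or SFENCE; it only shows that the loop writes blk_size >> 6 chunks of 64 bytes, so any tail of fewer than 64 bytes is left untouched. Unlike the assembly loop, which tests its counter only after the first pass and therefore assumes blk_size is at least 64, the sketch simply writes nothing for smaller sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable reference model of the block fill loop (hypothetical helper,
 * not taken from the AMD guide). Ordinary stores stand in for MOVNTQ,
 * so only the chunking arithmetic is modeled, not the cache behavior. */
static size_t block_fill_ref(uint64_t *dst, size_t blk_size, uint64_t fill_data)
{
    assert(dst != NULL);

    /* Corresponds to "shr ecx, 6": number of whole 64-byte chunks. */
    size_t chunks = blk_size >> 6;

    for (size_t i = 0; i < chunks; i++) {
        /* Corresponds to the eight MOVNTQ stores at offsets 0..56. */
        for (int q = 0; q < 8; q++) {
            dst[q] = fill_data;
        }
        dst += 8; /* Corresponds to "add edx, 64" (8 QWORDs). */
    }

    /* Bytes actually written; the blk_size % 64 tail is not filled. */
    return chunks * 64;
}
```

Under these assumptions, calling block_fill_ref on a buffer with blk_size set to 130 would fill the first 128 bytes and leave the remaining 2 bytes for separate handling by the caller.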

 

Use MMX™ PXOR to Clear All Bits in an MMX™ Register

To clear all the bits in an MMX register to zero, use:

PXOR MMreg, MMreg

Note that PXOR MMreg, MMreg is dependent on previous writes to MMreg. Therefore, using PXOR in the manner described can lengthen dependency chains, which in turn may lead to reduced performance. An alternative in such cases is to use:

zero DD 0

MOVD MMreg, DWORD PTR [zero]

i.e., to load a zero from a statically initialized and properly aligned memory location. However, loading the data from memory runs the risk of cache misses. Cases where MOVD is superior to PXOR are therefore rare and PXOR should be used in general.
