22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Use MMX™ Instructions for Block Copies and Block Fills

AMD-K6® and AMD Athlon™ Processor Blended Code

For moving or filling small blocks of data (e.g., less than 512 bytes) between cacheable memory areas, the REP MOVS and REP STOS families of instructions deliver good performance and are straightforward to use. For larger blocks, or for blocks whose destination is in non-cacheable space, use MMX instructions and MMX extensions instead. The following examples all use quadword-aligned blocks of data. When memory blocks are not quadword aligned, additional code is required to handle the end cases.
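The end-case handling mentioned above can be sketched in portable C. This is only an illustration under stated assumptions, not code from this guide: the function name copy_with_end_cases is hypothetical, a fixed 8-byte memcpy stands in for an MMX quadword move, and only the destination is brought to a quadword boundary (the source may remain unaligned).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the guide): copies n bytes, moving whole
   quadwords only once the destination is 8-byte aligned. */
static void copy_with_end_cases(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Head: copy single bytes until the destination reaches an
       8-byte (quadword) boundary or the block is exhausted. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one quadword per iteration. The fixed-size memcpy is a
       stand-in for a quadword load/store pair. */
    while (n >= 8) {
        memcpy(d, s, 8);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: copy the remaining 0 to 7 bytes one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

A real MMX version would replace the body loop with an unrolled quadword loop and keep the head and tail loops for the unaligned ends.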

The following example code, written for the inline assembler of Microsoft Visual C, is suitable for moving/filling a large quadword-aligned block of data in the following situations:

Blended code, i.e., code that needs to perform well on both AMD Athlon and AMD-K6 family processors

AMD Athlon processor specific code where the destination is in cacheable memory and immediate data re-use of the data at the destination is expected

AMD-K6 family specific code where the destination is in non-cacheable memory
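The conditions above, together with the small-block guideline, can be restated as a small decision function. This is a sketch for illustration only: every name in it (target, copy_method, choose_copy_method, SMALL_BLOCK_LIMIT) is hypothetical, the 512-byte limit is the "e.g." figure from the text rather than a hard threshold, and METHOD_OTHER simply marks cases that this list does not cover.

```c
#include <stdbool.h>
#include <stddef.h>

/* All identifiers here are hypothetical; they only restate the text. */
enum target { TARGET_BLENDED, TARGET_ATHLON_ONLY, TARGET_K6_ONLY };

enum copy_method {
    METHOD_REP_MOVS,      /* small block between cacheable areas */
    METHOD_MMX_EXAMPLE_1, /* the MMX approach of Example 1 */
    METHOD_OTHER          /* situation not covered by the list above */
};

#define SMALL_BLOCK_LIMIT 512 /* example figure from the text, in bytes */

static enum copy_method choose_copy_method(enum target t, size_t size,
                                           bool src_cacheable,
                                           bool dst_cacheable,
                                           bool reuse_soon)
{
    /* Small blocks between cacheable areas: REP MOVS performs well. */
    if (size < SMALL_BLOCK_LIMIT && src_cacheable && dst_cacheable)
        return METHOD_REP_MOVS;

    switch (t) {
    case TARGET_BLENDED:
        /* Code that must run well on both processor families. */
        return METHOD_MMX_EXAMPLE_1;
    case TARGET_ATHLON_ONLY:
        /* Cacheable destination with immediate re-use expected. */
        return (dst_cacheable && reuse_soon) ? METHOD_MMX_EXAMPLE_1
                                             : METHOD_OTHER;
    case TARGET_K6_ONLY:
        /* Destination in non-cacheable memory. */
        return !dst_cacheable ? METHOD_MMX_EXAMPLE_1 : METHOD_OTHER;
    }
    return METHOD_OTHER;
}
```

For instance, choose_copy_method(TARGET_BLENDED, 4096, true, true, false) selects METHOD_MMX_EXAMPLE_1, while the same call with a size of 256 selects METHOD_REP_MOVS.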

Example 1:

/* block copy (source and destination QWORD aligned) */

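__asm {
    mov     eax, [src_ptr]      ; eax = source pointer
    mov     edx, [dst_ptr]      ; edx = destination pointer
    mov     ecx, [blk_size]     ; ecx = block size in bytes
    shr     ecx, 6              ; ecx = block size / 64
    align 16                    ; align the next instruction to 16 bytes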
