22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Ensure DF=0 (UP)

Always make sure that DF = 0 (UP) (after execution of CLD) for REP MOVS and REP STOS. DF = 1 (DOWN) is only needed for certain cases of overlapping REP MOVS (for example, source and destination overlap).

While string instructions with DF = 1 (DOWN) are slower, only the overhead part of the cycle equation is larger and not the throughput part. See Table 1, “Latency of Repeated String Instructions,” on page 84 for additional latency numbers.
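The direction choice can be illustrated at the C level. The following is a minimal sketch, not code from the guide: the function name copy_overlap_safe is hypothetical, and the two loops only mirror what an ascending (DF = 0) and a descending (DF = 1) REP MOVSB would do. The compiler decides which instructions are actually emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the guide): copies n bytes between two
 * regions that may overlap, picking the copy direction accordingly. */
void copy_overlap_safe(unsigned char *dst, const unsigned char *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (d <= s || d >= s + n) {
        /* Ascending copy, analogous to DF = 0 (UP). Safe whenever the
         * destination does not start inside the source region. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        /* Destination starts inside the source region: copy from the
         * last byte down, analogous to DF = 1 (DOWN). */
        for (size_t i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}
```

Only the second branch corresponds to the case where the text says DF = 1 is needed. Every other copy takes the ascending path.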

Align Source and Destination with Operand Size

For REP MOVS, make sure that both source and destination are aligned with regard to the operand size. Handle the end case separately, if necessary. If either source or destination cannot be aligned, make the destination aligned and the source misaligned. For REP STOS, make the destination aligned.
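As a rough illustration of "align the destination, handle the end case separately", the sketch below copies leading bytes until the destination is 4-byte aligned, moves the bulk in 32-bit units, and finishes the tail byte by byte. The function name copy_dst_aligned and the 4-byte operand size are assumptions made for this example, and the regions are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the guide): non-overlapping copy that
 * keeps the destination aligned to a 4-byte operand size. */
void copy_dst_aligned(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Head: single bytes until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: one 32-bit operand per step. The destination is aligned here;
     * the source may still be misaligned, so memcpy is used for the
     * accesses to stay within defined C behavior. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, src, sizeof word);
        memcpy(dst, &word, sizeof word);
        src += 4;
        dst += 4;
        n -= 4;
    }

    /* Tail (the "end case"): remaining 0 to 3 bytes. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

In assembly, the body loop is the part that a REP MOVSD would cover. The head and tail correspond to the separately handled alignment and end cases.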

Inline REP String with Low Counts

Expand REP string instructions into equivalent sequences of simple x86 instructions, if the repeat count is constant and less than eight. Use an inline sequence of loads and stores to accomplish the move. Use a sequence of stores to emulate REP STOS. This technique eliminates the setup overhead of REP instructions and increases instruction throughput.
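The expansion can be sketched in C for a constant count of four 32-bit words. The names copy4_dwords and fill4_dwords are hypothetical, and the count of four is simply one value below the limit of eight given in the text. Each statement stands in for one simple load or store instruction.

```c
#include <stdint.h>

/* Hypothetical helper: straight-line replacement for a REP MOVSD
 * with a constant count of 4 (loads first, then stores). */
static inline void copy4_dwords(uint32_t *dst, const uint32_t *src)
{
    uint32_t a = src[0];
    uint32_t b = src[1];
    uint32_t c = src[2];
    uint32_t d = src[3];
    dst[0] = a;
    dst[1] = b;
    dst[2] = c;
    dst[3] = d;
}

/* Hypothetical helper: straight-line replacement for a REP STOSD
 * with a constant count of 4 (stores only). */
static inline void fill4_dwords(uint32_t *dst, uint32_t value)
{
    dst[0] = value;
    dst[1] = value;
    dst[2] = value;
    dst[3] = value;
}
```

There is no counter and no loop branch, so no per-call setup cost remains. That is the effect the guideline is after.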

Use Loop for REP String with Low Variable Counts

If the repeat count is variable, but is likely to be less than eight, use a simple loop to move/store the data. This technique avoids the overhead of REP MOVS and REP STOS.
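For a variable count, a plain loop might look like the sketch below. The names copy_small and fill_small are hypothetical, and the 32-bit element type is an assumption made for the example. Note that an optimizing compiler may rewrite such loops on its own (for instance into a library call), so in practice the generated code should be checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: simple loop in place of REP MOVSD for a small,
 * variable count. Regions are assumed not to overlap. */
void copy_small(uint32_t *dst, const uint32_t *src, size_t count)
{
    while (count > 0) {
        *dst++ = *src++;
        count--;
    }
}

/* Hypothetical helper: simple loop in place of REP STOSD for a small,
 * variable count. */
void fill_small(uint32_t *dst, uint32_t value, size_t count)
{
    while (count > 0) {
        *dst++ = value;
        count--;
    }
}
```

A caller that expects mostly short runs would use these directly and reserve the REP form for larger counts.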

Using MOVQ and MOVNTQ for Block Copy/Fill

To fill or copy blocks of data that are larger than 512 bytes, or where the destination is in uncacheable memory, it is recommended to use the MMX instructions MOVQ/MOVNTQ instead of REP STOS and REP MOVS in order to achieve maximum performance. (See the guideline, “Use MMX™ Instructions for Block Copies and Block Fills” on page 115.)

Repeated String Instruction Usage

85
