AMD x86 manual Repeated String Instruction Usage, Latency of Repeated String Instructions


AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

In addition, using MMX instructions increases the available parallelism. The AMD Athlon processor can issue three integer OPs and two MMX OPs per cycle.

Repeated String Instruction Usage

Latency of Repeated String Instructions

Table 1 shows the latency for repeated string instructions on the AMD Athlon processor.

Table 1. Latency of Repeated String Instructions

Instruction    ECX=0 (cycles)    DF = 0 (cycles)    DF = 1 (cycles)
REP MOVS       11                15 + (4/3*c)       25 + (4/3*c)
REP STOS       11                14 + (1*c)         24 + (1*c)
REP LODS       11                15 + (2*c)         15 + (2*c)
REP SCAS       11                15 + (5/2*c)       15 + (5/2*c)
REP CMPS       11                16 + (10/3*c)      16 + (10/3*c)

Note: c = value of ECX, (ECX > 0)

Table 1 lists the latencies with the direction flag (DF) = 0 (increment) and DF = 1 (decrement). These latencies assume aligned memory operands. Note that for MOVS and STOS, the overhead portion of the latency increases significantly (by 10 cycles) when DF = 1 (DOWN). However, string operations with DF = 1 are less commonly found. To determine the latency, evaluate the formula and round the result up to the nearest integer value.
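As a worked illustration of the formula-and-round-up rule, the following minimal Python sketch evaluates the Table 1 entries for a given ECX value. The names REP_LATENCY, ECX_ZERO_LATENCY, and rep_latency are hypothetical helpers introduced only for this example; the constants are copied from Table 1, and aligned memory operands are assumed, as in the table.

```python
import math
from fractions import Fraction

# Hypothetical lookup built from Table 1:
# instruction -> (overhead with DF = 0, overhead with DF = 1, cycles per iteration)
REP_LATENCY = {
    "MOVS": (15, 25, Fraction(4, 3)),
    "STOS": (14, 24, Fraction(1)),
    "LODS": (15, 15, Fraction(2)),
    "SCAS": (15, 15, Fraction(5, 2)),
    "CMPS": (16, 16, Fraction(10, 3)),
}

# Table 1 lists 11 cycles for every instruction when ECX = 0.
ECX_ZERO_LATENCY = 11


def rep_latency(instruction, ecx, df=0):
    """Estimate the latency in cycles of a REP string instruction per Table 1.

    instruction: one of "MOVS", "STOS", "LODS", "SCAS", "CMPS"
    ecx:         repeat count (the value c in Table 1)
    df:          direction flag, 0 (increment) or 1 (decrement)
    """
    if ecx < 0:
        raise ValueError("ecx must be non-negative")
    if df not in (0, 1):
        raise ValueError("df must be 0 or 1")
    if ecx == 0:
        return ECX_ZERO_LATENCY
    overhead_up, overhead_down, per_iteration = REP_LATENCY[instruction]
    overhead = overhead_down if df else overhead_up
    # Exact rational arithmetic, then round up to the nearest integer.
    return math.ceil(overhead + per_iteration * ecx)


print(rep_latency("MOVS", 100))        # 15 + 400/3 = 148.33..., rounds up to 149
print(rep_latency("MOVS", 100, df=1))  # 25 + 400/3 = 158.33..., rounds up to 159
```

Fractions are used instead of floating point so that the round-up step is not affected by rounding error in terms such as 4/3*c.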

Guidelines for Repeated String Instructions

 

To help achieve good performance, this section contains guidelines for the careful scheduling of VectorPath repeated string instructions.

Use the Largest Possible Operand Size

Always move data using the largest operand size possible. For example, use REP MOVSD rather than REP MOVSW and REP MOVSW rather than REP MOVSB. Use REP STOSD rather than REP STOSW and REP STOSW rather than REP STOSB.
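To make the benefit concrete, here is a rough Python sketch that estimates the cost of copying the same number of bytes with each operand size. It uses the REP MOVS entry of Table 1 with DF = 0, 15 + (4/3*c) cycles rounded up, where c is the ECX repeat count. It assumes that the Table 1 formula applies unchanged to the byte, word, and doubleword forms and that the operands are aligned; the function name rep_movs_cycles is hypothetical.

```python
import math
from fractions import Fraction


def rep_movs_cycles(byte_count, operand_size):
    """Estimate cycles to copy byte_count bytes with REP MOVS (DF = 0).

    operand_size is 1 (MOVSB), 2 (MOVSW), or 4 (MOVSD). Assumes the
    Table 1 formula 15 + (4/3*c) holds for every operand size.
    """
    if operand_size not in (1, 2, 4):
        raise ValueError("operand_size must be 1, 2, or 4")
    if byte_count < 0 or byte_count % operand_size:
        raise ValueError("byte_count must be a non-negative multiple of operand_size")
    c = byte_count // operand_size  # value loaded into ECX
    if c == 0:
        return 11  # ECX = 0 column of Table 1
    return math.ceil(15 + Fraction(4, 3) * c)


# Compare the three forms for a 1024-byte copy.
for name, size in (("REP MOVSB", 1), ("REP MOVSW", 2), ("REP MOVSD", 4)):
    print(name, rep_movs_cycles(1024, size))
```

Under these assumptions, a 1024-byte copy is estimated at 1381 cycles with REP MOVSB, 698 with REP MOVSW, and 357 with REP MOVSD, because each doubling of the operand size halves the repeat count c while the fixed overhead stays the same.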
