22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Example 2 (Preferred):

    05 78 56 34 12    add  eax, 12345678h   ;uses single-byte opcode
                                            ; form with no ModRM byte
    83 C3 FB          add  ebx, -5          ;uses 8-bit sign
                                            ; extended immediate
    74 05             jz   $label1          ;uses 1-byte opcode,
                                            ; 8-bit displacement
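
The byte counts in the listing follow from a simple "pick the shortest form" rule. The C sketch below is illustrative only. The function names encode_add_r32_imm and encode_jz, the REG_* constants, and the restriction to register destinations are assumptions made for this example and are not any assembler's API. For ADD it prefers the sign-extended 8-bit immediate form (83 /0 ib, 3 bytes), then the EAX-only form (05 id, 5 bytes), and otherwise falls back to the general form (81 /0 id, 6 bytes). For JZ it prefers 74 cb (2 bytes) over 0F 84 cd (6 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModRM rm field (hypothetical names). */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Store a 32-bit value in little-endian byte order; returns bytes written. */
static size_t put_le32(uint8_t *buf, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        buf[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

/* Encode "add reg32, imm" using the shortest form; returns its length. */
size_t encode_add_r32_imm(uint8_t *buf, int reg, int32_t imm)
{
    assert(reg >= 0 && reg <= 7);
    if (imm >= -128 && imm <= 127) {
        /* 83 /0 ib: ModRM + 8-bit immediate, sign-extended by the CPU */
        buf[0] = 0x83;
        buf[1] = (uint8_t)(0xC0 | reg);   /* mod=11, reg=000 (ADD), rm=reg */
        buf[2] = (uint8_t)imm;
        return 3;
    }
    if (reg == REG_EAX) {
        /* 05 id: accumulator-only opcode, no ModRM byte */
        buf[0] = 0x05;
        return 1 + put_le32(buf + 1, (uint32_t)imm);
    }
    /* 81 /0 id: general opcode with a ModRM byte */
    buf[0] = 0x81;
    buf[1] = (uint8_t)(0xC0 | reg);
    return 2 + put_le32(buf + 2, (uint32_t)imm);
}

/* Encode "jz" given a displacement already measured from the end of the
   chosen form (simplification: a real assembler must resolve this). */
size_t encode_jz(uint8_t *buf, int32_t disp)
{
    if (disp >= -128 && disp <= 127) {
        buf[0] = 0x74;                    /* 74 cb: 1-byte opcode, rel8 */
        buf[1] = (uint8_t)disp;
        return 2;
    }
    buf[0] = 0x0F;                        /* 0F 84 cd: 2-byte opcode, rel32 */
    buf[1] = 0x84;
    return 2 + put_le32(buf + 2, (uint32_t)disp);
}
```

For example, encode_add_r32_imm(buf, REG_EBX, -5) writes 83 C3 FB and returns 3, which matches the second line of the listing. The 8-bit test comes first because even for EAX a small immediate is shorter as 83 C0 ib (3 bytes) than as 05 id (5 bytes).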

Avoid Partial Register Reads and Writes

In order to handle partial register writes, the AMD Athlon processor execution core implements a data-merging scheme.

In the execution unit, an instruction writing a partial register merges the modified portion with the current state of the remainder of the register. Therefore, the dependency hardware can potentially force a false dependency on the most recent instruction that writes to any part of the register.

Example 1 (Avoid):

    MOV  AL, 10    ;inst 1
    MOV  AH, 12    ;inst 2 has a false dependency on
                   ; inst 1
                   ;inst 2 merges new AH with current
                   ; EAX register value forwarded
                   ; by inst 1
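
The merge can be modeled as plain bit arithmetic. In the C sketch below, merge_byte and example1 are hypothetical names. The model ignores timing entirely and treats the immediates 10 and 12 as decimal. Writing AL or AH yields a new EAX value that is computed from the old one. Because of this, inst 2 cannot produce its result until inst 1 has forwarded EAX, even though the two instructions touch different bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Merge an 8-bit write into a 32-bit register value: byte 0 models AL,
   byte 1 models AH. Bits outside the written byte come from old_reg. */
uint32_t merge_byte(uint32_t old_reg, unsigned byte_index, uint8_t value)
{
    assert(byte_index == 0 || byte_index == 1);   /* only AL/AH-style bytes */
    uint32_t shift = 8u * byte_index;
    uint32_t mask = 0xFFu << shift;
    return (old_reg & ~mask) | ((uint32_t)value << shift);
}

/* Replay Example 1 from an arbitrary starting EAX value. */
uint32_t example1(uint32_t eax_before)
{
    uint32_t eax = merge_byte(eax_before, 0, 10);   /* inst 1: MOV AL, 10 */
    /* inst 2: MOV AH, 12 takes inst 1's EAX as an input, hence the dependency */
    eax = merge_byte(eax, 1, 12);
    return eax;
}
```

For instance, example1(0xDEADBEEF) returns 0xDEAD0C0A. The low two bytes come from inst 1 and inst 2, and the upper 16 bits are carried over from the value EAX held before inst 1.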

In addition, an instruction that has a read dependency on any part of a given architectural register has a read dependency on the most recent instruction that modifies any part of the same architectural register.

Example 2 (Avoid):

    MOV  BX, 12h    ;inst 1
    MOV  BL, DL     ;inst 2, false dependency on
                    ; completion of inst 1
    MOV  BH, CL     ;inst 3, false dependency on
                    ; completion of inst 2
    MOV  AL, BL     ;inst 4, depends on completion of
                    ; inst 2
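
The read-dependency rule stated before Example 2 can be expressed as a last-writer table indexed by architectural register. The C sketch below is a simplified model and does not describe the actual Athlon scheduler. The insn layout, the find_deps function, and the limit of one source and one destination per instruction are all assumptions. In this model, a read of any part of a register waits for that register's last writer. A partial write also waits for that writer, because it merges with the old value. Instructions are numbered from 0, so index 0 is inst 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architectural register ids for this sketch. */
enum arch_reg { EAX, EBX, ECX, EDX, NUM_REGS };

/* Simplified instruction: at most one source and one destination, each
   recorded only by its architectural register (BL, BH and BX all map to EBX). */
struct insn {
    int dst;          /* register written, or -1 for none */
    int dst_partial;  /* nonzero if only part of dst is written (AL, BX, ...) */
    int src;          /* register read, or -1 for none (e.g. an immediate) */
};

/* Example 2 from the text, in program order (index 0 is inst 1). */
const struct insn example2[4] = {
    { EBX, 1, -1  },  /* MOV BX, 12h */
    { EBX, 1, EDX },  /* MOV BL, DL  */
    { EBX, 1, ECX },  /* MOV BH, CL  */
    { EAX, 1, EBX },  /* MOV AL, BL  */
};

/* For each instruction i, deps[i][0] is the instruction the source read
   waits for and deps[i][1] the one a partial write merges with; -1 means
   no earlier instruction in the sequence. */
void find_deps(const struct insn *code, size_t n, int deps[][2])
{
    int last_writer[NUM_REGS];
    for (int r = 0; r < NUM_REGS; r++)
        last_writer[r] = -1;

    for (size_t i = 0; i < n; i++) {
        assert(code[i].dst < NUM_REGS && code[i].src < NUM_REGS);

        /* A read of any part of a register waits for its last writer. */
        deps[i][0] = code[i].src >= 0 ? last_writer[code[i].src] : -1;

        /* A partial write merges with the old value, so it waits too. */
        deps[i][1] = (code[i].dst >= 0 && code[i].dst_partial)
                         ? last_writer[code[i].dst] : -1;

        /* This instruction is now the most recent writer of dst. */
        if (code[i].dst >= 0)
            last_writer[code[i].dst] = (int)i;
    }
}
```

Running find_deps on example2 reports that inst 2 waits for inst 1 and that inst 3 waits for inst 2. If the rule is applied literally, inst 4 waits for inst 3, because inst 3 is the most recent writer of EBX. Since inst 3 in turn waits for inst 2, inst 4 also cannot complete before inst 2.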

 
