25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

4.7 Partial-Register Reads and Writes

Optimization

Avoid partial register reads and writes.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

In order to handle partial register writes, the processor’s execution core implements a data merging scheme.

In the execution unit, an instruction that writes part of a register merges the modified portion with the current state of the other part of the register. Therefore, the dependency hardware can potentially force a false dependency on the most recent instruction that writes to any part of the register.

In addition, an instruction that has a read dependency on any part of a given architectural register has a read dependency on the most recent instruction that modifies any part of the same architectural register.

Example 1

Avoid code such as the following, which writes to only part of a register:

mov al, 10    ; Instruction 1
mov ah, 12    ; Instruction 2 has a false dependency on instruction 1.
              ; Instruction 2 merges new AH with current EAX register
              ; value forwarded by instruction 1.
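One way to sidestep the merge in Example 1, offered here as a sketch and not taken from the guide, is to set both bytes with a single full-width move. It assumes the upper 16 bits of EAX (and, in 64-bit mode, the upper 32 bits of RAX) are free to be overwritten with zero, because a write to EAX replaces the entire register.

```asm
mov eax, 0C0Ah   ; One full-register write: AL = 0Ah (10), AH = 0Ch (12).
                 ; Bits 31:16 of EAX become zero, so no merge with the
                 ; previous EAX value is required.
```

If the upper bits of EAX hold live data, this substitution is not equivalent to the original pair of byte moves.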

Example 2

Avoid code such as the following, which both reads and writes only parts of registers:

mov bx, 12h   ; Instruction 1
mov bl, dl    ; Instruction 2 has a false dependency on the completion
              ; of instruction 1.
mov bh, cl    ; Instruction 3 has a false dependency on the completion
              ; of instruction 2.
mov al, bl    ; Instruction 4 depends on the completion of instruction 2.
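A possible restructuring of Example 2, given as a sketch under stated assumptions and not as the guide's own recommendation, builds the values with full 32-bit destination writes so that no instruction has to merge into an older register value. It assumes the upper bits of EAX and EBX are not needed afterward, since they are zeroed here. The reads of CL and DL remain partial-register reads and still depend on the most recent writers of ECX and EDX.

```asm
movzx ebx, cl    ; EBX = CL, zero-extended (full-register write).
shl   ebx, 8     ; Move the CL value into bits 15:8 (the BH position).
movzx eax, dl    ; EAX = DL, zero-extended; AL now holds the DL value.
or    ebx, eax   ; BL = DL, BH = CL; upper 16 bits of EBX are zero.
```

The original instruction 1 is omitted because both bytes it writes are overwritten before being read. Whether this sequence is actually faster depends on the surrounding code and should be measured.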

Chapter 4

Instruction-Decoding Optimizations
