22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Chapter 4: Instruction Decoding Optimizations

This chapter discusses ways to maximize the number of instructions decoded per cycle by the instruction decoders in the AMD Athlon™ processor. Guidelines are listed in order of importance.

Overview

The AMD Athlon processor instruction fetcher reads 16-byte aligned code windows from the instruction cache. The instruction bytes are then merged into a 24-byte instruction queue. On each cycle, the in-order front-end engine selects for decode up to three x86 instructions from the instruction-byte queue.
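The window arithmetic implied by the paragraph above can be sketched in C. This is only an illustration of the 16-byte alignment stated in the text: the helper names `fetch_window` and `spans_windows` are hypothetical, and the sketch makes no claim about the cost, if any, of an instruction that crosses a window boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the text: the fetcher reads 16-byte aligned code windows. */
#define FETCH_WINDOW_BYTES 16u

/* Index of the 16-byte aligned window that contains addr
 * (hypothetical helper, for illustration only). */
uint64_t fetch_window(uint64_t addr)
{
    return addr / FETCH_WINDOW_BYTES;
}

/* True if an instruction of len bytes starting at addr has its first
 * and last bytes in different 16-byte windows (hypothetical helper). */
bool spans_windows(uint64_t addr, size_t len)
{
    assert(len > 0);
    return fetch_window(addr) != fetch_window(addr + len - 1);
}
```

For example, a 3-byte instruction at address 0x100D lies entirely inside one window, while the same instruction at 0x100E has its last byte in the next window.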

All instructions (x86, x87, 3DNow!™, and MMX™) are classified into two decode types: DirectPath and VectorPath (see “DirectPath Decoder” and “VectorPath Decoder” on page 133 for more information). DirectPath instructions are common instructions that are decoded directly in hardware. VectorPath instructions are more complex instructions that are decoded into a sequence of multiple operations issued from an on-chip ROM.

Up to three DirectPath instructions can be selected for decode per cycle. Only one VectorPath instruction can be selected for decode per cycle. DirectPath and VectorPath instructions cannot be decoded in the same cycle.
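The selection rules above can be illustrated with a small C model that counts decode cycles for an in-order instruction stream. It is a deliberately simplified sketch, not a description of the actual hardware: it assumes each VectorPath instruction occupies exactly one decode cycle by itself, and it ignores fetch windows, the instruction queue, and instruction lengths. The names `decode_type` and `decode_cycles` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The two decode types named in the text. */
typedef enum { DIRECT_PATH, VECTOR_PATH } decode_type;

/* From the text: up to three DirectPath instructions per cycle. */
#define MAX_DIRECT_PER_CYCLE 3

/* Count decode cycles for an in-order stream under the simplified
 * model (assumption: a VectorPath instruction takes one cycle alone). */
size_t decode_cycles(const decode_type *insns, size_t n)
{
    assert(insns != NULL || n == 0);

    size_t cycles = 0;
    size_t i = 0;

    while (i < n) {
        if (insns[i] == VECTOR_PATH) {
            /* Only one VectorPath instruction per cycle, and it
             * cannot share the cycle with DirectPath instructions. */
            i++;
        } else {
            /* Group up to three consecutive DirectPath instructions. */
            size_t group = 0;
            while (i < n && insns[i] == DIRECT_PATH &&
                   group < MAX_DIRECT_PER_CYCLE) {
                i++;
                group++;
            }
        }
        cycles++;
    }
    return cycles;
}
```

Under this model, three DirectPath instructions decode in one cycle, whereas a DirectPath, VectorPath, DirectPath sequence takes three cycles, because the VectorPath instruction splits the DirectPath instructions into separate groups.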
