Microarchitecture for AMD Athlon™ 64 and AMD Opteron™ Processors Appendix A
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
A.7 Fetch-Decode Unit
The fetch-decode unit performs early decoding of AMD64 instructions into macro-ops. The outputs
of the early decoders keep all (DirectPath or VectorPath) instructions in program order. Early
decoding produces three macro-ops per cycle from either path. The outputs of both decoders are
multiplexed together and passed to the next stage in the pipeline, the instruction control unit.
Decoding a VectorPath instruction may prevent the simultaneous decoding of a DirectPath instruction.
When the target 16-byte instruction window is fetched from the L1 instruction cache, the instruction
bytes are examined to determine which type of basic decode, DirectPath or VectorPath, is to
occur.
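The interaction between the two decode paths can be illustrated with a toy cycle-count model. This is a simplified sketch under stated assumptions, not AMD's decoder logic: it assumes every DirectPath instruction yields exactly one macro-op, that up to three macro-ops are packed per cycle (the rate given above), and that a VectorPath instruction never shares a cycle with DirectPath instructions and emits its macro-ops three per cycle. The `Insn` type, `decode_cycles` function, and the 4-macro-op VectorPath instruction are hypothetical.

```python
import math
from dataclasses import dataclass

DECODE_WIDTH = 3  # macro-ops produced per cycle by early decode (from the text)


@dataclass(frozen=True)
class Insn:
    name: str
    vector_path: bool
    macro_ops: int = 1  # used only for VectorPath; DirectPath is assumed to yield 1


def decode_cycles(insns):
    """Return decode cycles for a program-ordered list of Insn objects (toy model)."""
    cycles = 0
    pending = 0  # DirectPath macro-ops packed into the current cycle
    for insn in insns:
        if insn.vector_path:
            # Assumption: a VectorPath instruction does not share a cycle with
            # DirectPath instructions, so close any partially filled cycle first.
            if pending:
                cycles += 1
                pending = 0
            cycles += math.ceil(insn.macro_ops / DECODE_WIDTH)
        else:
            pending += 1
            if pending == DECODE_WIDTH:
                cycles += 1
                pending = 0
    if pending:  # count the last partially filled cycle
        cycles += 1
    return cycles


add = Insn("add", vector_path=False)
vec = Insn("vectorpath_op", vector_path=True, macro_ops=4)  # hypothetical instruction
print(decode_cycles([add] * 6))             # 2
print(decode_cycles([add, add, vec, add]))  # 4
print(decode_cycles([add, add, add, vec]))  # 3
```

Under this model the same four instructions take one cycle less when the VectorPath instruction follows a full group of three DirectPath instructions, which is the kind of effect the text's "may prevent" remark hints at.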
A.8 Instruction Control Unit
The instruction control unit (ICU) is the control center for the AMD Athlon 64 and AMD Opteron
processors. It controls the centralized in-flight reorder buffer, the integer scheduler, and the
floating-point scheduler. The ICU is also responsible for the following functions: macro-op dispatch,
macro-op retirement, register and flag dependency resolution and renaming, execution resource
management, interrupts, exceptions, and branch mispredictions.
The instruction control unit takes the three macro-ops per cycle from the early decoders and places
them in a centralized, fixed-issue reorder buffer. This buffer is organized into 24 lines of three
macro-ops each. The reorder buffer allows the instruction control unit to track and monitor up to 72 in-flight
macro-ops (whether integer or floating-point) for maximum instruction throughput. The instruction
control unit can simultaneously dispatch multiple macro-ops from the reorder buffer to both the
integer and floating-point schedulers for final decode, issue, and execution as micro-ops. In addition,
the instruction control unit handles exceptions and manages the retirement of macro-ops.
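The capacity arithmetic (24 lines × 3 macro-ops = 72 in-flight macro-ops) and in-order retirement can be sketched as a small queue of lines. This is a minimal model under assumptions not stated in the text: each dispatch group of one to three macro-ops occupies one whole line, and a line retires only when every macro-op in it has completed. Per-cycle retirement bandwidth is not modeled. The `ReorderBuffer` class and its methods are hypothetical.

```python
from collections import deque

LINES = 24
OPS_PER_LINE = 3
CAPACITY = LINES * OPS_PER_LINE  # 72 in-flight macro-ops, as stated in the text


class ReorderBuffer:
    """Toy reorder buffer: lines of up to three macro-ops, retired in program order."""

    def __init__(self):
        self.lines = deque()  # each line: dict of macro-op tag -> completed flag

    def dispatch(self, group):
        """Allocate one line for a group of 1-3 macro-ops; return False if full."""
        if not 1 <= len(group) <= OPS_PER_LINE:
            raise ValueError("a dispatch group holds 1 to 3 macro-ops")
        if len(self.lines) == LINES:
            return False  # buffer full: decode would have to stall
        self.lines.append({tag: False for tag in group})
        return True

    def complete(self, tag):
        """Mark a macro-op as executed; completion may happen out of order."""
        for line in self.lines:
            if tag in line:
                line[tag] = True
                return
        raise KeyError(tag)

    def retire(self):
        """Retire every fully completed line at the head, oldest first."""
        retired = []
        while self.lines and all(self.lines[0].values()):
            retired.extend(self.lines.popleft())  # dict keys keep insertion order
        return retired

    def in_flight(self):
        return sum(len(line) for line in self.lines)


rob = ReorderBuffer()
for i in range(LINES):
    rob.dispatch([3 * i, 3 * i + 1, 3 * i + 2])
print(rob.in_flight())  # 72
```

Completing the macro-ops of the second line before those of the first retires nothing; once the first line completes, both lines retire together in program order.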
A.9 Translation-Lookaside Buffer
A translation-lookaside buffer (TLB) is a special on-chip cache that holds a table mapping the
most recently used virtual addresses to their physical addresses.
The AMD Athlon 64 and AMD Opteron processors use a two-level TLB structure. A flush filter, new on
the AMD Athlon 64 and AMD Opteron processors, eliminates unnecessary TLB flushes when the CR3
register is loaded.
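A two-level TLB lookup can be sketched with two least-recently-used tables in front of a page walk. This is an illustrative model only: it assumes 4-Kbyte pages, an inclusive fill policy (a page walk fills both levels), and tiny hypothetical entry counts rather than the real sizes. A plain dictionary stands in for the page-table walk. The `flush` method clears everything on a CR3 load; global pages and the flush filter's logic for deciding when a flush can be skipped are not modeled. All class and parameter names are hypothetical.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumes 4-Kbyte pages


class LRUTable:
    """Fixed-size map of virtual page number -> physical page number, LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, vpn):
        if vpn not in self.entries:
            return None
        self.entries.move_to_end(vpn)  # mark as most recently used
        return self.entries[vpn]

    def put(self, vpn, ppn):
        self.entries[vpn] = ppn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class TwoLevelTLB:
    def __init__(self, page_table, l1_entries=4, l2_entries=16):
        self.page_table = page_table  # hypothetical stand-in for a page-table walk
        self.l1 = LRUTable(l1_entries)
        self.l2 = LRUTable(l2_entries)
        self.stats = {"l1_hit": 0, "l2_hit": 0, "walk": 0}

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        ppn = self.l1.get(vpn)
        if ppn is not None:
            self.stats["l1_hit"] += 1
        else:
            ppn = self.l2.get(vpn)
            if ppn is not None:
                self.stats["l2_hit"] += 1
            else:
                self.stats["walk"] += 1
                ppn = self.page_table[vpn]  # KeyError models an unmapped page
                self.l2.put(vpn, ppn)
            self.l1.put(vpn, ppn)  # refill L1 on any L1 miss
        return (ppn << PAGE_SHIFT) | offset

    def flush(self):
        """Drop all cached translations, as an unfiltered CR3 load would."""
        self.l1.entries.clear()
        self.l2.entries.clear()


tlb = TwoLevelTLB({0x10: 0x80, 0x11: 0x81}, l1_entries=1, l2_entries=4)
print(hex(tlb.translate(0x10123)))  # 0x80123 (page walk)
```

Every flush forces subsequent accesses back to page walks, which is the cost a flush filter avoids when a CR3 load does not actually require discarding translations.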

L1 Instruction TLB Specifications

Table provides the specifications of the L1 instruction TLB for various AMD processors.