Software Optimization Guide for AMD64 Processors | 25112 Rev. 3.06 September 2005 |
A.7 Fetch-Decode Unit
The fetch-decode unit performs early decoding of AMD64 instructions into macro-ops. The outputs of the early decoders keep all (DirectPath or VectorPath) instructions in program order. Early decoding produces three macro-ops per cycle from either path. The outputs of both decoders are multiplexed together and passed to the next stage in the pipeline, the instruction control unit. Decoding a VectorPath instruction may prevent the simultaneous decoding of a DirectPath instruction.
When the target 16-byte instruction window is obtained from the L1 instruction cache, the instruction bytes are examined to determine whether the type of basic decode to occur is DirectPath or VectorPath.
A.8 Instruction Control Unit
The instruction control unit (ICU) is the control center for the AMD Athlon 64 and AMD Opteron processors. It controls the centralized in-flight reorder buffer, the integer scheduler, and the floating-point scheduler. In turn, the ICU is responsible for the following functions: macro-op dispatch, macro-op retirement, register and flag dependency resolution and renaming, execution resource management, interrupts, exceptions, and branch mispredictions.
The instruction control unit takes the three macro-ops per cycle from the early decoders and places them in a centralized, fixed-issue reorder buffer. This buffer is organized into 24 lines of three macro-ops each. The reorder buffer allows the instruction control unit to track and monitor up to 72 in-flight macro-ops (whether integer or floating-point) for maximum instruction throughput. The instruction control unit can simultaneously dispatch multiple macro-ops from the reorder buffer to both the integer and floating-point schedulers for final decode, issue, and execution as micro-ops. In addition, the instruction control unit handles exceptions and manages the retirement of macro-ops.
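The capacity figure above follows from the buffer geometry: 24 lines of three macro-ops give 72 in-flight macro-ops. The following is a minimal Python sketch of that bookkeeping, not a description of the hardware. The class name `ReorderBuffer`, its methods, and the rule that each dispatch group occupies one whole line and retires as a unit are assumptions made for illustration; only the constants 24 and 3 come from the text.

```python
from collections import deque

LINES = 24           # reorder-buffer lines (from the text)
SLOTS_PER_LINE = 3   # macro-ops per line (from the text)
CAPACITY = LINES * SLOTS_PER_LINE  # 24 * 3 = 72 in-flight macro-ops


class ReorderBuffer:
    """Toy model (hypothetical): macro-ops enter in program order, one
    line per dispatch, and the oldest line retires first."""

    def __init__(self):
        self.lines = deque()  # each entry is a list of up to 3 macro-ops

    def in_flight(self):
        """Count the macro-ops currently held in the buffer."""
        return sum(len(line) for line in self.lines)

    def dispatch(self, macro_ops):
        """Place up to three macro-ops in a new line.

        Returns False when all 24 lines are occupied (dispatch stalls).
        """
        if len(macro_ops) > SLOTS_PER_LINE:
            raise ValueError("at most three macro-ops per line")
        if len(self.lines) == LINES:
            return False
        self.lines.append(list(macro_ops))
        return True

    def retire(self):
        """Retire the oldest line, preserving program order."""
        return self.lines.popleft() if self.lines else []


# Fill the buffer with three macro-ops per cycle until it stalls.
rob = ReorderBuffer()
cycles = 0
while rob.dispatch([f"op{cycles}a", f"op{cycles}b", f"op{cycles}c"]):
    cycles += 1
```

In this model the buffer stalls once 24 lines are held, and retiring the oldest line frees room for one more dispatch group.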
A.9 Translation-Lookaside Buffer
A translation-lookaside buffer (TLB) is a special on-chip cache that holds a table that matches the most-recently-used virtual addresses to their physical addresses.
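As a rough illustration of the definition above, a TLB can be modelled as a small cache keyed by virtual page number. The Python sketch below is a software analogy under stated assumptions: 4-KB pages, a least-recently-used replacement policy, and the names `TLB`, `lookup`, and `insert` are all hypothetical and are not taken from the processor documentation.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4-KB pages (low 12 bits are the page offset)


class TLB:
    """Toy TLB (hypothetical): maps virtual page numbers to physical
    frame numbers and evicts the least-recently-used entry when full."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def lookup(self, vaddr):
        """Return the physical address for vaddr, or None on a TLB miss."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.map:
            return None
        self.map.move_to_end(vpn)  # mark as most recently used
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (self.map[vpn] << PAGE_SHIFT) | offset

    def insert(self, vpn, pfn):
        """Install a translation, evicting the oldest entry if needed."""
        if vpn in self.map:
            self.map.move_to_end(vpn)
        elif len(self.map) >= self.entries:
            self.map.popitem(last=False)  # drop least recently used
        self.map[vpn] = pfn


tlb = TLB(entries=2)
tlb.insert(0x10, 0x80)
tlb.insert(0x11, 0x81)
hit = tlb.lookup(0x10ABC)    # page 0x10 maps to frame 0x80
tlb.insert(0x12, 0x82)       # evicts page 0x11, the least recently used
miss = tlb.lookup(0x11000)
```

Here `hit` is `0x80ABC` (the frame number combined with the unchanged page offset) and `miss` is `None`, which in real hardware would trigger a page-table walk.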
The AMD Athlon 64 and AMD Opteron processors utilize a two-level TLB structure. A flush filter, new on the AMD Athlon 64 and AMD Opteron processors, eliminates unnecessary TLB flushes when loading the CR3 register.
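The two-level lookup and the effect of a flush filter can be sketched in Python as follows. This is an illustrative model only. The text does not say how the filter decides that a flush is unnecessary, so the rule used here (skip the flush when CR3 is reloaded with its current value and no page-table write has been observed) is an assumption, as are the entry counts, the inclusive L1/L2 arrangement, the oldest-first eviction, and every name in the code.

```python
class TwoLevelTLB:
    """Toy two-level TLB with a simplified flush filter (hypothetical)."""

    def __init__(self, l1_entries=2, l2_entries=8):  # sizes are made up
        self.l1_entries = l1_entries
        self.l2_entries = l2_entries
        self.l1 = {}  # virtual page number -> physical frame number
        self.l2 = {}
        self.cr3 = None
        self.page_tables_dirty = False  # state tracked by the modelled filter

    @staticmethod
    def _put(level, limit, vpn, pfn):
        """Insert into one level, evicting the oldest entry when full."""
        level.pop(vpn, None)
        if len(level) >= limit:
            del level[next(iter(level))]  # dicts keep insertion order
        level[vpn] = pfn

    def fill(self, vpn, pfn):
        """Install a translation in both levels (after a page-table walk)."""
        self._put(self.l2, self.l2_entries, vpn, pfn)
        self._put(self.l1, self.l1_entries, vpn, pfn)

    def lookup(self, vpn):
        """Check L1 first, then L2; an L2 hit refills L1."""
        if vpn in self.l1:
            return "L1 hit", self.l1[vpn]
        if vpn in self.l2:
            self._put(self.l1, self.l1_entries, vpn, self.l2[vpn])
            return "L2 hit", self.l2[vpn]
        return "miss", None

    def note_page_table_write(self):
        """Record that page-table memory changed since the last CR3 load."""
        self.page_tables_dirty = True

    def load_cr3(self, value):
        """Model a CR3 load; return True if the TLBs were flushed."""
        keep = value == self.cr3 and not self.page_tables_dirty
        self.cr3 = value
        self.page_tables_dirty = False
        if not keep:
            self.l1.clear()
            self.l2.clear()
        return not keep


tlb = TwoLevelTLB()
tlb.load_cr3(0x1000)
for vpn, pfn in [(1, 101), (2, 102), (3, 103)]:
    tlb.fill(vpn, pfn)
first = tlb.lookup(1)             # evicted from the 2-entry L1, found in L2
flushed = tlb.load_cr3(0x1000)    # same value, no page-table writes
after_reload = tlb.lookup(1)      # translation survived the CR3 load
```

Without the filter, every CR3 load in this model would clear both levels and the final lookup would miss. With it, the cached translation is still usable after the reload.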
L1 Instruction TLB Specifications
Table provides the specifications of the L1 instruction TLB for various AMD processors.