22007E/0 — November 1999

AMD Athlon™ Processor x86 Code Optimization

Figure 1. AMD Athlon™ Processor Block Diagram

[Block diagram not reproduced. Labels recoverable from the figure: 2-Way, 64-Kbyte Instruction Cache; 24-Entry L1 TLB/256-Entry L2 TLB; Predecode Cache; Branch Prediction Table; Fetch/Decode Control; 3-Way x86 Instruction Decoders; Instruction Control Unit (72-Entry); Integer Scheduler (18-Entry); IEU0/AGU0, IEU1/AGU1, IEU2/AGU2; FPU Stack Map / Rename; FPU Scheduler (36-Entry); FPU Register File (88-Entry); FADD, FMUL, FSTORE; MMX™; 3DNow!™; Load / Store Queue Unit; 2-Way, 64-Kbyte Data Cache; 32-Entry L1 TLB/256-Entry L2 TLB; Bus Interface Unit; L2 Cache Controller; System Interface; L2 SRAMs.]

Instruction Cache

The out-of-order execute engine of the AMD Athlon processor contains a large 64-Kbyte L1 instruction cache, organized as a two-way, set-associative array. Each line in the instruction array is 64 bytes long. The L1 instruction cache handles instruction loads, instruction prefetching, instruction predecoding, and branch prediction. Requests that miss in the L1 instruction cache are fetched from the backside L2 cache or, if they also miss there, from local memory through the bus interface unit (BIU).
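The arithmetic implied by this geometry can be sketched in C. The cache size, associativity, and line length come from the text above; everything else is an assumption made for illustration. In particular, the text does not say which address bits select a set, so the sketch assumes the conventional mapping in which the set index sits directly above the byte offset. The names `icache_addr` and `icache_split` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters stated in the text. */
enum {
    CACHE_BYTES = 64 * 1024,                      /* 64-Kbyte L1 instruction cache */
    WAYS        = 2,                              /* two-way set-associative       */
    LINE_BYTES  = 64,                             /* 64-byte lines                 */
    NUM_SETS    = CACHE_BYTES / (WAYS * LINE_BYTES)
};

/* 64 Kbytes / (2 ways * 64 bytes) = 512 sets. */
static_assert(NUM_SETS == 512, "geometry from the text gives 512 sets");

/* Hypothetical helper type: the three fields of an address as seen by the cache. */
typedef struct {
    uint32_t offset;  /* byte within the 64-byte line        */
    uint32_t set;     /* which of the 512 sets               */
    uint32_t tag;     /* remaining high-order address bits   */
} icache_addr;

/*
 * ASSUMPTION: the set index is taken from the address bits directly above
 * the line offset. The source does not specify the actual indexing scheme.
 */
static icache_addr icache_split(uint32_t addr)
{
    icache_addr a;
    a.offset = addr % LINE_BYTES;               /* low 6 bits  */
    a.set    = (addr / LINE_BYTES) % NUM_SETS;  /* next 9 bits */
    a.tag    = addr / (LINE_BYTES * NUM_SETS);  /* upper bits  */
    return a;
}
```

Under these assumptions, `icache_split(0x00401234)` yields byte offset 52, set 72, and tag 128. Two addresses that share a set index but differ in tag compete for the same two ways, which is why code laid out at multiples of 32 Kbytes could conflict in a cache indexed this way.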

The instruction cache generates fetches on the naturally aligned 64 bytes containing the instructions and the next sequential line of 64 bytes (a prefetch). The principle of program spatial locality makes data prefetching very effective and avoids or reduces execution stalls caused by the time spent reading the necessary data. Cache line

AMD Athlon™ Processor Microarchitecture
