Technical Reference Guide

Figure 3-1 illustrates the internal architecture of the Pentium 4 processor.

[Figure: block diagram of the Pentium 4 processor. Block labels: Branch Prediction; Execution Trace Cache; CPU (128-bit, Integer, FPU); Rapid Exe. Eng. ALUs; Out-of-Order Core; FSB I/F; L1 Data Cache; 256-KB 8-Way L2 Adv. Transfer Cache]

ALU Speed: Core speed x2

Core Speed: 1.4, 1.5, 2.0, 2.2 GHz

FSB Speed: 400 MHz (effective data transfer rate)

Figure 3–2. Pentium 4 Processor Internal Architecture

The Pentium 4 increases processing speed through higher clock speeds, made possible by hyper-pipelined technology that can handle significantly more instructions at a time. Since branch mis-predicts would result in serious performance hits with such a long pipeline, the Pentium 4 features a branch prediction mechanism improved by the addition of an execution trace cache and a refined prediction algorithm. The execution trace cache can store 12K micro-ops (decoded instructions dealing with branching sequences) that are checked when recurring branches are processed. Code that is not executed (bypassed) is no longer stored in the L1 cache, as was the case in the Pentium III.

The out-of-order core features Advanced Dynamic Execution, which provides a large window (126 instructions) for the execution units to work with. A more accurate branch prediction algorithm, along with a larger (4-KB) branch target buffer that stores more details on branch history, results in a 33% reduction in branch mis-predictions compared with the Pentium III.

The L1 data cache features a low-latency design for minimum response time on cache hits. The 256-KB advanced transfer L2 cache features a 256-bit (32-byte) interface operating at the processor core speed. The L2 cache of the 1.5-GHz Pentium 4 can therefore provide a transfer rate of 48 GB/s.
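The 48 GB/s figure follows from the interface width and the core clock. A minimal Python sketch of that arithmetic, assuming one full-width transfer per core clock and 1 GB = 10^9 bytes (the function name is illustrative and does not come from this guide):

```python
def cache_bandwidth_gb_per_s(interface_bits: int, clock_ghz: float) -> float:
    """Peak transfer rate in GB/s for a cache interface of the given width.

    Assumes one full-width transfer per clock cycle.
    """
    bytes_per_transfer = interface_bits // 8   # 256 bits -> 32 bytes
    return bytes_per_transfer * clock_ghz      # bytes/cycle * 10^9 cycles/s

# 256-bit L2 interface on a 1.5-GHz Pentium 4
l2_rate = cache_bandwidth_gb_per_s(256, 1.5)
print(l2_rate)  # 48.0
```

Under the same assumption, the rate scales linearly with the core clock, so faster parts would see a proportionally higher peak L2 transfer rate.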

The combined improvements of the Pentium 4’s CPU core require the rapid execution engine’s ALUs to operate at twice the processing frequency to handle the steady stream of instructions.
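As a rough illustration of the doubled ALU clock, the following Python sketch tabulates the ALU frequency for the core speeds listed with Figure 3–2. The names are hypothetical; only the x2 multiplier and the core speeds come from this guide:

```python
CORE_SPEEDS_GHZ = [1.4, 1.5, 2.0, 2.2]  # core speeds listed with the figure
ALU_MULTIPLIER = 2                      # ALU speed = core speed x2

def alu_clock_ghz(core_ghz: float) -> float:
    """Return the rapid execution engine ALU clock for a given core clock."""
    return core_ghz * ALU_MULTIPLIER

for core in CORE_SPEEDS_GHZ:
    print(f"core {core:.1f} GHz -> ALUs {alu_clock_ghz(core):.1f} GHz")
```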

The front side bus (FSB) of the Pentium 4 uses a 100-MHz clock but provides double- and quad-pumped transfers through the use of 200- and 400-MHz strobes. The Pentium 4 can transfer a complete 64-byte cache line in two 100-MHz bus cycles, for a throughput rate of 3.2 GB/s. Address information is transferred at a 200-MHz rate.
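The FSB figures are consistent with one another, which can be checked with a short Python sketch. The constant names are illustrative. The bytes moved per transfer are derived here from the cache-line figures rather than quoted from this guide, and 1 GB is taken as 10^9 bytes:

```python
BUS_CLOCK_MHZ = 100      # base FSB clock
DATA_PUMP = 4            # quad-pumped data: 4 transfers per bus clock
ADDR_PUMP = 2            # double-pumped addresses: 2 transfers per bus clock
CACHE_LINE_BYTES = 64    # one cache line
CLOCKS_PER_LINE = 2      # bus clocks needed to move one cache line

# Bytes moved per data transfer, derived from the cache-line figures.
bytes_per_transfer = CACHE_LINE_BYTES // (CLOCKS_PER_LINE * DATA_PUMP)

data_rate_mts = BUS_CLOCK_MHZ * DATA_PUMP   # million data transfers per second
addr_rate_mts = BUS_CLOCK_MHZ * ADDR_PUMP   # million address transfers per second
throughput_gb_s = data_rate_mts * bytes_per_transfer / 1000  # MB/s -> GB/s

print(bytes_per_transfer, data_rate_mts, addr_rate_mts, throughput_gb_s)
# 8 400 200 3.2
```

The derived 8 bytes per transfer corresponds to a 64-bit data path, and 8 bytes at 400 million transfers per second gives the 3.2 GB/s quoted above.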

Compaq Evo and Workstation Personal Computer 3-3

Featuring the Intel Pentium 4 Processor

Second Edition - January 2003
