16 | The UltraSPARC T2 Processor with CoolThreads Technology | Sun Microsystems, Inc. |
An
[Figure 7 diagram: per-core pipeline stages]
Integer pipeline (8 stages): Fetch, Cache, Pick, Decode, Execute, Mem, Bypass, W
Floating-point pipeline (12 stages): Fetch, Cache, Pick, Decode, Execute, Fx1, Fx2, Fx3, Fx4, Fx5, Fx6, FW
Figure 7. UltraSPARC T2
To illustrate how the dual pipelines function, Figure 8 depicts the integer pipeline with the load store unit (LSU). The instruction cache is shared by all eight threads within the core. A
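[Figure 8 diagram: each entry is a pipeline stage letter followed by the number of the thread occupying that stage]
IFU: F2, C6
Thread Group 0: P0, D2, E0, M3, B1, W2
LSU: M4, B1, W6
Thread Group 1: P5, D7, E6, M4, B7, W6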
Figure 8. Threads are interleaved between pipeline stages with very few restrictions (integer pipeline shown, letters depict pipeline stages, numbers depict different scheduled threads)
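The interleaving shown in Figure 8 can be approximated with a small Python sketch. It models only the six post-fetch integer stages (P, D, E, M, B, W) for two thread groups of four threads each, with one thread picked per group per cycle. The function names (simulate, render), the ready callback, and the least-recently-picked selection policy are assumptions made for illustration; they are not taken from the processor documentation, and the model ignores the shared IFU and LSU.

```python
from collections import deque

# Post-fetch integer pipeline stages, as lettered in Figure 8.
STAGES = ["P", "D", "E", "M", "B", "W"]


def simulate(cycles, ready=lambda thread, cycle: True):
    """Return, for every cycle, the stage occupancy of each thread group.

    Each group keeps its own pick order, so a pick in one group never
    depends on the other group. The least-recently-picked policy and the
    `ready` callback are assumptions of this sketch.
    """
    groups = [deque(range(0, 4)), deque(range(4, 8))]  # threads 0-3 and 4-7
    pipes = [[None] * len(STAGES) for _ in groups]
    history = []
    for cycle in range(cycles):
        for g, order in enumerate(groups):
            # Pick the least recently picked thread that is ready this cycle.
            picked = next((t for t in order if ready(t, cycle)), None)
            if picked is not None:
                order.remove(picked)
                order.append(picked)  # now the most recently picked
            # Advance every stage by one; the picked thread enters at P.
            pipes[g] = [picked] + pipes[g][:-1]
        history.append([list(p) for p in pipes])
    return history


def render(pipe):
    """Format one group's pipeline in the style of Figure 8, e.g. 'P0 D2'."""
    return " ".join(
        f"{stage}{thread}" if thread is not None else f"{stage}-"
        for stage, thread in zip(STAGES, pipe)
    )


if __name__ == "__main__":
    for cycle, (g0, g1) in enumerate(simulate(6)):
        print(f"cycle {cycle}: TG0 [{render(g0)}]  TG1 [{render(g1)}]")
```

Passing a ready callback that returns False for one thread (for example, a thread assumed to be waiting on a cache miss) leaves the other group's schedule unchanged, which is the independence property the sketch is meant to show.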
The “pick” stage chooses one thread each cycle within each thread group. Picking within each thread group is independent of the other, and a