VLC Architecture
The Express5800/1000 series server implements the VLC architecture, which allows for low latency
In a split BUS architecture, for a cache-
be accessed directly by one another, bypassing the chipset. This allows for lower latency between the cache memory, which results in faster data transfers.
Very Large Cache (VLC) Architecture
|
|
|
|
|
|
|
|
|
|
|
|
|
| Increased enterprise |
| CPU |
| CPU |
|
| CPU |
| CPU | ||||||
|
|
|
|
| applications | |||||||||
| Cache |
| Cache |
|
| Cache |
| Cache | ||||||
| Memory |
| Memory |
|
| Memory |
| Memory | performance through | |||||
|
|
|
|
|
|
|
|
|
|
|
|
|
| reduced cache memory |
|
|
|
|
|
|
|
|
|
|
|
|
|
| access latency |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Memory |
|
|
| chipset |
|
| Direct | |||||||||
|
|
|
|
|
| |||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
| ||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| FSB |
| |
Intel® Itanium® 2 processor |
|
|
|
| ||||||||||||||
|
|
|
|
|
|
|
| |||||||||||
(Madison : L3 9MB) |
|
|
|
|
|
|
|
| ||||||||||
Latency |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||||
|
|
|
|
|
|
|
|
|
|
|
|
|
| |||||
|
|
|
| L3 of other CPU |
|
|
|
| transfers | |||||||||
| CPU |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||
|
| Cache |
| Cache | Cache |
|
|
|
|
|
|
|
|
|
| |||
| L3 |
| Memory | Memory | Memory |
|
| Data Size |
|
|
|
| ||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||||
|
|
|
|
|
|
|
| |||||||||||
(Montvale : L3 24MB) |
|
|
|
|
|
|
|
| ||||||||||
Latency |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| ||
|
|
|
|
|
|
|
|
|
| L3 of other CPU |
|
|
|
|
|
| ||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |||
|
| CPU |
|
|
|
|
|
|
|
|
| |||||||
|
|
| Cache |
| Cache | Cache |
|
|
|
| ||||||||
|
|
| L3 |
| Memory |
| Memory | Memory |
|
| Data Size | |||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Split BUS Architecture
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Higher cache memory | ||
|
| CPU |
| CPU |
| CPU |
| CPU |
| access latency. | |||||||||||
|
|
|
|
|
| ||||||||||||||||
|
| Cache |
| Cache |
| Cache |
| Cache |
|
| |||||||||||
|
| Memory |
| Memory |
| Memory |
| Memory |
|
| |||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| transfer. | ||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Inconsistent | ||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| performance. | ||
|
| Memory |
| Data transfer controller |
|
|
|
|
|
|
|
|
|
| |||||||
|
|
|
|
|
|
| chipset |
| Overhead | from transferring | |||||||||||
Intel® Itanium® 2 processor |
|
|
|
|
|
|
| data | through the chipset. | ||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |||||||
(Madison : L3 9MB) |
|
|
|
|
|
|
|
|
|
| FSB | chipset | FSB |
| |||||||
Latency |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| L3 of other | Latency |
| |
| L3 of | CPU on | degradation |
| |
other CPU on | different FSB | (approx 3x) |
| ||
same FSB |
|
|
| This area increases | |
CPU | Cache | Cache | Cache |
| due to the increase in |
L3 | Memory Memory Memory | Data Size | cache size and | ||
|
|
|
| ||
| higher latency | ||||
(Montvale : L3 24MB) |
|
|
| ||
Latency |
|
|
|
|
|
|
|
|
| Higher |
| L3 of other CPU on | |||
L3 of other CPU | latency | |||
| different FSB | (approx 3x) | ||
on same FSB |
|
|
|
|
CPU | Cache | Cache | Cache |
|
L3 | Memory | Memory | Memory | Data Size |
|
|
|
|
This image does not depict actual numbers
Dedicated Cache Coherency Interface (CCI)
Another technology implemented in the Express5800/1000 series server to improve
Information containing the location and state of cached data is required for the CPU to access the specific data stored in cache memory. By accessing the cache memory according to this information, the CPU is able to retrieve the desired data.
Two main mechanisms exist for
The benefit of the TAG based mechanism, thus implemented in the Express5800/1000 series server, is that by accessing the TAG, unnecessary inquiries to the cache memory are filtered for a smoother transfer of data. Furthermore, the Express5800/1000 series server includes a dedicated
Tag Based Cache Coherency | A3 Chipset |
|
|
|
| Performance | ||||||||||
Request is broadcasted to all CPU |
|
| CPU | chip | chip | CPU | chip | chip | CPU | increase with | ||||||
|
| set | set | set | set | the A3 chipset | ||||||||||
simultaneously |
|
|
|
|
|
| TAG |
|
|
|
| |||||
|
|
|
|
|
|
|
|
|
|
|
|
| ||||
CPU | CPU CPU CPU | CPU | CPU CPU CPU | CPU | CPU CPU CPU | Directory Based Cache Coherency |
|
|
| |||||||
|
|
|
|
|
|
|
|
| ||||||||
chip | Memory | chip | Memory | chip | Memory | CPU | chip | chip | Memory | chip | chip | CPU | chip | chip | CPU | |
set | set | set | set | set | set | set | set | set | ||||||||
|
|
|
| |||||||||||||
TAG | TAG | TAG |
|
|
| DIR |
|
|
|
|
|
|
In a directory based system, the requestor CPU will first access the external memory to confirm the location of the cached data, and then will access the appropriate cache memory. On the other hand, in a TAG based system, the requestor CPU broadcasts a request to all other cache simultaneously via TAG.
The Express5800/1000 Series server implements a dedicated connection (CCI) for snooping
Directory Based Cache Coherency
Access Directory to confirm the location of the data first, then access the appropriate cache memory
CPU | CPU | CPU | CPU | CPU | CPU | CPU | CPU | CPU | CPU | CPU | CPU |
chip |
| Memory | chip |
| Memory | chip |
| Memory | |||
set |
| set |
| set |
| ||||||
|
|
|
|
|
|
|
|
| |||
|
|
| DIR |
|
|
| DIR |
|
|
| DIR |
CPU
CPU
Memory
TAG
DIR
CPU requesting the information
CPU storing the newest information
Memory that is storing location regarding the memory
TAG memory (Manages cache line information for all of the CPUs loaded on a CELL card)
DIR Memory (Manages cache line information for all of the memory loaded on a CELL card)
Crossbar-less configuration
Improved data transfer latency through direct attached Cell configuration
Within the Express5800/1000 series server lineup, the 1080Rf has been able to lower the data transfer latency by removing the crossbar and directly connecting Cell to Cell, and Cell to PCI box.
Even with the
5