NEC 1000 Series, 5800 Series, 1080Rf VLC Architecture, Dedicated Cache Coherency Interface CCI


VLC Architecture

High-speed / low latency Intra-Cell cache-to-cache data transfer

The Express5800/1000 series server implements the VLC architecture, which allows for low latency cache-to-cache data transfer between multiple CPUs within a cell.

In a split BUS architecture, data must pass through a chipset for a cache-to-cache transfer to take place. In the VLC architecture, however, the CPUs can access the data in one another's cache memory directly, bypassing the chipset. This lowers the latency between cache memories, which results in faster data transfers.
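As a rough illustration of the difference described above, the following sketch counts the interconnect segments a cache-to-cache transfer crosses in each architecture. The function names and the `HOP_COST` unit are assumptions made for this example; the model is deliberately simplistic and the values are not NEC measurements.

```python
# Toy model: latency is the number of interconnect segments crossed.
# HOP_COST is an arbitrary illustrative unit, not a measured value.
HOP_COST = 1.0


def vlc_transfer_latency() -> float:
    """VLC: caches reach one another directly, bypassing the chipset."""
    path = ["direct"]
    return len(path) * HOP_COST


def split_bus_transfer_latency(same_fsb: bool) -> float:
    """Split bus: CPUs on the same FSB transfer directly, while CPUs on
    different FSBs must go FSB -> chipset -> FSB."""
    path = ["fsb"] if same_fsb else ["fsb", "chipset", "fsb"]
    return len(path) * HOP_COST


print("VLC:", vlc_transfer_latency())
print("split bus, same FSB:", split_bus_transfer_latency(True))
print("split bus, different FSB:", split_bus_transfer_latency(False))
```

In this model the split BUS latency depends on where the two CPUs sit, whereas the VLC latency is the same for every pair of CPUs in the cell.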

[Figure: Very Large Cache (VLC) Architecture compared with Split BUS Architecture. VLC Architecture: direct CPU-to-CPU transfers give high-speed cache-to-cache transfers and increased enterprise application performance through reduced cache memory access latency. Split BUS Architecture: data is transferred through the chipset (data transfer controller), which adds overhead and leads to higher cache memory access latency, non-uniform cache-to-cache data transfer, and inconsistent performance; accessing the L3 of another CPU on a different FSB shows a latency degradation of approximately 3x compared with the L3 of another CPU on the same FSB, and this higher-latency area grows as the cache size increases. Each architecture is shown with latency versus data size plots (CPU cache, L3, L3 of other CPU, memory) for the Intel® Itanium® 2 processor (Madison: L3 9MB) and the Dual-Core Intel® Itanium® processor (Montvale: L3 24MB). This image does not depict actual numbers.]

Dedicated Cache Coherency Interface (CCI)

High-speed / low latency Inter-Cell cache-to-cache data transfer

Another technology implemented in the Express5800/1000 series server to improve cache-to-cache data transfer is the Cache Coherency Interface (CCI). CCI, the inter-Cell counterpart of the VLC architecture, allows for a lower latency cache-to-cache data transfer between Cells.

Information containing the location and state of cached data is required for the CPU to access the specific data stored in cache memory. By accessing the cache memory according to this information, the CPU is able to retrieve the desired data.

Two main mechanisms exist for cache-to-cache data transfer between Cells: directory based and TAG based cache coherency. The cache information described above is stored in external memory (DIR memory) in the directory based mechanism, and within the chipset in the TAG based mechanism.

The benefit of the TAG based mechanism, which is the one implemented in the Express5800/1000 series server, is that consulting the TAG filters out unnecessary inquiries to the cache memory, giving a smoother transfer of data. Furthermore, the Express5800/1000 series server includes a dedicated high-speed cache coherency interface (CCI), which connects the Cells directly to one another without using a crossbar. This interface carries broadcasts and other cache coherency transactions, allowing for even faster cache-to-cache data transfer.
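The filtering role of the TAG can be pictured with a minimal sketch. The `Cell` class, its `tag` set, and the `broadcast` helper are hypothetical names invented for this illustration and do not describe the actual A3 chipset logic; the point is only that a cell's caches are probed when, and only when, its TAG reports the requested line.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A cell whose chipset TAG records which cache lines its CPUs hold
    (hypothetical structure, for illustration only)."""
    name: str
    tag: set = field(default_factory=set)
    cache_probes: int = 0  # times the CPU caches were actually queried

    def snoop(self, line: int) -> bool:
        # Consult the TAG first; probe the caches only on a TAG hit.
        if line not in self.tag:
            return False
        self.cache_probes += 1
        return True


def broadcast(cells, requester, line):
    """Send the request to every other cell; return the cells holding the line."""
    return [c.name for c in cells if c is not requester and c.snoop(line)]


cells = [Cell("cell0"), Cell("cell1", tag={0x40}), Cell("cell2"), Cell("cell3")]
print(broadcast(cells, cells[0], 0x40))       # ['cell1']
print(sum(c.cache_probes for c in cells))     # 1
```

Three cells receive the broadcast, but only the one whose TAG lists the line queries its caches; the other two inquiries are filtered out.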

[Figure: Tag Based Cache Coherency with the A3 Chipset. Each chipset holds a TAG, and a request is broadcast to all CPUs simultaneously. Callout: performance increase with the A3 chipset.]

In a directory based system, the requesting CPU first accesses the external memory to confirm the location of the cached data, and then accesses the appropriate cache memory. In a TAG based system, on the other hand, the requesting CPU broadcasts a request to all other caches simultaneously via the TAG.

The Express5800/1000 series server implements a dedicated connection (CCI) for snooping.
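The two lookup flows described above can be compared with a small sketch that records the sequential steps taken before the owning cache is reached. The function names, the dictionary layout, and the cell names are assumptions made for this example, and the data transfer that follows the lookup is left out.

```python
from typing import Optional


def directory_lookup(directory: dict, line: int) -> tuple:
    """Directory based: read DIR memory first, then go to the owning cache."""
    steps = ["read DIR memory"]
    owner: Optional[str] = directory.get(line)
    if owner is not None:
        steps.append(f"access cache of {owner}")
    return owner, steps


def tag_lookup(tags: dict, requester: str, line: int) -> tuple:
    """TAG based: one broadcast; every other cell checks its TAG at the same time."""
    targets = [cell for cell in tags if cell != requester]
    steps = ["broadcast to " + ", ".join(targets)]
    owner = next((cell for cell in targets if line in tags[cell]), None)
    return owner, steps


# Hypothetical state: cache line 0x40 is held by a CPU in cell2.
directory = {0x40: "cell2"}
tags = {"cell0": set(), "cell1": set(), "cell2": {0x40}}

print(directory_lookup(directory, 0x40))
print(tag_lookup(tags, "cell0", 0x40))
```

The directory flow needs two dependent steps, while the TAG flow reaches every candidate in a single broadcast step at the cost of sending more messages, which is the traffic a dedicated snooping connection is meant to carry.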

[Figure: Directory Based Cache Coherency. The Directory is accessed first to confirm the location of the data, and then the appropriate cache memory is accessed. Each chipset has an attached memory and DIR. Legend: CPU requesting the information; CPU storing the newest information; memory that stores the location information; TAG memory (manages cache line information for all of the CPUs loaded on a Cell card); DIR memory (manages cache line information for all of the memory loaded on a Cell card).]

Crossbar-less configuration

Improved data transfer latency through direct attached Cell configuration

Within the Express5800/1000 series server lineup, the 1080Rf lowers data transfer latency by removing the crossbar and directly connecting Cell to Cell, and Cell to PCI box.

Even with the crossbar-less configuration, virtualization of the Cell card and I/O box has been retained so as not to diminish computing and I/O resources.
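The latency benefit of connecting Cells directly can be pictured as a reduction in hop count. The sketch below runs a breadth-first search over two hypothetical topologies; the node names (`cell0`, `cell1`, `pci_box`, `xbar`) and the two-cell layout are assumptions for illustration and are not taken from the 1080Rf specification.

```python
from collections import deque


def hops(links, src, dst):
    """Breadth-first search: number of links on the shortest path, or None."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical topologies; names are illustrative only.
with_crossbar = [("cell0", "xbar"), ("cell1", "xbar"), ("pci_box", "xbar")]
crossbar_less = [("cell0", "cell1"), ("cell0", "pci_box"), ("cell1", "pci_box")]

print(hops(with_crossbar, "cell0", "cell1"))  # 2
print(hops(crossbar_less, "cell0", "cell1"))  # 1
```

Every path in the crossbar topology pays for the extra stop at the crossbar, while the direct-attached layout reaches the other Cell or the PCI box in a single hop.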

