AltiVec Vector Processing Unit, Compute Node ASIC

Mercury can configure systems with hundreds of compute nodes, communicating over the second-generation RACE++ switch fabric interconnect. Merging RACE++ and AltiVec technology provides embedded computers with unprecedented computational power.

AltiVec Vector Processing Unit

The AltiVec vector processing unit operates on 128 bits of data concurrently with the other PowerPC execution units. AltiVec instructions may be interleaved with other PowerPC instructions without any penalty such as a context switch. The 128-bit wide execution unit can be used to operate on four floating-point numbers, four 32-bit integers, eight 16-bit integers, or sixteen 8-bit integers simultaneously.
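The lane arithmetic above can be checked with a small sketch. This is plain Python, not AltiVec code: the `lanes` helper and the 16-byte `reg` buffer are hypothetical names used only to show how one 128-bit value divides into four, eight, or sixteen elements.

```python
import struct

REGISTER_BYTES = 16  # one 128-bit vector register


def lanes(register: bytes, fmt: str) -> tuple:
    """Reinterpret a 16-byte register image as big-endian lanes of one type.

    fmt is a struct type code: "f" (32-bit float), "I" (32-bit unsigned),
    "H" (16-bit unsigned) or "B" (8-bit unsigned).
    """
    if len(register) != REGISTER_BYTES:
        raise ValueError("a vector register holds exactly 16 bytes")
    count = REGISTER_BYTES // struct.calcsize(">" + fmt)
    return struct.unpack(f">{count}{fmt}", register)


# Fill the register image with four single-precision floats.
reg = struct.pack(">4f", 1.0, 2.0, 3.0, 4.0)

as_floats = lanes(reg, "f")  # four floating-point numbers
as_words = lanes(reg, "I")   # four 32-bit integers
as_halves = lanes(reg, "H")  # eight 16-bit integers
as_bytes = lanes(reg, "B")   # sixteen 8-bit integers
```

The same 16 bytes are viewed four ways; only the element width changes, which is why the element count doubles each time the width halves.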

AltiVec instructions are carried out by one of two AltiVec sub-units. The vector arithmetic logic unit (ALU) handles the vector fixed-point and vector floating-point operations. The vector multiply-add and vector negative multiply-subtract instructions each perform two floating-point operations (a multiply and an add or subtract) per element in a single cycle.
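As an illustration of the two fused-style instructions, the sketch below emulates their element-wise arithmetic in plain Python. The function names mirror the `vec_madd` and `vec_nmsub` operations of the AltiVec C programming interface, but these are scalar emulations written for this example, and Python's double-precision floats will not reproduce the hardware's single-precision rounding.

```python
def vec_madd(a, b, c):
    """Element-wise a*b + c, the arithmetic of vector multiply-add."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def vec_nmsub(a, b, c):
    """Element-wise -(a*b - c), the arithmetic of vector negative multiply-subtract."""
    return [-(x * y - z) for x, y, z in zip(a, b, c)]


# Four single-precision lanes per 128-bit register (values chosen to be exact).
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
c = [10.0, 10.0, 10.0, 10.0]

madd = vec_madd(a, b, c)
nmsub = vec_nmsub(a, b, c)

# One multiply and one add (or subtract) per lane.
flops_per_instruction = 2 * len(a)
```

With four lanes per register, one such instruction accounts for eight floating-point operations in total.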

The Permute sub-unit incorporates a crossbar network to perform 16 individual byte moves in a single cycle. This capability can be used for simple tasks such as converting the "endian-ness" of data or for more complicated tasks such as byte interleaving, dynamic address alignment, or accelerating small look-up tables.
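The byte-move behavior can be sketched as follows. This is a software model of the permute operation, assuming the usual AltiVec semantics in which each control byte's low five bits select one byte from the 32-byte concatenation of two source vectors; the `vperm` function and the control-vector constants are names made up for this example.

```python
def vperm(a: bytes, b: bytes, control: bytes) -> bytes:
    """Model of a 16-byte permute: output byte i is byte (control[i] & 0x1F) of a||b."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all three operands must be 16 bytes")
    source = a + b
    return bytes(source[sel & 0x1F] for sel in control)


# Control vector that reverses the bytes inside each 32-bit word (endian swap).
SWAP32 = bytes([3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12])

# Control vector that interleaves the first eight bytes of a and b.
INTERLEAVE = bytes(i // 2 + 16 * (i % 2) for i in range(16))

data = bytes(range(16))
swapped = vperm(data, data, SWAP32)

other = bytes(range(100, 116))
interleaved = vperm(data, other, INTERLEAVE)
```

A small look-up table works the same way: the two source vectors hold up to 32 table entries, and the control vector holds the indices to look up.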

PowerPC RISC Architecture

In addition to the AltiVec execution unit, the MPC7410 contains a floating-point unit and two integer units that can operate concurrently with the AltiVec unit. Data and instructions are fed through two on-chip, 32-Kbyte, eight-way set-associative caches that enhance performance of both vector and scalar code.

Each PowerPC 7410 CN also includes a fully pipelined backside L2 cache operating at 250 MHz. This high-performance cache system provides quick access to data previously loaded from memory but too large to fit into the on-chip cache.

[Figure: MPC7410 Data and Instruction Flow]

Compute Node ASIC

The CN ASIC, included in each compute node, acts as both a memory controller and as a network interface to the RACE++ switch fabric interconnect. The CN ASIC includes an enhanced DMA controller, a high-performance memory system with error checking and correcting, metering logic, and a RACE++ interface. By combining memory control and network interface into a single chip, Mercury's compute node provides the highest performance with the lowest power consumption and highest reliability.

[Figure: Compute Node ASIC Architecture]

High-Performance Memory System

Mercury's high-performance memory subsystem allows the memory to reach the intrinsic limits of its performance capability with:

125-MHz Synchronous DRAM

Prefetch Buffers: bring sequential data to the ASIC before the processor explicitly requests it. These prefetch buffers greatly improve the performance of the CN in vector operations such as those used in DSP applications.

FIFO Buffers: efficiently overlap accesses to SDRAM from the local processor and the RACEway interconnect.

The PowerPC CN contains error-correcting circuitry for improved data integrity. One-bit errors are corrected on the fly, and multi-bit errors generate an interrupt error condition.
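The correct-one, detect-many behavior can be illustrated with a toy code. The source does not state which error-correcting code or word width the CN uses, so the sketch below is only an assumption for illustration: an extended Hamming (8,4) code protecting a 4-bit value, with hypothetical `encode` and `decode` helpers. The "uncorrectable" status stands in for the point at which hardware would raise its error interrupt.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Index 0 holds overall parity; indices 1..7 form a Hamming(7,4) word
    with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions locates a single flip
    overall = sum(code) % 2          # 0 when total parity is consistent
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        fixed[syndrome] ^= 1         # single-bit error (position 0 if syndrome is 0)
        status = "corrected"
    else:
        return None, "uncorrectable" # even number of flips: detect only
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


word = encode(0b1011)
one_flip = list(word)
one_flip[6] ^= 1
two_flips = list(word)
two_flips[2] ^= 1
two_flips[5] ^= 1

clean_result = decode(word)
single_result = decode(one_flip)
double_result = decode(two_flips)
```

A real memory system would protect much wider words with proportionally fewer check bits, but the decision logic (fix when one bit is wrong, signal when more are) has the same shape.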

Enhanced DMA Controller

Each CN has an advanced DMA controller to support RACEway transfers at 267 MB/s with chaining and striding.
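Chaining and striding can be sketched in software as a linked list of transfer descriptors, each of which gathers evenly spaced elements. The `Descriptor` fields and the `run_chain` function below are hypothetical and do not reflect the CN ASIC's actual descriptor format, which the source does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    src: int                              # byte offset of the first element
    stride: int                           # bytes between successive element starts
    elem_size: int                        # bytes per element
    count: int                            # number of elements to move
    next: Optional["Descriptor"] = None   # chaining: next transfer, if any


def run_chain(memory: bytes, head: Optional[Descriptor]) -> bytes:
    """Walk a descriptor chain and gather each strided transfer into one buffer."""
    out = bytearray()
    desc = head
    while desc is not None:
        for i in range(desc.count):
            start = desc.src + i * desc.stride
            out += memory[start:start + desc.elem_size]
        desc = desc.next
    return bytes(out)


# A 4x4 row-major matrix of one-byte elements: values 0..15.
memory = bytes(range(16))

# Two chained strided transfers: column 0, then column 2.
col2 = Descriptor(src=2, stride=4, elem_size=1, count=4)
col0 = Descriptor(src=0, stride=4, elem_size=1, count=4, next=col2)

gathered = run_chain(memory, col0)
```

Striding lets one transfer pick a column out of a row-major matrix, and chaining lets several such transfers run back to back without the processor setting up each one.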