SPRA921

2 C67x CPU and Instruction Set

The TMS320C6713 floating-point digital signal processor uses the C67x VelociTI advanced very-long instruction word (VLIW) CPU. The CPU fetches VelociTI advanced very-long instruction words (256 bits wide) to supply up to eight 32-bit instructions to the eight functional units during every clock cycle. The VelociTI VLIW architecture also features variable-length execute packets, a key memory-saving feature that distinguishes the C67x CPU from other VLIW architectures.
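The relationship between a fixed-size fetch packet and variable-length execute packets can be sketched as follows. This is an illustrative model, not TI code: it assumes the general C6000 encoding convention in which the least significant bit of each 32-bit instruction (the p-bit) marks whether the next instruction runs in parallel with it, a detail this section does not state. The function name `split_execute_packets` is hypothetical.

```python
FETCH_PACKET_BITS = 256
INSTRUCTION_BITS = 32
SLOTS = FETCH_PACKET_BITS // INSTRUCTION_BITS  # eight instructions per fetch packet


def split_execute_packets(words):
    """Group the instruction words of one fetch packet into execute packets.

    Assumption: bit 0 of each word is the p-bit. A set p-bit means the next
    instruction executes in the same cycle; a clear p-bit ends the packet.
    """
    if len(words) != SLOTS:
        raise ValueError("a fetch packet holds exactly %d instructions" % SLOTS)
    packets, current = [], []
    for word in words:
        current.append(word)
        if word & 1 == 0:          # p-bit clear: this execute packet is complete
            packets.append(current)
            current = []
    if current:                     # tolerate a set p-bit in the last slot
        packets.append(current)
    return packets
```

With p-bits of 1, 1, 0, 0, 1, 0, 0, 0 across the eight slots, the model yields five execute packets of 3, 1, 2, 1, and 1 instructions. Because serial code does not have to be padded out to eight instructions per cycle, it occupies less program memory, which is the saving the text refers to.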

Operating at 225 MHz, the TMS320C6713 delivers up to 1350 million floating-point operations per second (MFLOPS), 1800 million instructions per second (MIPS), and, with its dual fixed-/floating-point multipliers, up to 450 million multiply-accumulate operations per second (MMACS).
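The peak figures follow from the clock rate and the number of units involved. The sketch below reproduces the arithmetic, assuming each unit completes one operation per cycle at peak. The counts are the ones this report gives: eight functional units, six of them floating-point capable, and two multipliers. The function name `peak_rates` is hypothetical.

```python
def peak_rates(clock_mhz, total_units=8, fp_units=6, multipliers=2):
    """Return peak throughput in millions of operations per second.

    Assumes one operation per unit per clock cycle.
    """
    return {
        "MIPS": clock_mhz * total_units,     # every unit issues an instruction
        "MFLOPS": clock_mhz * fp_units,      # only floating-point-capable units
        "MMACS": clock_mhz * multipliers,    # one multiply-accumulate per multiplier
    }


rates = peak_rates(225)
```

At 225 MHz this gives 1800 MIPS, 1350 MFLOPS, and 450 MMACS, matching the figures quoted above.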

2.1 Functional Units

The CPU features eight functional units supported by 32 32-bit general-purpose registers. This data path is divided into two symmetric sides, each consisting of 16 registers and 4 functional units. Additionally, each side features a data bus connected to all the registers on the other side, through which its functional units can access data in the opposite register file.

2.2 Fixed and Floating Point Instruction Set

The C67x CPU executes the C62x integer instruction set. In addition, the C67x CPU natively supports IEEE 32-bit single-precision and 64-bit double-precision floating point. Six of the eight functional units execute floating-point instructions as well as the C62x fixed-point instructions: two multipliers, two ALUs, and two auxiliary floating-point units. The remaining two functional units support floating point by providing address generation for the 64-bit loads that the C67x CPU adds to the C62x instruction set. Together, these two loads provide 128 bits of data bandwidth per cycle. This double-word load capability allows multiple operands to be loaded into the register file for 32-bit floating-point instructions. Unlike other floating-point architectures, the C67x has independent control of its two floating-point multipliers and its two floating-point ALUs. This enables the CPU to operate on a broader mix of floating-point algorithms rather than being tied to the typical multiply-accumulate-oriented functions.
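The 128-bit figure can be checked with a short calculation. The sketch below assumes that both address-generation units issue one 64-bit load in the same cycle. The 225 MHz clock is the rate quoted earlier in this report, and the function name `load_bandwidth` is hypothetical.

```python
def load_bandwidth(load_units=2, bits_per_load=64, operand_bits=32, clock_mhz=225):
    """Return (bits per cycle, 32-bit operands per cycle, peak MB/s).

    Assumes every load unit completes one double-word load each cycle.
    """
    bits_per_cycle = load_units * bits_per_load
    operands_per_cycle = bits_per_cycle // operand_bits
    mbytes_per_second = (bits_per_cycle // 8) * clock_mhz
    return bits_per_cycle, operands_per_cycle, mbytes_per_second


bits, operands, mbytes = load_bandwidth()
```

Under these assumptions the CPU can bring four single-precision operands into the register files each cycle, which is enough to feed both multipliers with two inputs apiece.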

2.3 Load/Store Architecture

Another key feature of the C67x CPU is the load/store architecture, in which all instructions operate on registers (as opposed to directly on data in memory). Two sets of data-addressing units are responsible for all data transfers between the register files and memory. The data address paths driven by the .D units allow a data address generated from one register file to be used to load or store data to or from the other register file.
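A toy model may help make the load/store discipline concrete. It is only a sketch: the register-file labels `A` and `B`, the memory size, and all function names are illustrative assumptions, and no real instruction encoding or timing is modeled.

```python
memory = [0.0] * 16
regs = {"A": [0] * 16, "B": [0] * 16}   # two register files, 16 registers each


def load(dst_side, dst, addr_side, addr_reg):
    """Copy memory[address] into a register; the address may come from the other file."""
    regs[dst_side][dst] = memory[regs[addr_side][addr_reg]]


def store(src_side, src, addr_side, addr_reg):
    """Copy a register into memory[address]; the address may come from the other file."""
    memory[regs[addr_side][addr_reg]] = regs[src_side][src]


def add(side, dst, src1, src2):
    """Arithmetic reads and writes registers only, never memory."""
    regs[side][dst] = regs[side][src1] + regs[side][src2]


def add_in_memory(a, b, out):
    """Compute memory[out] = memory[a] + memory[b] in load/store style."""
    regs["A"][0], regs["A"][1], regs["A"][2] = a, b, out   # addresses held in file A
    load("B", 4, "A", 0)     # address from file A, data lands in file B
    load("B", 5, "A", 1)
    add("B", 6, 4, 5)        # operate on registers
    store("B", 6, "A", 2)    # write the result back to memory
```

Adding two memory operands therefore takes four steps (two loads, one add, one store), and the loads show an address from one register file directing data into the other.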

2.4 Benchmark Performance

Table 1 shows the TMS320C67x CPU floating-point benchmark performance for some algorithms commonly used in audio applications. The time for each benchmark is listed for a 225 MHz C6713 CPU.
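Benchmark times at a given clock rate follow directly from cycle counts. The helper below is a minimal sketch of that conversion. The function name `cycles_to_microseconds` is hypothetical, and the cycle count in the example is a made-up value chosen for easy arithmetic, not a figure from Table 1.

```python
def cycles_to_microseconds(cycles, clock_mhz=225):
    """Convert a CPU cycle count to execution time in microseconds.

    At clock_mhz million cycles per second, one microsecond holds clock_mhz cycles.
    """
    return cycles / clock_mhz


# Hypothetical cycle count, for illustration only
example_time_us = cycles_to_microseconds(2250)
```

At 225 MHz one cycle lasts about 4.44 ns, so the hypothetical 2250-cycle routine would take 10 microseconds.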

TMS320C6713 Digital Signal Processor Optimized for High Performance Multichannel Audio Systems
