Analog Devices ADSP-TS201S: Dual Compute Blocks, Data Alignment Buffer (DAB), Dual Integer ALU (IALU)


ADSP-TS201S

The TigerSHARC DSP uses a Static Superscalar™ architecture. This architecture is superscalar in that the ADSP-TS201S processor’s core can execute simultaneously from one to four 32-bit instructions encoded in a very large instruction word (VLIW) instruction line using the DSP’s dual compute blocks. Because the DSP does not perform instruction re-ordering at runtime (the programmer selects which operations will execute in parallel prior to runtime), the order of instructions is static.

With few exceptions, an instruction line, whether it contains one, two, three, or four 32-bit instructions, executes with a throughput of one cycle in a 10-deep processor pipeline.

For optimal DSP program execution, programmers must follow the DSP’s set of instruction parallelism rules when encoding an instruction line. In general, the selection of instructions that the DSP can execute in parallel each cycle depends on the instruction line resources each instruction requires and on the source and destination registers used in the instructions. The programmer has direct control of three core components: the IALUs, the compute blocks, and the program sequencer.

The ADSP-TS201S processor, in most cases, has a two-cycle execution pipeline that is fully interlocked, so whenever a computation result is unavailable for another operation dependent on it, the DSP automatically inserts one or more stall cycles as needed. Efficient programming with dependency-free instructions can eliminate most computational and memory transfer data dependencies.

In addition, the ADSP-TS201S processor supports SIMD operations two ways: SIMD compute blocks and SIMD computations. The programmer can load both compute blocks with the same data (broadcast distribution) or different data (merged distribution).
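The two distribution modes can be pictured with a small Python model. This is only a sketch: the function names `broadcast` and `merged` are hypothetical, each compute block is represented as a plain list, and the half-and-half split used for merged distribution is an assumption made for illustration, not a statement of how the hardware orders the words.

```python
def broadcast(words):
    """Broadcast distribution: both compute blocks receive the same data."""
    return {"X": list(words), "Y": list(words)}


def merged(words):
    """Merged distribution: each compute block receives different data.

    The first-half/second-half split is an assumption for illustration only.
    """
    half = len(words) // 2
    return {"X": list(words[:half]), "Y": list(words[half:])}


if __name__ == "__main__":
    quad_word = [10, 11, 12, 13]
    print("broadcast:", broadcast(quad_word))
    print("merged:   ", merged(quad_word))
```

Broadcast suits cases where both blocks apply different operations to the same operands, while merged suits splitting one data stream across the two blocks.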

DUAL COMPUTE BLOCKS

The ADSP-TS201S processor has compute blocks that can execute computations either independently or together as a single-instruction, multiple-data (SIMD) engine. The DSP can issue up to two compute instructions per compute block each cycle, instructing the ALU, multiplier, shifter, or CLU to perform independent, simultaneous operations. Each compute block can execute eight 8-bit, four 16-bit, two 32-bit, or one 64-bit SIMD computations in parallel with the operation in the other block. These computation units support IEEE 32-bit single-precision floating-point, extended-precision 40-bit floating point, and 8-, 16-, 32-, and 64-bit fixed-point processing.
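The lane counts quoted above (eight 8-bit, four 16-bit, two 32-bit, or one 64-bit) all correspond to a 64-bit operand divided into equal lanes. The following Python sketch models a lane-wise add on such an operand. The names `lanes_per_operand` and `simd_add` are hypothetical, and the wraparound behavior on overflow is an assumption of this model, not a description of the processor's arithmetic modes.

```python
def lanes_per_operand(lane_bits, operand_bits=64):
    """Number of equal-width lanes that fit in one packed operand."""
    return operand_bits // lane_bits


def simd_add(a, b, lane_bits):
    """Add two 64-bit packed operands lane by lane.

    Each lane wraps on overflow (an assumption of this model), so a carry
    never spills into the neighboring lane.
    """
    if 64 % lane_bits != 0:
        raise ValueError("lane width must divide 64")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, "bit lanes:", lanes_per_operand(width))
    print(hex(simd_add(0x0001000200030004, 0x0001000100010001, 16)))
```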

The compute blocks are referred to as X and Y in assembly syntax, and each block contains four computational units (an ALU, a multiplier, a 64-bit shifter, and a 128-bit CLU) and a 32-word register file.

Static Superscalar is a trademark of Analog Devices, Inc.

Register File: each compute block has a multiported 32-word, fully orthogonal register file used for transferring data between the computation units and data buses and for storing intermediate results. Instructions can access the registers in the register file individually (word-aligned), in sets of two (dual-aligned), or in sets of four (quad-aligned).

ALU: the ALU performs a standard set of arithmetic operations in both fixed- and floating-point formats. It also performs logic operations.

Multiplier: the multiplier performs both fixed- and floating-point multiplication and fixed-point multiply and accumulate.

Shifter: the 64-bit shifter performs logical and arithmetic shifts, bit and bit stream manipulation, and field deposit and extraction operations.

Communications Logic Unit (CLU): this 128-bit unit provides trellis decoding (for example, Viterbi and Turbo decoders) and executes complex correlations for CDMA communication applications (for example, chip-rate and symbol-rate functions).
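The register file access rule described under Register File can be sketched as a simple check. This Python model assumes that "dual-aligned" and "quad-aligned" mean the first register index of a group is a multiple of two or four respectively; the function name `is_valid_register_group` is hypothetical.

```python
REGISTER_FILE_WORDS = 32  # register file size stated in the text


def is_valid_register_group(first, count):
    """Return True if registers first..first+count-1 form an aligned group.

    Assumption: a group of `count` registers (1, 2, or 4) must start at an
    index that is a multiple of `count`.
    """
    if count not in (1, 2, 4):
        return False
    if first < 0 or first + count > REGISTER_FILE_WORDS:
        return False
    return first % count == 0


if __name__ == "__main__":
    for first, count in [(0, 4), (2, 4), (2, 2), (31, 1), (31, 2)]:
        print(first, count, is_valid_register_group(first, count))
```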

Using these features, the compute blocks can:

Provide 8 MACS per cycle peak and 7.1 MACS per cycle sustained 16-bit performance and provide 2 MACS per cycle peak and 1.8 MACS per cycle sustained 32-bit performance (based on FIR)

Execute six single-precision floating-point or execute 24 fixed-point (16-bit) operations per cycle, providing 3.6G FLOPS or 14.4G/s regular operations performance at 600 MHz

Perform two complex 16-bit MACS per cycle

Execute eight trellis butterflies in one cycle
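The throughput figures in the list above follow from multiplying operations per cycle by the clock rate. A minimal Python check of that arithmetic, using only the numbers quoted in the text (the helper name `peak_rate` is hypothetical):

```python
CLOCK_HZ = 600_000_000  # 600 MHz core clock, as stated in the text


def peak_rate(ops_per_cycle, clock_hz=CLOCK_HZ):
    """Peak operations per second for a given number of operations per cycle."""
    return ops_per_cycle * clock_hz


if __name__ == "__main__":
    print("floating-point:", peak_rate(6) / 1e9, "G operations/s")
    print("16-bit fixed-point:", peak_rate(24) / 1e9, "G operations/s")
```

Six floating-point operations per cycle at 600 MHz gives 3.6 billion operations per second, and 24 fixed-point operations per cycle gives 14.4 billion, matching the quoted figures.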

DATA ALIGNMENT BUFFER (DAB)

The DAB is a quad-word FIFO that enables loading of quad-word data from nonaligned addresses. Normally, load instructions must be aligned to their data size so that quad words are loaded from a quad-aligned address. Using the DAB significantly improves the efficiency of some applications, such as FIR filters.
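The effect of the DAB can be illustrated by assembling a nonaligned quad word from the two aligned quad words that contain it. The Python sketch below is a functional model only: memory is a list of words, the names `aligned_quad_load` and `dab_load` are hypothetical, and the FIFO state of the real buffer is not modeled.

```python
def aligned_quad_load(memory, address):
    """Load four words from a quad-aligned word address."""
    if address % 4 != 0:
        raise ValueError("quad-word load requires a quad-aligned address")
    return memory[address:address + 4]


def dab_load(memory, address):
    """Return four consecutive words starting at any word address.

    Functional model: combine the aligned quad word containing `address`
    with the next aligned quad word, then select the requested window.
    """
    offset = address % 4
    base = address - offset
    low = aligned_quad_load(memory, base)
    if offset == 0:
        return low
    high = aligned_quad_load(memory, base + 4)
    return (low + high)[offset:offset + 4]


if __name__ == "__main__":
    memory = list(range(16))
    print(dab_load(memory, 5))  # words 5, 6, 7, 8
```

An FIR filter is a natural fit because its input window slides by one sample per output, so most of its quad-word reads start at addresses that are not quad-aligned.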

DUAL INTEGER ALU (IALU)

The ADSP-TS201S processor has two IALUs that provide powerful address generation capabilities and perform many general-purpose integer operations. The IALUs are referred to as J and K in assembly syntax and have the following features:

Provide memory addresses for data and update pointers

Support circular buffering and bit-reverse addressing

Perform general-purpose integer operations, increasing programming flexibility

Include a 31-word register file for each IALU

As address generators, the IALUs perform immediate or indirect (pre- and post-modify) addressing. They perform modulus and bit-reverse operations with no constraints placed on memory addresses for the modulus data buffer placement. Each IALU can specify either a single-, dual-, or quad-word access from memory.
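The circular (modulus) and bit-reverse addressing described above can be sketched in Python. Both helper names are hypothetical, and the model only shows the address arithmetic, not IALU register usage. Note that the buffer base in the example is not a multiple of the buffer length, reflecting the statement that buffer placement is unconstrained.

```python
def circular_post_modify(pointer, modifier, base, length):
    """Post-modify access with modulus wraparound.

    Returns (address_used, updated_pointer). The access uses the current
    pointer; the update wraps inside [base, base + length).
    """
    address_used = pointer
    updated = base + (pointer - base + modifier) % length
    return address_used, updated


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (as used for FFT reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


if __name__ == "__main__":
    pointer = 102
    for _ in range(4):
        used, pointer = circular_post_modify(pointer, 3, base=100, length=4)
        print("access", used, "next", pointer)
    print([bit_reverse(i, 3) for i in range(8)])
```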

Rev. C Page 4 of 48 December 2006
