SPRA921

Table 1. C6713 Benchmark Performance

Algorithm                     Description                             Parameter Values     Cycles   Time
Biquad filter                 nx input/output cycles                  nx = 60              316      1.4 µs
(IIR filter direct form II)                                           nx = 90              436      1.9 µs
Real FIR filter               nh coefficients,                        nh = 24, nr = 64     802      3.6 µs
                              nr output samples                       nh = 30, nr = 50     795      3.5 µs
IIR filter                    nr number of output samples             nr = 64              443      2.0 µs
IIR lattice filter            nr number of samples,                   nk = 10, nr = 100    4125     18.3 µs
                              nk number of reflection coefficients
Dot product                   nx number of values                     nx = 512             281      1.2 µs
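The times in Table 1 are consistent with dividing each cycle count by a 225 MHz clock, the maximum device frequency quoted in this report. The following is a minimal sketch of that arithmetic, assuming 225 MHz is the clock behind the table; the name `cycles_to_us` is chosen here for illustration only.

```python
CPU_CLOCK_HZ = 225e6  # assumption: the 225 MHz maximum device frequency


def cycles_to_us(cycles, clock_hz=CPU_CLOCK_HZ):
    """Convert a CPU cycle count to execution time in microseconds."""
    return cycles / clock_hz * 1e6


# Cycle counts copied from Table 1.
benchmarks = {
    "Biquad filter, nx = 60": 316,
    "Biquad filter, nx = 90": 436,
    "Real FIR filter, nh = 24, nr = 64": 802,
    "Real FIR filter, nh = 30, nr = 50": 795,
    "IIR filter, nr = 64": 443,
    "IIR lattice filter, nk = 10, nr = 100": 4125,
    "Dot product, nx = 512": 281,
}

for name, cycles in benchmarks.items():
    print(f"{name}: {cycles} cycles = {cycles_to_us(cycles):.1f} us")
```

The same helper can be used to estimate times at a lower clock by passing a different `clock_hz`.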

3 Two-Level Cache

3.1 Cache Overview

The TMS320C6713 device uses a highly efficient two-level real-time cache for internal program and data storage. The cache delivers high performance without the cost of large arrays of on-chip memory. The efficiency of the cache makes low-cost, high-density external memory, such as SDRAM, as effective as on-chip memory.

The first level of the memory architecture has dedicated 4K-byte instruction and data caches, L1I and L1D respectively. The L1I is direct-mapped, whereas the L1D provides 2-way associativity to handle multiple types of data. The second level (L2) consists of a total of 256K bytes of memory. 64K bytes of this can be configured in one of five ways:

64K 4-way associative cache

48K 3-way associative cache, 16K mapped RAM

32K 2-way associative cache, 32K mapped RAM

16K direct-mapped cache, 48K mapped RAM

64K mapped RAM
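The five configurations above can be written down as a small table to check that each one accounts for the full configurable 64K. This is only an illustrative sketch: the names `L2Mode` and `total_mapped_ram_kb` are hypothetical, and treating the remaining 192K of L2 as always-mapped RAM is an assumption drawn from the 256K total rather than something stated explicitly here.

```python
from collections import namedtuple

L2_TOTAL_KB = 256         # total L2 memory, from the text
L2_CONFIGURABLE_KB = 64   # portion that can be split between cache and RAM

# ways = cache associativity (0 means no cache); sizes are in K bytes.
L2Mode = namedtuple("L2Mode", ["ways", "cache_kb", "ram_kb"])

L2_MODES = [
    L2Mode(ways=4, cache_kb=64, ram_kb=0),
    L2Mode(ways=3, cache_kb=48, ram_kb=16),
    L2Mode(ways=2, cache_kb=32, ram_kb=32),
    L2Mode(ways=1, cache_kb=16, ram_kb=48),  # direct-mapped
    L2Mode(ways=0, cache_kb=0, ram_kb=64),   # all mapped RAM
]


def total_mapped_ram_kb(mode):
    """Mapped RAM across all of L2, assuming the other 192K is always RAM."""
    return (L2_TOTAL_KB - L2_CONFIGURABLE_KB) + mode.ram_kb


for mode in L2_MODES:
    # Every mode must use exactly the configurable 64K.
    assert mode.cache_kb + mode.ram_kb == L2_CONFIGURABLE_KB
    # In the list above, each cache way corresponds to 16K.
    assert mode.cache_kb == 16 * mode.ways
    print(f"{mode.ways}-way, {mode.cache_kb}K cache, "
          f"{total_mapped_ram_kb(mode)}K mapped RAM in total")
```

Read this way, the list shows that each additional way of associativity trades 16K of mapped RAM for 16K of cache.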

Dedicated L1 caches eliminate conflicts for the memory resources between the program and data busses. A unified L2 memory provides flexible memory allocation between program and data for accesses that do not reside in L1.
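To illustrate why the L1D's 2-way associativity matters compared with a direct-mapped organization, the toy model below counts hits and misses for two buffers whose addresses are exactly 4K bytes apart. It is a sketch under stated assumptions, not a model of the actual hardware: the 32-byte line size and the LRU replacement policy are assumptions, and the class `SetAssociativeCache` is hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU cache model that only counts hits and misses."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; key order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict least recently used line
            lines[tag] = True


def run(cache, addresses, repeats):
    """Replay the address pattern and return (hits, misses)."""
    for _ in range(repeats):
        for address in addresses:
            cache.access(address)
    return cache.hits, cache.misses


LINE_BYTES = 32            # assumed line size
pattern = [0x0000, 0x1000]  # two addresses 4K bytes apart

direct = SetAssociativeCache(4096, LINE_BYTES, ways=1)
two_way = SetAssociativeCache(4096, LINE_BYTES, ways=2)

print("direct-mapped (hits, misses):", run(direct, pattern, repeats=10))
print("2-way         (hits, misses):", run(two_way, pattern, repeats=10))
```

In the direct-mapped case both addresses compete for the same line and evict each other on every access, while the 2-way cache keeps both resident after the first pass.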

3.2 Cache Hides Off-Chip Latency

The external memories that interface to the TMS320C6713 may operate at a maximum of 100 MHz, while the device operates at a 225 MHz maximum frequency. All external memory devices have significant start-up latencies associated with them. For example, SDRAMs typically have a read latency of 2-4 bus cycles. The lower frequency and added latency of external memories would normally degrade processor performance significantly. Retrieving data from on-chip L2 memory has much lower latency than retrieving it from external memory. With the intermediate L2 cache in place, the external latency is hidden from the user. Using the fast L2 memories to cache the slower external memories reduces the latency of external accesses by a factor of five.
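The clock figures above can be turned into a rough feel for the cost of an external access. The sketch below converts the quoted 2-4 bus cycles of SDRAM read latency into CPU cycles, then estimates an average access latency for a given L2 hit rate. The hit rate, the relative latencies of 1 and 5 units (one reading of the factor of five), and the function names are all assumptions made for illustration, not measured values.

```python
CPU_CLOCK_MHZ = 225  # maximum device frequency, from the text
BUS_CLOCK_MHZ = 100  # maximum external memory frequency, from the text


def bus_to_cpu_cycles(bus_cycles):
    """Express a latency given in external bus cycles as CPU cycles."""
    return bus_cycles * CPU_CLOCK_MHZ / BUS_CLOCK_MHZ


def average_latency(hit_rate, hit_latency, miss_latency):
    """Weighted average latency for a cache with the given hit rate."""
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency


# SDRAM read latency of 2-4 bus cycles, seen from the CPU.
for bus_cycles in (2, 4):
    print(f"{bus_cycles} bus cycles = {bus_to_cpu_cycles(bus_cycles)} CPU cycles")

# Hypothetical example: L2 hit costs 1 unit, external access costs 5 units.
for hit_rate in (0.5, 0.9, 0.99):
    latency = average_latency(hit_rate, hit_latency=1.0, miss_latency=5.0)
    print(f"hit rate {hit_rate:.2f}: average latency {latency:.2f} units")
```

The higher the fraction of accesses served from L2, the closer the average latency gets to the on-chip figure, which is the sense in which the cache hides the external latency.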

TMS320C6713 Digital Signal Processor Optimized for High Performance Multichannel Audio Systems
