Motorola SC140 user manual Multi-Sample Exercise, = ∑ a i x n For 0 ≤ n L

Page 18

Compiler Support on StarCore

Good To Know

The use of four variables removes the accumulation dependency that is required for parallelism.

Bit exact considerations must be understood if this technique is used: overflow/saturation characteristics may change during split summation.

6 Multi-Sample Exercise

The multi-sample exercise demonstrates the multisample technique. As the exercise in Section 5 shows, the split summation technique allows a sum of products operation to be calculated using all four ALUs by evaluating four intermediate products at a time. However, it does not guarantee bit-exact agreement with serially accumulating each intermediate product using a single ALU. To ensure bit-exactness, the order of summation must be preserved by performing each intermediate product/accumulation in turn.Therefore, the intermediate products cannot be evaluated in parallel. Furthermore, the split summation technique may not be suited for the application. Other techniques can be used where it is possible to evaluate one intermediate product from each of four output sample calculations in parallel. Consider the FIR filtering operation described by Equation 4:

N – 1

 

 

y(n) = ai x(n i) ,

for 0 n < L

(4)

i = 0

A C code implementation of this operation typically resembles the implementation of Ex6.c. To use all four ALUs, the operations can be grouped as illustrated in the following equation:

y(n) = a0x(n) y(n + 1) = a0x(n + 1) y(n + 2) = a0x(n + 2) y(n + 3) = a0x(n + 3)

+a1x(n – 1)

+a1x(n)

+a1x(n + 1)

+a1x(n + 2)

+a2x(n – 2) + a3x(n – 3) + + aN – 2x(n N + 2) + aN – 1x(n N + 1)

+ a2x(n – 1) + a3x(n – 2) + + aN – 2x(n N + 3) + aN – 1x(n N + 2)

(5)

+ a2x(n)

+ a3x(n – 1) + + aN – 2x(n N + 4) + aN – 1x(n N + 3)

 

+ a2x(n + 1) + a3x(n)

+ + aN – 2x(n N + 5) + aN – 1x(n N + 4)

 

Group 0 Group 1 Group 2 Group 3

Group N-2

Group N-1

In Equation 5, the products and accumulations within each group are calculated in parallel, but the groups themselves are evaluated in sequence, thus preserving the order of accumulation, which in turn preserves the bit-exactness of Equation 4. Therefore, parallelization is achieved by processing multiple samples in parallel rather than multiple intermediate products belonging to only one output sample. When one group (for example, Group 2) is evaluated, only two words of data need to be loaded for the next group (Group 3): a3 and x(n-3). The other values needed for the calculations in Group 3—x(n-2), x(n-1), and x(n)—should already exist in the DSP registers from the calculation of Group 2. The result is a reduction in memory bandwidth requirements that increases code efficiency.

18

Introduction to the SC140 Tools

Image 18
Contents Motorola Typical development process is represented in Figure Compiler File I/O Exercise Hands OnGood To Know Integer and Fractional Arithmetic Exercise Interpretation of 16-bit Integer and Fractional Data ValuesCompiler Support on StarCore Integer ArithmeticFractional Arithmetic Good To Know Local Versus Global Optimization Exercise Local OptimizationCompiler Support on StarCore StarCore C Compiler Local Optimization Global OptimizationMemory Alignment Exercise AA BB CC DD First Code Section Second Code Section Split Summation Exercise Code Generated Assembly Code Multi-Sample Exercise = ∑ a i x n For 0 ≤ n LIntermediate Version Compromise Between Memory and Speed Further Speed Optimization Control Code The True Bit Exercise Save Ex7.c as Ex71.c Calling an Assembly Routine From C Exercise Current Following output should be displayed Challenge Solutions to Exercises Introduction to the SC140 ToolsExercise AlignedExpected Motorola INC Intermediate version Compromise between Memory and Speed Compiler Support on StarCore Further Optimizing the Speed Compiler Support on StarCore Compiler Support on StarCore Motorola INC Zoffset equ Moffset equ Exercise Compiler Support on StarCore Compiler Support on StarCore AN2009/D