Motorola SC140 user manual Further Speed Optimization


Compiler Support on StarCore

[Figure: C Code | Generated Assembly Code]

Further Speed Optimization

The register-to-register transfers can be eliminated by expanding (unrolling) the inner loop so that each group of four MAC instructions uses the data registers that already contain the required data values. This yields faster code at the cost of a larger code size.
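The following portable C sketch shows the kind of loop being optimized. It is a multi-sample FIR inner loop in which the four MACs of each group share one coefficient, and the window of input samples slides by one position through three register-to-register transfers at the end of every group. The function name fir_ms_transfers, the correlation-form indexing y[n] = a[0]x[n] + a[1]x[n+1] + ... + a[N-1]x[n+N-1], and the use of plain 32-bit integer MACs instead of the SC140 fractional intrinsics are assumptions for illustration; the code is not taken from the exercise files.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical multi-sample FIR with register-to-register transfers (a sketch).
   Assumed correlation form: y[n] = sum_{i=0}^{ntaps-1} a[i] * x[n + i],
   with x holding nout + ntaps - 1 samples. Plain 32-bit integer MACs stand
   in for the SC140 fractional intrinsics. */
static void fir_ms_transfers(const int16_t *a, const int16_t *x,
                             int32_t *y, int ntaps, int nout)
{
    assert(nout % 4 == 0);              /* four outputs per outer pass */
    for (int n = 0; n < nout; n += 4) {
        int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        int16_t x0 = x[n], x1 = x[n + 1], x2 = x[n + 2];
        for (int i = 0; i < ntaps; i++) {
            int16_t c = a[i];           /* load 1: coefficient */
            int16_t x3 = x[n + i + 3];  /* load 2: newest input sample */
            acc0 += (int32_t)c * x0;    /* four MACs share one coefficient */
            acc1 += (int32_t)c * x1;
            acc2 += (int32_t)c * x2;
            acc3 += (int32_t)c * x3;
            x0 = x1;                    /* transfers: slide the window by one */
            x1 = x2;
            x2 = x3;
        }
        y[n] = acc0;
        y[n + 1] = acc1;
        y[n + 2] = acc2;
        y[n + 3] = acc3;
    }
}
```

The three assignments at the end of the loop body are the transfers that cost extra cycles; unrolling removes them by changing which register plays which role in each group.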

9. Save Ex6_1.c as Ex6_2.c.

10. In Ex6_2.c, “unroll” the inner loop four times so that the first four groups (Group 0, Group 1, Group 2, and Group 3) are all processed in each pass through the loop. This loop expansion avoids the register-to-register data transfers. Because the loop body now covers four groups, reduce the number of loop iterations by a factor of four to compensate.
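A minimal sketch of step 10 in portable C, assuming correlation-form indexing y[n] = a[0]x[n] + ... + a[N-1]x[n+N-1], plain 32-bit integer MACs in place of the SC140 fractional intrinsics, and a tap count that is a multiple of four. The function name fir_ms_unrolled is hypothetical. Instead of moving data between registers, each of the four groups in the unrolled body assigns the register roles differently, so the newest sample is loaded into whichever register has just become free; the loop runs ntaps / 4 times.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical multi-sample FIR, inner loop unrolled by four (a sketch).
   Assumed correlation form: y[n] = sum_{i=0}^{ntaps-1} a[i] * x[n + i],
   with x holding nout + ntaps - 1 samples. Plain 32-bit integer MACs stand
   in for the SC140 fractional intrinsics. */
static void fir_ms_unrolled(const int16_t *a, const int16_t *x,
                            int32_t *y, int ntaps, int nout)
{
    assert(nout % 4 == 0);               /* four outputs per outer pass */
    assert(ntaps % 4 == 0);              /* four groups per inner iteration */
    for (int n = 0; n < nout; n += 4) {
        int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        int16_t x0 = x[n], x1 = x[n + 1], x2 = x[n + 2], x3;
        for (int i = 0; i < ntaps; i += 4) {   /* ntaps / 4 iterations */
            const int16_t *xp = &x[n + i];
            int16_t c;

            /* Group 0: window is x0 x1 x2 x3; newest sample goes to x3. */
            c = a[i];     x3 = xp[3];
            acc0 += (int32_t)c * x0; acc1 += (int32_t)c * x1;
            acc2 += (int32_t)c * x2; acc3 += (int32_t)c * x3;

            /* Group 1: x0 is no longer needed, so it receives the newest sample. */
            c = a[i + 1]; x0 = xp[4];
            acc0 += (int32_t)c * x1; acc1 += (int32_t)c * x2;
            acc2 += (int32_t)c * x3; acc3 += (int32_t)c * x0;

            /* Group 2: x1 is free. */
            c = a[i + 2]; x1 = xp[5];
            acc0 += (int32_t)c * x2; acc1 += (int32_t)c * x3;
            acc2 += (int32_t)c * x0; acc3 += (int32_t)c * x1;

            /* Group 3: x2 is free. */
            c = a[i + 3]; x2 = xp[6];
            acc0 += (int32_t)c * x3; acc1 += (int32_t)c * x0;
            acc2 += (int32_t)c * x1; acc3 += (int32_t)c * x2;
        }
        y[n] = acc0; y[n + 1] = acc1; y[n + 2] = acc2; y[n + 3] = acc3;
    }
}
```

After Group 3, x0, x1, and x2 again hold the first three samples of the next window, which is why the four-group pattern can repeat without any transfers.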

If the body of your inner loop takes just four cycles and your code still produces the correct output, congratulations: you have completed Exercise 6.

Notice that each group of four MAC operations and two data load operations now requires just one processor cycle, which is half the time required by the filtering operation and a quarter of the time required by a single-ALU DSP device. However, the code size of the inner loop has increased significantly (to approximately four times that of the second implementation), and this must be weighed against the cycle-count improvement obtained. Table 3 summarizes the main inner-loop characteristics of the multi-sample technique and compares them with those of the single-sample technique.

Table 3. Inner Loop Characteristics of Multi-sample and Single-sample Techniques

Characteristic      Single-sample Algorithm    Multi-sample Algorithm
Cycle count         N                          N/4
Registers used      Fewer                      More
Sample delay        1                          4

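One way to see where the larger sample delay of the multi-sample algorithm comes from: a kernel that computes four outputs at once cannot run until four new input samples have arrived. The streaming wrapper below is a hypothetical sketch (ms_stream, ms_push, NTAPS, and BLOCK are illustrative names, not part of the exercises), again assuming correlation-form indexing and plain integer MACs. It buffers input and releases outputs four at a time, whereas a single-sample filter could return one output per input sample.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NTAPS 4   /* hypothetical filter length, for illustration only */
#define BLOCK 4   /* outputs produced per pass of the multi-sample loop */

typedef struct {
    int16_t a[NTAPS];                /* filter coefficients */
    int16_t buf[NTAPS - 1 + BLOCK];  /* NTAPS-1 history samples, then BLOCK new ones */
    int fill;                        /* new samples collected for the current block */
} ms_stream;

/* Push one input sample. Returns 0 while the block is still filling, or
   BLOCK after writing four new outputs to y. */
static int ms_push(ms_stream *s, int16_t sample, int32_t y[BLOCK])
{
    assert(s->fill >= 0 && s->fill < BLOCK);
    s->buf[NTAPS - 1 + s->fill++] = sample;
    if (s->fill < BLOCK)
        return 0;                    /* still waiting: no output yet */

    /* Multi-sample order: each coefficient feeds four MACs, one per output.
       Assumed correlation form: y[k] = sum_i a[i] * buf[k + i]. */
    int32_t acc[BLOCK] = {0};
    for (int i = 0; i < NTAPS; i++)
        for (int k = 0; k < BLOCK; k++)
            acc[k] += (int32_t)s->a[i] * s->buf[k + i];
    memcpy(y, acc, sizeof acc);

    /* Keep the newest NTAPS-1 samples as history for the next block. */
    memmove(s->buf, s->buf + BLOCK, (NTAPS - 1) * sizeof s->buf[0]);
    s->fill = 0;
    return BLOCK;
}
```

With a zero-initialized history, the first three calls return 0 and only the fourth call produces output, illustrating the block-of-four latency.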

Introduction to the SC140 Tools

Motorola AN2009/D