Compiler Support on StarCore

C Code

Generated Assembly Code

Further Speed Optimization

The register-to-register transfers can be eliminated by expanding the inner loop so that each group of four MAC instructions uses the data registers already containing the required data values. This yields faster code, but code size is greater.
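To make the register-to-register transfers concrete, here is a minimal C sketch of a multi-sample inner loop before any unrolling. It is not the contents of Ex6_1.c. The function name `fir_multisample_rolled`, the buffer layout (`taps - 1` history samples followed by the new samples), and the plain integer data types are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical multi-sample FIR: four outputs per outer pass, inner loop
 * NOT unrolled. Assumed layout: x holds taps-1 history samples followed by
 * n_out new samples, so x[n + taps - 1] is the newest input for y[n].
 */
void fir_multisample_rolled(const int16_t *x, const int16_t *h,
                            int32_t *y, int n_out, int taps)
{
    assert(n_out % 4 == 0);                  /* four outputs per outer pass */

    for (int n = 0; n < n_out; n += 4) {
        const int16_t *p = x + n + taps - 1; /* p[0] is input sample n */
        int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
        int16_t d0, d1 = p[1], d2 = p[2], d3 = p[3];

        for (int i = 0; i < taps; i++) {
            int16_t c = h[i];                /* load 1: one coefficient */
            d0 = p[-i];                      /* load 2: one new data sample */

            a0 += c * d0;                    /* four MACs share coefficient c */
            a1 += c * d1;
            a2 += c * d2;
            a3 += c * d3;

            d3 = d2;                         /* register-to-register transfers */
            d2 = d1;                         /* that unrolling would remove */
            d1 = d0;
        }

        y[n]     = a0;
        y[n + 1] = a1;
        y[n + 2] = a2;
        y[n + 3] = a3;
    }
}
```

The three assignments at the end of the inner loop do no arithmetic. They only move each sample into the variable that the next coefficient expects, which is the overhead this section sets out to eliminate.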

9. Save Ex6_1.c as Ex6_2.c.

10. In Ex6_2.c, “unroll” the inner loop four times so that the first four groups (Group 0, Group 1, Group 2, and Group 3) are all processed within a single pass of the loop. This loop expansion avoids transferring data between registers. Because the loop body is unrolled by a factor of four, you must reduce the number of loop iterations by a factor of four to compensate.
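The unrolling described in step 10 can be sketched in portable C as follows. This is an illustration of the technique, not the expected solution for Ex6_2.c. The names `fir_unrolled4` and `fir_reference`, the buffer layout (`taps - 1` history samples followed by the new samples), and the integer data types are assumptions. The sketch also assumes that both the tap count and the output count are multiples of four.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward single-sample FIR, used only to check the result. */
void fir_reference(const int16_t *x, const int16_t *h,
                   int32_t *y, int n_out, int taps)
{
    for (int n = 0; n < n_out; n++) {
        int32_t acc = 0;
        for (int i = 0; i < taps; i++)
            acc += h[i] * x[n + taps - 1 - i];
        y[n] = acc;
    }
}

/*
 * Hypothetical multi-sample FIR with the inner loop unrolled four times.
 * Each group loads one new sample into the variable whose old value is no
 * longer needed, so the roles of d0..d3 rotate and no transfers are required.
 */
void fir_unrolled4(const int16_t *x, const int16_t *h,
                   int32_t *y, int n_out, int taps)
{
    assert(n_out % 4 == 0 && taps % 4 == 0);

    for (int n = 0; n < n_out; n += 4) {
        const int16_t *p = x + n + taps - 1;  /* p[0] is input sample n */
        int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
        int16_t d0, d1 = p[1], d2 = p[2], d3 = p[3];

        /* taps/4 iterations instead of taps: four groups per iteration */
        for (int i = 0; i < taps; i += 4) {
            d0 = p[-i];                       /* Group 0 */
            a0 += h[i] * d0;     a1 += h[i] * d1;
            a2 += h[i] * d2;     a3 += h[i] * d3;

            d3 = p[-i - 1];                   /* Group 1 */
            a0 += h[i + 1] * d3; a1 += h[i + 1] * d0;
            a2 += h[i + 1] * d1; a3 += h[i + 1] * d2;

            d2 = p[-i - 2];                   /* Group 2 */
            a0 += h[i + 2] * d2; a1 += h[i + 2] * d3;
            a2 += h[i + 2] * d0; a3 += h[i + 2] * d1;

            d1 = p[-i - 3];                   /* Group 3 */
            a0 += h[i + 3] * d1; a1 += h[i + 3] * d2;
            a2 += h[i + 3] * d3; a3 += h[i + 3] * d0;
        }

        y[n]     = a0;
        y[n + 1] = a1;
        y[n + 2] = a2;
        y[n + 3] = a3;
    }
}
```

After Group 3, the variables `d1`, `d2`, and `d3` already hold the samples that Group 0 of the next iteration needs, which is why the loop can repeat without any data movement. Comparing the output against `fir_reference` is one way to confirm that the unrolled loop still produces the correct result.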

If your inner loop consumes just four cycles, and your code still produces the correct output, congratulations. You have completed Exercise 6.

Notice that each group of four MAC operations and two data load operations now requires just one processor cycle, which is half the time required by the filtering operation and a quarter of the time required by a single-ALU DSP device. However, the inner loop is now considerably larger (approximately four times the size of the second implementation), and this increase in code size must be weighed against the improvement in cycle count. Table 3 summarizes the main characteristics of the multi-sample technique.
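The quarter-of-the-time comparison with a single-ALU device can be checked with a small calculation. The sketch below is an idealized model that counts only MAC cycles in the inner loop and ignores loop setup, pipeline effects, and memory stalls. The function names are hypothetical.

```c
#include <assert.h>

/* Idealized inner-loop cycles when one MAC issues per cycle. */
unsigned long cycles_one_mac_per_cycle(unsigned long taps,
                                       unsigned long outputs)
{
    return taps * outputs;
}

/* Idealized inner-loop cycles when four MACs issue per cycle.
 * Assumes the total MAC count is a multiple of four. */
unsigned long cycles_four_macs_per_cycle(unsigned long taps,
                                         unsigned long outputs)
{
    assert((taps * outputs) % 4 == 0);
    return (taps * outputs) / 4;
}
```

For an illustrative filter with 32 taps producing 64 output samples, this model gives 2048 inner-loop cycles at one MAC per cycle and 512 cycles at four MACs per cycle.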

Table 3. Inner Loop Characteristics of Multi-sample and Single-sample Techniques.

Characteristic     Single-sample Algorithm     Multi-sample Algorithm
Cycle count        N                           N/4
Registers used     Fewer                       More
Sample delay       1                           4


Introduction to the SC140 Tools
