Compiler Support on StarCore
[Figure: C code and generated assembly code]
Further Speed Optimization
The
9. Save Ex6_1.c as Ex6_2.c.
10. In Ex6_2.c, unroll the inner loop instructions four times so that the first four groups (Group 0, Group 1, Group 2, and Group 3) are all processed within the loop. This loop expansion avoids transferring data. Because the loop is now unrolled by a factor of four, you must reduce the number of loop iterations by a factor of four to compensate.
If your inner loop consumes just four cycles, and your code still produces the correct output, congratulations. You have completed Exercise 6.
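As a rough illustration of the unrolling in step 10, the sketch below shows a generic multiply-accumulate loop before and after unrolling by four. It is not the contents of Ex6_1.c or Ex6_2.c, which are not reproduced here. The function names mac_rolled and mac_unrolled4, the short/long data types, and the use of four separate accumulators are assumptions made for this example. It is written in portable C and uses no StarCore intrinsics.

```c
#include <assert.h>

/* Hypothetical rolled loop: one multiply-accumulate per iteration,
 * so the loop body executes n times. */
long mac_rolled(const short *x, const short *h, int n)
{
    long acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (long)x[i] * h[i];
    }
    return acc;
}

/* Hypothetical loop unrolled by four: four multiply-accumulates per
 * iteration, so the loop body executes n/4 times. The four independent
 * accumulators give a compiler the chance to schedule the four
 * multiply-accumulates in parallel. */
long mac_unrolled4(const short *x, const short *h, int n)
{
    long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;

    /* The unrolled form assumes the trip count is a multiple of four. */
    assert(n % 4 == 0);

    for (int i = 0; i < n; i += 4) {
        acc0 += (long)x[i]     * h[i];
        acc1 += (long)x[i + 1] * h[i + 1];
        acc2 += (long)x[i + 2] * h[i + 2];
        acc3 += (long)x[i + 3] * h[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

Both functions return the same sum for the same inputs. Only the number of loop iterations and the number of live accumulators change, which is the same trade-off between cycle count and register usage that Table 3 summarizes.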
Notice that each group of four MAC operations and two data load operations now requires just one processor cycle, which is half the time required by the filtering operation and a quarter of the time required by a
Table 3. Inner Loop Characteristics of
Characteristic | | |
Cycle count | N | N/4 |
Registers used | Fewer | More |
Sample delay | 1 | 4 |
20 | Introduction to the SC140 Tools |