DSP_mat_mul
for (i = 0; i < r1; i++)
    for (j = 0; j < c2; j++)
    {
        sum = 0;
        for (k = 0; k < c1; k++)
            sum += x[k + i*c1] * y[j + k*c2];
        r[j + i*c2] = sum >> qs;
    }
}
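To make the row-major indexing and the qs shift concrete, the loop nest above can be exercised on a small case. The following is a minimal sketch, not the library routine: the names mat_mul_ref and mat_mul_ref_demo, the parameter order, and the precondition checks are assumptions made for this illustration.

```c
#include <assert.h>

/* Hypothetical portable stand-in for the reference loop nest; the name
 * mat_mul_ref and its parameter order are assumptions for this sketch. */
void mat_mul_ref(const short *x, int r1, int c1,
                 const short *y, int c2, short *r, int qs)
{
    assert(r1 >= 1 && c1 >= 1 && c2 >= 1);   /* at least 1 row and 1 column */
    assert(r != x && r != y);                /* output must not alias an input */

    for (int i = 0; i < r1; i++) {
        for (int j = 0; j < c2; j++) {
            int sum = 0;
            for (int k = 0; k < c1; k++)
                sum += x[k + i * c1] * y[j + k * c2];  /* row i of x times column j of y */
            r[j + i * c2] = (short)(sum >> qs);        /* shift right by qs, keep 16 bits */
        }
    }
}

/* Usage: (2 x 3) * (3 x 2) with qs = 1.
 * Returns a pointer to the 2 x 2 result, stored row by row. */
const short *mat_mul_ref_demo(void)
{
    static const short x[2 * 3] = { 1, 2, 3,
                                    4, 5, 6 };
    static const short y[3 * 2] = {  7,  8,
                                     9, 10,
                                    11, 12 };
    static short r[2 * 2];

    mat_mul_ref(x, 2, 3, y, 2, r, 1);
    return r;   /* unshifted sums are 58, 64, 139, 154 */
}
```

With qs = 1 each sum is halved before it is stored, so the demo result is 29, 32, 69, 77.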
Special Requirements
-The matrices x[], y[], and r[] are stored in distinct arrays. That is, in-place processing is not allowed.
-The input matrices have minimum dimensions of 1 row and 1 column, and maximum dimensions of 32767 rows and 32767 columns.
Implementation Notes
-Bank Conflicts: No bank conflicts occur.
-Interruptibility: This code blocks interrupts during its innermost loop. Interrupts are not blocked otherwise. As a result, interrupts can be blocked for up to 0.25*c1’ + 16 cycles at a time, where c1’ is c1 rounded up to the next even number (see Benchmarks).
-The ‘i’ and ‘k’ loops are unrolled 2x. The ‘j’ loop is unrolled 4x. For dimensions that are not multiples of the corresponding unroll factors, this code calculates extra results beyond the edges of the matrix. These extra results are ultimately discarded. This allows the loops to be unrolled for efficient operation on large matrices without losing flexibility.
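The unrolling note can be illustrated with a portable C sketch that rounds each dimension up to its unroll factor, computes a full 2 x 4 block of results per pass, and copies back only the valid r1 x c2 region. This is not the optimized implementation: the zero-padded scratch buffers, the return code, and the names round_up, mat_mul_padded, and mat_mul_padded_demo are all assumptions chosen so that the extra work stays inside allocated memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of m. */
static int round_up(int n, int m) { return (n + m - 1) / m * m; }

/* Hypothetical sketch: returns 0 on success, -1 if scratch allocation fails. */
int mat_mul_padded(const short *x, int r1, int c1,
                   const short *y, int c2, short *r, int qs)
{
    assert(r1 >= 1 && c1 >= 1 && c2 >= 1);
    int r1p = round_up(r1, 2);   /* 'i' loop unrolled 2x */
    int c1p = round_up(c1, 2);   /* 'k' loop unrolled 2x */
    int c2p = round_up(c2, 4);   /* 'j' loop unrolled 4x */

    /* Zero-filled scratch copies (an assumption of this sketch). */
    short *xp = calloc((size_t)r1p * c1p, sizeof *xp);
    short *yp = calloc((size_t)c1p * c2p, sizeof *yp);
    short *rp = calloc((size_t)r1p * c2p, sizeof *rp);
    if (!xp || !yp || !rp) { free(xp); free(yp); free(rp); return -1; }

    for (int i = 0; i < r1; i++)
        memcpy(xp + (size_t)i * c1p, x + (size_t)i * c1, (size_t)c1 * sizeof *x);
    for (int k = 0; k < c1; k++)
        memcpy(yp + (size_t)k * c2p, y + (size_t)k * c2, (size_t)c2 * sizeof *y);

    /* Each pass produces a 2 x 4 block of results, two k terms at a time. */
    for (int i = 0; i < r1p; i += 2)
        for (int j = 0; j < c2p; j += 4) {
            int sum[2][4] = {{0}};
            for (int k = 0; k < c1p; k += 2)
                for (int di = 0; di < 2; di++)
                    for (int dj = 0; dj < 4; dj++)
                        sum[di][dj] +=
                            xp[(i + di) * c1p + k]     * yp[k * c2p + j + dj] +
                            xp[(i + di) * c1p + k + 1] * yp[(k + 1) * c2p + j + dj];
            for (int di = 0; di < 2; di++)
                for (int dj = 0; dj < 4; dj++)
                    rp[(i + di) * c2p + j + dj] = (short)(sum[di][dj] >> qs);
        }

    /* Keep the valid r1 x c2 region; results beyond the edges are discarded. */
    for (int i = 0; i < r1; i++)
        memcpy(r + (size_t)i * c2, rp + (size_t)i * c2p, (size_t)c2 * sizeof *r);

    free(xp); free(yp); free(rp);
    return 0;
}

/* Usage: (1 x 3) * (3 x 2) with qs = 0; every dimension needs padding.
 * Returns a pointer to the 1 x 2 result, or NULL if allocation failed. */
const short *mat_mul_padded_demo(void)
{
    static const short x[1 * 3] = { 1, 2, 3 };
    static const short y[3 * 2] = { 7, 8, 9, 10, 11, 12 };
    static short r[1 * 2];

    return mat_mul_padded(x, 1, 3, y, 2, r, 0) == 0 ? r : NULL;
}
```

In the demo the padded problem is 2 x 4 times 4 x 4, so 8 results are computed and only 2 are kept. This is the trade the note describes: fixed loop shapes in exchange for some wasted work at the matrix edges.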
Benchmarks
Cycles 0.25 * (r1’ * c2’ * c1’) + 2.25 * (r1’ * c2’) + 11, where:
r1’ = 2 * ceil(r1/2.0) (r1 rounded up to next even)
c1’ = 2 * ceil(c1/2.0) (c1 rounded up to next even)
c2’ = 4 * ceil(c2/4.0) (c2 rounded up to next multiple of 4)
For r1= 1, c1= 1, c2= 1: 33 cycles
For r1= 8, c1=20, c2= 8: 475 cycles
Codesize 416 bytes
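The cycle formula and the interrupt-blocking bound from the Implementation Notes can be evaluated with a small helper when budgeting processing time. This is a sketch only; the function names mat_mul_cycles, mat_mul_max_irq_block, and round_up_ll are hypothetical and not part of the library.

```c
#include <assert.h>

/* Round n up to the next multiple of m. */
static long long round_up_ll(int n, int m)
{
    return ((long long)n + m - 1) / m * m;
}

/* Cycle estimate: 0.25*(r1'*c2'*c1') + 2.25*(r1'*c2') + 11. */
long long mat_mul_cycles(int r1, int c1, int c2)
{
    assert(r1 >= 1 && r1 <= 32767);
    assert(c1 >= 1 && c1 <= 32767);
    assert(c2 >= 1 && c2 <= 32767);

    long long r1p = round_up_ll(r1, 2);   /* r1 rounded up to next even */
    long long c1p = round_up_ll(c1, 2);   /* c1 rounded up to next even */
    long long c2p = round_up_ll(c2, 4);   /* c2 rounded up to next multiple of 4 */

    /* r1p * c2p is a multiple of 8, so both divisions by 4 are exact. */
    return (r1p * c2p * c1p) / 4 + (9 * r1p * c2p) / 4 + 11;
}

/* Longest stretch with interrupts blocked: 0.25*c1' + 16 cycles. */
double mat_mul_max_irq_block(int c1)
{
    assert(c1 >= 1 && c1 <= 32767);
    return 0.25 * (double)round_up_ll(c1, 2) + 16.0;
}
```

mat_mul_cycles(1, 1, 1) gives 33 and mat_mul_cycles(8, 20, 8) gives 475, matching the two listed data points. For c1 = 20 the blocking bound evaluates to 21 cycles.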