Texas Instruments TMS320C64X manual DSPmatmul

Models: TMS320C64X


DSP_mat_mul

    for (i = 0; i < r1; i++)
        for (j = 0; j < c2; j++)
        {
            sum = 0;
            for (k = 0; k < c1; k++)
                sum += x[k + i*c1] * y[j + k*c2];
            r[j + i*c2] = sum >> qs;
        }
}
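The loop nest above is only the tail of the manual's listing, so the sketch below wraps it in a complete function for experimentation on a host machine. The name mat_mul_ref, the 16-bit element type, the 32-bit accumulator, and the precondition checks are assumptions made for illustration and are not taken from this page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference routine; name and types are assumptions.
 * x is r1 x c1, y is c1 x c2, r is r1 x c2, all stored row-major. */
void mat_mul_ref(const int16_t *x, int r1, int c1,
                 const int16_t *y, int c2,
                 int16_t *r, int qs)
{
    assert(r1 >= 1 && c1 >= 1 && c2 >= 1);
    assert(qs >= 0 && qs < 32);

    for (int i = 0; i < r1; i++) {
        for (int j = 0; j < c2; j++) {
            int32_t sum = 0;  /* 32-bit accumulator (assumed width) */
            for (int k = 0; k < c1; k++)
                sum += x[k + i * c1] * y[j + k * c2];
            /* Scale by 2^-qs, then narrow to the output element type. */
            r[j + i * c2] = (int16_t)(sum >> qs);
        }
    }
}
```

For example, multiplying the 2x2 matrices {1, 2, 3, 4} and {5, 6, 7, 8} with qs = 0 gives {19, 22, 43, 50}.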

Special Requirements

-The buffers x[], y[], and r[] must be distinct arrays. That is, in-place processing is not allowed.

-The input matrices have minimum dimensions of 1 row and 1 column, and maximum dimensions of 32767 rows and 32767 columns.
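The two requirements above could be checked before a call, for instance in a debug build. The following is a minimal sketch under stated assumptions: mat_mul_args_ok and disjoint are hypothetical helpers, the elements are assumed to be 16-bit, "distinct" is read as "non-overlapping in memory", and the address comparison assumes a flat address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAT_DIM_MAX 32767  /* maximum rows or columns stated above */

/* True when byte ranges [a, a+an) and [b, b+bn) do not intersect.
 * Assumes a flat address space where uintptr_t values are comparable. */
static bool disjoint(const void *a, size_t an, const void *b, size_t bn)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa + an <= pb || pb + bn <= pa;
}

/* Hypothetical pre-call check; not part of the library. */
bool mat_mul_args_ok(const int16_t *x, int r1, int c1,
                     const int16_t *y, int c2, const int16_t *r)
{
    if (r1 < 1 || r1 > MAT_DIM_MAX) return false;
    if (c1 < 1 || c1 > MAT_DIM_MAX) return false;
    if (c2 < 1 || c2 > MAT_DIM_MAX) return false;

    size_t xn = (size_t)r1 * c1 * sizeof *x;  /* x is r1 x c1 */
    size_t yn = (size_t)c1 * c2 * sizeof *y;  /* y is c1 x c2 */
    size_t rn = (size_t)r1 * c2 * sizeof *r;  /* r is r1 x c2 */

    return disjoint(x, xn, r, rn) &&
           disjoint(y, yn, r, rn) &&
           disjoint(x, xn, y, yn);
}
```

Passing the same buffer as both an input and the output, which would amount to in-place processing, makes the check return false.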

Implementation Notes

-Bank Conflicts: No bank conflicts occur.

-Interruptibility: This code blocks interrupts during its innermost loop. Interrupts are not blocked otherwise. As a result, interrupts can be blocked for up to 0.25*c1’ + 16 cycles at a time.

-The ‘i’ and ‘k’ loops are unrolled 2x. The ‘j’ loop is unrolled 4x. For dimensions that are not multiples of the corresponding unroll factors, this code calculates extra results beyond the edges of the matrix. These extra results are ultimately discarded. This allows the loops to be unrolled for efficient operation on large matrices without losing flexibility.
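To make the notes above concrete, the sketch below evaluates the interrupt-blocking bound and estimates how many padded results are thrown away. It assumes that c1’ in the bound is c1 rounded up to the next even value, and that the discarded results are the difference between the padded output size (r1 rounded up to even, c2 rounded up to a multiple of 4) and the true output size. The function names are hypothetical.

```c
/* Round n up to the next multiple of m (m > 0). */
static int round_up(int n, int m)
{
    return (n + m - 1) / m * m;
}

/* Upper bound on consecutive cycles with interrupts blocked:
 * 0.25 * c1' + 16, with c1' taken as c1 rounded up to even (assumed). */
double max_blocked_cycles(int c1)
{
    return 0.25 * round_up(c1, 2) + 16.0;
}

/* Estimated number of output elements computed past the matrix edge
 * and then discarded (an inference from the unroll factors). */
long discarded_results(int r1, int c2)
{
    long padded = (long)round_up(r1, 2) * round_up(c2, 4);
    return padded - (long)r1 * c2;
}
```

With c1 = 20 the bound evaluates to 21 cycles. An 8x8 result needs no padding under these assumptions, while a 1x1 result is padded to 2x4.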

Benchmarks

Cycles

0.25 * (r1’ * c2’ * c1’) + 2.25 * (r1’ * c2’) + 11, where:

r1’ = 2 * ceil(r1/2.0) (r1 rounded up to next even)

c1’ = 2 * ceil(c1/2.0) (c1 rounded up to next even)

c2’ = 4 * ceil(c2/4.0) (c2 rounded up to next multiple of 4)

For r1 = 1, c1 = 1, c2 = 1: 33 cycles

For r1 = 8, c1 = 20, c2 = 8: 475 cycles
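The cycle formula can be evaluated with a small helper, which is handy when budgeting for other matrix sizes. This is a sketch only: mat_mul_cycles is a hypothetical name, and the function uses integer arithmetic in place of ceil(), which gives the same values because the padded product r1’ * c2’ is always a multiple of 8.

```c
/* Round n up to the next multiple of m (m > 0). */
static long long round_up(long long n, long long m)
{
    return (n + m - 1) / m * m;
}

/* Cycle estimate from the benchmark formula:
 *   0.25 * (r1' * c2' * c1') + 2.25 * (r1' * c2') + 11 */
long long mat_mul_cycles(int r1, int c1, int c2)
{
    long long r1p = round_up(r1, 2);  /* r1' */
    long long c1p = round_up(c1, 2);  /* c1' */
    long long c2p = round_up(c2, 4);  /* c2' */
    long long p = r1p * c2p;          /* multiple of 8 */

    /* 0.25 * x == x / 4 and 2.25 * x == 9 * x / 4, both exact here. */
    return (p * c1p) / 4 + (9 * p) / 4 + 11;
}
```

The helper reproduces both figures quoted above: 33 cycles for r1 = 1, c1 = 1, c2 = 1 and 475 cycles for r1 = 8, c1 = 20, c2 = 8.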

Codesize 416 bytes
