| DSP_mat_trans | Matrix Transpose |
Function
| void DSP_mat_trans(const short *x, short rows, short columns, short *r) |
Arguments
| x[rows*columns] | Pointer to input matrix. |
| rows | Number of rows in the input matrix. Must be a multiple of 4. |
| columns | Number of columns in the input matrix. Must be a multiple of 4. |
| r[columns*rows] | Pointer to output data vector of size rows*columns. |
Description
| This function transposes the rows-by-columns input matrix x[ ] and writes the columns-by-rows result to matrix r[ ]. Both matrices are stored in row-major order. |
Algorithm
| This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. |
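void DSP_mat_trans(const short *x, short rows, short columns, short *r)
{
    short i, j;

    /* r is columns x rows: element (i, j) of r is element (j, i) of x. */
    for (i = 0; i < columns; i++)
        for (j = 0; j < rows; j++)
            *(r + i * rows + j) = *(x + i + columns * j);
}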
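The indexing in the C equivalent above can be checked on a small case. The following is a minimal sketch, not library code: `mat_trans_ref` and `transpose_4x8_mismatches` are hypothetical names, and `mat_trans_ref` is simply a renamed copy of the C equivalent so the example does not depend on the library symbol. It fills a 4x8 matrix with distinct values, transposes it, and counts the positions where r[i][j] differs from x[j][i].

```c
#include <assert.h>

/* Hypothetical name: a renamed copy of the C equivalent shown above. */
static void mat_trans_ref(const short *x, short rows, short columns, short *r)
{
    short i, j;

    for (i = 0; i < columns; i++)
        for (j = 0; j < rows; j++)
            r[i * rows + j] = x[i + columns * j];
}

/* Hypothetical helper: transposes a 4x8 test matrix and returns the
   number of output elements that do not satisfy r[i][j] == x[j][i]. */
int transpose_4x8_mismatches(void)
{
    enum { ROWS = 4, COLS = 8 };   /* both multiples of 4 */
    short x[ROWS * COLS];
    short r[COLS * ROWS];
    int i, j, mismatches = 0;

    /* x[j][i] = 10*j + i, so every element is distinct. */
    for (j = 0; j < ROWS; j++)
        for (i = 0; i < COLS; i++)
            x[j * COLS + i] = (short)(10 * j + i);

    mat_trans_ref(x, ROWS, COLS, r);

    /* Spot check: row 1, column 2 of r is row 2, column 1 of x. */
    assert(r[1 * ROWS + 2] == x[2 * COLS + 1]);

    for (i = 0; i < COLS; i++)
        for (j = 0; j < ROWS; j++)
            if (r[i * ROWS + j] != x[j * COLS + i])
                mismatches++;

    return mismatches;
}
```

The input is indexed with a row stride of columns and the output with a row stride of rows, which is why the output buffer must hold columns*rows elements.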
Special Requirements
-Rows and columns must each be a multiple of 4.
-Matrices are assumed to have
Implementation Notes
| - Bank Conflicts: No bank conflicts occur. |
| - Interruptibility: The code is |
| - Data from four adjacent rows, spaced “columns” apart, are read, and a local 4x4 transpose is performed in the register file. This leads to four double words that are “rows” apart. These loads and stores can cause bank conflicts; hence, |
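The tiling described in the last note can be sketched in portable C. This is an illustration of the access pattern only, under stated assumptions: it is not the hand-optimized assembly, the name `mat_trans_4x4_blocks` is hypothetical, and the local array `t` merely stands in for the register file. Each tile is read from four adjacent input rows (columns elements apart) and written to four output rows (rows elements apart).

```c
#include <assert.h>

/* Hypothetical portable sketch of a 4x4-tiled transpose. */
void mat_trans_4x4_blocks(const short *x, short rows, short columns, short *r)
{
    int i, j, a, b;
    short t[4][4];   /* stand-in for the register file */

    /* The tiling relies on both dimensions being multiples of 4. */
    assert(rows % 4 == 0 && columns % 4 == 0);

    for (i = 0; i < columns; i += 4) {
        for (j = 0; j < rows; j += 4) {
            /* Read one 4x4 tile: four adjacent input rows, each
               "columns" elements apart, four elements per row. */
            for (a = 0; a < 4; a++)
                for (b = 0; b < 4; b++)
                    t[a][b] = x[(j + a) * columns + (i + b)];

            /* Write the transposed tile: four output rows, each
               "rows" elements apart, four elements per row. */
            for (b = 0; b < 4; b++)
                for (a = 0; a < 4; a++)
                    r[(i + b) * rows + (j + a)] = t[a][b];
        }
    }
}
```

Four 16-bit elements occupy 64 bits, which matches the note's description of each tile row as one double word.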
Benchmarks
| Cycles | (2 * rows + 9) * columns/4 + 3 |
| Codesize | 224 bytes |
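As a worked example of the Cycles entry, the formula can be evaluated with a small helper. This is a sketch that only evaluates the published formula; the name `mat_trans_cycles` is hypothetical and is not part of the library. Because columns must be a multiple of 4, the division by 4 is exact.

```c
#include <assert.h>

/* Hypothetical helper: evaluates (2 * rows + 9) * columns/4 + 3. */
long mat_trans_cycles(int rows, int columns)
{
    /* Both dimensions are required to be multiples of 4. */
    assert(rows > 0 && rows % 4 == 0);
    assert(columns > 0 && columns % 4 == 0);

    return (2L * rows + 9) * (columns / 4) + 3;
}
```

For a 64x64 matrix this gives (2 * 64 + 9) * 64/4 + 3 = 137 * 16 + 3 = 2195 cycles; for the smallest valid case, 4x4, it gives 17 * 1 + 3 = 20 cycles.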
C64x+ DSPLIB Reference |