Matrix Transpose, DSP_mat_trans | Texas Instruments TMS320C64X

DSP_mat_trans	Matrix Transpose
Function		void DSP_mat_trans (const short *x, short rows, short columns, short *r)
Arguments		x[rows*columns]	Pointer to input matrix.
		rows	Number of rows in the input matrix. Must be a multiple
			of 4.
		columns	Number of columns in the input matrix. Must be a multiple
			of 4.
		r[columns*rows]	Pointer to output data vector of size rows*columns.
Description		This function transposes the input matrix x[ ] and writes the result to matrix r[ ].
Algorithm		This is the C equivalent of the assembly code without restrictions. Note that
		the assembly code is hand optimized and restrictions may apply.

void DSP_mat_trans(short *x, short rows, short columns, short *r)
{
    short i, j;

    for (i = 0; i < columns; i++)
        for (j = 0; j < rows; j++)
            *(r + i * rows + j) = *(x + i + columns * j);
}
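
As a usage sketch, the C equivalent above can be exercised on a small matrix. The names mat_trans_ref, mat_trans_check and mat_trans_example are hypothetical helpers introduced here for illustration; they are not part of DSPLIB. The check confirms that element (j, i) of x ends up at element (i, j) of r.

```c
#include <assert.h>

/* Portable reference transpose with the same indexing as the C equivalent.
   Hypothetical name; this is not the optimized DSPLIB routine. */
void mat_trans_ref(const short *x, short rows, short columns, short *r)
{
    short i, j;

    for (i = 0; i < columns; i++)
        for (j = 0; j < rows; j++)
            r[i * rows + j] = x[i + columns * j];
}

/* Returns 1 if r holds the transpose of the rows-by-columns matrix x. */
int mat_trans_check(const short *x, short rows, short columns, const short *r)
{
    short i, j;

    for (j = 0; j < rows; j++)
        for (i = 0; i < columns; i++)
            if (r[i * rows + j] != x[j * columns + i])
                return 0;
    return 1;
}

/* Transposes a 4x8 matrix; both dimensions are multiples of 4. */
int mat_trans_example(void)
{
    short x[4 * 8];
    short r[8 * 4];
    short k;

    for (k = 0; k < 4 * 8; k++)
        x[k] = k;                        /* x[j][i] = j * 8 + i */

    mat_trans_ref(x, 4, 8, r);

    assert(r[1 * 4 + 0] == x[0 * 8 + 1]); /* spot check: r[1][0] == x[0][1] */
    return mat_trans_check(x, 4, 8, r);
}
```

Note that r is indexed with a row length of rows, since the output matrix has columns rows and rows columns.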

Special Requirements
	- Rows and columns must be a multiple of 4.
	- Matrices are assumed to have 16-bit elements.

Implementation Notes
	- Bank Conflicts: No bank conflicts occur.
	- Interruptibility: The code is interrupt-tolerant but not interruptible.
	- Data from four adjacent rows, spaced “columns” apart, is read, and a
	local 4x4 transpose is performed in the register file. This produces four
	double words that are “rows” apart. These loads and stores could otherwise
	cause bank conflicts; hence, non-aligned loads and stores are used.
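
The 4x4 tiling described in the last note can be sketched in portable C. This is only an illustration of the access pattern, assuming a small local array t stands in for the register file; it is not the hand-optimized assembly, and the name mat_trans_blocked is hypothetical.

```c
#include <assert.h>

/* Blocked transpose sketch: plain C, no intrinsics, hypothetical name. */
void mat_trans_blocked(const short *x, short rows, short columns, short *r)
{
    int i, j, a, b;
    short t[4][4];   /* stands in for the 4x4 tile held in registers */

    /* The tiling only works when both dimensions are multiples of 4. */
    assert(rows % 4 == 0 && columns % 4 == 0);

    for (j = 0; j < rows; j += 4) {
        for (i = 0; i < columns; i += 4) {
            /* Read four adjacent input rows, spaced "columns" apart,
               storing each element at its transposed tile position. */
            for (a = 0; a < 4; a++)
                for (b = 0; b < 4; b++)
                    t[b][a] = x[(j + a) * columns + (i + b)];

            /* Write four output rows, spaced "rows" apart. Each row of
               the tile is four 16-bit elements, i.e. one double word. */
            for (b = 0; b < 4; b++)
                for (a = 0; a < 4; a++)
                    r[(i + b) * rows + (j + a)] = t[b][a];
        }
    }
}
```

Each tile is read and written in units of four 16-bit elements, which matches the double-word accesses the note refers to.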
Benchmarks	Cycles	(2 * rows + 9) * columns/4 + 3
	Codesize	224 bytes
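
The cycle formula can be evaluated directly. The helper below is a hypothetical convenience function, not a DSPLIB routine, and it assumes the multiple-of-4 requirement is met. For a 16x16 matrix it gives (2 * 16 + 9) * 16/4 + 3 = 167 cycles.

```c
#include <assert.h>

/* Cycle estimate from the benchmark formula
   (2 * rows + 9) * columns/4 + 3. Hypothetical helper name. */
long mat_trans_cycles(short rows, short columns)
{
    /* The formula is stated for dimensions that are multiples of 4. */
    assert(rows % 4 == 0 && columns % 4 == 0);

    return (2L * rows + 9) * (columns / 4) + 3;
}
```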

C64x+ DSPLIB Reference
