DSP_fft16x16r

    DSP_fft16x16r(N,   &x[0],       &w[0],       brev, y, N/4, 0,     N);
    DSP_fft16x16r(N/4, &x[0],       &w[2*3*N/4], brev, y, rad, 0,     N);
    DSP_fft16x16r(N/4, &x[2*N/4],   &w[2*3*N/4], brev, y, rad, N/4,   N);
    DSP_fft16x16r(N/4, &x[2*N/2],   &w[2*3*N/4], brev, y, rad, N/2,   N);
    DSP_fft16x16r(N/4, &x[2*3*N/4], &w[2*3*N/4], brev, y, rad, 3*N/4, N);

As discussed previously, N can be either a power of 4 or a power of 2. If N is a power of 4, then rad = 4; if N is a power of 2 but not a power of 4, then rad = 2. The rad argument controls how many stages of decomposition are performed, and it also determines whether a radix-4 or a radix-2 decomposition is performed at the last stage. Hence, when rad is set to N/4, only the first stage of the transform is performed and the function returns. To complete the FFT, four further calls are required, each performing an N/4-point FFT on one quarter of the data. The order of these four FFTs relative to one another does not matter, so from a cache perspective it helps to perform them in exactly the reverse of the order shown above. This is illustrated as follows:

    DSP_fft16x16r(N,   &x[0],       &w[0],       brev, y, N/4, 0,     N);
    DSP_fft16x16r(N/4, &x[2*3*N/4], &w[2*3*N/4], brev, y, rad, 3*N/4, N);
    DSP_fft16x16r(N/4, &x[2*N/2],   &w[2*3*N/4], brev, y, rad, N/2,   N);
    DSP_fft16x16r(N/4, &x[2*N/4],   &w[2*3*N/4], brev, y, rad, N/4,   N);
    DSP_fft16x16r(N/4, &x[0],       &w[2*3*N/4], brev, y, rad, 0,     N);

 

In addition, this function can be used to minimize call overhead by completing the FFT with a single function call, as shown below:

    DSP_fft16x16r(N, &x[0], &w[0], brev, y, rad, 0, N);

Algorithm

This is the C equivalent of the assembly code, without restrictions. Note that the assembly code is hand-optimized and restrictions may apply.

void fft16x16r
(
    int             n,
    short          *ptr_x,
    short          *ptr_w,
    unsigned char  *brev,
    short          *y,
    int             radix,
    int             offset,
    int             nmax
)

 

C64x+ DSPLIB Reference
