DSP_fft16x16r
DSP_fft16x16r(N,   &x[0],       &w[0],       brev, y, N/4, 0,     N)
DSP_fft16x16r(N/4, &x[0],       &w[2*3*N/4], brev, y, rad, 0,     N)
DSP_fft16x16r(N/4, &x[2*N/4],   &w[2*3*N/4], brev, y, rad, N/4,   N)
DSP_fft16x16r(N/4, &x[2*N/2],   &w[2*3*N/4], brev, y, rad, N/2,   N)
DSP_fft16x16r(N/4, &x[2*3*N/4], &w[2*3*N/4], brev, y, rad, 3*N/4, N)
As discussed previously, N can be a power of either 4 or 2. If N is a power of 4, then rad = 4; if N is a power of 2 but not a power of 4, then rad = 2. The “rad” argument controls how many stages of decomposition are performed. It also determines whether a radix-4 or radix-2 decomposition is performed at the last stage. Hence, when “rad” is set to N/4, only the first stage of the transform is performed and the code exits. To complete the FFT, four further calls are required, each performing an FFT of size N/4. The order of these four FFTs among themselves does not matter, so from a cache perspective it helps to go through them in exactly the opposite order to the first listing. This is illustrated as follows:
DSP_fft16x16r(N,   &x[0],       &w[0],       brev, y, N/4, 0,     N)
DSP_fft16x16r(N/4, &x[2*3*N/4], &w[2*3*N/4], brev, y, rad, 3*N/4, N)
DSP_fft16x16r(N/4, &x[2*N/2],   &w[2*3*N/4], brev, y, rad, N/2,   N)
DSP_fft16x16r(N/4, &x[2*N/4],   &w[2*3*N/4], brev, y, rad, N/4,   N)
DSP_fft16x16r(N/4, &x[0],       &w[2*3*N/4], brev, y, rad, 0,     N)
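The reversed call sequence above follows a regular pattern, which the sketch below tabulates for a given N and rad. It is illustrative only: the fft_call structure and the fft16x16r_plan helper are hypothetical names that are not part of DSPLIB, and the sketch assumes N is a multiple of 4. It computes only the arguments of the five calls and does not invoke the library.

```c
/* One entry per DSP_fft16x16r call; array indices are in units of short. */
typedef struct {
    int n;        /* transform size passed to this call */
    int x_index;  /* the call passes &x[x_index] */
    int w_index;  /* the call passes &w[w_index] */
    int radix;    /* N/4 for the first-stage call, rad for the sub-FFTs */
    int offset;   /* offset argument of the call */
    int nmax;     /* size of the full FFT */
} fft_call;

/* Hypothetical helper (not a DSPLIB function): fills plan[0..4] with the
 * first-stage call followed by the four N/4-point calls in reverse order. */
static void fft16x16r_plan(int N, int rad, fft_call plan[5])
{
    int q;
    int i = 0;
    fft_call first = { N, 0, 0, N / 4, 0, N };

    /* radix = N/4: only the first stage of the transform is performed. */
    plan[i++] = first;

    /* Quarters 3, 2, 1, 0: the opposite of the ascending order. */
    for (q = 3; q >= 0; q--) {
        int offset = q * (N / 4);
        fft_call sub = { N / 4, 2 * offset, 2 * 3 * N / 4, rad, offset, N };
        plan[i++] = sub;
    }
}
```

Each entry p would then map onto one call of the form DSP_fft16x16r(p.n, &x[p.x_index], &w[p.w_index], brev, y, p.radix, p.offset, p.nmax), issued in array order.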
In addition, this function can be used to minimize call overhead by completing the FFT with a single function call, as shown below:
DSP_fft16x16r(N, &x[0], &w[0], brev, y, rad, 0, N)
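The rule for choosing rad (4 when N is a power of 4, 2 when N is a power of 2 but not of 4) can be sketched as a small helper. The name fft_last_radix is hypothetical and not part of DSPLIB. Returning 0 for sizes that are not powers of 2, or that are smaller than 2, is an assumption of this sketch, not documented library behavior.

```c
/* Hypothetical helper (not a DSPLIB function).
 * Returns 4 if n is a power of 4, 2 if n is a power of 2 but not of 4,
 * and 0 if n is not a power of 2 or is smaller than 2. */
static int fft_last_radix(int n)
{
    int log2n = 0;

    /* Reject non-positive sizes, n == 1, and anything that is not 2^k. */
    if (n < 2 || (n & (n - 1)) != 0) {
        return 0;
    }

    /* Count k such that n == 2^k. */
    while ((n >> log2n) > 1) {
        log2n++;
    }

    /* An even exponent means n == 4^(k/2), so the last stage is radix-4. */
    return (log2n % 2 == 0) ? 4 : 2;
}
```

For example, fft_last_radix(256) yields 4 and fft_last_radix(128) yields 2, and either value can be passed as the rad argument in the calls shown above.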
Algorithm
This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
void fft16x16r
(
    int           n,
    short         *ptr_x,
    short         *ptr_w,
    unsigned char *brev,
    short         *y,
    int           radix,
    int           offset,
    int           nmax
)
C64x+ DSPLIB Reference