DSP_fft16x16r

The function takes the twiddle factors and input data, and calculates the FFT producing the frequency domain data in the y[ ] array. As the FFT allows every input point to affect every output point, which causes cache thrashing in a cache based system. This is mitigated by allowing the main FFT of size N to be divided into several steps, allowing as much data reuse as possible. For example, see the following function:

DSP_fft16x16r(1024,&x[0],

&w[0],

y,brev,4,

0,1024);

is equivalent to:

 

 

 

DSP_fft16x16r(1024,&x[2*0],

&w[0]

,y,brev,256,

0,1024);

DSP_fft16x16r(256, &x[2*0],

&w[2*768],y,brev,4,

0,1024);

DSP_fft16x16r(256, &x[2*256],&w[2*768],y,brev,4,

256,1024);

DSP_fft16x16r(256, &x[2*512],&w[2*768],y,brev,4,

512,1024);

DSP_fft16x16r(256, &x[2*768],&w[2*768],y,brev,4,

768,1024);

Notice how the first FFT function is called on the entire 1K data set. It covers the first pass of the FFT until the butterfly size is 256.

The following 4 FFTs do 256-point FFTs 25% of the size. These continue down to the end when the butterfly is of size 4. They use an index to the main twiddle factor array of 0.75*2*N. This is because the twiddle factor array is composed of successively decimated versions of the main array.

N not equal to a power of 4 can be used; i.e. 512. In this case, the following would be needed to decompose the FFT:

DSP_fft16x16r(512, &x[0],

&w[0],

y,brev,2,

0,512);

is equivalent to:

 

 

 

DSP_fft16x16r(512, &x[0],

&w[0],

y,brev,128,

0,512);

DSP_fft16x16r(128, &x[2*0],

&w[2*384],y,brev,2,

0,512);

DSP_fft16x16r(128, &x[2*128],&w[2*384],y,brev,2,

128,512);

DSP_fft16x16r(128, &x[2*256],&w[2*384],y,brev,2,

256,512);

DSP_fft16x16r(128, &x[2*384],&w[2*384],y,brev,2,

384,512);

The twiddle factor array is composed of log4(N) sets of twiddle factors, (3/4)*N, (3/16)*N, (3/64)*N, etc. The index into this array for each stage of the FFT is calculated by summing these indices up appropriately. For multiple FFTs, they can share the same table by calling the small FFTs from further down in the twiddle factor array, in the same way as the decomposition works for more data reuse.

Thus, the above decomposition can be summarized for a general N, radix “rad” as follows.

4-16

Page 44
Image 44
Texas Instruments TMS320C64X manual Is equivalent to