DSP_fft16x16r
The function takes the twiddle factors and input data, and calculates the FFT producing the frequency domain data in the y[ ] array. As the FFT allows every input point to affect every output point, which causes cache thrashing in a cache based system. This is mitigated by allowing the main FFT of size N to be divided into several steps, allowing as much data reuse as possible. For example, see the following function:
DSP_fft16x16r(1024,&x[0], | &w[0], | y,brev,4, | 0,1024); |
is equivalent to: |
|
|
|
DSP_fft16x16r(1024,&x[2*0], | &w[0] | ,y,brev,256, | 0,1024); |
DSP_fft16x16r(256, &x[2*0], | &w[2*768],y,brev,4, | 0,1024); | |
DSP_fft16x16r(256, &x[2*256],&w[2*768],y,brev,4, | 256,1024); | ||
DSP_fft16x16r(256, &x[2*512],&w[2*768],y,brev,4, | 512,1024); | ||
DSP_fft16x16r(256, &x[2*768],&w[2*768],y,brev,4, | 768,1024); |
Notice how the first FFT function is called on the entire 1K data set. It covers the first pass of the FFT until the butterfly size is 256.
The following 4 FFTs do
N not equal to a power of 4 can be used; i.e. 512. In this case, the following would be needed to decompose the FFT:
DSP_fft16x16r(512, &x[0], | &w[0], | y,brev,2, | 0,512); |
is equivalent to: |
|
|
|
DSP_fft16x16r(512, &x[0], | &w[0], | y,brev,128, | 0,512); |
DSP_fft16x16r(128, &x[2*0], | &w[2*384],y,brev,2, | 0,512); | |
DSP_fft16x16r(128, &x[2*128],&w[2*384],y,brev,2, | 128,512); | ||
DSP_fft16x16r(128, &x[2*256],&w[2*384],y,brev,2, | 256,512); | ||
DSP_fft16x16r(128, &x[2*384],&w[2*384],y,brev,2, | 384,512); |
The twiddle factor array is composed of log4(N) sets of twiddle factors, (3/4)*N, (3/16)*N, (3/64)*N, etc. The index into this array for each stage of the FFT is calculated by summing these indices up appropriately. For multiple FFTs, they can share the same table by calling the small FFTs from further down in the twiddle factor array, in the same way as the decomposition works for more data reuse.
Thus, the above decomposition can be summarized for a general N, radix “rad” as follows.