DSP_fft16x16t

There is one slight break in the flow of packed processing. Each complex number is stored with its real part in the lower half of a 32-bit word and its imaginary part in the upper half. The flow breaks for “xl0” and “xl1” because, owing to the multiplication by “j”, the real part of one difference must be combined with the imaginary part of the other. This requires a packed quantity such as “xl21_xl20” to be rotated into “xl20_xl21” so that it can be combined using ADD2 and SUB2. Hence, the natural C code shown below is transformed using packed data processing as shown:

xl0  = x[2 * i0    ] - x[2 * i2    ];
xl1  = x[2 * i0 + 1] - x[2 * i2 + 1];
xl20 = x[2 * i1    ] - x[2 * i3    ];
xl21 = x[2 * i1 + 1] - x[2 * i3 + 1];

xt1 = xl0 + xl21;  yt2 = xl1 + xl20;
xt2 = xl0 - xl21;  yt1 = xl1 - xl20;
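Writing the two differences as complex numbers shows where the factor j enters. Take $d_0 = \mathit{xl0} + j\,\mathit{xl1}$ and $d_1 = \mathit{xl20} + j\,\mathit{xl21}$; these symbols are introduced here only for illustration. The four sums in the last two lines of the code are then the real and imaginary parts of $d_0 \mp j\,d_1$:

```latex
\begin{aligned}
d_0 - j\,d_1 &= (\mathit{xl0} + j\,\mathit{xl1}) - j\,(\mathit{xl20} + j\,\mathit{xl21})
             = (\mathit{xl0} + \mathit{xl21}) + j\,(\mathit{xl1} - \mathit{xl20})
             = \mathit{xt1} + j\,\mathit{yt1},\\
d_0 + j\,d_1 &= (\mathit{xl0} + j\,\mathit{xl1}) + j\,(\mathit{xl20} + j\,\mathit{xl21})
             = (\mathit{xl0} - \mathit{xl21}) + j\,(\mathit{xl1} + \mathit{xl20})
             = \mathit{xt2} + j\,\mathit{yt2}.
\end{aligned}
```

In both results, the real part draws on xl21 (the imaginary half of $d_1$) and the imaginary part draws on xl20 (its real half). That crossover is why the halves of xl21_xl20 must be swapped before ADD2 and SUB2 can combine them with xl1_xl0.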

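/* Each 32-bit word holds one complex point: imaginary part in the upper */
/* half, real part in the lower half. x_i0 ... x_i3 denote the packed    */
/* words for points i0 ... i3 (illustrative names).                      */
xl1_xl0   = _sub2(x_i0, x_i2);          /* (xl1,  xl0)               */
xl21_xl20 = _sub2(x_i1, x_i3);          /* (xl21, xl20)              */
xl20_xl21 = _rotl(xl21_xl20, 16);       /* swap halves: (xl20, xl21) */
yt2_xt1   = _add2(xl1_xl0, xl20_xl21);  /* (xl1 + xl20, xl0 + xl21)  */
yt1_xt2   = _sub2(xl1_xl0, xl20_xl21);  /* (xl1 - xl20, xl0 - xl21)  */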
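To cross-check the packed sequence, the intrinsics can be modeled in portable C. This sketch assumes the layout described above: imaginary part in the upper 16 bits, real part in the lower 16. It uses hypothetical helpers `emu_add2`, `emu_sub2` and `emu_rotl` in place of the TI compiler intrinsics `_add2`, `_sub2` and `_rotl`. The names `packed_step` and `check_step` are also illustrative. `check_step` compares the packed results with the natural C code for the same four points.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable models of _add2, _sub2 and _rotl.
 * Word layout: imaginary part in bits 31..16, real part in bits 15..0. */
static uint32_t pack(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
static int16_t hi16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo16(uint32_t w) { return (int16_t)(uint16_t)w; }

/* Two independent 16-bit adds/subtracts; each half wraps on overflow. */
static uint32_t emu_add2(uint32_t a, uint32_t b) {
    return pack((int16_t)(hi16(a) + hi16(b)), (int16_t)(lo16(a) + lo16(b)));
}
static uint32_t emu_sub2(uint32_t a, uint32_t b) {
    return pack((int16_t)(hi16(a) - hi16(b)), (int16_t)(lo16(a) - lo16(b)));
}
/* 32-bit rotate left, valid for 1 <= n <= 31; n = 16 swaps the halves. */
static uint32_t emu_rotl(uint32_t w, unsigned n) {
    return (w << n) | (w >> (32u - n));
}

/* Packed step from the text; p0..p3 hold points i0, i1, i2, i3. */
static void packed_step(uint32_t p0, uint32_t p1, uint32_t p2, uint32_t p3,
                        uint32_t *yt2_xt1, uint32_t *yt1_xt2) {
    uint32_t xl1_xl0   = emu_sub2(p0, p2);        /* (xl1,  xl0)  */
    uint32_t xl21_xl20 = emu_sub2(p1, p3);        /* (xl21, xl20) */
    uint32_t xl20_xl21 = emu_rotl(xl21_xl20, 16); /* (xl20, xl21) */
    *yt2_xt1 = emu_add2(xl1_xl0, xl20_xl21);      /* (xl1 + xl20, xl0 + xl21) */
    *yt1_xt2 = emu_sub2(xl1_xl0, xl20_xl21);      /* (xl1 - xl20, xl0 - xl21) */
}

/* Checks the packed step against the natural C code for points i0..i3
 * of an interleaved (real, imag) array x. */
static void check_step(const int16_t *x, int i0, int i1, int i2, int i3) {
    int xl0  = x[2 * i0]     - x[2 * i2];
    int xl1  = x[2 * i0 + 1] - x[2 * i2 + 1];
    int xl20 = x[2 * i1]     - x[2 * i3];
    int xl21 = x[2 * i1 + 1] - x[2 * i3 + 1];
    const int idx[4] = {i0, i1, i2, i3};
    uint32_t p[4], yt2_xt1, yt1_xt2;
    for (int k = 0; k < 4; k++)
        p[k] = pack(x[2 * idx[k] + 1], x[2 * idx[k]]);
    packed_step(p[0], p[1], p[2], p[3], &yt2_xt1, &yt1_xt2);
    assert(lo16(yt2_xt1) == (int16_t)(xl0 + xl21)); /* xt1 */
    assert(hi16(yt2_xt1) == (int16_t)(xl1 + xl20)); /* yt2 */
    assert(lo16(yt1_xt2) == (int16_t)(xl0 - xl21)); /* xt2 */
    assert(hi16(yt1_xt2) == (int16_t)(xl1 - xl20)); /* yt1 */
}
```

For example, take x = {100, 20, 30, 40, 5, 6, 7, 8} and (i0, i1, i2, i3) = (0, 1, 2, 3). The packed words come out as (yt2, xt1) = (37, 127) and (yt1, xt2) = (-9, 63).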

Notice also that xt1 and yt1 end up in separate words. They need to be packed together to take advantage of the packed twiddle factors that have been loaded. To achieve this, they are realigned as follows:

 

yt1_xt1 = _packhl2(yt1_xt2, yt2_xt1);
yt2_xt2 = _packhl2(yt2_xt1, yt1_xt2);
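The effect of the two _packhl2 calls can be checked the same way. In this sketch, `emu_packhl2` is a hypothetical portable model of `_packhl2(a, b)`, which takes the upper half word from `a` and the lower half word from `b`. The names `realign` and `check_realign` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of _packhl2(a, b): upper half of a, lower half of b. */
static uint32_t emu_packhl2(uint32_t a, uint32_t b) {
    return (a & 0xFFFF0000u) | (b & 0x0000FFFFu);
}
static uint32_t pack(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
static int16_t hi16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo16(uint32_t w) { return (int16_t)(uint16_t)w; }

/* Regroups (yt2, xt1) and (yt1, xt2) into (yt1, xt1) and (yt2, xt2),
 * mirroring the two _packhl2 calls in the text. */
static void realign(uint32_t yt2_xt1, uint32_t yt1_xt2,
                    uint32_t *yt1_xt1, uint32_t *yt2_xt2) {
    *yt1_xt1 = emu_packhl2(yt1_xt2, yt2_xt1);
    *yt2_xt2 = emu_packhl2(yt2_xt1, yt1_xt2);
}

/* Self-check: each output word pairs the x and y values of one output. */
static void check_realign(int16_t xt1, int16_t yt1, int16_t xt2, int16_t yt2) {
    uint32_t yt1_xt1, yt2_xt2;
    realign(pack(yt2, xt1), pack(yt1, xt2), &yt1_xt1, &yt2_xt2);
    assert(hi16(yt1_xt1) == yt1 && lo16(yt1_xt1) == xt1);
    assert(hi16(yt2_xt2) == yt2 && lo16(yt2_xt2) == xt2);
}
```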

 

The packed word “yt1_xt1” allows the loaded “sc” twiddle factor to be used directly for the complex multiplies. The real part of the complex multiply is implemented using DOTP2. The imaginary part is implemented using DOTPN2 after the two half words of the twiddle factor are swapped (swizzled).

$(X + jY)(C - jS) = (XC + YS) + j(YC - XS)$

The actual twiddle factors for the FFT are cosine and −sine. The table stores cosine and sine, so the sign of the “sine” term is accounted for during the multiplication, as shown above.
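The complex multiply can also be modeled in portable C. This sketch assumes the loaded twiddle word holds sine in the upper half and cosine in the lower half. That is the order that lets DOTP2 produce XC + YS directly from the (Y:X) data word, but the excerpt does not state it explicitly. `emu_dotp2` and `emu_dotpn2` are hypothetical stand-ins for the `_dotp2` and `_dotpn2` intrinsics. Any fixed-point rescaling of the 32-bit products is not modeled.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t pack(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
static int16_t hi16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo16(uint32_t w) { return (int16_t)(uint16_t)w; }

/* Hypothetical models of _dotp2 and _dotpn2: two signed 16x16 products
 * combined into a 32-bit result (formed modulo 2^32, assuming the usual
 * two's-complement conversion back to int32_t). */
static int32_t emu_dotp2(uint32_t a, uint32_t b) {
    uint32_t p_hi = (uint32_t)((int32_t)hi16(a) * hi16(b));
    uint32_t p_lo = (uint32_t)((int32_t)lo16(a) * lo16(b));
    return (int32_t)(p_hi + p_lo); /* hi*hi + lo*lo */
}
static int32_t emu_dotpn2(uint32_t a, uint32_t b) {
    uint32_t p_hi = (uint32_t)((int32_t)hi16(a) * hi16(b));
    uint32_t p_lo = (uint32_t)((int32_t)lo16(a) * lo16(b));
    return (int32_t)(p_hi - p_lo); /* hi*hi - lo*lo */
}

/* Multiplies the packed sample (Y:X) by C - jS, where the twiddle word
 * s_c is assumed to hold (S:C). */
static void cmul_twiddle(uint32_t y_x, uint32_t s_c, int32_t *re, int32_t *im) {
    uint32_t c_s = (s_c << 16) | (s_c >> 16); /* swap half words: (C:S) */
    *re = emu_dotp2(y_x, s_c);                /* Y*S + X*C */
    *im = emu_dotpn2(y_x, c_s);               /* Y*C - X*S */
}

/* Compares the packed result with the scalar formula. */
static void check_cmul(int16_t x, int16_t y, int16_t c, int16_t s) {
    int32_t re, im;
    cmul_twiddle(pack(y, x), pack(s, c), &re, &im);
    assert(re == (int32_t)x * c + (int32_t)y * s);
    assert(im == (int32_t)y * c - (int32_t)x * s);
}
```

For instance, X = 3, Y = 4, C = 5, S = 2 gives re = 23 and im = 14. This matches (3 + 4j)(5 - 2j) = 23 + 14j.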

Benchmarks

Cycles

$(10 \cdot nx/8 + 19) \cdot \lceil \log_4(nx) - 1 \rceil + (nx/8 + 2) \cdot 7 + 28 + \mathrm{BC}$

where BC = N/8, the number of bank conflicts.

Codesize

1004 bytes
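The cycle formula can be evaluated with a small helper. This sketch makes two assumptions: nx is a power of 2 no smaller than 16, and the N in BC = N/8 is taken to be nx. `fft16x16t_cycles` is a hypothetical name.

```c
#include <assert.h>

/* Integer log2 of a positive power of 2. */
static int ilog2(int n) {
    int k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Evaluates (10*nx/8 + 19)*ceil(log4(nx) - 1) + (nx/8 + 2)*7 + 28 + BC.
 * Assumptions: nx is a power of 2, nx >= 16, and BC = nx/8. */
static int fft16x16t_cycles(int nx) {
    assert(nx >= 16 && (nx & (nx - 1)) == 0);
    /* ceil(log4(nx) - 1) = ceil((log2(nx) - 2) / 2), which equals
     * (log2(nx) - 1) / 2 in integer arithmetic for log2(nx) >= 2. */
    int stages = (ilog2(nx) - 1) / 2;
    int bc = nx / 8;
    return (10 * nx / 8 + 19) * stages + (nx / 8 + 2) * 7 + 28 + bc;
}
```

Under those assumptions, nx = 256 gives 339 × 3 + 34 × 7 + 28 + 32 = 1315 cycles.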

Texas Instruments TMS320C64X manual