AMD Athlon™ Processor x86 Code Optimization

22007E/0 — November 1999

Complex Number Arithmetic

Complex numbers have a “real” part and an “imaginary” part. Multiplying complex numbers (e.g., 3 + 4i) is an integral part of many algorithms, such as the Discrete Fourier Transform (DFT) and complex FIR filters. Complex number multiplication is shown below:

(src0.real + i*src0.imag) * (src1.real + i*src1.imag) = result
result = (result.real + i*result.imag)

result.real <= src0.real*src1.real - src0.imag*src1.imag
result.imag <= src0.real*src1.imag + src0.imag*src1.real

Example:

(1+2i) * (3+4i) => result.real + i*result.imag
result.real <= 1*3 - 2*4 = -5
result.imag <= 1*4 + 2*3 = 10
result = -5 + 10i

Assume that complex numbers are represented as two-element vectors [v.real, v.imag]. Computing result.imag then requires swapping the elements of one source operand, because each real part must be multiplied by the other operand's imaginary part. Computing result.real and result.imag in parallel also requires a mixed positive/negative accumulation: a subtraction for the real part and an addition for the imaginary part.

PSWAPD swaps the elements of one source operand (src0 in the code below), and PFPNACC performs the mixed positive/negative accumulation that completes the computation. The code example below summarizes a complex number multiply.

Example:

;MM0 = s0.imag | s0.real        ;reg_hi | reg_lo
;MM1 = s1.imag | s1.real

PSWAPD  MM2, MM0    ;MM2 = s0.real | s0.imag
PFMUL   MM0, MM1    ;MM0 = s0.imag*s1.imag | s0.real*s1.real
PFMUL   MM1, MM2    ;MM1 = s0.real*s1.imag | s0.imag*s1.real
PFPNACC MM0, MM1    ;MM0 = res.imag | res.real

Because PSWAPD supports independent source and result operands, it can also act as a copy. In the example above, this eliminates the need for a separate “MOVQ MM2, MM0” instruction.
