25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

pswapd mm1, mm0

; MM1 = [r,i]

;Additionally, PSWAPD can be used with a 64-bit memory location. Suppose

;that EDI contains the address of two floats: r and i.

;INPUT:

;[EDI:EDI+8] = [i,r]

;OUTPUT:

;MM1 = [r,i]

pswapd mm1, [edi]

; MM1 = [r,i]

;PFPNACC

;Suppose that MM0 contains two floats: r1 * r2 (the product of the real parts

;of 2 complex numbers) and i1 * i2 (the product of the imaginary parts

;of 2 complex numbers).

;Also suppose that MM1 contains two floats: r1 * i2 (the product of the real

;part of the first complex number and the imaginary part of the second

;complex number) and i1 * r2 (the product of the imaginary part of the

;first complex number and the real part of the second complex number).

;INPUTS:

;MM0 = [i1*i2,r1*r2]

;MM1 = [i1*r2,r1*i2]

;OUTPUT:

;MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]

pfpnacc mm0, mm1

; MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]

;Additionally, PFPNACC can be used with a 64-bit memory location. Suppose

;that EDI contains the address of two floats: r1 * i2 (the product of the

;real part of the first complex number and the imaginary part of the

;second complex number) and i1 * r2 (the product of the imaginary part of

;the first complex number and the real part of the second complex number).

;INPUTS:

;MM0 = [i1*i2,r1*r2]

;[EDI:EDI+8] = [i1*r2,r1*i2]

;OUTPUT:

;MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]

pfpnacc mm0, [edi]

; MM0 = [r1*i2+i1*r2,r1*r2-i1*i2]

The PFPNACC instruction is specifically designed for use in complex arithmetic operations.
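
The listing above can be checked against a scalar model. The following C sketch is illustrative only and is not part of the original guide: the type mm_t and the helper functions pswapd, pfmul, pfpnacc, complex_mul, and self_check are hypothetical names that mimic the per-element behavior of the corresponding 3DNow! instructions on ordinary floats. It assumes the [high,low] register notation used in the comments above, and it assumes that the two input registers of the listing were produced by PFMUL, which this page does not show.

```c
#include <assert.h>

/* Hypothetical scalar model of a 64-bit MMX register holding two
   packed floats. The listing writes registers as [high,low]. */
typedef struct {
    float lo;   /* bits 31:0  */
    float hi;   /* bits 63:32 */
} mm_t;

/* PSWAPD model: the result is the source with its two halves swapped. */
static mm_t pswapd(mm_t src)
{
    mm_t r = { src.hi, src.lo };
    return r;
}

/* PFMUL model: element-wise product (assumed source of the inputs). */
static mm_t pfmul(mm_t a, mm_t b)
{
    mm_t r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}

/* PFPNACC model: low result = dest.lo - dest.hi (negative accumulate),
   high result = src.lo + src.hi (positive accumulate). */
static mm_t pfpnacc(mm_t dest, mm_t src)
{
    mm_t r = { dest.lo - dest.hi, src.lo + src.hi };
    return r;
}

/* Complex product (r1 + i1*j)(r2 + i2*j), with a = [i1,r1], b = [i2,r2]. */
static mm_t complex_mul(mm_t a, mm_t b)
{
    mm_t same  = pfmul(a, b);           /* [i1*i2, r1*r2] */
    mm_t cross = pfmul(a, pswapd(b));   /* [i1*r2, r1*i2] */
    return pfpnacc(same, cross);        /* [r1*i2+i1*r2, r1*r2-i1*i2] */
}

/* Worked example: (1 + 2j)(3 + 4j) = -5 + 10j. */
static void self_check(void)
{
    mm_t a = { 1.0f, 2.0f };
    mm_t b = { 3.0f, 4.0f };
    mm_t p = complex_mul(a, b);
    assert(p.lo == -5.0f);   /* real part      */
    assert(p.hi == 10.0f);   /* imaginary part */
}
```

Calling self_check() from a test harness exercises the whole sequence. The model makes the role of PFPNACC visible: one instruction produces both the difference that forms the real part and the sum that forms the imaginary part.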

Chapter 9

Optimizing with SIMD Instructions
