Software Optimization Guide for AMD64 Processors | 25112 Rev. 3.06 September 2005 |
9.15 Accumulating
Optimization
In
Application
This optimization applies to:
•
•
Rationale
Though SSE, SSE2, and 3DNow! instructions are similar in that they all provide vectorized multiplication and addition, 3DNow! technology also supports certain special instructions. One of these is the PFACC instruction. There are many instances where PFACC is useful, such as
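For readers unfamiliar with PFACC, its effect can be sketched with a scalar reference model in C. PFACC adds the two single-precision values held in the destination register and places the sum in the low half of the result, then adds the two values held in the source register and places that sum in the high half. The names packed2f, pfacc_model, and accumulate_ref below are hypothetical and chosen only for illustration. The output layout of accumulate_ref (a+b first, c+d second) is an assumption inferred from the parameter names in the accumulate_3dnow prototype, not a transcription of the routine itself.

```c
/* Scalar model of a 64-bit MMX register holding two packed floats.
   (Hypothetical type, for illustration only.) */
typedef struct {
    float lo;  /* bits 31:0  */
    float hi;  /* bits 63:32 */
} packed2f;

/* Reference model of "PFACC dst, src":
   result.lo = dst.lo + dst.hi
   result.hi = src.lo + src.hi */
static packed2f pfacc_model(packed2f dst, packed2f src)
{
    packed2f r;
    r.lo = dst.lo + dst.hi;
    r.hi = src.lo + src.hi;
    return r;
}

/* Assumed C equivalent of accumulate_3dnow: each input points to two
   floats, and the output receives {a + b, c + d}. */
static void accumulate_ref(const float *a_and_b, const float *c_and_d,
                           float *aplusb_cplusd)
{
    packed2f x = { a_and_b[0], a_and_b[1] };
    packed2f y = { c_and_d[0], c_and_d[1] };
    packed2f r = pfacc_model(x, y);

    aplusb_cplusd[0] = r.lo;  /* a + b */
    aplusb_cplusd[1] = r.hi;  /* c + d */
}
```

A model like this is useful mainly as a check: the output of an assembly routine can be compared against accumulate_ref for the same inputs.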
Examples
The following example accumulates two floats in two MMX registers:
;accumulate_3dnow(float *a_and_b, float *c_and_d, float *aplusb_cplusd);
;
;TO ASSEMBLE INTO *.obj DO THE FOLLOWING:
;ml.exe
.586
.K3D
.XMM
_TEXT SEGMENT
PUBLIC _accumulate_3dnow
_accumulate_3dnow PROC NEAR
;==============================================================================
;THE INSTRUCTIONS BELOW SAVE THE REGISTER STATE WITH WHICH THIS ROUTINE WAS ENTERED.
;EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ARE ASSUMED TO BE CHANGED,
;WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM.