25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

;REGISTERS EAX, ECX, EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED

;WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM

pop edi

pop esi

pop ebx

mov esp, ebp

pop ebp

;==============================================================================

ret

_matrix_x_vector_sse ENDP

_TEXT ENDS

END

To greatly enhance performance, the previous function performs the matrix multiplication not on a single four-column vector but on many. Creating a separate function that transforms a single vertex and calling it repeatedly is prohibitively expensive, because every call pays the overhead of pushing registers onto the stack and popping them off again. The same applies to routines that negate a single vector, nullify a single vector, or add two vectors. Listing 28 is the 3DNow! technology counterpart to Listing 27 on page 231.
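The batching argument can be illustrated with a scalar C sketch that mirrors the prototype used by the assembly listings. This is not the guide's implementation: the function name `matrix_x_vectors_ref` is hypothetical, and the sketch assumes a row-major 4 × 4 matrix applied to column vectors (rotv = trR · v), which may not match the layout the listings expect. It only shows the loop structure in which one call, and therefore one set of register saves and restores, covers all vertices.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference sketch of the batched transform (hypothetical helper).
 * ASSUMPTION: trR is a 4x4 matrix stored row-major and each vertex is a
 * column vector of four floats, so rotv = trR * v. The assembly listings
 * may use a different layout. The input and output arrays must not overlap. */
void matrix_x_vectors_ref(const float *trR, const float *v,
                          int num_vertices_to_rotate, float *rotv)
{
    assert(trR != NULL && v != NULL && rotv != NULL);

    /* A single call processes every vertex, so the cost of saving and
     * restoring registers is paid once rather than once per vertex. */
    for (int n = 0; n < num_vertices_to_rotate; ++n) {
        const float *in = v + 4 * n;    /* current input vertex  */
        float *out = rotv + 4 * n;      /* current output vertex */
        for (int row = 0; row < 4; ++row) {
            out[row] = trR[4 * row + 0] * in[0]
                     + trR[4 * row + 1] * in[1]
                     + trR[4 * row + 2] * in[2]
                     + trR[4 * row + 3] * in[3];
        }
    }
}
```

A scalar routine like this is also convenient as a correctness reference when checking the output of a SIMD version on the same input.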

Listing 28. 4 × 4 Matrix Multiplication (3DNow!™ Technology)

;matrix_x_vector_3dnow(float *trR, float *v, int num_vertices_to_rotate, float *rotv);

;

;TO ASSEMBLE INTO *.obj DO THE FOLLOWING:

;ml.exe -coff -c matrix_x_vector_3dnow.asm

.586

.K3D

.XMM

_TEXT SEGMENT

PUBLIC _matrix_x_vector_3dnow

_matrix_x_vector_3dnow PROC NEAR

;==============================================================================

;INSTRUCTIONS BELOW SAVE THE REGISTER STATE WITH WHICH THIS ROUTINE WAS

;ENTERED.

;REGISTERS EAX, ECX, AND EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED,

;WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM

push ebp

mov ebp, esp

;==============================================================================

;Parameters passed into routine:

;[ebp+8] = ->trR

;[ebp+12] = ->v

;[ebp+16] = num_vertices_to_rotate

;[ebp+20] = ->rotv

;==============================================================================

push ebx

push esi

push edi

;===============================================================================

Chapter 9

Optimizing with SIMD Instructions

233
