Chapter 9 Optimizing with SIMD Instructions 211
Software Optimization Guide for AMD64 Processors
25112 Rev. 3.06 September 2005
9.11 Using SIMD Instructions for Fast Square Roots and Fast Reciprocal Square Roots
Optimization
Use the SIMD vectorized square-root (SQRTPS) and reciprocal (RCPPS) instructions to calculate
square roots and reciprocal square roots of single-precision numbers.
Application
This optimization applies to:
32-bit software
64-bit software
Rationale
SIMD instructions exist for performing vectorized square-root and reciprocal operations on
single-precision numbers. These operations are common in multimedia applications and are also
useful in scientific computing, such as molecular dynamics simulations.
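For readers working in C rather than assembly, the idea can be sketched with SSE compiler intrinsics. This is a minimal illustration and not the guide's routine: the function name rcp_sqrt_sketch and its parameters are hypothetical. It assumes a compiler that provides <xmmintrin.h>, where _mm_sqrt_ps corresponds to SQRTPS and _mm_rcp_ps to RCPPS. RCPPS returns an approximation (roughly 12 bits of precision), so results differ slightly from an exact division.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_sqrt_ps, _mm_rcp_ps */

/* Hypothetical helper (not from the guide): out[i] ~= 1 / sqrt(r[i]).
 * Processes four floats per iteration, then any leftover elements
 * one at a time with the scalar forms of the same instructions. */
void rcp_sqrt_sketch(const float *r, float *out, int n)
{
    int i = 0;

    /* Vector loop: unaligned loads/stores so callers need not align buffers. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(r + i);
        __m128 s = _mm_sqrt_ps(v);   /* SQRTPS: four square roots */
        __m128 q = _mm_rcp_ps(s);    /* RCPPS: four approximate reciprocals */
        _mm_storeu_ps(out + i, q);
    }

    /* Scalar tail for counts that are not a multiple of four. */
    for (; i < n; i++) {
        __m128 v = _mm_set_ss(r[i]);
        __m128 q = _mm_rcp_ss(_mm_sqrt_ss(v));  /* SQRTSS, then RCPSS */
        out[i] = _mm_cvtss_f32(q);
    }
}
```

If the approximation is too coarse for a given use, one option is to replace the reciprocal step with an exact division (_mm_div_ps) at a higher latency cost.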
Example
The following function highlights the use of both the vectorized reciprocal and square-root SSE
instructions:
; reciprocal_sqrt_sse(float *r, float *rcp_sqrt_r, int num_points);
;
; TO ASSEMBLE INTO *.obj DO THE FOLLOWING:
; ml.exe -coff -c reciprocal_sqrt_sse.asm
;
.586
.K3D
.XMM
_TEXT SEGMENT
PUBLIC _reciprocal_sqrt_sse
_reciprocal_sqrt_sse PROC NEAR
;==============================================================================
; INSTRUCTIONS BELOW SAVE THE REGISTER STATE WITH WHICH THIS ROUTINE WAS
; ENTERED.
; REGISTERS EAX, ECX, EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED
; WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM
push ebp
mov ebp, esp
;==============================================================================
; Parameters passed into routine: