25112 Rev. 3.06 September 2005

Software Optimization Guide for AMD64 Processors

9.11 Using SIMD Instructions for Fast Square Roots and Fast Reciprocal Square Roots

Optimization

Use SIMD vectorized square root (SQRTPS) and reciprocation (RCPPS) instructions to calculate square roots and reciprocal square roots of single-precision numbers.

Application

This optimization applies to:

32-bit software

64-bit software

Rationale

SIMD instructions exist for performing vectorized square root and reciprocation of single-precision numbers. These operations are common in multimedia applications and are also useful in scientific computing, such as molecular dynamics simulations.
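As a rough C-level illustration of the same idea, the sketch below uses the compiler intrinsics `_mm_sqrt_ps` and `_mm_rcp_ps` from `<xmmintrin.h>`, which compilers normally lower to SQRTPS and RCPPS. The helper name `sqrt_and_rsqrt_ps` and its extra `sqrt_r` output are assumptions made for illustration and are not part of this guide. Note that RCPPS returns an estimate (roughly 12 bits of precision), so the reciprocal results are approximate.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_sqrt_ps, _mm_rcp_ps, ... */

/* Hypothetical helper (not from the guide): for n inputs, writes
   sqrt(r[i]) to sqrt_r[i] and an approximate 1/sqrt(r[i]) to
   rcp_sqrt_r[i], processing four floats per loop iteration. */
static void sqrt_and_rsqrt_ps(const float *r, float *sqrt_r,
                              float *rcp_sqrt_r, int n)
{
    int i = 0;

    /* Vector body: four single-precision values at a time. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(r + i);      /* unaligned load of 4 floats  */
        __m128 s = _mm_sqrt_ps(v);           /* SQRTPS: full-precision sqrt */
        __m128 q = _mm_rcp_ps(s);            /* RCPPS: reciprocal estimate  */
        _mm_storeu_ps(sqrt_r + i, s);
        _mm_storeu_ps(rcp_sqrt_r + i, q);
    }

    /* Scalar tail for the last n % 4 elements (SQRTSS plus a divide). */
    for (; i < n; ++i) {
        float s = _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(r[i])));
        sqrt_r[i] = s;
        rcp_sqrt_r[i] = 1.0f / s;
    }
}
```

Unaligned loads and stores are used here so that the sketch works with any buffer; with 16-byte-aligned buffers the aligned forms (`_mm_load_ps`, `_mm_store_ps`) could be used instead. If more accuracy is needed than the reciprocal estimate provides, a full-precision divide or a Newton-Raphson refinement step is a common alternative.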

Example

The following function highlights the use of both the vectorized reciprocal and square-root SSE instructions:

;reciprocal_sqrt_sse(float *r, float *rcp_sqrt_r, int num_points);

;TO ASSEMBLE INTO *.obj DO THE FOLLOWING:

;ml.exe -coff -c reciprocal_sqrt_sse.asm

;

.586

.K3D

.XMM

_TEXT SEGMENT

PUBLIC _reciprocal_sqrt_sse

_reciprocal_sqrt_sse PROC NEAR

;==============================================================================

;INSTRUCTIONS BELOW SAVE THE REGISTER STATE WITH WHICH THIS ROUTINE WAS

;ENTERED.

;REGISTERS EAX, ECX, EDX ARE CONSIDERED VOLATILE AND ASSUMED TO BE CHANGED

;WHILE THE REGISTERS BELOW MUST BE PRESERVED IF THE USER IS CHANGING THEM

push ebp

mov ebp, esp

;==============================================================================

; Parameters passed into routine:

Chapter 9

Optimizing with SIMD Instructions
